Enhanced Robustness for Speech Emotion Recognition: Combining Acoustic and Linguistic Information

Sinishaw, Hana

Files

Hana Sinishaw 2017.pdf (98.28 KB)

Date

10/5/2017

Authors

Sinishaw, Hana

Publisher

Addis Ababa University

Abstract

Affective computing is the area of Artificial Intelligence that focuses on the design and development of intelligent devices which can perceive, process and synthesize human emotions. Humans can interpret emotions in a number of different ways, for example by processing spoken utterances, non-verbal cues, facial expressions and written communication. Changes in our nervous system indirectly alter spoken utterances, which makes it possible for people to perceive how others feel by listening to them speak. These changes can also be interpreted by machines through the extraction of speech features. The field of speech emotion recognition (SER) takes advantage of this capability and has offered many approaches to recognizing affect in spoken utterances. The majority of state-of-the-art SER systems employ complex statistical algorithms to model the relationships among acoustic parameters extracted from spoken language. Studies also show that phrases, word senses and syntactic relations that convey linguistic attributes of a language play an important role in enhancing prediction rates. Our research focuses on this problem of recognizing affect in spoken utterances and contributes to state-of-the-art systems by adding linguistic knowledge to enhance their efficiency, rather than relying only on speech utterances. In this work, a speech emotion recognition system is developed for the Amharic language based on acoustic and linguistic features. Classification performance depends on the extracted features. We used a baseline set of 384 acoustic features and, for linguistic analysis of the text, keyword spotting, negation handling and sentiment analysis with emotion generation rules. Combining those features, we achieved an accuracy of 64.2% in identifying the Happiness, Surprise, Anger, Sadness, Fear, Disgust and Neutral emotions.
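
The linguistic techniques named in the abstract (keyword spotting and negation handling) can be illustrated with a minimal sketch. This is not the thesis's implementation: the lexicon, the negator list, the `window` parameter and the function name `spot_emotions` are all hypothetical, and English words stand in for Amharic purely for readability.

```python
# Hypothetical toy lexicon: the thesis targets Amharic; English words are
# used here only to keep the illustration readable.
EMOTION_KEYWORDS = {
    "happy": "Happiness",
    "glad": "Happiness",
    "angry": "Anger",
    "sad": "Sadness",
    "afraid": "Fear",
}

# Hypothetical negation cues.
NEGATORS = {"not", "never", "no"}

PUNCTUATION = ".,!?"


def spot_emotions(utterance, window=2):
    """Return (keyword, emotion, negated) tuples for one transcript.

    A keyword counts as negated when a negator appears among the
    `window` tokens immediately before it (an assumed heuristic).
    """
    tokens = [t.strip(PUNCTUATION) for t in utterance.lower().split()]
    hits = []
    for i, word in enumerate(tokens):
        if word in EMOTION_KEYWORDS:
            preceding = tokens[max(0, i - window):i]
            negated = any(p in NEGATORS for p in preceding)
            hits.append((word, EMOTION_KEYWORDS[word], negated))
    return hits
```

In a combined system of the kind the abstract describes, flags such as these could be appended to the acoustic feature vector before classification; how the thesis actually fuses the two feature sets is not stated in this record.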

Keywords

Speech Emotion Recognition, Acoustic Features, Linguistic Features, Feature Extraction, Feature Selection, Classification

URI

http://etd.aau.edu.et/handle/123456789/18207

Collections

Computer Science
