Enhanced Robustness for Speech Emotion Recognition: Combining Acoustic and Linguistic Information

Addis Ababa University


Affective computing is the area of Artificial Intelligence that focuses on the design and development of intelligent devices that can perceive, process and synthesize human emotions. Humans interpret emotions in a number of different ways, for example by processing spoken utterances, non-verbal cues, facial expressions and written communication. Changes in our nervous system indirectly alter spoken utterances, which makes it possible for people to perceive how others feel by listening to them speak. These changes can also be interpreted by machines through the extraction of speech features. The field of speech emotion recognition (SER) takes advantage of this capability and has offered many approaches to recognizing affect in spoken utterances. The majority of state-of-the-art SER systems employ complex statistical algorithms to model the relationships among acoustic parameters extracted from spoken language. Studies also show that phrases, word senses and syntactic relations that convey linguistic attributes of a language play an important role in improving prediction rates. Our research addresses the problem of recognizing affect in spoken utterances and contributes to state-of-the-art systems by adding linguistic knowledge to enhance their efficiency, rather than relying on the speech signal alone. In this work, a speech emotion recognition system is developed for the Amharic language based on acoustic and linguistic features. Classification performance depends on the extracted features. We used a baseline set of 384 acoustic features; for linguistic analysis of the text, we used keyword spotting, negation handling and sentiment analysis with emotion generation rules. Combining these features, we achieved an accuracy of 64.2% in identifying Happiness, Surprise, Anger, Sadness, Fear, Disgust and Neutral emotions.
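The combination of acoustic and linguistic features described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the toy lexicon, the negation list, the English placeholder words, the rule that a negated keyword is simply skipped, and fusion by plain concatenation are all assumptions made for illustration, and the acoustic vector is a zero-filled placeholder for the 384 baseline features.

```python
from typing import Dict, List

EMOTIONS = ["happiness", "surprise", "anger", "sadness", "fear", "disgust", "neutral"]

# Hypothetical toy lexicon with English placeholders (not the Amharic resources of this work).
LEXICON: Dict[str, str] = {"glad": "happiness", "furious": "anger", "afraid": "fear"}
# Hypothetical negation words, also placeholders.
NEGATIONS = {"not", "never", "no"}


def linguistic_features(tokens: List[str]) -> List[float]:
    """Keyword spotting with simple negation handling.

    Counts emotion keywords per class; a keyword directly preceded by a
    negation word is skipped (an assumed rule). If nothing is counted,
    the utterance is marked as neutral.
    """
    counts = {emotion: 0.0 for emotion in EMOTIONS}
    for i, token in enumerate(tokens):
        emotion = LEXICON.get(token.lower())
        if emotion is None:
            continue
        negated = i > 0 and tokens[i - 1].lower() in NEGATIONS
        if not negated:
            counts[emotion] += 1.0
    if sum(counts.values()) == 0:
        counts["neutral"] = 1.0
    return [counts[emotion] for emotion in EMOTIONS]


def fuse(acoustic: List[float], linguistic: List[float]) -> List[float]:
    """Feature-level fusion by concatenation (an assumed fusion strategy)."""
    return list(acoustic) + list(linguistic)


# Placeholder for the 384 acoustic features of one utterance.
acoustic = [0.0] * 384
fused = fuse(acoustic, linguistic_features("I am not glad".split()))
print(len(fused))
```

The fused vector would then be passed to whichever classifier is chosen; concatenation keeps the two feature groups separable, so feature selection can still be applied to each group.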



Speech Emotion Recognition, Acoustic Features, Linguistic Features, Feature Extraction, Feature Selection, Classification