Enhanced Robustness for Speech Emotion Recognition: Combining Acoustic and Linguistic Information
Date
2017-10-05
Publisher
Addis Ababa University
Abstract
Affective computing is the area of Artificial Intelligence that focuses on the design and
development of intelligent devices that can perceive, process and synthesize human emotions.
Humans interpret emotions in a number of different ways, for example by processing spoken
utterances, non-verbal cues, facial expressions and written communication. Changes in our
nervous system indirectly alter spoken utterances, which makes it possible for people to perceive
how others feel by listening to them speak. These changes can also be interpreted by machines
through the extraction of speech features. The field of speech emotion recognition (SER) takes
advantage of this capability and has offered many approaches to recognizing affect in
spoken utterances.
The majority of state-of-the-art SER systems employ complex statistical algorithms to model the
relationship between acoustic parameters extracted from spoken language. Studies also show that
phrases, word senses and syntactic relations that convey linguistic attributes of a language play
an important role in enhancing prediction rates. Our research focuses on the problem of
recognizing affect in spoken utterances and contributes to state-of-the-art systems by adding
linguistic knowledge to enhance their efficiency, rather than relying only on speech utterances. In
this work, a speech emotion recognition system is developed for the Amharic language based on
acoustic and linguistic features.
Classification performance depends on the extracted features. We used a baseline set of 384
acoustic features; for linguistic analysis of the text, we used keyword spotting,
negation handling and sentiment analysis with emotion generation rules. Combining these
features, we achieved an accuracy of 64.2% in identifying Happiness, Surprise, Anger, Sadness,
Fear, Disgust and Neutral emotions.
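As a rough illustration of how keyword spotting with negation handling might be combined with an acoustic feature vector, the sketch below counts emotion keywords in a transcript, ignores keywords that follow a negation word, and appends the resulting counts to the acoustic features. It is not the system described in this work. The lexicon, the negation list, the two-token negation window and the function names are all hypothetical, and English placeholder words stand in for Amharic entries.

```python
from typing import Dict, List, Sequence

# The seven emotion classes named in the abstract.
EMOTIONS: List[str] = [
    "happiness", "surprise", "anger", "sadness", "fear", "disgust", "neutral",
]

# Hypothetical keyword lexicon (English placeholders, not the thesis lexicon).
LEXICON: Dict[str, str] = {
    "happy": "happiness",
    "glad": "happiness",
    "angry": "anger",
    "sad": "sadness",
    "afraid": "fear",
}

# Hypothetical negation words.
NEGATIONS = {"not", "never", "no"}


def linguistic_features(tokens: Sequence[str], window: int = 2) -> List[float]:
    """Count emotion keywords, skipping any keyword that has a negation
    word among the `window` tokens immediately before it."""
    counts = {emotion: 0.0 for emotion in EMOTIONS}
    for i, token in enumerate(tokens):
        emotion = LEXICON.get(token.lower())
        if emotion is None:
            continue  # not an emotion keyword
        preceding = tokens[max(0, i - window):i]
        negated = any(t.lower() in NEGATIONS for t in preceding)
        if not negated:
            counts[emotion] += 1.0
    # Assumption: with no surviving keyword, fall back to the neutral class.
    if not any(counts.values()):
        counts["neutral"] = 1.0
    return [counts[emotion] for emotion in EMOTIONS]


def fuse(acoustic: Sequence[float], linguistic: Sequence[float]) -> List[float]:
    """Feature-level fusion by simple concatenation (an assumed strategy)."""
    return list(acoustic) + list(linguistic)


# Placeholder acoustic vector with the 384 dimensions mentioned in the abstract.
acoustic_vector = [0.0] * 384
fused = fuse(acoustic_vector, linguistic_features("I am not happy".split()))
```

The fused vector could then be passed to any classifier. Concatenation is only one possible way to combine the two feature sets, and the abstract does not state which fusion strategy was used.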
Keywords
Speech Emotion Recognition, Acoustic Features, Linguistic Features, Feature Extraction, Feature Selection, Classification