Laboratory team places first and second in Audio/Visual Emotion Challenge
The team developed technologies to evaluate an individual's emotional state and depression level.
by Kylie Foy | Technical Communications Group
A team of MIT Lincoln Laboratory staff took honors at the Audio/Visual Emotion Challenge (AVEC) and Workshop on 16 October, an extension of the Association for Computing Machinery (ACM) International Conference on Multimedia, held in Amsterdam, Netherlands. Now in its sixth year, AVEC challenges participants to use multimedia processing and machine learning to automate the analysis of subjects' emotional states and depression levels from audio, visual, and physiological data. The Laboratory team took first place in the emotion recognition subchallenge and second place in the depression estimation subchallenge at the event, where team members presented their findings alongside competitors from around the world.
"In a matter of several months we went from building a team and understanding the problem to developing novel individual solutions and finally to putting it all together into an evaluation system," said team member Kevin Brady. "That whole collaborative process was the rewarding part of the effort. Placing 1st and 2nd was a wonderful validation for the team." The team consisted of staff in the Bioengineering Systems and Technologies Group, the Human Language Technology Group, and the Intelligence and Decision Technologies Group, as well as collaborators from the University of Illinois and Harvard University. The challenge spanned three months, beginning last May when the teams acquired the data for each subchallenge.
The emotion subchallenge tasked participants with continuously predicting emotion along two dimensions: arousal and valence. Arousal describes the emotion's energy level, while valence describes whether the emotion is negative or positive. Anger, for example, would be correlated with high arousal and low valence. To make their assessments, the team used audio, video, and physiological data (like electrocardiogram and heart rate data) of 27 individuals, recorded during a specific interaction with a partner. With benchmarks provided for 18 of the subjects, the team had to predict arousal and valence scores for the remaining 9 at various times throughout the recordings.
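The two-dimensional model described above can be sketched in a few lines of code. The coordinate values below are illustrative assumptions for a handful of emotions, not the challenge's actual annotations, and the `nearest_emotion` helper is a hypothetical convenience function:

```python
# Minimal sketch of the arousal/valence emotion model. The (arousal,
# valence) coordinates are made-up illustrations on a [-1, 1] scale,
# not real annotation values from the AVEC data.
EMOTIONS = {
    "anger":   (0.8, -0.7),   # high energy, negative
    "joy":     (0.7, 0.8),    # high energy, positive
    "calm":    (-0.6, 0.5),   # low energy, positive
    "sadness": (-0.5, -0.6),  # low energy, negative
}

def nearest_emotion(arousal: float, valence: float) -> str:
    """Return the label whose point is closest to the given coordinates."""
    return min(
        EMOTIONS,
        key=lambda e: (EMOTIONS[e][0] - arousal) ** 2
                    + (EMOTIONS[e][1] - valence) ** 2,
    )
```

A high-arousal, low-valence prediction like `nearest_emotion(0.8, -0.7)` maps back to "anger," matching the example in the text.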
Of the 13 groups competing in the emotion subchallenge, the Laboratory's approach proved to be the most successful. The team used multiple machine learning pipelines for each data channel to derive arousal and valence scores individually. "We leveraged novel approaches exploiting recent advances in machine learning, such as sparse coding and deep learning," said Brady. The team's multimodal system then fused the scores derived from each source using a Kalman filter framework. "In the end, Kalman smoothing provided optimal fusion for different outputs of the multiple approaches implemented in our system," said team member Dr. Youngjune Gwon.
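The fusion idea can be sketched as follows: each modality's score stream is treated as a noisy observation of a single latent affect state, filtered forward in time and then smoothed backward. This is a minimal sketch assuming a scalar random-walk state model and illustrative noise variances; the team's actual fusion framework is not detailed in the article:

```python
# Hedged sketch of score fusion with a 1-D Kalman filter plus a
# Rauch-Tung-Striebel (RTS) backward smoothing pass. The random-walk
# state model and the noise variances q and rs are assumptions chosen
# for illustration only.
def kalman_smooth_fuse(channels, q=0.01, rs=None):
    """channels: equal-length score sequences, one per modality.
    q: process noise of the latent random-walk state.
    rs: per-channel observation noise variances."""
    n = len(channels[0])
    rs = rs or [0.1] * len(channels)

    # Forward pass: predict, then sequentially fuse each modality's
    # score as an independent noisy measurement of the same state.
    x, p = 0.0, 1.0
    xs_f, ps_f, xs_pred, ps_pred = [], [], [], []
    for t in range(n):
        xp, pp = x, p + q            # random walk: uncertainty grows
        xs_pred.append(xp); ps_pred.append(pp)
        x, p = xp, pp
        for ch, r in zip(channels, rs):
            k = p / (p + r)          # Kalman gain
            x = x + k * (ch[t] - x)  # pull estimate toward measurement
            p = (1.0 - k) * p
    # (store filtered estimate for this step)
        xs_f.append(x); ps_f.append(p)

    # Backward pass: RTS smoother refines each estimate using the future.
    xs_s = xs_f[:]
    for t in range(n - 2, -1, -1):
        g = ps_f[t] / ps_pred[t + 1]
        xs_s[t] = xs_f[t] + g * (xs_s[t + 1] - xs_pred[t + 1])
    return xs_s
```

Because the smoother looks both forward and backward in time, it suits the offline, whole-recording scoring the subchallenge required.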
The depression subchallenge asked participants to provide a binary depressed/not depressed diagnosis based on raw speech audio, video, physiological sensor data, and text transcripts from an individual's interview with a human-controlled computer avatar. Competitors' assessments were compared to subjects' clinically diagnosed depressed state based on their PHQ-8 score, a questionnaire-based measure used by mental-health professionals. The effort built on previous Laboratory research in vocal and facial biomarkers, for which staff won 1st place in the 2013 and 2014 AVEC depression subchallenges. This year, the team introduced novel features linked to vocal projection through control of the lower vocal tract, which is affected by the neurophysiological changes caused by major depressive disorder.
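The ground-truth labeling described above reduces to thresholding a questionnaire score. The sketch below uses the commonly cited clinical cutoff of 10 on the PHQ-8's 0-24 range; whether the challenge organizers used exactly this cutoff is an assumption here:

```python
# Sketch of a binary depressed/not-depressed label derived from a PHQ-8
# total score. The PHQ-8 sums eight items scored 0-3, so totals span
# 0-24; the cutoff of 10 is the commonly used clinical threshold, assumed
# here rather than confirmed for this challenge.
def phq8_label(score: int) -> str:
    if not 0 <= score <= 24:
        raise ValueError("PHQ-8 total must be in [0, 24]")
    return "depressed" if score >= 10 else "not depressed"
```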
For the team, the impact of the text transcript analyses proved to be both impressive and surprising. The team conducted "semantic analyses," a method to analyze not only the meaning of words, but also their relationships in context with other words. "While semantics is an important component of human communications, capturing its subtle meaning for analyzing a person's emotional or depressive state is a difficult task. Our big breakthrough was in leveraging semantic embedding spaces as an improved representation for capturing this information by exploiting the inherent correlations and redundancies in human language," Brady said. For example, if a subject says the word "suicide," the system can correlate its significance based on whether the subject also used words commonly associated with suicide, like "depressed" or "low."
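The core of the embedding-space idea is that related words sit close together in a vector space, so a target word's significance can be weighed by how many nearby words appear in context. This is a toy sketch: the 3-D vectors and the similarity threshold are made-up illustrations, not real learned embeddings or the team's actual representation:

```python
# Toy sketch of semantic embeddings: words as points in a vector space
# where related terms lie close together. The vectors below are invented
# for illustration; real embeddings are learned from large text corpora.
import math

EMBEDDINGS = {
    "suicide":   [0.9, 0.8, 0.1],
    "depressed": [0.8, 0.9, 0.2],
    "low":       [0.7, 0.7, 0.3],
    "vacation":  [0.1, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity: 1.0 for identical directions, ~0 for unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def context_support(target, context_words, threshold=0.9):
    """Count context words whose embedding lies close to the target's."""
    t = EMBEDDINGS[target]
    return sum(cosine(t, EMBEDDINGS[w]) >= threshold for w in context_words)
```

In this toy space, "depressed" and "low" both fall near "suicide" while "vacation" does not, mirroring the correlation the text describes.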
Interestingly, the team was also able to exploit the interviewer as a sensor, since the semantic content of their questions included subtle insights into the depressive state of the subject. For example, if the interviewer asks the question "How have you been feeling lately?" and follows up on a subject's response with keywords like "sorry" or "tough," the system can infer some aspects of the subject's state. "Going forward, the semantic exploitation could also be useful for enabling an autonomous interviewer to better 'understand' and interact with a human subject," Brady said.
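The interviewer-as-sensor idea can be reduced to a simple feature: how often the interviewer's follow-up turns contain empathetic cue words. The keyword list and scoring below are illustrative assumptions, far simpler than the semantic analysis the team describes:

```python
# Hedged sketch of the "interviewer as sensor" idea: scan the
# interviewer's turns for empathetic cue words that may hint at the
# subject's state. The cue list and the scoring are assumptions.
EMPATHY_CUES = {"sorry", "tough", "hard", "difficult"}

def interviewer_cue_score(interviewer_turns):
    """Fraction of interviewer turns containing an empathetic cue word."""
    if not interviewer_turns:
        return 0.0
    hits = sum(
        any(word.strip(".,!?").lower() in EMPATHY_CUES
            for word in turn.split())
        for turn in interviewer_turns
    )
    return hits / len(interviewer_turns)
```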
Capabilities stemming from the AVEC efforts are already transitioning to sponsors. "The Intelligence Community has a great interest in emotion because they need the ability to analyze criminals when they are questioned. Part of the code developed for the arousal portion of the challenge has been transitioned to them," Gwon said. The National Institute of Standards and Technology (NIST) is interested in how emotion and depression might impact the ability of a speaker recognition system to recognize an individual.
The AVEC effort also resulted in two publications describing the scientific findings of the emotion and depression challenges. Gwon reflected on the team's accomplishments: "Our success at AVEC feels great because what we believed and theorized was validated in a competition full of experts and renowned institutions of the world. I felt relieved at first, and still feel extremely grateful that I was a part of this great team."
Posted November 2016