Publications


Detecting depression using vocal, facial and semantic communication cues

Summary

Major depressive disorder (MDD) is known to result in neurophysiological and neurocognitive changes that affect control of motor, linguistic, and cognitive functions. MDD's impact on these processes is reflected in an individual's communication via coupled mechanisms: vocal articulation, facial gesturing, and choice of content to convey in a dialogue. In particular, MDD-induced neurophysiological changes are associated with a decline in dynamics and coordination of speech and facial motor control, while neurocognitive changes influence dialogue semantics. In this paper, biomarkers are derived from all of these modalities, drawing first from previously developed neurophysiologically motivated speech and facial coordination and timing features. In addition, a novel indicator of lower vocal tract constriction in articulation is incorporated that relates to vocal projection. Semantic features are analyzed for subject/avatar dialogue content using a sparse-coded lexical embedding space, and for contextual clues related to the subject's present or past depression status. The features and depression classification system were developed for the 6th International Audio/Visual Emotion Challenge (AVEC), which provides data consisting of audio, video-based facial action units, and transcribed text of individuals communicating with a human-controlled avatar. A clinical Patient Health Questionnaire (PHQ) score and binary depression decision are provided for each participant. PHQ predictions were obtained by fusing outputs from a Gaussian staircase regressor for each feature set, with results on the development set of mean F1=0.81, RMSE=5.31, and MAE=3.34. These compare favorably to the challenge baseline development results of mean F1=0.73, RMSE=6.62, and MAE=5.52. On test set evaluation, our system obtained a mean F1=0.70, which is similar to the challenge baseline test result.
Future work calls for consideration of joint feature analyses across modalities in an effort to detect neurological disorders based on the interplay of motor, linguistic, affective, and cognitive components of communication.
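As a concrete reference for the reported numbers, the three evaluation metrics (mean F1 on the binary decision, RMSE and MAE on the PHQ score) can be sketched in a few lines of numpy; the function names are ours, not from the paper:

```python
import numpy as np

def rmse(actual, predicted):
    """Root-mean-squared error between score arrays."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((a - p) ** 2)))

def mae(actual, predicted):
    """Mean absolute error between score arrays."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(a - p)))

def f1(actual, predicted):
    """F1 score for binary depression decisions (1 = depressed)."""
    a, p = np.asarray(actual, int), np.asarray(predicted, int)
    tp = np.sum((a == 1) & (p == 1))
    fp = np.sum((a == 0) & (p == 1))
    fn = np.sum((a == 1) & (p == 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```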

Relation of automatically extracted formant trajectories with intelligibility loss and speaking rate decline in amyotrophic lateral sclerosis

Published in:
INTERSPEECH 2016: 17th Annual Conf. of the Int. Speech Communication Assoc., 8-12 September 2016.

Summary

Effective monitoring of bulbar disease progression in persons with amyotrophic lateral sclerosis (ALS) requires rapid, objective, automatic assessment of speech loss. The purpose of this work was to identify acoustic features that aid in predicting intelligibility loss and speaking rate decline in individuals with ALS. Features were derived from statistics of the first (F1) and second (F2) formant frequency trajectories and their first and second derivatives. Motivated by a possible link between components of formant dynamics and specific articulator movements, these features were also computed for low-pass and high-pass filtered formant trajectories. When compared to clinician-rated intelligibility and speaking rate assessments, F2 features, particularly mean F2 speed and a novel feature, mean F2 acceleration, were most strongly correlated with intelligibility and speaking rate, respectively (Spearman correlations > 0.70, p < 0.0001). These features also yielded the best predictions in regression experiments (r > 0.60, p < 0.0001). Comparable results were achieved using low-pass filtered F2 trajectory features, with higher correlations and lower prediction errors achieved for speaking rate over intelligibility. These findings suggest information can be exploited in specific frequency components of formant trajectories, with implications for automatic monitoring of ALS.
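The two strongest features, mean F2 speed and mean F2 acceleration, are simple statistics of a formant track's first and second derivatives. A minimal sketch, assuming a formant trajectory sampled at a fixed frame rate (the function name and default rate are ours):

```python
import numpy as np

def formant_dynamics(f2_hz, frame_rate_hz=100.0):
    """Mean speed (|dF2/dt|) and mean acceleration magnitude (|d2F2/dt2|)
    of a second-formant trajectory sampled at a fixed frame rate."""
    f2 = np.asarray(f2_hz, float)
    dt = 1.0 / frame_rate_hz
    speed = np.gradient(f2, dt)    # Hz per second
    accel = np.gradient(speed, dt) # Hz per second^2
    return float(np.mean(np.abs(speed))), float(np.mean(np.abs(accel)))
```

Applying the same statistics to low-pass and high-pass filtered versions of the trajectory, as in the paper, only changes what is fed into the function.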

Relating estimated cyclic spectral peak frequency to measured epilarynx length using magnetic resonance imaging

Published in:
INTERSPEECH 2016: 17th Annual Conf. of the Int. Speech Communication Assoc., 8-12 September 2016.

Summary

The epilarynx plays an important role in speech production, carrying information about the individual speaker and manner of articulation. However, precise acoustic behavior of this lower vocal tract structure is difficult to establish. Focusing on acoustics observable in natural speech, recent spectral processing techniques isolate a unique resonance with characteristics of the epilarynx previously shown via simulation, specifically cyclicity (i.e., energy differences between the closed and open phases of the glottal cycle) in a 3-5 kHz region observed across vowels. Using Magnetic Resonance Imaging (MRI), the present work relates this estimated cyclic peak frequency to measured epilarynx length. Assuming a simple quarter-wavelength relationship, the cavity length estimated from the cyclic peak frequency is shown to be directly proportional (linear fit slope = 1.1) and highly correlated (ρ = 0.85, p < 10^-4) to the measured epilarynx length across speakers. Results are discussed, as are implications in speech science and application domains.
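The quarter-wavelength assumption maps a resonance frequency f of a closed-open tube to a cavity length L = c / (4f). A one-line sketch; the speed-of-sound value is our assumption, not a figure from the paper:

```python
def quarter_wave_length_cm(peak_freq_hz, c_cm_per_s=35000.0):
    """Acoustic length of a quarter-wavelength (closed-open) resonator
    with resonance at peak_freq_hz; c defaults to ~350 m/s, a common
    value for warm, humid air in the vocal tract."""
    return c_cm_per_s / (4.0 * peak_freq_hz)
```

For example, a cyclic peak at 3.5 kHz maps to 35000 / (4 × 3500) = 2.5 cm.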

A vocal modulation model with application to predicting depression severity

Published in:
13th IEEE Int. Conf. on Wearable and Implantable Body Sensor Networks, BSN 2016, 14-17 June 2016.

Summary

Speech provides a potentially simple and noninvasive "on-body" means to identify and monitor neurological diseases. Here we develop a model for a class of vocal biomarkers exploiting modulations in speech, focusing on Major Depressive Disorder (MDD) as an application area. Two model components contribute to the envelope of the speech waveform: amplitude modulation (AM) from respiratory muscles, and AM from interaction between vocal tract resonances (formants) and frequency modulation in vocal fold harmonics. Based on the model framework, we test three methods to extract envelopes capturing these modulations of the third formant for synthesized sustained vowels. Using subsequent modulation features derived from the model, we predict MDD severity scores with a Gaussian Mixture Model. Performing global optimization over classifier parameters and number of principal components, we evaluate performance of the features by examining the root-mean-squared error (RMSE), mean absolute error (MAE), and Spearman correlation between the actual and predicted MDD scores. We achieved RMSE and MAE values of 10.32 and 8.46, respectively (Spearman correlation=0.487, p<0.001), relative to a baseline RMSE of 11.86 and MAE of 10.05, obtained by predicting the mean MDD severity score. Ultimately, our model provides a framework for detecting and monitoring vocal modulations that could also be applied to other neurological diseases.
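One standard way to obtain an amplitude envelope of the kind such a model operates on is the analytic-signal (Hilbert) method. A numpy-only sketch, offered as orientation rather than as one of the three extraction methods actually tested in the paper:

```python
import numpy as np

def amplitude_envelope(x):
    """Amplitude envelope via the analytic signal (FFT-based Hilbert
    transform): zero the negative frequencies, double the positive ones,
    and take the magnitude of the inverse transform."""
    x = np.asarray(x, float)
    n = len(x)
    X = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(X * h))

# Example: recover a 5 Hz AM envelope riding on a 100 Hz carrier at fs = 1 kHz
fs = 1000.0
t = np.arange(1000) / fs
am = 1.0 + 0.5 * np.cos(2 * np.pi * 5 * t)   # slow amplitude modulation
signal = am * np.cos(2 * np.pi * 100 * t)    # modulated carrier
env = amplitude_envelope(signal)             # closely tracks am
```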

Assessing functional neural connectivity as an indicator of cognitive performance

Published in:
5th NIPS Workshop on Machine Learning and Interpretation in Neuroimaging, MLINI 2015, 11-12 December 2015.

Summary

Studies in recent years have demonstrated that neural organization and structure impact an individual's ability to perform a given task. Specifically, individuals with greater neural efficiency have been shown to outperform those with less organized functional structure. In this work, we compare the predictive ability of properties of neural connectivity on a working memory task. We provide two novel approaches for characterizing functional network connectivity from electroencephalography (EEG), and compare these features to the average power across frequency bands in EEG channels. Our first novel approach represents functional connectivity structure through the distribution of eigenvalues making up channel coherence matrices in multiple frequency bands. Our second approach creates a connectivity network at each frequency band, and assesses variability in average path lengths of connected components and degree across the network. Failures in digit and sentence recall on single trials are detected using a Gaussian classifier for each feature set, at each frequency band. The classifier results are then fused across frequency bands, with the resulting detection performance summarized using the area under the receiver operating characteristic curve (AUC) statistic. Fused AUC results of 0.63/0.58/0.61 for digit recall failure and 0.58/0.59/0.54 for sentence recall failure are obtained from the connectivity structure, graph variability, and channel power features respectively.
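A minimal stand-in for the first feature set, using a channel correlation matrix in place of the per-band coherence matrices described above (the function name is ours, and the correlation matrix is a simplification of coherence):

```python
import numpy as np

def connectivity_eigenspectrum(signals):
    """Descending eigenvalues of the channel-by-channel correlation matrix
    of band-filtered EEG, a numpy-only stand-in for the coherence-matrix
    eigenvalue features; signals has shape (channels, samples)."""
    C = np.corrcoef(np.asarray(signals, float))  # channels x channels
    return np.linalg.eigvalsh(C)[::-1]           # eigvalsh returns ascending
```

A flat eigenspectrum indicates weakly coupled channels; a few dominant eigenvalues indicate strongly shared structure across the montage.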

Estimating lower vocal tract features with closed-open phase spectral analyses

Published in:
INTERSPEECH 2015: 16th Annual Conf. of the Int. Speech Communication Assoc., 6-10 September 2015.

Summary

Previous studies have shown that, in addition to being speaker-dependent yet context-independent, lower vocal tract acoustics significantly impact the speech spectrum at mid-to-high frequencies (e.g., 3-6 kHz). The present work automatically estimates spectral features that exhibit acoustic properties of the lower vocal tract. Specifically aiming to capture the cyclicity property of the epilarynx tube, a novel multi-resolution approach to spectral analyses is presented that exploits significant differences between the closed and open phases of a glottal cycle. A prominent null linked to the piriform fossa is also estimated. Examples of the feature estimation on natural speech of the VOICES multi-speaker corpus illustrate that a salient spectral pattern indeed emerges between 3-6 kHz across all speakers. Moreover, the observed pattern is consistent with that canonically shown for the lower vocal tract in previous works. Additionally, an instance of a speaker's formant (i.e., a spectral peak around 3 kHz that has been well established as a characteristic of voice projection) is quantified here for the VOICES template speaker in relation to epilarynx acoustics. The corresponding peak is shown to be roughly double in decibel level on average compared to the other speakers (20 vs. 10 dB).

Speech enhancement using sparse convolutive non-negative matrix factorization with basis adaptation

Published in:
INTERSPEECH 2012: 13th Annual Conf. of the Int. Speech Communication Assoc., 9-13 September 2012.

Summary

We introduce a framework for speech enhancement based on convolutive non-negative matrix factorization that leverages available speech data to enhance arbitrary noisy utterances with no a priori knowledge of the speakers or noise types present. Previous approaches have shown the utility of a sparse reconstruction of the speech-only components of an observed noisy utterance. We demonstrate that an underlying speech representation which, in addition to applying sparsity, also adapts to the noisy acoustics improves overall enhancement quality. The proposed system performs comparably to a traditional Wiener filtering approach, and the results suggest that the proposed framework is most useful in moderate- to low-SNR scenarios.
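For orientation, a heavily simplified sketch of sparse NMF with multiplicative updates and an L1 penalty on the activations; the paper's convolutive formulation and noise-adaptive basis learning are not shown, and all names are ours:

```python
import numpy as np

def sparse_nmf(V, rank, sparsity=0.0, n_iter=300, seed=0):
    """Factor a nonnegative magnitude spectrogram V ~ W @ H by multiplicative
    updates, with an L1 penalty (sparsity) on the activations H. A simplified,
    non-convolutive stand-in for sparse convolutive NMF."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, rank)) + 0.1   # spectral basis vectors
    H = rng.random((rank, n)) + 0.1   # per-frame activations
    eps = 1e-9
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + sparsity + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

In an enhancement setting, reconstructing with only the speech-trained columns of W and their activations yields the denoised spectrogram estimate.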

Vocal-source biomarkers for depression - a link to psychomotor activity

Published in:
INTERSPEECH 2012: 13th Annual Conf. of the Int. Speech Communication Assoc., 9-13 September 2012.

Summary

A hypothesis in characterizing human depression is that change in the brain's basal ganglia results in a decline of motor coordination. Such a neurophysiological change may therefore affect laryngeal control and dynamics. Under this hypothesis, toward the goal of objective monitoring of depression severity, we investigate vocal-source biomarkers for depression; specifically, source features that may relate to precision in motor control, including vocal-fold shimmer and jitter, degree of aspiration, fundamental frequency dynamics, and frequency-dependence of variability and velocity of energy. We use a 35-subject database collected by Mundt et al. in which subjects were treated over a six-week period, and investigate correlation of our features with clinical (HAMD) as well as self-reported (QIDS) Total subject assessment scores. To explicitly address the motor aspect of depression, we compute correlations with the Psychomotor Retardation component of clinical and self-reported Total assessments. For our longitudinal database, most correlations point to statistical relationships of our vocal-source biomarkers with psychomotor activity, as well as with depression severity.
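Two of the listed source features have standard local definitions: jitter is the mean absolute cycle-to-cycle period difference normalized by the mean period, and shimmer is the analogous ratio for cycle peak amplitudes. A minimal sketch (function name ours):

```python
import numpy as np

def jitter_shimmer(periods_s, peak_amps):
    """Local jitter (mean |period-to-period difference| / mean period) and
    local shimmer (same ratio computed on cycle peak amplitudes), from
    per-glottal-cycle measurements."""
    T = np.asarray(periods_s, float)
    A = np.asarray(peak_amps, float)
    jitter = np.mean(np.abs(np.diff(T))) / np.mean(T)
    shimmer = np.mean(np.abs(np.diff(A))) / np.mean(A)
    return float(jitter), float(shimmer)
```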

Exploring the impact of advanced front-end processing on NIST speaker recognition microphone tasks

Summary

The NIST speaker recognition evaluation (SRE) featured microphone data in the 2005-2010 evaluations. The preprocessing and use of these data have typically been performed with telephone bandwidth and quantization. Although this approach is viable, it ignores the richer properties of the microphone data: multiple channels, high-rate sampling, linear encoding, ambient noise properties, etc. In this paper, we explore alternate choices of preprocessing and examine their effects on speaker recognition performance. Specifically, we consider the effects of quantization, sampling rate, enhancement, and two-channel speech activity detection. Experiments on the NIST 2010 SRE interview microphone corpus demonstrate that performance can be dramatically improved with a different preprocessing chain.

FY11 Line-Supported Bio-Next Program - Multi-modal Early Detection Interactive Classifier (MEDIC) for mild traumatic brain injury (mTBI) triage

Summary

The Multi-modal Early Detection Interactive Classifier (MEDIC) is a triage system designed to enable rapid assessment of mild traumatic brain injury (mTBI) when access to expert diagnosis is limited, as in a battlefield setting. MEDIC is based on supervised classification that requires three fundamental components to function correctly; these are data, features, and truth. The MEDIC system can act as a data collection device in addition to being an assessment tool. Therefore, it enables a solution to one of the fundamental challenges in understanding mTBI: the lack of useful data. The vision of MEDIC is to fuse results from stimulus tests in each of four modalities (auditory, ocular, vocal, and intracranial pressure) and provide them to a classifier. With appropriate data for training, the MEDIC classifier is expected to provide an immediate decision of whether the subject has a strong likelihood of having sustained an mTBI and therefore requires an expert diagnosis from a neurologist. The tests within each modality were designed to balance the capacity of objective assessment and the maturity of the underlying technology against the ability to distinguish injured from non-injured subjects according to published results. Selection of existing modalities and underlying features represents the best available, low-cost, portable technology with a reasonable chance of success.