Publications

Relation of automatically extracted formant trajectories with intelligibility loss and speaking rate decline in amyotrophic lateral sclerosis

September 8, 2016

Conference Paper

Author:

Rachelle Horwitz-Martin

…

Published in:

INTERSPEECH 2016: 16th Annual Conf. of the Int. Speech Communication Assoc., 8-12 September 2016.

Topic:

biometrics

R&D area:

Cyber Security and Information Sciences

R&D group:

Summary

Effective monitoring of bulbar disease progression in persons with amyotrophic lateral sclerosis (ALS) requires rapid, objective, automatic assessment of speech loss. The purpose of this work was to identify acoustic features that aid in predicting intelligibility loss and speaking rate decline in individuals with ALS. Features were derived from statistics of the first (F1) and second (F2) formant frequency trajectories and their first and second derivatives. Motivated by a possible link between components of formant dynamics and specific articulator movements, these features were also computed for low-pass and high-pass filtered formant trajectories. When compared to clinician-rated intelligibility and speaking rate assessments, F2 features, particularly mean F2 speed and a novel feature, mean F2 acceleration, were most strongly correlated with intelligibility and speaking rate, respectively (Spearman correlations > 0.70, p < 0.0001). These features also yielded the best predictions in regression experiments (r > 0.60, p < 0.0001). Comparable results were achieved using low-pass filtered F2 trajectory features, with higher correlations and lower prediction errors achieved for speaking rate over intelligibility. These findings suggest information can be exploited in specific frequency components of formant trajectories, with implications for automatic monitoring of ALS.
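
As a rough illustration of the derivative-based features described in this summary, the sketch below computes a mean speed and mean acceleration for a formant track. It is a minimal sketch under stated assumptions, not the authors' implementation: the summary does not define "speed" or "acceleration" precisely, so they are taken here as mean absolute first and second finite differences, and the function names, the 10 ms frame step, and the sample F2 values are all hypothetical.

```python
from statistics import mean


def finite_difference(values, dt):
    """Forward difference (x[i+1] - x[i]) / dt for each adjacent pair."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


def formant_dynamics(track_hz, frame_step_s):
    """Return (mean speed in Hz/s, mean acceleration magnitude in Hz/s^2).

    Assumption: speed is the mean absolute first difference and
    acceleration is the mean absolute second difference of the track.
    """
    if len(track_hz) < 3:
        raise ValueError("need at least three frames for a second difference")
    velocity = finite_difference(track_hz, frame_step_s)
    acceleration = finite_difference(velocity, frame_step_s)
    mean_speed = mean(abs(v) for v in velocity)
    mean_accel = mean(abs(a) for a in acceleration)
    return mean_speed, mean_accel


# Hypothetical F2 track in Hz, one value every 10 ms (illustrative only).
f2_track = [1500.0, 1520.0, 1560.0, 1540.0, 1500.0]
speed, accel = formant_dynamics(f2_track, 0.01)
print(f"mean F2 speed: {speed:.1f} Hz/s, mean F2 acceleration: {accel:.1f} Hz/s^2")
```

The same function could be applied to a low-pass or high-pass filtered copy of the track, which is how the filtered variants mentioned in the summary might be obtained.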

Relating estimated cyclic spectral peak frequency to measured epilarynx length using magnetic resonance imaging

September 8, 2016

Conference Paper

Author:

Elizabeth C. Godoy

…

Published in:

INTERSPEECH 2016: 16th Annual Conf. of the Int. Speech Communication Assoc., 8-12 September 2016.

Topic:

biometrics

R&D area:

Cyber Security and Information Sciences

R&D group:

Summary

The epilarynx plays an important role in speech production, carrying information about the individual speaker and manner of articulation. However, precise acoustic behavior of this lower vocal tract structure is difficult to establish. Focusing on acoustics observable in natural speech, recent spectral processing techniques isolate a unique resonance with characteristics of the epilarynx previously shown via simulation, specifically cyclicity (i.e., energy differences between the closed and open phases of the glottal cycle) in a 3-5 kHz region observed across vowels. Using Magnetic Resonance Imaging (MRI), the present work relates this estimated cyclic peak frequency to measured epilarynx length. Assuming a simple quarter wavelength relationship, the cavity length estimated from the cyclic peak frequency is shown to be directly proportional (linear fit slope = 1.1) and highly correlated (ρ = 0.85, p-value < 10^-4) to the measured epilarynx length across speakers. Results are discussed, as are implications in speech science and application domains.
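
The quarter wavelength relationship mentioned in this summary can be illustrated with a short calculation. For a tube closed at one end, the first resonance is f = c / (4L), so a cavity length follows from a peak frequency as L = c / (4f). This is a sketch only: the speed of sound (350 m/s, a common figure for warm, humid air) and the example frequencies are assumptions, and the function name is hypothetical.

```python
SPEED_OF_SOUND_M_PER_S = 350.0  # assumed value, not taken from the paper


def quarter_wavelength_length_cm(peak_hz, c=SPEED_OF_SOUND_M_PER_S):
    """Cavity length in cm implied by a quarter wavelength resonance at peak_hz."""
    if peak_hz <= 0:
        raise ValueError("peak frequency must be positive")
    return 100.0 * c / (4.0 * peak_hz)


# Example frequencies spanning the 3-5 kHz region named in the summary.
for peak in (3000.0, 3500.0, 5000.0):
    print(f"{peak:.0f} Hz -> {quarter_wavelength_length_cm(peak):.2f} cm")
```

Under these assumptions, peaks in the 3-5 kHz region correspond to cavity lengths of roughly 1.75 to 2.9 cm.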

Speaker linking and applications using non-parametric hashing methods

September 8, 2016

Conference Paper

Author:

Douglas E. Sturim

…

William M. Campbell

Published in:

INTERSPEECH 2016: 16th Annual Conf. of the Int. Speech Communication Assoc., 8-12 September 2016.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Large unstructured audio data sets have become ubiquitous and present a challenge for organization and search. One logical approach for structuring data is to find common speakers and link occurrences across different recordings. Prior approaches to this problem have focused on basic methodology for the linking task. In this paper, we introduce a novel trainable nonparametric hashing method for indexing large speaker recording data sets. This approach leads to tunable computational complexity methods for speaker linking. We focus on a scalable clustering method based on hashing canopy-clustering. We apply this method to a large corpus of speaker recordings, demonstrate performance tradeoffs, and compare to other hashing methods.

The AFRL-MITLL WMT16 news-translation task systems

August 11, 2016

Conference Paper

Author:

Jeremy Gwinnup

…

Published in:

Proc. First Conf. on Machine Translation, Vol. 2, 11-12 August 2016, pp. 296-302.

Topic:

human language technology

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

This paper describes the AFRL-MITLL statistical machine translation systems and the improvements that were developed during the WMT16 evaluation campaign. New techniques applied this year include Neural Machine Translation, a unique selection process for language modelling data, additional out-of-vocabulary transliteration techniques, and morphology generation.

Matching community structure across online social networks

August 3, 2016

Journal Article

Author:

Lin Li

…

William M. Campbell

Published in:

arXiv, 3 August 2016.

Topic:

social network

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

The discovery of community structure in networks has been a problem of considerable interest in recent years. In online social networks, users are often simultaneously involved in multiple social media sites, some of which share common social relationships. It is of great interest to uncover a shared community structure across these networks. However, in reality, users typically identify themselves with different usernames across social media sites. This creates great difficulty in detecting the community structure. In this paper, we explore several approaches for community detection across online social networks with limited knowledge of username alignment across the networks. We refer to the known alignment of usernames as seeds. We investigate strategies for seed selection and their impact on networks with different fractions of overlapping vertices. The goal is to study the interplay between network topologies and seed selection strategies, and to understand how it affects the detected community structure. We also propose several measures to assess the performance of community detection and use them to measure the quality of the detected communities in both Twitter-Twitter networks and Twitter-Instagram networks.

Cross-domain entity resolution in social media

July 11, 2016

Conference Paper

Author:

William M. Campbell

…

Published in:

4th Int. Workshop on Natural Language Processing for Social Media, SocialNLP with IJCAI, 11 July 2016.

Topic:

social network

R&D area:

Cyber Security and Information Sciences

R&D group:

Summary

The challenge of associating entities across multiple domains is a key problem in social media understanding. Successful cross-domain entity resolution provides integration of information from multiple sites to create a complete picture of user and community activities, characteristics, and trends. In this work, we examine the problem of entity resolution across Twitter and Instagram using general techniques. Our methods fall into three categories: profile, content, and graph based. For the profile-based methods, we consider techniques based on approximate string matching. For content-based methods, we perform author identification. Finally, for graph-based methods, we apply novel cross-domain community detection methods and generate neighborhood-based features. The three categories of methods are applied to a large graph of users in Twitter and Instagram to understand challenges, determine performance, and understand fusion of multiple methods. Final results demonstrate an equal error rate of less than 1%.
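
To give a feel for the profile-based category, the sketch below scores username pairs by approximate string matching. The summary does not say which string measure the authors used, so this example swaps in the ratio from Python's standard `difflib.SequenceMatcher` as a stand-in. The helper names and the usernames are hypothetical.

```python
from difflib import SequenceMatcher


def username_similarity(a, b):
    """Case-insensitive similarity in [0, 1] between two usernames."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(username, candidates):
    """Return the candidate most similar to username (first wins on ties)."""
    return max(candidates, key=lambda c: username_similarity(username, c))


# Hypothetical handles: one from the first site, three from the second.
match = best_match("jane_doe", ["bob_smith", "janedoe", "j_dough"])
print(match)
```

A score like this would be only one input to a resolver. The summary describes fusing it with content-based and graph-based evidence.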

The MITLL NIST LRE 2015 Language Recognition System

June 21, 2016

Conference Paper

Author:

Pedro A. Torres-Carrasquillo

…

Published in:

Odyssey 2016, 21-24 June 2016, pp. 196-203.

Topic:

language recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

In this paper we describe the most recent MIT Lincoln Laboratory language recognition system developed for the NIST 2015 Language Recognition Evaluation (LRE). The submission features a fusion of five core classifiers, with most systems developed in the context of an i-vector framework. The 2015 evaluation presented new paradigms. First, the evaluation included fixed training and open training tracks for the first time; second, language classification performance was measured across 6 language clusters using 20 language classes instead of an N-way language task; and third, performance was measured across a nominal 3-30 second range. Results are presented for the overall performance across the six language clusters for both the fixed and open training tasks. On the 6-cluster metric the Lincoln system achieved overall costs of 0.173 and 0.168 for the fixed and open tasks respectively.

Channel compensation for speaker recognition using MAP adapted PLDA and denoising DNNs

June 21, 2016

Conference Paper

Author:

Frederick S. Richardson

…

Published in:

Odyssey 2016, The Speaker and Language Recognition Workshop, 21-24 June 2016.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Over several decades, speaker recognition performance has steadily improved for applications using telephone speech. A big part of this improvement has been the availability of large quantities of speaker-labeled data from telephone recordings. For new data applications, such as audio from room microphones, we would like to effectively use existing telephone data to build systems with high accuracy while maintaining good performance on existing telephone tasks. In this paper we compare and combine approaches to compensate model parameters and features for this purpose. For model adaptation we explore MAP adaptation of hyper-parameters, and for feature compensation we examine the use of denoising DNNs. On a multi-room, multi-microphone speaker recognition experiment we show a reduction of 61% in EER with a combination of these approaches while slightly improving performance on telephone data.

A vocal modulation model with application to predicting depression severity

June 14, 2016

Conference Paper

Author:

Rachelle Horwitz-Martin

…

Published in:

13th IEEE Int. Conf. on Wearable and Implantable Body Sensor Networks, BSN 2016, 14-17 June 2016.

Topic:

biometrics

R&D area:

Cyber Security and Information Sciences

R&D group:

Summary

Speech provides a potentially simple and noninvasive "on-body" means to identify and monitor neurological diseases. Here we develop a model for a class of vocal biomarkers exploiting modulations in speech, focusing on Major Depressive Disorder (MDD) as an application area. Two model components contribute to the envelope of the speech waveform: amplitude modulation (AM) from respiratory muscles, and AM from interaction between vocal tract resonances (formants) and frequency modulation in vocal fold harmonics. Based on the model framework, we test three methods to extract envelopes capturing these modulations of the third formant for synthesized sustained vowels. Using subsequent modulation features derived from the model, we predict MDD severity scores with a Gaussian Mixture Model. Performing global optimization over classifier parameters and number of principal components, we evaluate performance of the features by examining the root-mean-squared error (RMSE), mean absolute error (MAE), and Spearman correlation between the actual and predicted MDD scores. We achieved RMSE and MAE values of 10.32 and 8.46, respectively (Spearman correlation = 0.487, p < 0.001), relative to a baseline RMSE of 11.86 and MAE of 10.05, obtained by predicting the mean MDD severity score. Ultimately, our model provides a framework for detecting and monitoring vocal modulations that could also be applied to other neurological diseases.
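
The error measures and the mean-prediction baseline named in this summary can be sketched as follows. The severity scores are made up for illustration and are not data from the paper. Taking the baseline mean over the same scores being evaluated is also an assumption, since the summary does not say which set the mean was computed on.

```python
from math import sqrt
from statistics import mean


def rmse(actual, predicted):
    """Root-mean-squared error between paired score lists."""
    return sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


def mae(actual, predicted):
    """Mean absolute error between paired score lists."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical severity scores (illustrative only).
scores = [4.0, 10.0, 16.0, 22.0]

# Baseline: predict the mean score for every subject.
baseline = [mean(scores)] * len(scores)
baseline_rmse = rmse(scores, baseline)
baseline_mae = mae(scores, baseline)
print(f"baseline RMSE = {baseline_rmse:.3f}, baseline MAE = {baseline_mae:.3f}")
```

A model is informative only to the extent that its RMSE and MAE fall below those of this constant predictor, which is the comparison the summary reports.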

Operational assessment of keyword search on oral history

May 23, 2016

Conference Paper

Author:

Elizabeth E. Salesky

…

Published in:

10th Language Resources and Evaluation Conf., LREC 2016, 23-28 May 2016.

Topic:

human language technology

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

This project assesses the resources necessary to make oral history searchable by means of automatic speech recognition (ASR). There are many inherent challenges in applying ASR to conversational speech: smaller training set sizes and varying demographics, among others. We assess the impact of dataset size, word error rate and term-weighted value on human search capability through an information retrieval task on Mechanical Turk. We use English oral history data collected by StoryCorps, a national organization that provides all people with the opportunity to record, share and preserve their stories, and control for a variety of demographics including age, gender, birthplace, and dialect on four different training set sizes. We show comparable search performance using a standard speech recognition system as with hand-transcribed data, which is promising for increased accessibility of conversational speech and oral history archives.
