Publications

Refine Results

(Filters Applied) Clear All

R&D Areas

R&D Groups

Year

Items per page

Exploring the impact of advanced front-end processing on NIST speaker recognition microphone tasks

June 25, 2012

Conference Paper

Author:

William M. Campbell

…

Published in:

Odyssey 2012, the Speaker and Language Recognition Workshop, 25-28 June 2012.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Summary

The NIST speaker recognition evaluation (SRE) featured microphone data in the 2005-2010 evaluations. The preprocessing and use of this data has typically been performed with telephone bandwidth and quantization. Although this approach is viable, it ignores the richer properties of the microphone data-multiple channels, high-rate sampling, linear encoding, ambient noise properties, etc. In this paper, we explore alternate choices of preprocessing and examine their effects on speaker recognition performance. Specifically, we consider the effects of quantization, sampling rate, enhancement, and two-channel speech activity detection. Experiments on the NIST 2010 SRE interview microphone corpus demonstrate that performance can be dramatically improved with a different preprocessing chain.

READ LESS

Summary

Exploring the impact of advanced front-end processing on NIST speaker recognition microphone tasks

Linear prediction modulation filtering for speaker recognition of reverberant speech

June 25, 2012

Conference Paper

Author:

Bengt J. Borgstrom

…

Alan V. McCree

Published in:

Odyssey 2012, The Speaker and Language Recognition Workshop, 25-28 June 2012.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

This paper proposes a framework for spectral enhancement of reverberant speech based on inversion of the modulation transfer function. All-pole modeling of modulation spectra of clean and degraded speech are utilized to derive the linear prediction inverse modulation transfer function (LP-IMTF) solution as a low-order IIR filter in the modulation envelope domain. By considering spectral estimation under speech presence uncertainty, speech presence probabilities are derived for the case of reverberation. Aside from enhancement, the LP-IMTF framework allows for blind estimation of reverberation time by extracting a minimum phase approximation of the short-time spectral channel impulse response. The proposed speech enhancement method is used as a front-end processing step for speaker recognition. When applied to the microphone condition of the NISTSRE 2010 with artificially added reverberation, the proposed spectral enhancement method yields significant improvements across a variety of performance metrics.

READ LESS

Summary

Linear prediction modulation filtering for speaker recognition of reverberant speech

A stochastic system for large network growth

June 1, 2012

Journal Article

Author:

Benjamin A. Miller

…

Nadya T. Bliss

Published in:

IEEE Signal Process. Lett., Vol. 19, No. 6, June 2012, pp. 356-359.

Topic:

signal processing

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

This letter proposes a new model for preferential attachment in dynamic directed networks. This model consists of a linear time-invariant system that uses past observations to predict future attachment rates, and an innovation noise process that induces growth on vertices that previously had no attachments. Analyzing a large citation network in this context, we show that the proposed model fits the data better than existing preferential attachment models. An analysis of the noise in the dataset reveals power-law degree distributions often seen in large networks, and polynomial decay with respect to age in the probability of citing yet-uncited documents.

READ LESS

Summary

A stochastic system for large network growth

FY11 Line-Supported Bio-Next Program - Multi-modal Early Detection Interactive Classifier (MEDIC) for mild traumatic brain injury (mTBI) triage

April 9, 2012

Project Report

Author:

Laurel R. Keyes

…

Published in:

MIT Lincoln Laboratory Report LSP-34

Topic:

biology

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

The Multi-modal Early Detection Interactive Classifier (MEDIC) is a triage system designed to enable rapid assessment of mild traumatic brain injury (mTBI) when access to expert diagnosis is limited as in a battlefield setting. MEDIC is based on supervised classification that requires three fundamental components to function correctly; these are data, features, and truth. The MEDIC system can act as a data collection device in addition to being an assessment tool. Therefore, it enables a solution to one of the fundamental challenges in understanding mTBI: the lack of useful data. The vision of MEDIC is to fuse results from stimulus tests in each of four modalitites - auditory, occular, vocal, and intracranial pressure - and provide them to a classifier. With appropriate data for training, the MEDIC classifier is expected to provide an immediate decision of whether the subject has a strong likelihood of having sustained an mTBI and therefore requires an expert diagnosis from a neurologist. The tests within each modalitity were designed to balance the capacity of objective assessment and the maturity of the underlying technology against the ability to distinguish injured from non-injured subjects according to published results. Selection of existing modalities and underlying features represents the best available, low cost, portable technology with a reasonable chance of success.

READ LESS

Summary

FY11 Line-Supported Bio-Next Program - Multi-modal Early Detection Interactive Classifier (MEDIC) for mild traumatic brain injury (mTBI) triage

Topic identification based extrinsic evaluation of summarization techniques applied to conversational speech

March 25, 2012

Journal Article

Author:

David F. Harwath

…

Timothy J. Hazen

Published in:

Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, 25-30 March 2012, pp. 5073-6.

Topic:

topic identification

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Document summarization algorithms are most commonly evaluated according to the intrinsic quality of the summaries they produce. An alternate approach is to examine the extrinsic utility of a summary, measured by the ability of the summary to aid a human in the completion of a specific task. In this paper, we use topic identification as a proxy for relevancy determination in the context of an information retrieval task, and a summary is deemed effective if it enables a user to determine the topical content of a retrieved document. We utilize Amazon's Mechanical Turk service to perform a large-scale human study contrasting four different summarization systems applied to conversational speech from the Fisher Corpus. We show that these results appear to be correlated with the performance of an automated topic identification system, and argue that this automated system can act as a low-cost proxy for a human evaluation during the development stages of a summarization system.

READ LESS

Summary

Topic identification based extrinsic evaluation of summarization techniques applied to conversational speech

Goodness-of-fit statistics for anomaly detection in Chung-Lu random graphs

March 25, 2012

Conference Paper

Author:

Benjamin A. Miller

…

Published in:

ICASSP 2012, Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 25-30 March 2012, pp. 3265-8.

Topic:

signal processing

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Anomaly detection in graphs is a relevant problem in numerous applications. When determining whether an observation is anomalous with respect to the model of typical behavior, the notion of "goodness of fit" is important. This notion, however, is not well understood in the context of graph data. In this paper, we propose three goodness-of-fit statistics for Chung-Lu random graphs, and analyze their efficacy in discriminating graphs generated by the Chung-Lu model from those with anomalous topologies. In the results of a Monte Carlo simulation, we see that the most powerful statistic for anomaly detection depends on the type of anomaly, suggesting that a hybrid statistic would be the most powerful.

READ LESS

Summary

Goodness-of-fit statistics for anomaly detection in Chung-Lu random graphs

Autoregressive HMM speech synthesis

March 25, 2012

Conference Paper

Author:

Carl B. Quillen

Published in:

Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, 25-30 March 2012, pp. 4021-4.

Topic:

speech modification

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Autoregressive HMM modeling of spectral features has been proposed as a replacement for standard HMM speech synthesis. The merits of the approach are explored, and methods for enforcing stability of the estimated predictor coefficients are presented. It appears that rather than directly estimating autoregressive HMM parameters, greater synthesis accuracy is obtained by estimating the autoregressive HMM parameters by using a more traditional HMM recognition system to compute state-level posterior probabilities that are then used to accumulate statistics to estimate predictor coefficients. The result is a simplified mathematical framework that requires no modeling of derivatives and still provides smooth synthesis without unnatural spectral discontinuities. The resulting synthesis algorithm involves no matrix solves and may be formulated causally, and appears to result in quality very similar to that of more traditional HMM synthesis approaches. This paper describes the implementation of a complete Autoregressive HMM LVCSR system and its application for synthesis, and describes the preliminary synthesis results.

READ LESS

Summary

Autoregressive HMM speech synthesis

Topic modeling for spoken documents using only phonetic information

December 15, 2011

Conference Paper

Author:

Timothy J. Hazen

…

Published in:

ASRU 2011, IEEE Workshop on Automatic Speech Recognition & Understanding, 11-15 December 2011, pp. 395-400.

Topic:

topic identification

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

This paper explores both supervised and unsupervised topic modeling for spoken audio documents using only phonetic information. In cases where word-based recognition is unavailable or infeasible, phonetic information can be used to indirectly learn and capture information provided by topically relevant lexical items. In some situations, a lack of transcribed data can prevent supervised training of a same-language phonetic recognition system. In these cases, phonetic recognition can use cross-language models or self-organizing units (SOUs) learned in a completely unsupervised fashion. This paper presents recent improvements in topic modeling using only phonetic information. We present new results using recently developed techniques for discriminative training for topic identification used in conjunction with recent improvements in SOU learning. A preliminary examination of the use of unsupervised latent topic modeling for unsupervised discovery of topics and topically relevant lexical items from phonetic information is also presented.

READ LESS

Summary

Topic modeling for spoken documents using only phonetic information

Investigating acoustic correlates of human vocal fold vibratory phase asymmetry through modeling and laryngeal high-speed videoendoscopy

December 1, 2011

Journal Article

Author:

Daryush Mehta

…

Published in:

J. Acoust. Soc. Am., Vol. 130, No. 6, December 2011, pp. 3999-4009.

Topic:

biometrics

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Vocal fold vibratory asymmetry is often associated with inefficient sound production through its impact on source spectral tilt. This association is investigated in both a computational voice production model and a group of 47 human subjects. The model provides indirect control over the degree of left-right phase asymmetry within a nonlinear source-filter framework, and high-speed videoendoscopy provides in vivo measures of vocal fold vibratory asymmetry. Source spectral tilt measures are estimated from the inverse-filtered spectrum of the simulated and recorded radiated acoustic pressure. As expected, model simulations indicated that increasing left-right phase asymmetry induces steeper spectral tilt. Subject data, however, reveal that none of the vibratory asymmetry measures correlates with spectral tilt measures. Probing further into physiological correlates of spectral tilt that might be affected by asymmetry, the glottal area waveform is parameterized to obtain measures of the open phase (open/plateau quotient) and closing phase (speed/closing quotient). Subjects' left-right phase asymmetry exhibits low, but statistically significant, correlations with speed quotient (r=0.45) and closing quotient (r=-0.39). Results call for future studies into the effect of asymmetric vocal fold vibrartion on glottal airflow and the associated impact on voice source spectral properties and vocal efficiency.

READ LESS

Summary

Investigating acoustic correlates of human vocal fold vibratory phase asymmetry through modeling and laryngeal high-speed videoendoscopy

Face recognition despite missing information

November 15, 2011

Journal Article

Author:

Cagri K. Dagli

…

Published in:

HST 2011, IEEE Int. Conf. on Technologies for Homeland Security, 15-17 November 2011, pp. 475-480.

Topic:

biometrics

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Missing or degraded information continues to be a significant practical challenge facing automatic face representation and recognition. Generally, existing approaches seek either to generatively invert the degradation process or find discriminative representations that are immune to it. Ideally, the solution to this problem exists between these two perspectives. To this end, in this paper we show the efficacy of using probabilistic linear subspace modes (in particular, variational probabilistic PCA) for both modeling and recognizing facial data under disguise or occlusion. From a discriminative perspective, we verify the efficacy of this approach for attenuating the effect of missing data due to disguise and non-linear speculars in several verification experiments. From a generative view, we show its usefulness in not only estimating missing information but also understanding facial covariates for image reconstruction. In addition, we present a least-squares connection to the maximum likelihood solution under missing data and show its intuitive connection to the geometry of the subspace learning problem.

READ LESS

Summary

Face recognition despite missing information

Publications

Refine Results

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Summary

Showing Results