Publications
High-performance low-complexity wordspotting using neural networks
Summary
Summary
A high-performance low-complexity neural network wordspotter was developed using radial basis function (RBF) neural networks in a hidden Markov model (HMM) framework. Two new complementary approaches substantially improve performance on the talker independent Switchboard corpus. Figure of Merit (FOM) training adapts wordspotter parameters to directly improve the FOM performance metric...
Noise reduction based on spectral change
Summary
Summary
A noise reduction algorithm is designed for the aural enhancement of short-duration wideband signals. The signal of interest contains components possibly only a few milliseconds in duration and corrupted by nonstationary noise background. The essence of the enhancement technique is a Weiner filter that uses a desired signal spectrum whose...
Comparison of background normalization methods for text-independent speaker verification
Summary
Summary
This paper compares two approaches to background model representation for a text-independent speaker verification task using Gaussian mixture models. We compare speaker-dependent background speaker sets to the use of a universal, speaker-independent background model (UBM). For the UBM, we describe how Bayesian adaptation can be used to derive claimant speaker...
Predicting, diagnosing, and improving automatic language identification performance
Summary
Summary
Language-identification (LID) techniques that use multiple single-language phoneme recognizers followed by n-gram language models have consistently yielded top performance at NIST evaluations. In our study of such systems, we have recently cut our LID error rate by modeling the output of n-gram language models more carefully. Additionally, we are now...
Embedded dual-rate sinusoidal transform coding
Summary
Summary
This paper describes the development of a dual-rate Sinusoidal Transformer Coder in which a 2400 b/s coder is embedded as a separate packet in the 4800 b/s bit stream. The underlying coding structure provides the flexibility necessary for multirate speech coding and multimedia applications.
Ambiguity resolution for machine translation of telegraphic messages
Summary
Summary
Telegraphic messages with numerous instances of omission pose a new challenge to parsing in that a sentence with omission causes a higher degree of ambiguity than a sentence without omission. Misparsing reduced by omissions has a far-reaching consequence in machine translation. Namely, a misparse of the input often leads to...
Speech recognition by machines and humans
Summary
Summary
This paper reviews past work comparing modern speech recognition systems and humans to determine how far recent dramatic advances in technology have progressed towards the goal of human-like performance. Comparisons use six modern speech corpora with vocabularies ranging from 10 to more than 65,000 words and content ranging from read...
HTIMIT and LLHDB: speech corpora for the study of handset transducer effects
Summary
Summary
This paper describes two corpora collected at Lincoln Laboratory for the study of handset transducer effects on the speech signal: the handset TIMIT (HTIMIT) corpus and the Lincoln Laboratory Handset Database (LLHDB). The goal of these corpora are to minimize all confounding factors and to produce speech predominately differing only...
Speech recognition by humans and machines under conditions with severe channel variability and noise
Summary
Summary
Despite dramatic recent advances in speech recognition technology, speech recognizers still perform much worse than humans. The difference in performance between humans and machines is most dramatic when variable amounts and types of filtering and noise are present during testing. For example, humans readily understand speech that is low-pass filtered...
AM-FM separation using auditory-motivated filters
Summary
Summary
An approach to the joint estimation of sine-wave amplitude modulation (AM) and frequency modulation (FM) is described based on the transduction of frequency modulation into amplitude modulation by linear filters, being motivated by the hypothesis that the auditory system uses a similar transduction mechanism in measuring sine-wave FM. An AM-FM...