Publications
Improving accent identification through knowledge of English syllable structure
Summary
This paper studies the structure of foreign-accented read English speech. A system for accent identification is constructed by combining linguistic theory with statistical analysis. Results demonstrate that the linguistic theory is reflected in real speech data and its application improves accent identification. The work discussed here combines and applies previous...
Sheep, goats, lambs and wolves: a statistical analysis of speaker performance in the NIST 1998 speaker recognition evaluation
Summary
Performance variability in speech and speaker recognition systems can be attributed to many factors. One major factor, which is often acknowledged but seldom analyzed, is inherent differences in the recognizability of different speakers. In speaker recognition systems such differences are characterized by the use of animal names for different types...
Vulnerabilities of reliable multicast protocols
Summary
We examine vulnerabilities of several reliable multicast protocols. The various mechanisms employed by these protocols to provide reliability can present vulnerabilities. We show how some of these vulnerabilities can be exploited in denial-of-service attacks, and discuss potential mechanisms for withstanding such attacks.
AM-FM separation using shunting neural networks
Summary
We describe an approach to estimating the amplitude-modulated (AM) and frequency-modulated (FM) components of a signal. Any signal can be written as the product of an AM component and an FM component. There have been several approaches to solving the AM-FM estimation problem described in the literature. Popular methods include...
Magnitude-only estimation of handset nonlinearity with application to speaker recognition
Summary
A method is described for estimating telephone handset nonlinearity by matching the spectral magnitude of the distorted signal to the output of a nonlinear channel model, driven by an undistorted reference. The "magnitude-only" representation allows the model to directly match unwanted speech formants that arise over nonlinear channels and that...
Audio signal processing based on sinusoidal analysis/synthesis
Summary
Based on a sinusoidal model, an analysis/synthesis technique is developed that characterizes audio signals, such as speech and music, in terms of the amplitudes, frequencies, and phases of the component sine waves. These parameters are estimated by applying a peak-picking algorithm to the short-time Fourier transform of the input waveform...
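As a rough illustration of the peak-picking idea in this summary, the sketch below analyzes a single frame. It is not the paper's algorithm: the Hann window, the naive DFT, the magnitude threshold, and the function names `dft_half` and `analyze_frame` are all assumptions made for this example.

```python
import cmath
import math

def dft_half(frame):
    """Naive O(N^2) DFT, returning complex bins 0..N/2 of a real frame."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
            for k in range(n // 2 + 1)]

def analyze_frame(frame, sample_rate, threshold):
    """Estimate (frequency_hz, amplitude, phase) for each spectral peak.

    Hypothetical helper: a peak is any bin whose magnitude exceeds the
    threshold and both of its neighbors.
    """
    n = len(frame)
    # Periodic Hann window (an assumed choice); its coherent gain is 0.5.
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * t / n))
                for t, x in enumerate(frame)]
    spectrum = dft_half(windowed)
    mags = [abs(c) for c in spectrum]
    peaks = []
    for k in range(1, len(mags) - 1):
        if mags[k] > threshold and mags[k] > mags[k - 1] and mags[k] > mags[k + 1]:
            freq = k * sample_rate / n
            amp = 4.0 * mags[k] / n  # undo the N/2 DFT scaling and the 0.5 window gain
            peaks.append((freq, amp, cmath.phase(spectrum[k])))
    return peaks
```

A full analysis/synthesis system would repeat this frame by frame and track peaks over time; this sketch covers only the per-frame estimate, and its amplitude scaling is exact only for sinusoids that fall on a DFT bin.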
High-performance low-complexity wordspotting using neural networks
Summary
A high-performance low-complexity neural network wordspotter was developed using radial basis function (RBF) neural networks in a hidden Markov model (HMM) framework. Two new complementary approaches substantially improve performance on the talker-independent Switchboard corpus. Figure of Merit (FOM) training adapts wordspotter parameters to directly improve the FOM performance metric...
Noise reduction based on spectral change
Summary
A noise reduction algorithm is designed for the aural enhancement of short-duration wideband signals. The signal of interest contains components possibly only a few milliseconds in duration and corrupted by a nonstationary noise background. The essence of the enhancement technique is a Wiener filter that uses a desired signal spectrum whose...
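For orientation, the textbook Wiener gain can be sketched as below. This is the generic per-frequency form, not the paper's method: the summary is truncated before it says how the desired signal spectrum is obtained, so the sketch simply assumes both power spectra are given. The name `wiener_gain` is hypothetical.

```python
def wiener_gain(signal_psd, noise_psd):
    """Per-bin Wiener gain H = S / (S + N) from two power spectra.

    Assumes both spectra are nonnegative and already estimated; bins
    with no power at all get a gain of 0.
    """
    gains = []
    for s, n in zip(signal_psd, noise_psd):
        total = s + n
        gains.append(s / total if total > 0 else 0.0)
    return gains
```

Multiplying each bin of the noisy spectrum by the gain attenuates bins where noise dominates and leaves bins with a high signal-to-noise ratio nearly untouched.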
Comparison of background normalization methods for text-independent speaker verification
Summary
This paper compares two approaches to background model representation for a text-independent speaker verification task using Gaussian mixture models. We compare speaker-dependent background speaker sets to the use of a universal, speaker-independent background model (UBM). For the UBM, we describe how Bayesian adaptation can be used to derive claimant speaker...
Predicting, diagnosing, and improving automatic language identification performance
Summary
Language-identification (LID) techniques that use multiple single-language phoneme recognizers followed by n-gram language models have consistently yielded top performance at NIST evaluations. In our study of such systems, we have recently cut our LID error rate by modeling the output of n-gram language models more carefully. Additionally, we are now...