Publications
Shunting networks for multi-band AM-FM decomposition
Summary
We describe a transduction-based, neurodynamic approach to estimating the amplitude-modulated (AM) and frequency-modulated (FM) components of a signal. We show that the transduction approach can be realized as a bank of constant-Q bandpass filters followed by envelope detectors and shunting neural networks, and the resulting dynamical system is capable of...
A study of computation speed-ups of the GMM-UBM speaker recognition system
Summary
The Gaussian Mixture Model Universal Background Model (GMM-UBM) speaker recognition system has demonstrated very high performance in several NIST evaluations. Such evaluations, however, are concerned only with classification accuracy. In many applications, system effectiveness must be evaluated in light of both accuracy and execution speed. We present here a number...
Evaluation of confidence measures for language identification
Summary
In this paper we examine various ways to derive confidence measures for a language identification system, using phone recognition followed by language models, and describe the application of an evaluation metric for measuring the "goodness" of the different confidence measures. Experiments are conducted on the 1996 NIST Language Identification Evaluation...
Speaker and language recognition using speech codec parameters
Summary
In this paper, we investigate the effect of speech coding on speaker and language recognition tasks. Three coders were selected to cover a wide range of quality and bit rates: GSM at 12.2 kb/s, G.729 at 8 kb/s, and G.723.1 at 5.3 kb/s. Our objective is to measure recognition performance...
Modeling of the glottal flow derivative waveform with application to speaker identification
Summary
An automatic technique for estimating and modeling the glottal flow derivative source waveform from speech, and applying the model parameters to speaker identification, is presented. The estimate of the glottal flow derivative is decomposed into coarse structure, representing the general flow shape, and fine structure, comprising aspiration and other perturbations...
Understanding-based translingual information retrieval
Summary
This paper describes our preliminary research on an understanding-based translingual information retrieval system for which the input to the system is a query sentence in English, and the output of the system is a set of documents either in English or in Korean. The understanding module produces a meaning representation...
Security implications of adaptive multimedia distribution
Summary
We discuss the security implications of different techniques used in adaptive audio and video distribution. Several sources of variability in the network make it necessary for applications to adapt. Ideally, each receiver should receive media quality commensurate with the capacity of the path leading to it from each sender. Several...
Automatic speaker clustering from multi-speaker utterances
Summary
Blind clustering of multi-person utterances by speaker is complicated by the fact that each utterance has at least two talkers. In the case of a two-person conversation, one can simply split each conversation into its respective speaker halves, but this introduces error which ultimately hurts clustering. We propose a clustering...
Corpora for the evaluation of speaker recognition systems
Summary
Using standard speech corpora for development and evaluation has proven to be very valuable in promoting progress in speech and speaker recognition research. In this paper, we present an overview of current publicly available corpora intended for speaker recognition research and evaluation. We outline the corpora's salient features with respect...
Implications of glottal source for speaker and dialect identification
Summary
In this paper we explore the importance of speaker-specific information carried in the glottal source. We time-align utterances of two speakers speaking the same sentence from the TIMIT database of American English. We then extract the glottal flow derivative from each speaker and interchange them. Through time alignment...