Publications

Modeling of the glottal flow derivative waveform with application to speaker identification

Published in:
IEEE Trans. Speech Audio Process., Vol. 7, No. 5, September 1999, pp. 569-586.

Summary

An automatic technique for estimating and modeling the glottal flow derivative source waveform from speech, and applying the model parameters to speaker identification, is presented. The estimate of the glottal flow derivative is decomposed into coarse structure, representing the general flow shape, and fine structure, comprising aspiration and other perturbations in the flow, from which model parameters are obtained. The glottal flow derivative is estimated using an inverse filter determined within a time interval of vocal-fold closure that is identified through differences in formant frequency modulation during the open and closed phases of the glottal cycle. This formant motion is predicted by Ananthapadmanabha and Fant to be a result of time-varying and nonlinear source/vocal tract coupling within a glottal cycle. The glottal flow derivative estimate is modeled using the Liljencrants-Fant model to capture its coarse structure, while the fine structure of the flow derivative is represented through energy and perturbation measures. The model parameters are used in a Gaussian mixture model speaker identification (SID) system. Both coarse- and fine-structure glottal features are shown to contain significant speaker-dependent information. For a large TIMIT database subset, averaging over male and female SID scores, the coarse-structure parameters achieve about 60% accuracy, the fine-structure parameters give about 40% accuracy, and their combination yields about 70% correct identification. Finally, in preliminary experiments on the counterpart telephone-degraded NTIMIT database, about a 5% error reduction in SID scores is obtained when source features are combined with traditional mel-cepstral measures.
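
The abstract names the Liljencrants-Fant (LF) model as the description of the coarse structure of the glottal flow derivative. The following is a minimal sketch of one LF pulse in its commonly cited form: an exponentially growing sinusoid for the open phase and an exponential return phase after the main excitation. It is not the paper's estimation or fitting procedure. The parameter values in the usage note are illustrative assumptions. `alpha` is treated as a free input here, whereas a full LF fit would solve for it from the zero-net-flow constraint.

```python
import math


def lf_derivative(t, tp, te, ta, tc, ee, alpha):
    """Evaluate an LF-style glottal flow derivative at time t (seconds).

    tp: time of peak glottal flow; te: instant of main excitation (negative peak);
    ta: return-phase time constant; tc: closure time; ee: magnitude of the
    negative peak; alpha: open-phase growth rate (assumed given, not solved for).
    """
    omega_g = math.pi / tp
    # Solve eps * ta = 1 - exp(-eps * (tc - te)) by fixed-point iteration,
    # starting from the usual approximation eps ~= 1 / ta.
    eps = 1.0 / ta
    for _ in range(100):
        eps = (1.0 - math.exp(-eps * (tc - te))) / ta
    # Scale the open phase so that it reaches exactly -ee at t = te.
    e0 = -ee / (math.exp(alpha * te) * math.sin(omega_g * te))
    if 0.0 <= t <= te:
        # Open phase: exponentially growing sinusoid.
        return e0 * math.exp(alpha * t) * math.sin(omega_g * t)
    if te < t <= tc:
        # Return phase: exponential recovery that reaches zero at t = tc.
        return -(ee / (eps * ta)) * (math.exp(-eps * (t - te)) - math.exp(-eps * (tc - te)))
    return 0.0  # closed phase
```

With an assumed 8 ms period (`tp=4e-3, te=5e-3, ta=0.3e-3, tc=8e-3, ee=1.0, alpha=500.0`), the function evaluated at `t=5e-3` gives approximately -1.0, the negative peak -Ee. It rises back to zero by `t=8e-3`. The choice of `e0` and `eps` makes the two phases meet continuously at `te`.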

Understanding-based translingual information retrieval

Published in:
4th Int. Conf. on Applications of Natural Language to Information Systems, 17-19 June 1999, pp. 187-195.

Summary

This paper describes our preliminary research on an understanding-based translingual information retrieval system. The input to the system is a query sentence in English, and the output is a set of documents in English or Korean. The understanding module produces a meaning representation of the input sentence, called a semantic frame. In this representation the predicate-argument structure and the question type of the input are identified, and each keyword is assigned its concept category. The translingual search module searches an English and Korean bilingual corpus tagged with concept categories. Our preliminary experiment was performed on a document set consisting of slides and notes from English and Korean briefings in a military domain. For both monolingual and translingual retrieval, its results indicate that an understanding-based approach combined with a concept-based search technique improves both precision and recall compared with a keyword-match technique without understanding. Current work is directed at further development of the system and at preparation for tests on larger corpora.

Security implications of adaptive multimedia distribution

Published in:
Proc. IEEE Int. Conf. on Communications, Multimedia and Wireless, Vol. 3, 6-10 June 1999, pp. 1563-1567.

Summary

We discuss the security implications of different techniques used in adaptive audio and video distribution. Several sources of variability in the network make it necessary for applications to adapt. Ideally, each receiver should receive media quality commensurate with the capacity of the path leading to it from each sender. Several different techniques have been proposed to provide such adaptation. We discuss the implications of each technique for confidentiality, authentication, integrity, and anonymity. By coincidence, the techniques with better performance also have better security properties.

Automatic speaker clustering from multi-speaker utterances

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. II, 15-19 March 1999, pp. 817-820.

Summary

Blind clustering of multi-person utterances by speaker is complicated by the fact that each utterance has at least two talkers. In the case of a two-person conversation, one can simply split each conversation into its respective speaker halves, but this introduces error that ultimately hurts clustering. We propose a clustering algorithm that can associate each conversation with two clusters (and therefore two speakers), which removes the need for splitting. Results are given for two-speaker conversations drawn from the Switchboard corpus and are compared with results obtained on single-speaker utterances. We conclude that although the approach is promising, our technique for computing inter-conversation similarities prior to clustering needs improvement.

Corpora for the evaluation of speaker recognition systems

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, 15-19 March 1999.

Summary

Using standard speech corpora for development and evaluation has proven to be very valuable in promoting progress in speech and speaker recognition research. In this paper, we present an overview of current publicly available corpora intended for speaker recognition research and evaluation. We outline the corpora's salient features with respect to their suitability for conducting speaker recognition experiments and evaluations. Links to these corpora, and to new corpora, will appear on the web http://www.apl.jhu.edu/Classes/Notes/Campbell/SpkrRec/. We hope to increase the awareness and use of these standard corpora and corresponding evaluation procedures throughout the speaker recognition community.

Implications of glottal source for speaker and dialect identification

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. II, 15-19 March 1999, pp. 813-816.

Summary

In this paper we explore the importance of speaker-specific information carried in the glottal source. We time-align utterances of two speakers speaking the same sentence from the TIMIT database of American English. We then extract the glottal flow derivative from each speaker and interchange them. Through time alignment and this glottal flow transformation, we can make a speaker of a northern dialect sound more like his southern counterpart. We also time-align the utterances of two speakers of Spanish dialects speaking the same sentence and then perform the glottal waveform transformation. Through these processes a Peruvian speaker is made to sound more Cuban-like. From these experiments we conclude that significant speaker- and dialect-specific information, such as noise, breathiness or aspiration, and vocalization, is carried in the glottal signal.
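
The abstract does not say how the two utterances are time-aligned. Dynamic time warping (DTW) is one standard way to do this, and the following is a minimal sketch of it. It aligns two one-dimensional feature sequences using an absolute-difference local cost and returns the total cost together with the warping path. This is an illustrative assumption, not the authors' procedure. A practical aligner would compare frame-level spectral feature vectors rather than scalars.

```python
def dtw_path(x, y):
    """Align sequences x and y by dynamic time warping.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    matching x[i] to y[j] from the start to the end of both sequences.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning x[:i] with y[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])  # local distance between frames
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from (n, m), always stepping to the cheapest predecessor
    # (the diagonal step wins ties because it is listed first).
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {
            (i - 1, j - 1): cost[i - 1][j - 1],
            (i - 1, j): cost[i - 1][j],
            (i, j - 1): cost[i][j - 1],
        }
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path
```

For example, `dtw_path([1, 2, 3], [1, 2, 2, 3])` gives a total cost of 0.0 and the path `[(0, 0), (1, 1), (1, 2), (2, 3)]`, which stretches the middle frame of the first sequence across two frames of the second.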

'Perfect reconstruction' time-scaling filterbanks

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. III, 15-19 March 1999, pp. 945-948.

Summary

A filterbank-based method of time-scale modification is analyzed for elemental signals including clicks, sines, and AM-FM sines. Using some basic properties of linear systems, as well as FM-to-AM filter transduction, we show that "perfect reconstruction" time-scaling filterbanks can be constructed for these elemental signal classes under certain conditions on the filterbank. These conditions are shown analytically for the uniform filterbank case and empirically for the nonuniform constant-Q (gammatone) case. Extending perfect reconstruction to multi-component signals is shown to require both filterbank-dependent and signal-dependent conditions, which indicates the need for a more complete theory of "perfect reconstruction" time-scaling filterbanks.

Evaluating intrusion detection systems without attacking your friends: The 1998 DARPA intrusion detection evaluation

Summary

Intrusion detection systems monitor the use of computers and the network over which they communicate, searching for unauthorized use, anomalous behavior, and attempts to deny users, machines or portions of the network access to services. Potential users of such systems need information that is rarely found in marketing literature, including how well a given system finds intruders and how much work is required to use and maintain that system in a fully functioning network with significant daily traffic. Researchers and developers can specify which prototypical attacks their systems can find. Without access to the normal traffic generated by day-to-day work, however, they cannot describe how well their systems detect real attacks while passing background traffic and avoiding false alarms. This information is critical: every declared intrusion requires time to review, whether it is a correct detection of a real intrusion or merely a false alarm. To meet the needs of researchers, developers and ultimately system administrators, we have developed the first objective, repeatable, and realistic measurement of intrusion detection system performance. Network traffic on an Air Force base was measured, characterized and then simulated on an isolated network, where a few computers were used to simulate thousands of different Unix systems and hundreds of different users during periods of normal network traffic. Simulated attackers mapped the network, issued denial-of-service attacks, illegally gained access to systems, and obtained super-user privileges. Attack types ranged from old, well-known attacks to new, stealthy attacks. Seven weeks of training data and two weeks of testing data were generated, filling more than 30 CD-ROMs. Methods and results from the 1998 DARPA intrusion detection evaluation will be highlighted, and preliminary plans for the 1999 evaluation will be presented.

Machine-assisted language translation for U.S./RoK Combined Forces Command

Published in:
Army RD&A Mag., November-December 1999, pp. 38-41.

Summary

The U.S. military must operate worldwide in a variety of international environments where many different languages are used. There is a critical need for translation, and there is a shortage of translators who can interpret military terminology specifically. One coalition environment where the need is particularly strong is in the Republic of Korea (RoK) where, although U.S. and RoK military personnel have been working together for many years, the language barrier still significantly reduces the speed and effectiveness of coalition command and control. This article describes the Massachusetts Institute of Technology (MIT) Lincoln Laboratory's work on automated, two-way, English/Korean translation for enhanced coalition communications. Our ultimate goal is to enhance multilingual communications by producing accurate translations across a number of languages. Therefore, we have chosen an interlingua-based approach to machine translation that is readily adaptable to multiple languages. In this approach, a natural language understanding system transforms the input into an intermediate meaning representation called Semantic Frame, which serves as a basis for generating output in multiple languages. To produce useful and effective translation systems in the short term, we have focused on limited military task domains and have configured our system as a machine-assisted translation system. This allows the human translator to confirm or edit the machine translation.

Blind clustering of speech utterances based on speaker and language characteristics

Published in:
5th Int. Conf. Spoken Language Processing (ICSLP), 30 November - 4 December 1998.

Summary

Classical speaker and language recognition techniques can be applied to the classification of unknown utterances by computing the likelihoods of the utterances given a set of well-trained target models. This paper addresses the problem of grouping unknown utterances when no information is available about the speaker or language classes, or even about the total number of classes. We present approaches to blind message clustering based on conventional hierarchical clustering techniques and on an integrated cluster generation and selection method called the d* algorithm. Results are presented using message sets derived from the Switchboard and CallFriend corpora. Potential applications include automatic indexing of recorded speech corpora by speaker/language tags and automatic or semiautomatic selection of speaker-specific speech utterances for speaker recognition adaptation.
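
The abstract mentions a "conventional hierarchical clustering" baseline. The following is a rough, minimal sketch of that kind of baseline. It does not implement the d* algorithm, whose details the abstract does not give. The sketch performs average-linkage agglomerative clustering on a precomputed utterance distance matrix. It stops merging when the closest pair of clusters is farther apart than a threshold, so the number of classes need not be known in advance. How the pairwise distances are obtained (for example, from model likelihoods) is left as an assumption, and `threshold` is a hypothetical tuning parameter.

```python
from itertools import combinations


def agglomerative_clusters(dist, threshold):
    """Average-linkage agglomerative clustering on a symmetric distance matrix.

    dist[i][j] is an assumed dissimilarity between utterances i and j.
    Merging stops once the closest pair of clusters exceeds `threshold`.
    Returns the clusters as sorted lists of utterance indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average distance over all cross-cluster utterance pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        (ia, ib), best = min(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best > threshold:
            break  # remaining clusters are too far apart to merge
        merged = clusters[ia] + clusters[ib]
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)]
        clusters.append(merged)
    return sorted(sorted(c) for c in clusters)
```

With four utterances where 0 and 1 are close (distance 0.1), 2 and 3 are close (0.2), and all cross pairs are at 0.9, a threshold of 0.5 yields `[[0, 1], [2, 3]]`. Raising the threshold to 1.0 merges everything into a single cluster. This illustrates why the stopping rule, rather than a fixed class count, carries the burden when the number of speakers or languages is unknown.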