Publications

Corpora for the evaluation of speaker recognition systems

Published in:
ICASSP 1999, Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 15-19 March 1999.

Summary

Using standard speech corpora for development and evaluation has proven to be very valuable in promoting progress in speech and speaker recognition research. In this paper, we present an overview of current publicly available corpora intended for speaker recognition research and evaluation. We outline the corpora's salient features with respect to their suitability for conducting speaker recognition experiments and evaluations. Links to these corpora, and to new corpora, will appear on the web at http://www.apl.jhu.edu/Classes/Notes/Campbell/SpkrRec/. We hope to increase the awareness and use of these standard corpora and corresponding evaluation procedures throughout the speaker recognition community.

Implications of glottal source for speaker and dialect identification

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. II, 15-19 March 1999, pp. 813-816.

Summary

In this paper we explore the importance of speaker-specific information carried in the glottal source. We time-align utterances of two speakers speaking the same sentence from the TIMIT database of American English. We then extract the glottal flow derivative from each speaker and interchange them. Through time alignment and this glottal flow transformation, we can make a speaker of a northern dialect sound more like his southern counterpart. We also time-align the utterances of two speakers of Spanish dialects speaking the same sentence and then perform the glottal waveform transformation. Through these processes a Peruvian speaker is made to sound more Cuban-like. From these experiments we conclude that significant speaker- and dialect-specific information, such as noise, breathiness or aspiration, and vocalization, is carried in the glottal signal.

'Perfect reconstruction' time-scaling filterbanks

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. III, 15-19 March 1999, pp. 945-948.

Summary

A filterbank-based method of time-scale modification is analyzed for elemental signals including clicks, sines, and AM-FM sines. It is shown that with the use of some basic properties of linear systems, as well as FM-to-AM filter transduction, "perfect reconstruction" time-scaling filterbanks can be constructed for these elemental signal classes under certain conditions on the filterbank. Conditions for perfect reconstruction time-scaling are shown analytically for the uniform filterbank case and empirically for the nonuniform constant-Q (gammatone) case. Extension of perfect reconstruction to multi-component signals is shown to require both filterbank and signal-dependent conditions and indicates the need for a more complete theory of "perfect reconstruction" time-scaling filterbanks.

Evaluating intrusion detection systems without attacking your friends: The 1998 DARPA intrusion detection evaluation

Summary

Intrusion detection systems monitor the use of computers and the network over which they communicate, searching for unauthorized use, anomalous behavior, and attempts to deny users, machines or portions of the network access to services. Potential users of such systems need information that is rarely found in marketing literature, including how well a given system finds intruders and how much work is required to use and maintain that system in a fully functioning network with significant daily traffic. Researchers and developers can specify which prototypical attacks can be found by their systems, but without access to the normal traffic generated by day-to-day work, they cannot describe how well their systems detect real attacks while passing background traffic and avoiding false alarms. This information is critical: every declared intrusion requires time to review, whether it is a correct detection of a real intrusion or merely a false alarm. To meet the needs of researchers, developers, and ultimately system administrators, we have developed the first objective, repeatable, and realistic measurement of intrusion detection system performance. Network traffic on an Air Force base was measured, characterized and subsequently simulated on an isolated network on which a few computers were used to simulate thousands of different Unix systems and hundreds of different users during periods of normal network traffic. Simulated attackers mapped the network, issued denial of service attacks, illegally gained access to systems, and obtained super-user privileges. Attack types ranged from old, well-known attacks, to new, stealthy attacks. Seven weeks of training data and two weeks of testing data were generated, filling more than 30 CD-ROMs. Methods and results from the 1998 DARPA intrusion detection evaluation will be highlighted, and preliminary plans for the 1999 evaluation will be presented.

Machine-assisted language translation for U.S./RoK Combined Forces Command

Published in:
Army RD&A Mag., November-December 1999, pp. 38-41.

Summary

The U.S. military must operate worldwide in a variety of international environments where many different languages are used. There is a critical need for translation, and there is a shortage of translators who can interpret military terminology specifically. One coalition environment where the need is particularly strong is in the Republic of Korea (RoK) where, although U.S. and RoK military personnel have been working together for many years, the language barrier still significantly reduces the speed and effectiveness of coalition command and control. This article describes the Massachusetts Institute of Technology (MIT) Lincoln Laboratory's work on automated, two-way, English/Korean translation for enhanced coalition communications. Our ultimate goal is to enhance multilingual communications by producing accurate translations across a number of languages. Therefore, we have chosen an interlingua-based approach to machine translation that is readily adaptable to multiple languages. In this approach, a natural language understanding system transforms the input into an intermediate meaning representation called Semantic Frame, which serves as a basis for generating output in multiple languages. To produce useful and effective translation systems in the short term, we have focused on limited military task domains and have configured our system as a machine-assisted translation system. This allows the human translator to confirm or edit the machine translation.

Blind clustering of speech utterances based on speaker and language characteristics

Published in:
5th Int. Conf. Spoken Language Processing (ICSLP), 30 November - 4 December 1998.

Summary

Classical speaker and language recognition techniques can be applied to the classification of unknown utterances by computing the likelihoods of the utterances given a set of well-trained target models. This paper addresses the problem of grouping unknown utterances when no information is available regarding the speaker or language classes or even the total number of classes. Approaches to blind message clustering are presented based on conventional hierarchical clustering techniques and an integrated cluster generation and selection method called the d* algorithm. Results are presented using message sets derived from the Switchboard and Callfriend corpora. Potential applications include automatic indexing of recorded speech corpora by speaker/language tags and automatic or semiautomatic selection of speaker-specific speech utterances for speaker recognition adaptation.
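
As a rough illustration of the "conventional hierarchical clustering" baseline mentioned in this summary, the sketch below runs average-linkage agglomerative clustering on a precomputed distance matrix and stops merging at a distance threshold, so the number of classes need not be known in advance. It is not the d* algorithm, whose details the summary does not give. The function name `agglomerative`, the threshold stopping rule, and the toy distances are assumptions made for illustration; in practice the distances might be derived from model likelihood scores between utterances.

```python
def agglomerative(dist, threshold):
    """Average-linkage agglomerative clustering (illustrative sketch).

    dist      : symmetric matrix (list of lists) of pairwise distances
    threshold : merging stops once the closest pair of clusters is
                farther apart than this value (assumed stopping rule)
    Returns a list of clusters, each a sorted list of item indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average distance over all cross-cluster pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        if best[0] > threshold:
            break
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


# Toy example: five "utterances" placed on a line (hypothetical data).
points = [0, 1, 2, 50, 51]
dist = [[abs(a - b) for b in points] for a in points]
groups = agglomerative(dist, threshold=10)
```

With these toy distances the first three items and the last two end up in separate clusters. The threshold plays the role of a cluster-selection criterion; choosing it automatically is the harder problem that the paper's integrated method addresses.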

Improving accent identification through knowledge of English syllable structure

Published in:
5th Int. Conf. on Spoken Language Processing, ICSLP, 30 November - 4 December 1998.

Summary

This paper studies the structure of foreign-accented read English speech. A system for accent identification is constructed by combining linguistic theory with statistical analysis. Results demonstrate that the linguistic theory is reflected in real speech data and its application improves accent identification. The work discussed here combines and applies previous research in language identification based on phonemic features [1] with the analysis of the structure and function of the English language [2]. Working with phonemically hand-labelled data in three accented speaker groups of Australian English (Vietnamese, Lebanese, and native speakers), we show that accents of foreign speakers can be predicted and manifest themselves differently as a function of their position within the syllable. When applying this knowledge, English vs. Vietnamese accent identification improves from 86% to 93% (English vs. Lebanese improves from 78% to 84%). The described algorithm is also applied to automatically aligned phonemes.

Sheep, goats, lambs and wolves: a statistical analysis of speaker performance in the NIST 1998 speaker recognition evaluation

Summary

Performance variability in speech and speaker recognition systems can be attributed to many factors. One major factor, which is often acknowledged but seldom analyzed, is inherent differences in the recognizability of different speakers. In speaker recognition systems such differences are characterized by the use of animal names for different types of speakers, including sheep, goats, lambs and wolves, depending on their behavior with respect to automatic recognition systems. In this paper we propose statistical tests for the existence of these animals and apply these tests to hunt for such animals using results from the 1998 NIST speaker recognition evaluation.
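
The summary does not say which statistical tests the paper proposes. As a generic illustration of testing whether speakers differ in recognizability, the sketch below computes the Kruskal-Wallis H statistic, a standard rank-based test of whether several samples come from the same distribution. Treating each speaker's set of recognition scores as one sample is an assumption, and the function name and the scores are hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of samples.

    Ties receive average ranks; no tie-correction factor is applied.
    A large H suggests that at least one group (here: speaker) has a
    score distribution that differs from the others.
    """
    pooled = sorted((v, g) for g, sample in enumerate(groups) for v in sample)
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        for _, g in pooled[i:j]:
            rank_sums[g] += avg_rank
        i = j
    total = sum(r * r / len(s) for r, s in zip(rank_sums, groups))
    return 12.0 / (n_total * (n_total + 1)) * total - 3.0 * (n_total + 1)


# Hypothetical per-speaker scores: three speakers, three trials each.
separated = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
interleaved = kruskal_wallis_h([[1, 4, 7], [2, 5, 8], [3, 6, 9]])
```

Under the null hypothesis that all speakers behave alike, H is approximately chi-square distributed with (number of groups - 1) degrees of freedom, so the clearly separated scores give a much larger H than the interleaved ones.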

Vulnerabilities of reliable multicast protocols

Published in:
IEEE MILCOM '98, Vol. 3, 21 October 1998, pp. 934-938.

Summary

We examine vulnerabilities of several reliable multicast protocols. The various mechanisms employed by these protocols to provide reliability can present vulnerabilities. We show how some of these vulnerabilities can be exploited in denial-of-service attacks, and discuss potential mechanisms for withstanding such attacks.

AM-FM separation using shunting neural networks

Published in:
Proc. of the IEEE-SP Int. Symp. on Time-Frequency and Time-Scale Analysis, 6-9 October 1998, pp. 553-556.

Summary

We describe an approach to estimating the amplitude-modulated (AM) and frequency-modulated (FM) components of a signal. Any signal can be written as the product of an AM component and an FM component. There have been several approaches to solving the AM-FM estimation problem described in the literature. Popular methods include the use of time-frequency analysis, the Hilbert transform, and the Teager energy operator. We focus on an approach based on FM-to-AM transduction that is motivated by auditory physiology. We show that the transduction approach can be realized as a bank of bandpass filters followed by envelope detectors and shunting neural networks, and the resulting dynamical system is capable of robust AM-FM estimation in noisy environments and over a broad range of filter bandwidths and locations. Our model is consistent with recent psychophysical experiments that indicate AM and FM components of acoustic signals may be transformed into a common neural code in the brain stem via FM-to-AM transduction. Applications of our model include signal recognition and multi-component decomposition.
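
For context, one of the "popular methods" named in this summary, the Teager energy operator, can be sketched in a few lines. The example below uses the operator in the DESA-2 energy separation form to estimate the amplitude and frequency of a discrete-time sinusoid. It illustrates that alternative only; it is not the shunting-network transduction approach the paper describes. The function names, sampling rate, and test signal are assumptions chosen for illustration.

```python
import math


def teager(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2(x, fs):
    """Estimate (amplitude, frequency in Hz) per sample with DESA-2.

    Returns a list aligned to samples 2 .. len(x)-3 of x. Entries are
    None where the energy estimates are not positive.
    """
    psi_x = teager(x)
    # Symmetric difference y[n] = x[n+1] - x[n-1].
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_y = teager(y)
    out = []
    for k, py in enumerate(psi_y):
        px = psi_x[k + 1]  # align psi_x with psi_y at the same sample
        if px <= 0 or py <= 0:
            out.append(None)
            continue
        c = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        omega = 0.5 * math.acos(c)       # radians per sample
        amp = 2.0 * px / math.sqrt(py)   # amplitude envelope estimate
        out.append((amp, omega * fs / (2.0 * math.pi)))
    return out


# Assumed test signal: 50 Hz sinusoid of amplitude 2 sampled at 1 kHz.
fs = 1000
x = [2.0 * math.cos(2.0 * math.pi * 50.0 * n / fs) for n in range(200)]
amp, freq = desa2(x, fs)[100]
```

For a pure sinusoid the estimates are exact up to rounding error. The operator relies on local differences, which makes it sensitive to noise; the summary's emphasis on robust estimation in noisy environments is aimed at that kind of weakness.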