Using standard speech corpora for development and evaluation has proven very valuable in promoting progress in speech and speaker recognition research. In this paper, we present an overview of current publicly available corpora intended for speaker recognition research and evaluation. We outline the corpora's salient features with respect to their suitability for conducting speaker recognition experiments and evaluations. Links to these corpora, and to new corpora, will appear on the web at http://www.apl.jhu.edu/Classes/Notes/Campbell/SpkrRec/. We hope to increase the awareness and use of these standard corpora and corresponding evaluation procedures throughout the speaker recognition community.

Corpora for the evaluation of speaker recognition systems

Implications of glottal source for speaker and dialect identification

March 15, 1999

Conference Paper

Author:

Lisa R. Yanguas

…

Published in:

Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. II, 15-19 March 1999, pp. 813-816.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

In this paper we explore the importance of speaker-specific information carried in the glottal source. We time-align utterances of two speakers speaking the same sentence from the TIMIT database of American English. We then extract the glottal flow derivative from each speaker and interchange them. Through time alignment and this glottal flow transformation, we can make a speaker of a northern dialect sound more like his southern counterpart. We also time-align the utterances of two speakers of Spanish dialects speaking the same sentence and then perform the glottal waveform transformation. Through these processes a Peruvian speaker is made to sound more Cuban-like. From these experiments we conclude that significant speaker- and dialect-specific information, such as noise, breathiness or aspiration, and vocalization, is carried in the glottal signal.

'Perfect reconstruction' time-scaling filterbanks

March 15, 1999

Conference Paper

Author:

Thomas F. Quatieri

…

Thomas E. Hanna

Published in:

Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. III, 15-19 March 1999, pp. 945-948.

Topic:

speech enhancement

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

A filterbank-based method of time-scale modification is analyzed for elemental signals including clicks, sines, and AM-FM sines. It is shown that with the use of some basic properties of linear systems, as well as FM-to-AM filter transduction, "perfect reconstruction" time-scaling filterbanks can be constructed for these elemental signal classes under certain conditions on the filterbank. Conditions for perfect reconstruction time-scaling are shown analytically for the uniform filterbank case and empirically for the nonuniform constant-Q (gammatone) case. Extension of perfect reconstruction to multi-component signals is shown to require both filterbank and signal-dependent conditions and indicates the need for a more complete theory of "perfect reconstruction" time-scaling filterbanks.

Evaluating intrusion detection systems without attacking your friends: The 1998 DARPA intrusion detection evaluation

February 9, 1999

Conference Paper

Author:

Robert K. Cunningham

…

Published in:

Third Conf. and Workshop on Intrusion Detection and Response, 9-13 February 1999.

Topic:

intrusion detection & monitoring

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Intrusion detection systems monitor the use of computers and the network over which they communicate, searching for unauthorized use, anomalous behavior, and attempts to deny users, machines or portions of the network access to services. Potential users of such systems need information that is rarely found in marketing literature, including how well a given system finds intruders and how much work is required to use and maintain that system in a fully functioning network with significant daily traffic. Researchers and developers can specify which prototypical attacks can be found by their systems, but without access to the normal traffic generated by day-to-day work, they cannot describe how well their systems detect real attacks while passing background traffic and avoiding false alarms. This information is critical: every declared intrusion requires time to review, regardless of whether it is a correct detection for which a real intrusion occurred, or whether it is merely a false alarm. To meet the needs of researchers, developers, and ultimately system administrators, we have developed the first objective, repeatable, and realistic measurement of intrusion detection system performance. Network traffic on an Air Force base was measured, characterized and subsequently simulated on an isolated network on which a few computers were used to simulate thousands of different Unix systems and hundreds of different users during periods of normal network traffic. Simulated attackers mapped the network, issued denial of service attacks, illegally gained access to systems, and obtained super-user privileges. Attack types ranged from old, well-known attacks, to new, stealthy attacks. Seven weeks of training data and two weeks of testing data were generated, filling more than 30 CD-ROMs. Methods and results from the 1998 DARPA intrusion detection evaluation will be highlighted, and preliminary plans for the 1999 evaluation will be presented.
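
The trade-off the summary describes, between finding real attacks and raising false alarms that each cost review time, can be illustrated with a minimal sketch. It is not the scoring procedure of the DARPA evaluation: the function name `score_alerts`, the (score, is_attack) alert format, and the threshold are all hypothetical, and the sketch assumes every attack produced some scored alert.

```python
def score_alerts(alerts, threshold):
    """Return (detection_rate, false_alarms) at a given decision threshold.

    alerts: list of (score, is_attack) pairs (hypothetical format); an alert
    is declared an intrusion when its score is at or above the threshold.
    """
    declared = [(s, a) for s, a in alerts if s >= threshold]
    total_attacks = sum(1 for _, a in alerts if a)
    hits = sum(1 for _, a in declared if a)
    # Every declared intrusion that is not a real attack is a false alarm.
    false_alarms = len(declared) - hits
    detection_rate = hits / total_attacks if total_attacks else 0.0
    return detection_rate, false_alarms


alerts = [(0.9, True), (0.8, False), (0.4, True), (0.2, False)]
print(score_alerts(alerts, 0.5))
print(score_alerts(alerts, 0.1))
```

Lowering the threshold can only raise the detection rate, but it also raises the number of alarms a human must review.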

Machine-assisted language translation for U.S./RoK Combined Forces Command

January 1, 1999

Journal Article

Author:

Young-Suk Lee

…

Published in:

Army RD&A Mag., November-December 1999, pp. 38-41.

Topic:

machine translation

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

The U.S. military must operate worldwide in a variety of international environments where many different languages are used. There is a critical need for translation, and there is a shortage of translators who can interpret military terminology specifically. One coalition environment where the need is particularly strong is the Republic of Korea (RoK) where, although U.S. and RoK military personnel have been working together for many years, the language barrier still significantly reduces the speed and effectiveness of coalition command and control. This article describes the Massachusetts Institute of Technology (MIT) Lincoln Laboratory's work on automated, two-way, English/Korean translation for enhanced coalition communications. Our ultimate goal is to enhance multilingual communications by producing accurate translations across a number of languages. Therefore, we have chosen an interlingua-based approach to machine translation that is readily adaptable to multiple languages. In this approach, a natural language understanding system transforms the input into an intermediate meaning representation called Semantic Frame, which serves as a basis for generating output in multiple languages. To produce useful and effective translation systems in the short term, we have focused on limited military task domains and have configured our system as a machine-assisted translation system. This allows the human translator to confirm or edit the machine translation.

Blind clustering of speech utterances based on speaker and language characteristics

November 30, 1998

Conference Paper

Author:

Douglas A. Reynolds

…

Published in:

5th Int. Conf. Spoken Language Processing (ICSLP), 30 November - 4 December 1998.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Classical speaker and language recognition techniques can be applied to the classification of unknown utterances by computing the likelihoods of the utterances given a set of well-trained target models. This paper addresses the problem of grouping unknown utterances when no information is available regarding the speaker or language classes or even the total number of classes. Approaches to blind message clustering are presented based on conventional hierarchical clustering techniques and an integrated cluster generation and selection method called the d* algorithm. Results are presented using message sets derived from the Switchboard and Callfriend corpora. Potential applications include automatic indexing of recorded speech corpora by speaker/language tags and automatic or semiautomatic selection of speaker-specific speech utterances for speaker recognition adaptation.
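
As a rough illustration of conventional hierarchical clustering when the number of classes is unknown, the sketch below merges clusters bottom-up with average linkage and stops once the closest pair is farther apart than a threshold. It does not reproduce the d* algorithm; the function name `agglomerative`, the distance matrix, and the threshold are hypothetical stand-ins for whatever utterance-to-utterance dissimilarity a real system would compute.

```python
def agglomerative(dist, threshold):
    """Average-linkage agglomerative clustering over a symmetric distance matrix.

    Merging stops when the closest pair of clusters is farther apart than
    `threshold`, so the number of clusters is not fixed in advance.
    """
    clusters = [[i] for i in range(len(dist))]

    def avg_link(a, b):
        # Mean pairwise distance between members of the two clusters.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = avg_link(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        if d > threshold:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return sorted(sorted(c) for c in clusters)


# Toy data: five "utterances" reduced to one hypothetical scalar feature each.
points = [0.0, 0.1, 0.2, 5.0, 5.1]
dist = [[abs(a - b) for b in points] for a in points]
print(agglomerative(dist, threshold=1.0))
```

The threshold plays the role of a stopping rule; choosing it is exactly the kind of cluster selection problem the summary refers to.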

Improving accent identification through knowledge of English syllable structure

November 30, 1998

Conference Paper

Author:

Kay M. Berkling

…

Published in:

5th Int. Conf. on Spoken Language Processing, ICSLP, 30 November - 4 December 1998.

Topic:

language recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

This paper studies the structure of foreign-accented read English speech. A system for accent identification is constructed by combining linguistic theory with statistical analysis. Results demonstrate that the linguistic theory is reflected in real speech data and its application improves accent identification. The work discussed here combines and applies previous research in language identification based on phonemic features [1] with the analysis of the structure and function of the English language [2]. Working with phonemically hand-labelled data in three accented speaker groups of Australian English (Vietnamese, Lebanese, and native speakers), we show that accents of foreign speakers can be predicted and manifest themselves differently as a function of their position within the syllable. When applying this knowledge, English vs. Vietnamese accent identification improves from 86% to 93% (English vs. Lebanese improves from 78% to 84%). The described algorithm is also applied to automatically aligned phonemes.

Sheep, goats, lambs and wolves: a statistical analysis of speaker performance in the NIST 1998 speaker recognition evaluation

November 1, 1998

Conference Paper

Author:

George R. Doddington

…

Published in:

NIST 1998 Speaker Recognition Evaluation, November 1998.

Topic:

speaker recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

Performance variability in speech and speaker recognition systems can be attributed to many factors. One major factor, which is often acknowledged but seldom analyzed, is inherent differences in the recognizability of different speakers. In speaker recognition systems such differences are characterized by the use of animal names for different types of speakers, including sheep, goats, lambs and wolves, depending on their behavior with respect to automatic recognition systems. In this paper we propose statistical tests for the existence of these animals and apply these tests to hunt for such animals using results from the 1998 NIST speaker recognition evaluation.

Vulnerabilities of reliable multicast protocols

October 21, 1998

Conference Paper

Author:

Thomas M. Parks

…

Published in:

IEEE MILCOM '98, Vol. 3, 21 October 1998, pp. 934-938.

Topic:

cryptography

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

We examine vulnerabilities of several reliable multicast protocols. The various mechanisms employed by these protocols to provide reliability can present vulnerabilities. We show how some of these vulnerabilities can be exploited in denial-of-service attacks, and discuss potential mechanisms for withstanding such attacks.

AM-FM separation using shunting neural networks

October 6, 1998

Conference Paper

Author:

Robert A. Baxter

…

Thomas F. Quatieri

Published in:

Proc. of the IEEE-SP Int. Symp. on Time-Frequency and Time-Scale Analysis, 6-9 October 1998, pp. 553-556.

Topic:

signal processing

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

We describe an approach to estimating the amplitude-modulated (AM) and frequency-modulated (FM) components of a signal. Any signal can be written as the product of an AM component and an FM component. There have been several approaches to solving the AM-FM estimation problem described in the literature. Popular methods include the use of time-frequency analysis, the Hilbert transform, and the Teager energy operator. We focus on an approach based on FM-to-AM transduction that is motivated by auditory physiology. We show that the transduction approach can be realized as a bank of bandpass filters followed by envelope detectors and shunting neural networks, and the resulting dynamical system is capable of robust AM-FM estimation in noisy environments and over a broad range of filter bandwidths and locations. Our model is consistent with recent psychophysical experiments that indicate AM and FM components of acoustic signals may be transformed into a common neural code in the brain stem via FM-to-AM transduction. Applications of our model include signal recognition and multi-component decomposition.
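
For orientation, one of the popular alternatives the summary mentions, the Teager energy operator, can be sketched in a few lines. This is not the FM-to-AM transduction or shunting-network method of the paper; it is an energy-separation estimate in the style commonly called DESA-2, and the function names are hypothetical. For a constant-amplitude sinusoid A cos(Ωn + φ) with 0 < Ω < π/2 the estimates are exact; for slowly varying AM-FM signals they are only approximate.

```python
import math


def teager(x):
    # Discrete Teager energy psi[n] = x[n]^2 - x[n-1]*x[n+1] on interior samples;
    # output index i corresponds to input sample i + 1.
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def energy_separation(x):
    """Estimate amplitude and frequency (radians/sample) per sample.

    Uses psi[x] = A^2 sin^2(W) and psi[y] = 4 A^2 sin^4(W) for the
    symmetric difference y[n] = x[n+1] - x[n-1].
    """
    psi_x = teager(x)
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_y = teager(y)  # psi_y[j] lines up with psi_x[j + 1]
    amps, freqs = [], []
    for j, py in enumerate(psi_y):
        px = psi_x[j + 1]
        if px <= 0.0 or py <= 0.0:
            continue  # skip samples where the energy estimate is unusable
        c = max(-1.0, min(1.0, 1.0 - py / (2.0 * px)))
        freqs.append(0.5 * math.acos(c))
        amps.append(2.0 * px / math.sqrt(py))
    return amps, freqs


signal = [2.0 * math.cos(0.3 * n + 0.1) for n in range(50)]
amps, freqs = energy_separation(signal)
print(round(amps[0], 6), round(freqs[0], 6))
```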
