Publications

Auditory signal processing as a basis for speaker recognition

Published in:
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 19-22 October, 2003, pp. 111-114.

Summary

In this paper, we exploit models of auditory signal processing at different levels along the auditory pathway for use in speaker recognition. A low-level nonlinear model, at the cochlea, provides accentuated signal dynamics, while a high-level model, at the inferior colliculus, provides frequency analysis of modulation components that reveals additional temporal structure. A variety of features are derived from the low-level dynamic and high-level modulation signals. Fusion of likelihood scores from feature sets at different auditory levels with scores from standard mel-cepstral features provides an encouraging speaker recognition performance gain over use of the mel-cepstrum alone with corpora from land-line and cellular telephone communications.
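
The score fusion mentioned above can be illustrated with a minimal sketch. The abstract does not say how the likelihood scores are combined, so the normalized weighted sum below is an assumption, and the system names, weights, and scores are hypothetical values chosen only for illustration.

```python
from typing import Dict, List


def fuse_scores(scores_by_system: Dict[str, List[float]],
                weights: Dict[str, float]) -> List[float]:
    """Combine per-trial scores from several systems with a normalized weighted sum.

    Assumption: linear fusion; the paper's actual fusion rule is not given here.
    """
    names = list(scores_by_system)
    n_trials = len(scores_by_system[names[0]])
    if any(len(scores_by_system[n]) != n_trials for n in names):
        raise ValueError("every system must score the same trials")
    total_weight = sum(weights[n] for n in names)
    return [
        sum(weights[n] * scores_by_system[n][i] for n in names) / total_weight
        for i in range(n_trials)
    ]


# Hypothetical likelihood scores for three trials from three feature sets.
scores = {
    "mel_cepstrum": [1.0, -0.5, 0.2],
    "cochlear_dynamics": [0.6, -0.1, 0.4],
    "modulation": [0.8, -0.3, -0.2],
}
# Hypothetical weights; in practice these would be tuned on held-out data.
weights = {"mel_cepstrum": 2.0, "cochlear_dynamics": 1.0, "modulation": 1.0}

fused = fuse_scores(scores, weights)
print(fused)
```

A trial is then accepted or rejected by comparing its fused score with a threshold, exactly as for a single-system score.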

System adaptation as a trust response in tactical ad hoc networks

Published in:
IEEE MILCOM 2003, 13-16 October 2003, pp. 209-214.

Summary

While mobile ad hoc networks offer significant improvements for tactical communications, these networks are vulnerable to node capture and other forms of cyberattack. In this paper we evaluated via simulation the impact of a passive attacker, a denial of service (DoS) attack, and a data swallowing attack. We compared two different adaptive network responses to these attacks against a baseline of no response for 10- and 20-node networks. Each response reflects a level of trust assigned to the captured node. Our simulation used a responsive variant of the ad hoc on-demand distance vector (AODV) routing algorithm and focused on the response performance. We assumed that the attacks had been detected and reported. We compared performance tradeoffs of attack, response, and network size by focusing on metrics such as "goodput", i.e., percentage of messages that reach the intended destination untainted by the captured node. We showed, for example, that under general conditions a DoS attack response should minimize attacker impact while a response to a data swallowing attack should minimize risk to the system and trust of the compromised node with most of the response benefit. We showed that the best network response depends on the mission goals, network configuration, density, network performance, attacker skill, and degree of compromise.
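
The "goodput" metric defined above can be sketched as follows. The `Message` record and its two fields are hypothetical names introduced for illustration; the abstract gives only the definition (percentage of messages that reach the intended destination untainted by the captured node), not the simulator's data model.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Message:
    delivered: bool  # reached its intended destination
    tainted: bool    # handled by the captured node along the way (assumed flag)


def goodput(messages: Iterable[Message]) -> float:
    """Percentage of all sent messages that were delivered and untainted."""
    msgs = list(messages)
    if not msgs:
        return 0.0
    good = sum(1 for m in msgs if m.delivered and not m.tainted)
    return 100.0 * good / len(msgs)


# Hypothetical log of four messages from one simulation run.
log = [
    Message(delivered=True, tainted=False),
    Message(delivered=True, tainted=True),
    Message(delivered=False, tainted=False),
    Message(delivered=True, tainted=False),
]
print(goodput(log))  # 50.0
```

Counting a delivered but tainted message as a failure is what distinguishes this metric from plain delivery ratio.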

Utilizing local terrain to determine targeted weather observation locations

Published in:
Conf. on Battlespace Atmospheric and Cloud Impacts on Military Operations, BACIMO, 9-11 September 2003.

Summary

Many of the recent conflicts in which United States (US) military forces have been deployed have occurred in regions that contain complex terrain (e.g., Korea, Kosovo, Afghanistan, and northern Iraq). Accurate weather forecasts are critical to the success of operations in these regions and are typically supplied by numerical weather prediction (NWP) models like the US Navy NOGAPS, COAMPS, and US Air Force MM5. Unfortunately, the weather observations required to generate accurate initial conditions needed by these models are often not available. In these cases it is desirable to deploy additional weather sensors. The question then becomes: Where should the military planners deploy their sensor resources? This study demonstrates that knowledge of just the terrain within the model domain may be a useful factor for military planners to consider. For NWP, model forecast errors in mountainous areas are typically thought to be due to poorly resolved terrain, or model physics not suited for use in a complex terrain environment. Recent advances in computational technology are making it possible to run these models at resolutions where many of the significant terrain features are now being well resolved. While terrain can be accurately specified, often the gradients in wind, temperature, and moisture fields associated with the higher resolution terrain are not. As a result, initial conditions in complex terrain environments are not adequately specified. Since not all initial condition errors contribute significantly to model forecast error, knowledge of terrain-induced NWP model forecast sensitivity may be important when developing and deploying a weather sensor network to support a regional scale NWP model. The terrain-induced model sensitivity can provide an indication of which variables in the initial conditions have a significant influence on the forecast and where initial conditions need to be most accurate to minimize model forecast error.
A sensor network can then be designed to minimize these errors by deploying critical sensors in sensitive locations, thereby reducing relevant initial condition error without the costly deployment of a high-density sensor network. This is similar to the targeted observation technique first suggested by Emanuel et al. (1995), except that in this example the targeted observations would be designed to reduce initial condition error associated with poorly resolved atmospheric features created by the terrain. This paper is organized as follows. Section 2 contains a brief description of the data collection effort designed to support this study. The experimental design and the specifics of the case used in this study are described in section 3. The analysis and results from both the forward and adjoint simulations are presented in section 4. Section 5 contains a summary of the results, and a brief discussion of their implications.

Acoustic, phonetic, and discriminative approaches to automatic language identification

Summary

Formal evaluations conducted by NIST in 1996 demonstrated that systems that used parallel banks of tokenizer-dependent language models produced the best language identification performance. Since that time, other approaches to language identification have been developed that match or surpass the performance of phone-based systems. This paper describes and evaluates three techniques that have been applied to the language identification problem: phone recognition, Gaussian mixture modeling, and support vector machine classification. A recognizer that fuses the scores of three systems that employ these techniques produces a 2.7% equal error rate (EER) on the 1996 NIST evaluation set and a 2.8% EER on the NIST 2003 primary condition evaluation set. An approach to dealing with the problem of out-of-set data is also discussed.
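
For readers unfamiliar with the equal error rate (EER) quoted above, the sketch below shows one simple way to estimate it from detection scores: sweep a threshold and report the operating point where the miss rate and false-alarm rate are closest. This is a coarse approximation under stated assumptions, not the NIST scoring procedure, and the scores are hypothetical.

```python
from typing import List


def equal_error_rate(target: List[float], nontarget: List[float]) -> float:
    """Approximate EER by sweeping thresholds over the observed scores.

    A trial is accepted when its score is >= the threshold.
    """
    if not target or not nontarget:
        raise ValueError("need at least one target and one non-target score")
    best_gap, best_eer = float("inf"), 1.0
    for thr in sorted(set(target) | set(nontarget)):
        miss = sum(s < thr for s in target) / len(target)
        false_alarm = sum(s >= thr for s in nontarget) / len(nontarget)
        gap = abs(miss - false_alarm)
        if gap < best_gap:
            best_gap, best_eer = gap, (miss + false_alarm) / 2
    return best_eer


# Hypothetical scores: higher means "more likely the hypothesized language".
target_scores = [0.9, 0.8, 0.7, 0.3]
nontarget_scores = [0.6, 0.4, 0.2, 0.1]
print(equal_error_rate(target_scores, nontarget_scores))  # 0.25
```

With only a handful of trials the estimate is coarse; evaluation tools typically interpolate along the detection error tradeoff curve instead.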

Fusing high- and low-level features for speaker recognition

Summary

The area of automatic speaker recognition has been dominated by systems using only short-term, low-level acoustic information, such as cepstral features. While these systems have produced low error rates, they ignore higher levels of information beyond low-level acoustics that convey speaker information. Recently published works have demonstrated that such high-level information can be used successfully in automatic speaker recognition systems to improve accuracy and potentially increase robustness. Wide-ranging high-level-feature-based approaches using pronunciation models, prosodic dynamics, pitch gestures, phone streams, and conversational interactions were explored and developed under the SuperSID project at the 2002 JHU CLSP Summer Workshop (WS2002): http://www.clsp.jhu.edu/ws2002/groups/supersid/. In this paper, we show how these novel features and classifiers provide complementary information and can be fused together to drive down the equal error rate on the 2001 NIST Extended Data Task to 0.2%, a 71% relative reduction in error over the previous state of the art.

Person authentication by voice: a need for caution

Published in:
8th European Conf. on Speech Communication and Technology, EUROSPEECH, 1-4 September 2003.

Summary

Because of recent events and as members of the scientific community working in the field of speech processing, we feel compelled to publicize our views concerning the possibility of identifying or authenticating a person from his or her voice. The need for a clear and common message was indeed shown by the diversity of information that has been circulating on this matter in the media and general public over the past year. In a press release initiated by the AFCP and further elaborated in collaboration with the SpLC ISCA-SIG, the two groups herein discuss and present a summary of the current state of scientific knowledge and technological development in the field of speaker recognition, in accessible wording for nonspecialists. Our main conclusion is that, despite the existence of technological solutions to some constrained applications, at the present time, there is no scientific process that enables one to uniquely characterize a person's voice or to identify with absolute certainty an individual from his or her voice.

Integration of speaker recognition into conversational spoken dialogue systems

Summary

In this paper we examine the integration of speaker identification/verification technology into two dialogue systems developed at MIT: the Mercury air travel reservation system and the Orion task delegation system. Both systems utilize information collected from registered users that is useful in personalizing the system to specific users and that must be securely protected from impostors. Two speaker recognition systems, the MIT Lincoln Laboratory text-independent GMM-based system and the MIT Laboratory for Computer Science text-constrained speaker-adaptive ASR-based system, are evaluated and compared within the context of these conversational systems.

Model compression for GMM based speaker recognition systems

Published in:
EUROSPEECH 2003, 1-4 September 2003.

Summary

For large-scale deployments of speaker verification systems, model size can be an important issue, not only for minimizing storage requirements but also for reducing the transfer time of models over networks. Model size is also critical for deployments to small, portable devices. In this paper we present a new model compression technique for Gaussian Mixture Model (GMM) based speaker recognition systems. For GMM systems using adaptation from a background model, the compression technique exploits the fact that speaker models are adapted from a single speaker-independent model and not all parameters need to be stored. We present results on the 2002 NIST speaker recognition evaluation cellular telephone corpus and show that the compression technique provides a good tradeoff of compression ratio to performance loss. We are able to achieve a 56:1 compression (624KB -> 11KB) with only a 3.2% relative increase in EER (9.1% -> 9.4%).
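
The two tradeoff figures quoted above are simple ratios, and the short calculation below reproduces them from the rounded values given in the abstract. Variable names are illustrative. Because the inputs are rounded, the results differ slightly from the reported 56:1 and 3.2%; the assumption here is that the authors computed those figures from unrounded sizes and error rates.

```python
# Rounded values as quoted in the abstract.
original_kb = 624.0
compressed_kb = 11.0
eer_before = 9.1  # percent
eer_after = 9.4   # percent

# Compression ratio: original size divided by compressed size.
compression_ratio = original_kb / compressed_kb

# Relative EER increase: change in EER as a percentage of the original EER.
relative_eer_increase = 100.0 * (eer_after - eer_before) / eer_before

print(f"{compression_ratio:.1f}:1")       # 56.7:1
print(f"{relative_eer_increase:.1f}%")    # 3.3%
```

Note that the 0.3-point absolute increase in EER and the roughly 3% relative increase describe the same change; the relative form is the one compared against the compression ratio.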

Measuring the readability of automatic speech-to-text transcripts

Summary

This paper reports initial results from a novel psycholinguistic study that measures the readability of several types of speech transcripts. We define a four-part figure of merit to measure readability: accuracy of answers to comprehension questions, reaction-time for passage reading, reaction-time for question answering and a subjective rating of passage difficulty. We present results from an experiment with 28 test subjects reading transcripts in four experimental conditions.

An examination of wind shear alert integration at the Dallas/Ft. Worth International Airport (DFW)

Published in:
MIT Lincoln Laboratory Report ATC-309

Summary

The Dallas/Fort Worth International Airport (DFW) is one of the four demonstration system sites for the Integrated Terminal Weather System (ITWS). One of the primary benefits of the ITWS is a suite of algorithms that utilize data from the Terminal Doppler Weather Radar (TDWR) to generate wind shear alerts. DFW also benefits from a Network Expansion of the Low-Level Wind Shear Advisory System (LLWAS-NE). The LLWAS-NE-generated alerts are integrated with the radar-based alerts in ITWS to provide Air Traffic Control (ATC) with a comprehensive set of alert information. This study examines the integrated DFW wind shear alerts with emphasis on circumstances in which the detection performance of the TDWR-based wind shear algorithms was poor. Specific detection problems occur in the following situations: when wind shear events over the airport are aligned along a radial to the TDWR, during "non-traditional" wind shear events, when severe signal attenuation occurs during heavy precipitation over the TDWR radar site, and when TDWR clutter-residue editing over the airport is excessive. In all of the cases examined, the LLWAS-NE issued alerts to ATC that would have otherwise gone unreported.