Publications


Channel robust speaker verification via feature mapping

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. II, 6-10 April 2003, pp. II-53 - II-56.

Summary

In speaker recognition applications, channel variability is a major cause of errors. Techniques in the feature, model and score domains have been applied to mitigate channel effects. In this paper we present a new feature mapping technique that maps feature vectors into a channel-independent space. The feature mapping learns mapping parameters from a set of channel-dependent models derived from a channel-independent model via MAP adaptation. The technique is developed primarily for speaker verification, but can be applied for feature normalization in speech recognition applications. Results are presented on NIST landline and cellular telephone speech corpora where it is shown that feature mapping provides significant performance improvements over baseline systems and similar performance to Hnorm and Speaker-Model-Synthesis (SMS).
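As a toy sketch (not the paper's exact recipe), the core mapping step can be illustrated with a small diagonal-covariance GMM: each feature vector is assigned to its top-scoring Gaussian in the channel-dependent model and then shifted and scaled into the corresponding Gaussian of the channel-independent model. All model parameters below are invented for illustration.

```python
import numpy as np

# Hypothetical two-component, two-dimensional models. In practice these
# would be a channel-independent (CI) GMM and a channel-dependent (CD)
# GMM obtained from it via MAP adaptation.
ci_means = np.array([[0.0, 0.0], [3.0, 3.0]])    # CI mixture means
ci_stds  = np.array([[1.0, 1.0], [1.0, 1.0]])    # CI std devs
cd_means = np.array([[0.5, -0.2], [3.4, 2.7]])   # CD means (MAP-adapted)
cd_stds  = np.array([[1.2, 0.9], [1.1, 1.0]])    # CD std devs
weights  = np.array([0.5, 0.5])                  # shared mixture weights

def log_gauss(x, mu, sd):
    """Diagonal-covariance Gaussian log density, up to a constant."""
    return -0.5 * np.sum(((x - mu) / sd) ** 2 + 2 * np.log(sd), axis=-1)

def map_feature(x):
    # 1. Find the top-scoring Gaussian of the CD model for this vector.
    i = np.argmax(np.log(weights) + log_gauss(x, cd_means, cd_stds))
    # 2. Shift/scale x from that CD Gaussian into the matching CI Gaussian.
    return (x - cd_means[i]) * (ci_stds[i] / cd_stds[i]) + ci_means[i]
```

A vector sitting exactly at a channel-dependent mean maps onto the corresponding channel-independent mean, i.e. the estimated channel offset is removed.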

Conditional pronunciation modeling in speaker detection

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, 6-10 April 2003.

Summary

In this paper, we present a conditional pronunciation modeling method for the speaker detection task that does not rely on acoustic vectors. Aiming at exploiting higher-level information carried by the speech signal, it uses time-aligned streams of phones and phonemes to model a speaker's specific pronunciation. Our system uses phonemes drawn from a lexicon of pronunciations of words recognized by an automatic speech recognition system to generate the phoneme stream and an open-loop phone recognizer to generate a phone stream. The phoneme and phone streams are aligned at the frame level and conditional probabilities of a phone, given a phoneme, are estimated using co-occurrence counts. A likelihood detector is then applied to these probabilities. Performance is measured using the NIST Extended Data paradigm and the Switchboard-I corpus. Using 8 training conversations for enrollment, a 2.1% equal error rate was achieved. Extensions and alternatives, as well as fusion experiments, are presented and discussed.
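The counting-and-scoring idea can be sketched in a few lines, under the assumption of toy, invented (phoneme, phone) streams: conditional probabilities P(phone | phoneme) are estimated from co-occurrence counts, and a test stream is scored by a log-likelihood ratio against a background model.

```python
from collections import Counter
import math

def cond_probs(pairs, smooth=1e-3):
    """Estimate P(phone | phoneme) from frame-aligned (phoneme, phone)
    pairs via smoothed co-occurrence counts."""
    joint = Counter(pairs)
    marg = Counter(p for p, _ in pairs)
    phones = {q for _, q in pairs}
    return {(p, q): (joint[(p, q)] + smooth) / (marg[p] + smooth * len(phones))
            for p in marg for q in phones}

def llr(test_pairs, spk, bkg, floor=1e-6):
    """Average log-likelihood ratio of speaker vs. background model."""
    return sum(math.log(spk.get(pq, floor) / bkg.get(pq, floor))
               for pq in test_pairs) / len(test_pairs)

# Invented training streams: (phoneme, decoded phone) at the frame level.
spk_train = [("t", "t")] * 8 + [("t", "d")] * 2   # speaker mostly keeps /t/
bkg_train = [("t", "t")] * 5 + [("t", "d")] * 5   # background is 50/50
spk = cond_probs(spk_train)
bkg = cond_probs(bkg_train)

test = [("t", "t")] * 9 + [("t", "d")] * 1        # matches speaker's habit
```

A positive `llr(test, spk, bkg)` indicates the test pronunciation pattern fits the speaker model better than the background.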

Phonetic speaker recognition using maximum-likelihood binary-decision tree models

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, Vol. 4, 6-10 April 2003.

Summary

Recent work in phonetic speaker recognition has shown that modeling phone sequences using n-grams is a viable and effective approach to speaker recognition, primarily aiming at capturing speaker-dependent pronunciation and also word usage. This paper describes a method involving binary-tree-structured statistical models for extending the phonetic context beyond that of standard n-grams (particularly bigrams) by exploiting statistical dependencies within a longer sequence window without exponentially increasing the model complexity, as is the case with n-grams. Two ways of dealing with data sparsity are also studied, namely, model adaptation and a recursive bottom-up smoothing of symbol distributions. Results obtained under a variety of experimental conditions using the NIST 2001 Speaker Recognition Extended Data Task indicate consistent improvements in equal-error rate performance as compared to standard bigram models. The described approach confirms the relevance of long phonetic context in phonetic speaker recognition and represents an intermediate stage between short phone context and word-level modeling without the need for any lexical knowledge, which suggests its language independence.
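For reference, the standard phone-bigram baseline the paper improves on can be sketched as follows, with invented phone sequences: per-speaker bigram probabilities are estimated with smoothing and scored as an average log-likelihood ratio against a background model.

```python
from collections import Counter
import math

def bigram_probs(seq, smooth=0.5):
    """Smoothed bigram probabilities P(b | a) from a phone sequence."""
    big = Counter(zip(seq, seq[1:]))
    uni = Counter(seq[:-1])
    vocab = set(seq)
    return {(a, b): (big[(a, b)] + smooth) / (uni[a] + smooth * len(vocab))
            for a in uni for b in vocab}

def score(seq, spk, bkg, floor=1e-4):
    """Average bigram log-likelihood ratio, speaker vs. background."""
    return sum(math.log(spk.get(ab, floor) / bkg.get(ab, floor))
               for ab in zip(seq, seq[1:])) / (len(seq) - 1)

# Invented phone streams: the "speaker" favors aab patterns, the
# background alternates.
spk = bigram_probs(list("aabaabaab"))
bkg = bigram_probs(list("ababababa"))
```

The tree-structured models described in the paper generalize this by conditioning on a longer window of context, selected by data-driven binary questions rather than a fixed n-gram order.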

The SuperSID project: exploiting high-level information for high-accuracy speaker recognition

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 4, 6-10 April 2003, pp. IV-784 - IV-787.

Summary

The area of automatic speaker recognition has been dominated by systems using only short-term, low-level acoustic information, such as cepstral features. While these systems have indeed produced very low error rates, they ignore other levels of information beyond low-level acoustics that convey speaker information. Recently published work has shown examples that such high-level information can be used successfully in automatic speaker recognition systems and has the potential to improve accuracy and add robustness. For the 2002 JHU CLSP summer workshop, the SuperSID project was undertaken to exploit these high-level information sources and dramatically increase speaker recognition accuracy on a defined NIST evaluation corpus and task. This paper provides an overview of the structures, data, task, tools, and accomplishments of this project. Wide-ranging approaches using pronunciation models, prosodic dynamics, pitch and duration features, phone streams, and conversational interactions were explored and developed. In this paper we show how these novel features and classifiers indeed provide complementary information and can be fused together to drive down the equal error rate on the 2001 NIST extended data task to 0.2% - a 71% relative reduction in error over the previous state of the art.
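Why fusing complementary systems helps can be illustrated with a synthetic example (all scores invented): two detectors with independent noise are combined by simple score averaging, and the fused scores separate target from impostor trials better than either system alone.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
# Synthetic scores from two systems (rows) with independent noise:
# targets centered at 1.0, impostors at 0.0.
tgt = np.stack([rng.normal(1.0, 1.0, n), rng.normal(1.0, 1.0, n)])
imp = np.stack([rng.normal(0.0, 1.0, n), rng.normal(0.0, 1.0, n)])

def eer(t, i):
    """Empirical equal error rate of a score set."""
    thr = np.sort(np.concatenate([t, i]))
    miss = np.array([(t < x).mean() for x in thr])   # missed targets
    fa = np.array([(i >= x).mean() for x in thr])    # false accepts
    k = np.argmin(np.abs(miss - fa))
    return (miss[k] + fa[k]) / 2

# Equal-weight score fusion: averaging halves the noise variance.
fused_t, fused_i = tgt.mean(axis=0), imp.mean(axis=0)
```

Because the two systems' errors are independent, the fused scores have the same separation between classes but less spread, so the EER drops.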

Using prosodic and conversational features for high-performance speaker recognition: report from JHU WS'02

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, Vol. IV, 6-10 April 2003, pp. IV-792 - IV-795.

Summary

While there has been a long tradition of research seeking to use prosodic features, especially pitch, in speaker recognition systems, results have generally been disappointing when such features are used in isolation and only modest improvements have been seen when used in conjunction with traditional cepstral GMM systems. In contrast, we report here on work from the JHU 2002 Summer Workshop exploring a range of prosodic features, using as testbed NIST's 2001 Extended Data task. We examined a variety of modeling techniques, such as n-gram models of turn-level prosodic features and simple vectors of summary statistics per conversation side scored by kth nearest-neighbor classifiers. We found that purely prosodic models were able to achieve equal error rates of under 10%, and yielded significant gains when combined with more traditional systems. We also report on exploratory work on "conversational" features, capturing properties of the interaction across conversation sides, such as turn-taking patterns.
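The "summary statistics plus kth nearest-neighbor" idea can be sketched with invented numbers: each conversation side is reduced to one vector of prosodic summary statistics (e.g. pitch mean and standard deviation), and a test side is scored by its distance to the k-th closest enrollment vector.

```python
import numpy as np

def knn_score(test_vec, enroll_vecs, k=1):
    """Score a test vector by the distance to its k-th nearest
    enrollment vector; a closer k-th neighbor means a higher score."""
    d = np.sort(np.linalg.norm(enroll_vecs - test_vec, axis=1))
    return -d[k - 1]

# Invented enrollment vectors: (pitch mean [Hz], pitch stddev [Hz]) per
# conversation side of the target speaker.
target = np.array([[120.0, 15.0], [118.0, 14.0], [122.0, 16.0]])

target_test = np.array([119.0, 15.0])     # consistent with the speaker
impostor_test = np.array([180.0, 30.0])   # very different prosody
```

In practice the summary vectors would hold many more statistics, but the scoring rule is the same.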

Evaluation of TDWR range-velocity ambiguity mitigation techniques

Published in:
MIT Lincoln Laboratory Report ATC-310

Summary

Range and velocity ambiguities pose significant data quality challenges for the Terminal Doppler Weather Radar (TDWR). For typical pulse repetition frequencies (PRFs) of 1-2 kHz, the radar is subject to both range-ambiguous precipitation returns and velocity aliasing. Experience shows that these are a major contributor to failures of the system's wind shear detection algorithms. Here we evaluate the degree of mitigation offered by existing phase diversity methods to these problems. Using optimized processing techniques, we analyze the performance of the two phase codes best suited for application to TDWRs, random and SZ(8/64) [Sachidananda and Zrnić, 1999], in the protection of weak-trip power, velocity, and spectral width estimates. Results from both simulated and real weather data indicate that the SZ(8/64) code generally outperforms the random code, except for protection of 1st-trip from 5th-trip interference. However, the SZ code estimates require a priori knowledge of out-of-trip spectral widths for censoring. This information cannot be provided adequately by a separate scan with a PRF low enough to unambiguously cover the entire range of detectable weather, because then the upper limit of measurable spectral width is only about 2 m/s. For this reason we conclude that SZ phase codes are not appropriate for TDWR use. For velocity ambiguity resolution, the random phase code could be transmitted at two PRFs on alternating dwells. Assuming the velocity changes little between two consecutive dwells, a Chinese remainder type of approach can be used to dealias the velocities. Strong ground clutter at close range, however, disables this scheme for gates at the beginning of the 2nd trip of the higher PRF. We offer an alternative scheme for range-velocity ambiguity mitigation: Multistaggered Pulse Processing (MSPP).
Yielding excellent velocity dealiasing capabilities, the MSPP method should also provide protection from patchy, small-scale out-of-trip weather. To obtain maximum performance in both range and velocity dealiasing, we suggest that information from the initial low-PRF scan be used to decide the best waveform to transmit in the following scan: random phase code with alternating-dwell PRFs, or MSPP. Such an adaptive approach presages future developments in weather radar, for example electronically scanned arrays that allow selective probing of relevant weather events.
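The Chinese-remainder-style dual-PRF dealiasing idea can be illustrated with a toy sketch (all numbers invented): the same true velocity, measured at two PRFs, aliases into different Nyquist intervals, and searching the small set of unfolding candidates for agreement recovers it.

```python
def alias(v, vny):
    """Fold a velocity into the Nyquist interval [-vny, +vny)."""
    return (v + vny) % (2 * vny) - vny

def dealias(m1, m2, vny1, vny2, n=2):
    """Recover the true velocity from two aliased measurements m1, m2
    taken at Nyquist velocities vny1, vny2. Candidates are searched
    within the extended unambiguous interval (|k| <= n unfoldings);
    the pair that (nearly) agrees gives the answer."""
    best = None
    for k1 in range(-n, n + 1):
        v1 = m1 + 2 * vny1 * k1
        for k2 in range(-n, n + 1):
            v2 = m2 + 2 * vny2 * k2
            if best is None or abs(v1 - v2) < best[0]:
                best = (abs(v1 - v2), (v1 + v2) / 2)
    return best[1]

vny1, vny2 = 16.0, 20.0        # illustrative Nyquist velocities (m/s)
v_true = 34.0                  # beyond both Nyquist limits
m1, m2 = alias(v_true, vny1), alias(v_true, vny2)
recovered = dealias(m1, m2, vny1, vny2)
```

The search window must stay within the extended unambiguous interval set by the two PRFs, otherwise multiple candidate pairs agree equally well.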

A multi-threaded fast convolver for dynamically parallel image filtering

Published in:
J. Parallel Distrib. Comput., Vol. 63, No. 3, March 2003, pp. 360-372.

Summary

2D convolution is a staple of digital image processing. The advent of large format imagers makes it possible to literally "pave" with silicon the focal plane of an optical sensor, which results in very large images that can require a significant amount of computation to process. Filtering of large images via 2D convolutions is often complicated by a variety of effects (e.g., non-uniformities found in wide field of view instruments) which must be compensated for in the filtering process by changing the filter across the image. This paper describes a fast (FFT based) method for convolving images with slowly varying filters. A parallel version of the method is implemented using a multi-threaded approach, which allows more efficient load balancing and a simpler software architecture. The method has been implemented within a high level interpreted language (IDL), while also exploiting open standards vector libraries (VSIPL) and open standards parallel directives (OpenMP). The parallel approach and software architecture are generally applicable to a variety of algorithms and has the advantage of enabling users to obtain the convenience of an easy operating environment while also delivering high performance using a fully portable code.
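A minimal sketch of the FFT-based convolution at the heart of the fast convolver (illustrative, not the paper's IDL/VSIPL implementation): zero-pad image and kernel to the full linear-convolution size, multiply their spectra, and invert.

```python
import numpy as np

def fft_conv2(img, ker):
    """Full 2D linear convolution of img with ker via the FFT."""
    s = (img.shape[0] + ker.shape[0] - 1,
         img.shape[1] + ker.shape[1] - 1)
    # Pointwise product of zero-padded spectra == linear convolution.
    return np.fft.irfft2(np.fft.rfft2(img, s) * np.fft.rfft2(ker, s), s)

rng = np.random.default_rng(0)
img = rng.standard_normal((32, 32))   # stand-in for an image tile
ker = rng.standard_normal((5, 5))     # stand-in for a local filter
out = fft_conv2(img, ker)             # (36, 36) linear convolution
```

A slowly varying filter would then be handled by splitting the image into overlapping tiles and convolving each tile with its local kernel; the tiles are independent, which is what makes the multi-threaded parallelization natural.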

Observations of non-traditional wind shear events at the Dallas/Fort Worth International Airport

Published in:
MIT Lincoln Laboratory Report ATC-308

Summary

During the past 20 years there has been great success in understanding and detecting microbursts. These "traditional" wind shear events are most prominent in the summer and are characterized by a two-dimensional, divergent outflow associated with precipitation loading from a thunderstorm downdraft or evaporative cooling from high-based rain clouds. Analysis of wind shear loss alerts at the Dallas/Fort Worth International Airport (DFW) from August 1999 through July 2002 reveals that a significant number of the wind shear events were generated by "non-traditional" mechanisms. The "non-traditional" wind shear mechanisms, linear divergence, divergence behind gust fronts, and gravity waves, accounted for one half of the alert events in the period studied. Radar-based algorithms have shown considerable skill in detecting wind shear events. However, the algorithms were developed to identify features common to the "traditional" events. If the algorithms were modified to detect "non-traditional" wind shear, the corresponding increase in false detections could be unacceptable. Therefore, in this report a new radar-based algorithm is proposed that detects linear divergence, divergence behind gust fronts, and gravity waves for output on the Integrated Terminal Weather System by identifying the radar signatures that are common to these features.

Multi-radar integration to improve en route aviation operations in severe convective weather

Published in:
19th Int. Conf. of Interactive Information Processing Systems in Meteorology, Oceanography and Hydrology, IIPS, 9-13 February 2003.

Summary

In this paper, we describe a major new FAA initiative, the Corridor Integrated Weather System (CIWS), to improve convective weather decision support for congested en route airspace and the terminals within that airspace through use of a large, heterogeneous network of weather sensing radars as well as many additional sensors. The objective of the CIWS concept exploration is to determine the improvements in NAS performance that could be achieved by providing en route controllers, en route and major terminal traffic flow managers, and airline dispatch with accurate, fully automated, high update-rate information on current and near-term (0-2 hour) storm locations, severity and vertical structure so that they can achieve more efficient tactical use of the airspace. These "tactical" traffic flow management products will complement the longer-term (2-6 hr) forecasts that are also needed for flight planning and strategic traffic flow management. Since balancing the en route traffic flows in the presence of time-varying impacts on sector capacities by convective weather is essential if delays are to be reduced, an important element of the CIWS initiative is interfacing to and, in some cases providing, air traffic flow management (TFM) and airline dispatch decision support tools (DSTs).

Automated forecasting of road conditions and recommended road treatments for winter storms

Published in:
19th Int. Conf. of Interactive Information Processing Systems for Meteorology, Oceanography and Hydrology, 9-13 February 2003.

Summary

Over the past decade there have been significant improvements in the availability, volume, and quality of the sensors and technology utilized to both capture the current state of the atmosphere and generate weather forecasts. New radar systems, automated surface observing systems, satellites and advanced numerical models have all contributed to these advances. However, the practical application of this new technology for transportation decision makers has been primarily limited to aviation. Surface transportation operators, like air traffic operators, require tailored weather products and alerts and guidance on recommended remedial action (e.g. applying chemicals or adjusting traffic flow). Recognizing this deficiency, the FHWA (Federal Highway Administration) has been working to define the weather related needs and operational requirements of the surface transportation community since October 1999. A primary focus of the FHWA baseline user needs and requirements has been winter road maintenance personnel (Pisano, 2001). A key finding of the requirements process was that state DOTs (Departments of Transportation) were in need of a weather forecast system that provided them both an integrated view of their weather, road and crew operations and advanced guidance on what course of action might be required to keep traffic flowing safely. As a result, the FHWA funded a small project (~$900K/year) involving a consortium of national laboratories to aggressively research and develop a prototype integrated Maintenance Decision Support System (MDSS). The prototype MDSS uses state-of-the-art weather and road condition forecast technology and integrates it with FHWA anti-icing guidelines to provide guidance to State DOTs in planning and managing winter storm events (Mahoney, 2003). The overall flow of the MDSS is shown in Figure 1. Basic meteorological data and advanced models are ingested into the Road Weather Forecast System (RWFS). 
The RWFS, developed by the National Center for Atmospheric Research (NCAR), dynamically weights the ingested model and station data to produce ambient weather forecasts (temperature, precipitation, wind, etc.). More details on the RWFS system can be found in (Myers, 2002). Next, the RCTM (Road Condition Treatment Module) ingests the forecasted weather conditions from the RWFS, calculates the predicted road conditions (snow depth, pavement temperature), and generates a recommended treatment plan. Once a treatment plan has been determined, the recommendations are presented in map and table form through the MDSS display. The display also allows users to examine specific road and weather parameters, and to override the algorithm-recommended treatments with a user-specified plan. A brief test of the MDSS system was performed in Minnesota during the spring of 2002. Further refinements were made and an initial version of the MDSS was released by the FHWA in September 2002. While this basic system is not yet complete, it does ingest all the necessary weather data and produce an integrated view of the road conditions and recommended treatments. This paper details the RCTM algorithm and its components, including the current and potential capabilities of the system.