Publications


The cube coefficient subspace architecture for nonlinear digital predistortion

Published in:
42nd Asilomar Conf. on Signals, Systems, and Computers, 27 October 2008, pp. 1857-1861.

Summary

In this paper, we present the cube coefficient subspace (CCS) architecture for linearizing power amplifiers (PAs), which divides the overparametrized Volterra kernel into small, computationally efficient subkernels spanning only the portions of the full multidimensional coefficient space with the greatest impact on linearization. Using measured results from a Q-Band solid state PA, we demonstrate that the CCS predistorter architecture achieves better linearization performance than state-of-the-art memory polynomials and generalized memory polynomials.
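The memory-polynomial baseline that the CCS architecture is compared against can be sketched as follows. This is an illustrative implementation of the standard memory-polynomial model, not the paper's CCS predistorter; the coefficients below are invented for demonstration.

```python
def memory_polynomial(x, coeffs):
    """Apply a memory polynomial to complex baseband samples x:
        y[n] = sum_k sum_q coeffs[k][q] * x[n-q] * |x[n-q]|**k
    where k+1 is the nonlinearity order and q the memory tap.
    """
    K = len(coeffs)
    Q = len(coeffs[0])
    y = []
    for n in range(len(x)):
        acc = 0j
        for k in range(K):
            for q in range(Q):
                if n - q >= 0:
                    xs = x[n - q]
                    acc += coeffs[k][q] * xs * abs(xs) ** k
        y.append(acc)
    return y

# Hypothetical 3rd-order, 2-tap predistorter: a linear pass-through
# plus a small cubic term that compresses large samples.
coeffs = [
    [1.0 + 0j, 0j],    # order 1 (linear)
    [0j, 0j],          # order 2
    [-0.05 + 0j, 0j],  # order 3
]
x = [0.1 + 0j, 1.0 + 0j]
y = memory_polynomial(x, coeffs)
```

The CCS idea, as the abstract describes it, is to keep only small subkernels of the full multidimensional coefficient space rather than the complete set of cross-terms a full Volterra kernel would require.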

Language, dialect, and speaker recognition using Gaussian mixture models on the cell processor

Published in:
Twelfth Annual High Performance Embedded Computing Workshop, HPEC 2008, 23-25 September 2008.

Summary

Automatic recognition systems are commonly used in speech processing to classify observed utterances by the speaker's identity, dialect, and language. These problems often require high processing throughput, especially in applications involving multiple concurrent incoming speech streams, such as in datacenter-level processing. Recent advances in processor technology allow multiple processors to reside within the same chip, allowing high performance per watt. Currently the Cell Broadband Engine has the leading performance-per-watt specifications in its class. Each Cell processor consists of a PowerPC Processing Element (PPE) working together with eight Synergistic Processing Elements (SPE). The SPEs have 256KB of memory (local store), which is used for storing both program and data. This paper addresses the implementation of language, dialect, and speaker recognition on the Cell architecture. Classically, the problem of performing speech-domain recognition has been approached as embarrassingly parallel, with each utterance being processed in parallel to the others. As we will discuss, efficient processing on the Cell requires a different approach, whereby computation and data for each utterance are subdivided to be handled by separate processors. We present a computational model for automatic recognition on the Cell processor that takes advantage of its architecture, while mitigating its limitations. Using the proposed design, we predict a system able to concurrently score over 220 real-time speech streams on a single Cell.
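The inner-loop computation being partitioned across the SPEs is per-frame GMM scoring. A minimal sketch of that computation, with toy parameters rather than a trained model:

```python
import math

def gmm_frame_loglik(frame, weights, means, variances):
    """Log-likelihood of one feature frame under a diagonal-covariance
    GMM -- the dominant computation in GMM-based recognizers like the
    one described.  Parameters here are toy values."""
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for x, m, v in zip(frame, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        comp_logs.append(ll)
    # log-sum-exp over mixture components for numerical stability
    top = max(comp_logs)
    return top + math.log(sum(math.exp(c - top) for c in comp_logs))

# Toy 2-component, 2-dimensional model
weights = [0.5, 0.5]
means = [[0.0, 0.0], [3.0, 3.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
score = gmm_frame_loglik([0.0, 0.0], weights, means, variances)
```

On the Cell, the point of the paper's design is that this loop and its data must be tiled to fit each SPE's 256 KB local store rather than handing whole utterances to separate processors.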

A comparison of subspace feature-domain methods for language recognition

Summary

Compensation of cepstral features for mismatch due to dissimilar train and test conditions has been critical for good performance in many speech applications. Mismatch is typically due to variability from changes in speaker, channel, gender, and environment. Common methods for compensation include RASTA, mean and variance normalization, VTLN, and feature warping. Recently, a new class of subspace methods for model compensation has become popular in language and speaker recognition--nuisance attribute projection (NAP) and factor analysis. A feature space version of latent factor analysis has been proposed. In this work, a feature space version of NAP is presented. This new approach, fNAP, is contrasted with feature domain latent factor analysis (fLFA). Both methods are applied to a NIST language recognition task. Results show the viability of the new fNAP method and indicate when the different methods perform best.
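The projection at the heart of NAP-style compensation can be sketched in a few lines. This shows only the generic projection step with an invented toy basis, not the fNAP estimation procedure from the paper:

```python
def nap_project(x, nuisance_basis):
    """Remove nuisance directions from a feature vector:
        x' = x - U (U^T x)
    where the rows of nuisance_basis are orthonormal vectors spanning
    the nuisance (e.g. channel) subspace.  The basis below is a toy
    example, not one estimated from data."""
    y = list(x)
    for u in nuisance_basis:
        c = sum(ui * xi for ui, xi in zip(u, x))
        y = [yi - c * ui for yi, ui in zip(y, u)]
    return y

# Toy example: project out the first coordinate direction
U = [[1.0, 0.0, 0.0]]
projected = nap_project([2.0, 3.0, 4.0], U)
```

After projection the component of the feature vector lying in the nuisance subspace is zeroed out, leaving the remaining directions untouched.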

A hybrid SVM/MCE training approach for vector space topic identification of spoken audio recordings

Published in:
INTERSPEECH 2008, 22-26 September 2008, pp. 2542-2545.

Summary

The success of support vector machines (SVMs) for classification problems is often dependent on an appropriate normalization of the input feature space. This is particularly true in topic identification, where the relative contribution of the common but uninformative function words can overpower the contribution of the rare but informative content words in the SVM kernel function score if the feature space is not normalized properly. In this paper we apply the discriminative minimum classification error (MCE) training approach to the problem of learning an appropriate feature space normalization for use with an SVM classifier. Results are presented showing significant error rate reductions for an SVM-based system on a topic identification task using the Fisher corpus of audio recordings of human conversations.
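The effect of feature-space normalization on an SVM kernel score can be illustrated with term-frequency vectors. In the paper the per-feature weights are learned by MCE training; the fixed weights and word counts below are purely illustrative:

```python
def weighted_kernel(doc_a, doc_b, weights):
    """Linear SVM kernel between two term-frequency vectors after a
    per-feature scaling (equivalently, an inner product in the
    normalized feature space)."""
    return sum(w * w * a * b for w, a, b in zip(weights, doc_a, doc_b))

# Features: counts of ["the", "engine", "overhaul"] in two documents
doc_a = [40.0, 3.0, 2.0]
doc_b = [38.0, 4.0, 1.0]

uniform = [1.0, 1.0, 1.0]
downweighted = [0.1, 1.0, 1.0]  # suppress the common function word "the"

k_raw = weighted_kernel(doc_a, doc_b, uniform)        # dominated by "the"
k_norm = weighted_kernel(doc_a, doc_b, downweighted)  # content words matter
```

With uniform weights the function word dominates the kernel score; after down-weighting, the rare content words carry comparable influence, which is the imbalance the MCE-trained normalization is meant to correct.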

Dialect recognition using adapted phonetic models

Published in:
INTERSPEECH 2008, 22-26 September 2008, pp. 763-766.

Summary

In this paper, we introduce a dialect recognition method that makes use of phonetic models adapted per dialect without phonetically labeled data. We show that this method can be implemented efficiently within an existing PRLM system. We compare the performance of this system with other state-of-the-art dialect recognition methods (both acoustic and token-based) on the NIST LRE 2007 English and Mandarin dialect recognition tasks. Our experimental results indicate that this system can perform better than baseline GMM and adapted PRLM systems, and also results in consistent gains of 15-23% when combined with other systems.
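The "LM" half of a PRLM system scores the phone-token stream produced by a phone recognizer under per-dialect n-gram models. A minimal bigram-scoring sketch, with invented probabilities (the paper's contribution, adapting the phone models themselves per dialect, is not shown here):

```python
import math

def bigram_loglik(tokens, bigram_probs, backoff=1e-3):
    """Score a phone-token stream under a bigram phonotactic model.
    Unseen bigrams fall back to a small floor probability."""
    ll = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        ll += math.log(bigram_probs.get((prev, cur), backoff))
    return ll

# Hypothetical per-dialect bigram probabilities
dialect_a = {("ae", "t"): 0.4, ("t", "ah"): 0.3}
dialect_b = {("ae", "t"): 0.1, ("t", "ah"): 0.1}

stream = ["ae", "t", "ah"]
score_a = bigram_loglik(stream, dialect_a)
score_b = bigram_loglik(stream, dialect_b)
```

The dialect whose phonotactic model assigns the stream the higher log-likelihood wins; in practice the scores are calibrated and fused rather than compared raw.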

Eigen-channel compensation and discriminatively trained Gaussian mixture models for dialect and accent recognition

Published in:
INTERSPEECH 2008, 22-26 September 2008, pp. 723-726.

Summary

This paper presents a series of dialect/accent identification results for three sets of dialects with discriminatively trained Gaussian mixture models and feature compensation using eigen-channel decomposition. The classification tasks evaluated in the paper include: 1) the Chinese language classes, 2) American- and Indian-accented English, and 3) discrimination between three Arabic dialects. The first two tasks were evaluated on the 2007 NIST LRE corpus. The Arabic discrimination task was evaluated using data derived from the LDC Arabic set collected by Appen. Analysis is performed for the English accent problem studied, and an approach to open-set dialect scoring is introduced. The system resulted in equal error rates at or below 10% for each of the tasks studied.
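The compensation step of eigen-channel decomposition subtracts an estimated channel offset from the features. This sketch shows only that subtraction; estimating the per-utterance channel factor y (normally done from the utterance's GMM statistics) is omitted, and all values are toy numbers:

```python
def eigen_channel_compensate(frame, U, y):
    """Subtract the channel offset U @ y from one feature frame.
    U has one row per feature dimension and one column per channel
    eigen-direction; y is the channel factor for this utterance."""
    offset = [sum(u_rc * y_c for u_rc, y_c in zip(row, y)) for row in U]
    return [f - o for f, o in zip(frame, offset)]

# Toy 2-dimensional features, one channel eigen-direction
U = [[1.0], [0.5]]   # column spans the channel subspace
y = [0.2]            # hypothetical channel factor for this utterance
comp = eigen_channel_compensate([1.0, 1.0], U, y)
```

Unlike a fixed projection, the subtracted offset varies per utterance because y is re-estimated for each one, which is what lets the method track session-to-session channel variability.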

The MITLL NIST LRE 2007 language recognition system

Summary

This paper presents a description of the MIT Lincoln Laboratory language recognition system submitted to the NIST 2007 Language Recognition Evaluation. This system consists of a fusion of four core recognizers, two based on tokenization and two based on spectral similarity. Results for NIST's 14-language detection task are presented for both the closed-set and open-set tasks and for the 30, 10, and 3 second durations. On the 30 second 14-language closed-set detection task, the system achieves a 1% equal error rate.
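Score-level fusion of several core recognizers is, at its simplest, a weighted sum. A sketch with hypothetical scores and weights (real systems train the fusion weights, e.g. by logistic regression, on a development set):

```python
def fuse(scores, weights, bias=0.0):
    """Linear score-level fusion of multiple core recognizers,
    as in the four-recognizer combination described."""
    return bias + sum(w * s for w, s in zip(weights, scores))

# Scores from four hypothetical core recognizers for one trial
scores = [1.2, -0.3, 0.8, 0.5]
weights = [0.4, 0.2, 0.3, 0.1]
fused = fuse(scores, weights)
```

The fused score is then thresholded for the detection decision; calibrating that threshold per duration condition (30, 10, 3 seconds) is part of what an evaluation submission has to get right.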

Two protocols comparing human and machine phonetic discrimination performance in conversational speech

Published in:
INTERSPEECH 2008, 22-26 September 2008, pp. 1630-1633.

Summary

This paper describes two experimental protocols for direct comparison of human and machine phonetic discrimination performance in continuous speech. These protocols attempt to isolate phonetic discrimination while controlling for language and segmentation biases. Results of two human experiments are described, including comparisons with automatic phonetic recognition baselines. Our experiments suggest that in conversational telephone speech, human performance on these tasks exceeds that of machines by 15%. Furthermore, in a related language model control experiment, human subjects were better able to correctly predict words in conversational speech by 45%.

Proficiency testing for imaging and audio enhancement: guidelines for evaluation

Published in:
Int. Assoc. of Forensic Sciences, IAFS, 21-26 July 2008.

Summary

Proficiency tests in the forensic sciences are vital in the accreditation and quality assurance process. Most commercially available proficiency testing is available for examiners in the traditional forensic disciplines, such as latent prints, drug analysis, DNA, questioned documents, etc. Each of these disciplines is identification based. There are other forensic disciplines, however, where the output of the examination is not an identification of a person or substance. Two such disciplines are audio enhancement and video/image enhancement.

Retrieval and browsing of spoken content

Published in:
IEEE Signal Process. Mag., Vol. 25, No. 3, May 2008, pp. 39-49.

Summary

Ever-increasing computing power and connectivity bandwidth, together with falling storage costs, are resulting in an overwhelming amount of data of various types being produced, exchanged, and stored. Consequently, information search and retrieval has emerged as a key application area. Text-based search is the most active area, with applications that range from Web and local network search to searching for personal information residing on one's own hard-drive. Speech search has received less attention perhaps because large collections of spoken material have previously not been available. However, with cheaper storage and increased broadband access, there has been a subsequent increase in the availability of online spoken audio content such as news broadcasts, podcasts, and academic lectures. A variety of personal and commercial uses also exist. As data availability increases, the lack of adequate technology for processing spoken documents becomes the limiting factor to large-scale access to spoken content. In this article, we strive to discuss the technical issues involved in the development of information retrieval systems for spoken audio documents, concentrating on the issue of handling the errorful or incomplete output provided by ASR systems. We focus on the usage case where a user enters search terms into a search engine and is returned a collection of spoken document hits.
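The usage case the article describes -- query terms in, a ranked list of spoken documents out -- rests on an inverted index over recognizer output. This minimal sketch indexes one-best ASR transcripts only; the document names and text are invented, and real systems index lattices with word confidences precisely to cope with the errorful ASR output the article discusses:

```python
def build_index(documents):
    """Inverted index mapping each term to the set of spoken-document
    IDs whose (1-best) transcript contains it."""
    index = {}
    for doc_id, text in documents.items():
        for term in text.split():
            index.setdefault(term, set()).add(doc_id)
    return index

def search(index, query):
    """Return the documents containing every query term."""
    sets = [index.get(t, set()) for t in query.split()]
    return set.intersection(*sets) if sets else set()

docs = {
    "news_0413": "the senate voted on the budget",
    "lecture_02": "gradient descent on the loss surface",
}
idx = build_index(docs)
hits = search(idx, "the budget")
```

When a word is misrecognized it simply never enters the index, which is why lattice-based indexing and confidence scores matter for recall in spoken-document retrieval.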