Publications

Efficient reconstruction of block-sparse signals

Published in:
IEEE Statistical Signal Processing Workshop, 28-30 June 2011.

Summary

In many sparse reconstruction problems, M observations are used to estimate K components in an N dimensional basis, where N > M ≫ K. The exact basis vectors, however, are not known a priori and must be chosen from an M x N matrix. Such underdetermined problems can be solved using an l2 optimization with an l1 penalty on the sparsity of the solution. There are practical applications in which multiple measurements can be grouped together, so that K x P data must be estimated from M x P observations, where the l1 sparsity penalty is taken with respect to the vector formed using the l2 norms of the rows of the data matrix. In this paper we develop a computationally efficient block partitioned homotopy method for reconstructing K x P data from M x P observations using a grouped sparsity constraint, and compare its performance to other block reconstruction algorithms.
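
The grouped-sparsity problem described in this summary is commonly written as a mixed-norm least-squares objective. The form below is a sketch in assumed notation: the symbols A (the M x N matrix of candidate basis vectors), Y (the M x P observations), X (the N x P coefficient matrix with rows x_i), and the weight λ are labels chosen here for illustration and are not taken from the paper.

```latex
\hat{X} \;=\; \arg\min_{X \in \mathbb{R}^{N \times P}}
  \tfrac{1}{2}\,\lVert Y - A X \rVert_F^{2}
  \;+\; \lambda \sum_{i=1}^{N} \lVert x_i \rVert_2 ,
\qquad A \in \mathbb{R}^{M \times N},\;\; Y \in \mathbb{R}^{M \times P},\;\; \lambda > 0
```

The penalty is an l1 sum over the l2 norms of the rows of X, so entire rows are driven to zero together; the K rows that remain nonzero correspond to the K x P data to be recovered.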

Graph relational features for speaker recognition and mining

Published in:
Proc. 2011 IEEE Statistical Signal Processing Workshop (SSP), 28-30 June 2011, pp. 525-528.

Summary

Recent advances in the field of speaker recognition have resulted in highly efficient speaker comparison algorithms. The advent of these algorithms allows for leveraging a background set, consisting of a large number of unlabeled recordings, to improve recognition. In this work, a relational graph, where nodes represent utterances and links represent speaker similarity, is created from the background recordings in which the recordings of interest, train and test, are then embedded. Relational features computed from the embedding are then used to obtain a match score between the recordings of interest. We show the efficacy of these features in speaker verification and speaker mining tasks.

Matched filtering for subgraph detection in dynamic networks

Published in:
2011 IEEE Statistical Signal Processing Workshop (SSP), 28-30 June 2011, pp. 509-512.

Summary

Graphs are high-dimensional, non-Euclidean data, whose utility spans a wide variety of disciplines. While their non-Euclidean nature complicates the application of traditional signal processing paradigms, it is desirable to seek an analogous detection framework. In this paper we present a matched filtering method for graph sequences, extending to a dynamic setting a previous method for the detection of anomalously dense subgraphs in a large background. In simulation, we show that this temporal integration technique enables the detection of weak subgraph anomalies that are not detectable in the static case. We also demonstrate background/foreground separation using a real background graph based on a computer network.

An active filter achieving 43.6dBm OIP3

Published in:
IEEE Radio Frequency Integrated Circuits Symp., RFIC, 5-7 June 2011.

Summary

An active filter with a 50 Ω buffer suitable as an anti-alias filter to drive a highly linear ADC is implemented in 0.13 µm SiGe BiCMOS. This 6th-order Chebyshev filter has a 3 dB cutoff frequency of 28.3 MHz and achieves 36.5 dBm OIP3. Nonlinear digital equalization further improves OIP3 to 43.6 dBm. Measurements show 92 dB of rejection at the stopband and a gain of 49 dB. The measured in-band OIP3 of 43.6 dBm is 19 dB higher than previously published designs.

A time-warping framework for speech turbulence-noise component estimation during aperiodic phonation

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 22-27 May 2011, pp. 5404-5407.

Summary

The accurate estimation of turbulence noise affects many areas of speech processing including separate modification of the noise component, analysis of degree of speech aspiration for treating pathological voice, the automatic labeling of speech voicing, as well as speaker characterization and recognition. Previous work in the literature has provided methods by which such a high-quality noise component may be estimated in near-periodic speech, but it is known that these methods tend to leak aperiodic phonation (with even slight deviations from periodicity) into the noise-component estimate. In this paper, we improve upon existing algorithms in conditions of aperiodicity by introducing a time-warping based approach to speech noise-component estimation, demonstrating the results on both natural and synthetic speech examples.

Assessing the speaker recognition performance of naive listeners using Mechanical Turk

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 22-27 May 2011, pp. 5916-5919.

Summary

In this paper we attempt to quantify the ability of naive listeners to perform speaker recognition in the context of the NIST evaluation task. We describe our protocol: a series of listening experiments using large numbers of naive listeners (432) on Amazon's Mechanical Turk that attempts to measure the ability of the average human listener to perform speaker recognition. Our goal was to compare the performance of the average human listener to both forensic experts and state-of-the-art automatic systems. We show that naive listeners vary substantially in their performance, but that an aggregation of listener responses can achieve performance similar to that of expert forensic examiners.

Informative dialect recognition using context-dependent pronunciation modeling

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 22-27 May 2011, pp. 4396-4399.

Summary

We propose an informative dialect recognition system that learns phonetic transformation rules, and uses them to identify dialects. A hidden Markov model is used to align reference phones with dialect-specific pronunciations to characterize when and how often substitutions, insertions, and deletions occur. Decision tree clustering is used to find context-dependent phonetic rules. We ran recognition tasks on 4 Arabic dialects. Not only do the proposed systems perform well on their own, but when fused with baselines they improve performance by 21-36% relative. In addition, our proposed decision-tree system beats the baseline monophone system in recovering phonetic rules by 21% relative. Pronunciation rules learned by our proposed system quantify the occurrence frequency of known rules, and suggest rule candidates for further linguistic studies.

NAP for high level language identification

Published in:
ICASSP 2011, IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, 22-27 May 2011, pp. 4392-4395.

Summary

Varying channel conditions present a difficult problem for many speech technologies such as language identification (LID). Channel compensation techniques have been shown to significantly improve performance in LID for acoustic systems. For high-level token systems, nuisance attribute projection (NAP) has been shown to perform well in the context of speaker identification. In this work, we describe a novel approach to dealing with the high dimensional sparse NAP training problem as applied to a 4-gram phonotactic LID system run on the NIST 2009 Language Recognition Evaluation (LRE) task. We demonstrate performance gains on the Voice of America (VOA) portion of the 2009 LRE data.
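
For readers unfamiliar with nuisance attribute projection, its usual form is a linear projection that removes an estimated nuisance subspace from each feature vector. The sketch below assumes a matrix U with orthonormal columns spanning that subspace and a feature vector x; both symbols are introduced here for illustration, and the sparse high-dimensional training procedure proposed in the paper is not shown.

```latex
P \;=\; I - U U^{\top}, \qquad
\tilde{x} \;=\; P x \;=\; x - U\,(U^{\top} x), \qquad U^{\top} U = I
```

Computing the right-hand form avoids building P explicitly, which matters when x is a very high-dimensional, sparse vector of n-gram statistics.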

The MIT LL 2010 speaker recognition evaluation system: scalable language-independent speaker recognition

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 22-27 May 2011, pp. 5272-5275.

Summary

Research in the speaker recognition community has continued to address methods of mitigating variational nuisances. Telephone and auxiliary-microphone recorded speech emphasize the need for a robust way of dealing with unwanted variation. The design of the recent 2010 NIST Speaker Recognition Evaluation (SRE) reflects this research emphasis. In this paper, we present the MIT submission applied to the tasks of the 2010 NIST-SRE with two main goals: language-independent scalable modeling and robust nuisance mitigation. For modeling, exclusive use of inner product-based and cepstral systems produced a language-independent computationally-scalable system. For robustness, systems that captured spectral and prosodic information, modeled nuisance subspaces using multiple novel methods, and fused scores of multiple systems were implemented. The performance of the system is presented on a subset of the NIST SRE 2010 core tasks.

Towards reduced false-alarms using cohorts

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 22-27 May 2011, pp. 4512-4515.

Summary

The focus of the 2010 NIST Speaker Recognition Evaluation (SRE) was the low false alarm regime of the detection error trade-off (DET) curve. This paper presents several approaches that specifically target this issue. It begins by highlighting the main problem with operating in the low false alarm regime. Two sets of methods to tackle this issue are presented that require a large and diverse impostor set: the first set penalizes trials whose enrollment and test utterances are not nearest neighbors of each other, while the second takes an adaptive score normalization approach similar to TopNorm and ATNorm.
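
The summary does not define its adaptive score normalization, so the following is only a generic illustration of the idea: normalize a trial score by the statistics of the closest impostor (cohort) scores rather than the whole impostor set. The function name `top_n_normalize`, the parameter `n`, and the use of a population standard deviation are assumptions made for this sketch, not the paper's method or the exact definition of TopNorm or ATNorm.

```python
from statistics import mean, pstdev


def top_n_normalize(raw_score, cohort_scores, n=3):
    """Z-normalize raw_score using only the n highest cohort scores.

    cohort_scores are scores of the same model (or test utterance)
    against a set of impostor recordings. Hypothetical helper for
    illustration only.
    """
    if n < 2:
        raise ValueError("n must be at least 2 to estimate a spread")
    if len(cohort_scores) < n:
        raise ValueError("need at least n cohort scores")

    # Keep the n impostors that score highest, i.e. the closest cohort.
    top = sorted(cohort_scores, reverse=True)[:n]
    mu = mean(top)
    sigma = pstdev(top)  # population standard deviation of the top-n scores
    if sigma == 0:
        raise ValueError("top-n cohort scores have zero spread")

    return (raw_score - mu) / sigma


if __name__ == "__main__":
    impostor_scores = [1.0, 2.0, 3.0, 0.0, -1.0]
    print(top_n_normalize(5.0, impostor_scores, n=3))
```

Restricting the statistics to the top-scoring impostors makes the normalization adapt to each trial: a score must stand out against the most confusable impostors, which is the comparison that matters when false alarms are costly.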