Publications

MCE training techniques for topic identification of spoken audio documents

Published in:
IEEE Trans. Audio, Speech, Language Proc., Vol. 19, No. 8, November 2011, pp. 2451-2461.

Summary

In this paper, we discuss the use of minimum classification error (MCE) training as a means for improving traditional approaches to topic identification such as naive Bayes classifiers and support vector machines. A key element of our new MCE training techniques is their ability to efficiently apply jackknifing or leave-one-out training to yield improved models which generalize better to unseen data. Experiments were conducted on recorded human-human telephone conversations from the Fisher Corpus, with feature vector representations derived from word-based automatic speech recognition lattices. The new MCE training techniques yielded sizeable improvements in topic identification accuracy.
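
As a rough illustration of the MCE criterion, the sketch below trains a plain linear multiclass scorer (an assumed stand-in, not the paper's naive Bayes or SVM models) by gradient descent on a sigmoid-smoothed misclassification measure. The function names `mce_train`, `classify` and `scores_for`, and the `gamma`, `lr` and `epochs` values, are hypothetical choices for illustration only.

```python
import math

def scores_for(w, x):
    """Linear discriminant g_c(x) = w_c . x for every class c."""
    return [sum(wi * xi for wi, xi in zip(wc, x)) for wc in w]

def mce_train(data, num_classes, dim, gamma=2.0, lr=0.5, epochs=50):
    """Minimal MCE training of a linear multiclass scorer.

    data: list of (feature_vector, label) pairs. The linear scorer is an
    illustrative stand-in for the naive Bayes / SVM models in the paper.
    """
    w = [[0.0] * dim for _ in range(num_classes)]
    for _ in range(epochs):
        for x, y in data:
            g = scores_for(w, x)
            # Strongest competing (incorrect) class.
            rival = max((c for c in range(num_classes) if c != y), key=lambda c: g[c])
            # Misclassification measure: d > 0 means the sample is misclassified.
            d = g[rival] - g[y]
            # Sigmoid-smoothed 0/1 loss; clamp the exponent to avoid overflow.
            z = max(-60.0, min(60.0, gamma * d))
            loss = 1.0 / (1.0 + math.exp(-z))
            slope = gamma * loss * (1.0 - loss)  # derivative of the loss w.r.t. d
            # Gradient descent: raise the true-class score, lower the rival's.
            for i, xi in enumerate(x):
                w[y][i] += lr * slope * xi
                w[rival][i] -= lr * slope * xi
    return w

def classify(w, x):
    """Pick the class with the highest linear score."""
    g = scores_for(w, x)
    return max(range(len(g)), key=lambda c: g[c])
```

The sigmoid makes the error count differentiable, so samples near the decision boundary drive most of the updates. A leave-one-out variant would score each training sample with a model that excludes that sample; the paper's efficient way of doing this is not reproduced here.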

On-chip nonlinear digital compensation for RF receiver

Published in:
HPEC 2011: Conf. on High Performance Embedded Computing, 21-22 September 2011.

Summary

A system-on-chip (SOC) implementation is an attractive solution for size, weight and power (SWaP) restricted applications, such as mobile devices and UAVs. This is partly because the individual parts of the system can be designed for a specific application rather than for a broad range of applications, as commercial parts usually must be. Co-design of the analog hardware and digital processing further enhances the benefits of SOC implementations by allowing, for example, nonlinear digital equalization to extend the dynamic range of a given front-end component. This paper presents the implementation of nonlinear digital compensation for an active anti-aliasing filter that is part of a low-power homodyne receiver design. The RF front-end circuitry and the digital compensation will be integrated on the same chip. Co-design allows the front end to be designed with known dynamic range limitations that are later compensated by nonlinear equalization. It also allows nonlinear digital compensation architectures to be matched to specific circuits and dynamic range requirements, while still retaining some flexibility to deal with process variation, rather than relying on higher-power general-purpose designs.
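
To make the idea of compensating a known front-end nonlinearity concrete, here is a minimal sketch assuming a memoryless cubic distortion model `y = x + a3*x**3` with a hypothetical coefficient `a3 = -0.1`. The functions `front_end` and `compensate` are illustrative names; real anti-aliasing filters also have memory, and the paper's compensation architecture is not reproduced here.

```python
def front_end(x, a3=-0.1):
    """Hypothetical memoryless front-end model: unit gain plus a cubic
    (third-order) distortion term that compresses large samples."""
    return x + a3 * x ** 3

def compensate(y, a3=-0.1, iterations=30):
    """Digitally invert y = x + a3*x**3 by fixed-point iteration
    x <- y - a3*x**3. The iteration contracts when |3*a3*x**2| < 1,
    i.e. for moderate signal amplitudes."""
    x = y  # start from the distorted sample
    for _ in range(iterations):
        x = y - a3 * x ** 3
    return x

# Distort a few samples, then recover them.
samples = [i / 10 for i in range(-10, 11)]
recovered = [compensate(front_end(s)) for s in samples]
```

Because the distortion model is known at design time (the point of co-design), the compensation can be a small fixed computation rather than a general-purpose equalizer; a practical design would more likely use a memory polynomial fitted to the measured circuit.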

A new perspective on GMM subspace compensation based on PPCA and Wiener filtering

Published in:
INTERSPEECH 2011, 27-31 August 2011, pp. 145-148.

Summary

We present a new perspective on the subspace compensation techniques that currently dominate the field of speaker recognition using Gaussian Mixture Models (GMMs). Rather than the traditional factor analysis approach, we use Gaussian modeling in the sufficient-statistic supervector space, combined with within-class and shared across-class covariance matrices modeled by Probabilistic Principal Component Analysis (PPCA), to derive a family of training and testing algorithms. Key to this analysis is the use of two noise terms for each speech cut: a random channel offset and a length-dependent observation noise. From the Wiener filtering perspective, optimal training and testing formulas for Joint Factor Analysis (JFA) are simple to derive. We also show that an alternative form of Wiener filtering results in the i-vector approach, thus tying together these two disparate techniques.
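
The role of the two noise terms can be seen in a scalar toy version of the Wiener (MMSE) estimate. Assuming an observed per-cut mean `y_bar = s + c + e` with independent zero-mean Gaussian speaker term `s`, channel offset `c`, and observation noise `e` whose variance shrinks as `1/n_frames`, the estimate of `s` is the observation scaled by a gain. This one-dimensional model and the name `wiener_estimate` are simplifying assumptions, not the paper's supervector formulation.

```python
def wiener_estimate(y_bar, n_frames, var_speaker, var_channel, var_obs):
    """MMSE (Wiener) estimate of a scalar speaker offset s from y_bar = s + c + e.

    Illustrative assumptions (all terms independent, zero-mean Gaussian):
      s ~ N(0, var_speaker)           speaker term to be estimated
      c ~ N(0, var_channel)           random channel offset, independent of length
      e ~ N(0, var_obs / n_frames)    observation noise that shrinks with cut length
    """
    noise = var_channel + var_obs / n_frames
    gain = var_speaker / (var_speaker + noise)
    return gain * y_bar
```

For short cuts the length-dependent term dominates and the estimate is shrunk strongly toward zero; for long cuts the gain stops growing at `var_speaker / (var_speaker + var_channel)`, because the channel offset does not average away with more frames.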

Automatic detection of depression in speech using Gaussian mixture modeling with factor analysis

Summary

Of increasing importance in civilian and military populations is the recognition of Major Depressive Disorder at its earliest stages and intervention before the onset of severe symptoms. Toward the goal of more effective monitoring of depression severity, we investigate automatic classifiers of depression state that have the important property of mitigating nuisances due to data variability, such as speaker and channel effects, that are unrelated to levels of depression. To assess our measures, we use a 35-speaker free-response speech database of subjects treated for depression over a six-week period, along with standard clinical HAMD depression ratings. Preliminary experiments indicate that by mitigating nuisances, and thus focusing on depression severity as a class, we can significantly improve classification accuracy over baseline classifiers based on Gaussian mixture models.
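
One simple way to picture "mitigating nuisances" in a GMM supervector space is to project out a subspace that captures speaker and channel variability. The sketch below does this with an assumed orthonormal nuisance basis; it is a nuisance-projection simplification, not the factor analysis model used in the paper, and `remove_nuisance` is a hypothetical name.

```python
def remove_nuisance(supervector, nuisance_basis):
    """Project a supervector onto the orthogonal complement of a nuisance subspace.

    nuisance_basis: list of orthonormal vectors (assumed to have been estimated
    elsewhere, e.g. from speaker/channel variability in training data).
    """
    out = list(supervector)
    for u in nuisance_basis:
        # Remove the component of the vector along this nuisance direction.
        coeff = sum(ui * oi for ui, oi in zip(u, out))
        out = [oi - coeff * ui for oi, ui in zip(out, u)]
    return out

cleaned = remove_nuisance([3.0, 4.0, 5.0], [[0.6, 0.8, 0.0]])
```

With an orthonormal basis, removing directions one at a time gives the same result as subtracting the full projection at once, so whatever variation remains is what the depression-state classifier sees.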

Sinewave representations of nonmodality

Summary

Regions of nonmodal phonation, which exhibit deviations from uniform glottal-pulse periods and amplitudes, occur often and convey information about speaker- and linguistic-dependent factors. Such waveforms pose challenges for speech modeling, analysis/synthesis, and processing. In this paper, we investigate the representation of nonmodal pulse trains as a sum of harmonically related sinewaves with time-varying amplitudes, phases, and frequencies. We show that the sinewave representation of any impulsive signal is not unique, and conversely that frame-based measurements of the underlying sinewave representation can yield different impulse trains. Finally, we argue that this ambiguity may explain the addition, deletion, and movement of pulses in sinewave synthesis, and we illustrate it with time-scale modification of a nonmodal case of diplophonia.
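
A small numerical example of non-uniqueness: a pulse train with period `P` can be written as harmonics of `1/P`, or equally as harmonics of `1/(2P)` with every odd harmonic set to zero. Giving those odd harmonics a nonzero amplitude then yields alternating pulse heights, a crude diplophonia-like pattern. The sketch holds amplitudes and phases constant for simplicity (the paper's model lets them vary over time); `synthesize`, `P` and `K` are illustrative names and values.

```python
import math

def synthesize(amplitudes, period, num_samples, phases=None):
    """Sum of harmonically related cosines:
    s[n] = sum_h a_h * cos(2*pi*h*n/period + phi_h), for h = 1..len(amplitudes)."""
    phases = phases or [0.0] * len(amplitudes)
    return [
        sum(a * math.cos(2 * math.pi * (h + 1) * n / period + phi)
            for h, (a, phi) in enumerate(zip(amplitudes, phases)))
        for n in range(num_samples)
    ]

P, K = 20, 8
# Representation 1: K harmonics of the fundamental 1/P (a pulse every P samples).
train_a = synthesize([1.0] * K, P, 4 * P)
# Representation 2: 2K harmonics of 1/(2P), with the odd harmonics silenced.
train_b = synthesize([0.0 if (h + 1) % 2 else 1.0 for h in range(2 * K)], 2 * P, 4 * P)
# Nonzero odd harmonics: pulse heights now alternate between periods.
train_c = synthesize([0.3 if (h + 1) % 2 else 1.0 for h in range(2 * K)], 2 * P, 4 * P)
```

`train_a` and `train_b` are the same signal described by two different sinewave sets, so a frame-based estimate could land on either one, and small changes to the odd-harmonic amplitudes change which pulses appear.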

Language recognition via i-vectors and dimensionality reduction

Published in:
INTERSPEECH 2011, 27-31 August 2011, pp. 857-860.

Summary

This paper presents a new language identification system based on the total variability approach previously developed for speaker identification. Various techniques are employed to extract the most salient features in the lower-dimensional i-vector space, and the resulting system achieves excellent performance on the 2009 LRE evaluation set without the need for any post-processing or backend techniques. Additional performance gains are observed when the system is combined with other acoustic systems.
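
To show what scoring "without any backend" can look like, here is a minimal sketch that length-normalizes i-vectors and picks the language whose mean i-vector has the highest cosine similarity with the test vector. It assumes the i-vectors and per-language means are already extracted and dimension-reduced; the function names and the toy means are hypothetical, and the paper's specific feature-extraction techniques are not shown.

```python
import math

def length_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_score(a, b):
    """Cosine similarity between two i-vectors."""
    return sum(x * y for x, y in zip(length_normalize(a), length_normalize(b)))

def identify_language(test_ivector, language_means):
    """Return the language whose (assumed, precomputed) mean i-vector
    is closest in cosine similarity to the test i-vector."""
    return max(language_means, key=lambda lang: cosine_score(test_ivector, language_means[lang]))

# Toy three-dimensional means, for illustration only.
means = {"english": [1.0, 0.0, 0.0], "spanish": [0.0, 1.0, 0.0], "mandarin": [0.0, 0.0, 1.0]}
```

Cosine scoring ignores vector length, which in the total variability space tends to carry nuisance information such as utterance duration rather than language identity.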

Latent topic modeling for audio corpus summarization

Published in:
INTERSPEECH 2011, 27-31 August 2011, pp. 913-916.

Summary

This work presents techniques for automatically summarizing the topical content of an audio corpus. Probabilistic latent semantic analysis (PLSA) is used to learn a set of latent topics in an unsupervised fashion. These latent topics are ranked by their relative importance in the corpus, and a summary of each topic is generated from signature words that aptly describe the content of that topic. The paper focuses on techniques that make the resulting summarization high quality. An example summarization of conversational data from the Fisher corpus is presented and evaluated, demonstrating the effectiveness of our approach.
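
The PLSA step itself can be sketched with a few lines of expectation-maximization over a document-word count matrix. Ranking topics by the share of word tokens they explain, and taking signature words as the highest-probability words per topic, are assumptions made here for illustration; the paper's ranking and signature-word selection may differ. `plsa` and `signature_words` are hypothetical names.

```python
import random

def _normalized(row):
    total = sum(row)
    # Fall back to uniform if a row received no probability mass.
    return [x / total for x in row] if total > 0 else [1.0 / len(row)] * len(row)

def plsa(counts, num_topics, iterations=100, seed=0):
    """Fit PLSA by EM on counts[d][w] (count of word w in document d).

    Returns (p_w_z, p_z_d, topic_weight): p_w_z[z][w] = P(w|z),
    p_z_d[d][z] = P(z|d), and topic_weight[z] = share of all word tokens
    attributed to topic z (one possible importance measure, assumed here).
    """
    rng = random.Random(seed)
    num_docs, vocab = len(counts), len(counts[0])
    p_w_z = [_normalized([rng.random() for _ in range(vocab)]) for _ in range(num_topics)]
    p_z_d = [_normalized([rng.random() for _ in range(num_topics)]) for _ in range(num_docs)]
    for _ in range(iterations):
        exp_w_z = [[0.0] * vocab for _ in range(num_topics)]
        exp_z_d = [[0.0] * num_topics for _ in range(num_docs)]
        for d in range(num_docs):
            for w in range(vocab):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(w|z) * P(z|d).
                post = [p_w_z[z][w] * p_z_d[d][z] for z in range(num_topics)]
                total = sum(post)
                if total == 0:
                    continue
                for z in range(num_topics):
                    share = n * post[z] / total
                    exp_w_z[z][w] += share
                    exp_z_d[d][z] += share
        # M-step: renormalize expected counts into distributions.
        p_w_z = [_normalized(row) for row in exp_w_z]
        p_z_d = [_normalized(row) for row in exp_z_d]
    doc_len = [sum(row) for row in counts]
    n_total = sum(doc_len)
    topic_weight = [sum(doc_len[d] * p_z_d[d][z] for d in range(num_docs)) / n_total
                    for z in range(num_topics)]
    return p_w_z, p_z_d, topic_weight

def signature_words(p_w_z, words, topic, top_n=3):
    """Words with the highest P(w|z) for one topic."""
    ranked = sorted(range(len(words)), key=lambda w: p_w_z[topic][w], reverse=True)
    return [words[w] for w in ranked[:top_n]]
```

Sorting topics by `topic_weight` and printing `signature_words` for each gives a minimal corpus summary in the spirit of the abstract.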

Phonologically-based biomarkers for major depressive disorder

Summary

Of increasing importance in civilian and military populations is the recognition of major depressive disorder at its earliest stages and intervention before the onset of severe symptoms. Toward the goal of more effective monitoring of depression severity, we introduce vocal biomarkers that are derived automatically from phonologically-based measures of speech rate. To assess our measures, we use a 35-speaker free-response speech database of subjects treated for depression over a 6-week period. We find that dissecting average measures of speech rate into phone-specific characteristics and, in particular, combined phone-duration measures uncovers stronger relationships between speech rate and depression severity than global measures previously reported for a speech-rate biomarker. These results are supported both by correlations between our measures and depression severity and by classification of depression state using these vocal measures. Our approach provides a general framework for analyzing individual symptom categories through phonological units, and supports the premise that speaking rate can be an indicator of psychomotor retardation severity.
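
The correlation analysis described above can be sketched as follows: per-speaker mean durations are summed over a chosen set of phones, and the result is correlated with a severity score. The data layout (a dict from phone label to per-speaker durations) and the function names are assumptions for illustration, and the numbers in the usage lines are synthetic, not results from the study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def combined_duration(durations, phones):
    """Per-speaker sum of mean durations over a chosen set of phones."""
    num_speakers = len(next(iter(durations.values())))
    return [sum(durations[p][s] for p in phones) for s in range(num_speakers)]

def correlate_with_severity(durations, severity, phone_sets):
    """Map each phone set (a tuple of phone labels) to its Pearson r with severity."""
    return {ps: pearson(combined_duration(durations, ps), severity) for ps in phone_sets}

# Synthetic per-speaker mean durations (seconds) and severity scores.
durations = {"aa": [0.10, 0.12, 0.14, 0.16], "s": [0.09, 0.08, 0.10, 0.07]}
severity = [5, 10, 15, 20]
r = correlate_with_severity(durations, severity, [("aa",), ("s",), ("aa", "s")])
```

Comparing `r` across single phones and phone combinations mirrors the abstract's point that phone-specific and combined measures can reveal relationships that a single global speech-rate number hides.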

Eigenspace analysis for threat detection in social networks

Published in:
Int. Conf. on Information Fusion, 5 July 2011.

Summary

The problem of detecting a small, anomalous subgraph within a large background network is important and applicable to many fields. The non-Euclidean nature of graph data, however, complicates the application of classical detection theory in this context. A recent statistical framework for anomalous subgraph detection uses spectral properties of a graph's modularity matrix to determine the presence of an anomaly. In this paper, this detection framework and the related algorithms are applied to data focused on a specific application: detection of a threat subgraph embedded in a social network. The results presented are based on data generated to simulate threat activity among noisy interactions. The detectability of the threat subgraph and its separability from the noise are analyzed under a variety of background conditions in both static and dynamic scenarios.
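
The spectral framework starts from the modularity matrix `B = A - k k^T / (2m)`, where `A` is the adjacency matrix, `k` the degree vector and `m` the number of edges. The sketch below builds `B` and estimates its leading eigenpair with shifted power iteration; it is a minimal illustration only (the detection statistics used in the framework are not reproduced), and `modularity_matrix` and `leading_eigenpair` are hypothetical names.

```python
import math
import random

def modularity_matrix(adj):
    """B = A - k k^T / (2m) for a symmetric 0/1 adjacency matrix A with degrees k."""
    k = [sum(row) for row in adj]
    two_m = sum(k)
    n = len(adj)
    return [[adj[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]

def leading_eigenpair(mat, iterations=1000, seed=0):
    """Shifted power iteration for the largest (algebraic) eigenvalue of a
    symmetric matrix and a corresponding unit eigenvector."""
    n = len(mat)
    # Gershgorin bound: adding c*I makes every eigenvalue nonnegative.
    c = max(sum(abs(x) for x in row) for row in mat)
    rng = random.Random(seed)
    # Random start: the all-ones vector lies in the null space of B and would stall.
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        w = [sum(mat[i][j] * v[j] for j in range(n)) + c * v[i] for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the unshifted matrix.
    eigenvalue = sum(v[i] * sum(mat[i][j] * v[j] for j in range(n)) for i in range(n))
    return eigenvalue, v
```

Every row of `B` sums to zero, so a uniform vector carries no information; vertices with unusually large components in the leading eigenvectors are the natural candidates to inspect for an embedded, densely connected subgraph.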

Anomalous subgraph detection via sparse principal component analysis

Published in:
Proc. 2011 IEEE Statistical Signal Processing Workshop (SSP), 28-30 June 2011, pp. 485-488.

Summary

Network datasets have become ubiquitous in many fields of study in recent years. In this paper we investigate a problem with applicability to a wide variety of domains: detecting small, anomalous subgraphs in a background graph. We characterize the anomaly in a subgraph via the well-known notion of network modularity, and we show that the resulting optimization problem is very similar to a recently introduced technique in statistics called Sparse Principal Component Analysis (Sparse PCA), an extension of the classical PCA algorithm. The exact version of our problem formulation is a hard combinatorial optimization problem, so we consider a recently introduced semidefinite programming relaxation of the Sparse PCA problem. Results on simulated data indicate that the technique is very promising.
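
The combinatorial problem that the relaxation approximates can be made concrete on a tiny graph: choose the vertex subset `S` of a fixed size that maximizes `x^T B x`, where `x` is the 0/1 indicator of `S` and `B` is the modularity matrix. The exhaustive search below is only feasible for very small graphs, which is exactly why a relaxation is needed; it is not the semidefinite programming method of the paper, and `best_subgraph` is a hypothetical name.

```python
from itertools import combinations

def modularity_matrix(adj):
    """B = A - k k^T / (2m) for a symmetric 0/1 adjacency matrix A with degrees k."""
    k = [sum(row) for row in adj]
    two_m = sum(k)
    n = len(adj)
    return [[adj[i][j] - k[i] * k[j] / two_m for j in range(n)] for i in range(n)]

def best_subgraph(adj, size):
    """Exhaustively search all vertex subsets of the given size for the one
    maximizing x^T B x, with x the 0/1 indicator vector of the subset."""
    b = modularity_matrix(adj)
    best, best_score = None, float("-inf")
    for subset in combinations(range(len(adj)), size):
        score = sum(b[i][j] for i in subset for j in subset)
        if score > best_score:
            best, best_score = subset, score
    return best, best_score

# Toy example: a 12-node ring with a 4-clique embedded on vertices 0-3.
n = 12
adj = [[0] * n for _ in range(n)]
for i in range(n):
    adj[i][(i + 1) % n] = adj[(i + 1) % n][i] = 1
for i in range(4):
    for j in range(i + 1, 4):
        adj[i][j] = adj[j][i] = 1
```

The number of candidate subsets grows as n choose k, so even modest graphs rule out exhaustive search; Sparse PCA relaxations trade exactness for tractability on the same cardinality-constrained objective.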