Publications

Refine Results

(Filters Applied) Clear All

Estimating lower vocal tract features with closed-open phase spectral analyses

Published in:
INTERSPEECH 2015: 15th Annual Conf. of the Int. Speech Communication Assoc., 6-10 September 2015.

Summary

Previous studies have shown that, in addition to being speaker-dependent yet context-independent, lower vocal tract acoustics significantly impact the speech spectrum at mid-to-high frequencies (e.g 3-6kHz). The present work automatically estimates spectral features that exhibit acoustic properties of the lower vocal tract. Specifically aiming to capture the cyclicity property of the epilarynx tube, a novel multi-resolution approach to spectral analyses is presented that exploits significant differences between the closed and open phases of a glottal cycle. A prominent null linked to the piriform fossa is also estimated. Examples of the feature estimation on natural speech of the VOICES multi-speaker corpus illustrate that a salient spectral pattern indeed emerges between 3-6kHz across all speakers. Moreover, the observed pattern is consistent with that canonically shown for the lower vocal tract in previous works. Additionally, an instance of a speaker's formant (i.e. spectral peak around 3kHz that has been well-established as a characteristic of voice projection) is quantified here for the VOICES template speaker in relation to epilarynx acoustics. The corresponding peak is shown to be double the power on average compared to the other speakers (20 vs 10 dB).
READ LESS

Summary

Previous studies have shown that, in addition to being speaker-dependent yet context-independent, lower vocal tract acoustics significantly impact the speech spectrum at mid-to-high frequencies (e.g 3-6kHz). The present work automatically estimates spectral features that exhibit acoustic properties of the lower vocal tract. Specifically aiming to capture the cyclicity property of...

READ MORE

In situ microfluidic SERS assay for monitoring enzymatic breakdown of organophosphates

Summary

In this paper, we report on a method to probe the breakdown of the organophosphate (OP) simulants o, s-diethyl methyl phosphonothioate (OSDMP) and demeton S by the enzyme organophosphorous hydrolase (OPH) in a microfluidic device by surface enhanced Raman spectroscopy (SERS). SERS hotspots were formed on-demand inside the microfluidic device by laser-induced aggregation of injected Ag NPs suspensions. The Ag NP clusters, covering micron-sized areas, were formed within minutes using a conventional confocal Raman laser microscope. These Ag NP clusters were used to enhance the Raman spectra of the thiol products of OP breakdown in the microfluidic device: ethanethiol (EtSH) and (ethylsulfanyl) ethane-1-thiol (2-EET). When the OPH enzyme and its substrates OSDMP and demeton S were introduced, the thiolated breakdown products were generated, resulting in changes in the SERS spectra. With the ability to analyze reaction volumes as low as 20 nL, our approach demonstrates great potential for miniaturization of SERS analytical protocols.
READ LESS

Summary

In this paper, we report on a method to probe the breakdown of the organophosphate (OP) simulants o, s-diethyl methyl phosphonothioate (OSDMP) and demeton S by the enzyme organophosphorous hydrolase (OPH) in a microfluidic device by surface enhanced Raman spectroscopy (SERS). SERS hotspots were formed on-demand inside the microfluidic device...

READ MORE

Rapid sequence identification of potential pathogens using techniques from sparse linear algebra

Summary

The decreasing costs and increasing speed and accuracy of DNA sample collection, preparation, and sequencing has rapidly produced an enormous volume of genetic data. However, fast and accurate analysis of the samples remains a bottleneck. Here we present D4RAGenS, a genetic sequence identification algorithm that exhibits the Big Data handling and computational power of the Dynamic Distributed Dimensional Data Model (D4M). The method leverages linear algebra and statistical properties to increase computational performance while retaining accuracy by subsampling the data. Two run modes, Fast and Wise, yield speed and precision tradeoffs, with applications in biodefense and medical diagnostics. The D4RAGenS analysis algorithm is tested over several datasets, including three utilized for the Defense Threat Reduction Agency (DTRA) metagenomic algorithm contest.
READ LESS

Summary

The decreasing costs and increasing speed and accuracy of DNA sample collection, preparation, and sequencing has rapidly produced an enormous volume of genetic data. However, fast and accurate analysis of the samples remains a bottleneck. Here we present D4RAGenS, a genetic sequence identification algorithm that exhibits the Big Data handling...

READ MORE

Using a big data database to identify pathogens in protein data space [e-print]

Summary

Current metagenomic analysis algorithms require significant computing resources, can report excessive false positives (type I errors), may miss organisms (type II errors/false negatives), or scale poorly on large datasets. This paper explores using big data database technologies to characterize very large metagenomic DNA sequences in protein space, with the ultimate goal of rapid pathogen identification in patient samples. Our approach uses the abilities of a big data databases to hold large sparse associative array representations of genetic data to extract statistical patterns about the data that can be used in a variety of ways to improve identification algorithms.
READ LESS

Summary

Current metagenomic analysis algorithms require significant computing resources, can report excessive false positives (type I errors), may miss organisms (type II errors/false negatives), or scale poorly on large datasets. This paper explores using big data database technologies to characterize very large metagenomic DNA sequences in protein space, with the ultimate...

READ MORE

Genetic sequence matching using D4M big data approaches

Published in:
HPEC 2014: IEEE Conf. on High Performance Extreme Computing, 9-11 September 2014.

Summary

Recent technological advances in Next Generation Sequencing tools have led to increasing speeds of DNA sample collection, preparation, and sequencing. One instrument can produce over 600 Gb of genetic sequence data in a single run. This creates new opportunities to efficiently handle the increasing workload. We propose a new method of fast genetic sequence analysis using the Dynamic Distributed Dimensional Data Model (D4M) - an associative array environment for MATLAB developed at MIT Lincoln Laboratory. Based on mathematical and statistical properties, the method leverages big data techniques and the implementation of an Apache Acculumo database to accelerate computations one-hundred fold over other methods. Comparisons of the D4M method with the current gold-standard for sequence analysis, BLAST, show the two are comparable in the alignments they find. This paper will present an overview of the D4M genetic sequence algorithm and statistical comparisons with BLAST.
READ LESS

Summary

Recent technological advances in Next Generation Sequencing tools have led to increasing speeds of DNA sample collection, preparation, and sequencing. One instrument can produce over 600 Gb of genetic sequence data in a single run. This creates new opportunities to efficiently handle the increasing workload. We propose a new method...

READ MORE

Speech enhancement using sparse convolutive non-negative matrix factorization with basis adaptation

Published in:
INTERSPEECH 2012: 13th Annual Conf. of the Int. Speech Communication Assoc., 9-13 September 2012.

Summary

We introduce a framework for speech enhancement based on convolutive non-negative matrix factorization that leverages available speech data to enhance arbitrary noisy utterances with no a priori knowledge of the speakers or noise types present. Previous approaches have shown the utility of a sparse reconstruction of the speech-only components of an observed noisy utterance. We demonstrate that an underlying speech representation which, in addition to applying sparsity, also adapts to the noisy acoustics improves overall enhancement quality. The proposed system performs comparably to a traditional Wiener filtering approach, and the results suggest that the proposed framework is most useful in moderate- to low-SNR scenarios.
READ LESS

Summary

We introduce a framework for speech enhancement based on convolutive non-negative matrix factorization that leverages available speech data to enhance arbitrary noisy utterances with no a priori knowledge of the speakers or noise types present. Previous approaches have shown the utility of a sparse reconstruction of the speech-only components of...

READ MORE

Vocal-source biomarkers for depression - a link to psychomotor activity

Published in:
INTERSPEECH 2012: 13th Annual Conf. of the Int. Speech Communication Assoc., 9-13 September 2012.

Summary

A hypothesis in characterizing human depression is that change in the brain's basal ganglia results in a decline of motor coordination. Such a neuro-physiological change may therefore affect laryngeal control and dynamics. Under this hypothesis, toward the goal of objective monitoring of depression severity, we investigate vocal-source biomarkers for depression; specifically, source features that may relate to precision in motor control, including vocal-fold shimmer and jitter, degree of aspiration, fundamental frequency dynamics, and frequency-dependence of variability and velocity of energy. We use a 35-subject database collected by Mundt et al. in which subjects were treated over a six-week period, and investigate correlation of our features with clinical (HAMD), as well as self-reported (QIDS) Total subject assessment scores. To explicitly address the motor aspect of depression, we compute correlations with the Psychomotor Retardation component of clinical and self-reported Total assessments. For our longitudinal database, most correlations point to statistical relationships of our vocal-source biomarkers with psychomotor activity, as well as with depression severity.
READ LESS

Summary

A hypothesis in characterizing human depression is that change in the brain's basal ganglia results in a decline of motor coordination. Such a neuro-physiological change may therefore affect laryngeal control and dynamics. Under this hypothesis, toward the goal of objective monitoring of depression severity, we investigate vocal-source biomarkers for depression...

READ MORE

Exploring the impact of advanced front-end processing on NIST speaker recognition microphone tasks

Summary

The NIST speaker recognition evaluation (SRE) featured microphone data in the 2005-2010 evaluations. The preprocessing and use of this data has typically been performed with telephone bandwidth and quantization. Although this approach is viable, it ignores the richer properties of the microphone data-multiple channels, high-rate sampling, linear encoding, ambient noise properties, etc. In this paper, we explore alternate choices of preprocessing and examine their effects on speaker recognition performance. Specifically, we consider the effects of quantization, sampling rate, enhancement, and two-channel speech activity detection. Experiments on the NIST 2010 SRE interview microphone corpus demonstrate that performance can be dramatically improved with a different preprocessing chain.
READ LESS

Summary

The NIST speaker recognition evaluation (SRE) featured microphone data in the 2005-2010 evaluations. The preprocessing and use of this data has typically been performed with telephone bandwidth and quantization. Although this approach is viable, it ignores the richer properties of the microphone data-multiple channels, high-rate sampling, linear encoding, ambient noise...

READ MORE

Automatic detection of depression in speech using Gaussian mixture modeling with factor analysis

Summary

Of increasing importance in the civilian and military population is the recognition of Major Depressive Disorder at its earliest stages and intervention before the onset of severe symptoms. Toward the goal of more effective monitoring of depression severity, we investigate automatic classifiers of depression state, that have the important property of mitigating nuisances due to data variability, such as speaker and channel effects, unrelated to levels of depression. To assess our measures, we use a 35-speaker free-response speech database of subjects treated for depression over a six-week duration, along with standard clinical HAMD depression ratings. Preliminary experiments indicate that by mitigating nuisances, thus focusing on depression severity as a class, we can significantly improve classification accuracy over baseline Gaussian-mixture-model-based classifiers.
READ LESS

Summary

Of increasing importance in the civilian and military population is the recognition of Major Depressive Disorder at its earliest stages and intervention before the onset of severe symptoms. Toward the goal of more effective monitoring of depression severity, we investigate automatic classifiers of depression state, that have the important property...

READ MORE

Sinewave representations of nonmodality

Summary

Regions of nonmodal phonation, exhibiting deviations from uniform glottal-pulse periods and amplitudes, occur often and convey information about speaker- and linguistic-dependent factors. Such waveforms pose challenges for speech modeling, analysis/synthesis, and processing. In this paper, we investigate the representation of nonmodal pulse trains as a sum of harmonically-related sinewaves with time-varying amplitudes, phases, and frequencies. We show that a sinewave representation of any impulsive signal is not unique and also the converse, i.e., frame-based measurements of the underlying sinewave representation can yield different impulse trains. Finally, we argue how this ambiguity may explain addition, deletion, and movement of pulses in sinewave synthesis and a specific illustrative example of time-scale modification of a nonmodal case of diplophonia.
READ LESS

Summary

Regions of nonmodal phonation, exhibiting deviations from uniform glottal-pulse periods and amplitudes, occur often and convey information about speaker- and linguistic-dependent factors. Such waveforms pose challenges for speech modeling, analysis/synthesis, and processing. In this paper, we investigate the representation of nonmodal pulse trains as a sum of harmonically-related sinewaves with...

READ MORE