Publications
Preserving the character of perturbations in scaled pitch contours
Summary
The global and fine dynamic components of a pitch contour in voice production, as in the speaking and singing voice, are important for both the meaning and character of an utterance. In speech, for example, slow pitch inflections, rapid pitch accents, and irregular regions together make up the pitch contour. In...
Sinewave parameter estimation using the fast Fan-Chirp Transform
Summary
Sinewave analysis/synthesis has long been an important tool for audio analysis, modification and synthesis [1]. The recently introduced Fan-Chirp Transform (FChT) [2,3] has been shown to improve the fidelity of sinewave parameter estimates for a harmonic audio signal with rapid frequency modulation [4]. A fast version of the FChT [3]...
Language, dialect, and speaker recognition using Gaussian mixture models on the cell processor
Summary
Automatic recognition systems are commonly used in speech processing to classify observed utterances by the speaker's identity, dialect, and language. These problems often require high processing throughput, especially in applications involving multiple concurrent incoming speech streams, such as in datacenter-level processing. Recent advances in processor technology allow multiple processors to...
Spectral representations of nonmodal phonation
Summary
Regions of nonmodal phonation, which exhibit deviations from uniform glottal-pulse periods and amplitudes, occur often in speech and convey information about linguistic content, speaker identity, and vocal health. Some aspects of these deviations are random, including small perturbations, known as jitter and shimmer, as well as more significant aperiodicities. Other...
Analysis of nonmodal phonation using minimum entropy deconvolution
Summary
Nonmodal phonation occurs when glottal pulses exhibit nonuniform pulse-to-pulse characteristics such as irregular spacings, amplitudes, and/or shapes. The analysis of regions of such nonmodality has application to automatic speech, speaker, language, and dialect recognition. In this paper, we examine the usefulness of a technique called minimum-entropy deconvolution, or MED, for...
Automatic dysphonia recognition using biologically-inspired amplitude-modulation features
Summary
A dysphonia, or disorder of the mechanisms of phonation in the larynx, can create time-varying amplitude fluctuations in the voice. A model for band-dependent analysis of this amplitude modulation (AM) phenomenon in dysphonic speech is developed from a traditional communications engineering perspective. This perspective challenges current dysphonia analysis methods that...
Auditory signal processing as a basis for speaker recognition
Summary
In this paper, we exploit models of auditory signal processing at different levels along the auditory pathway for use in speaker recognition. A low-level nonlinear model, at the cochlea, provides accentuated signal dynamics, while a high-level model, at the inferior colliculus, provides frequency analysis of modulation components that reveals...