Summary
Nonmodal phonation occurs when glottal pulses exhibit nonuniform pulse-to-pulse characteristics such as irregular spacings, amplitudes, and/or shapes. The analysis of regions of such nonmodality has application to automatic speech, speaker, language, and dialect recognition. In this paper, we examine the usefulness of a technique called minimum-entropy deconvolution, or MED, for the analysis of pulse events in nonmodal speech. Our study presents evidence for both natural and synthetic speech that MED decomposes nonmodal phonation into a series of sharp pulses and a set of mixedphase impulse responses. We show that the estimated impulse responses are quantitatively similar to those in our synthesis model. A hybrid method incorporating aspects of both MED and linear prediction is also introduced. We show preliminary evidence that the hybrid method has benefit over MED alone for composite impulse-response estimation by being more robust to short-time windowing effects as well as a speech aspiration noise component.