Publications
Missing feature theory with soft spectral subtraction for speaker verification
Summary
This paper considers the problem of training/testing mismatch in the context of speaker verification and, in particular, explores the application of missing feature theory in the case of additive white Gaussian noise corruption in testing. Missing feature theory allows for corrupted features to be removed from scoring, the initial step...
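Removing corrupted features from scoring is often done by marginalization. With a diagonal-covariance Gaussian mixture model, the unreliable dimensions are simply left out of each component's density. The Python sketch below shows that scoring rule along with a simple local-SNR reliability mask. This summary does not say whether the paper uses marginalization or this particular mask, so treat both as assumptions. `reliable_mask`, `marginal_log_likelihood`, and the 0 dB threshold are hypothetical names and values chosen for illustration.

```python
import numpy as np

def reliable_mask(noisy_power, noise_power, snr_threshold_db=0.0):
    """Hypothetical hard mask: a time-frequency cell counts as reliable when its
    estimated local SNR exceeds snr_threshold_db (an assumed criterion)."""
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    speech_est = np.maximum(noisy_power - noise_power, 1e-12)  # avoid log(0)
    local_snr_db = 10.0 * np.log10(speech_est / noise_power)
    return local_snr_db > snr_threshold_db

def marginal_log_likelihood(x, mask, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM,
    marginalizing out (ignoring) every dimension where mask is False."""
    x = np.asarray(x, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    x_r = x[mask]
    mu_r = means[:, mask]        # shape: (components, reliable dims)
    var_r = variances[:, mask]
    # Per-component Gaussian log-density over the reliable dimensions only.
    log_comp = -0.5 * np.sum(np.log(2 * np.pi * var_r) + (x_r - mu_r) ** 2 / var_r, axis=1)
    log_weighted = np.log(weights) + log_comp
    m = np.max(log_weighted)
    return m + np.log(np.sum(np.exp(log_weighted - m)))  # stable log-sum-exp
```

Marginalization in this form is exact only because the covariances are diagonal. With full covariances, dropping dimensions would mean taking the corresponding sub-blocks of each covariance matrix.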
An overview of automatic speaker diarization systems
Summary
Audio diarization is the process of annotating an input audio channel with information that attributes (possibly overlapping) temporal regions of signal energy to their specific sources. These sources can include particular speakers, music, background noise sources, and other signal source/channel characteristics. Diarization can be used for helping speech recognition, facilitating...
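A diarization output of this kind amounts to a list of labeled time intervals that may overlap. The minimal Python sketch below models that representation and finds regions where two different sources are active at once. The `Segment` class, its field names, and the source labels are illustrative assumptions, not a format defined in the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    start: float  # seconds
    end: float    # seconds
    source: str   # e.g. a speaker ID, "music", or "noise" (illustrative labels)

def overlapping_pairs(segments):
    """Return pairs of segments from different sources whose time spans overlap."""
    ordered = sorted(segments, key=lambda s: s.start)
    pairs = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b.start >= a.end:
                break  # every later segment starts even later, so none overlaps a
            if a.source != b.source:
                pairs.append((a, b))
    return pairs
```

Sorting by start time lets the inner loop stop at the first segment that begins after `a` ends, instead of comparing every pair.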
Synthesis, analysis, and pitch modification of the breathy vowel
Summary
Breathiness is an aspect of voice quality that is difficult to analyze and synthesize, especially since its periodic and noise components are typically overlapping in frequency. The decomposition and manipulation of these two components is of importance in a variety of speech application areas such as text-to-speech synthesis, speech encoding...
A comparison of soft and hard spectral subtraction for speaker verification
Summary
An important concern in speaker recognition is the performance degradation that occurs when speaker models trained with speech from one type of channel are subsequently used to score speech from another type of channel, known as channel mismatch. This paper investigates the relative performance of two different spectral subtraction methods...
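To make the hard/soft distinction concrete, the Python sketch below implements classic power spectral subtraction with half-wave rectification. The "hard" behavior is the abrupt clamp at a floor. The summary does not give the paper's soft rule, so the soft variant here is a hypothetical stand-in: it replaces the clamp `max(d, 0)` with a smooth softplus whose transition width scales with the noise estimate. The function names and the `softness` parameter are illustrative assumptions.

```python
import numpy as np

def hard_spectral_subtraction(noisy_power, noise_power, floor=0.0):
    """Power spectral subtraction with half-wave rectification: bins where the
    noise estimate exceeds the observation are clamped to floor * noisy power."""
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    return np.maximum(noisy_power - noise_power, floor * noisy_power)

def soft_spectral_subtraction(noisy_power, noise_power, softness=0.5):
    """Hypothetical soft variant: a softplus replaces the hard max(d, 0), so the
    output shrinks smoothly toward zero instead of switching off abruptly."""
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    d = noisy_power - noise_power
    tau = softness * noise_power  # transition width scales with the noise level
    # tau * log(1 + exp(d / tau)), computed stably; tends to max(d, 0) as tau -> 0.
    return tau * np.logaddexp(0.0, d / tau)
```

The soft output is never below the hard one with a zero floor, and it converges to it as `softness` approaches zero. A smooth transition is one common way to reduce the isolated spectral peaks that hard rectification can leave behind.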
Automated lip-reading for improved speech intelligibility
Summary
Various psycho-acoustical experiments have concluded that visual features strongly affect the perception of speech. This contribution is most pronounced in noisy environments where the intelligibility of audio-only speech is quickly degraded. An exploration of the effectiveness of extracted visual features such as lip height and width for improving speech intelligibility...
Exploiting nonacoustic sensors for speech enhancement
Summary
Nonacoustic sensors such as the general electromagnetic motion sensor (GEMS), the physiological microphone (P-mic), and the electroglottograph (EGG) offer multimodal approaches to speech processing and speaker and speech recognition. These sensors provide measurements of functions of the glottal excitation and, more generally, of the vocal tract articulator movements that are...
Multimodal speaker authentication using nonacoustic sensors
Summary
Many nonacoustic sensors are now available to augment user authentication. Devices such as the GEMS (glottal electromagnetic micro-power sensor), the EGG (electroglottograph), and the P-mic (physiological mic) all have distinct methods of measuring physical processes associated with speech production. A potentially exciting aspect of the application of these sensors is...
2-D processing of speech with application to pitch estimation
Summary
In this paper, we introduce a new approach to two-dimensional (2-D) processing of the one-dimensional (1-D) speech signal in the time-frequency plane. Specifically, we obtain the short-space 2-D Fourier transform magnitude of a narrowband spectrogram of the signal and show that this 2-D transformation maps harmonically-related signal components to a...
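In a narrowband spectrogram of a stationary voiced sound, the harmonics appear as parallel bands spaced f0 apart along the frequency axis. The 2-D Fourier transform magnitude of a local time-frequency patch should therefore peak at a frequency-dual index inversely related to that spacing. The Python sketch below takes one rectangular patch, locates that peak, and converts it to a pitch estimate. It is a simplified illustration under assumed parameters (window length, hop, FFT size, patch size). It does not model the short-space windowing or the handling of changing pitch described in the paper, and `narrowband_spectrogram` and `pitch_from_2d_transform` are hypothetical helper names.

```python
import numpy as np

def narrowband_spectrogram(x, win_len=512, hop=64, nfft=1024):
    """Magnitude STFT with a long (narrowband) Hann window; rows are time frames."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n=nfft, axis=1))

def pitch_from_2d_transform(spec, fs, nfft, t0=0, n_t=32, f_lo=0, n_f=256, min_k=2):
    """Estimate f0 from one time-frequency patch of a narrowband spectrogram.
    Harmonic bands repeat every f0 / bin_hz bins, so along the frequency-dual
    axis the 2-D FFT magnitude should peak near k_f = n_f / (f0 / bin_hz)."""
    patch = spec[t0 : t0 + n_t, f_lo : f_lo + n_f]
    patch = patch - patch.mean()  # suppress the DC term
    mag = np.abs(np.fft.fft2(patch))
    # Real input gives conjugate symmetry, so search only min_k <= k_f <= n_f / 2;
    # very low k_f mostly reflects the smooth spectral envelope.
    half = mag[:, min_k : n_f // 2 + 1]
    _, k_f = np.unravel_index(np.argmax(half), half.shape)
    k_f += min_k
    bin_hz = fs / nfft
    return (n_f / k_f) * bin_hz

# Usage: a synthetic 200 Hz harmonic complex sampled at 8 kHz.
fs = 8000
t = np.arange(int(0.5 * fs)) / fs
x = sum(np.cos(2 * np.pi * 200 * n * t) for n in range(1, 20))
f0_est = pitch_from_2d_transform(narrowband_spectrogram(x), fs=fs, nfft=1024)
```

With these assumed settings the bin spacing is 7.8125 Hz, so 200 Hz harmonics repeat every 25.6 bins and complete 10 cycles across the 256-bin patch. The estimate's resolution is therefore limited by the integer index `k_f`, which a real implementation would refine.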
Speech enhancement based on auditory spectral change
Summary
In this paper, an adaptive approach to the enhancement of speech signals is developed based on auditory spectral change. The algorithm is motivated by the sensitivity of aural biologic systems to signal dynamics, by evidence that noise is aurally masked by rapid changes in a signal, and by analogies to these...
'Perfect reconstruction' time-scaling filterbanks
Summary
A filterbank-based method of time-scale modification is analyzed for elemental signals including clicks, sines, and AM-FM sines. It is shown that with the use of some basic properties of linear systems, as well as FM-to-AM filter transduction, "perfect reconstruction" time-scaling filterbanks can be constructed for these elemental signal classes under...