Towards co-channel speaker separation by 2-D demodulation of spectrograms

October 18, 2009

Conference Paper

Author:

Tianyu Tom Wang

…

Thomas F. Quatieri

Published in:

WASPAA 2009, IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 18-21 October 2009, pp. 65-68.

R&D Area:

Cyber Security and Information Sciences

R&D Group:

Artificial Intelligence Technology and Systems


Summary

This paper explores a two-dimensional (2-D) processing approach for co-channel speaker separation of voiced speech. We analyze localized time-frequency regions of a narrowband spectrogram using 2-D Fourier transforms and propose a 2-D amplitude modulation model, based on pitch information, for single- and multi-speaker content in each region. The model maps harmonically related speech content to concentrated entities in a transformed 2-D space, which motivates 2-D demodulation of the spectrogram for analysis/synthesis and speaker separation. Using a priori pitch estimates of the individual speakers, we show through a quantitative evaluation (1) the utility of the model for representing the speech content of a single speaker and (2) its feasibility for speaker separation. For the separation task, we also illustrate the benefits of the model's representation of pitch dynamics relative to a sinusoidal-based separation system.
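The mapping from harmonic content to a concentrated point in the transformed 2-D space can be illustrated with a small sketch. This is not the authors' implementation: the synthetic signal, the window and patch sizes, and the helper names (`harmonic_signal`, `narrowband_spectrogram`, `dominant_patch_component`) are all assumptions chosen for illustration, and only NumPy is used. A constant-pitch harmonic signal is turned into a narrowband magnitude spectrogram, a localized time-frequency patch is extracted, and the strongest non-DC component of its 2-D Fourier transform is located.

```python
import numpy as np

def harmonic_signal(f0, fs, duration, n_harmonics):
    """Sum of equal-amplitude cosines at multiples of f0 (a toy stand-in for voiced speech)."""
    t = np.arange(int(fs * duration)) / fs
    return sum(np.cos(2 * np.pi * f0 * h * t) for h in range(1, n_harmonics + 1))

def narrowband_spectrogram(x, win_len, hop, n_fft):
    """Magnitude STFT with a Hamming window long enough to resolve individual harmonics."""
    window = np.hamming(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[s:s + win_len] * window for s in starts])
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))  # shape: (frames, bins)

def dominant_patch_component(patch):
    """Index (time axis, frequency axis) of the largest non-DC 2-D Fourier component."""
    patch = patch - patch.mean()                 # remove the DC term of the patch
    mag = np.abs(np.fft.fft2(patch))
    half = mag[:, : patch.shape[1] // 2 + 1]     # keep one half-plane (conjugate symmetry)
    return np.unravel_index(np.argmax(half), half.shape)

# Assumed illustration parameters, not values taken from the paper.
fs, f0, n_fft = 8000, 200.0, 512
x = harmonic_signal(f0, fs, duration=0.1, n_harmonics=19)
spec = narrowband_spectrogram(x, win_len=256, hop=8, n_fft=n_fft)

# Localized region: 32 frames by 64 frequency bins (500 Hz to 1500 Hz).
patch = spec[:32, 32:96]
time_idx, freq_idx = dominant_patch_component(patch)

# Cycles per patch along the frequency axis correspond to the harmonic spacing.
patch_bandwidth_hz = patch.shape[1] * fs / n_fft
f0_est = patch_bandwidth_hz / freq_idx
print(time_idx, freq_idx, f0_est)
```

For this constant-pitch toy signal, the dominant component is expected to lie on the frequency axis of the transformed patch, at a position set by the harmonic spacing. A changing pitch would tilt the harmonic lines in the patch and move the component off that axis, which is the kind of pitch-dynamics information the summary refers to.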
