Summary
In this paper we propose a multiresolution short-time analysis method for speech enhancement. It is well known that fixed resolution methods such as the traditional short-time Fourier transform do not generally match the time-frequency structure of the signal being analyzed resulting in poor estimates of the speech and noise spectra required for enhancement. This can lead to the reduction of quality in the enhanced signal through the introduction of artifacts such as musical noise. To counter these limitations, we propose an adaptive short-time analysis-synthesis scheme for speech enhancement in which the adaptation is based on a measure of local time-frequency concentration. Synthesis is made possible through a modified overlap-add procedure. Empirical results using voiced speech indicate a clear improvement over a fixed time-frequency resolution enhancement scheme both in terms of mean-square error and as indicated by informal listening tests.