Speech Enhancement via Attention Masking Network (SEAMNET)

The system suppresses the noise and reverberation that degrade automated speech and speaker recognition systems, thereby improving speech intelligibility for mobile communication devices, "smart" virtual assistants, and assistive speech technologies.


SEAMNET is a neural network–based, end-to-end, single-channel speech enhancement system that could help improve speech intelligibility for mobile communication devices, "smart" virtual assistants, and assistive speech technologies. It is designed to suppress both the noise and reverberation that can significantly degrade the performance of typical automated speech and speaker recognition systems.

Problem
Speech signals are typically captured in the presence of distortions such as reverberation and/or additive noise. For human listeners, these distortions result in increased listening effort and reduced intelligibility. For automated applications, such as speech and speaker recognition, distortions decrease system performance.

Solution
SEAMNET software reduces the effects of reverberation and noise on speech signals. It takes as input a noisy and/or reverberant speech waveform and outputs an enhanced version of the signal, processed to improve perceptual quality for human listening. SEAMNET leverages a deep neural network to predict the underlying clean speech signal from a noisy observed version. It suppresses interfering sources in the input signal via a multiplicative mask in a learned embedding space. The enhancement and autoencoder capabilities within the overall network allow the user to dynamically control the trade-off between noise suppression and speech quality.
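The masking idea described above can be illustrated with a small sketch. This is not the SEAMNET implementation: the weights are random rather than trained, the frames do not overlap, and every name here (make_params, enhance, the suppression parameter, FRAME, EMBED) is hypothetical. In particular, blending the estimated mask with an all-ones mask is only one assumed way that a suppression control could move between an autoencoder-like output and a fully masked output.

```python
import numpy as np

FRAME = 64    # samples per frame (hypothetical value)
EMBED = 128   # size of the embedding space (hypothetical value)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def make_params(seed=0):
    """Random stand-ins for weights that a real system would learn."""
    rng = np.random.default_rng(seed)
    enc = rng.standard_normal((FRAME, EMBED)) / np.sqrt(FRAME)
    return {
        "enc": enc,                   # encoder: frame -> embedding
        "dec": np.linalg.pinv(enc),   # decoder chosen so that enc @ dec = I
        "mask": rng.standard_normal((EMBED, EMBED)) / np.sqrt(EMBED),
    }


def make_noisy_reverberant(n=1024, seed=1):
    """Toy input: a tone convolved with a decaying impulse response, plus noise."""
    rng = np.random.default_rng(seed)
    clean = np.sin(2 * np.pi * 220 * np.arange(n) / 16000.0)
    rir = np.exp(-np.arange(200) / 40.0) * rng.standard_normal(200)
    rir[0] = 1.0  # direct path
    return np.convolve(clean, rir)[:n] + 0.1 * rng.standard_normal(n)


def enhance(waveform, params, suppression=1.0):
    """Apply a multiplicative mask in the embedding space.

    suppression=0.0 uses an all-ones mask (plain encode/decode);
    suppression=1.0 applies the estimated mask in full.
    This blending rule is an assumption made for illustration.
    """
    if not 0.0 <= suppression <= 1.0:
        raise ValueError("suppression must lie in [0, 1]")
    n_frames = len(waveform) // FRAME
    frames = waveform[: n_frames * FRAME].reshape(n_frames, FRAME)
    z = frames @ params["enc"]            # embed each frame
    mask = sigmoid(z @ params["mask"])    # mask values lie in (0, 1)
    blended = (1.0 - suppression) + suppression * mask
    out = (blended * z) @ params["dec"]   # mask, then decode to samples
    return out.reshape(-1)


params = make_params()
noisy = make_noisy_reverberant()
passthrough = enhance(noisy, params, suppression=0.0)
masked = enhance(noisy, params, suppression=1.0)
```

With random weights the masked output is not actually cleaner than the input; the sketch only shows where the mask sits between encoder and decoder and how a single control value could scale its effect.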

System testing has shown SEAMNET to outperform state-of-the-art competitors, both in terms of objective speech quality metrics and subjective listening tests. During the listening tests, human subjects found the dynamic control particularly beneficial in reducing the disturbing artifacts they perceived in the output signal.

Benefits

Allows users to control the level of suppression applied during speech enhancement
Provides joint noise and reverberation suppression; speech and speaker recognition systems in current use dampen background noise but not the reverberation that is inherent in all speech signals
Computational efficiency allows SEAMNET to run even on systems with limited processing speed

Potential Use Cases

Automated speech recognition
Mobile communications

Additional Resources

U.S. Patent 11,227,586

B.J. Borgström and M.S. Brandstein, "Speech Enhancement via Attention Masking Network (SEAMNET): An End-to-End System for Joint Suppression of Noise and Reverberation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 515–526, 2021.