Summary
Speech communication is significantly more difficult in severe acoustic background noise environments, especially when low-rate speech coders are used. Non-acoustic sensors, such as radar sensors, vibrometers, and bone-conduction microphones, offer significant potential in these situations. We extend previous work on fixed waveform fusion from multiple sensors to an optimal dynamic waveform fusion algorithm that minimizes both additive noise and signal distortion in the estimated speech signal. We show that a minimum mean squared error (MMSE) waveform matching criterion results in a generalized multichannel Wiener filter, and that this filter will simultaneously perform waveform fusion, noise suppression, and crosschannel noise cancellation. Formal intelligibility and quality testing demonstrate significant improvement from this approach.