Publications

EEG alpha and pupil diameter reflect endogenous auditory attention switching and listening effort

Published in:
Eur. J. Neurosci., 2022, pp. 1-16.

Summary

Everyday environments often contain distracting competing talkers and background noise, requiring listeners to focus their attention on one acoustic source and reject others. During this auditory attention task, listeners may naturally interrupt their sustained attention and switch attended sources. The effort required to perform this attention switch has not been well studied in the context of competing continuous speech. In this work, we developed two variants of endogenous attention switching and a sustained attention control. We characterized these three experimental conditions in the context of decoding auditory attention, while simultaneously evaluating listening effort and neural markers of spatial-audio cues. A least-squares, electroencephalography (EEG)-based attention decoding algorithm was implemented across all conditions. It achieved accuracies of 69.4% and 64.0% when computed over non-overlapping 10-second and 5-second correlation windows, respectively. Both decoders showed smooth transitions in the attended-talker prediction through switches, with a lag of approximately half the analysis window size (e.g. the mean lag across the two switch conditions was 2.2 seconds when the 5-second correlation window was used). Expended listening effort, as measured by simultaneous EEG and pupillometry, was also a strong indicator of whether the listeners sustained attention or performed an endogenous attention switch (peak pupil diameter, p = 0.034; minimum parietal alpha power, p = 0.016). We additionally found evidence of talker spatial cues in the form of centrotemporal alpha power lateralization (p = 0.0428). These results suggest that listener effort and spatial cues may be promising features to pursue in a decoding context, in addition to speech-based features.
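As a rough illustration of what a least-squares decoder with non-overlapping correlation windows might look like, the sketch below fits a linear map from synthetic multichannel data to the attended envelope and then picks, in each window, the talker whose envelope correlates best with the reconstruction. Everything here is an assumption made for illustration: the synthetic signals, the 64 Hz sample rate, the absence of time lags and regularization, the in-sample fit, and the name `decode_attention` are not taken from the paper.

```python
import numpy as np

def decode_attention(reconstructed, env_a, env_b, fs, window_s):
    """Per non-overlapping window, return 0 if talker A correlates best, else 1."""
    n = int(round(fs * window_s))
    decisions = []
    for start in range(0, len(reconstructed) - n + 1, n):
        seg = slice(start, start + n)
        r_a = np.corrcoef(reconstructed[seg], env_a[seg])[0, 1]
        r_b = np.corrcoef(reconstructed[seg], env_b[seg])[0, 1]
        decisions.append(0 if r_a >= r_b else 1)
    return np.array(decisions)

# Synthetic stand-ins for two talker envelopes (hypothetical data).
rng = np.random.default_rng(0)
fs = 64                      # assumed envelope sample rate in Hz
n_samples = fs * 60          # one minute of data
env_a = rng.standard_normal(n_samples)
env_b = rng.standard_normal(n_samples)

# The simulated listener attends talker A, then switches to talker B halfway.
half = n_samples // 2
attended = np.concatenate([env_a[:half], env_b[half:]])

# Synthetic "EEG": each channel is a scaled copy of the attended envelope plus noise.
n_channels = 8
mixing = rng.standard_normal(n_channels)
eeg = np.outer(attended, mixing) + rng.standard_normal((n_samples, n_channels))

# Least-squares decoder: weights that best map EEG channels to the attended envelope.
# Fitted and evaluated on the same data here, purely to keep the sketch short.
weights, *_ = np.linalg.lstsq(eeg, attended, rcond=None)
reconstructed = eeg @ weights

decisions = decode_attention(reconstructed, env_a, env_b, fs, window_s=5.0)
print(decisions)
```

A shorter window yields more frequent decisions but each correlation is estimated from fewer samples, which is consistent with the lower accuracy the abstract reports for the 5-second window.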

Unsupervised Bayesian adaptation of PLDA for speaker verification

Published in:
Interspeech, 30 August - 3 September 2021.

Summary

This paper presents a Bayesian framework for unsupervised domain adaptation of Probabilistic Linear Discriminant Analysis (PLDA). By interpreting class labels as latent random variables, Variational Bayes (VB) is used to derive a maximum a posteriori (MAP) solution of the adapted PLDA model when labels are missing, referred to as VB-MAP. The VB solution iteratively infers class labels and updates PLDA hyperparameters, offering a systematic framework for dealing with unlabeled data. While presented as a general solution, this paper includes experimental results for domain adaptation in speaker verification. VB-MAP estimation is applied to the 2016 and 2018 NIST Speaker Recognition Evaluations (SREs), both of which included small and unlabeled in-domain data sets, and is shown to provide performance improvements over a variety of state-of-the-art domain adaptation methods. Additionally, VB-MAP estimation is used to train a fully unsupervised PLDA model, suffering only minor performance degradation relative to conventional supervised training, offering promise for training PLDA models when no relevant labeled data exists.

Speaker separation in realistic noise environments with applications to a cognitively-controlled hearing aid

Summary

Future wearable technology may provide for enhanced communication in noisy environments and for the ability to pick out a single talker of interest in a crowded room simply by the listener shifting their attentional focus. Such a system relies on two components: speaker separation and decoding the listener's attention to acoustic streams in the environment. To address the former, we present a system for joint speaker separation and noise suppression, referred to as the Binaural Enhancement via Attention Masking Network (BEAMNET). The BEAMNET system is an end-to-end neural network architecture based on self-attention. Binaural input waveforms are mapped to a joint embedding space via a learned encoder, and separate multiplicative masking mechanisms are included for noise suppression and speaker separation. Pairs of output binaural waveforms are then synthesized using learned decoders, each capturing a separated speaker while maintaining spatial cues. A key contribution of BEAMNET is that the architecture contains a separation path, an enhancement path, and an autoencoder path. This paper proposes a novel loss function which simultaneously trains these paths, so that disabling the masking mechanisms during inference causes BEAMNET to reconstruct the input speech signals. This allows dynamic control of the level of suppression applied by BEAMNET via a minimum gain level, which is not possible in other state-of-the-art approaches to end-to-end speaker separation. This paper also proposes a perceptually-motivated waveform distance measure. Using objective speech quality metrics, the proposed system is demonstrated to perform well at separating two equal-energy talkers, even in high levels of background noise. Subjective testing shows an improvement in speech intelligibility across a range of noise levels, for signals with artificially added head-related transfer functions and background noise. Finally, when used as part of an auditory attention decoder (AAD) system using existing electroencephalogram (EEG) data, BEAMNET is found to maintain the decoding accuracy achieved with ideal speaker separation, even in severe acoustic conditions. These results suggest that this enhancement system is highly effective at decoding auditory attention in realistic noise environments, and could possibly lead to improved speech perception in a cognitively controlled hearing aid.

Implicitly-defined neural networks for sequence labeling

Published in:
Annual Meeting of Assoc. of Computational Linguistics, 31 July 2017.

Summary

In this work, we propose a novel, implicitly defined neural network architecture and describe a method to compute its components. The proposed architecture forgoes the causality assumption previously used to formulate recurrent neural networks and allows the hidden states of the network to be coupled together, allowing potential improvement on problems with complex, long-distance dependencies. Initial experiments demonstrate that the new architecture outperforms both the Stanford Parser and a baseline bidirectional network on the Penn Treebank Part-of-Speech tagging task, and a baseline bidirectional network on an additional artificial random biased walk task.

Adaptive noise cancellation in a fighter cockpit environment

Published in:
ICASSP'84, IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 19-21 March 1984.

Summary

In this paper we discuss some preliminary results on using Widrow's Adaptive Noise Cancelling (ANC) algorithm to reduce the background noise present in a fighter pilot's speech. With a dominant noise source present and with the pilot wearing an oxygen facemask, we demonstrate that good (>10 dB) cancellation of the additive noise and little speech distortion can be achieved by having the reference microphone attached to the outside of the facemask and by updating the filter coefficients only during silence intervals.
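The idea of adapting only during silence can be sketched with a basic LMS noise canceller, the adaptive filter at the core of Widrow's ANC structure. This is a minimal sketch under stated assumptions, not the paper's configuration: the synthetic signals, the three-tap noise path, the filter length, the step size `mu`, and the name `anc_lms` are all hypothetical, and the silence mask is taken as known rather than detected.

```python
import numpy as np

def anc_lms(primary, reference, update_mask, n_taps=16, mu=0.005):
    """LMS noise canceller whose weights adapt only where update_mask is True."""
    w = np.zeros(n_taps)
    out = np.zeros(len(primary))
    for n in range(n_taps - 1, len(primary)):
        x = reference[n - n_taps + 1:n + 1][::-1]   # newest reference sample first
        e = primary[n] - w @ x                      # primary minus noise estimate
        out[n] = e
        if update_mask[n]:                          # freeze weights while speech is present
            w += 2 * mu * e * x
    return out, w

rng = np.random.default_rng(1)
fs = 8000
n_samples = 2 * fs
half = n_samples // 2

# Reference microphone picks up the noise only (assumed white for simplicity).
reference = rng.standard_normal(n_samples)

# Hypothetical acoustic path from the noise source to the primary microphone.
path = np.array([0.6, -0.3, 0.1])
noise_at_primary = np.convolve(reference, path)[:n_samples]

# Stand-in "speech": a tone that is silent for the first second.
t = np.arange(n_samples) / fs
speech = np.where(np.arange(n_samples) >= half, 0.5 * np.sin(2 * np.pi * 200 * t), 0.0)
primary = speech + noise_at_primary

# Adapt during the silent interval only, then hold the filter fixed.
update_mask = np.arange(n_samples) < half
cleaned, w = anc_lms(primary, reference, update_mask)

# Noise reduction measured over the interval where speech is present.
residual = cleaned[half:] - speech[half:]
reduction_db = 10 * np.log10(np.mean(noise_at_primary[half:] ** 2) / np.mean(residual ** 2))
print(f"noise reduction: {reduction_db:.1f} dB")
```

Freezing the weights while speech is present keeps the error signal, which then contains speech, from pulling the filter away from the noise path. In this idealized setup the cancellation is far better than would be expected from real cockpit recordings.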

The effects of microphones and facemasks on LPC vocoder performance

Published in:
Proc. of IEEE Int. Conf. on Acoustics, Speech & Signal Processing, 30 March - 1 April 1981.

Summary

The effects of oxygen facemasks and noise cancelling microphones on LPC vocoder performance were analyzed and evaluated. Likely sources of potential vocoder performance degradation included the non-ideal frequency response characteristics of the microphone and the possible presence of additional resonances in the speech waveform due to the addition of the facemask cavity. Examination of vowel spectra revealed that spurious resonances do not occur in the vocoder frequency band for speech generated using the facemask and microphone. Also observed was a vowel-dependent reduction in the bandwidths of the upper formants, a result which can be predicted from acoustic theory. Finally, it is shown that the low frequency emphasis associated with small enclosures is not relevant when using a pressure gradient (noise cancelling) microphone. Diagnostic Rhyme Tests involving three subjects indicated that the presence of the oxygen facemask and noise cancelling microphone did not result in a significant increase in the LPC vocoder processing loss.

A split band adaptive predictive coding (SBAPC) speech system

Published in:
IEEE Int. Conf. on Acoustics, Speech, & Signal Processing, 9-11 April 1980.

Summary

As developed by Atal and Schroeder [1], conventional Adaptive Predictive Coding (APC) of speech employs both vocal tract and pitch prediction to achieve a low energy, spectrally flattened residual. Errors in the pitch predictor can result in clipping errors which can propagate in the system for relatively long periods of time and degrade the quality of the synthesized speech. Makhoul and Berouti [2] have developed a high quality 16 kbps APC system which eliminates the pitch predictor by using a multi-level variable rate quantizer. In order to achieve comparable quality at even lower data rates, a split band APC (SBAPC) structure is proposed which employs the multi-level quantizer on the low frequency portion of the residual and a 1-bit quantizer on the high frequency portion of the residual.