Publications


Variability of speech timing features across repeated recordings: a comparison of open-source extraction techniques

Summary

Variations in speech timing features have been reliably linked to symptoms of various health conditions, demonstrating clinical potential. However, replication challenges hinder their translation; extracted speech features are susceptible to methodological variations in the recording and processing pipeline. Investigating this, we compared exemplar timing features extracted via three different techniques from recordings of healthy speech. Our results show that features extracted via an intensity-based method differ from those produced by forced alignment. Different extraction methods also led to differing estimates of within-speaker feature variability over time in an analysis of recordings repeated systematically over three sessions in one day (n=26) and in one week (n=28). Our findings highlight the importance of feature extraction in study design and interpretation, and the need for consistent, accurate extraction techniques for clinical research.
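The intensity-based side of the comparison can be illustrated with a minimal sketch using parselmouth, the open-source Python wrapper around Praat. The file name, frame step, and both thresholds here are illustrative assumptions, not the paper's settings; a forced-alignment pipeline would instead derive the same timing features from phone-level timestamps, which is the discrepancy the study quantifies.

```python
# Minimal sketch of intensity-based pause timing (assumed settings throughout).
import numpy as np
import parselmouth

SILENCE_DB_BELOW_PEAK = 25.0  # assumed silence threshold relative to peak intensity
MIN_PAUSE_S = 0.15            # assumed minimum duration for a run to count as a pause

snd = parselmouth.Sound("speech.wav")          # hypothetical recording
intensity = snd.to_intensity(time_step=0.01)   # intensity contour, 10 ms steps
db = intensity.values[0]
times = intensity.xs()

silent = db < (db.max() - SILENCE_DB_BELOW_PEAK)

# Collect contiguous silent runs that are long enough to count as pauses.
pauses, start = [], None
for t, s in zip(times, silent):
    if s and start is None:
        start = t
    elif not s and start is not None:
        if t - start >= MIN_PAUSE_S:
            pauses.append((start, t))
        start = None

pause_rate = len(pauses) / snd.duration        # pauses per second of audio
mean_pause = float(np.mean([e - s for s, e in pauses])) if pauses else 0.0
print(f"{pause_rate:.2f} pauses/s, mean pause {mean_pause:.3f} s")
```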

An exploratory characterization of speech- and fine-motor coordination in verbal children with Autism spectrum disorder

Summary

Autism spectrum disorder (ASD) is a neurodevelopmental disorder often associated with difficulties in speech production and fine-motor tasks. Thus, there is a need to develop objective measures to assess and understand speech production and other fine-motor challenges in individuals with ASD. In addition, recent research suggests that difficulties with speech production and fine-motor tasks may contribute to language difficulties in ASD. In this paper, we explore the utility of an off-body recording platform, from which we administer a speech- and fine-motor protocol to verbal children with ASD and neurotypical controls. We utilize a correlation-based analysis technique to develop proxy measures of motor coordination from signals derived from recordings of speech- and fine-motor behaviors. Eigenvalues of the resulting correlation matrix are inputs to Gaussian Mixture Models to discriminate between highly verbal children with ASD and neurotypical controls. These eigenvalues also characterize the complexity (underlying dimensionality) of representative signals of speech- and fine-motor movement dynamics, and form the feature basis to estimate scores on an expressive vocabulary measure. Based on a pilot dataset (15 ASD and 15 controls), features derived from an oral story reading task discriminate between the two groups with AUCs > 0.80 and highlight lower complexity of coordination in children with ASD. Features derived from handwriting and maze tracing tasks lead to AUCs of 0.86 and 0.91; however, features derived from ocular tasks did not aid in discrimination between the ASD and neurotypical groups. In addition, features derived from free speech and sustained vowel tasks are strongly correlated with expressive vocabulary scores. These results indicate the promise of a correlation-based analysis in elucidating motor differences between individuals with ASD and neurotypical controls.
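As a rough illustration of the correlation-based pipeline described above, the sketch below builds a channel-by-delay correlation matrix, takes its eigenvalue spectrum as the coordination feature, and fits one Gaussian Mixture Model per group. The delay range, toy signals, and model settings are placeholders, not the study's protocol.

```python
# Rough sketch of correlation-based coordination features (assumed settings).
import numpy as np
from sklearn.mixture import GaussianMixture

def delay_corr_eigvals(signals, delays=range(-5, 6)):
    """signals: (n_channels, n_samples) speech/fine-motor time series.
    Returns the sorted eigenvalue spectrum of the correlation matrix built
    from time-delayed copies of every channel; fewer dominant eigenvalues
    suggest lower-dimensional (less complex) coordination dynamics."""
    delayed = np.vstack([np.roll(ch, d) for ch in signals for d in delays])
    return np.sort(np.linalg.eigvalsh(np.corrcoef(delayed)))[::-1]

# Toy usage with random stand-in data (15 recordings per group, 3 channels).
rng = np.random.default_rng(0)
feats_a = np.stack([delay_corr_eigvals(rng.standard_normal((3, 500))) for _ in range(15)])
feats_b = np.stack([delay_corr_eigvals(rng.standard_normal((3, 500))) for _ in range(15)])
gmm_a = GaussianMixture(n_components=1, covariance_type="diag").fit(feats_a)
gmm_b = GaussianMixture(n_components=1, covariance_type="diag").fit(feats_b)

x = delay_corr_eigvals(rng.standard_normal((3, 500)))[None, :]
print("group A" if gmm_a.score(x) > gmm_b.score(x) else "group B")
```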

A neurophysiological-auditory "listen receipt" for communication enhancement

Published in:
49th IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 14-19 April 2024.

Summary

Information overload, and specifically auditory overload, is common in critical situations and detrimental to communication. Currently, there is no auditory equivalent of an email read receipt to know if a person has heard a message, other than waiting for a reply. This work hypothesizes that it may be possible to decode whether a person has indeed heard a message, or in other words, to create an auditory "listen receipt," through the use of non-invasive physiological or neural monitoring. We extracted a variety of features derived from electrodermal activity (EDA), electroencephalography (EEG), and the correlations between the acoustic envelope of the radio message and EEG to use in the decoder. We were able to classify the cases in which the subject responded correctly to the question in the message, versus the cases where they missed or heard the message incorrectly, with an accuracy of 79% and a receiver operating characteristic (ROC) area under the curve (AUC) of 0.83. This work suggests that the concept of a "listen receipt" may be possible, and future wearable machine-brain interface technologies may be able to automatically determine if an important radio message has been missed, for both human-to-human and human-to-machine communication.
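One of the feature families named above, the correlation between the message's acoustic envelope and EEG, might be computed along these lines. The lag range and normalization are assumptions, and both signals are presumed already aligned and resampled to a common rate.

```python
# Hypothetical sketch: envelope-EEG correlation feature (assumed preprocessing).
import numpy as np
from scipy.signal import hilbert

def envelope_eeg_corr(audio, eeg, max_lag=32):
    """Peak absolute Pearson correlation between the message's amplitude
    envelope and one EEG channel, searched over sample lags (neural
    tracking lags the stimulus). Inputs: 1-D arrays at the same rate."""
    env = np.abs(hilbert(audio))               # analytic-signal amplitude envelope
    env = (env - env.mean()) / env.std()
    eeg = (eeg - eeg.mean()) / eeg.std()
    n = min(len(env), len(eeg)) - max_lag
    corrs = [np.corrcoef(env[:n], eeg[lag:lag + n])[0, 1] for lag in range(max_lag)]
    return float(np.max(np.abs(corrs)))
```

Features like this, alongside EDA and EEG statistics, would then feed the binary heard-versus-missed classifier.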

Towards robust paralinguistic assessment for real-world mobile health (mHealth) monitoring: an initial study of reverberation effects on speech

Published in:
Proc. Annual Conf. Intl. Speech Communication Assoc., INTERSPEECH 2023, 20-24 August 2023, pp. 2373-77.

Summary

Speech is promising as an objective, convenient tool to monitor health remotely over time using mobile devices. Numerous paralinguistic features have been demonstrated to contain salient information related to an individual's health. However, mobile device specifications and acoustic environments vary widely, risking the reliability of the extracted features. In an initial step towards quantifying these effects, we report the variability of 13 exemplar paralinguistic features commonly reported in the speech-health literature, extracted from the speech of 42 healthy volunteers recorded consecutively in rooms with low and high reverberation with one budget smartphone, two higher-end smartphones, and a condenser microphone. Our results show reverberation has a clear effect on several features, in particular voice quality markers, and point to new research directions investigating how best to record and process in-the-wild speech for reliable longitudinal health state assessment.
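The kind of effect the study measures can be reproduced in miniature by convolving a recording with a room impulse response and re-extracting a feature on both versions. Jitter is shown as the exemplar; the toy impulse response, pitch bounds, and Praat settings below are stand-ins, not the study's setup, and a real voiced recording is needed (jitter is undefined on unvoiced signals).

```python
# Hypothetical sketch: effect of reverberation on one exemplar feature (jitter).
import numpy as np
from scipy.signal import fftconvolve
import parselmouth
from parselmouth.praat import call

def local_jitter(samples, fs):
    snd = parselmouth.Sound(samples, sampling_frequency=fs)
    pp = call(snd, "To PointProcess (periodic, cc)", 75, 500)  # pitch range, Hz
    return call(pp, "Get jitter (local)", 0, 0, 0.0001, 0.02, 1.3)

snd = parselmouth.Sound("speech.wav")          # hypothetical voiced recording
fs, speech = snd.sampling_frequency, snd.values[0]

rir = 0.5 * np.exp(-np.arange(int(fs / 4)) / 800.0)
rir[0] = 1.0                                   # toy exponential-decay impulse response
reverberant = fftconvolve(speech, rir)[: len(speech)]

print(local_jitter(speech, fs), local_jitter(reverberant, fs))
```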

ReCANVo: A database of real-world communicative and affective nonverbal vocalizations

Published in:
Sci. Data, Vol. 10, No. 1, 5 August 2023, 523.

Summary

Nonverbal vocalizations, such as sighs, grunts, and yells, are informative expressions within typical verbal speech. Likewise, individuals who produce 0-10 spoken words or word approximations ("minimally speaking" individuals) convey rich affective and communicative information through nonverbal vocalizations even without verbal speech. Yet, despite their rich content, little to no data exists on the vocal expressions of this population. Here, we present ReCANVo (Real-World Communicative and Affective Nonverbal Vocalizations), a novel dataset of non-speech vocalizations labeled by function from minimally speaking individuals. The ReCANVo database contains over 7000 vocalizations spanning communicative and affective functions from eight minimally speaking individuals, along with communication profiles for each participant. Vocalizations were recorded in real-world settings and labeled in real time by a close family member who knew the communicator well and had access to contextual information while labeling. ReCANVo is a novel database of nonverbal vocalizations from minimally speaking individuals, the largest available dataset of nonverbal vocalizations, and one of the few affective speech datasets collected amidst daily life across contexts.

Dissociating COVID-19 from other respiratory infections based on acoustic, motor coordination, and phonemic patterns

Published in:
Sci. Rep., Vol. 13, No. 1, January 2023, 1567.

Summary

In the face of the global pandemic caused by COVID-19, researchers have increasingly turned to simple measures to detect and monitor the presence of the disease in individuals at home. We sought to determine if measures of neuromotor coordination, derived from acoustic time series, as well as phoneme-based and standard acoustic features extracted from recordings of simple speech tasks, could aid in detecting the presence of COVID-19. We further hypothesized that these features would aid in characterizing the effect of COVID-19 on speech production systems. A protocol consisting of a variety of speech tasks was administered to 12 individuals with COVID-19 and 15 individuals with other viral infections at University Hospital Galway. From these recordings, we extracted a set of acoustic time series representative of speech production subsystems, as well as their univariate statistics. The time series were further utilized to derive correlation-based features, a proxy for speech production motor coordination. We additionally extracted phoneme-based features. These features were used to create machine learning models to distinguish between the COVID-19-positive and other viral infection groups, with respiratory- and laryngeal-based features resulting in the highest performance. Coordination-based features derived from harmonic-to-noise ratio time series from read speech discriminated between the two groups with an area under the ROC curve (AUC) of 0.94. A longitudinal case study of two subjects, one from each group, revealed differences in laryngeal-based acoustic features, consistent with observed physiological differences between the two groups. The results from this analysis highlight the promise of using nonintrusive sensing through simple speech recordings for early warning and tracking of COVID-19.
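The harmonic-to-noise ratio time series behind the best-performing features can be extracted with open-source tooling along these lines; the frame step and pitch floor are assumptions, and the coordination features themselves would come from the same kind of delay-correlation eigenvalue analysis sketched for the ASD study above.

```python
# Hypothetical sketch: HNR time series extraction (assumed analysis settings).
import parselmouth

snd = parselmouth.Sound("read_speech.wav")     # hypothetical read-speech recording
harm = snd.to_harmonicity_cc(time_step=0.01, minimum_pitch=75.0)
hnr = harm.values[0]                           # HNR in dB per 10 ms frame
hnr = hnr[hnr > -200]                          # Praat marks unvoiced frames as -200 dB
```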

Affective ratings of nonverbal vocalizations produced by minimally-speaking individuals: What do native listeners perceive?

Published in:
10th Intl. Conf. Affective Computing and Intelligent Interaction, ACII, 18-21 October 2022.

Summary

Individuals who produce few spoken words ("minimally-speaking" individuals) often convey rich affective and communicative information through nonverbal vocalizations, such as grunts, yells, babbles, and monosyllabic expressions. Yet, little data exists on the affective content of the vocal expressions of this population. Here, we present 78,624 arousal and valence ratings of nonverbal vocalizations from the online ReCANVo (Real-World Communicative and Affective Nonverbal Vocalizations) database. This dataset contains over 7,000 vocalizations that have been labeled with their expressive functions (delight, frustration, etc.) from eight minimally-speaking individuals. Our results suggest that raters who have no knowledge of the context or meaning of a nonverbal vocalization are still able to detect arousal and valence differences between different types of vocalizations based on Likert-scale ratings. Moreover, these ratings are consistent with hypothesized arousal and valence rankings for the different vocalization types. Raters are also able to detect arousal and valence differences between different vocalization types within individual speakers. To our knowledge, this is the first large-scale analysis of affective content within nonverbal vocalizations from minimally verbal individuals. These results complement affective computing research of nonverbal vocalizations that occur within typical verbal speech (e.g., grunts, sighs) and serve as a foundation for further understanding of how humans perceive emotions in sounds.
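A first-pass version of the per-type comparison could look like the following, assuming a long-format ratings table; the file name and column names are invented for illustration and do not reflect the actual release format.

```python
# Hypothetical sketch of the per-type rating comparison (invented schema).
import pandas as pd

# Assumed columns: rater_id, speaker_id, vocalization_type, arousal, valence
ratings = pd.read_csv("recanvo_ratings.csv")   # hypothetical export of the ratings

# Mean arousal/valence per labeled function, overall and within each speaker.
overall = ratings.groupby("vocalization_type")[["arousal", "valence"]].mean()
per_spk = ratings.groupby(["speaker_id", "vocalization_type"])[["arousal", "valence"]].mean()
print(overall.sort_values("valence"))          # e.g., frustration low, delight high
```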

Modeling real-world affective and communicative nonverbal vocalizations from minimally speaking individuals

Published in:
IEEE Trans. on Affect. Comput., Vol. 13, No. 4, October 2022, pp. 2238-53.

Summary

Nonverbal vocalizations from non- and minimally speaking individuals (mv*) convey important communicative and affective information. While nonverbal vocalizations that occur amidst typical speech and infant vocalizations have been studied extensively in the literature, there is limited prior work on vocalizations by mv* individuals. Our work is among the first studies of the communicative and affective information expressed in nonverbal vocalizations by mv* children and adults. We collected labeled vocalizations in real-world settings with eight mv* communicators, with communicative and affective labels provided in-the-moment by a close family member. Using evaluation strategies suitable for messy, real-world data, we show that nonverbal vocalizations can be classified by function (with 4- and 5-way classifications) with F1 scores above chance for all participants. We analyze labeling and data collection practices for each participating family, and discuss the classification results in the context of our novel real-world data collection protocol. The presented work includes results from the largest classification experiments with nonverbal vocalizations from mv* communicators to date.
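The phrase "above chance" can be made precise by comparing against a stratified baseline. The sketch below contrasts a trained classifier's macro-F1 with sklearn's DummyClassifier for a 5-way task; the features, labels, and model choice are all stand-ins (on random data the two scores will be similar, which is the point of the check).

```python
# Hypothetical sketch: macro-F1 vs. a chance baseline for one 5-way task.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))             # stand-in acoustic features
y = rng.integers(0, 5, 200)                    # stand-in 5-way function labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
chance = DummyClassifier(strategy="stratified", random_state=0).fit(X_tr, y_tr)

f1_model = f1_score(y_te, model.predict(X_te), average="macro")
f1_chance = f1_score(y_te, chance.predict(X_te), average="macro")
print(f"model {f1_model:.2f} vs chance {f1_chance:.2f}")
```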

Bayesian estimation of PLDA in the presence of noisy training labels, with applications to speaker verification

Published in:
IEEE/ACM Trans. Audio, Speech, Language Process., Vol. 30, 2022, pp. 414-28.

Summary

This paper presents a Bayesian framework for estimating a Probabilistic Linear Discriminant Analysis (PLDA) model in the presence of noisy labels. True class labels are interpreted as latent random variables, which are transmitted through a noisy channel and received as observed speaker labels. The labeling process is modeled as a Discrete Memoryless Channel (DMC). PLDA hyperparameters are interpreted as random variables, and their joint posterior distribution is derived using mean-field Variational Bayes, allowing maximum a posteriori (MAP) estimates of the PLDA model parameters to be determined. The proposed solution, referred to as VB-MAP, is presented as a general framework, but is studied in the context of speaker verification, and a variety of use cases are discussed. Specifically, VB-MAP can be used for PLDA estimation with unreliable labels, unsupervised PLDA estimation, and to infer the reliability of a PLDA training set. Experimental results show the proposed approach to provide significant performance improvements on a variety of NIST Speaker Recognition Evaluation (SRE) tasks, both for data sets with simulated mislabels and for data sets with naturally occurring missing or unreliable labels.
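In symbols, the noisy-label model reads roughly as follows; the notation is assumed here for illustration, not copied from the paper.

```latex
% Hedged sketch of the noisy-label model; notation assumed, not the paper's.
% Observed labels \hat{y}_i arise from true labels y_i via a DMC with
% transition matrix C, so the label likelihood factorizes per sample:
\[
  P(\hat{y}_i = m \mid y_i = k) = C_{km},
  \qquad
  P(\hat{\mathbf{y}} \mid \mathbf{y}) = \prod_i C_{y_i \hat{y}_i}.
\]
% Mean-field VB then approximates the intractable joint posterior over the
% latent labels and the PLDA hyperparameters \theta with a factorized form,
% from which MAP estimates of the PLDA parameters are read off:
\[
  p(\mathbf{y}, \theta \mid \mathbf{X}, \hat{\mathbf{y}})
  \approx q(\mathbf{y})\, q(\theta).
\]
```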

Speech as a biomarker: opportunities, interoperability, and challenges

Published in:
Perspectives of the ASHA Special Interest Groups, Vol. 7, February 2022, pp. 276-83.

Summary

Purpose: Over the past decade, the signal processing and machine learning literature has demonstrated notable advancements in automated speech processing with the use of artificial intelligence for medical assessment and monitoring (e.g., depression, dementia, and Parkinson's disease, among others). Meanwhile, the clinical speech literature has identified several interpretable, theoretically motivated measures that are sensitive to abnormalities in the cognitive, linguistic, affective, motoric, and anatomical domains. Both fields have thus independently demonstrated the potential for speech to serve as an informative biomarker for detecting different psychiatric and physiological conditions. However, despite these parallel advancements, automated speech biomarkers have not been integrated into routine clinical practice to date. Conclusions: In this article, we present opportunities and challenges for adoption of speech as a biomarker in clinical practice and research. Toward clinical acceptance and adoption of speech-based digital biomarkers, we argue for the importance of several factors such as robustness, specificity, diversity, and physiological interpretability of speech analytics in clinical applications.
