Publications

Variability of speech timing features across repeated recordings: a comparison of open-source extraction techniques

Summary

Variations in speech timing features have been reliably linked to symptoms of various health conditions, demonstrating clinical potential. However, replication challenges hinder their translation; extracted speech features are susceptible to methodological variations in the recording and processing pipeline. Investigating this, we compared exemplar timing features extracted via three different techniques from recordings of healthy speech. Our results show that features extracted via an intensity-based method differ from those produced by forced alignment. Different extraction methods also led to differing estimates of within-speaker feature variability over time in an analysis of recordings repeated systematically over three sessions in one day (n=26) and in one week (n=28). Our findings highlight the importance of feature extraction in study design and interpretation, and the need for consistent, accurate extraction techniques for clinical research.
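
As a rough illustration of the intensity-based style of timing extraction compared in this work, the sketch below thresholds a frame-level energy contour to separate speech from pauses and derives a few exemplar timing features. The threshold, frame sizes, and feature names are illustrative assumptions, not the paper's actual pipeline.

```python
# Illustrative sketch (not the paper's pipeline): intensity-thresholded
# speech/pause segmentation and exemplar timing features.
import numpy as np

def timing_features(wave, sr, frame_ms=25, hop_ms=10, db_drop=25.0):
    """Segment speech vs. pause in a mono waveform and return timing features."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = max(1, 1 + (len(wave) - frame) // hop)
    # Frame-level RMS energy in dB.
    rms = np.array([np.sqrt(np.mean(wave[i * hop:i * hop + frame] ** 2) + 1e-12)
                    for i in range(n_frames)])
    db = 20 * np.log10(rms)
    # Frames within `db_drop` dB of the peak count as speech (assumed threshold).
    speech = db > (db.max() - db_drop)
    # Collect run lengths of pause frames.
    pauses, run = [], 0
    for s in speech:
        if not s:
            run += 1
        elif run:
            pauses.append(run * hop_ms / 1000.0)
            run = 0
    total = n_frames * hop_ms / 1000.0
    return {
        "phonation_ratio": float(speech.mean()),
        "mean_pause_dur_s": float(np.mean(pauses)) if pauses else 0.0,
        "pause_rate_per_s": len(pauses) / total,
    }
```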

An exploratory characterization of speech- and fine-motor coordination in verbal children with Autism spectrum disorder

Summary

Autism spectrum disorder (ASD) is a neurodevelopmental disorder often associated with difficulties in speech production and fine-motor tasks. Thus, there is a need to develop objective measures to assess and understand speech production and other fine-motor challenges in individuals with ASD. In addition, recent research suggests that difficulties with speech production and fine-motor tasks may contribute to language difficulties in ASD. In this paper, we explore the utility of an off-body recording platform, with which we administer a speech- and fine-motor protocol to verbal children with ASD and neurotypical controls. We utilize a correlation-based analysis technique to develop proxy measures of motor coordination from signals derived from recordings of speech- and fine-motor behaviors. Eigenvalues of the resulting correlation matrix are inputs to Gaussian Mixture Models to discriminate between highly verbal children with ASD and neurotypical controls. These eigenvalues also characterize the complexity (underlying dimensionality) of representative signals of speech- and fine-motor movement dynamics, and form the feature basis for estimating scores on an expressive vocabulary measure. Based on a pilot dataset (15 ASD and 15 controls), features derived from an oral story reading task discriminate between the two groups with AUCs > 0.80 and highlight lower complexity of coordination in children with ASD. Features derived from handwriting and maze tracing tasks led to AUCs of 0.86 and 0.91; however, features derived from ocular tasks did not aid in discrimination between the ASD and neurotypical groups. In addition, features derived from free speech and sustained vowel tasks are strongly correlated with expressive vocabulary scores. These results indicate the promise of a correlation-based analysis in elucidating motor differences between individuals with ASD and neurotypical controls.
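
A minimal sketch of the general correlation-structure idea described above (not the authors' implementation): stack several speech or movement signals, form a correlation matrix across channels and time delays, use its eigenvalue spectrum as a feature vector, and score it with per-class Gaussian Mixture Models. The channel count, delays, and model settings are assumptions.

```python
# Sketch of a correlation-structure + GMM pipeline (assumed settings throughout).
import numpy as np
from sklearn.mixture import GaussianMixture

def eig_features(signals, delays=(0, 1, 2, 3)):
    """signals: (n_channels, n_samples). Returns the sorted eigenvalues of a
    channel-by-delay correlation matrix as a coordination 'complexity' profile."""
    shifted = [np.roll(sig, d) for sig in signals for d in delays]
    corr = np.corrcoef(np.vstack(shifted))
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

# Hypothetical training data: one eigenvalue vector per recording.
X_asd = np.vstack([eig_features(np.random.randn(4, 2000)) for _ in range(15)])
X_nt = np.vstack([eig_features(np.random.randn(4, 2000)) for _ in range(15)])

gmm_asd = GaussianMixture(n_components=2, covariance_type="diag").fit(X_asd)
gmm_nt = GaussianMixture(n_components=2, covariance_type="diag").fit(X_nt)

# Classify a new recording by comparing per-class log-likelihoods.
x_new = eig_features(np.random.randn(4, 2000)).reshape(1, -1)
label = "ASD" if gmm_asd.score(x_new) > gmm_nt.score(x_new) else "NT"
```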

A neurophysiological-auditory "listen receipt" for communication enhancement

Published in:
49th IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing, ICASSP, 14-19 April 2024.

Summary

Information overload, and specifically auditory overload, is common in critical situations and detrimental to communication. Currently, there is no auditory equivalent of an email read receipt to know whether a person has heard a message, other than waiting for a reply. This work hypothesizes that it may be possible to decode whether a person has indeed heard a message, or in other words, to create an auditory "listen receipt," through the use of non-invasive physiological or neural monitoring. We extracted a variety of features derived from electrodermal activity (EDA), electroencephalography (EEG), and the correlations between the acoustic envelope of the radio message and the EEG to use in the decoder. We were able to classify the cases in which the subject responded correctly to the question in the message, versus the cases where they missed or heard the message incorrectly, with an accuracy of 79% and a receiver operating characteristic (ROC) area under the curve (AUC) of 0.83. This work suggests that the concept of a "listen receipt" may be feasible, and that future wearable machine-brain interface technologies may be able to automatically determine whether an important radio message has been missed, for both human-to-human and human-to-machine communication.
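
As an illustration of one feature family mentioned above, the sketch below correlates a message's acoustic envelope with each EEG channel at several lags and evaluates a simple classifier with ROC AUC. The channel count, lags, and the logistic-regression choice are assumptions rather than the study's actual decoder.

```python
# Sketch: envelope-EEG correlation features and AUC evaluation (assumed details).
import numpy as np
from scipy.signal import hilbert
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict

def env_eeg_corr(audio, eeg, lags=(0, 8, 16, 32)):
    """eeg: (n_channels, n_samples) resampled to the audio rate.
    Returns envelope-EEG correlations at several sample lags."""
    env = np.abs(hilbert(audio))  # acoustic amplitude envelope
    return np.array([np.corrcoef(np.roll(env, lag), ch)[0, 1]
                     for lag in lags for ch in eeg])

# Hypothetical dataset: one feature vector per heard/missed message.
X = np.vstack([env_eeg_corr(np.random.randn(4000), np.random.randn(8, 4000))
               for _ in range(60)])
y = np.random.randint(0, 2, size=60)  # 1 = responded correctly, 0 = missed

scores = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                           cv=5, method="predict_proba")[:, 1]
print("ROC AUC:", roc_auc_score(y, scores))
```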

Quantifying speech production coordination from non- and minimally-speaking individuals

Published in:
J. Autism Dev. Disord., 13 April 2024.

Summary

Purpose: Non-verbal utterances are an important tool of communication for individuals who are non- or minimally-speaking. While these utterances are typically understood by caregivers, they can be challenging to interpret for their larger community. To date, there has been little work done to detect and characterize the vocalizations produced by non- or minimally-speaking individuals. This paper aims to characterize five categories of utterances across a set of seven non- or minimally-speaking individuals. Methods: The characterization is accomplished using a correlation structure methodology, acting as a proxy measurement for motor coordination, to localize similarities and differences to specific speech production systems. Results: We specifically find that frustrated and dysregulated utterances show similar correlation structure outputs, especially when compared to self-talk, request, and delighted utterances. We additionally observe higher complexity of coordination between articulatory and respiratory subsystems and lower complexity of coordination between laryngeal and respiratory subsystems in frustration and dysregulation as compared to self-talk, request, and delight. Finally, we observe lower complexity of coordination across all three speech subsystems in the request utterances as compared to self-talk and delight. Conclusion: The insights from this work aid in understanding the modifications made by non- or minimally-speaking individuals to accomplish specific goals in non-verbal communication.
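
The coordination "complexity" discussed above is often summarized from the eigenvalue spectrum of a correlation matrix built across speech-subsystem time series. Below is one hedged way to turn that spectrum into a scalar (normalized spectral entropy); it is an illustration of the general idea, not the paper's exact measure.

```python
# Sketch: a scalar "complexity" summary of a correlation-matrix eigenvalue
# spectrum (normalized spectral entropy). An assumed measure, for illustration.
import numpy as np

def coordination_complexity(signals):
    """signals: (n_subsystems, n_samples), e.g. articulatory, laryngeal, and
    respiratory proxies. Higher entropy means the eigenvalues are spread more
    evenly, i.e. higher effective dimensionality of the coordination."""
    corr = np.corrcoef(signals)
    eig = np.clip(np.linalg.eigvalsh(corr), 1e-12, None)
    p = eig / eig.sum()
    return float(-(p * np.log(p)).sum() / np.log(len(p)))  # in [0, 1]
```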

A vocal model to predict readiness under sleep deprivation

Published in:
Proc. 2023 IEEE 19th Intl. Conf. on Body Sensor Networks, BSN, 9-11 October 2023.

Summary

A variety of factors can affect cognitive readiness and influence human performance in mission-critical tasks. Sleep deprivation is one of the most prevalent factors that degrade performance. One risk-mitigation approach is to use vocal biomarkers to detect cognitive fatigue and the resulting performance decrements. In this study, a group of 20 subjects was deprived of sleep for a period of 24 hours. Every two hours, they performed a battery of both speech tasks and cognitive performance tasks, including the psychomotor vigilance test (PVT). Performance on the PVT declined dramatically during nighttime hours between 2 AM and 8 AM. We demonstrate that a model using vocal biomarkers from read speech and free speech can be successfully trained to detect performance decrements on the PVT. We also demonstrate that the vocal model generalizes to other outcomes at a level similar to the PVT, detecting sleep deprivation (AUC=0.79) and cognitive performance declines on a battery of cognitive tasks (AUC=0.79). In comparison, using the PVT as the basis for detecting sleep deprivation and performance declines resulted in AUC=0.75 and AUC=0.80, respectively.
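
A minimal sketch of the modeling setup implied above: given per-session vocal feature vectors and binary labels (e.g., impaired night sessions vs. baseline), train a classifier and report subject-held-out ROC AUC. The feature dimensionality, the grouping scheme, and the random-forest choice are assumptions, not the paper's model.

```python
# Sketch: cross-validated detection of performance decrements from vocal
# features (model choice and grouping scheme are assumptions).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(240, 32))          # 20 subjects x 12 sessions, 32 vocal features
y = rng.integers(0, 2, size=240)        # 1 = impaired (e.g., 2-8 AM), 0 = baseline
groups = np.repeat(np.arange(20), 12)   # keep each subject's sessions together

aucs = []
for tr, te in GroupKFold(n_splits=5).split(X, y, groups):
    clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[tr], y[tr])
    aucs.append(roc_auc_score(y[te], clf.predict_proba(X[te])[:, 1]))
print("mean subject-held-out AUC:", np.mean(aucs))
```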

Towards robust paralinguistic assessment for real-world mobile health (mHealth) monitoring: an initial study of reverberation effects on speech

Published in:
Proc. Annual Conf. Intl. Speech Communication Assoc., INTERSPEECH 2023, 20-24 August 2023, pp. 2373-77.

Summary

Speech is promising as an objective, convenient tool to monitor health remotely over time using mobile devices. Numerous paralinguistic features have been demonstrated to contain salient information related to an individual's health. However, mobile device specifications and acoustic environments vary widely, risking the reliability of the extracted features. As an initial step towards quantifying these effects, we report the variability of 13 exemplar paralinguistic features commonly reported in the speech-health literature, extracted from the speech of 42 healthy volunteers recorded consecutively in rooms with low and high reverberation using one budget smartphone, two higher-end smartphones, and a condenser microphone. Our results show that reverberation has a clear effect on several features, in particular voice quality markers. These results point to new research directions investigating how best to record and process in-the-wild speech for reliable longitudinal health state assessment.
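
One simple way to quantify the condition effect described above is a paired comparison of each feature between low- and high-reverberation recordings of the same speakers; the sketch below reports the median relative change and a Wilcoxon signed-rank test per feature. The feature names, data, and test choice are illustrative assumptions, not the paper's analysis.

```python
# Sketch: paired per-feature comparison between low/high reverberation
# (feature names and statistical test are illustrative assumptions).
import numpy as np
from scipy.stats import wilcoxon

def reverberation_effect(low, high, names):
    """low, high: (n_speakers, n_features) matched recordings of the same speakers."""
    for j, name in enumerate(names):
        rel = (high[:, j] - low[:, j]) / (np.abs(low[:, j]) + 1e-12)
        stat, p = wilcoxon(low[:, j], high[:, j])
        print(f"{name}: median relative change {np.median(rel):+.1%}, p={p:.3f}")

# Hypothetical data for 42 speakers and three example features.
rng = np.random.default_rng(1)
low = rng.normal(size=(42, 3))
high = low + rng.normal(scale=0.2, size=(42, 3))
reverberation_effect(low, high, ["jitter", "shimmer", "HNR"])
```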

ReCANVo: A database of real-world communicative and affective nonverbal vocalizations

Published in:
Sci. Data, Vol. 10, No. 1, 5 August 2023, 523.

Summary

Nonverbal vocalizations, such as sighs, grunts, and yells, are informative expressions within typical verbal speech. Likewise, individuals who produce 0-10 spoken words or word approximations ("minimally speaking" individuals) convey rich affective and communicative information through nonverbal vocalizations even without verbal speech. Yet, despite their rich content, little to no data exists on the vocal expressions of this population. Here, we present ReCANVo (Real-World Communicative and Affective Nonverbal Vocalizations), a novel dataset of non-speech vocalizations from minimally speaking individuals, labeled by function. The ReCANVo database contains over 7000 vocalizations spanning communicative and affective functions from eight minimally speaking individuals, along with communication profiles for each participant. Vocalizations were recorded in real-world settings and labeled in real time by a close family member who knew the communicator well and had access to contextual information while labeling. ReCANVo is a novel database of nonverbal vocalizations from minimally speaking individuals, the largest available dataset of nonverbal vocalizations, and one of the few affective speech datasets collected amidst daily life across contexts.

Dissociating COVID-19 from other respiratory infections based on acoustic, motor coordination, and phonemic patterns

Published in:
Sci. Rep., Vol. 13, No. 1, January 2023, 1567.

Summary

In the face of the global pandemic caused by COVID-19, researchers have increasingly turned to simple measures to detect and monitor the presence of the disease in individuals at home. We sought to determine whether measures of neuromotor coordination derived from acoustic time series, as well as phoneme-based and standard acoustic features extracted from recordings of simple speech tasks, could aid in detecting the presence of COVID-19. We further hypothesized that these features would aid in characterizing the effect of COVID-19 on speech production systems. A protocol consisting of a variety of speech tasks was administered to 12 individuals with COVID-19 and 15 individuals with other viral infections at University Hospital Galway. From these recordings, we extracted a set of acoustic time series representative of speech production subsystems, as well as their univariate statistics. The time series were further utilized to derive correlation-based features, a proxy for speech production motor coordination. We additionally extracted phoneme-based features. These features were used to create machine learning models to distinguish between the COVID-19-positive and other viral infection groups, with respiratory- and laryngeal-based features resulting in the highest performance. Coordination-based features derived from harmonic-to-noise ratio time series from read speech discriminated between the two groups with an area under the ROC curve (AUC) of 0.94. A longitudinal case study of two subjects, one from each group, revealed differences in laryngeal-based acoustic features, consistent with observed physiological differences between the two groups. The results from this analysis highlight the promise of using nonintrusive sensing through simple speech recordings for early warning and tracking of COVID-19.
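
To illustrate the kind of coordination features described above, the sketch below builds a correlation matrix from time-delayed copies of a single acoustic time series (e.g., a frame-level harmonic-to-noise ratio contour) and keeps its eigenvalue spectrum as a feature vector. The delay count and spacing are assumed values, not the study's parameters.

```python
# Sketch: eigenvalue-spectrum features from a delay-embedded correlation
# matrix of one acoustic time series (delays are assumed values).
import numpy as np

def delay_correlation_eigs(series, n_delays=15, step=3):
    """series: 1-D frame-level contour (e.g., HNR over time).
    Returns the sorted eigenvalues of the correlation matrix of
    time-shifted copies of the series."""
    shifted = np.vstack([np.roll(series, k * step) for k in range(n_delays)])
    corr = np.corrcoef(shifted)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]
```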

An emotion-driven vocal biomarker-based PTSD screening tool

Summary

This paper introduces an automated post-traumatic stress disorder (PTSD) screening tool that could potentially be used as a self-assessment or inserted into routine medical visits for PTSD diagnosis and treatment. Methods: With an emotion estimation algorithm providing arousal (excited to calm) and valence (pleasure to displeasure) levels throughout the discourse, we select regions of the acoustic signal that are most salient for PTSD detection. Our algorithm was tested on a subset of data from the DVBIC-TBICoE TBI Study, which contains PTSD Check List Civilian (PCL-C) assessment scores. Results: Speech from low-arousal and positive-valence regions provides the best discrimination for PTSD. Our model achieved an AUC (area under the curve) equal to 0.80 in detecting PCL-C ratings, outperforming models with no emotion filtering (AUC = 0.68). Conclusions: This result suggests that emotion drives the selection of the most salient temporal regions of an audio recording for PTSD detection.
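
The region-selection step described above can be sketched as filtering frame-level arousal/valence estimates and keeping only segments that fall in a target quadrant (here, low arousal and positive valence) before downstream feature extraction. The thresholds and the framing are assumptions for illustration, not the paper's settings.

```python
# Sketch: keep only speech frames in an assumed low-arousal, positive-valence
# region before downstream feature extraction (thresholds are illustrative).
import numpy as np

def select_emotion_regions(arousal, valence, a_max=0.0, v_min=0.0):
    """arousal, valence: frame-level scores, assumed centered at 0.
    Returns a boolean mask of frames to keep for PTSD-related features."""
    return (arousal < a_max) & (valence > v_min)

# Hypothetical frame-level emotion trajectory for one recording.
rng = np.random.default_rng(2)
arousal = rng.normal(size=500)
valence = rng.normal(size=500)
mask = select_emotion_regions(arousal, valence)
print(f"kept {mask.mean():.0%} of frames for feature extraction")
```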

Affective ratings of nonverbal vocalizations produced by minimally-speaking individuals: What do native listeners perceive?

Published in:
10th Intl. Conf. Affective Computing and Intelligent Interaction, ACII, 18-21 October 2022.

Summary

Individuals who produce few spoken words ("minimally-speaking" individuals) often convey rich affective and communicative information through nonverbal vocalizations, such as grunts, yells, babbles, and monosyllabic expressions. Yet, little data exists on the affective content of the vocal expressions of this population. Here, we present 78,624 arousal and valence ratings of nonverbal vocalizations from the online ReCANVo (Real-World Communicative and Affective Nonverbal Vocalizations) database. This dataset contains over 7,000 vocalizations that have been labeled with their expressive functions (delight, frustration, etc.) from eight minimally-speaking individuals. Our results suggest that raters who have no knowledge of the context or meaning of a nonverbal vocalization are still able to detect arousal and valence differences between different types of vocalizations based on Likert-scale ratings. Moreover, these ratings are consistent with hypothesized arousal and valence rankings for the different vocalization types. Raters are also able to detect arousal and valence differences between different vocalization types within individual speakers. To our knowledge, this is the first large-scale analysis of affective content within nonverbal vocalizations from minimally verbal individuals. These results complement affective computing research of nonverbal vocalizations that occur within typical verbal speech (e.g., grunts, sighs) and serve as a foundation for further understanding of how humans perceive emotions in sounds.