Publications

Corpora design and score calibration for text dependent pronunciation proficiency recognition

Published in:
8th ISCA Workshop on Speech and Language Technology in Education, SLaTE 2019, 20-21 September 2019.

Summary

This work investigates methods for improving a pronunciation proficiency recognition system, both in phonetic-level posterior probability calibration and in ordinal utterance-level classification, for Modern Standard Arabic (MSA), Spanish, and Russian. To support this work, utterance-level labels were obtained by crowd-sourcing the annotation of language learners' recordings. Phonetic posterior probability estimates, extracted using automatic speech recognition systems trained in each language, were calibrated using a beta calibration approach [1], and language proficiency level was estimated using ordinal regression [2]. Fusion with language recognition (LR) scores from an i-vector system [3] trained on 23 languages is also explored. Initial results were promising for all three languages, and the calibrated posteriors were shown to be effective for predicting pronunciation proficiency. Significant relative gains of 16% in mean absolute error for the ordinal regression and 17% in normalized cross entropy for the binary beta regression were achieved on MSA through fusion with LR scores.
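
To make the calibration step concrete, here is a minimal sketch of one standard formulation of beta calibration: a logistic regression fit on the features [ln(s), -ln(1-s)] yields a monotone, beta-shaped map from raw posteriors to calibrated probabilities. The data below are synthetic placeholders, not the paper's phone-level posteriors or its actual pipeline.

```python
# Minimal beta-calibration sketch: logistic regression on [ln(s), -ln(1-s)].
import numpy as np
from sklearn.linear_model import LogisticRegression

def beta_features(scores, eps=1e-6):
    s = np.clip(scores, eps, 1 - eps)
    return np.column_stack([np.log(s), -np.log(1 - s)])

# Toy data: raw posteriors loosely correlated with binary proficiency labels.
rng = np.random.default_rng(0)
raw = rng.uniform(0.05, 0.95, size=200)
labels = (raw + rng.normal(0, 0.2, size=200) > 0.5).astype(int)

calibrator = LogisticRegression().fit(beta_features(raw), labels)
calibrated = calibrator.predict_proba(beta_features(raw))[:, 1]
```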

Using K-means in SVR-based text difficulty estimation

Published in:
8th ISCA Workshop on Speech and Language Technology in Education, SLaTE, 20-21 September 2019.

Summary

A challenge for second language learners, educators, and test creators is the identification of authentic materials at the right level of difficulty. In this work, we present an approach to automatically measure text difficulty, integrated into Auto-ILR, a web-based system that helps find text material at the right level for learners in 18 languages. The Auto-ILR subscription service scans web feeds, extracts article content, evaluates the difficulty, and notifies users of documents that match their skill level. Difficulty is measured on the standard ILR scale with language-specific support vector machine regression (SVR) models built from vectors incorporating length features, term frequencies, relative entropy, and K-means clustering.
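
As a hedged illustration of the general recipe the summary describes, the sketch below augments term-frequency features with K-means centroid distances and regresses an ILR-style difficulty score with an RBF-kernel SVR. Documents, labels, and hyperparameters are toy stand-ins; the Auto-ILR models also use length and relative-entropy features not shown here.

```python
# Toy text-difficulty regression: TF-IDF + K-means distances -> RBF SVR.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVR

docs = [
    "the cat sat on the mat",
    "economic policy debates continued in parliament this week",
    "quantum decoherence fundamentally limits qubit fidelity",
    "simple short words for new readers",
] * 5
ilr_levels = np.tile([0.5, 2.5, 3.5, 1.0], 5)   # toy difficulty labels

X = TfidfVectorizer().fit_transform(docs).toarray()

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)
X_aug = np.hstack([X, km.transform(X)])          # append centroid distances

model = SVR(kernel="rbf").fit(X_aug, ilr_levels)
predicted_levels = model.predict(X_aug)
```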

The leakage-resilience dilemma

Published in:
Proc. European Symp. on Research in Computer Security, ESORICS 2019, pp. 87-106.

Summary

Many control-flow-hijacking attacks rely on information leakage to disclose the location of gadgets. To address this, several leakage-resilient defenses have been proposed that fundamentally limit the power of information leakage. Examples of such defenses include address-space re-randomization, destructive code reads, and execute-only code memory. Underlying all of these defenses is some form of code randomization. In this paper, we illustrate that randomization at the granularity of a page or coarser is not secure and can be exploited by generalizing the idea of partial pointer overwrites, an approach we call the Relative ROP (RelROP) attack. We then analyzed more than 1,300 common binaries and found that 94% of them contained sufficient gadgets for an attacker to spawn a shell. To demonstrate this concretely, we built a proof-of-concept exploit against PHP 7.0.0. Furthermore, randomization at a granularity finer than a memory page faces practicality challenges when applied to shared libraries. Our findings highlight the dilemma that faces randomization techniques: coarse-grained techniques are efficient but insecure, and fine-grained techniques are secure but impractical.
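
The page-granularity weakness can be seen with simple arithmetic: page-aligned randomization leaves the low 12 bits (the page offset) of every code pointer invariant across re-randomization, so overwriting only the low byte or two of a pointer retargets it deterministically within that window. The toy calculation below illustrates the idea; it is not the paper's attack code.

```python
# Back-of-the-envelope budget for a partial pointer overwrite under
# page-granular randomization (4 KiB pages): bits written below the page
# offset are attacker-controlled with certainty; bits above must be guessed.
PAGE_BITS = 12  # 4 KiB pages

def overwrite_budget(overwrite_bytes: int) -> tuple[int, int]:
    """Return (deterministic_bits, guessed_bits) when the low
    `overwrite_bytes` bytes of a code pointer are overwritten."""
    written = overwrite_bytes * 8
    deterministic = min(written, PAGE_BITS)
    return deterministic, max(written - PAGE_BITS, 0)

for n in (1, 2, 3):
    det, guess = overwrite_budget(n)
    print(f"{n}-byte overwrite: {det} bits deterministic, {guess} to guess")
```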

State-of-the-art speaker recognition for telephone and video speech: the JHU-MIT submission for NIST SRE18

Summary

We present a condensed description of the joint effort of JHU-CLSP, JHU-HLTCOE, MIT-LL, MIT CSAIL, and LSE-EPITA for NIST SRE18. All the developed systems consisted of x-vector/i-vector embeddings with some flavor of PLDA backend. Very deep x-vector architectures (extended and factorized TDNNs, and ResNets) clearly outperformed shallower x-vectors and i-vectors. The systems were tailored to the video (VAST) or to the telephone (CMN2) condition. The VAST data was challenging, yielding four times worse performance than other video-based datasets like Speakers in the Wild. We were able to calibrate the VAST data with very few development trials by using careful adaptation and score normalization methods. The VAST primary fusion yielded EER=10.18% and Cprimary=0.431. By improving calibration in post-eval, we reached Cprimary=0.369. In CMN2, we used unsupervised SPLDA adaptation based on agglomerative clustering and score normalization to correct the domain shift between English and Tunisian Arabic models. The CMN2 primary fusion yielded EER=4.5% and Cprimary=0.313. The extended TDNN x-vector was the best single system, obtaining EER=11.1% and Cprimary=0.452 in VAST, and 4.95% and 0.354 in CMN2.
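
As an illustration of the score-normalization ingredient mentioned above, here is a minimal sketch of adaptive symmetric score normalization (s-norm). The cohort construction and top-k size are assumptions for illustration, not the submission's exact recipe.

```python
# Hedged adaptive s-norm sketch: normalize a trial score against the
# top-scoring cohort trials on the enrollment and test sides, then average.
import numpy as np

def adaptive_s_norm(score, enroll_cohort, test_cohort, top_k=200):
    e = np.sort(np.asarray(enroll_cohort))[-top_k:]  # adaptive: top-k cohort
    t = np.sort(np.asarray(test_cohort))[-top_k:]
    return 0.5 * ((score - e.mean()) / e.std() +
                  (score - t.mean()) / t.std())

# Toy usage with random cohort scores.
rng = np.random.default_rng(1)
normed = adaptive_s_norm(2.0, rng.normal(size=1000), rng.normal(size=1000))
```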

Monetized weather radar network benefits for tornado cost reduction

Published in:
MIT Lincoln Laboratory Report NOAA-35

Summary

A monetized tornado benefit model is developed for arbitrary weather radar network configurations. Geospatial regression analyses indicate that improvements in two key radar coverage parameters--fraction of vertical space observed and cross-range horizontal resolution--lead to better tornado warning performance as characterized by tornado detection probability and false alarm ratio. Previous experimental results showing that faster volume scan rates yield greater warning performance, including increased lead times, are also incorporated into the model. Enhanced tornado warning performance, in turn, reduces casualty rates. In combination, then, it is clearly established that better and faster radar observations reduce tornado casualty rates. Furthermore, lower false alarm ratios save costs by cutting down on people's time lost when taking shelter.
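
A stylized version of the regression step, with synthetic data, shows how the two coverage parameters named above could be regressed against warning performance; the report's actual model, units, and data are more involved, and the coefficients below are invented for illustration only.

```python
# Toy geospatial-style regression: coverage parameters vs. probability of
# detection (POD). Synthetic data with made-up coefficients and noise.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
fvo = rng.uniform(0.2, 1.0, 300)    # fraction of vertical space observed
xres = rng.uniform(0.5, 4.0, 300)   # cross-range horizontal resolution (km)
pod = 0.4 + 0.4 * fvo - 0.05 * xres + rng.normal(0, 0.05, 300)

model = LinearRegression().fit(np.column_stack([fvo, xres]), pod)
print("coefficients:", model.coef_, "intercept:", model.intercept_)
```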

Guest editorial: special issue on hardware solutions for cyber security

Published in:
J. Hardw. Syst. Secur., Vol. 3, 2019, p. 199.

Summary

A cyber system could be viewed as an architecture consisting of application software, system software, and system hardware. The hardware layer, being at the foundation of the overall architecture, must be secure itself and also provide effective security features to the software layers. In order to seamlessly integrate security hardware into a system with minimal performance compromises, designers must develop and understand tangible security specifications and metrics to trade between security, performance, and cost for an optimal solution. Hardware security components, libraries, and reference architecture are increasingly important in system design and security. This special issue includes four exciting manuscripts on several aspects of developing hardware-oriented security for systems.

Comparison of two-talker attention decoding from EEG with nonlinear neural networks and linear methods

Summary

Auditory attention decoding (AAD) through a brain-computer interface has had a flowering of developments since it was first introduced by Mesgarani and Chang (2012) using electrocorticograph recordings. AAD has been pursued for its potential application to hearing-aid design, in which an attention-guided algorithm selects, from multiple competing acoustic sources, which should be enhanced for the listener and which should be suppressed. Traditionally, researchers have separated the AAD problem into two stages: reconstruction of a representation of the attended audio from neural signals, followed by determining the similarity between the candidate audio streams and the reconstruction. Here, we compare the traditional two-stage approach with a novel neural-network architecture that subsumes the explicit similarity step. We compare this new architecture against linear and non-linear (neural-network) baselines using both wet and dry electroencephalogram (EEG) systems. Our results indicate that the new architecture outperforms the baseline linear stimulus-reconstruction method, improving decoding accuracy from 66% to 81% using wet EEG and from 59% to 87% for dry EEG. Also of note was the finding that the dry EEG system can deliver comparable or even better results than the wet, despite the dry system having one third as many EEG channels as the wet. The 11-subject, wet-electrode AAD dataset for two competing, co-located talkers, the 11-subject, dry-electrode AAD dataset, and our software are available for further validation, experimentation, and modification.
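
For reference, the traditional two-stage baseline the comparison starts from can be sketched in a few lines: ridge-regress a reconstruction of the attended envelope from the EEG, then choose the talker whose envelope correlates best with the reconstruction. Shapes and signals below are synthetic placeholders; real decoders also use time-lagged EEG features.

```python
# Two-stage AAD baseline sketch: stimulus reconstruction + correlation.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
T, C = 5000, 16                      # time samples, EEG channels
eeg = rng.normal(size=(T, C))
env_attended = rng.normal(size=T)    # envelope of the attended talker
env_competing = rng.normal(size=T)   # envelope of the competing talker

decoder = Ridge(alpha=1.0).fit(eeg, env_attended)
recon = decoder.predict(eeg)

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

decision = ("attended" if corr(recon, env_attended) > corr(recon, env_competing)
            else "competing")
```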

Improving robustness to attacks against vertex classification

Published in:
15th Intl. Workshop on Mining and Learning with Graphs, 5 August 2019.

Summary

Vertex classification—the problem of identifying the class labels of nodes in a graph—has applicability in a wide variety of domains. Examples include classifying subject areas of papers in citation networks or roles of machines in a computer network. Recent work has demonstrated that vertex classification using graph convolutional networks is susceptible to targeted poisoning attacks, in which both graph structure and node attributes can be changed in an attempt to misclassify a target node. This vulnerability decreases users' confidence in the learning method and can prevent adoption in high-stakes contexts. This paper presents work in progress aiming to make vertex classification robust to these types of attacks. We investigate two aspects of this problem: (1) the classification model and (2) the method for selecting training data. Our alternative classifier is a support vector machine (with a radial basis function kernel), which is applied to an augmented node feature vector obtained by appending the node's attributes to a Euclidean vector representing the node based on the graph structure. Our alternative methods of selecting training data are (1) to select the highest-degree nodes in each class and (2) to iteratively select the node with the most neighbors minimally connected to the training set. On the datasets on which the original attack was demonstrated, we show that changing the training set can make the network much harder to attack: to maintain a given probability of attack success, the adversary must use far more perturbations, often a factor of 2–4 more than against the random training baseline. Even in cases where success is relatively easy for the attacker, we show that the classification and training alternatives allow classification performance to degrade much more gradually, with weaker incorrect predictions for the attacked nodes.
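
A hedged sketch of the alternative classifier: concatenate each node's attributes with a Euclidean structure embedding and fit an RBF-kernel SVM. The toy graph, one-hot attributes, and spectral embedding below are illustrative stand-ins for the paper's citation-network datasets and its particular graph representation.

```python
# Augmented-feature SVM sketch: node attributes + spectral structure embedding.
import numpy as np
import networkx as nx
from sklearn.svm import SVC

G = nx.karate_club_graph()
A = nx.to_numpy_array(G)
attrs = np.eye(G.number_of_nodes())          # placeholder node attributes

# Structure embedding: leading eigenvectors of the symmetric adjacency matrix.
_, vecs = np.linalg.eigh(A)
embedding = vecs[:, -8:]

X = np.hstack([attrs, embedding])
y = np.array([int(G.nodes[n]["club"] == "Officer") for n in G.nodes])
clf = SVC(kernel="rbf").fit(X, y)
```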

The Human Trafficking Technology Roadmap: A Targeted Development Strategy for the Department of Homeland Security

Summary

Human trafficking is a form of modern-day slavery that involves the use of force, fraud, or coercion for the purposes of involuntary labor and sexual exploitation. It affects tens of millions of victims worldwide and generates tens of billions of dollars in illicit profits annually. While agencies across the U.S. Government employ a diverse range of resources to combat human trafficking in the U.S. and abroad, trafficking operations remain challenging to measure, investigate, and interdict. Within the Department of Homeland Security, the Science and Technology Directorate is addressing these challenges by incorporating computational social science research into its counter-human trafficking approach. As part of this approach, the Directorate tasked an interdisciplinary team of national security researchers at the Massachusetts Institute of Technology's Lincoln Laboratory, a federally funded research and development center, to undertake a detailed examination of the human trafficking response across the Homeland Security Enterprise. The first phase of this effort was a government-wide systems analysis of major counter-trafficking thrust areas, including law enforcement and prosecution; public health and emergency medicine; victim services; and policy and legislation. The second phase built on this systems analysis to develop a human trafficking technology roadmap and implementation strategy for the Science and Technology Directorate, which is presented in this document.