Publications

Combating Misinformation: HLT Highlights from MIT Lincoln Laboratory

Published in:
Human Language Technology Conference (HLTCon), 16-18 March 2021.

Summary

Dr. Joseph Campbell shares several human language technology highlights from MIT Lincoln Laboratory, including key enabling technologies for combating misinformation by linking personas, analyzing content, and understanding human networks. Developing operationally relevant technologies requires access to corresponding data with meaningful evaluations, as Dr. Douglas Reynolds presented in his keynote. As Dr. Danelle Shah discussed in her keynote, it is crucial to develop these technologies to operate at deeper levels than the surface. Producing reliable information from the fusion of incomplete and inherently unreliable information channels is paramount. Furthermore, the dynamic misinformation environment and the coevolution of allied and adversarial methods present additional challenges.

Combating Misinformation: What HLT Can (and Can't) Do When Words Don't Say What They Mean

Published in:
Human Language Technology Conference (HLTCon), 16-18 March 2021.

Summary

Misinformation, disinformation, and “fake news” have been used as a means of influence for millennia, but the proliferation of the internet and social media in the 21st century has enabled nefarious campaigns to achieve unprecedented scale, speed, precision, and effectiveness. In the past few years, there has been significant recognition of the threats posed by malign influence operations to geopolitical relations, democratic institutions and processes, public health and safety, and more. At the same time, the digitization of communication offers tremendous opportunities for human language technologies (HLT) to observe, interpret, and understand this publicly available content. The ability to infer intent and impact, however, remains much more elusive.

The 2019 NIST Speaker Recognition Evaluation CTS Challenge

Published in:
The Speaker and Language Recognition Workshop: Odyssey 2020, 1-5 November 2020.

Summary

In 2019, the U.S. National Institute of Standards and Technology (NIST) conducted a leaderboard-style speaker recognition challenge using conversational telephone speech (CTS) data extracted from the unexposed portion of the Call My Net 2 (CMN2) corpus previously used in the 2018 Speaker Recognition Evaluation (SRE). The SRE19 CTS Challenge was organized in a similar manner to SRE18, except it offered only the open training condition. In addition, similar to the NIST i-vector challenge, the evaluation set consisted of two subsets: a progress subset and a test subset. The progress subset comprised 30% of the trials and was used to monitor progress on the leaderboard, while the remaining 70% of the trials formed the test subset, which was used to generate the official final results determined at the end of the challenge. Which subset (i.e., progress or test) a trial belonged to was unknown to challenge participants, and each system submission had to contain outputs for all trials. The CTS Challenge also served as a prerequisite for entrance to the main SRE19, whose primary task was audio-visual person recognition. A total of 67 organizations (forming 51 teams) from academia and industry participated in the CTS Challenge and submitted 1347 valid system outputs. This paper presents an overview of the evaluation and several analyses of system performance for all primary conditions in the CTS Challenge. Compared to the CTS track of SRE18, the SRE19 CTS Challenge results indicate remarkable improvements in performance, mainly attributed to 1) the availability of large amounts of in-domain development data from a large number of labeled speakers, 2) speaker representations (aka embeddings) extracted using extended and more complex end-to-end neural network frameworks, and 3) effective use of the provided large development set.
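
The hidden progress/test bookkeeping described above is easy to picture in code. Below is a minimal sketch, not NIST's scoring software: the random scores, labels, and the simple equal-error-rate scorer are illustrative assumptions (the official evaluation used detection-cost-based metrics), but it shows why a submission must cover every trial when the 30/70 partition is withheld from participants.

```python
import numpy as np

def eer(scores, labels):
    """Equal error rate: the operating point where miss and false-alarm rates cross."""
    order = np.argsort(scores)[::-1]            # trials by decreasing score
    labels = np.asarray(labels)[order]
    n_tar, n_non = labels.sum(), len(labels) - labels.sum()
    fa = np.cumsum(1 - labels) / n_non          # false-alarm rate accepting top k trials
    miss = 1.0 - np.cumsum(labels) / n_tar      # miss rate at the same thresholds
    k = np.argmin(np.abs(fa - miss))            # closest crossing point
    return (fa[k] + miss[k]) / 2

rng = np.random.default_rng(0)
n_trials = 10_000

# The 30/70 progress/test assignment is fixed but withheld from participants,
# so every submission must contain outputs for all trials.
is_progress = rng.random(n_trials) < 0.30

scores = rng.normal(size=n_trials)              # one system's outputs for ALL trials
labels = rng.integers(0, 2, size=n_trials)      # 1 = target trial, 0 = non-target

leaderboard_eer = eer(scores[is_progress], labels[is_progress])   # shown during the challenge
final_eer = eer(scores[~is_progress], labels[~is_progress])       # revealed at the end
print(f"progress EER={leaderboard_eer:.3f}  test EER={final_eer:.3f}")
```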

Using K-means in SVR-based text difficulty estimation

Published in:
8th ISCA Workshop on Speech and Language Technology in Education, SLaTE, 20-21 September 2019.

Summary

A challenge for second language learners, educators, and test creators is the identification of authentic materials at the right level of difficulty. In this work, we present an approach to automatically measure text difficulty, integrated into Auto-ILR, a web-based system that helps find text material at the right level for learners in 18 languages. The Auto-ILR subscription service scans web feeds, extracts article content, evaluates the difficulty, and notifies users of documents that match their skill level. Difficulty is measured on the standard ILR scale with language-specific support vector machine regression (SVR) models built from vectors incorporating length features, term frequencies, relative entropy, and K-means clustering.
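
As a rough illustration of the modeling recipe described above, here is a minimal sketch, not the Auto-ILR implementation: it combines K-means cluster distances over weighted term-frequency vectors with a simple length feature inside a scikit-learn SVR. The documents, ILR labels, feature choices, and hyperparameters are all toy assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

docs = [
    "the cat sat on the mat",
    "the dog ran in the park",
    "quantum entanglement defies classical locality assumptions",
    "macroeconomic policy coordination under floating exchange rates",
]
ilr_levels = np.array([1.0, 1.0, 3.0, 3.0])      # toy ILR difficulty ratings

# Weighted term-frequency vectors for each document.
X_tf = TfidfVectorizer().fit_transform(docs)

# K-means over those vectors; distances to each centroid become dense
# features summarizing where a document sits relative to the clusters.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
X_clusters = km.fit_transform(X_tf)              # shape: (n_docs, n_clusters)

# A simple length feature, then everything feeds an RBF-kernel SVR.
length = np.array([[len(d.split())] for d in docs])
X = np.hstack([X_clusters, length])

model = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
model.fit(X, ilr_levels)
print(model.predict(X))                          # predicted ILR levels
```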

The AFRL-MITLL WMT16 news-translation task systems

Published in:
Proc. First Conf. on Machine Translation, Vol. 2, 11-12 August 2016, pp. 296-302.

Summary

This paper describes the AFRL-MITLL statistical machine translation systems and the improvements that were developed during the WMT16 evaluation campaign. New techniques applied this year include Neural Machine Translation, a unique selection process for language modelling data, additional out-of-vocabulary transliteration techniques, and morphology generation.

Operational assessment of keyword search on oral history

Published in:
10th Language Resources and Evaluation Conf., LREC 2016, 23-28 May 2016.

Summary

This project assesses the resources necessary to make oral history searchable by means of automatic speech recognition (ASR). There are many inherent challenges in applying ASR to conversational speech: smaller training set sizes and varying demographics, among others. We assess the impact of dataset size, word error rate, and term-weighted value on human search capability through an information retrieval task on Mechanical Turk. We use English oral history data collected by StoryCorps, a national organization that provides all people with the opportunity to record, share, and preserve their stories, and control for a variety of demographics including age, gender, birthplace, and dialect on four different training set sizes. We show that search performance with a standard speech recognition system is comparable to that with hand-transcribed data, which is promising for increased accessibility of conversational speech and oral history archives.
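
Since word error rate is one of the factors the study varies, a short reminder of the metric may help. This is a generic Levenshtein-based WER implementation, not the project's scoring code: WER is the number of substitutions, deletions, and insertions in the best alignment, divided by the reference word count.

```python
def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j]: minimum edits to turn the first i reference words
    # into the first j hypothesis words (Levenshtein alignment).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                              # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                              # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# One deletion ("and") and one substitution ("their" -> "the") over 5 words: 0.4
print(wer("share and preserve their stories", "share preserve the stories"))
```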

A fun and engaging interface for crowdsourcing named entities

Published in:
10th Language Resources and Evaluation Conf., LREC 2016, 23-28 May 2016.

Summary

There are many current problems in natural language processing that are best solved by training algorithms on an annotated in-language, in-domain corpus. The more representative the training corpus is of the test data, the better the algorithm will perform, but also the less likely it is that such a corpus has already been annotated. Annotating corpora for natural language processing tasks is typically a time-consuming and expensive process. In this paper, we provide a case study in using crowdsourcing to curate an in-domain corpus for named entity recognition, a common problem in natural language processing. In particular, we present our use of fun, engaging user interfaces as a way to entice workers to partake in our crowdsourcing task while avoiding inflating our payments in a way that would attract more mercenary workers than conscientious ones. Additionally, we provide a survey of alternate interfaces for collecting annotations of named entities and compare our approach to those systems.

Analysis of factors affecting system performance in the ASpIRE challenge

Published in:
2015 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2015, 13-17 December 2015.

Summary

This paper presents an analysis of factors affecting system performance in the ASpIRE (Automatic Speech recognition In Reverberant Environments) challenge. In particular, overall word error rate (WER) of the solver systems is analyzed as a function of room, distance between talker and microphone, and microphone type. We also analyze speech activity detection performance of the solver systems and investigate its relationship to WER. The primary goal of the paper is to provide insight into the factors affecting system performance in the ASpIRE evaluation set across many systems given annotations and metadata that are not available to the solvers. This analysis will inform the design of future challenges and provide insight into the efficacy of current solutions addressing noisy reverberant speech in mismatched conditions.
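
A minimal sketch of the kind of factor breakdown the paper performs, with entirely hypothetical column names and numbers: given per-utterance WER joined with the metadata withheld from solvers, mean WER can be aggregated per room, talker-microphone distance, and microphone type.

```python
import pandas as pd

# Toy per-utterance results joined with the withheld metadata.
results = pd.DataFrame({
    "system":   ["A", "A", "A", "B", "B", "B"],
    "room":     ["r1", "r2", "r3", "r1", "r2", "r3"],
    "distance": ["near", "mid", "far", "near", "mid", "far"],
    "mic_type": ["lapel", "wall", "ceiling", "lapel", "wall", "ceiling"],
    "wer":      [0.28, 0.41, 0.55, 0.25, 0.38, 0.51],
})

# Mean WER per level of each factor, pooled across solver systems.
for factor in ["room", "distance", "mic_type"]:
    print(results.groupby(factor)["wer"].mean(), end="\n\n")
```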

The MITLL-AFRL IWSLT 2015 Systems

Summary

This report summarizes the MITLL-AFRL MT, ASR, and SLT systems and the experiments run using them during the 2015 IWSLT evaluation campaign. We build on the progress made last year, and additionally experimented with neural MT, unknown-word processing, and system combination. We applied these techniques to translating Chinese to English and English to Chinese. ASR systems were also improved by refining the improvements developed last year. Finally, we combined our ASR and MT systems to produce an English-to-Chinese SLT system.

The AFRL-MITLL WMT15 System: there's more than one way to decode it!

Published in:
Proc. 10th Workshop on Statistical Machine Translation, 17-18 September 2015, pp. 112-119.

Summary

This paper describes the AFRL-MITLL statistical MT systems and the improvements that were developed during the WMT15 evaluation campaign. As part of these efforts, we experimented with a number of extensions to the standard phrase-based model that improve performance on the Russian-to-English translation task, creating three submission systems with different decoding strategies. Out-of-vocabulary words were addressed with named entity postprocessing.