Publications

Advances in speaker recognition for multilingual conversational telephone speech: the JHU-MIT system for NIST SRE20 CTS challenge

Published in:
Speaker and Language Recognition Workshop, Odyssey 2022, pp. 338-345.

Summary

We present a condensed description of the joint effort of JHU-CLSP/HLTCOE and MIT-LL for NIST SRE20. NIST SRE20 CTS consisted of multilingual conversational telephone speech. The set of languages included in the evaluation was not provided, encouraging participants to develop systems robust to any language. We evaluated x-vector architectures based on ResNet, squeeze-excitation ResNets, Transformers, and EfficientNets. Though squeeze-excitation ResNets and EfficientNets provide superior performance on in-domain tasks like VoxCeleb, the regular ResNet34 was more robust in the challenge scenario; the squeeze-excitation networks over-fitted to the training data, which was mostly in English. We also proposed novel PLDA mixture and k-NN PLDA back-ends to handle the multilingual trials. The former clusters the x-vector space, expecting that each cluster will correspond to a language family. The latter trains a PLDA model adapted to each enrollment speaker using the nearest speakers, i.e., those with similar language/channel. The k-NN back-end improved Act. Cprimary (Cp) by 68% in SRE16-19 and 22% in SRE20 Progress w.r.t. a single adapted PLDA back-end. Our best single system achieved Act. Cp=0.110 in SRE20 progress. Meanwhile, our best fusion obtained Act. Cp=0.110 in the progress set (8% better than the single system) and Cp=0.087 in the eval set.
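
As a rough illustration of the k-NN back-end idea, the sketch below selects the training x-vectors nearest to an enrollment x-vector and interpolates global and local Gaussian statistics. It is a minimal stand-in under stated assumptions, not the paper's implementation: the helper name is hypothetical, and full per-speaker PLDA adaptation is simplified to interpolating means and covariances.

```python
import numpy as np

def knn_plda_stats(enroll_xvec, train_xvecs, k=500, alpha=0.5):
    """Interpolate global and k-NN-local Gaussian statistics around an
    enrollment x-vector -- a simplified stand-in for per-speaker PLDA
    adaptation (hypothetical helper, not the paper's code)."""
    # Length-normalize, as is standard before PLDA scoring.
    t = train_xvecs / np.linalg.norm(train_xvecs, axis=1, keepdims=True)
    e = enroll_xvec / np.linalg.norm(enroll_xvec)

    sims = t @ e                          # cosine similarity to enrollment
    nearest = np.argsort(-sims)[:k]       # k nearest training x-vectors,
                                          # i.e., similar language/channel
    local = t[nearest]

    # MAP-style interpolation of global and local statistics.
    mu = alpha * local.mean(0) + (1 - alpha) * t.mean(0)
    cov = (alpha * np.cov(local, rowvar=False)
           + (1 - alpha) * np.cov(t, rowvar=False))
    return mu, cov

# Toy usage with random vectors standing in for 256-dim x-vectors.
rng = np.random.default_rng(0)
mu, cov = knn_plda_stats(rng.normal(size=256), rng.normal(size=(5000, 256)))
```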

AAM-Gym: Artificial intelligence testbed for advanced air mobility

Summary

We introduce AAM-Gym, a research and development testbed for Advanced Air Mobility (AAM). AAM has the potential to revolutionize travel by reducing ground traffic and emissions through new types of aircraft, such as electric vertical take-off and landing (eVTOL) aircraft, and new advanced artificial intelligence (AI) algorithms. Validation of AI algorithms requires representative AAM scenarios, as well as a fast-time simulation testbed to evaluate their performance. Until now, no such testbed has been available for AAM to provide a common research platform for individuals in government, industry, or academia. MIT Lincoln Laboratory has developed AAM-Gym to address this gap by providing an ecosystem to develop, train, and validate new and established AI algorithms across a wide variety of AAM use cases. In this paper, we use AAM-Gym to study the performance of two reinforcement learning algorithms on one AAM use case, separation assurance in AAM corridors. The performance of the two algorithms is demonstrated with a series of metrics provided by AAM-Gym, showing the testbed's utility to AAM research.
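
Since AAM-Gym is a simulation testbed for training and validating RL algorithms, a Gym-style interaction loop is the natural way to drive it. The sketch below assumes a standard Gym API; the environment id and the random policy are placeholders, since the abstract does not specify AAM-Gym's actual interface.

```python
import gymnasium as gym  # standard Gym-style API; AAM-Gym's interface may differ

# Placeholder environment: substitute an AAM corridor separation-assurance
# task here if AAM-Gym registers one (hypothetical -- not named in the paper).
env = gym.make("CartPole-v1")

obs, info = env.reset(seed=0)
episode_return = 0.0
for _ in range(500):
    action = env.action_space.sample()  # replace with a trained RL policy
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += float(reward)
    if terminated or truncated:
        break
print(f"episode return: {episode_return:.1f}")
```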
READ LESS

Summary

We introduce AAM-Gym, a research and development testbed for Advanced Air Mobility (AAM). AAM has the potential to revolutionize travel by reducing ground traffic and emissions by leveraging new types of aircraft such as electric vertical take-off and landing (eVTOL) aircraft and new advanced artificial intelligence (AI) algorithms. Validation of...

READ MORE

Toward improving EN adoption: Bridging the gap between stated intention and actual use

Summary

As the COVID-19 pandemic swept the globe in the spring of 2020, technologists looked to enlist technology to assist public health authorities (PHAs) and help stem the tide of infections. As part of this push, experts in health care, cryptography, and other related fields developed the Private Automated Contact Tracing (PACT) protocol and related projects to assist the public health objective of slowing the spread of SARS-CoV-2 through digital contact tracing. The protocol jointly deployed by Google and Apple (Google-Apple Exposure Notifications, also known as GAEN or EN), which became the de facto standard in the U.S., employs the same features detailed by PACT. The protocol leverages smartphone Bluetooth communications to alert users of potential contact with those carrying the COVID-19 virus in a way that preserves the privacy of both the known-infected individual and the users receiving the alert. Contact tracing and subsequent personal precautions are more effective at reducing disease spread when more of the population participates, but there are known difficulties with the adoption of novel technology. To help the U.S. Centers for Disease Control and Prevention (CDC) and U.S. state-level public health teams address these difficulties, a team of staff from MIT's Lincoln Laboratory (MIT LL) and Computer Science and Artificial Intelligence Laboratory (MIT CSAIL) focused on studying user perception and information needs.

The thundering herd: Amplifying kernel interference to attack response times

Published in:
2022 IEEE 28th Real-Time and Embedded Technology and Applications Symp., RTAS, 4-6 May 2022.

Summary

Embedded and real-time systems are increasingly attached to networks. This enables broader coordination beyond the physical system, but also opens the system to attacks. The increasingly complex workloads of these systems include software of varying assurance levels, including software that might be susceptible to compromise by remote attackers. To limit the impact of compromise, μ-kernels focus on maintaining strong memory protection domains between different bodies of software, including system services. They enable limited coordination between processes through Inter-Process Communication (IPC). Real-time systems also require strong temporal guarantees for tasks and thus need temporal isolation to limit the impact of malicious software. This is challenging, as multiple client threads that use IPC to request service from a shared server will impact each other's response times. To constrain the temporal interference between threads, modern μ-kernels often build priority and budget awareness into the system. Unfortunately, this paper demonstrates that this is more challenging than previously thought. Adding priority awareness to IPC processing can lead to significant interference due to the kernel's prioritization logic. Adding budget awareness similarly creates opportunities for interference due to the budget tracking and management operations. In both situations, a Thundering Herd of malicious threads can significantly delay the activation of mission-critical tasks. The Thundering Herd effects are evaluated on seL4, and the results demonstrate that high-priority threads can be delayed by over 100,000 cycles per malicious thread. This paper reveals a challenging dilemma: the temporal protections μ-kernels add can, themselves, provide means of threatening temporal isolation. Finally, to defend the system, we identify and empirically evaluate possible mitigations, and propose an admission-control test based upon an interference-aware analysis.
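
To make the interference figure concrete, the toy check below assumes (as the abstract's per-thread figure suggests) that kernel IPC bookkeeping adds roughly linear delay per contending thread, and admits a task only if that bound fits its deadline. The constants and the linear model are illustrative sketches; the paper's actual interference-aware analysis is not reproduced here.

```python
# A toy interference-aware admission check. The constants and the linear
# model are illustrative assumptions, not the paper's analysis.

PER_THREAD_INTERFERENCE_CYCLES = 100_000  # order of magnitude from the paper

def worst_case_activation_delay(n_contending: int, base_path_cycles: int) -> int:
    """Bound the delay a high-priority task can see before activation."""
    return base_path_cycles + n_contending * PER_THREAD_INTERFERENCE_CYCLES

def admit(deadline_cycles: int, n_contending: int,
          base_path_cycles: int = 10_000) -> bool:
    """Admit the task only if the interference bound fits in its deadline."""
    return worst_case_activation_delay(n_contending, base_path_cycles) <= deadline_cycles

# Fifty Thundering Herd threads blow through a 1M-cycle deadline.
print(admit(deadline_cycles=1_000_000, n_contending=50))   # False
print(admit(deadline_cycles=1_000_000, n_contending=5))    # True
```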

Modeling probability of alert of Bluetooth low energy-based automatic exposure notifications

Published in:
MIT Lincoln Laboratory Report ACTA-4

Summary

BLEMUR, or Bluetooth Low Energy Model of User Risk, is a model of the probability of alert at a given duration and distance from an index case for a specific configuration of settings of an Exposure Notification (EN) system. The Google-Apple EN framework operates in the duration and Bluetooth Low Energy (BLE) signal attenuation domains. However, many public health definitions of "exposure" to a disease are based upon the distance between an index case and another person. To bridge the conceptual gap for public health authorities (PHAs) from the familiar distance-and-duration space to the signal attenuation-and-duration space, BLEMUR uses BLE signal attenuation as a proxy for distance between people, albeit an imprecise one. This paper discusses the EN settings that can be manipulated, the BLE data collected, how the data support a model of the relationship between measured attenuation and distance between phones, and how BLEMUR calculates the probability of alert for a distance and duration based on the settings and data.
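
A minimal sketch of the modeling idea, under assumed parameters: attenuation is related to distance by a log-distance path-loss model with shadowing, and the probability of alert is estimated by Monte Carlo over per-minute attenuation samples against EN-like threshold settings. The path-loss constants and the alert rule here are illustrative assumptions, not BLEMUR's fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

def attenuation_db(distance_m, size, n=2.0, a0=45.0, sigma=6.0):
    """Log-distance path loss with lognormal shadowing: an assumed stand-in
    for the measured attenuation-vs-distance relationship."""
    return a0 + 10 * n * np.log10(distance_m) + rng.normal(0, sigma, size)

def probability_of_alert(distance_m, duration_min, atten_threshold_db=60.0,
                         alert_minutes=15, trials=2000):
    """Monte Carlo estimate of P(alert): per trial, draw one attenuation
    sample per minute and fire the alert once enough minutes fall below
    the configured threshold (illustrative EN-like settings)."""
    alerts = 0
    for _ in range(trials):
        per_minute = attenuation_db(distance_m, size=int(duration_min))
        if np.sum(per_minute < atten_threshold_db) >= alert_minutes:
            alerts += 1
    return alerts / trials

# Closer contacts of the same duration should alert more often.
print(probability_of_alert(distance_m=2.0, duration_min=30))
print(probability_of_alert(distance_m=6.0, duration_min=30))
```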

Nearfield anechoic chamber and farfield on-site antenna calibration pattern comparison of an S-band planar phased array radar

Published in:
IEEE Annual Conf. on Wireless and Microwave Technology, WAMICON, 27-28 April 2022.

Summary

The Advanced Technology Demonstrator (ATD) is an active, S-band, dual-polarization phased array radar developed for weather sensing. The ATD is an active electronically scanned array (AESA) with a 4-m aperture comprising 4,864 individual transmit/receive (T/R) modules. The antenna was calibrated at the element, subarray, and array levels. Calibration, validation, and verification testing was completed in two main stages: first in an anechoic chamber, and second after the antenna was installed on site in its permanent location. This paper describes the procedure used to collect antenna patterns at each stage and compares three key performance metrics: beamwidth, mean-squared sidelobe level (MSSL), and integrated sidelobe level (ISL).
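
For reference, the three metrics can be computed from a measured pattern cut roughly as below. These are common textbook definitions, sketched under the assumption of a one-dimensional cut with the main lobe bounded by its first nulls; the paper's exact formulations may differ.

```python
import numpy as np

def pattern_metrics(theta_deg, gain_db):
    """3 dB beamwidth, mean-squared sidelobe level (MSSL), and integrated
    sidelobe level (ISL) from a 1-D pattern cut (textbook definitions)."""
    g = gain_db - gain_db.max()              # normalize peak to 0 dB
    peak = int(np.argmax(g))

    # 3 dB beamwidth: width of the contiguous region around the peak >= -3 dB.
    above = g >= -3.0
    left = peak
    while left > 0 and above[left - 1]:
        left -= 1
    right = peak
    while right < len(g) - 1 and above[right + 1]:
        right += 1
    beamwidth = theta_deg[right] - theta_deg[left]

    # Main lobe: region around the peak down to the first nulls.
    lin = 10 ** (g / 10)                     # power, linear scale
    i = peak
    while i > 0 and lin[i - 1] < lin[i]:
        i -= 1
    j = peak
    while j < len(lin) - 1 and lin[j + 1] < lin[j]:
        j += 1
    main = np.zeros_like(lin, dtype=bool)
    main[i:j + 1] = True

    side = lin[~main]
    mssl_db = 10 * np.log10(side.mean())                  # mean sidelobe power
    isl_db = 10 * np.log10(side.sum() / lin[main].sum())  # sidelobe/mainlobe energy
    return beamwidth, mssl_db, isl_db

# Usage on an idealized array-factor pattern.
theta = np.linspace(-90, 90, 721)
u = np.sinc(4 * np.sin(np.radians(theta)))
bw, mssl, isl = pattern_metrics(theta, 20 * np.log10(np.abs(u) + 1e-9))
print(f"beamwidth={bw:.1f} deg, MSSL={mssl:.1f} dB, ISL={isl:.1f} dB")
```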

Graph-guided network for irregularly sampled multivariate time series

Published in:
International Conference on Learning Representations, ICLR 2022.

Summary

In many domains, including healthcare, biology, and climate science, time series are irregularly sampled with varying time intervals between successive readouts and different subsets of variables (sensors) observed at different time points. Here, we introduce RAINDROP, a graph neural network that embeds irregularly sampled and multivariate time series while also learning the dynamics of sensors purely from observational data. RAINDROP represents every sample as a separate sensor graph and models time-varying dependencies between sensors with a novel message passing operator. It estimates the latent sensor graph structure and leverages the structure together with nearby observations to predict misaligned readouts. This model can be interpreted as a graph neural network that sends messages over graphs that are optimized for capturing time-varying dependencies among sensors. We use RAINDROP to classify time series and interpret temporal dynamics on three healthcare and human activity datasets. RAINDROP outperforms state-of-the-art methods by up to 11.4% (absolute F1-score points), including techniques that deal with irregular sampling using fixed discretization and set functions. RAINDROP shows superiority in diverse setups, including challenging leave-sensor-out settings.
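
As a schematic of the core idea (not RAINDROP's actual operator), the sketch below runs one message-passing step over a dense, row-normalized sensor graph and predicts a missing readout from the updated embedding; all weights are random placeholders standing in for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def message_pass(h, adj, w_msg):
    """One simplified message-passing step over a sensor graph: each sensor's
    embedding is updated from its neighbors, weighted by estimated edge
    weights. A schematic of the idea, not RAINDROP's operator."""
    messages = adj @ (h @ w_msg)            # aggregate transformed neighbor states
    return np.tanh(h + messages)            # residual update with nonlinearity

n_sensors, d = 12, 16
h = rng.normal(size=(n_sensors, d))             # per-sensor embeddings, one sample
adj = rng.uniform(size=(n_sensors, n_sensors))  # estimated (dense) sensor graph
adj /= adj.sum(1, keepdims=True)                # row-normalize edge weights
w_msg = rng.normal(scale=0.1, size=(d, d))      # "learned" message transform

h = message_pass(h, adj, w_msg)

# A missing readout for a sensor can then be predicted from its updated
# embedding, e.g., with a per-sensor linear head (hypothetical).
w_out = rng.normal(scale=0.1, size=(d,))
print(h[3] @ w_out)
```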

Preventing Kernel Hacks with HAKCs

Published in:
Network and Distributed System Security (NDSS) Symposium 2022.

Summary

Commodity operating system kernels remain monolithic for practical and historical reasons. All kernel code shares a single address space, executes with elevated processor privileges, and has largely unhindered access to all data, including data irrelevant to the completion of a specific task. Applying the principle of least privilege, which limits available resources only to those needed to perform a particular task, to compartmentalize the kernel would realize major security gains, similar to microkernels yet without the major redesign effort. Here, we introduce a compartmentalization design, called Hardware-Assisted Kernel Compartmentalization (HAKC), that approximates least-privilege separation while minimizing both developer effort and performance overhead. HAKC divides code and data into separate partitions and specifies an access policy for each partition. Data is owned by a single partition, and a partition's access-control policy is enforced at runtime, preventing unauthorized data access. When a partition needs to transfer control flow outside itself, data ownership is transferred to the target and transferred back upon return. The HAKC design allows for isolating code and data from the rest of the kernel, without utilizing any additional Trusted Computing Base while compartmentalized code is executing. Instead, HAKC relies on hardware for enforcement. Loadable kernel modules (LKMs), which dynamically load kernel code and data providing specialized functionality, are the single largest part of the Linux source base. Unfortunately, their collective size and complexity make LKMs the cause of the majority of CVEs issued for the Linux kernel. The combination of a large attack surface in kernel modules and the monolithic design of the Linux kernel makes LKMs ideal candidates for compartmentalization. To demonstrate the effectiveness of our approach, we implement HAKC in Linux v5.10 using extensions to the Arm v8.5-A ISA and compartmentalize the ipv6.ko LKM, which consists of over 55k LOC. The average overhead measured in ApacheBench tests ranged from 1.6% to 24%. Additionally, we compartmentalize the nf_tables.ko packet-filtering LKM and measure the combined impact of using both LKMs. We find a reasonable linear growth in overhead when both compartmentalized LKMs are used. Finally, we measure no significant difference in performance when using the compartmentalized ipv6.ko LKM over the unmodified LKM during real-world web browsing experiments on the Alexa Top 50 websites.
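
The boundary-crossing rule can be pictured with a small conceptual model: data has exactly one owner, the caller's access policy is checked at the transfer, and ownership moves to the callee and back on return. The Python below is purely illustrative with hypothetical names; HAKC enforces this at runtime with Arm v8.5-A hardware features, not software checks.

```python
# Conceptual model of HAKC-style compartment transitions (illustration only;
# the real mechanism is hardware enforcement, not Python objects).

class Partition:
    def __init__(self, name, may_call):
        self.name = name
        self.may_call = may_call          # access policy: callable partitions

class KernelObject:
    def __init__(self, owner):
        self.owner = owner                # data is owned by exactly one partition

def transfer_call(src, dst, obj, func):
    """Transfer control (and data ownership) across a compartment boundary,
    returning ownership to the caller afterwards."""
    if dst.name not in src.may_call:
        raise PermissionError(f"{src.name} may not call into {dst.name}")
    if obj.owner is not src:
        raise PermissionError("caller does not own the data it passes")
    obj.owner = dst                       # ownership moves to the target...
    try:
        return func(obj)
    finally:
        obj.owner = src                   # ...and moves back on return

ipv6 = Partition("ipv6", may_call={"core"})
core = Partition("core", may_call=set())
pkt = KernelObject(owner=ipv6)
transfer_call(ipv6, core, pkt, lambda o: None)   # allowed by ipv6's policy
```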

Cross-language attacks

Published in:
Network and Distributed System Security (NDSS) Symposium 2022.

Summary

Memory corruption attacks against unsafe programming languages like C/C++ have been a major threat to computer systems for multiple decades. Various sanitizers and runtime exploit mitigation techniques have been shown to only provide partial protection at best. Recently developed ‘safe’ programming languages such as Rust and Go hold the promise to change this paradigm by preventing memory corruption bugs using a strong type system and proper compile-time and runtime checks. Gradual deployment of these languages has been touted as a way of improving the security of existing applications before entire applications can be developed in safe languages. This is notable in popular applications such as Firefox and Tor. In this paper, we systematically analyze the security of multi-language applications. We show that because language safety checks in safe languages and exploit mitigation techniques applied to unsafe languages (e.g., Control-Flow Integrity) break different stages of an exploit to prevent control hijacking attacks, an attacker can carefully maneuver between the languages to mount a successful attack. In essence, we illustrate that the incompatible set of assumptions made in various languages enables attacks that are not possible in each language alone. We study different variants of these attacks and analyze Firefox to illustrate the feasibility and extent of this problem. Our findings show that gradual deployment of safe programming languages, if not done with extreme care, can indeed be detrimental to security.

System analysis for responsible design of modern AI/ML systems

Summary

The irresponsible use of ML algorithms in practical settings has received much deserved attention in recent years. We posit that a traditional system analysis perspective is needed when designing and implementing ML algorithms and systems. Such a perspective can provide a formal way of evaluating and enabling responsible ML practices. In this paper, we review components of the System Analysis methodology and highlight how they connect to and enable responsible practices of ML design.