Publications

AAM-Gym: Artificial intelligence testbed for advanced air mobility

Summary

We introduce AAM-Gym, a research and development testbed for Advanced Air Mobility (AAM). AAM has the potential to revolutionize travel by reducing ground traffic and emissions through new types of aircraft, such as electric vertical take-off and landing (eVTOL) aircraft, and new advanced artificial intelligence (AI) algorithms. Validating AI algorithms requires representative AAM scenarios, as well as a fast-time simulation testbed to evaluate their performance. Until now, no such testbed has been available to serve as a common AAM research platform for individuals in government, industry, or academia. MIT Lincoln Laboratory has developed AAM-Gym to address this gap by providing an ecosystem to develop, train, and validate new and established AI algorithms across a wide variety of AAM use cases. In this paper, we use AAM-Gym to study the performance of two reinforcement learning algorithms on one AAM use case, separation assurance in AAM corridors. The performance of the two algorithms is demonstrated based on a series of metrics provided by AAM-Gym, showing the testbed’s utility to AAM research.

Tools and practices for responsible AI engineering

Summary

Responsible Artificial Intelligence (AI), defined here as the practice of developing, evaluating, and maintaining accurate AI systems that also exhibit essential properties such as robustness and explainability, represents a multifaceted challenge that often stretches standard machine learning tooling, frameworks, and testing methods beyond their limits. In this paper, we present two new software libraries, hydra-zen and the rAI-toolbox, that address critical needs for responsible AI engineering. hydra-zen dramatically simplifies the process of making complex AI applications configurable and their behaviors reproducible. The rAI-toolbox is designed to enable methods for evaluating and enhancing the robustness of AI models in a way that is scalable and that composes naturally with other popular ML frameworks. We describe the design principles and methodologies that make these tools effective, including the use of property-based testing to bolster the reliability of the tools themselves. Finally, we demonstrate the composability and flexibility of the tools by showing how various use cases from adversarial robustness and explainable AI can be concisely implemented with familiar APIs.

AI-enabled, ultrasound-guided handheld robotic device for femoral vascular access

Summary

Hemorrhage is a leading cause of trauma death, particularly in prehospital environments when evacuation is delayed. Obtaining central vascular access to a deep artery or vein is important for administration of emergency drugs and analgesics, and rapid replacement of blood volume, as well as invasive sensing and emerging life-saving interventions. However, central access is normally performed by highly experienced critical care physicians in a hospital setting. We developed a handheld AI-enabled interventional device, AI-GUIDE (Artificial Intelligence Guided Ultrasound Interventional Device), capable of directing users with no ultrasound or interventional expertise to catheterize a deep blood vessel, with an initial focus on the femoral vein. AI-GUIDE integrates with widely available commercial portable ultrasound systems and guides a user in ultrasound probe localization, venous puncture-point localization, and needle insertion. The system performs vascular puncture robotically and incorporates a preloaded guidewire to facilitate the Seldinger technique of catheter insertion. Results from tissue-mimicking phantom and porcine studies under normotensive and hypotensive conditions provide evidence of the technique's robustness, with key performance metrics in a live porcine model including: a mean time to acquire femoral vein insertion point of 53 ± 36 s (5 users with varying experience, in 20 trials), a total time to insert catheter of 80 ± 30 s (1 user, in 6 trials), and a mean number of 1.1 (normotensive, 39 trials) and 1.3 (hypotensive, 55 trials) needle insertion attempts (1 user). These performance metrics in a porcine model are consistent with those for experienced medical providers performing central vascular access on humans in a hospital.

Principles for evaluation of AI/ML model performance and robustness, revision 1

Summary

The Department of Defense (DoD) has significantly increased its investment in the design, evaluation, and deployment of Artificial Intelligence and Machine Learning (AI/ML) capabilities to address national security needs. While there are numerous AI/ML successes in the academic and commercial sectors, many of these systems have also been shown to be brittle and nonrobust. In a complex and ever-changing national security environment, it is vital that the DoD establish a sound and methodical process to evaluate the performance and robustness of AI/ML models before these new capabilities are deployed to the field. Without an effective evaluation process, the DoD may deploy AI/ML models that are assumed to be effective given limited evaluation metrics but actually have poor performance and robustness on operational data. Poor evaluation practices lead to loss of trust in AI/ML systems by model operators and to more frequent, often costly, design updates needed to address the evolving security environment. In contrast, an effective evaluation process can drive the design of more resilient capabilities, flag potential limitations of models before they are deployed, and build operator trust in AI/ML systems.
This paper reviews the AI/ML development process, highlights common best practices for AI/ML model evaluation, and makes the following recommendations to DoD evaluators to ensure the deployment of robust AI/ML capabilities for national security needs:
- Develop testing datasets with sufficient variation and number of samples to effectively measure the expected performance of the AI/ML model on future (unseen) data once deployed.
- Maintain separation between data used for design and evaluation (i.e., the test data is not used to design the AI/ML model or train its parameters) in order to ensure an honest and unbiased assessment of the model's capability.
- Evaluate performance given small perturbations and corruptions to data inputs to assess the smoothness of the AI/ML model and identify potential vulnerabilities.
- Evaluate performance on samples from data distributions that are shifted from the assumed distribution that was used to design the AI/ML model to assess how the model may perform on operational data that may differ from the training data.
By following the recommendations for evaluation presented in this paper, the DoD can take full advantage of the AI/ML revolution, delivering robust capabilities that maintain operational feasibility over longer periods of time, and increase warfighter confidence in AI/ML systems.
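The evaluation recommendations in this summary can be illustrated with a small, self-contained sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the synthetic two-class data, the nearest-centroid "model", the function names, and the noise and shift magnitudes. It shows only the shape of the workflow: fit on training data alone, then score the same model on a held-out test set, on perturbed test inputs, and on a shifted distribution.

```python
import random

def make_data(n, rng, offset=0.0):
    """Synthetic 2-D points from two Gaussian classes (hypothetical data).
    `offset` moves both class means to mimic a distribution shift."""
    data = []
    for _ in range(n):
        label = rng.randrange(2)
        center = (2.0 if label else -2.0) + offset
        data.append(((rng.gauss(center, 1.0), rng.gauss(0.0, 1.0)), label))
    return data

def fit_centroids(train):
    """Toy stand-in for a model: the per-class mean of the training points."""
    sums = {0: [0.0, 0.0, 0], 1: [0.0, 0.0, 0]}
    for (x, y), label in train:
        sums[label][0] += x
        sums[label][1] += y
        sums[label][2] += 1
    return {k: (sx / n, sy / n) for k, (sx, sy, n) in sums.items()}

def predict(model, point):
    """Assign the class whose centroid is nearest to the point."""
    return min(model, key=lambda k: (point[0] - model[k][0]) ** 2
                                    + (point[1] - model[k][1]) ** 2)

def accuracy(model, data):
    return sum(predict(model, p) == label for p, label in data) / len(data)

def perturb(data, sigma, rng):
    """Add small Gaussian noise to every input; labels are unchanged."""
    return [((x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)), label)
            for (x, y), label in data]

def evaluate(seed=0, n=500):
    rng = random.Random(seed)
    train = make_data(n, rng)   # used only to fit the model
    test = make_data(n, rng)    # held out: never seen during fitting
    model = fit_centroids(train)
    return {
        "clean": accuracy(model, test),
        "perturbed": accuracy(model, perturb(test, 0.5, rng)),
        "shifted": accuracy(model, make_data(n, rng, offset=1.5)),
    }

if __name__ == "__main__":
    for name, score in evaluate().items():
        print(f"{name}: {score:.3f}")
```

Reporting the three scores side by side makes a gap between in-distribution and shifted performance visible before deployment, which is the point of the last two recommendations.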

Speaker separation in realistic noise environments with applications to a cognitively-controlled hearing aid

Summary

Future wearable technology may provide for enhanced communication in noisy environments and for the ability to pick out a single talker of interest in a crowded room simply by the listener shifting their attentional focus. Such a system relies on two components, speaker separation and decoding the listener's attention to acoustic streams in the environment. To address the former, we present a system for joint speaker separation and noise suppression, referred to as the Binaural Enhancement via Attention Masking Network (BEAMNET). The BEAMNET system is an end-to-end neural network architecture based on self-attention. Binaural input waveforms are mapped to a joint embedding space via a learned encoder, and separate multiplicative masking mechanisms are included for noise suppression and speaker separation. Pairs of output binaural waveforms are then synthesized using learned decoders, each capturing a separated speaker while maintaining spatial cues. A key contribution of BEAMNET is that the architecture contains a separation path, an enhancement path, and an autoencoder path. This paper proposes a novel loss function which simultaneously trains these paths, so that disabling the masking mechanisms during inference causes BEAMNET to reconstruct the input speech signals. This allows dynamic control of the level of suppression applied by BEAMNET via a minimum gain level, which is not possible in other state-of-the-art approaches to end-to-end speaker separation. This paper also proposes a perceptually-motivated waveform distance measure. Using objective speech quality metrics, the proposed system is demonstrated to perform well at separating two equal-energy talkers, even in high levels of background noise. Subjective testing shows an improvement in speech intelligibility across a range of noise levels, for signals with artificially added head-related transfer functions and background noise. 
Finally, when used as part of an auditory attention decoder (AAD) system using existing electroencephalogram (EEG) data, BEAMNET is found to maintain the decoding accuracy achieved with ideal speaker separation, even in severe acoustic conditions. These results suggest that this enhancement system is highly effective at decoding auditory attention in realistic noise environments, and could possibly lead to improved speech perception in a cognitively controlled hearing aid.

Information Aware max-norm Dirichlet networks for predictive uncertainty estimation

Published in:
Neural Netw., Vol. 135, 2021, pp. 105–114.

Summary

Precise estimation of uncertainty in predictions for AI systems is a critical factor in ensuring trust and safety. Deep neural networks trained with conventional methods are prone to over-confident predictions. In contrast to Bayesian neural networks that learn approximate distributions on weights to infer prediction confidence, we propose a novel method, Information Aware Dirichlet networks, that learns an explicit Dirichlet prior distribution on predictive distributions by minimizing a bound on the expected max norm of the prediction error and penalizing information associated with incorrect outcomes. Properties of the new cost function are derived to indicate how improved uncertainty estimation is achieved. Experiments using real datasets show that our technique outperforms, by a large margin, state-of-the-art neural networks for estimating within-distribution and out-of-distribution uncertainty, and detecting adversarial examples.

Adaptive stress testing: finding likely failure events with reinforcement learning

Published in:
J. Artif. Intell. Res., Vol. 69, 2020, pp. 1165-1201.

Summary

Finding the most likely path to a set of failure states is important to the analysis of safety-critical systems that operate over a sequence of time steps, such as aircraft collision avoidance systems and autonomous cars. In many applications such as autonomous driving, failures cannot be completely eliminated due to the complex stochastic environment in which the system operates. As a result, safety validation is concerned not only with whether a failure can occur, but also with discovering which failures are most likely to occur. This article presents adaptive stress testing (AST), a framework for finding the most likely path to a failure event in simulation. We consider a general black-box setting for partially observable and continuous-valued systems operating in an environment with stochastic disturbances. We formulate the problem as a Markov decision process and use reinforcement learning to optimize it. The approach is simulation-based and does not require internal knowledge of the system, making it suitable for black-box testing of large systems. We present different formulations depending on whether the state is fully observable or partially observable. In the latter case, we present a modified Monte Carlo tree search algorithm that only requires access to the pseudorandom number generator of the simulator to overcome partial observability. We also present an extension of the framework, called differential adaptive stress testing (DAST), that can find failures that occur in one system but not in another. This type of differential analysis is useful in applications such as regression testing, where we are concerned with finding areas of relative weakness compared to a baseline. We demonstrate the effectiveness of the approach on an aircraft collision avoidance application, where a prototype aircraft collision avoidance system is stress tested to find the most likely scenarios of near mid-air collision.
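To make the idea of searching for the most likely failure path concrete, the following is a minimal sketch under strong simplifying assumptions. The simulator is a hypothetical one-dimensional random walk, and the search is plain random sampling of simulator seeds, used here as a stand-in for the reinforcement learning and Monte Carlo tree search described in the article. The names `simulate` and `most_likely_failure` are illustrative only. The one feature shared with the summary is that the search treats the simulator as a black box and controls it solely through its pseudorandom seed.

```python
import math
import random

def simulate(seed, steps=5, threshold=3.0):
    """Hypothetical black-box simulator: a random walk driven by N(0, 1)
    disturbances. A failure occurs if the state reaches `threshold`.
    Returns (failed, log-likelihood of the disturbances that were drawn)."""
    rng = random.Random(seed)
    x, logp = 0.0, 0.0
    for _ in range(steps):
        d = rng.gauss(0.0, 1.0)
        # Log-density of a standard normal disturbance.
        logp += -0.5 * d * d - 0.5 * math.log(2.0 * math.pi)
        x += d
        if x >= threshold:
            return True, logp
    return False, logp

def most_likely_failure(n_trials=2000, search_seed=0):
    """Random search over seeds: keep the failing run whose disturbance
    sequence has the highest log-likelihood. Returns (seed, logp) or None."""
    search = random.Random(search_seed)
    best = None
    for _ in range(n_trials):
        seed = search.randrange(2 ** 32)
        failed, logp = simulate(seed)
        if failed and (best is None or logp > best[1]):
            best = (seed, logp)
    return best

if __name__ == "__main__":
    print(most_likely_failure())
```

Because a run is fully determined by its seed, the returned seed can be replayed with `simulate(seed)` to reproduce the failure path for inspection.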

Ultrasound and artificial intelligence

Published in:
Chapter 8 in Machine Learning in Cardiovascular Medicine, 2020, pp. 177-210.

Summary

Compared to other major medical imaging modalities such as X-ray, computed tomography (CT), and magnetic resonance imaging, medical ultrasound (US) has unique attributes that make it the preferred modality for many clinical applications. In particular, US is nonionizing, portable, and provides real-time imaging, with adequate spatial and depth resolution to visualize tissue dynamics. The ability to measure Doppler information is also important, particularly for measuring blood flows. The small size of US transducers is a key attribute for intravascular applications. In addition, accessibility has been increased with the use of portable US, which continues to move toward a smaller footprint and lower cost. Nowadays, some US probes can even be directly connected to a phone or tablet. On the other hand, US also has unique challenges, particularly in that image quality is highly dependent on the operator’s skill in acquiring images based on the proper position, orientation, and probe pressure. Additional challenges that further require operator skill include the presence of noise, artifacts, limited field of view, difficulty in imaging structures behind bone and air, and device variability across manufacturers. Sonographers become highly proficient through extensive training and long experience, but high intra- and interobserver variability remains. This skill dependence has limited the wider use of US by healthcare providers who are not US imaging specialists. Recent advances in machine learning (ML) have been increasingly applied to medical US (Brattain, Telfer, Dhyani, Grajo, & Samir, 2018), with a goal of reducing intra- and interobserver variability as well as interpretation time. As progress toward these goals is made, US use by nonspecialists is expected to proliferate, including nurses at the bedside or medics in the field. The acceleration in ML applications for medical US can be seen from the increasing number of publications (Fig. 8.1) and Food and Drug Administration (FDA) approvals (Table 8.1) in the past few years. Fig. 8.1 shows that cardiovascular applications (spanning the heart, brain, and vessels) have received the most attention, compared to other organs. Table 8.1 shows that the pace of FDA clearance for products that combine artificial intelligence (AI) and US is accelerating. Of note, many of the products have been approved over the last couple of years. Companies such as Butterfly Network (Guilford, CT) have also demonstrated AI-driven applications for portable ultrasound, and more FDA clearances are expected to be published.
The goals of this chapter are to highlight the recent progress, as well as the current challenges and future opportunities. Specifically, this chapter addresses topics such as the following: (1) what is the current state of machine learning for medical US applications, both in research and commercially; (2) what applications are receiving the most attention and have performance improvements been quantified; (3) how do ML solutions fit in an overall workflow; and (4) what open-source datasets are available for the broader community to contribute to progress in this field. The focus is on cardiovascular applications (Section Cardiovascular/echocardiography), but common themes and differences for other applications for medical US are also summarized (Section Breast, liver, and thyroid ultrasound). A discussion is offered in the Discussion and outlook section.

A multi-task LSTM framework for improved early sepsis prediction

Summary

Early detection of sepsis, a high-mortality clinical condition, is important for improving patient outcomes. The performance of conventional deep learning methods degrades quickly as predictions are made several hours prior to the clinical definition. We adopt recurrent neural networks (RNNs) to improve early prediction of the onset of sepsis using time series of physiological measurements. Furthermore, physiological data is often missing and imputation is necessary. The absence of data may itself carry information, since it can arise from decisions made by clinical professionals. Incorporating the missing-data patterns into the learning process can further guide how much trust to place in imputed values. We propose a new multi-task LSTM model that takes informative missingness into account during training and thereby effectively attributes trust to temporal measurements. Experimental results demonstrate that our method outperforms conventional CNN and LSTM models on the PhysioNet-2019 CiC early sepsis prediction challenge in terms of area under the receiver-operating and precision-recall curves, and further improves the calibration of prediction scores.

GraphChallenge.org sparse deep neural network performance [e-print]

Summary

The MIT/IEEE/Amazon GraphChallenge.org encourages community approaches to developing new solutions for analyzing graphs and sparse data. Sparse AI analytics present unique scalability difficulties. The Sparse Deep Neural Network (DNN) Challenge draws upon prior challenges from machine learning, high performance computing, and visual analytics to create a challenge that is reflective of emerging sparse AI systems. The sparse DNN challenge is based on a mathematically well-defined DNN inference computation and can be implemented in any programming environment. In 2019, several sparse DNN challenge submissions were received from a wide range of authors and organizations. This paper presents a performance analysis of the best-performing submissions. These submissions show that their state-of-the-art sparse DNN execution time, T_DNN, is a strong function of the number of DNN operations performed, N_op. The sparse DNN challenge provides a clear picture of current sparse DNN systems and underscores the need for new innovations to achieve high performance on very large sparse DNNs.
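As a rough illustration of what a sparse DNN inference computation and an operation count can look like, here is a small sketch in plain Python. It is not the challenge's reference implementation. The layer rule used here (sparse matrix product, a constant bias added to nonzero products, ReLU, and clamping to an upper bound `y_max`) is an assumption, as are the dict-of-dicts storage and the function names. The operation counter simply tallies multiply-adds and may differ from how the paper defines N_op.

```python
def sparse_layer(Y, W, bias, y_max=32.0):
    """One layer on sparse matrices stored as {row: {col: value}}.
    Computes clamp(ReLU(Y @ W + bias)) and counts multiply-adds.
    The bias is applied only where the product has an entry (assumption)."""
    out, ops = {}, 0
    for i, row in Y.items():
        acc = {}
        for k, y in row.items():
            for j, w in W.get(k, {}).items():
                acc[j] = acc.get(j, 0.0) + y * w
                ops += 1
        # Apply bias, ReLU, and the upper clamp; drop zeros to stay sparse.
        kept = {}
        for j, z in acc.items():
            v = min(max(z + bias, 0.0), y_max)
            if v > 0.0:
                kept[j] = v
        if kept:
            out[i] = kept
    return out, ops

def sparse_dnn(Y0, layers, bias, y_max=32.0):
    """Run all layers in sequence; return final activations and total ops."""
    Y, total_ops = Y0, 0
    for W in layers:
        Y, ops = sparse_layer(Y, W, bias, y_max)
        total_ops += ops
    return Y, total_ops

if __name__ == "__main__":
    Y0 = {0: {0: 1.0, 1: 2.0}}                 # one input with two features
    W1 = {0: {0: 1.0}, 1: {0: 1.0, 1: -1.0}}   # hypothetical layer weights
    W2 = {0: {1: 20.0}}
    print(sparse_dnn(Y0, [W1, W2], bias=-0.5))  # → ({0: {1: 32.0}}, 4)
```

Timing a function like `sparse_dnn` against the operation count it returns is one simple way to examine how execution time scales with the amount of work, in the spirit of the T_DNN versus N_op relationship the summary describes.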