Publications

Principles for evaluation of AI/ML model performance and robustness, revision 1

Summary

The Department of Defense (DoD) has significantly increased its investment in the design, evaluation, and deployment of Artificial Intelligence and Machine Learning (AI/ML) capabilities to address national security needs. While there are numerous AI/ML successes in the academic and commercial sectors, many of these systems have also been shown to be brittle and nonrobust. In a complex and ever-changing national security environment, it is vital that the DoD establish a sound and methodical process to evaluate the performance and robustness of AI/ML models before these new capabilities are deployed to the field. Without an effective evaluation process, the DoD may deploy AI/ML models that are assumed to be effective given limited evaluation metrics but actually have poor performance and robustness on operational data. Poor evaluation practices lead to loss of trust in AI/ML systems by model operators and more frequent--often costly--design updates needed to address the evolving security environment. In contrast, an effective evaluation process can drive the design of more resilient capabilities, flag potential limitations of models before they are deployed, and build operator trust in AI/ML systems.
This paper reviews the AI/ML development process, highlights common best practices for AI/ML model evaluation, and makes the following recommendations to DoD evaluators to ensure the deployment of robust AI/ML capabilities for national security needs:
- Develop testing datasets with sufficient variation and number of samples to effectively measure the expected performance of the AI/ML model on future (unseen) data once deployed.
- Maintain separation between data used for design and evaluation (i.e., the test data is not used to design the AI/ML model or train its parameters) in order to ensure an honest and unbiased assessment of the model's capability.
- Evaluate performance given small perturbations and corruptions to data inputs to assess the smoothness of the AI/ML model and identify potential vulnerabilities.
- Evaluate performance on samples from data distributions that are shifted from the assumed distribution that was used to design the AI/ML model to assess how the model may perform on operational data that may differ from the training data.
By following the recommendations for evaluation presented in this paper, the DoD can fully take advantage of the AI/ML revolution, delivering robust capabilities that maintain operational feasibility over longer periods of time, and increase warfighter confidence in AI/ML systems.
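
The recommendations above can be illustrated with a minimal sketch. This is not the paper's evaluation procedure: the one-dimensional synthetic data, the toy threshold "model", and the helper names (split_design_test, fit_threshold, perturb, shift) are all assumptions made for illustration. It keeps design and test data disjoint, then scores the same model on clean, slightly perturbed, and distribution-shifted test inputs.

```python
import random


def split_design_test(samples, test_fraction=0.3, seed=0):
    """Shuffle once and return disjoint (design, test) lists."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(design):
    """Toy 'training': threshold halfway between the two class means."""
    xs0 = [x for x, y in design if y == 0]
    xs1 = [x for x, y in design if y == 1]
    return (sum(xs0) / len(xs0) + sum(xs1) / len(xs1)) / 2


def accuracy(threshold, samples):
    """Fraction of (x, label) pairs classified correctly by x > threshold."""
    return sum((x > threshold) == (y == 1) for x, y in samples) / len(samples)


def perturb(samples, scale, seed=1):
    """Add small uniform noise in [-scale, scale] to every input."""
    rng = random.Random(seed)
    return [(x + rng.uniform(-scale, scale), y) for x, y in samples]


def shift(samples, offset):
    """Simulate a distribution shift by translating every input."""
    return [(x + offset, y) for x, y in samples]


# Synthetic two-class data (illustrative only, not from the paper).
rng = random.Random(42)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(200)]
data += [(rng.gauss(3.0, 1.0), 1) for _ in range(200)]

# The test split is never seen by fit_threshold.
design, test = split_design_test(data)
threshold = fit_threshold(design)

print("clean test accuracy:    ", accuracy(threshold, test))
print("perturbed test accuracy:", accuracy(threshold, perturb(test, 0.2)))
print("shifted test accuracy:  ", accuracy(threshold, shift(test, 1.5)))
```

Comparing the three numbers gives a rough picture of how much the clean test score overstates performance once inputs are corrupted or drawn from a shifted distribution.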

Multimodal representation learning via maximization of local mutual information [e-print]

Published in:
Intl. Conf. on Medical Image Computing and Computer Assisted Intervention, MICCAI, 27 September-1 October 2021.

Summary

We propose and demonstrate a representation learning approach by maximizing the mutual information between local features of images and text. The goal of this approach is to learn useful image representations by taking advantage of the rich information contained in the free text that describes the findings in the image. Our method learns image and text encoders by encouraging the resulting representations to exhibit high local mutual information. We make use of recent advances in mutual information estimation with neural network discriminators. We argue that, typically, the sum of local mutual information is a lower bound on the global mutual information. Our experimental results in the downstream image classification tasks demonstrate the advantages of using local features for image-text representation learning.

Learning emergent discrete message communication for cooperative reinforcement learning

Published in:
37th Conf. on Uncertainty in Artificial Intelligence, UAI 2021, early access, 26-30 July 2021.

Summary

Communication is an important factor that enables agents to work cooperatively in multi-agent reinforcement learning (MARL). Most previous work uses continuous message communication, whose high representational capacity comes at the expense of interpretability. Allowing agents to learn their own discrete message communication protocol, emerging from a variety of domains, can increase interpretability for human designers and other agents. This paper proposes a method to generate discrete messages analogous to human languages and to achieve communication through a broadcast-and-listen mechanism based on self-attention. We show that discrete message communication has performance comparable to continuous message communication but with a much smaller vocabulary size. Furthermore, we propose an approach that allows humans to interactively send discrete messages to agents.

Beyond expertise and roles: a framework to characterize the stakeholders of interpretable machine learning and their needs

Published in:
Proc. Conf. on Human Factors in Computing Systems, 8-13 May 2021, article no. 74.

Summary

To ensure accountability and mitigate harm, it is critical that diverse stakeholders can interrogate black-box automated systems and find information that is understandable, relevant, and useful to them. In this paper, we eschew prior expertise- and role-based categorizations of interpretability stakeholders in favor of a more granular framework that decouples stakeholders' knowledge from their interpretability needs. We characterize stakeholders by their formal, instrumental, and personal knowledge and how it manifests in the contexts of machine learning, the data domain, and the general milieu. We additionally distill a hierarchical typology of stakeholder needs that distinguishes higher-level domain goals from lower-level interpretability tasks. In assessing the descriptive, evaluative, and generative powers of our framework, we find our more nuanced treatment of stakeholders reveals gaps and opportunities in the interpretability literature, adds precision to the design and comparison of user studies, and facilitates a more reflexive approach to conducting this research.

Automatic detection of influential actors in disinformation networks

Summary

The weaponization of digital communications and social media to conduct disinformation campaigns at immense scale, speed, and reach presents new challenges to identify and counter hostile influence operations (IO). This paper presents an end-to-end framework to automate detection of disinformation narratives, networks, and influential actors. The framework integrates natural language processing, machine learning, graph analytics, and a novel network causal inference approach to quantify the impact of individual actors in spreading IO narratives. We demonstrate its capability on real-world hostile IO campaigns with Twitter datasets collected during the 2017 French presidential elections, and known IO accounts disclosed by Twitter. Our system detects IO accounts with 96% precision, 79% recall, and 96% area-under-the-PR-curve, maps out salient network communities, and discovers high-impact accounts that escape the lens of traditional impact statistics based on activity counts and network centrality. Results are corroborated with independent sources of known IO accounts from U.S. Congressional reports, investigative journalism, and IO datasets provided by Twitter.
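
For readers less familiar with the metrics quoted above, the following sketch shows how precision, recall, and an area-under-the-PR-curve estimate are commonly computed. The labels and scores are made up, and average precision is used here as one standard estimate of the area; the abstract does not say how the authors computed theirs.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = IO account, 0 = other)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(y_true, scores):
    """Step-wise estimate of the area under the precision-recall curve."""
    n_pos = sum(y_true)
    if n_pos == 0:
        return 0.0
    # Rank accounts from highest to lowest classifier score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            tp += 1
            total += tp / rank  # precision at this recall step
    return total / n_pos


# Hypothetical labels and classifier scores, for illustration only.
labels = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
predictions = [1 if s >= 0.5 else 0 for s in scores]

precision, recall = precision_recall(labels, predictions)
print(precision, recall, average_precision(labels, scores))
```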

Ultrasound diagnosis of COVID-19: robustness and explainability

Published in:
arXiv:2012.01145v1 [eess.IV]

Summary

Diagnosis of COVID-19 at point of care is vital to the containment of the global pandemic. Point-of-care ultrasound (POCUS) provides rapid imagery of lungs to detect COVID-19 in patients in a repeatable and cost-effective way. Previous work has used public datasets of POCUS videos to train an AI model for diagnosis that obtains high sensitivity. Because this is a high-stakes application, we propose the use of robust and explainable techniques. We demonstrate experimentally that robust models have more stable predictions and offer improved interpretability. A framework of contrastive explanations based on adversarial perturbations is used to explain model predictions in a way that aligns with human visual perception.

Ultrasound and artificial intelligence

Published in:
Chapter 8 in Machine Learning in Cardiovascular Medicine, 2020, pp. 177-210.

Summary

Compared to other major medical imaging modalities such as X-ray, computed tomography (CT), and magnetic resonance imaging, medical ultrasound (US) has unique attributes that make it the preferred modality for many clinical applications. In particular, US is nonionizing, portable, and provides real-time imaging, with adequate spatial and depth resolution to visualize tissue dynamics. The ability to measure Doppler information is also important, particularly for measuring blood flows. The small size of US transducers is a key attribute for intravascular applications. In addition, accessibility has been increased with the use of portable US, which continues to move toward a smaller footprint and lower cost. Nowadays, some US probes can even be directly connected to a phone or tablet. On the other hand, US also has unique challenges, particularly in that image quality is highly dependent on the operator’s skill in acquiring images based on the proper position, orientation, and probe pressure. Additional challenges that further require operator skill include the presence of noise, artifacts, limited field of view, difficulty in imaging structures behind bone and air, and device variability across manufacturers. Sonographers become highly proficient through extensive training and long experience, but high intra- and interobserver variability remains. This skill dependence has limited the wider use of US by healthcare providers who are not US imaging specialists. Recent advances in machine learning (ML) have been increasingly applied to medical US (Brattain, Telfer, Dhyani, Grajo, & Samir, 2018), with a goal of reducing intra- and interobserver variability as well as interpretation time. As progress toward these goals is made, US use by nonspecialists is expected to proliferate, including nurses at the bedside or medics in the field. The acceleration in ML applications for medical US can be seen from the increasing number of publications (Fig. 8.1) and Food and Drug Administration (FDA) approvals (Table 8.1) in the past few years. Fig. 8.1 shows that cardiovascular applications (spanning the heart, brain and vessels) have received the most attention, compared to other organs. Table 8.1 shows that the pace of US FDA-cleared artificial intelligence (AI) products that combine AI and ultrasound is accelerating. Of note, many of the products have been approved over the last couple of years. Companies such as Butterfly Network (Guilford, CT) have also demonstrated AI-driven applications for portable ultrasound and more FDA clearances are expected to be published. The goals of this chapter are to highlight the recent progress, as well as the current challenges and future opportunities. Specifically, this chapter addresses topics such as the following: (1) what is the current state of machine learning for medical US application, both in research and commercially; (2) what applications are receiving the most attention and have performance improvements been quantified; (3) how do ML solutions fit in an overall workflow; and (4) what open-source datasets are available for the broader community to contribute to progress in this field. The focus is on cardiovascular applications (Section Cardiovascular/echocardiography), but common themes and differences for other applications for medical US are also summarized (Section Breast, liver, and thyroid ultrasound). A discussion is offered in the Discussion and outlook section.

Failure prediction by confidence estimation of uncertainty-aware Dirichlet networks

Published in:
https://arxiv.org/abs/2010.09865

Summary

Reliably assessing model confidence in deep learning and predicting errors likely to be made are key elements in providing safety for model deployment, in particular for applications with dire consequences. In this paper, it is first shown that uncertainty-aware deep Dirichlet neural networks provide an improved separation between the confidence of correct and incorrect predictions in the true class probability (TCP) metric. Second, as the true class is unknown at test time, a new criterion is proposed for learning the true class probability by matching prediction confidence scores while taking imbalance and TCP constraints into account for correct predictions and failures. Experimental results show our method improves upon the maximum class probability (MCP) baseline and predicted TCP for standard networks on several image classification tasks with various network architectures.
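
The two confidence scores named above can be made concrete with a small sketch. It shows only the definitions of MCP and TCP on a hypothetical three-class example, not the learning criterion proposed in the paper. The concentration parameters are invented, and the class probabilities are taken as the mean of the Dirichlet distribution, which is one common way to read out a Dirichlet network's prediction.

```python
def dirichlet_mean(alpha):
    """Expected class probabilities under Dirichlet(alpha): alpha_i / sum(alpha)."""
    total = sum(alpha)
    return [a / total for a in alpha]


def max_class_probability(probs):
    """MCP: confidence of the predicted (arg-max) class."""
    return max(probs)


def true_class_probability(probs, true_class):
    """TCP: probability assigned to the true class (needs the label)."""
    return probs[true_class]


# Hypothetical network output for one image whose true class is 1.
alpha = [8.0, 1.0, 1.0]
probs = dirichlet_mean(alpha)

# The model predicts class 0 with high MCP, yet TCP is low: a confident failure.
print("MCP:", max_class_probability(probs))
print("TCP:", true_class_probability(probs, true_class=1))
```

Because TCP requires the label, it cannot be computed directly at test time, which is why the abstract describes learning to predict it instead.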

Image processing pipeline for liver fibrosis classification using ultrasound shear wave elastography

Published in:
Ultrasound in Med. & Biol., Vol. 46, No. 10, October 2020, pp. 2667-2676.

Summary

The purpose of this study was to develop an automated method for classifying liver fibrosis stage >=F2 based on ultrasound shear wave elastography (SWE) and to assess the system's performance in comparison with a reference manual approach. The reference approach consists of manually selecting a region of interest from each of eight or more SWE images, computing the mean tissue stiffness within each of the regions of interest and computing a resulting stiffness value as the median of the means. The 527-subject database consisted of 5526 SWE images and pathologist-scored biopsies, with data collected from a single system at a single site. The automated method integrates three modules that assess SWE image quality, select a region of interest from each SWE measurement and perform machine learning-based, multi-image SWE classification for fibrosis stage >=F2. Several classification methods were developed and tested using fivefold cross-validation with training, validation and test sets partitioned by subject. Performance metrics were area under receiver operating characteristic curve (AUROC), specificity at 95% sensitivity and number of SWE images required. The final automated method yielded an AUROC of 0.93 (95% confidence interval: 0.90-0.94) versus 0.69 (95% confidence interval: 0.65-0.72) for the reference method, 71% specificity with 95% sensitivity versus 5% and four images per decision versus eight or more. In conclusion, the automated method reported in this study significantly improved the accuracy for >=F2 classification of SWE measurements as well as reduced the number of measurements needed, which has the potential to reduce clinical workflow.
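
Two computations described above lend themselves to a short sketch: the reference method's median-of-means stiffness value, and the "specificity at 95% sensitivity" metric. The stiffness readings, scores, and function names below are hypothetical, and the thresholding rule (highest cut-off that still reaches the target sensitivity) is an assumption about how such a metric is typically evaluated, not the study's code.

```python
from statistics import mean, median


def median_of_means(roi_values):
    """Mean stiffness per region of interest, then the median of those means."""
    return median([mean(values) for values in roi_values])


def specificity_at_sensitivity(pos_scores, neg_scores, target=0.95):
    """Return (threshold, sensitivity, specificity) at the highest threshold
    whose sensitivity reaches the target; score >= threshold means positive."""
    for t in sorted(set(pos_scores), reverse=True):
        sensitivity = sum(s >= t for s in pos_scores) / len(pos_scores)
        if sensitivity >= target:
            specificity = sum(s < t for s in neg_scores) / len(neg_scores)
            return t, sensitivity, specificity
    raise ValueError("pos_scores must not be empty")


# Hypothetical stiffness readings from three regions of interest.
rois = [[1.0, 2.0, 3.0], [4.0, 4.0, 4.0], [10.0, 20.0, 30.0]]
print(median_of_means(rois))

# Hypothetical classifier scores for >=F2 (positive) and <F2 (negative) subjects.
positives = [5.0, 6.0, 7.0, 8.0]
negatives = [1.0, 2.0, 5.5, 3.0]
print(specificity_at_sensitivity(positives, negatives))
```

Fixing sensitivity first and then reporting specificity reflects a screening-style priority: few >=F2 cases are missed, and the metric shows how many <F2 cases are correctly ruled out at that operating point.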

Towards a distributed framework for multi-agent reinforcement learning research

Summary

Some of the most important publications in deep reinforcement learning over the last few years have been fueled by access to massive amounts of computation through large-scale distributed systems. The success of these approaches in achieving human-expert-level performance on several complex video-game environments has motivated further exploration into the limits of these approaches as computation increases. In this paper, we present a distributed RL training framework designed for supercomputing infrastructures such as the MIT SuperCloud. We review a collection of challenging learning environments, such as Google Research Football, StarCraft II, and Multi-Agent Mujoco, which are at the frontier of reinforcement learning research. We provide results on these environments that illustrate the current state of the field on these problems. Finally, we quantify and discuss the computational requirements for performing RL research by enumerating all experiments performed on these environments.