Publications

Refine Results

(Filters Applied) Clear All

Geographic source estimation using airborne plant environmental DNA in dust

Summary

Information obtained from the analysis of dust, particularly biological particles such as pollen, plant parts, and fungal spores, has great utility in forensic geolocation. As an alternative to manual microscopic analysis, we developed a pipeline that utilizes the environmental DNA (eDNA) from plants in dust samples to estimate previous sample location(s). The species of plant-derived eDNA within dust samples were identified using metabarcoding and their geographic distributions were then derived from occurrence records in the USGS Biodiversity in Service of Our Nation (BISON) database. The distributions for all plant species identified in a sample were used to generate a probabilistic estimate of the sample source. With settled dust collected at four U.S. sites over a 15-month period, we demonstrated positive regional geolocation (within 600 km2 of the collection point) with 47.6% (20 of 42) of the samples analyzed. Attribution accuracy and resolution was dependent on the number of plant species identified in a dust sample, which was greatly affected by the season of collection. In dust samples that yielded a minimum of 20 identified plant species, positive regional attribution improved to 66.7% (16 of 24 samples). Using dust samples collected from 31 different U.S. sites, trace plant eDNA provided relevant regional attribution information on provenance in 32.2%. This demonstrated that analysis of plant eDNA in dust can provide an accurate estimate regional provenance within the U.S., and relevant forensic information, for a substantial fraction of samples analyzed.
READ LESS

Summary

Information obtained from the analysis of dust, particularly biological particles such as pollen, plant parts, and fungal spores, has great utility in forensic geolocation. As an alternative to manual microscopic analysis, we developed a pipeline that utilizes the environmental DNA (eDNA) from plants in dust samples to estimate previous sample...

READ MORE

Automated posterior interval evaluation for inference in probabilistic programming

Author:
Published in:
Intl. Conf. on Probabilistic Programming, PROBPROG, 22 October 2020.

Summary

In probabilistic inference, credible intervals constructed from posterior samples provide ranges of likely values for continuous parameters of interest. Intuitively, an inference procedure is optimal if it produces the most precise posterior intervals that cover the true parameter value with the expected frequency in repeated experiments. We present theories and methods for automating posterior interval evaluation of inference performance in probabilistic programming using two metrics: 1.) truth coverage, and 2.) ratio of the empirical over the ideal interval widths. Demonstrating with inference on popular regression and state-space models, we show how the metrics provide effective comparisons between different inference procedures, and capture the effects of collinearity and model misspecification. Overall, we claim such automated interval evaluation can accelerate the robust design and comparison of probabilistic inference programs by directly diagnosing how accurately and precisely they can estimate parameters of interest.
READ LESS

Summary

In probabilistic inference, credible intervals constructed from posterior samples provide ranges of likely values for continuous parameters of interest. Intuitively, an inference procedure is optimal if it produces the most precise posterior intervals that cover the true parameter value with the expected frequency in repeated experiments. We present theories and...

READ MORE

GraphChallenge.org triangle counting performance [e-print]

Summary

The rise of graph analytic systems has created a need for new ways to measure and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE Graph Challenge has been developed to provide a well-defined community venue for stimulating research and highlighting innovations in graph analysis software, hardware, algorithms, and systems. GraphChallenge.org provides a wide range of preparsed graph data sets, graph generators, mathematically defined graph algorithms, example serial implementations in a variety of languages, and specific metrics for measuring performance. The triangle counting component of GraphChallenge.org tests the performance of graph processing systems to count all the triangles in a graph and exercises key graph operations found in many graph algorithms. In 2017, 2018, and 2019 many triangle counting submissions were received from a wide range of authors and organizations. This paper presents a performance analysis of the best performers of these submissions. These submissions show that their state-of-the-art triangle counting execution time, Ttri, is a strong function of the number of edges in the graph, Ne, which improved significantly from 2017 (Ttri \approx (Ne/10^8)^4=3) to 2018 (Ttri \approx Ne/10^9) and remained comparable from 2018 to 2019. Graph Challenge provides a clear picture of current graph analysis systems and underscores the need for new innovations to achieve high performance on very large graphs
READ LESS

Summary

The rise of graph analytic systems has created a need for new ways to measure and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE Graph Challenge has been developed to provide a well-defined community venue for stimulating research and highlighting innovations in graph analysis software, hardware, algorithms, and systems...

READ MORE

Leveraging linear algebra to count and enumerate simple subgraphs

Published in:
2020 IEEE High Performance Extreme Computing Conf., HPEC, 22-24 September 2020.

Summary

Even though subgraph counting and subgraph matching are well-known NP-Hard problems, they are foundational building blocks for many scientific and commercial applications. In order to analyze graphs that contain millions to billions of edges, distributed systems can provide computational scalability through search parallelization. One recent approach for exposing graph algorithm parallelization is through a linear algebra formulation and the use of the matrix multiply operation, which conceptually is equivalent to a massively parallel graph traversal. This approach has several benefits, including 1) a mathematically-rigorous foundation, and 2) ability to leverage specialized linear algebra accelerators and high-performance libraries. In this paper, we explore and define a linear algebra methodology for performing exact subgraph counting and matching for 4-vertex subgraphs excluding the clique. Matches on these simple subgraphs can be joined as components for a larger subgraph. With thorough analysis, we demonstrate that the linear algebra formulation leverages path aggregation which allows it to be up 2x to 5x more efficient in traversing the search space and compressing the results as compared to tree-based subgraph matching techniques.
READ LESS

Summary

Even though subgraph counting and subgraph matching are well-known NP-Hard problems, they are foundational building blocks for many scientific and commercial applications. In order to analyze graphs that contain millions to billions of edges, distributed systems can provide computational scalability through search parallelization. One recent approach for exposing graph algorithm...

READ MORE

Data trust methodology: a blockchain-based approach to instrumenting complex systems

Summary

Increased data sharing and interoperability has created challenges in maintaining a level of trust and confidence in Department of Defense (DoD) systems. As tightly-coupled, unique, static, and rigorously validated mission processing solutions have been supplemented with newer, more dynamic, and complex counterparts, mission effectiveness has been impacted. On the one hand, newer deeper processing with more diverse data inputs can offer resilience against overconfident decisions under rapidly changing conditions. On the other hand, the multitude of diverse methods for reaching a decision may be in apparent conflict and decrease decision confidence. This has sometimes manifested itself in the presentation of simultaneous, divergent information to high-level decision makers. In some important specific instances, this has caused the operators to be less efficient in determining the best course of action. In this paper, we will describe an approach to more efficiently and effectively leverage new data sources and processing solutions, without requiring redesign of each algorithm or the system itself. We achieve this by instrumenting the processing chains with an enterprise blockchain framework. Once instrumented, we can collect, verify, and validate data processing chains by tracking data provenance using smart contracts to add dynamically calculated metadata to an immutable and distributed ledger. This noninvasive approach to verification and validation in data sharing environments has the power to improve decision confidence at larger scale than manual approaches, such as consulting individual developer subject matter experts to understand system behavior. In this paper, we will present our study of the following: 1. Types of information (i.e., knowledge) that are supplied and leveraged by decision makers and operational contextualized data processes (Figure 1) 2. Benefits to verifying data provenance, integrity, and validity within an operational processing chain 3. Our blockchain technology framework coupled with analytical techniques which leverage a verification and validation capability that could be deployed into existing DoD data-processing systems with insignificant performance and operational interference.
READ LESS

Summary

Increased data sharing and interoperability has created challenges in maintaining a level of trust and confidence in Department of Defense (DoD) systems. As tightly-coupled, unique, static, and rigorously validated mission processing solutions have been supplemented with newer, more dynamic, and complex counterparts, mission effectiveness has been impacted. On the one...

READ MORE

Toward an autonomous aerial survey and planning system for humanitarian aid and disaster response

Summary

In this paper we propose an integrated system concept for autonomously surveying and planning emergency response for areas impacted by natural disasters. Referred to as AASAPS-HADR, this system is composed of a network of ground stations and autonomous aerial vehicles interconnected by an ad hoc emergency communication network. The system objectives are three-fold: to provide situational awareness of the evolving disaster event, to generate dispatch and routing plans for emergency vehicles, and to provide continuous communication networks which augment pre-existing communication infrastructure that may have been damaged or destroyed. Lacking development in previous literature, we give particular emphasis to the situational awareness objective of disaster response by proposing an autonomous aerial survey that is tasked with assessing damage to existing road networks, detecting and locating human victims, and providing a cursory assessment of casualty types that can be used to inform medical response priorities. In this paper we provide a high-level system design concept, identify existing AI perception and planning algorithms that most closely suit our purposes as well as technology gaps within those algorithms, and provide initial experimental results for non-contact health monitoring using real-time pose recognition algorithms running on a Nvidia Jetson TX2 mounted on board a quadrotor UAV. Finally we provide technology development recommendations for future phases of the AASAPS-HADR system.
READ LESS

Summary

In this paper we propose an integrated system concept for autonomously surveying and planning emergency response for areas impacted by natural disasters. Referred to as AASAPS-HADR, this system is composed of a network of ground stations and autonomous aerial vehicles interconnected by an ad hoc emergency communication network. The system...

READ MORE

The Human Trafficking Technology Roadmap: a targeted development strategy for the Department of Homeland Security

Summary

Human trafficking is a form of modern-day slavery that involves the use of force, fraud, or coercion for the purposes of involuntary labor and sexual exploitation. It affects tens of million of victims worldwide and generates tens of billions of dollars in illicit profits annually. While agencies across the U.S. Government employ a diverse range of resources to combat human trafficking in the U.S. and abroad, trafficking operations remain challenging to measure, investigate, and interdict. Within the Department of Homeland Security, the Science and Technology Directorate is addressing these challenges by incorporating computational social science research into their counter-human trafficking approach. As part of this approach, the Directorate tasked an interdisciplinary team of national security researchers at the Massachusetts Institute of Technology's Lincoln Laboratory, a federally funded research and development center, to undertake a detailed examination of the human trafficking response across the Homeland Security Enterprise. The first phase of this effort was a government-wide systems analysis of major counter-trafficking thrust areas, including law enforcement and prosecution; public health and emergency medicine; victim services; and policy and legislation. The second phase built on this systems analysis to develop a human trafficking technology roadmap and implementation strategy for the Science and Technology Directorate, which is presented in this document.
READ LESS

Summary

Human trafficking is a form of modern-day slavery that involves the use of force, fraud, or coercion for the purposes of involuntary labor and sexual exploitation. It affects tens of million of victims worldwide and generates tens of billions of dollars in illicit profits annually. While agencies across the U.S...

READ MORE

Artificial intelligence: short history, present developments, and future outlook, final report

Summary

The Director's Office at MIT Lincoln Laboratory (MIT LL) requested a comprehensive study on artificial intelligence (AI) focusing on present applications and future science and technology (S&T) opportunities in the Cyber Security and Information Sciences Division (Division 5). This report elaborates on the main results from the study. Since the AI field is evolving so rapidly, the study scope was to look at the recent past and ongoing developments to lead to a set of findings and recommendations. It was important to begin with a short AI history and a lay-of-the-land on representative developments across the Department of Defense (DoD), intelligence communities (IC), and Homeland Security. These areas are addressed in more detail within the report. A main deliverable from the study was to formulate an end-to-end AI canonical architecture that was suitable for a range of applications. The AI canonical architecture, formulated in the study, serves as the guiding framework for all the sections in this report. Even though the study primarily focused on cyber security and information sciences, the enabling technologies are broadly applicable to many other areas. Therefore, we dedicate a full section on enabling technologies in Section 3. The discussion on enabling technologies helps the reader clarify the distinction among AI, machine learning algorithms, and specific techniques to make an end-to-end AI system viable. In order to understand what is the lay-of-the-land in AI, study participants performed a fairly wide reach within MIT LL and external to the Laboratory (government, commercial companies, defense industrial base, peers, academia, and AI centers). In addition to the study participants (shown in the next section under acknowledgements), we also assembled an internal review team (IRT). The IRT was extremely helpful in providing feedback and in helping with the formulation of the study briefings, as we transitioned from datagathering mode to the study synthesis. The format followed throughout the study was to highlight relevant content that substantiates the study findings, and identify a set of recommendations. An important finding is the significant AI investment by the so-called "big 6" commercial companies. These major commercial companies are Google, Amazon, Facebook, Microsoft, Apple, and IBM. They dominate in the AI ecosystem research and development (R&D) investments within the U.S. According to a recent McKinsey Global Institute report, cumulative R&D investment in AI amounts to about $30 billion per year. This amount is substantially higher than the R&D investment within the DoD, IC, and Homeland Security. Therefore, the DoD will need to be very strategic about investing where needed, while at the same time leveraging the technologies already developed and available from a wide range of commercial applications. As we will discuss in Section 1 as part of the AI history, MIT LL has been instrumental in developing advanced AI capabilities. For example, MIT LL has a long history in the development of human language technologies (HLT) by successfully applying machine learning algorithms to difficult problems in speech recognition, machine translation, and speech understanding. Section 4 elaborates on prior applications of these technologies, as well as newer applications in the context of multi-modalities (e.g., speech, text, images, and video). An end-to-end AI system is very well suited to enhancing the capabilities of human language analysis. Section 5 discusses AI's nascent role in cyber security. There have been cases where AI has already provided important benefits. However, much more research is needed in both the application of AI to cyber security and the associated vulnerability to the so-called adversarial AI. Adversarial AI is an area very critical to the DoD, IC, and Homeland Security, where malicious adversaries can disrupt AI systems and make them untrusted in operational environments. This report concludes with specific recommendations by formulating the way forward for Division 5 and a discussion of S&T challenges and opportunities. The S&T challenges and opportunities are centered on the key elements of the AI canonical architecture to strengthen the AI capabilities across the DoD, IC, and Homeland Security in support of national security.
READ LESS

Summary

The Director's Office at MIT Lincoln Laboratory (MIT LL) requested a comprehensive study on artificial intelligence (AI) focusing on present applications and future science and technology (S&T) opportunities in the Cyber Security and Information Sciences Division (Division 5). This report elaborates on the main results from the study. Since the...

READ MORE

GraphChallenge.org: raising the bar on graph analytic performance

Summary

The rise of graph analytic systems has created a need for new ways to measure and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE Graph Challenge has been developed to provide a well-defined community venue for stimulating research and highlighting innovations in graph analysis software, hardware, algorithms, and systems. GraphChallenge.org provides a wide range of preparsed graph data sets, graph generators, mathematically defined graph algorithms, example serial implementations in a variety of languages, and specific metrics for measuring performance. Graph Challenge 2017 received 22 submissions by 111 authors from 36 organizations. The submissions highlighted graph analytic innovations in hardware, software, algorithms, systems, and visualization. These submissions produced many comparable performance measurements that can be used for assessing the current state of the art of the field. There were numerous submissions that implemented the triangle counting challenge and resulted in over 350 distinct measurements. Analysis of these submissions show that their execution time is a strong function of the number of edges in the graph, Ne, and is typically proportional to N4=3 e for large values of Ne. Combining the model fits of the submissions presents a picture of the current state of the art of graph analysis, which is typically 108 edges processed per second for graphs with 108 edges. These results are 30 times faster than serial implementations commonly used by many graph analysts and underscore the importance of making these performance benefits available to the broader community. Graph Challenge provides a clear picture of current graph analysis systems and underscores the need for new innovations to achieve high performance on very large graphs.
READ LESS

Summary

The rise of graph analytic systems has created a need for new ways to measure and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE Graph Challenge has been developed to provide a well-defined community venue for stimulating research and highlighting innovations in graph analysis software, hardware, algorithms, and systems...

READ MORE

Detecting intracranial hemorrhage with deep learning

Published in:
40th Int. Conf. of the IEEE Engineering in Medicine and Biology Society, EMBC, 17-21 July 2018.

Summary

Initial results are reported on automated detection of intracranial hemorrhage from CT, which would be valuable in a computer-aided diagnosis system to help the radiologist detect subtle hemorrhages. Previous work has taken a classic approach involving multiple steps of alignment, image processing, image corrections, handcrafted feature extraction, and classification. Our current work instead uses a deep convolutional neural network to simultaneously learn features and classification, eliminating the multiple hand-tuned steps. Performance is improved by computing the mean output for rotations of the input image. Postprocessing is additionally applied to the CNN output to significantly improve specificity. The database consists of 134 CT cases (4,300 images), divided into 60, 5, and 69 cases for training, validation, and test. Each case typically includes multiple hemorrhages. Performance on the test set was 81% sensitivity per lesion (34/42 lesions) and 98% specificity per case (45/46 cases). The sensitivity is comparable to previous results (on different datasets), but with a significantly higher specificity. In addition, insights are shared to improve performance as the database is expanded.
READ LESS

Summary

Initial results are reported on automated detection of intracranial hemorrhage from CT, which would be valuable in a computer-aided diagnosis system to help the radiologist detect subtle hemorrhages. Previous work has taken a classic approach involving multiple steps of alignment, image processing, image corrections, handcrafted feature extraction, and classification. Our...

READ MORE