Publications

Refine Results

(Filters Applied) Clear All

Artificial intelligence: short history, present developments, and future outlook, final report

Summary

The Director's Office at MIT Lincoln Laboratory (MIT LL) requested a comprehensive study on artificial intelligence (AI) focusing on present applications and future science and technology (S&T) opportunities in the Cyber Security and Information Sciences Division (Division 5). This report elaborates on the main results from the study. Since the AI field is evolving so rapidly, the study scope was to look at the recent past and ongoing developments to lead to a set of findings and recommendations. It was important to begin with a short AI history and a lay-of-the-land on representative developments across the Department of Defense (DoD), intelligence communities (IC), and Homeland Security. These areas are addressed in more detail within the report. A main deliverable from the study was to formulate an end-to-end AI canonical architecture that was suitable for a range of applications. The AI canonical architecture, formulated in the study, serves as the guiding framework for all the sections in this report. Even though the study primarily focused on cyber security and information sciences, the enabling technologies are broadly applicable to many other areas. Therefore, we dedicate a full section on enabling technologies in Section 3. The discussion on enabling technologies helps the reader clarify the distinction among AI, machine learning algorithms, and specific techniques to make an end-to-end AI system viable. In order to understand what is the lay-of-the-land in AI, study participants performed a fairly wide reach within MIT LL and external to the Laboratory (government, commercial companies, defense industrial base, peers, academia, and AI centers). In addition to the study participants (shown in the next section under acknowledgements), we also assembled an internal review team (IRT). The IRT was extremely helpful in providing feedback and in helping with the formulation of the study briefings, as we transitioned from datagathering mode to the study synthesis. The format followed throughout the study was to highlight relevant content that substantiates the study findings, and identify a set of recommendations. An important finding is the significant AI investment by the so-called "big 6" commercial companies. These major commercial companies are Google, Amazon, Facebook, Microsoft, Apple, and IBM. They dominate in the AI ecosystem research and development (R&D) investments within the U.S. According to a recent McKinsey Global Institute report, cumulative R&D investment in AI amounts to about $30 billion per year. This amount is substantially higher than the R&D investment within the DoD, IC, and Homeland Security. Therefore, the DoD will need to be very strategic about investing where needed, while at the same time leveraging the technologies already developed and available from a wide range of commercial applications. As we will discuss in Section 1 as part of the AI history, MIT LL has been instrumental in developing advanced AI capabilities. For example, MIT LL has a long history in the development of human language technologies (HLT) by successfully applying machine learning algorithms to difficult problems in speech recognition, machine translation, and speech understanding. Section 4 elaborates on prior applications of these technologies, as well as newer applications in the context of multi-modalities (e.g., speech, text, images, and video). An end-to-end AI system is very well suited to enhancing the capabilities of human language analysis. Section 5 discusses AI's nascent role in cyber security. There have been cases where AI has already provided important benefits. However, much more research is needed in both the application of AI to cyber security and the associated vulnerability to the so-called adversarial AI. Adversarial AI is an area very critical to the DoD, IC, and Homeland Security, where malicious adversaries can disrupt AI systems and make them untrusted in operational environments. This report concludes with specific recommendations by formulating the way forward for Division 5 and a discussion of S&T challenges and opportunities. The S&T challenges and opportunities are centered on the key elements of the AI canonical architecture to strengthen the AI capabilities across the DoD, IC, and Homeland Security in support of national security.
READ LESS

Summary

The Director's Office at MIT Lincoln Laboratory (MIT LL) requested a comprehensive study on artificial intelligence (AI) focusing on present applications and future science and technology (S&T) opportunities in the Cyber Security and Information Sciences Division (Division 5). This report elaborates on the main results from the study. Since the...

READ MORE

GraphChallenge.org: raising the bar on graph analytic performance

Summary

The rise of graph analytic systems has created a need for new ways to measure and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE Graph Challenge has been developed to provide a well-defined community venue for stimulating research and highlighting innovations in graph analysis software, hardware, algorithms, and systems. GraphChallenge.org provides a wide range of preparsed graph data sets, graph generators, mathematically defined graph algorithms, example serial implementations in a variety of languages, and specific metrics for measuring performance. Graph Challenge 2017 received 22 submissions by 111 authors from 36 organizations. The submissions highlighted graph analytic innovations in hardware, software, algorithms, systems, and visualization. These submissions produced many comparable performance measurements that can be used for assessing the current state of the art of the field. There were numerous submissions that implemented the triangle counting challenge and resulted in over 350 distinct measurements. Analysis of these submissions show that their execution time is a strong function of the number of edges in the graph, Ne, and is typically proportional to N4=3 e for large values of Ne. Combining the model fits of the submissions presents a picture of the current state of the art of graph analysis, which is typically 108 edges processed per second for graphs with 108 edges. These results are 30 times faster than serial implementations commonly used by many graph analysts and underscore the importance of making these performance benefits available to the broader community. Graph Challenge provides a clear picture of current graph analysis systems and underscores the need for new innovations to achieve high performance on very large graphs.
READ LESS

Summary

The rise of graph analytic systems has created a need for new ways to measure and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE Graph Challenge has been developed to provide a well-defined community venue for stimulating research and highlighting innovations in graph analysis software, hardware, algorithms, and systems...

READ MORE

Cloud computing in tactical environments

Summary

Ground personnel at the tactical edge often lack data and analytics that would increase their effectiveness. To address this problem, this work investigates methods to deploy cloud computing capabilities in tactical environments. Our approach is to identify representative applications and to design a system that spans the software/hardware stack to support such applications while optimizing the use of scarce resources. This paper presents our high-level design and the results of initial experiments that indicate the validity of our approach.
READ LESS

Summary

Ground personnel at the tactical edge often lack data and analytics that would increase their effectiveness. To address this problem, this work investigates methods to deploy cloud computing capabilities in tactical environments. Our approach is to identify representative applications and to design a system that spans the software/hardware stack to...

READ MORE

Dynamically correlating network terrain to organizational missions

Published in:
Proc. NATO IST-153/RWS-21 Workshop on Cyber Resilience, 23-25 October 2017.

Summary

A precondition for assessing mission resilience in a cyber context is identifying which cyber assets support the mission. However, determining the asset dependencies of a mission is typically a manual process that is time consuming, labor intensive and error-prone. Automating the process of mapping between network assets and organizational missions is highly desirable but technically challenging because it is difficult to find an appropriate proxy within available cyber data for an asset's mission utilization. In this paper we discuss strategies to automate the processes of both breaking an organization into its constituent mission areas, and mapping those mission areas onto network assets, using a data-driven approach. We have implemented these strategies to mine network data at MIT Lincoln Laboratory, and provide examples. We also discuss examples of how such mission mapping tools can help an analyst to identify patterns and develop contextual insight that would otherwise have been obscure.
READ LESS

Summary

A precondition for assessing mission resilience in a cyber context is identifying which cyber assets support the mission. However, determining the asset dependencies of a mission is typically a manual process that is time consuming, labor intensive and error-prone. Automating the process of mapping between network assets and organizational missions...

READ MORE

Streaming graph challenge: stochastic block partition

Summary

An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive benchmarks and challenges have proven to be an effective means to advance state-of-the-art performance and foster community collaboration. This paper describes a graph partition challenge with a baseline partition algorithm of sub-quadratic complexity. The algorithm employs rigorous Bayesian inferential methods based on a statistical model that captures characteristics of the real-world graphs. This strong foundation enables the algorithm to address limitations of well-known graph partition approaches such as modularity maximization. This paper describes various aspects of the challenge including: (1) the data sets and streaming graph generator, (2) the baseline partition algorithm with pseudocode, (3) an argument for the correctness of parallelizing the Bayesian inference, (4) different parallel computation strategies such as node-based parallelism and matrix-based parallelism, (5) evaluation metrics for partition correctness and computational requirements, (6) preliminary timing of a Python-based demonstration code and the open source C++ code, and (7) considerations for partitioning the graph in streaming fashion. Data sets and source code for the algorithm as well as metrics, with detailed documentation are available at GraphChallenge.org.
READ LESS

Summary

An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive...

READ MORE

Static graph challenge: subgraph isomorphism

Summary

The rise of graph analytic systems has created a need for ways to measure and compare the capabilities of these systems. Graph analytics present unique scalability difficulties. The machine learning, high performance computing, and visual analytics communities have wrestled with these difficulties for decades and developed methodologies for creating challenges to move these communities forward. The proposed Subgraph Isomorphism Graph Challenge draws upon prior challenges from machine learning, high performance computing, and visual analytics to create a graph challenge that is reflective of many real-world graph analytics processing systems. The Subgraph Isomorphism Graph Challenge is a holistic specification with multiple integrated kernels that can be run together or independently. Each kernel is well defined mathematically and can be implemented in any programming environment. Subgraph isomorphism is amenable to both vertex-centric implementations and array-based implementations (e.g., using the Graph-BLAS.org standard). The computations are simple enough that performance predictions can be made based on simple computing hardware models. The surrounding kernels provide the context for each kernel that allows rigorous definition of both the input and the output for each kernel. Furthermore, since the proposed graph challenge is scalable in both problem size and hardware, it can be used to measure and quantitatively compare a wide range of present day and future systems. Serial implementations in C++, Python, Python with Pandas, Matlab, Octave, and Julia have been implemented and their single threaded performance have been measured. Specifications, data, and software are publicly available at GraphChallenge.org.
READ LESS

Summary

The rise of graph analytic systems has created a need for ways to measure and compare the capabilities of these systems. Graph analytics present unique scalability difficulties. The machine learning, high performance computing, and visual analytics communities have wrestled with these difficulties for decades and developed methodologies for creating challenges...

READ MORE

Collaborative Data Analysis and Discovery for Cyber Security

Published in:
Proceedings of the 12th Symposium on Usable Privacy and Security (SOUPS 2016)

Summary

In this paper, we present the Cyber Analyst Real-Time Integrated Notebook Application (CARINA). CARINA is a collaborative investigation system that aids in decision making by co-locating the analysis environment with centralized cyber data sources, and providing next generation analysts with increased visibility to the work of others.
READ LESS

Summary

In this paper, we present the Cyber Analyst Real-Time Integrated Notebook Application (CARINA). CARINA is a collaborative investigation system that aids in decision making by co-locating the analysis environment with centralized cyber data sources, and providing next generation analysts with increased visibility to the work of others.

READ MORE

BubbleNet: A Cyber Security Dashboard for Visualizing Patterns

Published in:
Proceeding of 2016 Eurographics Conference on Visualization (EuroVis)

Summary

The field of cyber security is faced with ever-expanding amounts of data and a constant barrage of cyber attacks. Within this space, we have designed BubbleNet as a cyber security dashboard to help network analysts identify and summarize patterns within the data.
READ LESS

Summary

The field of cyber security is faced with ever-expanding amounts of data and a constant barrage of cyber attacks. Within this space, we have designed BubbleNet as a cyber security dashboard to help network analysts identify and summarize patterns within the data.

READ MORE

A data-stream classification system for investigating terrorist threats

Published in:
Proc. SPIE 9851, Next-Generation Analyst IV, 98510L (May 12, 2016); doi:10.1117/12.2224104.

Summary

The role of cyber forensics in criminal investigations has greatly increased in recent years due to the wealth of data that is collected and available to investigators. Physical forensics has also experienced a data volume and fidelity revolution due to advances in methods for DNA and trace evidence analysis. Key to extracting insight is the ability to correlate across multi-modal data, which depends critically on identifying a touch-point connecting the separate data streams. Separate data sources may be connected because they refer to the same individual, entity or event. In this paper we present a data source classification system tailored to facilitate the investigation of potential terrorist activity. This taxonomy is structured to illuminate the defining characteristics of a particular terrorist effort and designed to guide reporting to decision makers that is complete, concise, and evidence-based. The classification system has been validated and empirically utilized in the forensic analysis of a simulated terrorist activity. Next-generation analysts can use this schema to label and correlate across existing data streams, assess which critical information may be missing from the data, and identify options for collecting additional data streams to fill information gaps.
READ LESS

Summary

The role of cyber forensics in criminal investigations has greatly increased in recent years due to the wealth of data that is collected and available to investigators. Physical forensics has also experienced a data volume and fidelity revolution due to advances in methods for DNA and trace evidence analysis. Key...

READ MORE

Cloudbreak: answering the challenges of cyber command and control

Published in:
Lincoln Laboratory Journal, Vol. 22, No. 1, 2016, pp. 60-73.

Summary

Lincoln Laboratory's flexible, user-centered framework for the development of command-and-control systems allows the rapid prototyping of new system capabilities. This methodology, Cloudbreak, effectively supports the insertion of new capabilities into existing systems and fosters user acceptance of new tools.
READ LESS

Summary

Lincoln Laboratory's flexible, user-centered framework for the development of command-and-control systems allows the rapid prototyping of new system capabilities. This methodology, Cloudbreak, effectively supports the insertion of new capabilities into existing systems and fosters user acceptance of new tools.

READ MORE