Publications

Refine Results

(Filters Applied) Clear All

Artificial intelligence: short history, present developments, and future outlook, final report

Summary

The Director's Office at MIT Lincoln Laboratory (MIT LL) requested a comprehensive study on artificial intelligence (AI) focusing on present applications and future science and technology (S&T) opportunities in the Cyber Security and Information Sciences Division (Division 5). This report elaborates on the main results from the study. Since the AI field is evolving so rapidly, the study scope was to look at the recent past and ongoing developments to lead to a set of findings and recommendations. It was important to begin with a short AI history and a lay-of-the-land on representative developments across the Department of Defense (DoD), intelligence communities (IC), and Homeland Security. These areas are addressed in more detail within the report. A main deliverable from the study was to formulate an end-to-end AI canonical architecture that was suitable for a range of applications. The AI canonical architecture, formulated in the study, serves as the guiding framework for all the sections in this report. Even though the study primarily focused on cyber security and information sciences, the enabling technologies are broadly applicable to many other areas. Therefore, we dedicate a full section on enabling technologies in Section 3. The discussion on enabling technologies helps the reader clarify the distinction among AI, machine learning algorithms, and specific techniques to make an end-to-end AI system viable. In order to understand what is the lay-of-the-land in AI, study participants performed a fairly wide reach within MIT LL and external to the Laboratory (government, commercial companies, defense industrial base, peers, academia, and AI centers). In addition to the study participants (shown in the next section under acknowledgements), we also assembled an internal review team (IRT). The IRT was extremely helpful in providing feedback and in helping with the formulation of the study briefings, as we transitioned from datagathering mode to the study synthesis. The format followed throughout the study was to highlight relevant content that substantiates the study findings, and identify a set of recommendations. An important finding is the significant AI investment by the so-called "big 6" commercial companies. These major commercial companies are Google, Amazon, Facebook, Microsoft, Apple, and IBM. They dominate in the AI ecosystem research and development (R&D) investments within the U.S. According to a recent McKinsey Global Institute report, cumulative R&D investment in AI amounts to about $30 billion per year. This amount is substantially higher than the R&D investment within the DoD, IC, and Homeland Security. Therefore, the DoD will need to be very strategic about investing where needed, while at the same time leveraging the technologies already developed and available from a wide range of commercial applications. As we will discuss in Section 1 as part of the AI history, MIT LL has been instrumental in developing advanced AI capabilities. For example, MIT LL has a long history in the development of human language technologies (HLT) by successfully applying machine learning algorithms to difficult problems in speech recognition, machine translation, and speech understanding. Section 4 elaborates on prior applications of these technologies, as well as newer applications in the context of multi-modalities (e.g., speech, text, images, and video). An end-to-end AI system is very well suited to enhancing the capabilities of human language analysis. Section 5 discusses AI's nascent role in cyber security. There have been cases where AI has already provided important benefits. However, much more research is needed in both the application of AI to cyber security and the associated vulnerability to the so-called adversarial AI. Adversarial AI is an area very critical to the DoD, IC, and Homeland Security, where malicious adversaries can disrupt AI systems and make them untrusted in operational environments. This report concludes with specific recommendations by formulating the way forward for Division 5 and a discussion of S&T challenges and opportunities. The S&T challenges and opportunities are centered on the key elements of the AI canonical architecture to strengthen the AI capabilities across the DoD, IC, and Homeland Security in support of national security.
READ LESS

Summary

The Director's Office at MIT Lincoln Laboratory (MIT LL) requested a comprehensive study on artificial intelligence (AI) focusing on present applications and future science and technology (S&T) opportunities in the Cyber Security and Information Sciences Division (Division 5). This report elaborates on the main results from the study. Since the...

READ MORE

Intersection and convex combination in multi-source spectral planted cluster detection

Published in:
IEEE Global Conf. on Signal and Information Processing, GlobalSIP, 7-9 December 2016.

Summary

Planted cluster detection is an important form of signal detection when the data are in the form of a graph. When there are multiple graphs representing multiple connection types, the method of aggregation can have significant impact on the results of a detection algorithm. This paper addresses the tradeoff between two possible aggregation methods: convex combination and intersection. For a spectral detection method, convex combination dominates when the cluster is relatively sparse in at least one graph, while the intersection method dominates in cases where it is dense across graphs. Experimental results confirm the theory. We consider the context of adversarial cluster placement, and determine how an adversary would distribute connections among the graphs to best avoid detection.
READ LESS

Summary

Planted cluster detection is an important form of signal detection when the data are in the form of a graph. When there are multiple graphs representing multiple connection types, the method of aggregation can have significant impact on the results of a detection algorithm. This paper addresses the tradeoff between...

READ MORE

A reverse approach to named entity extraction and linking in microposts

Published in:
Proc. of the 6th Workshop on "Making Sense of Microposts" (part of: 25th Int. World Wide Web Conf., 11 April 2016), #Microposts2016, pp. 67-69.

Summary

In this paper, we present a pipeline for named entity extraction and linking that is designed specifically for noisy, grammatically inconsistent domains where traditional named entity techniques perform poorly. Our approach leverages a large knowledge base to improve entity recognition, while maintaining the use of traditional NER to identify mentions that are not co-referent with any entities in the knowledge base.
READ LESS

Summary

In this paper, we present a pipeline for named entity extraction and linking that is designed specifically for noisy, grammatically inconsistent domains where traditional named entity techniques perform poorly. Our approach leverages a large knowledge base to improve entity recognition, while maintaining the use of traditional NER to identify mentions...

READ MORE

Improved hidden clique detection by optimal linear fusion of multiple adjacency matrices

Published in:
2015 Asilomar Conf. on Signals, Systems and Computers, 8-11 November 2015.

Summary

Graph fusion has emerged as a promising research area for addressing challenges associated with noisy, uncertain, multi-source data. While many ad-hoc graph fusion techniques exist in the current literature, an analytical approach for analyzing the fundamentals of the graph fusion problem is lacking. We consider the setting where we are given multiple Erdos-Renyi modeled adjacency matrices containing a common hidden or planted clique. The objective is to combine them linearly so that the principal eigenvectors of the resulting matrix best reveal the vertices associated with the clique. We utilize recent results from random matrix theory to derive the optimal weighting coefficients and use these insights to develop a data-driven fusion algorithm. We demonstrate the improved performance of the algorithm relative to other simple heuristics.
READ LESS

Summary

Graph fusion has emerged as a promising research area for addressing challenges associated with noisy, uncertain, multi-source data. While many ad-hoc graph fusion techniques exist in the current literature, an analytical approach for analyzing the fundamentals of the graph fusion problem is lacking. We consider the setting where we are...

READ MORE

Residuals-based subgraph detection with cue vertices

Published in:
2015 Asilomar Conf. on Signals, Systems and Computers, 8-11 November 2015.

Summary

A common problem in modern graph analysis is the detection of communities, an example of which is the detection of a single anomalously dense subgraph. Recent results have demonstrated a fundamental limit for this problem when using spectral analysis of modularity. In this paper, we demonstrate the implication of these results on subgraph detection when a cue vertex is provided, indicating one of the vertices in the community of interest. Several recent algorithms for local community detection are applied in this context, and we compare their empirical performance to that of the simple method used to derive the theoretical detection limits.
READ LESS

Summary

A common problem in modern graph analysis is the detection of communities, an example of which is the detection of a single anomalously dense subgraph. Recent results have demonstrated a fundamental limit for this problem when using spectral analysis of modularity. In this paper, we demonstrate the implication of these...

READ MORE

Joint audio-visual mining of uncooperatively collected video: FY14 Line-Supported Information, Computation, and Exploitation Program

Summary

The rate at which video is being created and gathered is rapidly accelerating as access to means of production and distribution expand. This rate of increase, however, is greatly outpacing the development of content-based tools to help users sift through this unstructured, multimedia data. The need for such technologies becomes more acute when considering their potential value in critical, media-rich government applications such as Seized Media Analysis, Social Media Forensics, and Foreign Media Monitoring. A fundamental challenge in developing technologies in these application areas is that they are typically in low-resource data domains. Low-resource domains are ones where the lack of ground-truth labels and statistical support prevent the direct application of traditional machine learning approaches. To help bridge this capability gap, the Joint Audio and Visual Mining of Uncooperatively Collected Video ICE Line Program (2236-1301) is developing new technologies for better content-based search, summarization, and browsing of large collections of unstructured, uncooperatively collected multimedia. In particular, this effort seeks to improve capabilities in video understanding by jointly exploiting time aligned audio, visual, and text information, an approach which has been underutilized in both the academic and commercial communities. Exploiting subtle connections between and across multiple modalities in low-resource multimedia data helps enable deeper video understanding, and in some cases provides new capability where it previously didn't exist. This report outlines work done in Fiscal Year 2014 (FY14) by the cross-divisional, interdisciplinary team tasked to meet these objectives. In the following sections, we highlight technologies developed in FY14 to support efficient Query-by-Example, Attribute, Keyword Search and Cross-Media Exploration and Summarization. Additionally, we preview work proposed for Fiscal Year 2015 as well as summarize our external sponsor interactions and publications/presentations.
READ LESS

Summary

The rate at which video is being created and gathered is rapidly accelerating as access to means of production and distribution expand. This rate of increase, however, is greatly outpacing the development of content-based tools to help users sift through this unstructured, multimedia data. The need for such technologies becomes...

READ MORE

Showing Results

1-6 of 6