Publications

Refine Results

(Filters Applied) Clear All

Next-generation embedded processors: an update

Published in:
GOMACTech Conf., 12-15 March 2018.

Summary

For mission assurance, Department of Defense (DoD) embedded systems should be designed to mitigate various aspects of cyber risks, while maintaining performance (size, weight, power, cost, and schedule). This paper reports our latest research effort in the development of a next-generation System-on-Chip (SoC) for DoD applications, which we first presented in GOMACTech 2014. This paper focuses on our ongoing work to enhance the mission assurance of its programmable processor. We will explain our updated processor architecture, justify the use of resources, and assess the processor's suitability for mission assurance.
READ LESS

Summary

For mission assurance, Department of Defense (DoD) embedded systems should be designed to mitigate various aspects of cyber risks, while maintaining performance (size, weight, power, cost, and schedule). This paper reports our latest research effort in the development of a next-generation System-on-Chip (SoC) for DoD applications, which we first presented...

READ MORE

Classifier performance estimation with unbalanced, partially labeled data

Published in:
Proc. Machine Learning Research, Vol. 88, 2018, pp. 4-16.

Summary

Class imbalance and lack of ground truth are two significant problems in modern machine learning research. These problems are especially pressing in operational contexts where the total number of data points is extremely large and the cost of obtaining labels is very high. In the face of these issues, accurate estimation of the performance of a detection or classification system is crucial to inform decisions based on the observations. This paper presents a framework for estimating performance of a binary classifier in such a context. We focus on the scenario where each set of measurements has been reduced to a score, and the operator only investigates data when the score exceeds a threshold. The operator is blind to the number of missed detections, so performance estimation targets two quantities: recall and the derivative of precision with respect to recall. Measuring with respect to error in these two metrics, simulations in this context demonstrate that labeling outliers not only outperforms random labeling, but often matches performance of an adaptive method that attempts to choose the optimal data for labeling. Application to real anomaly detection data confirms the utility of the approach, and suggests direction for future work.
READ LESS

Summary

Class imbalance and lack of ground truth are two significant problems in modern machine learning research. These problems are especially pressing in operational contexts where the total number of data points is extremely large and the cost of obtaining labels is very high. In the face of these issues, accurate...

READ MORE

Improving security at the system-call boundary in a type-safe operating system

Published in:
Thesis (M.E.)--Massachusetts Institute of Technology, 2018.

Summary

Historically, most approaches to operating sytems security aim to either protect the kernel (e.g., the MMU) or protect user applications (e.g., W exclusive or X). However, little study has been done into protecting the boundary between these layers. We describe a vulnerability in Tock, a type-safe operating system, at the system-call boundary. We then introduce a technique for providing memory safety at the boundary between userland and the kernel in Tock. We demonstrate that this technique works to prevent against the aforementioned vulnerability and a class of similar vulnerabilities, and we propose how it might be used to protect against simliar vulnerabilities in other operating systems.
READ LESS

Summary

Historically, most approaches to operating sytems security aim to either protect the kernel (e.g., the MMU) or protect user applications (e.g., W exclusive or X). However, little study has been done into protecting the boundary between these layers. We describe a vulnerability in Tock, a type-safe operating system, at the...

READ MORE

Detecting pathogen exposure during the non-symptomatic incubation period using physiological data

Summary

Early pathogen exposure detection allows better patient care and faster implementation of public health measures (patient isolation, contact tracing). Existing exposure detection most frequently relies on overt clinical symptoms, namely fever, during the infectious prodromal period. We have developed a robust machine learning based method to better detect asymptomatic states during the incubation period using subtle, sub-clinical physiological markers. Starting with highresolution physiological waveform data from non-human primate studies of viral (Ebola, Marburg, Lassa, and Nipah viruses) and bacterial (Y. pestis) exposure, we processed the data to reduce short-term variability and normalize diurnal variations, then provided these to a supervised random forest classification algorithm and post-classifier declaration logic step to reduce false alarms. In most subjects detection is achieved well before the onset of fever; subject cross-validation across exposure studies (varying viruses, exposure routes, animal species, and target dose) lead to 51h mean early detection (at 0.93 area under the receiver-operating characteristic curve [AUCROC]). Evaluating the algorithm against entirely independent datasets for Lassa, Nipah, and Y. pestis exposures un-used in algorithm training and development yields a mean 51h early warning time (at AUCROC=0.95). We discuss which physiological indicators are most informative for early detection and options for extending this capability to limited datasets such as those available from wearable, non-invasive, ECG-based sensors.
READ LESS

Summary

Early pathogen exposure detection allows better patient care and faster implementation of public health measures (patient isolation, contact tracing). Existing exposure detection most frequently relies on overt clinical symptoms, namely fever, during the infectious prodromal period. We have developed a robust machine learning based method to better detect asymptomatic states...

READ MORE

Designing agility and resilience into embedded systems

Summary

Cyber-Physical Systems (CPS) such as Unmanned Aerial Systems (UAS) sense and actuate their environment in pursuit of a mission. The attack surface of these remotely located, sensing and communicating devices is both large, and exposed to adversarial actors, making mission assurance a challenging problem. While best-practice security policies should be followed, they are rarely enough to guarantee mission success as not all components in the system may be trusted and the properties of the environment (e.g., the RF environment) may be under the control of the attacker. CPS must thus be built with a high degree of resilience to mitigate threats that security cannot alleviate. In this paper, we describe the Agile and Resilient Embedded Systems (ARES) methodology and metric set. The ARES methodology pursues cyber security and resilience (CSR) as high level system properties to be developed in the context of the mission. An analytic process guides system developers in defining mission objectives, examining principal issues, applying CSR technologies, and understanding their interactions.
READ LESS

Summary

Cyber-Physical Systems (CPS) such as Unmanned Aerial Systems (UAS) sense and actuate their environment in pursuit of a mission. The attack surface of these remotely located, sensing and communicating devices is both large, and exposed to adversarial actors, making mission assurance a challenging problem. While best-practice security policies should be...

READ MORE

Towards a universal CDAR device: a high performance adapter-based inline media encryptor

Summary

As the rate at which digital data is generated continues to grow, so does the need to ensure that data can be stored securely. The use of an NSA-certified Inline Media Encryptor (IME) is often required to protect classified data, as its security properties can be fully analyzed and certified with minimal coupling to the environment in which it is embedded. However, these devices are historically purpose-built and must often be redesigned and recertified for each target system. This tedious and costly (but necessary) process limits the ability for an information system architect to leverage advances made in storage technology. Our universal Classified Data At Rest (CDAR) architecture represents a modular approach to reduce this burden and maximize interface flexibility. The core module is designed around NVMe, a high-performance storage interface built directly on PCIe. Interfacing with non-NVMe interfaces such as SATA is achieved with adapters which are outside the certification boundary and therefore can be less costly and leverage rapidly evolving commercial technology. This work includes an analysis for both the functionality and security of this architecture. A prototype was developed with peak throughput of 23.9 Gb/s at a power consumption of 8.5W, making it suitable for a wide range of storage applications.
READ LESS

Summary

As the rate at which digital data is generated continues to grow, so does the need to ensure that data can be stored securely. The use of an NSA-certified Inline Media Encryptor (IME) is often required to protect classified data, as its security properties can be fully analyzed and certified...

READ MORE

Dynamically correlating network terrain to organizational missions

Published in:
Proc. NATO IST-153/RWS-21 Workshop on Cyber Resilience, 23-25 October 2017.

Summary

A precondition for assessing mission resilience in a cyber context is identifying which cyber assets support the mission. However, determining the asset dependencies of a mission is typically a manual process that is time consuming, labor intensive and error-prone. Automating the process of mapping between network assets and organizational missions is highly desirable but technically challenging because it is difficult to find an appropriate proxy within available cyber data for an asset's mission utilization. In this paper we discuss strategies to automate the processes of both breaking an organization into its constituent mission areas, and mapping those mission areas onto network assets, using a data-driven approach. We have implemented these strategies to mine network data at MIT Lincoln Laboratory, and provide examples. We also discuss examples of how such mission mapping tools can help an analyst to identify patterns and develop contextual insight that would otherwise have been obscure.
READ LESS

Summary

A precondition for assessing mission resilience in a cyber context is identifying which cyber assets support the mission. However, determining the asset dependencies of a mission is typically a manual process that is time consuming, labor intensive and error-prone. Automating the process of mapping between network assets and organizational missions...

READ MORE

Bringing physical construction and real-world data collection into a massively open online course (MOOC)

Summary

This Work-In-Progress paper details the process and lessons learned when converting a hands-on engineering minicourse to a scalable, self-paced Massively Open Online Course (MOOC). Online courseware has been part of academic and industry training and learning for decades. Learning activities in online courses strive to mimic in-person delivery by including lectures, homework assignments, software exercises and exams. While these instructional activities provide "theory and practice" for many disciplines, engineering courses often require hands-on activities with physical tools, devices and equipment. To accommodate the need for this type of learning, MIT Lincoln Laboratory's "Build A Small Radar" (BSR) course was used to explore teaching and learning strategies that support the inclusion of physical construction and real world data collection in a MOOC. These tasks are encountered across a range of engineering disciplines and the methods illustrated here are easily generalized to the learning experiences in engineering and science disciplines.
READ LESS

Summary

This Work-In-Progress paper details the process and lessons learned when converting a hands-on engineering minicourse to a scalable, self-paced Massively Open Online Course (MOOC). Online courseware has been part of academic and industry training and learning for decades. Learning activities in online courses strive to mimic in-person delivery by including...

READ MORE

Streaming graph challenge: stochastic block partition

Summary

An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive benchmarks and challenges have proven to be an effective means to advance state-of-the-art performance and foster community collaboration. This paper describes a graph partition challenge with a baseline partition algorithm of sub-quadratic complexity. The algorithm employs rigorous Bayesian inferential methods based on a statistical model that captures characteristics of the real-world graphs. This strong foundation enables the algorithm to address limitations of well-known graph partition approaches such as modularity maximization. This paper describes various aspects of the challenge including: (1) the data sets and streaming graph generator, (2) the baseline partition algorithm with pseudocode, (3) an argument for the correctness of parallelizing the Bayesian inference, (4) different parallel computation strategies such as node-based parallelism and matrix-based parallelism, (5) evaluation metrics for partition correctness and computational requirements, (6) preliminary timing of a Python-based demonstration code and the open source C++ code, and (7) considerations for partitioning the graph in streaming fashion. Data sets and source code for the algorithm as well as metrics, with detailed documentation are available at GraphChallenge.org.
READ LESS

Summary

An important objective for analyzing real-world graphs is to achieve scalable performance on large, streaming graphs. A challenging and relevant example is the graph partition problem. As a combinatorial problem, graph partition is NP-hard, but existing relaxation methods provide reasonable approximate solutions that can be scaled for large graphs. Competitive...

READ MORE

A cloud-based brain connectivity analysis tool

Summary

With advances in high throughput brain imaging at the cellular and sub-cellular level, there is growing demand for platforms that can support high performance, large-scale brain data processing and analysis. In this paper, we present a novel pipeline that combines Accumulo, D4M, geohashing, and parallel programming to manage large-scale neuron connectivity graphs in a cloud environment. Our brain connectivity graph is represented using vertices (fiber start/end nodes), edges (fiber tracks), and the 3D coordinates of the fiber tracks. For optimal performance, we take the hybrid approach of storing vertices and edges in Accumulo and saving the fiber track 3D coordinates in flat files. Accumulo database operations offer low latency on sparse queries while flat files offer high throughput for storing, querying, and analyzing bulk data. We evaluated our pipeline by using 250 gigabytes of mouse neuron connectivity data. Benchmarking experiments on retrieving vertices and edges from Accumulo demonstrate that we can achieve 1-2 orders of magnitude speedup in retrieval time when compared to the same operation from traditional flat files. The implementation of graph analytics such as Breadth First Search using Accumulo and D4M offers consistent good performance regardless of data size and density, thus is scalable to very large dataset. Indexing of neuron subvolumes is simple and logical with geohashing-based binary tree encoding. This hybrid data management backend is used to drive an interactive web-based 3D graphical user interface, where users can examine the 3D connectivity map in a Google Map-like viewer. Our pipeline is scalable and extensible to other data modalities.
READ LESS

Summary

With advances in high throughput brain imaging at the cellular and sub-cellular level, there is growing demand for platforms that can support high performance, large-scale brain data processing and analysis. In this paper, we present a novel pipeline that combines Accumulo, D4M, geohashing, and parallel programming to manage large-scale neuron...

READ MORE