Publications

Designing secure and resilient embedded avionics systems

Summary

With an increased reliance on Unmanned Aerial Systems (UAS) as mission assets and the dependency of UAS on cyber resources, the cyber security of UAS must be improved by adopting sound security principles and relevant technologies from the computing community. Meanwhile, the traditional avionics community, aware of the importance of cyber security, is looking at new architectures and designs that can accommodate both safety-oriented principles and cyber security principles and techniques. The Air Force Research Laboratory (AFRL) Information Directorate has created the Agile Resilient Embedded System (ARES) program to investigate mitigations that offer a method to "design in" cyber protections while maintaining mission assurance. ARES specifically seeks to "build security in" for unmanned aerial vehicles by incorporating security and hardening best practices, while inserting resilience as a system attribute to maintain a level of system operation despite successful exploitation of residual vulnerabilities.

Hyperscaling internet graph analysis with D4M on the MIT SuperCloud

Summary

Detecting anomalous behavior in network traffic is a major challenge due to the volume and velocity of network traffic. For example, a 10 Gigabit Ethernet connection can generate over 50 MB/s of packet headers. For global network providers, this challenge can be amplified by many orders of magnitude. Development of novel computer network traffic analytics requires high-level programming environments, massive amounts of packet capture (PCAP) data, and diverse data products for "at scale" algorithm pipeline development. D4M (Dynamic Distributed Dimensional Data Model) combines the power of sparse linear algebra, associative arrays, parallel processing, and distributed databases (such as SciDB and Apache Accumulo) to provide a scalable data and computation system that addresses the big data problems associated with network analytics development. Combining D4M with the MIT SuperCloud manycore processors and parallel storage system enables network analysts to interactively process massive amounts of data in minutes. To demonstrate these capabilities, we have implemented a representative analytics pipeline in D4M and benchmarked it on 96 hours of Gigabit PCAP data on the MIT SuperCloud. The entire pipeline, from uncompressing the raw files to database ingest, was implemented in 135 lines of D4M code and achieved speedups of over 20,000.
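
D4M itself is typically driven from MATLAB/Octave, so the following is only a rough Python illustration of the associative-array idea behind this pipeline, not D4M's actual API: parsed packet headers become a sparse source-by-destination matrix, and traffic summaries fall out of sparse linear algebra. The record values and feature choice here are hypothetical.

```python
# Hypothetical sketch (not D4M's API): parsed packet headers as a sparse
# source-by-destination "associative array", summarized with sparse algebra.
import numpy as np
from scipy.sparse import coo_matrix

# Toy stand-in for parsed PCAP records: (source IP, destination IP) pairs.
records = [("10.0.0.1", "10.0.0.9"), ("10.0.0.1", "10.0.0.9"),
           ("10.0.0.2", "10.0.0.9"), ("10.0.0.3", "10.0.0.7")]

# Map string keys to row/column indices, as an associative array would.
srcs = sorted({s for s, _ in records})
dsts = sorted({d for _, d in records})
row = [srcs.index(s) for s, _ in records]
col = [dsts.index(d) for _, d in records]

# A[i, j] = number of packets from source i to destination j.
A = coo_matrix((np.ones(len(records)), (row, col)),
               shape=(len(srcs), len(dsts))).tocsr()

# Column sums of the nonzero pattern give per-destination fan-in, a common
# anomaly feature: a destination contacted by unusually many sources stands out.
fan_in = np.asarray((A > 0).sum(axis=0)).ravel()
for j, d in enumerate(dsts):
    print(d, "contacted by", int(fan_in[j]), "distinct sources")
```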

GraphChallenge.org: raising the bar on graph analytic performance

Summary

The rise of graph analytic systems has created a need for new ways to measure and compare the capabilities of graph processing systems. The MIT/Amazon/IEEE Graph Challenge has been developed to provide a well-defined community venue for stimulating research and highlighting innovations in graph analysis software, hardware, algorithms, and systems. GraphChallenge.org provides a wide range of preparsed graph data sets, graph generators, mathematically defined graph algorithms, example serial implementations in a variety of languages, and specific metrics for measuring performance. Graph Challenge 2017 received 22 submissions by 111 authors from 36 organizations. The submissions highlighted graph analytic innovations in hardware, software, algorithms, systems, and visualization. These submissions produced many comparable performance measurements that can be used for assessing the current state of the art of the field. Numerous submissions implemented the triangle counting challenge, resulting in over 350 distinct measurements. Analysis of these submissions shows that their execution time is a strong function of the number of edges in the graph, N_e, and is typically proportional to N_e^(4/3) for large values of N_e. Combining the model fits of the submissions presents a picture of the current state of the art of graph analysis, which is typically 10^8 edges processed per second for graphs with 10^8 edges. These results are 30 times faster than serial implementations commonly used by many graph analysts and underscore the importance of making these performance benefits available to the broader community. Graph Challenge provides a clear picture of current graph analysis systems and underscores the need for new innovations to achieve high performance on very large graphs.
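
The triangle counting challenge mentioned above has a standard sparse-linear-algebra formulation: for a symmetric 0/1 adjacency matrix A with zero diagonal, summing the elementwise product of A and A² counts each triangle six times. A minimal sketch of that formulation (not any particular submission's code):

```python
# Triangle counting via sparse linear algebra: (A @ A)[i, j] counts paths
# i->k->j, so masking with A itself and summing counts each triangle 6 times.
import numpy as np
from scipy.sparse import csr_matrix

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]  # one triangle: 0-1-2
n = 4
rows = [i for i, j in edges] + [j for i, j in edges]   # symmetrize
cols = [j for i, j in edges] + [i for i, j in edges]
A = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))

triangles = A.multiply(A @ A).sum() / 6
print(int(triangles))  # -> 1
```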

Interactive supercomputing on 40,000 cores for machine learning and data analysis

Summary

Interactive massively parallel computations are critical for machine learning and data analysis. These computations are a staple of the MIT Lincoln Laboratory Supercomputing Center (LLSC) and have required the LLSC to develop unique interactive supercomputing capabilities. Scaling interactive machine learning frameworks, such as TensorFlow, and data analysis environments, such as MATLAB/Octave, to tens of thousands of cores presents many technical challenges – in particular, rapidly dispatching many tasks through a scheduler, such as Slurm, and starting many instances of applications with thousands of dependencies. Careful tuning of launches and prepositioning of applications overcome these challenges and allow the launching of thousands of tasks in seconds on a 40,000-core supercomputer. Specifically, this work demonstrates launching 32,000 TensorFlow processes in 4 seconds and launching 262,000 Octave processes in 40 seconds. These capabilities allow researchers to rapidly explore novel machine learning architectures and data analysis algorithms.
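
The core idea, amortizing per-task dispatch cost by starting many workers from one launch rather than one scheduler call per task, can be caricatured on a single machine. This is an illustrative sketch only, not the LLSC launch machinery or its Slurm configuration:

```python
# Illustrative sketch: one launch fans out many tasks, so the per-task
# dispatch overhead is paid once instead of once per task.
import time
from multiprocessing import Pool

def task(i):
    return i * i  # stand-in for a TensorFlow/Octave worker's work unit

if __name__ == "__main__":
    start = time.perf_counter()
    with Pool(processes=8) as pool:          # one launch, many workers
        results = pool.map(task, range(10_000))
    print(f"dispatched 10,000 tasks in {time.perf_counter() - start:.2f}s")
```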

Functionality and security co-design environment for embedded systems

Published in:
IEEE High Performance Extreme Computing Conf., HPEC, 25-27 September 2018.

Summary

For decades, embedded systems, ranging from intelligence, surveillance, and reconnaissance (ISR) sensors to electronic warfare and electronic signal intelligence systems, have been an integral part of U.S. Department of Defense (DoD) mission systems. These embedded systems are increasingly the targets of deliberate and sophisticated attacks. Developers thus need to focus equally on functionality and security in both hardware and software development. For critical missions, these systems must be trusted to perform their intended functions, prevent attacks, and even operate with resilience under attacks. The processor in a critical system must thus provide not only a root of trust, but also a foundation to monitor mission functions, detect anomalies, and perform recovery. We have developed the Lincoln Asymmetric Multicore Processing (LAMP) architecture, which mitigates adversarial cyber effects with separation and cryptography and provides a foundation for building a resilient embedded system. We describe a design environment that we have created to enable the co-design of functionality and security for mission assurance.

High performance computing techniques with power systems simulations

Published in:
IEEE High Performance Extreme Computing Conf., HPEC, 25-27 September 2018.

Summary

Small electrical networks (i.e., microgrids) and machine models (synchronous generators, induction motors) can be simulated fairly easily as sequential processes. However, running a large simulation on a single process becomes infeasible because of complexity and timing issues. Scalability becomes an increasingly important issue for larger simulations, and the platform for running such large simulations, like the MIT SuperCloud, becomes more important. Using a distributed computing network to simulate an electrical network as the physical system, however, presents new challenges. Different simulation models, different time steps, and different computation times for each process in the distributed computing network introduce new challenges not present in the typical problems addressed with high performance computing techniques. A distributed computing network is established for some example electrical networks, and adjustments are then made in the parallel simulation set-up to alleviate the new kinds of challenges that come with modeling and simulating a physical system as diverse as an electrical network. Methods are also shown that simulate the same electrical network in hundreds of milliseconds, as opposed to several seconds, a dramatic speedup once the simulation is parallelized.
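
One of the challenges named above, different time steps across subsystems, is commonly handled by letting each subsystem integrate at its own rate and exchanging interface values only at a shared synchronization interval. A toy sketch of that multi-rate pattern follows; the dynamics, rates, and coupling are placeholders, not the paper's models:

```python
# Toy multi-rate co-simulation: two subsystems integrate with different
# time steps and exchange boundary values once per synchronization interval.
def step(x, u, dt):
    return x + dt * (-x + u)  # placeholder first-order dynamics

x_fast, x_slow = 1.0, 0.0
dt_fast, dt_slow, t_sync = 1e-4, 1e-3, 1e-3

for _ in range(100):                 # 100 synchronization intervals
    u_fast, u_slow = x_slow, x_fast  # exchange interface values, then
    for _ in range(round(t_sync / dt_fast)):   # each side runs at its
        x_fast = step(x_fast, u_fast, dt_fast) # own rate until the next
    for _ in range(round(t_sync / dt_slow)):   # synchronization point
        x_slow = step(x_slow, u_slow, dt_slow)

print(x_fast, x_slow)
```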

Large-scale Bayesian kinship analysis

Summary

Kinship prediction in forensics is limited to first-degree relatives due to the small number of short tandem repeat loci characterized. The Genetic Chain Rule for Probabilistic Kinship Estimation can leverage large panels of single nucleotide polymorphisms (SNPs) or sets of sequence-linked SNPs, called haploblocks, to estimate more distant relationships between individuals. This method uses allele frequencies and Markov Chain Monte Carlo methods to determine kinship probabilities. Allele frequencies are a crucial input to this method. Since these frequencies are estimated from finite populations and many alleles are rare, a Bayesian extension to the algorithm has been developed to determine credible intervals for kinship estimates as a function of the certainty in allele frequency estimates. Generating sufficiently large samples to accurately estimate credible intervals can take significant computational resources. In this paper, we leverage hundreds of compute cores to generate large numbers of Dirichlet random samples for Bayesian kinship prediction. We show that it is possible to generate 2,097,152 random samples on 32,768 cores at a rate of 29.68 samples per second. The ability to generate extremely large numbers of samples enables the computation of more statistically significant results from a Bayesian approach to kinship analysis.
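
The central operation, drawing many Dirichlet samples of allele frequencies and reading off credible intervals, can be sketched with numpy's Dirichlet sampler spread over worker processes. The counts-plus-pseudocounts parameterization and the toy locus below are assumptions for illustration, not the paper's implementation:

```python
# Sketch: draw Dirichlet samples of allele frequencies in parallel and
# compute a credible interval from the pooled samples.
import numpy as np
from multiprocessing import Pool

# Toy alpha for a 4-allele locus: observed counts + prior pseudocounts
# (an assumed parameterization for illustration).
alpha = np.array([52.0, 17.0, 3.0, 1.0])

def draw(args):
    seed, n = args
    rng = np.random.default_rng(seed)     # independent stream per worker
    return rng.dirichlet(alpha, size=n)   # n sampled frequency vectors

if __name__ == "__main__":
    with Pool(4) as pool:
        chunks = pool.map(draw, [(seed, 1000) for seed in range(4)])
    samples = np.vstack(chunks)           # 4000 x 4 frequency samples
    lo, hi = np.percentile(samples[:, 2], [2.5, 97.5])
    print(f"95% credible interval for allele 3 frequency: [{lo:.4f}, {hi:.4f}]")
```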

A parallel implementation of FANO using OpenMP and MPI

Published in:
IEEE High Performance Extreme Computing Conf., HPEC, 25-27 September 2018.

Summary

We present a parallel implementation of the Fast Accurate NURBS Optimization (FANO) program using OpenMP and MPI. The software is used for designing imaging freeform optical systems composed of NURBS surfaces. An important step in the design process is the optimization of the shape and position of the optical surfaces within the optical system. FANO uses the Levenberg-Marquardt (LM) algorithm for minimization of the merit function. The parallelization of the code is achieved without modifying readily available commercial or open-source implementations of the LM algorithm. Instead, MPI is used to distribute the computation of the Jacobian over multiple nodes, each of which performs the computationally intensive task of raytracing. The results from the raytracing are collected on the master process and used for calculating the values of the variable parameters for the next iteration. Speedups of ~100x or more are possible when running on the cluster of the MIT Lincoln Laboratory Supercomputing Center (LLSC).
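
The distribution scheme described, evaluating Jacobian columns on workers and gathering them on the master for the next LM step, can be sketched with mpi4py. The merit function below is hypothetical and finite differences stand in for the expensive raytrace:

```python
# Sketch with mpi4py: each rank computes the Jacobian columns for its
# share of the parameters by finite differences (standing in for the
# raytrace), and the master gathers them for the next LM update.
# Run with: mpirun -n 4 python jacobian_mpi.py
import numpy as np
from mpi4py import MPI

def merit(p):  # hypothetical merit function (the raytrace in FANO)
    return np.array([p[0]**2 + p[1], p[1]**2 - p[2], p.sum()])

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

p = np.array([1.0, 2.0, 3.0])      # current variable parameters
f0, h = merit(p), 1e-6

cols = {}
for j in range(rank, len(p), size):   # round-robin split of parameters
    dp = p.copy()
    dp[j] += h                        # each column = one perturbed,
    cols[j] = (merit(dp) - f0) / h    # computationally intensive evaluation

gathered = comm.gather(cols, root=0)  # collect columns on the master
if rank == 0:
    all_cols = {}
    for c in gathered:
        all_cols.update(c)
    J = np.column_stack([all_cols[j] for j in sorted(all_cols)])
    print(J)                          # Jacobian for the LM parameter update
```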

TabulaROSA: tabular operating system architecture for massively parallel heterogeneous compute engines

Summary

The rise in computing hardware choices is driving a reevaluation of operating systems. The traditional role of an operating system controlling the execution of its own hardware is evolving toward a model whereby the controlling processor is distinct from the compute engines that are performing most of the computations. In this context, an operating system can be viewed as software that brokers and tracks the resources of the compute engines and is akin to a database management system. To explore the idea of using a database in an operating system role, this work defines key operating system functions in terms of rigorous mathematical semantics (associative array algebra) that are directly translatable into database operations. These operations possess a number of mathematical properties that are ideal for parallel operating systems by guaranteeing correctness over a wide range of parallel operations. The resulting operating system equations provide a mathematical specification for a Tabular Operating System Architecture (TabulaROSA) that can be implemented on any platform. Simulations of forking in TabulaROSA are performed using an associative array implementation and compared to Linux on a 32,000+ core supercomputer. Using over 262,000 forkers managing over 68,000,000,000 processes, the simulations show that TabulaROSA has the potential to perform operating system functions on a massively parallel scale. The TabulaROSA simulations show 20x higher performance compared to Linux while managing 2000x more processes in fully searchable tables.
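
The fork-as-database-operation idea can be caricatured with a table of process metadata in which forking is a keyed insert and process queries are table filters. This is a toy model of the concept, not the paper's associative array algebra:

```python
# Toy model of the TabulaROSA idea: process state lives in a fully
# searchable table (a dict of dicts standing in for an associative
# array), and "fork" is just an insert keyed by a fresh process ID.
next_pid = 1
proc_table = {0: {"parent": None, "state": "running", "node": 0}}

def fork(parent_pid, node):
    """Insert a child row inheriting the parent's metadata."""
    global next_pid
    child = dict(proc_table[parent_pid], parent=parent_pid, node=node)
    proc_table[next_pid] = child
    next_pid += 1
    return next_pid - 1

for node in range(4):          # many forkers inserting into one table
    fork(0, node)

# Because process state is tabular, queries are just filters:
on_node_2 = [pid for pid, row in proc_table.items() if row["node"] == 2]
print(len(proc_table), "processes;", on_node_2, "on node 2")
```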

Measuring the impact of Spectre and Meltdown

Summary

The Spectre and Meltdown flaws in modern microprocessors represent a new class of attacks that have been difficult to mitigate. The mitigations that have been proposed have known performance impacts. The reported magnitude of these impacts varies depending on the industry sector and expected workload characteristics. In this paper, we measure the performance impact on several workloads relevant to HPC systems. We show that the impact can be significant on both synthetic and realistic workloads. We also show that the performance penalties are difficult to avoid even in dedicated systems where security is a lesser concern.
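
Mitigations such as kernel page-table isolation add overhead on every user/kernel transition, so a syscall-heavy microbenchmark is one common way to expose the penalty. The sketch below is an assumed methodology for illustration, not the paper's benchmark suite:

```python
# Minimal syscall-rate microbenchmark: a tight loop of cheap syscalls
# makes per-transition mitigation overhead visible when the same binary
# is run with mitigations on vs. off (e.g., booting Linux with
# the mitigations=off kernel parameter to compare).
import os
import time

N = 1_000_000
start = time.perf_counter()
for _ in range(N):
    os.getppid()               # cheap syscall, not cached in user space
elapsed = time.perf_counter() - start
print(f"{N / elapsed:,.0f} syscalls/s ({elapsed / N * 1e9:.0f} ns each)")
```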