Publications

LLGrid: supercomputer for sensor processing

Summary

MIT Lincoln Laboratory is a federally funded research and development center that applies advanced technology to problems of national interest. Research and development activities focus on long-term technology development as well as rapid system prototyping and demonstration. A key part of this mission is to develop and deploy advanced sensor systems. Developing the algorithms for these systems requires interactive access to large-scale computing and data storage. Deploying these systems requires that the computing and storage capabilities be transportable and energy efficient. The LLGrid system of supercomputers gives hundreds of researchers simultaneous interactive access to large amounts of processing and storage for developing and testing their sensor processing algorithms. The requirements of the LLGrid user base are as diverse as the sensors they are developing: sonar, radar, infrared, optical, hyperspectral, video, bio, and cyber. However, there are two common elements: delivering large amounts of data interactively to many processors, and high-level user interfaces that require minimal user training. The LLGrid software stack provides these capabilities on dozens of LLGrid computing clusters across Lincoln Laboratory. LLGrid systems range from very small (a few nodes) to very large (40+ racks).

Driving big data with big compute

Summary

Big Data (as embodied by Hadoop clusters) and Big Compute (as embodied by MPI clusters) provide unique capabilities for storing and processing large volumes of data. Hadoop clusters make distributed computing readily accessible to the Java community, and MPI clusters provide high parallel efficiency for compute-intensive workloads. Bringing the big data and big compute communities together is an active area of research. The LLGrid team has developed and deployed a number of technologies that aim to provide the best of both worlds. LLGrid MapReduce allows the map/reduce parallel programming model to be used quickly and efficiently in any language on any compute cluster. D4M (Dynamic Distributed Dimensional Data Model) provides a high-level distributed arrays interface to the Apache Accumulo database. The accessibility of these technologies is assessed by measuring the effort required to use these tools, which is typically a few lines of code. The performance is assessed by measuring the insert rate into the Accumulo database. Using these tools, a database insert rate of 4M inserts/second has been achieved on an 8-node cluster.

HPC-VMs: virtual machines in high performance computing systems

Published in:
HPEC 2012: IEEE Conf. on High Performance Extreme Computing, 10-12 September 2012.

Summary

The concept of virtual machines dates back to the 1960s. Both IBM and MIT developed operating system features that enabled user and peripheral time sharing, the underpinnings of which were early virtual machines. Modern virtual machines present a translation layer of system devices between a guest operating system and the host operating system executing on a computer system, while isolating the guest operating systems from one another. In the past several years, enterprise computing has embraced virtual machines to deploy a wide variety of capabilities, from business management systems to email server farms. Those who have adopted virtual deployment environments have capitalized on a variety of advantages, including server consolidation, service migration, and higher service reliability. But they have also encountered some challenges, including a sacrifice in performance and more complex system management. Some of these advantages and challenges also apply to HPC in virtualized environments. In this paper, we analyze the effectiveness of using virtual machines in a high performance computing (HPC) environment. We propose adding some virtual machine capability to already robust HPC environments for specific scenarios where the productivity gained outweighs the performance lost by using virtual machines. Finally, we discuss an implementation that integrates virtual machines into the software stack of an HPC cluster, and we analyze the effect of this implementation on job launch time.

Scalable cryptographic authentication for high performance computing

Summary

High performance computing (HPC) uses supercomputers and computing clusters to solve large computational problems. Frequently, HPC resources are shared systems, and access to restricted data sets or resources must be authenticated. These authentication needs can take multiple forms, both internal and external to the HPC cluster. A computational stack that uses web services among nodes in the HPC cluster may need to perform authentication between nodes of the same job, or a job may need to reach out to data sources outside the HPC cluster. Traditional authentication mechanisms such as passwords or digital certificates encounter issues with the distributed and potentially disconnected nature of HPC systems. Distributing and storing plain-text passwords or cryptographic keys among nodes in an HPC system without special protection is a poor security practice. Systems that reach back to the user's terminal for access to the authenticator are possible, but only in fully interactive supercomputing where connectivity to the user's terminal can be guaranteed. Point solutions can be enabled for these use cases, such as software-based role or self-signed certificates; however, they require significant expertise in digital certificates to configure. A more general solution is called for that is both secure and easy to use. This paper presents an overview of a solution implemented on the interactive, on-demand LLGrid computing system at MIT Lincoln Laboratory and its use to solve one such authentication problem.

Dynamic Distributed Dimensional Data Model (D4M) database and computation system

Summary

A crucial element of large web companies is their ability to collect and analyze massive amounts of data. Tuple store databases are a key enabling technology employed by many of these companies (e.g., Google Big Table and Amazon Dynamo). Tuple stores are highly scalable and run on commodity clusters, but lack interfaces to support efficient development of mathematically based analytics. D4M (Dynamic Distributed Dimensional Data Model) has been developed to provide a mathematically rich interface to tuple stores (and structured query language "SQL" databases). D4M allows linear algebra to be readily applied to databases. Using D4M, it is possible to create composable analytics with significantly less effort than using traditional approaches. This work describes the D4M technology and its application and performance.

Creating a cyber moving target for critical infrastructure applications

Published in:
5th IFIP Int. Conf. on Critical Infrastructure Protection, ICCIP 2011, 19-21 March 2011.

Summary

Despite the significant amount of effort that often goes into securing critical infrastructure assets, many systems remain vulnerable to advanced, targeted cyber attacks. This paper describes the design and implementation of the Trusted Dynamic Logical Heterogeneity System (TALENT), a framework for live-migrating critical infrastructure applications across heterogeneous platforms. TALENT permits a running critical application to change its hardware platform and operating system, thus providing cyber survivability through platform diversity. TALENT uses containers (operating-system-level virtualization) and a portable checkpoint compiler to create a virtual execution environment and to migrate a running application across different platforms while preserving the state of the application (execution state, open files, and network connections). TALENT is designed to support general applications written in the C programming language. By changing the platform on-the-fly, TALENT creates a cyber moving target and significantly raises the bar for a successful attack against a critical application. Experiments demonstrate that a complete migration can be performed within about one second.

TALENT: dynamic platform heterogeneity for cyber survivability of mission critical applications

Published in:
Proc. Secure and Resilient Cyber Architecture Conf., SRCA, 29 October 2010.

Summary

Despite the significant amount of effort that often goes into securing mission critical systems, many remain vulnerable to advanced, targeted cyber attacks. In this work, we design and implement TALENT (Trusted dynAmic Logical hEterogeNeity sysTem), a framework to live-migrate mission critical applications across heterogeneous platforms. TALENT enables us to change the hardware and operating system on top of which a sensitive application is running, thus providing cyber survivability through platform diversity. Using containers (a.k.a. operating system-level virtualization) and a portable checkpoint compiler, TALENT creates a virtual execution environment and migrates a running application across different platforms while preserving the state of the application. The state, here, refers to the execution state of the process as well as its open files and sockets. TALENT is designed to support a general C application. By changing the platform on-the-fly, TALENT creates a moving target against cyber attacks and significantly raises the bar for a successful attack against a critical application. Our measurements show that a full migration can be completed in about one second.

Benchmarking the MIT LL HPCMP DHPI system

Published in:
Annual High Performance Computing Modernization Program Users Group Conf., 19-21 June 2007.

Summary

The Massachusetts Institute of Technology Lincoln Laboratory (MIT LL) High Performance Computing Modernization Program (HPCMP) Dedicated High Performance Computing Project Investment (DHPI) system was designed to address interactive algorithm development for Department of Defense (DoD) sensor processing systems. The results of the system acceptance test provide a clear quantitative picture of the capabilities of the system. The system acceptance test for MIT LL HPCMP DHPI hardware involved an array of benchmarks that exercised each of the components of the memory hierarchy, the scheduler, and the disk arrays. These benchmarks isolated the components to verify the functionality and performance of the system, and several system issues were discovered and rectified by using these benchmarks. The memory hierarchy was evaluated using the HPC Challenge benchmark suite, which comprises the following benchmarks: High Performance Linpack (HPL, also known as Top 500), Fast Fourier Transform (FFT), STREAM, RandomAccess, and Effective Bandwidth. The compute nodes' Redundant Array of Independent Disks (RAID) arrays were evaluated with the Iozone benchmark. Finally, the scheduler and the reliability of the entire system were tested using both the HPC Challenge suite and the Iozone benchmark. For example, when executing the HPC Challenge benchmark suite on 416 processors, the system achieved 1.42 TFlops (HPL), 34.7 GFlops (FFT), 1.24 TBytes/sec (STREAM Triad), and 0.16 GUPS (RandomAccess). This paper describes the components of the MIT Lincoln Laboratory HPCMP DHPI system, including its memory hierarchy. We present the HPC Challenge benchmark suite and the Iozone benchmark and describe how each of the component benchmarks stresses various components of the TX-2500 system. We discuss the results of the benchmarks and their implications for the performance of the system. We conclude with a presentation of the findings.