MIT Lincoln Laboratory activates 1500-processor interactive parallel computing system

LLGrid parallel computing cluster racks

MIT Lincoln Laboratory has established its next-generation Lincoln Laboratory Grid (LLGrid) interactive, on-demand parallel computing system. The LLGrid computing capability, introduced in 2003 and rolled out for full Laboratory use in 2006, uses a large computing cluster made available to the Laboratory by the Department of Defense (DoD) High Performance Computing (HPC) Modernization Office. The LLGrid system lets Lincoln Laboratory researchers augment the processing power of their desktop systems with high-performance cluster nodes so that they can interactively process larger sets of sensor data, create higher-fidelity simulations, and develop entirely new algorithms.

Designed to provide on-demand utility computational capability throughout the Laboratory, LLGrid is composed of the following node types:

  • Compute nodes that perform actual computation
  • Service nodes that support the following services:
    • Network file server that stores LLGrid user accounts, software code, and data
    • Resource manager that manages user jobs and compute nodes
    • Configuration server that maintains the software on the cluster

In total, the LLGrid system now contains 1,500 processors and a petabyte of disk storage, and it supports more than 200 users at Lincoln Laboratory. An integral component of the Laboratory’s computing infrastructure, LLGrid is used to conduct large simulations, analyze large datasets, and prototype complex processing algorithms.

LLGrid supports numerous programming languages and software libraries, including C, C++, Fortran, Java, MPI, PVL, and VSIPL; however, approximately 85% of Laboratory users run parallel MATLAB(R) codes using the Lincoln Laboratory-developed pMatlab library (http://www.ll.mit.edu/pMatlab) or The MathWorks-developed MATLAB Distributed Computing Toolbox (DCT).

The flagship LLGrid system is the TX-2500, which consists of over 400 computational servers. Each server has an InfiniBand network interface and a 6-disk RAID storage system that provides 1.5 terabytes of local storage on the node. In aggregate, this makes ~0.8 petabytes of local high-bandwidth storage available for sensor data. It also provides a unique experimental platform for testing next-generation parallel file system technology. Using the Lincoln Laboratory pMatlabXVM (eXtreme Virtual Memory) software, the entire storage can be treated as a single large global array of data, enabling datasets as large as 50,000 x 50,000 x 50,000 grid elements to be processed.
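To give a sense of scale for the figures above, the following Python sketch works out how much storage a 50,000 x 50,000 x 50,000 array would need and how one dimension of such an array could be split evenly across nodes. It is an illustration only and is not pMatlab or pMatlabXVM code. The element widths, the decimal definition of a petabyte, the round figure of 400 nodes, the block distribution scheme, and the helper names `array_bytes` and `block_range` are all assumptions made for this example; the article does not state them.

```python
PETABYTE = 10**15          # assumption: decimal (SI) petabyte, in bytes
SHAPE = (50_000, 50_000, 50_000)
STATED_STORAGE_PB = 0.8    # aggregate local storage quoted in the article
NUM_NODES = 400            # assumption: the article says "over 400" servers


def array_bytes(shape, bytes_per_element):
    """Return the total bytes needed for a dense array of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


def block_range(global_len, num_nodes, node_id):
    """Return the half-open [start, stop) range of one dimension owned by
    node_id under a simple block distribution (hypothetical scheme)."""
    base, extra = divmod(global_len, num_nodes)
    start = node_id * base + min(node_id, extra)
    stop = start + base + (1 if node_id < extra else 0)
    return start, stop


if __name__ == "__main__":
    # Compare the array size at several assumed element widths
    # against the stated aggregate storage.
    for width in (1, 2, 4, 8):
        size_pb = array_bytes(SHAPE, width) / PETABYTE
        fits = size_pb <= STATED_STORAGE_PB
        print(f"{width}-byte elements: {size_pb:.3f} PB, fits in 0.8 PB: {fits}")

    # Size of the slab one node would hold if the first dimension
    # were divided evenly across the assumed node count.
    start, stop = block_range(SHAPE[0], NUM_NODES, 0)
    slab = array_bytes((stop - start, SHAPE[1], SHAPE[2]), 4)
    print(f"node 0 owns rows [{start}, {stop}), {slab / 10**12:.2f} TB at 4 bytes")
```

Under these assumptions, 4-byte elements give a 0.5-petabyte array, which fits within the stated ~0.8 petabytes, while 8-byte elements would need a full petabyte. Dividing the first dimension across 400 nodes gives each node 125 rows, or 1.25 terabytes at 4 bytes per element, which is within the 1.5 terabytes of local storage per node.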

The acceptance test mandated by the DoD’s High Performance Computing Modernization Program was conducted using the HPC Challenge benchmark suite. This benchmark suite is designed to stress a wide variety of parallel components: processor, memory, network bandwidth, and network latency. As part of the acceptance test, Lincoln Laboratory ran 170 variations using different processors and memory sizes. This baseline performance data is now publicly available at the HPC Challenge website (http://www.HPCchallenge.org).

MATLAB is a registered trademark of The MathWorks. Reference to commercial products, trade names, trademarks, or manufacturers does not constitute or imply endorsement.

Posted April 2008
