Publications

High-productivity software development with pMATLAB

Published in:
Comput. Sci. Eng., Vol. 11, No. 1, January/February 2009, pp. 75-79.

Summary

In this paper, we explore the ease of tackling a communication-intensive parallel computing task, namely the 2D fast Fourier transform (FFT). We start with a simple serial Matlab code, explore in detail a 1D parallel FFT, and illustrate how it can be extended to multidimensional FFTs.
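
The pattern the summary refers to can be sketched as follows: perform 1D FFTs along the dimension that is local to each processor, redistribute the array, and finish with 1D FFTs along the other dimension. The fragment below is a minimal, hypothetical sketch of that pattern, not the code from the paper. It assumes the documented pMatlab map/local/put_local interface, that Np is supplied by the pMatlab launch environment, and that assignment between differently mapped arrays performs the redistribution; launch boilerplate is omitted.

    % Hypothetical sketch of the row/column 2D FFT pattern (assumptions noted above).
    N = 1024;
    mapRow = map([Np 1], {}, 0:Np-1);   % each process owns a block of rows
    mapCol = map([1 Np], {}, 0:Np-1);   % each process owns a block of columns

    X  = rand(N, N, mapRow);            % distributed input, row-distributed
    Y  = zeros(N, N, mapRow);
    Yc = zeros(N, N, mapCol);

    Y  = put_local(Y, fft(local(X), [], 2));    % 1D FFTs along the locally owned rows
    Yc(:,:) = Y;                                % redistribute rows -> columns (communication)
    Yc = put_local(Yc, fft(local(Yc), [], 1));  % 1D FFTs along the now-local columns
    % Yc now holds the 2D FFT of X, distributed by columns.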

PVTOL: providing productivity, performance, and portability to DoD signal processing applications on multicore processors

Published in:
DoD HPCMP 2008, High Performance Computing Modernization Program Users Group Conf., 14 July 2008, pp. 327-333.

Summary

PVTOL provides an object-oriented C++ API that hides the complexity of multicore architectures within a PGAS programming model, improving programmer productivity. Tasks and conduits enable data flow patterns such as pipelining and round-robining. Hierarchical maps concisely describe how to allocate hierarchical arrays across processor and memory hierarchies and provide a simple API for moving data across these hierarchies. Functors encapsulate computational kernels; new functors can be easily developed using the PVTOL API and can be fused for more efficient computation. Existing computation and communication technologies that are optimized for various architectures are used to achieve high performance. PVTOL abstracts the details of the underlying processor architectures to provide portability. We are actively developing PVTOL for Intel, PowerPC and Cell architectures and intend to add support for more computational kernels on these architectures. FPGAs are becoming popular for accelerating computation in both the high performance computing (HPC) and high performance embedded computing (HPEC) communities. Integrated processor-FPGA technologies are now available from both HPC and HPEC vendors, e.g. Cray and Mercury Computer Systems. We plan to support FPGAs as co-processors in PVTOL. Finally, automated mapping technology has been demonstrated with pMatlab. We plan to begin implementing automated mapping in PVTOL next year. Similar to PVL, as PVTOL matures and is used in more projects at Lincoln, we plan to propose concepts demonstrated in PVTOL to HPEC-SI for adoption into future versions of VSIPL++.

Parallel and Distributed Processing

Published in:
High Performance Embedded Computing Handbook, Chapter 18

Summary

This chapter discusses parallel and distributed programming technologies for high performance embedded systems. Computational or memory constraints can be overcome with parallel processing. The primary goal of parallel processing is to improve performance by distributing computation across multiple processors, or to increase dataset sizes by distributing data across multiple processors' memory. The typical programmer has little to no experience writing programs that run on multiple processors. The transition from serial to parallel programming requires significant changes in the programmer's way of thinking: for example, the programmer must consider how to distribute data and computation across multiple processors to maximize performance, and how to synchronize and communicate between processors. Although most programmers will likely admit to having no experience with parallel programming, many have had exposure to a rudimentary form of it in threads. A typical threaded program starts execution as a single thread.

Technical challenges of supporting interactive HPC

Published in:
Ann. High Performance Computing Modernization Program Users Group Conf., 19-21 June 2007.

Summary

Users' demand for interactive, on-demand access to a large pool of high performance computing (HPC) resources is increasing. The majority of users at Massachusetts Institute of Technology Lincoln Laboratory (MIT LL) are involved in the interactive development of sensor processing algorithms. This development often requires a large amount of computation due to the complexity of the algorithms being explored and/or the size of the data set being analyzed. These researchers also require rapid turnaround of their jobs because each iteration directly influences code changes made for the following iteration. Historically, batch queue systems have not been a good match for this kind of user. The Lincoln Laboratory Grid (LLGrid) system at MIT LL is the largest dedicated interactive, on-demand HPC system in the world. While the system also accommodates some batch queue jobs, the vast majority of jobs submitted are interactive, on-demand jobs. Choosing between running a system with a batch queue or in an interactive, on-demand manner involves tradeoffs. This paper discusses the tradeoffs between operating a cluster as a batch system, an interactive, on-demand system, or a hybrid system. The LLGrid system has been operational for over three years and now serves over 200 users from across Lincoln. The system has run over 100,000 interactive jobs and has become an integral part of many researchers' algorithm development workflows. For instance, in batch queue systems, an individual user can commonly gain access to 25% of the processors in the system after the job has waited in the queue; in our experience with on-demand, interactive operation, individual users can often gain access to 20-25% of the cluster processors. This paper shares a variety of new data on our experience running an interactive, on-demand system that also provides some batch queue access. Keywords: grid computing, on-demand, interactive high performance computing, cluster computing, parallel MATLAB.

pMatlab: parallel Matlab library for signal processing applications

Published in:
ICASSP, 32nd IEEE Int. Conf. on Acoustics, Speech and Signal Processing, April 2007, pp. IV-1189 - IV-1192.

Summary

MATLAB is one of the most commonly used languages for scientific computing, with approximately one million users worldwide. At MIT Lincoln Laboratory, MATLAB is used by technical staff to develop sensor processing algorithms. MATLAB's popularity is based on the availability of high-level abstractions that lead to reduced code development time. Due to the compute-intensive nature of scientific computing, these applications often require long running times and would benefit greatly from the increased performance offered by parallel computing. pMatlab implements partitioned global address space (PGAS) support via standard operator overloading techniques. The core data structures in pMatlab are distributed arrays and maps, which simplify parallel programming by removing the need for explicit message passing. This paper presents the pMatlab design and results for the HPC Challenge benchmark suite. Additionally, two case studies of pMatlab use are described.
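
As a hedged illustration of the distributed arrays and maps mentioned above, the fragment below is a sketch assuming the documented pMatlab map/local/agg interface and overloaded arithmetic on distributed arrays, with Np supplied by the pMatlab launch environment; it is not code from the paper. The point it makes is that overloaded operators let distributed arrays be used like ordinary MATLAB matrices, with no explicit message passing.

    % Minimal pMatlab-style sketch (assumptions noted above).
    N = 1000;
    m = map([Np 1], {}, 0:Np-1);      % distribute rows across Np processes
    A = rand(N, N, m);                % distributed arrays
    B = rand(N, N, m);
    C = A + B;                        % overloaded operator: element-wise add,
                                      % no explicit message passing in user code
    myRows = local(C);                % each process can inspect its own block
    Cfull  = agg(C);                  % gather the full matrix on the leader process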

Parallel out-of-core Matlab for extreme virtual memory (Abstract)

Published in:
2005 IEEE Int. Conf. on Cluster Computing, 27-30 September 2005, p. 482 [abstract only].

Summary

Large data sets that cannot fit in memory can be addressed with out-of-core methods, which use memory as a "window" to view a section of the data stored on disk at a time. The Parallel Matlab for eXtreme Virtual Memory (pMatlab XVM) library adds out-of-core extensions to the Parallel Matlab (pMatlab) library. We have applied pMatlab XVM to the DARPA High Productivity Computing Systems' HPCchallenge FFT benchmark. The benchmark was run using several different implementations: C+MPI, pMatlab, pMatlab hand coded for out-of-core, and pMatlab XVM. These experiments found that 1) the performance of the C+MPI and pMatlab versions was comparable; 2) the out-of-core versions deliver 80% of the performance of the in-core versions; 3) the out-of-core versions were able to perform a 1 terabyte (64 billion point) FFT; and 4) the pMatlab XVM program was smaller, easier to implement and verify, and more efficient than its hand-coded equivalent. We are transitioning this technology to several DoD signal processing applications and plan to apply pMatlab XVM to the full HPCchallenge benchmark suite. Using next generation hardware, problem sizes a factor of 100 to 1000 times larger should be feasible.

Introduction to parallel programming and pMatlab v2.0

Published in:
Lincoln Laboratory external web site, [2005].

Summary

The computational demands of software continue to outpace the capacities of processor and memory technologies, especially in scientific and engineering programs. One option to improve performance is parallel processing. However, despite decades of research and development, writing parallel programs continues to be difficult. This is especially the case for scientists and engineers who have limited backgrounds in computer science. MATLAB®, due to its ease of use compared to other programming languages like C and Fortran, is one of the most popular languages for implementing numerical computations, making it an excellent platform for developing an accessible parallel computing framework. MIT Lincoln Laboratory has developed two libraries, pMatlab and MatlabMPI, that enable parallel programming with MATLAB in a simple fashion accessible to non-computer scientists. This document provides an overview of basic concepts in parallel programming and introduces pMatlab.

Writing parallel parameter sweep applications with pMATLAB

Published in:
Lincoln Laboratory external web site [2005].

Summary

Parameter sweep applications execute the same piece of code multiple times with unique sets of input parameters. This type of application is extremely amenable to parallelization. This document describes how to parallelize parameter sweep applications with pMATLAB by first introducing a simple serial parameter sweep application written in MATLAB and then parallelizing it using pMATLAB.
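
To make the described workflow concrete, here is a minimal, hypothetical sketch of the usual parallelization pattern, assuming the documented pMatlab map/global_ind/local/put_local/agg interface with Np supplied by the launch environment; the parameter list and the function some_simulation are placeholders, not taken from the document. Each process computes only the parameter indices it owns, and the results are gathered at the end.

    % Hypothetical parameter sweep sketch (assumptions noted above).
    Nparams = 100;
    params  = linspace(0, 1, Nparams);      % placeholder parameter values

    m = map([1 Np], {}, 0:Np-1);            % split the sweep across Np processes
    results = zeros(1, Nparams, m);         % distributed results vector

    myIdx = global_ind(results, 2);         % global indices owned by this process
    myRes = zeros(1, length(myIdx));
    for k = 1:length(myIdx)
      p = params(myIdx(k));
      myRes(k) = some_simulation(p);        % placeholder for the swept computation
    end
    results = put_local(results, myRes);

    allResults = agg(results);              % gather the full vector on the leader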

Parallel MATLAB for extreme virtual memory

Published in:
Proc. of the HPCMP Users Group Conf., 27-30 June 2005, pp. 381-387.

Summary

Many DoD applications have extreme memory requirements, often with data sets larger than memory on a single computer. Such data sets can be addressed with out-of-core methods, which use memory as a "window" to view a section of the data stored on disk at a time. The Parallel Matlab for eXtreme Virtual Memory (pMatlab XVM) library adds out-of-core extensions to the Parallel Matlab (pMatlab) library. The DARPA High Productivity Computing Systems' HPC challenge FFT benchmark has been implemented in C+MPI, pMatlab, pMatlab hand coded for out-of-core, and pMatlab XVM. We found that 1) the performance of the C+MPI and pMatlab versions was comparable; 2) the out-of-core versions deliver 80% of the performance of the in-core versions; 3) the out-of-core versions were able to perform a 1 TB (64 billion point) FFT; and 4) the pMatlab XVM program was smaller, easier to implement and verify, and more efficient than its hand-coded equivalent. We plan to apply pMatlab XVM to the full HPC challenge benchmark suite. Using next generation hardware, problem sizes a factor of 100 to 1000 times larger should be feasible. We are also transitioning this technology to several DoD signal processing applications. Finally, the flexibility of pMatlab XVM allows hardware designers to experiment with FFT parameters in software before designing hardware for a real-time, ultra-long FFT.
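
As a concrete picture of the out-of-core "window" idea described above, the fragment below is a plain-MATLAB sketch that streams a file too large for memory through a fixed-size block, processes each block, and writes it back out. It is illustrative only: it does not use the pMatlab XVM API, the file names, sizes, and per-block operation are hypothetical, and a real out-of-core FFT would additionally require multiple passes and data permutations across blocks.

    % Plain-MATLAB out-of-core "window" sketch (assumptions noted above).
    inFile    = 'huge_signal.bin';    % complex samples stored as interleaved doubles
    outFile   = 'huge_signal_out.bin';
    Ntotal    = 2^30;                 % total complex samples on disk
    blockSize = 2^20;                 % samples that comfortably fit in memory

    fin  = fopen(inFile, 'r');
    fout = fopen(outFile, 'w');
    for b = 1:(Ntotal / blockSize)
      raw = fread(fin, 2*blockSize, 'double');    % read one window from disk
      x   = complex(raw(1:2:end), raw(2:2:end));
      y   = 2 .* x;                               % stand-in for the per-block work
      out = zeros(2*numel(y), 1);
      out(1:2:end) = real(y);
      out(2:2:end) = imag(y);
      fwrite(fout, out, 'double');                % write the processed window back
    end
    fclose(fin);
    fclose(fout);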

Technology requirements for supporting on-demand interactive grid computing

Summary

It is increasingly being recognized that a large pool of High Performance Computing (HPC) users requires interactive, on-demand access to HPC resources. How to provide these resources is a significant technical challenge that can be addressed from two directions. The first approach is to adapt existing batch queue based HPC systems to make them more interactive. The second approach is to start with existing interactive desktop environments (e.g., MATLAB) and design a system from the ground up that allows interactive parallel computing. The Lincoln Laboratory Grid (LLGrid) project has taken the latter approach. The LLGrid system has been operational for over a year with a few hundred processors and roughly 70 users, having run over 13,000 interactive jobs and consumed approximately 10,000 processor days of computation. This paper compares the on-demand and interactive computing features of four prominent batch queuing systems: openPBS, Sun GridEngine, Condor, and LSF. It goes on to briefly describe the LLGrid system, and how interactive, on-demand computing was achieved on it by binding to a resource management system. Finally, usage characteristics of the LLGrid system are discussed.
