A linear algebraic approach to graph algorithms that exploits the sparse adjacency matrix representation of graphs can provide a variety of benefits, including syntactic simplicity, easier implementation, and higher performance. Selected examples illustrating these benefits are presented, drawn from the remainder of the book in the areas of algorithms, data analysis, and computation.
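
To make the linear algebraic idea concrete, here is a minimal sketch (not code from the book) of breadth-first search written as repeated sparse matrix-vector products over the Boolean (OR, AND) semiring. The dict-of-sets storage and the names `spmv_or_and` and `bfs_levels` are illustrative assumptions.

```python
from typing import Dict, Set

# Sparse adjacency matrix stored by rows: A[i] holds the column indices j
# where A[i][j] = 1, i.e. the out-neighbors of vertex i.
SparseMatrix = Dict[int, Set[int]]


def spmv_or_and(A: SparseMatrix, x: Set[int]) -> Set[int]:
    """Compute y = A^T x over the Boolean (OR, AND) semiring.

    The sparse Boolean vector x is represented by the set of its nonzero
    indices. y[j] = OR_i (x[i] AND A[i][j]), so y marks every vertex that is
    one edge away from some vertex in x.
    """
    y: Set[int] = set()
    for i in x:
        y |= A.get(i, set())
    return y


def bfs_levels(A: SparseMatrix, source: int) -> Dict[int, int]:
    """Breadth-first search as repeated sparse matrix-vector products.

    Returns the hop distance from source to every reachable vertex.
    """
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # One semiring SpMV, masked to drop vertices already visited.
        frontier = {v for v in spmv_or_and(A, frontier) if v not in levels}
        for v in frontier:
            levels[v] = depth
    return levels


if __name__ == "__main__":
    A = {0: {1, 2}, 1: {3}, 2: {3}, 3: {4}}
    print(bfs_levels(A, 0))
```

Each loop iteration is one matrix-vector multiply followed by a mask; a library built around sparse matrix operations would typically replace the dict-of-sets storage with a compressed sparse format.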

Graphs and matrices

3-d graph processor

January 1, 2010

Presentation

Author:

William S. Song

…

Published in:

HPEC 2010, High Performance Embedded Computing Workshop

Topic:

graph processing

R&D area:

Cyber Security and Information Sciences

R&D group:

Lincoln Laboratory Supercomputing Center

Summary

Graph algorithms are used for numerous database applications such as analysis of financial transactions, social networking patterns, and internet data. While graph algorithms can work well with moderate-size databases, processors often have difficulty providing sufficient throughput when the databases are large. This is because the processor architectures are poorly matched to the graph computational flow. For example, most modern processors use cache-based memory to take advantage of highly localized memory access patterns. However, memory access patterns associated with graph processing are often random in nature and can result in high cache miss rates. In addition, graph algorithms require significant overhead computation to manage the indices of vertices and edges.
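
The index overhead described above is visible in the compressed sparse row (CSR) layout commonly used for graphs. The sketch below is an illustration only, not the 3-D graph processor's design: it builds CSR arrays from an edge list with a counting pass, a prefix sum, and a scatter pass. Afterward, every neighbor lookup goes through two levels of indirection (`row_ptr`, then `col_idx`), the kind of data-dependent access that tends to defeat caches. The function names are hypothetical.

```python
from typing import List, Tuple


def build_csr(num_vertices: int, edges: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Convert a directed edge list into CSR arrays (row_ptr, col_idx)."""
    # Pass 1: count the out-degree of each vertex.
    counts = [0] * num_vertices
    for src, _ in edges:
        counts[src] += 1
    # Prefix sum: row_ptr[v] is where vertex v's neighbors start in col_idx.
    row_ptr = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        row_ptr[v + 1] = row_ptr[v] + counts[v]
    # Pass 2: scatter each destination into the next free slot of its row.
    next_slot = row_ptr[:-1]  # slicing returns a copy
    col_idx = [0] * len(edges)
    for src, dst in edges:
        col_idx[next_slot[src]] = dst
        next_slot[src] += 1
    return row_ptr, col_idx


def neighbors(row_ptr: List[int], col_idx: List[int], v: int) -> List[int]:
    """Indirect lookup: read row_ptr to find the slice, then read col_idx."""
    return col_idx[row_ptr[v]:row_ptr[v + 1]]


if __name__ == "__main__":
    rp, ci = build_csr(3, [(0, 1), (2, 0), (0, 2), (1, 2)])
    print(rp, ci, neighbors(rp, ci, 0))
```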

Rapid prototyping of radar algorithms

November 1, 2009

Journal Article

Author:

Albert I. Reuther

…

Jeremy Kepner

Published in:

IEEE Sig. Proc. Mag., Vol. 26, No. 6, November 2009, pp. 158-162.

Topic:

signal processing

R&D area:

R&D group:

Embedded and Open Systems

Summary

Rapid prototyping of advanced signal processing algorithms is critical to developing new radars. Signal processing engineers usually use high-level languages like MATLAB, IDL, or Python to develop advanced algorithms and to determine the optimal parameters for these algorithms. Many of these algorithms have very long execution times due to computational complexity and/or very large data sets, which hinders an efficient engineering development workflow. That is, signal processing engineers must wait hours, or even days, to get the results of the current algorithm, parameters, and data set before making changes and refinements for the next iteration. In the meantime, the engineer may have thought of several more permutations that he or she wants to test.

Automatic registration of LIDAR and optical images of urban scenes

June 20, 2009

Conference Paper

Author:

Andrew Mastin

…

Published in:

CVPR 2009, IEEE Conf. on Computer Vision and Pattern Recognition, 20-25 June 2009, pp. 2639-2646.

Topic:

ladar

R&D area:

R&D group:

Embedded and Open Systems

Summary

Fusion of 3D laser radar (LIDAR) imagery and aerial optical imagery is an efficient method for constructing 3D virtual reality models. One difficult aspect of creating such models is registering the optical image with the LIDAR point cloud, which is characterized as a camera pose estimation problem. We propose a novel application of mutual information registration methods, which exploits the statistical dependency in urban scenes of optical appearance with measured LIDAR elevation. We use the well-known downhill simplex optimization to infer camera pose parameters. We discuss three methods for measuring mutual information between LIDAR imagery and optical imagery. Using OpenGL and graphics hardware in the optimization process yields registration times dramatically lower than those of previous methods. Using an initial registration comparable to GPS/INS accuracy, we demonstrate the utility of our algorithm with a collection of urban images and present 3D models created with the fused imagery.
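
As a rough illustration of the similarity measure, the sketch below estimates mutual information between two equally sized, already quantized images from their joint histogram, using I(X;Y) = Σ p(x,y) log(p(x,y) / (p(x) p(y))). It is a generic histogram estimator, not one of the three measures compared in the paper, and it omits the camera projection and the downhill simplex search over pose. Images are assumed to be flat lists of integer bin labels.

```python
import math
from collections import Counter
from typing import Sequence


def mutual_information(a: Sequence[int], b: Sequence[int]) -> float:
    """Mutual information (in nats) between two quantized images of equal size."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    n = len(a)
    joint = Counter(zip(a, b))  # joint histogram of (a, b) bin pairs
    pa = Counter(a)             # marginal histogram of a
    pb = Counter(b)             # marginal histogram of b
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log(pxy / ((pa[x] / n) * (pb[y] / n)))
    return mi


if __name__ == "__main__":
    print(mutual_information([0, 1, 0, 1], [0, 1, 0, 1]))  # perfectly dependent
    print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # independent
```

In a registration loop of the kind the summary describes, one would presumably render LIDAR elevation under a candidate pose, quantize both images, and let a simplex optimizer maximize this score over the pose parameters.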

High-productivity software development with pMATLAB

January 1, 2009

Journal Article

Author:

Julia Mullen

…

Published in:

Comput. Sci. Eng., Vol. 11, No. 1, January/February 2009, pp. 75-79.

Topic:

computing

R&D area:

R&D group:

Embedded and Open Systems

Summary

In this paper, we explore the ease of tackling a communication-intensive parallel computing task, namely, the 2D fast Fourier transform (FFT). We start with a simple serial Matlab code, explore in detail a 1D parallel FFT, and illustrate how it can be extended to multidimensional FFTs.
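
The 2D transform is communication-intensive because it separates into 1D transforms along rows followed by 1D transforms along columns; when the array is distributed by rows, the data must be redistributed (a "corner turn") between the two passes. The Python sketch below stands in for the paper's Matlab code and runs on a single process: it shows the row-column decomposition with a naive O(N²) DFT in place of an FFT and checks it against the direct 2D definition. Parallel distribution is not shown.

```python
import cmath
from typing import List

Matrix = List[List[complex]]


def dft(x: List[complex]) -> List[complex]:
    """Naive 1D DFT: X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_len) for n in range(n_len))
            for k in range(n_len)]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def dft2_row_column(m: Matrix) -> Matrix:
    """2D DFT as row transforms, a corner turn, then transforms of the former columns."""
    rows_done = [dft(row) for row in m]       # step 1: transform each row
    turned = transpose(rows_done)             # step 2: corner turn
    cols_done = [dft(col) for col in turned]  # step 3: transform each column
    return transpose(cols_done)               # restore the original orientation


def dft2_direct(m: Matrix) -> Matrix:
    """Reference 2D DFT computed straight from the definition."""
    rows, cols = len(m), len(m[0])
    return [[sum(m[r][c] * cmath.exp(-2j * cmath.pi * (u * r / rows + v * c / cols))
                 for r in range(rows) for c in range(cols))
             for v in range(cols)]
            for u in range(rows)]


if __name__ == "__main__":
    sample = [[1, 2, 3], [4, 5, 6]]
    print(dft2_row_column(sample))
```

Replacing `dft` with an FFT changes only the cost of each 1D pass; the corner turn, which is where a parallel version must communicate, stays the same.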

Language, dialect, and speaker recognition using Gaussian mixture models on the cell processor

September 23, 2008

Conference Paper

Author:

Nicolas Malyska

…

Published in:

Twelfth Annual High Performance Embedded Computing Workshop, HPEC 2008, 23-25 September 2008.

Topic:

language recognition

R&D area:

Cyber Security and Information Sciences

R&D group:

Summary

Automatic recognition systems are commonly used in speech processing to classify observed utterances by the speaker's identity, dialect, and language. These problems often require high processing throughput, especially in applications involving multiple concurrent incoming speech streams, such as datacenter-level processing. Recent advances in processor technology allow multiple processors to reside on the same chip, providing high performance per watt. Currently the Cell Broadband Engine has the leading performance-per-watt specifications in its class. Each Cell processor consists of a PowerPC Processing Element (PPE) working together with eight Synergistic Processing Elements (SPEs). Each SPE has 256 KB of memory (local store), which is used for storing both program and data. This paper addresses the implementation of language, dialect, and speaker recognition on the Cell architecture. Classically, speech-domain recognition has been approached as embarrassingly parallel, with each utterance processed in parallel with the others. As we will discuss, efficient processing on the Cell requires a different approach, whereby the computation and data for each utterance are subdivided to be handled by separate processors. We present a computational model for automatic recognition on the Cell processor that takes advantage of its architecture while mitigating its limitations. Using the proposed design, we predict a system able to concurrently score over 220 real-time speech streams on a single Cell.
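
The per-utterance subdivision described above can be illustrated with Gaussian mixture model (GMM) scoring. The sketch below computes the log-likelihood of one feature frame under a diagonal-covariance GMM, then splits the mixture components into chunks, as separate processors might, and combines the partial results with a log-sum-exp. The diagonal covariance, the chunking scheme, and all names are illustrative assumptions; this is not the paper's Cell implementation.

```python
import math
from typing import List, Sequence, Tuple

# One mixture component: (weight, mean vector, variance vector), diagonal covariance.
Component = Tuple[float, List[float], List[float]]


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def score_frame(x: Sequence[float], gmm: Sequence[Component]) -> float:
    """Log-likelihood of one frame: log sum_k w_k N(x; mean_k, var_k)."""
    return log_sum_exp([math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm])


def score_frame_split(x: Sequence[float], gmm: Sequence[Component], workers: int) -> float:
    """Same score with the components divided among workers.

    Each worker reduces its chunk to one partial log-sum (weights are not
    renormalized), and the partials are combined, so the result equals
    score_frame up to rounding.
    """
    chunk = math.ceil(len(gmm) / workers)
    partials = [score_frame(x, gmm[i:i + chunk]) for i in range(0, len(gmm), chunk)]
    return log_sum_exp(partials)


if __name__ == "__main__":
    mix = [(0.1, [0.0], [1.0]), (0.2, [1.0], [2.0]), (0.3, [-1.0], [0.5]), (0.4, [2.0], [1.5])]
    print(score_frame([0.3], mix), score_frame_split([0.3], mix, 3))
```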

PVTOL: providing productivity, performance, and portability to DoD signal processing applications on multicore processors

July 14, 2008

Conference Paper

Author:

Hahn G. Kim

…

Published in:

DoD HPCMP 2008, High Performance Computing Modernization Program Users Group Conf., 14 July 2008, pp. 327-333.

Topic:

high performance computing

R&D area:

R&D group:

Embedded and Open Systems

Summary

PVTOL provides an object-oriented C++ API that hides the complexity of multicore architectures within a PGAS programming model, improving programmer productivity. Tasks and conduits enable data flow patterns such as pipelining and round-robining. Hierarchical maps concisely describe how to allocate hierarchical arrays across processor and memory hierarchies and provide a simple API for moving data across these hierarchies. Functors encapsulate computational kernels; new functors can be easily developed using the PVTOL API and can be fused for more efficient computation. Existing computation and communication technologies that are optimized for various architectures are used to achieve high performance. PVTOL abstracts the details of the underlying processor architectures to provide portability. We are actively developing PVTOL for Intel, PowerPC and Cell architectures and intend to add support for more computational kernels on these architectures. FPGAs are becoming popular for accelerating computation in both the high performance computing (HPC) and high performance embedded computing (HPEC) communities. Integrated processor-FPGA technologies are now available from both HPC and HPEC vendors, e.g. Cray and Mercury Computer Systems. We plan to support FPGAs as co-processors in PVTOL. Finally, automated mapping technology has been demonstrated with pMatlab. We plan to begin implementing automated mapping in PVTOL next year. Similar to PVL, as PVTOL matures and is used in more projects at Lincoln, we plan to propose concepts demonstrated in PVTOL to HPEC-SI for adoption into future versions of VSIPL++.
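
Functor fusion, mentioned above, can be sketched in a language-neutral way: two elementwise kernels applied one after the other make two passes over memory and create an intermediate array, while a fused kernel applies both per element in a single pass. PVTOL itself is a C++ library; the Python below only illustrates the idea, and the `Functor` class and `fuse` helper are hypothetical, not PVTOL's API.

```python
from typing import Callable, List


class Functor:
    """Wraps an elementwise kernel so it can be applied to a whole array."""

    def __init__(self, kernel: Callable[[float], float]):
        self.kernel = kernel

    def __call__(self, data: List[float]) -> List[float]:
        # One pass over the data.
        return [self.kernel(x) for x in data]


def fuse(first: Functor, second: Functor) -> Functor:
    """Compose two elementwise functors into one that makes a single pass."""
    return Functor(lambda x: second.kernel(first.kernel(x)))


scale = Functor(lambda x: 2.0 * x)
offset = Functor(lambda x: x + 1.0)

data = [0.0, 1.0, 2.5]
unfused = offset(scale(data))      # two passes, one intermediate array
fused = fuse(scale, offset)(data)  # one pass, no intermediate array
```

Both paths compute the same values; the benefit of fusion on real hardware comes from reduced memory traffic, which this pure-Python sketch does not measure.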

Multicore programming in pMatlab using distributed arrays

June 1, 2008

Abstract

Author:

Jeremy Kepner

Published in:

CLADE '08: Proceedings of the 6th international workshop on Challenges of large applications in distributed environments

Topic:

computing

R&D area:

Cyber Security and Information Sciences

R&D group:

Lincoln Laboratory Supercomputing Center

Summary

Matlab is one of the most commonly used languages for scientific computing, with approximately one million users worldwide. Many of the programs written in Matlab can benefit from the increased performance offered by multicore processors and parallel computing clusters. The Lincoln pMatlab library (http://www.ll.mit.edu/pMatlab) allows high performance parallel programs to be written quickly using the distributed arrays programming paradigm. This talk provides an introduction to distributed arrays programming and describes the best programming practices for using distributed arrays to produce programs that perform well on multicore processors and parallel computing clusters. These practices include understanding the concepts of parallel concurrency vs. parallel data locality.
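
The distributed-array idea can be sketched by computing, for a one-dimensional block map, which contiguous index range each processor owns. Each processor then works only on its local part (data locality), and a final reduction combines the results. The sketch below simulates the processors in a loop and uses hypothetical names (`block_range`, `distributed_sum`); it is not pMatlab's API, where these steps are expressed with maps and distributed arrays in Matlab.

```python
from typing import List, Tuple


def block_range(n: int, num_procs: int, rank: int) -> Tuple[int, int]:
    """Half-open range [start, stop) of global indices owned by `rank` under a block map.

    The first n % num_procs ranks each get one extra element, so local sizes
    differ by at most one.
    """
    base, extra = divmod(n, num_procs)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def distributed_sum(global_data: List[float], num_procs: int) -> float:
    """Simulate each rank summing only its local block, then a global reduction."""
    partials = []
    for rank in range(num_procs):
        start, stop = block_range(len(global_data), num_procs, rank)
        local = global_data[start:stop]  # data owned by this rank
        partials.append(sum(local))      # purely local computation, no communication
    return sum(partials)                 # communication step: reduce the partials


if __name__ == "__main__":
    print([block_range(10, 3, r) for r in range(3)])
```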

Analytic theory of power law graphs

March 12, 2008

Abstract

Author:

Jeremy Kepner

Published in:

SIAM Conference on Parallel Processing for Scientific Computing

Topic:

graph processing

R&D area:

Cyber Security and Information Sciences

R&D group:

Lincoln Laboratory Supercomputing Center

Summary

An analytical theory of power law graphs is presented based on the Kronecker graph generation technique. The analysis uses Kronecker exponentials of complete bipartite graphs to formulate the sub-structure of such graphs. This allows various high-level quantities (e.g., degree distribution, betweenness centrality, diameter, eigenvalues, and isoperimetric ratio) to be computed directly from the model parameters. The implications of this work on “clustering” and “dendrogram” heuristics are also discussed.
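
The Kronecker construction builds a large adjacency matrix from repeated Kronecker products of a small one. One property that lets quantities be computed from the model parameters is that the degree vector of A ⊗ B is the Kronecker product of the degree vectors of A and B, because each row sum factors into a product of row sums. The sketch below checks this for the second Kronecker power of a small complete bipartite graph; it uses dense lists for clarity, and the function names are illustrative.

```python
from typing import List

Matrix = List[List[int]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product: block (i, j) of the result is a[i][j] * b."""
    rb, cb = len(b), len(b[0])
    return [[a[i // rb][j // cb] * b[i % rb][j % cb]
             for j in range(len(a[0]) * cb)]
            for i in range(len(a) * rb)]


def kron_vec(x: List[int], y: List[int]) -> List[int]:
    """Kronecker product of two vectors."""
    return [xi * yj for xi in x for yj in y]


def complete_bipartite(m: int, n: int) -> Matrix:
    """Adjacency matrix of K_{m,n}: vertices 0..m-1 on one side, m..m+n-1 on the other."""
    size = m + n
    return [[1 if (i < m) != (j < m) else 0 for j in range(size)] for i in range(size)]


def degrees(a: Matrix) -> List[int]:
    """Row sums of an adjacency matrix."""
    return [sum(row) for row in a]


base = complete_bipartite(1, 3)  # a star: one hub and three leaves
power = kron(base, base)         # second Kronecker power, 16 vertices

if __name__ == "__main__":
    print(degrees(power))
    print(kron_vec(degrees(base), degrees(base)))
```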

Performance metrics and software architecture

January 1, 2008

Book Chapter

Author:

Jeremy Kepner

…

Published in:

High Performance Embedded Computing Handbook, Chapter 15

Topic:

software

R&D area:

Cyber Security and Information Sciences

R&D group:

Lincoln Laboratory Supercomputing Center

Summary

This chapter presents high performance embedded computing (HPEC) software architectures and evaluation metrics. A canonical HPEC application is used to illustrate basic concepts. The chapter reviews different types of parallelism and performance analysis techniques. It presents a typical programmable multicomputer and explores the performance trade-offs of different parallel mappings on this computer using key system performance metrics. HPEC systems are among the most challenging systems in the world to build. Synthetic Aperture Radar (SAR) is one of the most common modes in a radar system and one of the most computationally stressing to implement. Often the first step in the development of a system is to produce a rough estimate of how many processors will be needed. The parallel opportunities at each stage of the calculation show that there are many different ways to exploit parallelism in this application. The chapter concludes with a discussion of the impact of different software implementation approaches.
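
A rough processor estimate of the kind mentioned above is often simple arithmetic: divide the sustained computational rate the application requires by the rate one processor is expected to sustain (its peak rate times an assumed efficiency) and round up. The inputs in the sketch below are hypothetical placeholders, not figures from the chapter's SAR example.

```python
import math


def processors_needed(required_flops: float, peak_flops_per_proc: float, efficiency: float) -> int:
    """Rough processor count: ceil(required rate / sustained rate per processor).

    efficiency is the assumed fraction of peak each processor actually
    sustains, with 0 < efficiency <= 1.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    sustained = peak_flops_per_proc * efficiency
    return math.ceil(required_flops / sustained)


if __name__ == "__main__":
    # Hypothetical inputs: 1 TFLOP/s required, 50 GFLOP/s peak per processor,
    # 25% of peak sustained.
    print(processors_needed(1e12, 50e9, 0.25))
```

The efficiency term carries most of the uncertainty in such an estimate, which is one reason the chapter's later analysis of parallel mappings and measured performance metrics matters.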
