Publications

Refine Results

(Filters Applied) Clear All

Moments of parameter estimates for Chung-Lu random graph models

Published in:
ICASSP 2012, Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, 25-30 March 2012, pp. 3961-4.

Summary

As abstract representations of relational data, graphs and networks find wide use in a variety of fields, particularly when working in non- Euclidean spaces. Yet for graphs to be truly useful in in the context of signal processing, one ultimately must have access to flexible and tractable statistical models. One model currently in use is the Chung- Lu random graph model, in which edge probabilities are expressed in terms of a given expected degree sequence. An advantage of this model is that its parameters can be obtained via a simple, standard estimator. Although this estimator is used frequently, its statistical properties have not been fully studied. In this paper, we develop a central limit theory for a simplified version of the Chung-Lu parameter estimator. We then derive approximations for moments of the general estimator using the delta method, and confirm the effectiveness of these approximations through empirical examples.
READ LESS

Summary

As abstract representations of relational data, graphs and networks find wide use in a variety of fields, particularly when working in non- Euclidean spaces. Yet for graphs to be truly useful in in the context of signal processing, one ultimately must have access to flexible and tractable statistical models. One...

READ MORE

Dynamic Distributed Dimensional Data Model (D4M) database and computation system

Summary

A crucial element of large web companies is their ability to collect and analyze massive amounts of data. Tuple store databases are a key enabling technology employed by many of these companies (e.g., Google Big Table and Amazon Dynamo). Tuple stores are highly scalable and run on commodity clusters, but lack interfaces to support efficient development of mathematically based analytics. D4M (Dynamic Distributed Dimensional Data Model) has been developed to provide a mathematically rich interface to tuple stores (and structured query language "SQL" databases). D4M allows linear algebra to be readily applied to databases. Using D4M, it is possible to create composable analytics with significantly less effort than using traditional approaches. This work describes the D4M technology and its application and performance.
READ LESS

Summary

A crucial element of large web companies is their ability to collect and analyze massive amounts of data. Tuple store databases are a key enabling technology employed by many of these companies (e.g., Google Big Table and Amazon Dynamo). Tuple stores are highly scalable and run on commodity clusters, but...

READ MORE

Identification and compensation of Wiener-Hammerstein systems with feedback

Published in:
ICASSP 2011, IEEE Int. Conf. on Acoustics, Speech, and Signal Processing, 22-27 May 2011, pp. 4056-4059.

Summary

Efficient operation of RF power amplifiers requires compensation strategies to mitigate nonlinear behavior. As bandwidth increases, memory effects become more pronounced, and Volterra series based compensation becomes onerous due to the exponential growth in the number of necessary coefficients. Behavioral models such as Wiener-Hammerstein systems with a parallel feedforward or feedback filter are more tractable but more difficult to identify. In this paper, we extend a Wiener-Hammerstein identification method to such systems showing that identification is possible (up to inherent model ambiguities) from single- and two-tone measurements. We also calculate the Cramer-Rao bound for the system parameters and compare to our identification method in simulation. Finally, we demonstrate equalization performance using measured data from a wideband GaN power amplifier.
READ LESS

Summary

Efficient operation of RF power amplifiers requires compensation strategies to mitigate nonlinear behavior. As bandwidth increases, memory effects become more pronounced, and Volterra series based compensation becomes onerous due to the exponential growth in the number of necessary coefficients. Behavioral models such as Wiener-Hammerstein systems with a parallel feedforward or...

READ MORE

Hogs and slackers: using operations balance in a genetic algorithm to optimize sparse algebra computation on distributed architectures

Published in:
Parallel Comput., Vol. 36, No. 10-11, October-November 2010, pp. 635-644.

Summary

We present a framework for optimizing the distributed performance of sparse matrix computations. These computations are optimally parallelized by distributing their operations across processors in a subtly uneven balance. Because the optimal balance point depends on the non-zero patterns in the data, the algorithm, and the underlying hardware architecture, it is difficult to determine. The Hogs and Slackers genetic algorithm (GA) identifies processors with many operations - hogs, and processors with few operations - slackers. Its intelligent operation-balancing mutation operator swaps data blocks between hogs and slackers to explore new balance points. We show that this operator is integral to the performance of the genetic algorithm and use the framework to conduct an architecture study that varies network specifications. The Hogs and Slackers GA is itself a parallel algorithm with near linear speedup on a large computing cluster.
READ LESS

Summary

We present a framework for optimizing the distributed performance of sparse matrix computations. These computations are optimally parallelized by distributing their operations across processors in a subtly uneven balance. Because the optimal balance point depends on the non-zero patterns in the data, the algorithm, and the underlying hardware architecture, it...

READ MORE

Rapid prototyping of radar algorithms

Author:
Published in:
IEEE Sig. Proc. Mag., Vol. 26, No. 6, November 2009, pp. 158-162.

Summary

Rapid prototyping of advanced signal processing algorithms is critical to developing new radars. Signal processing engineers usually use high level languages like MATLAB, IDL, or Python to develop advanced algorithms and to determine the optimal parameters for these algorithms. Many of these algorithms have very long execution times due to computational complexity and/or very large data sets, which hinders an efficient engineering development workflow. That is, signal processing engineers must wait hours, or even days, to get the results of the current algorithm, parameters, and data set before making changes and refinements for the next iteration. In the meantime, the engineer may have thought of several more permutations that he or she wants to test.
READ LESS

Summary

Rapid prototyping of advanced signal processing algorithms is critical to developing new radars. Signal processing engineers usually use high level languages like MATLAB, IDL, or Python to develop advanced algorithms and to determine the optimal parameters for these algorithms. Many of these algorithms have very long execution times due to...

READ MORE

Automatic registration of LIDAR and optical images of urban scenes

Published in:
CVPR 2009, IEEE Conf. on Computer Vision and Pattern Recognition, 20-25 June 2009, pp. 2639-2646.

Summary

Fusion of 3D laser radar (LIDAR) imagery and aerial optical imagery is an efficient method for constructing 3D virtual reality models. One difficult aspect of creating such models is registering the optical image with the LIDAR point cloud, which is characterized as a camera pose estimation problem. We propose a novel application of mutual information registration methods, which exploits the statistical dependency in urban scenes of optical apperance with measured LIDAR elevation. We utilize the well known downhill simplex optimization to infer camera pose parameters. We discuss three methods for measuring mutual information between LIDAR imagery and optical imagery. Utilization of OpenGL and graphics hardware in the optimization process yields registration times dramatically lower than previous methods. Using an initial registration comparable to GPS/INS accuracy, we demonstrate the utility of our algorithm with a collection of urban images and present 3D models created with the fused imagery.
READ LESS

Summary

Fusion of 3D laser radar (LIDAR) imagery and aerial optical imagery is an efficient method for constructing 3D virtual reality models. One difficult aspect of creating such models is registering the optical image with the LIDAR point cloud, which is characterized as a camera pose estimation problem. We propose a...

READ MORE

Extending the dynamic range of RF receivers using nonlinear equalization

Summary

Systems currently being developed to operate across wide bandwidths with high sensitivity requirements are limited by the inherent dynamic range of a receiver's analog and mixed-signal components. To increase a receiver's overall linearity, we have developed a digital NonLinear EQualization (NLEQ) processor which is capable of extending a receiver's dynamic range from one to three orders of magnitude. In this paper we describe the NLEQ architecture and present measurements of its performance.
READ LESS

Summary

Systems currently being developed to operate across wide bandwidths with high sensitivity requirements are limited by the inherent dynamic range of a receiver's analog and mixed-signal components. To increase a receiver's overall linearity, we have developed a digital NonLinear EQualization (NLEQ) processor which is capable of extending a receiver's dynamic...

READ MORE

High-productivity software development with pMATLAB

Published in:
Comput. Sci. Eng., Vol. 11, No. 1, January/February 2009, pp. 75-79.

Summary

In this paper, we explore the ease of tackling a communication-intensive parallel computing task - namely, the 2D fast Fourier transform (FFT). We start with a simple serial Matlab code, explore in detail a ID parallel FFT, and illustrate how it can be extended to multidimensional FFTs.
READ LESS

Summary

In this paper, we explore the ease of tackling a communication-intensive parallel computing task - namely, the 2D fast Fourier transform (FFT). We start with a simple serial Matlab code, explore in detail a ID parallel FFT, and illustrate how it can be extended to multidimensional FFTs.

READ MORE

Language, dialect, and speaker recognition using Gaussian mixture models on the cell processor

Published in:
Twelfth Annual High Performance Embedded Computing Workshop, HPEC 2008, 23-25 September 2008.

Summary

Automatic recognition systems are commonly used in speech processing to classify observed utterances by the speaker's identity, dialect, and language. These problems often require high processing throughput, especially in applications involving multiple concurrent incoming speech streams, such as in datacenter-level processing. Recent advances in processor technology allow multiple processors to reside within the same chip, allowing high performance per watt. Currently the Cell Broadband Engine has the leading performance-per-watt specifications in its class. Each Cell processor consists of a PowerPC Processing Element (PPE) working together with eight Synergistic Processing Elements (SPE). The SPEs have 256KB of memory (local store), which is used for storing both program and data. This paper addresses the implementation of language, dialect, and speaker recognition on the Cell architecture. Classically, the problem of performing speech-domain recognition has been approached as embarrassingly parallel, with each utterance being processed in parallel to the others. As we will discuss, efficient processing on the Cell requires a different approach, whereby computation and data for each utterance are subdivided to be handled by separate processors. We present a computational model for automatic recognition on the Cell processor that takes advantage of its architecture, while mitigating its limitations. Using the proposed design, we predict a system able to concurrently score over 220 real-time speech streams on a single Cell.
READ LESS

Summary

Automatic recognition systems are commonly used in speech processing to classify observed utterances by the speaker's identity, dialect, and language. These problems often require high processing throughput, especially in applications involving multiple concurrent incoming speech streams, such as in datacenter-level processing. Recent advances in processor technology allow multiple processors to...

READ MORE

PVTOL: providing productivity, performance, and portability to DoD signal processing applications on multicore processors

Published in:
DoD HPCMP 2008, High Performance Computing Modernization Program Users Group Conf., 14 July 2008, pp. 327-333.

Summary

PVTOL provides an object-oriented C++ API that hides the complexity of multicore architectures within a PGAS programming model, improving programmer productivity. Tasks and conduits enable data flow patterns such as pipelining and round-robining. Hierarchical maps concisely describe how to allocate hierarchical arrays across processor and memory hierarchies and provide a simple API for moving data across these hierarchies. Functors encapsulate computational kernels; new functors can be easily developed using the PVTOL API and can be fused for more efficient computation. Existing computation and communication technologies that are optimized for various architectures are used to achieve high performance. PVTOL abstracts the details of the underlying processor architectures to provide portability. We are actively developing PVTOL for Intel, PowerPC and Cell architectures and intend to add support for more computational kernels on these architectures. FPGAs are becoming popular for accelerating computation in both the high performance computing (HPC) and high performance embedded computing (HPEC) communities. Integrated processor-FPGA technologies are now available from both HPC and HPEC vendors, e.g. Cray and Mercury Computer Systems. We plan to support FPGAs as co-processors in PVTOL. Finally, automated mapping technology has been demonstrated with pMatlab. We plan to begin implementing automated mapping in PVTOL next year. Similar to PVL, as PVTOL matures and is used in more projects at Lincoln, we plan to propose concepts demonstrated in PVTOL to HPEC-SI for adoption into future versions of VSIPL++.
READ LESS

Summary

PVTOL provides an object-oriented C++ API that hides the complexity of multicore architectures within a PGAS programming model, improving programmer productivity. Tasks and conduits enable data flow patterns such as pipelining and round-robining. Hierarchical maps concisely describe how to allocate hierarchical arrays across processor and memory hierarchies and provide a...

READ MORE