Abstract:
This briefing describes the hardware implementation of an ultra-high-performance, next-generation Single-Instruction/Multiple-Data (SIMD) processing architecture performing continuous pulse compression on actual radar data sets. This low-power, scalable architecture provides for hundreds and ultimately thousands of processing elements, each with optional floating-point hardware, to perform data-parallel processing in image and signal processing applications. The COTS board includes two 25-GFLOPS chips running optimized code generated by WorldScape within a flexible development environment that includes assembly language and C-based language support as well as a cycle-accurate simulator. Performance predicted by the cycle-accurate simulator agreed with measured hardware performance to within 1%. Benchmark measurements are provided that include input/output across multiple chips and extrapolated performance. Chip-level design and architectural options based on existing building blocks will be discussed, and their potential impacts on certain types of processing tasks will be described. Descriptions of architectural scalability and library enhancements will provide meaningful input for those currently architecting large-scale embedded processing systems.
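For readers unfamiliar with pulse compression, the following is a minimal sketch of the operation in its simplest form: a time-domain matched filter that correlates received samples against a known reference pulse. It is not the WorldScape implementation, which this abstract does not describe. The Barker-13 reference code, the function names, and the synthetic echo are all assumptions chosen for illustration.

```python
# Illustrative only: time-domain pulse compression (matched filtering).
# BARKER13, pulse_compress, and make_echo are hypothetical names, not
# part of any WorldScape or ClearSpeed library.

# Barker-13 phase code, assumed here as the transmitted reference pulse.
BARKER13 = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]


def pulse_compress(received, reference):
    """Correlate received samples against the conjugated reference pulse."""
    n_out = len(received) - len(reference) + 1
    out = []
    for lag in range(n_out):
        # Each output lag is independent of the others, which is what
        # makes the operation a natural fit for data-parallel hardware.
        acc = 0j
        for k, ref in enumerate(reference):
            acc += received[lag + k] * complex(ref).conjugate()
        out.append(acc)
    return out


def make_echo(reference, delay, total_len, amplitude=1.0):
    """Build a noise-free synthetic return: the pulse delayed by `delay` samples."""
    echo = [0j] * total_len
    for k, ref in enumerate(reference):
        echo[delay + k] += amplitude * ref
    return echo


received = make_echo(BARKER13, delay=20, total_len=64)
compressed = pulse_compress(received, BARKER13)
magnitudes = [abs(v) for v in compressed]
peak_lag = max(range(len(magnitudes)), key=magnitudes.__getitem__)
print(peak_lag, magnitudes[peak_lag])
```

In this noise-free case the compressed output peaks at the lag where the echo begins, and every other lag stays at or below magnitude 1. Production systems typically perform the same correlation in the frequency domain with FFTs, which is presumably why FFT throughput features so prominently in the benchmarks.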

The underlying technology, being developed by WorldScape and ClearSpeed, has been shown to provide ten to one hundred times the overall performance of PowerPC- or Pentium-based architectures, especially when performing image and signal processing functions such as FFTs or filters. Viable architectural scaling will enable significant throughput, size, and power advantages for embedded processing applications. This briefing will also provide updated hardware benchmarks including, among others, FFTs per second per watt.
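As a rough illustration of the "FFTs per second per watt" figure of merit, the sketch below pairs a textbook radix-2 FFT with a helper that normalizes throughput by power. The function names are hypothetical, the FFT is a generic reference implementation rather than the optimized library routine, and no benchmark values from the briefing are reproduced here.

```python
# Illustrative only: a textbook radix-2 FFT and a power-normalized
# throughput metric. Names and structure are assumptions.
import cmath


def fft(x):
    """Recursive radix-2 decimation-in-time FFT; length must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a nonzero power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor applied to the odd-indexed half.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ffts_per_second_per_watt(num_ffts, elapsed_s, power_w):
    """Throughput (FFTs per second) divided by average power draw in watts."""
    return num_ffts / elapsed_s / power_w
```

The metric only supports a fair comparison when the FFT length, numeric precision, and power measurement boundary (chip, board, or system) are held constant across the processors being compared.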

WorldScape Defense Company has been developing key algorithms and library functions, such as FFTs, linear algebra, and filters, that use the architecture and its per-PE floating-point hardware efficiently to achieve exceptional performance at very low power dissipation. Contract work supported by the Office of Naval Research is currently progressing towards UAV-based SAR systems and other multi-sensor applications. High-level C and VSIPL library support is planned and currently under development.

Lockheed Martin Maritime Systems and Sensors Division has been trained in the use of the SDK and has ported some key high-performance application benchmarks, such as radar pulse compression, for performance comparison with general-purpose processing architectures. Results have shown the potential for considerable improvements in size, weight, power dissipation, and throughput for airborne, shipboard, ground-based, and undersea tactical signal and image processing systems.

ClearSpeed Technology Limited is developing the MTAP architecture, which provides a scalable array of Processing Elements (PEs) on a single die. Devices with 64, 96, and 256 PEs are currently planned, with customizable levels of floating-point and I/O performance. The technology is complemented by ClearConnect, a scalable packet-switched bus architecture designed to support the high-bandwidth I/O required for many applications. The processor is supported by a COTS Software Development Kit (SDK) that includes an optimizing C compiler, a graphical debugger, and a full suite of supporting tools and libraries.

In this briefing, we describe an embedded processing architecture and COTS hardware that promise a performance advantage of one to two orders of magnitude over conventional general-purpose processors. Finally, the results of a DoD benchmark algorithm run on the hardware and on the cycle-accurate simulator in summer 2004 will be presented and compared with general-purpose processor performance.