Pixel-Processing Imager

In a traditional tracking or wave-front sensing system, a focal plane array (FPA) captures an image, an analog-to-digital converter (ADC) digitizes the light intensity value captured at each pixel, and the entire digitized image is transferred off the readout integrated circuit (ROIC) and into a processing unit. This remote processing unit then performs all the computations required to implement the tracking or wave-front sensing algorithm. The latency of these traditional image processing systems is often dominated by the time it takes the ROIC to transfer an entire frame of image data to the downstream processing unit.

For some image processing algorithms, this latency bottleneck can be avoided by integrating the FPA and processing unit into a single array, which reduces the volume of data that must be transferred. We have developed a new digital-pixel CMOS focal plane array, the Pixel-Processing Imager (PPI), which integrates the FPA, ROIC, and basic processing elements in a single integrated circuit. This integration lets tracking and wave-front sensing algorithms achieve the low latency that is crucial to real-time operation of these systems. The advantage of on-chip ROIC processing lies in significantly reducing the data load within the FPA prior to data transfer.

The PPI was designed to accommodate four algorithms: centroid tracking, Shack-Hartmann wave-front sensing, image peaking, and Fitts correlation tracking. A fundamental mathematical operation common to all four algorithms is the multiply-accumulate (MAC) operation, which multiplies two numbers together and adds the result to a running sum. We have estimated the latency of the four selected algorithms both for an image processing system that uses a traditional CCD focal plane array and for the PPI device. Our results show that the new architecture can reduce latency by a factor of 20 to 45.
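As an illustration of how a MAC operation underlies these algorithms, the following minimal Python sketch computes an intensity-weighted centroid using nothing but three running sums. The function name and the list-of-rows image layout are assumptions made for this example; the sketch does not describe the PPI's actual circuitry.

```python
def centroid(image):
    """Intensity-weighted centroid of a 2-D image given as a list of rows.

    Every update below is a multiply-accumulate: a product is added
    to a running sum. Returns (x, y), or None for an all-zero image.
    """
    total = 0   # running sum of intensity
    sum_x = 0   # running sum of column index * intensity
    sum_y = 0   # running sum of row index * intensity
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            total += 1 * value
            sum_x += x * value
            sum_y += y * value
    if total == 0:
        return None
    return (sum_x / total, sum_y / total)


# Two equally bright pixels in row 1, at columns 1 and 2.
spot = [
    [0, 0, 0],
    [0, 2, 2],
    [0, 0, 0],
]
print(centroid(spot))  # → (1.5, 1.0)
```

Only the three sums, rather than the full image, are needed to form the final result, which is why this kind of reduction suits on-chip processing.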

Figure 1 compares the signal flow path of a Fitts correlation tracker in a conventional CCD implementation with that in the PPI. The Fitts correlation tracker is useful for tracking objects within a noisy or high-clutter environment, and its algorithm involves accumulating weighted sums of spatial and temporal gradients.

Figure 1. Comparison of data paths between conventional CCD system and PPI implementation.


In the conventional implementation, shown at the top of Figure 1, a raw analog image is digitized, sent off chip, and captured using a frame grabber, with the settling time of the readout and analog-to-digital conversion path generally determining the trade-off between frame rate and noise floor. The size of the resulting raw digitized image is thus given by the array size times the bit depth. Numerous preprocessing steps must then be applied to this raw image, such as nonuniformity compensation, bad pixel mitigation, thresholding, and clutter suppression. Then, to implement the Fitts algorithm, spatial and temporal gradients must be computed from the raw image data, resulting in a set of sparse matrices of the same format, in which most elements are zero. These sparse matrices are then reduced to sum-of-product results through a series of MAC operations. Note that the focal plane readout occurs at the point in the signal flow at which the data volume is at its maximum.
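The reduction of gradient images to a handful of sum-of-product results can be sketched as follows. This is a generic gradient-based least-squares shift estimate written in the spirit of the description above, not the exact Fitts weighting or the PPI implementation, neither of which is specified here. The function name, the central-difference gradients, and the sign convention are all assumptions of this sketch.

```python
def estimate_shift(ref, cur):
    """Estimate the (dx, dy) shift of `cur` relative to `ref`.

    Both images are lists of rows of equal size. Spatial gradients are
    taken from `ref` by central differences; the temporal gradient is
    cur - ref. Five MAC accumulators reduce the images to a 2x2 linear
    system. Returns None if that system is singular.
    """
    rows, cols = len(ref), len(ref[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for r in range(1, rows - 1):          # skip the border pixels
        for c in range(1, cols - 1):
            gx = (ref[r][c + 1] - ref[r][c - 1]) / 2.0  # spatial, x
            gy = (ref[r + 1][c] - ref[r - 1][c]) / 2.0  # spatial, y
            gt = cur[r][c] - ref[r][c]                  # temporal
            # Multiply-accumulate into the five running sums.
            sxx += gx * gx
            sxy += gx * gy
            syy += gy * gy
            sxt += gx * gt
            syt += gy * gt
    det = sxx * syy - sxy * sxy
    if det == 0:
        return None
    # For a small shift, gt is approximately -(dx*gx + dy*gy),
    # so the right-hand side is the negated gradient-product sums.
    bx, by = -sxt, -syt
    dx = (bx * syy - by * sxy) / det
    dy = (by * sxx - bx * sxy) / det
    return (dx, dy)
```

Whatever the image size, only the five accumulated sums are needed to solve for the shift. The estimate is a first-order approximation, so it is only meaningful for shifts that are small compared with the scale of the image features.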

The PPI, in contrast, performs imaging, ADC, nonuniformity correction, bad pixel mitigation, and thresholding all in the pixels. It then carries out the calculation of spatial and temporal gradients, region-of-interest operations for clutter suppression, and sparse matrix reduction through MAC operations, also on the chip, at the edges of the imaging array. Carrying out this processing on the sensor chip significantly reduces the amount of data that must be sent to the downstream processor, thus improving the overall latency of the algorithm.

The PPI is implemented as a 256 × 256 pixel, 30 μm pixel pitch, single-chip image sensor that enables algorithms such as centroid computation, wave-front sensing, correlation tracking, and intensity-squared image peaking to achieve low latency. Each pixel in the ROIC mates to a single InGaAs photodetector input and includes a sigma-delta ADC that is capable of producing up to 28 bits of in-pixel digital output. The entire digital ADC result is resident within each corresponding pixel, which enables real-time signal processing on chip. The ROIC is based on architectures used in previous Lincoln Laboratory designs dating back to 2007. The CMOS ROIC bonds to an infrared photodetector array using indium bumps. The ROIC structure consists of three main functional blocks: the pixel analog front end, the pixel digital back end, and the readout circuitry. A significant advantage of the digital FPA architecture lies in the on-chip signal processing capabilities enabled by in-pixel digitization.
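To put the array size and bit depth in perspective, the short calculation below compares the size of one full raw frame with the size of a small set of on-chip MAC results. The 256 × 256 array and the 28-bit maximum depth come from the description above. The number of output sums and their word width are hypothetical values chosen only for illustration.

```python
ROWS, COLS = 256, 256     # array size stated for the PPI
BITS_PER_PIXEL = 28       # maximum in-pixel ADC depth stated for the PPI

# Hypothetical values, not taken from the PPI design:
NUM_SUMS = 5              # assumed number of MAC results read out
BITS_PER_SUM = 64         # assumed accumulator word width


def raw_frame_bits(rows, cols, bits_per_pixel):
    """Size of a full raw frame: array size times bit depth."""
    return rows * cols * bits_per_pixel


def reduced_bits(num_sums, bits_per_sum):
    """Size of the readout when only accumulated sums leave the chip."""
    return num_sums * bits_per_sum


raw = raw_frame_bits(ROWS, COLS, BITS_PER_PIXEL)
reduced = reduced_bits(NUM_SUMS, BITS_PER_SUM)
print(raw)      # → 1835008
print(reduced)  # → 320
```

Under these assumptions, a full frame is several thousand times larger than the reduced readout. This ratio describes data volume only and should not be read as a latency figure, since latency also depends on factors not modeled here.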

Figure 2. PPI single-chip top-level circuit block diagram.

A top-level block diagram of the PPI is shown in Figure 2. The digital registers of the pixel array may be preloaded using input shift registers located on the north and west sides of the chip. The 256 × 256 pixel array performs photocurrent measurement and pixel-level signal processing functions. Data may then be simultaneously transferred toward the east and south into the peripheral logic blocks that perform MAC-based functions and serialize the output, which is transmitted off the chip by a high-speed differential output buffer.

