Summary
When digital signal processing operations are implemented on a computer or with special-purpose hardware, errors and constraints due to finite word length are unavoidable. The main categories of finite register length effects are errors due to A/D conversion, errors due to roundoff in the arithmetic, constraints on signal levels imposed by the need to prevent overflow, and quantization of system coefficients. The effects of finite register length on implementations of digital filters realized as linear recursive difference equations, and on the fast Fourier transform (FFT), are discussed in some detail. For these algorithms, the differing quantization effects of fixed-point, floating-point, and block-floating-point arithmetic are examined and compared. The paper is intended primarily as a tutorial review of a subject that has received considerable attention over the past few years. The groundwork is set through a discussion of the relationship between the binary representation of numbers and truncation or rounding, and a formulation of a statistical model for arithmetic roundoff. The analyses presented here are intended to illustrate techniques of working with particular models. Results of previous work are discussed and summarized when appropriate. Some examples are presented to indicate how the results developed for simple digital filters and the FFT can be applied to the analysis of more complicated systems that use these algorithms as building blocks.
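As a rough illustration of the kind of statistical roundoff model referred to above, the following sketch quantizes random values to a fixed-point grid and compares the measured error statistics with the usual uniform-noise approximation (zero mean for rounding, variance of step squared over 12). It is not taken from the paper: the function names quantize and error_stats, the 8-bit word length, the uniformly distributed test input, and the choice of two's-complement style truncation (toward minus infinity) are all assumptions made for this example.

```python
import math
import random


def quantize(x, bits, mode="round"):
    """Quantize x to a grid of step 2**-bits (hypothetical helper).

    mode="round"    rounds to the nearest grid point.
    mode="truncate" truncates toward minus infinity, as assumed here
                    for two's-complement fixed-point numbers.
    """
    step = 2.0 ** -bits
    if mode == "round":
        return math.floor(x / step + 0.5) * step
    if mode == "truncate":
        return math.floor(x / step) * step
    raise ValueError(f"unknown mode: {mode}")


def error_stats(bits, mode, n=100_000, seed=0):
    """Return (mean, variance) of the quantization error over n samples.

    The input is assumed uniform on [-1, 1); this is only a test signal,
    not a claim about real data.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        errors.append(quantize(x, bits, mode) - x)
    mean = sum(errors) / n
    var = sum((e - mean) ** 2 for e in errors) / n
    return mean, var


BITS = 8                           # assumed fractional word length
STEP = 2.0 ** -BITS                # quantization step
MODEL_VARIANCE = STEP ** 2 / 12    # uniform-noise model prediction

ROUND_MEAN, ROUND_VAR = error_stats(BITS, "round")
TRUNC_MEAN, TRUNC_VAR = error_stats(BITS, "truncate")

if __name__ == "__main__":
    print("model variance   :", MODEL_VARIANCE)
    print("rounding   (mean, var):", ROUND_MEAN, ROUND_VAR)
    print("truncation (mean, var):", TRUNC_MEAN, TRUNC_VAR)
```

Under these assumptions, rounding gives an error mean near zero, truncation gives a mean near minus half a step, and both give a variance close to the model value. Other number representations (for example sign-magnitude truncation) would behave differently.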