Summary
When a digital filter is implemented on a computer or with special-purpose hardware, errors and constraints due to finite word length are unavoidable. These quantization effects must be considered, both in deciding what register length is needed for a given filter implementation and in choosing among several possible implementations of the same filter design, which will be affected differently by quantization. Quantization effects in digital filters can be divided into four main categories: quantization of system coefficients, errors due to analog-to-digital (A-D) conversion, errors due to roundoff in the arithmetic, and a constraint on signal level due to the requirement that overflow be prevented in the computation. The effects of these errors and constraints vary with the type of arithmetic used. Fixed point, floating point, and block floating point are three alternative types of arithmetic often employed in digital filtering. A very large portion of the computation performed in digital filtering is composed of two basic algorithms: the first- or second-order, linear, constant-coefficient, recursive difference equation, and computation of the discrete Fourier transform (DFT) by means of the fast Fourier transform (FFT). These algorithms serve as building blocks from which the most complicated digital filtering systems can be constructed. The effects of quantization on implementations of these basic algorithms are studied in some detail. Sensitivity formulas are presented for the effects of coefficient quantization on the poles of simple recursive filters. The mean-squared error in a computed DFT, due to coefficient quantization in the FFT, is estimated. For both recursions and the FFT, the differing effects of fixed and floating point coefficients are investigated. Statistical models for roundoff errors and A-D conversion errors, and linear system noise theory, are employed to estimate output noise variance in simple recursive filters and in the FFT.
By considering the overflow constraint in conjunction with these noise analyses, output noise-to-signal ratios are derived. Noise-to-signal ratio analyses are carried out for fixed, floating, and block floating point arithmetic, and the results are compared. All the noise analyses are based on simple statistical models for roundoff errors (and A-D conversion errors). Of course, somewhat different models are applied for the different types of arithmetic. These models cannot in general be verified theoretically, and thus one must resort to experimental noise measurements to support the predictions obtained via the models. A good deal of experimental data on noise measurements is presented here, and the empirical results are generally in good agreement with the predictions based on the statistical models. The ideas developed in the study of simple recursive filters and the FFT are applied to analyze quantization effects in two more complicated types of digital filters: frequency sampling and FFT filters. The frequency sampling filter is realized by means of a comb filter and a bank of second-order recursive filters, while an FFT filter implements a convolution via an FFT, a multiplication in the frequency domain, and an inverse FFT. Any finite-duration impulse response filter can be realized by either of these methods. The effects of coefficient quantization, roundoff noise, and the overflow constraint are investigated for these two filter types. Through use of a specific example, realizations of the same filter design, by means of the frequency sampling and FFT methods, are compared on the basis of differing quantization effects.
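The FFT filter structure described above (FFT, multiplication in the frequency domain, inverse FFT) can be sketched as follows. This is a minimal single-block illustration in double-precision floating point, so it shows only the structure and not the quantization effects themselves. The function names, the recursive radix-2 FFT, and the zero-padding to a power of two are assumptions made for the example. A practical FFT filter for long inputs would normally process the signal in sections, which is not shown here.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power
    of 2. With inverse=True the conjugate kernel is used (no 1/N scaling)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)   # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def fft_filter(x, h):
    """Linear convolution of x with the finite impulse response h,
    computed as FFT -> pointwise multiply -> inverse FFT."""
    n_out = len(x) + len(h) - 1
    n = 1
    while n < n_out:          # zero-pad to a power of two >= n_out
        n *= 2
    X = fft(list(x) + [0.0] * (n - len(x)))
    H = fft(list(h) + [0.0] * (n - len(h)))
    Y = [a * b for a, b in zip(X, H)]
    y = fft(Y, inverse=True)
    return [v.real / n for v in y[:n_out]]

def direct_convolution(x, h):
    """Reference: direct evaluation of the convolution sum."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

x = [1.0, 2.0, 3.0, 4.0, 5.0]
h = [0.25, 0.5, 0.25]
y_fft = fft_filter(x, h)
y_ref = direct_convolution(x, h)
max_diff = max(abs(a - b) for a, b in zip(y_fft, y_ref))
print("largest difference from direct convolution:", max_diff)
```

Zero-padding both sequences to at least len(x) + len(h) - 1 points makes the circular convolution produced by the FFT equal to the desired linear convolution.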