Chapter 4

MULTIRATE CONVOLUTION

This chapter combines the material from Chapters Two and Three. It demonstrates that convolution in subbands is accurate and, for some implementations, more efficient than the typical single channel algorithm. Specific sections discuss project goals and relevant research, and a proposed variation for two-channel time-domain convolution.

Introduction

Recall from Chapter One that the goal of this research is to apply subband convolution in order to significantly reduce the amount of computation required by single channel methods, thereby reducing the computational load on system resources and reducing possible input to output latency. In this sense, it is important to address three parameters: computational complexity, implementation complexity, and input-to-output latency. Computational complexity can be quantitatively defined as the number of multiplications and additions required for each sample of the convolved output. Frequency-domain ("fast") convolution is less computationally complex than an equivalent direct time-domain convolution, as discussed in Chapter Two. Implementation complexity can be more qualitatively defined as the intricacy of software or hardware required to realize the algorithm. In this sense, frequency-domain convolution is more complex than its time-domain equivalent due to the addition of the transform and inverse transform blocks. Input to output latency is predominantly governed by the theoretical system’s intrinsic delay and, more practically, the system implementation’s load on computational resources. In the case of sample-by-sample streaming, the intrinsic delay (the order of H(z) if H(z) represents the entire system) critically affects latency. For buffered implementations, system delay is very small compared to buffer size and, if dealt with properly, can be eliminated. Of more importance is the system’s load on computing resources, which is directly proportional to computational complexity.

Orthonormal and Biorthonormal Convolvers

Vaidyanathan (1993b) and Phoong and Vaidyanathan (1995) have shown that, in general, convolution can be performed in the subbands of a multirate filter bank with approximately the same computational complexity as in the single band case. For the case of subband quantization, proper bit allocation schemes reduce roundoff error and produce more accurate convolutions for a given bit rate. The motivation for this approach does not originate from a desire to obtain algorithms that are faster than well-known fast convolution techniques, but to show that convolution in subbands is a feasible alternative (with some beneficial properties) to known methods.

Convolution Theorem for Orthonormal Filter Banks

A consequence of a paraunitary filter bank, as defined in Chapter 3, is that the analysis and synthesis filters also satisfy the orthonormality condition given by

(58)

Given the system shown in Figure 21, where A and S are perfect reconstruction filter banks that satisfy the paraunitary or orthonormal property,

 

Figure 21. The M-channel filter bank used for orthonormal convolvers.

 

then the decimated convolution of x(n) and h*(-n) is

(59)

or, equivalently,

(60)

where (z) represents the paraconjugate of H(z) or H*(z-1). Notice that this scheme only provides the decimated response. To obtain all the samples, the process must be repeated for h(n-i) for 0 £ i £ M-1.

The theorem can be generalized to maximally decimated filter banks with unequal decimation (sum of 1/nk = 1). For the non-uniform case where L is the least common multiple of the nk order decimations where nk,m represents the greatest common divisor of nk and nm,

(61)

where pk = L/nk.

Convolution Theorem for Biorthonormal Filter Banks

For maximally decimated perfect reconstruction filter banks that satisfy the so-called biorthonormal property,

(62)

the convolution of x(n) and h(n) as opposed to h*(-n) can be computed using the new architecture in Figure 22 with

(63)

 

 

Figure 22. The biorthonormal convolution architecture. Ak and Sk are the analysis and synthesis filters, respectively, of a maximally decimated perfect reconstruction system.

 

Again, this scheme only provides the decimated response. To obtain all the samples, the process must be repeated for h(n-i) for 0 £ i £ M-1. When the filter bank is indeed orthornormal (paraunitary), that is, Sk(z) = Âk(z), then this case is similar to the decomposition of x(n) and h*(-n) with the same set of filters, Ak(z). Table 1 illustrates the process for two simple signals with M = 2 and nk = 2. The perfect reconstruction filter bank is defined as

(64)

Although at first glance, trivial, this system indeed has the properties of perfect reconstruction and linear phase as defined in Chapter 3.

 

Table 1. Time domain signals in a biorthonormal filter bank convolver.

x(n) = [ 1 2 3 4 ] h(n) = [ 5 6 7 8 ] x*h = [ 5 16 34 60 61 52 32 ]

n= 00 1 2 3 4 5 6 7
for h(n):
h0(n): 0 6 8          
h1(n): 5 7            
x0(n): 1 3            
x1(n): 0 2 4          
y0=x0*h0 0 6 26 24        
y1=x1*h1 0 10 34 28        
(y0+y1)­ 2 0 0 16 0 60 0 52 0
for h(n-1):
h0(n): 5 7            
h1(n): 6 8            
x0(n): 1 3            
x1(n): 0 2 4          
y0 = x0*h0: 5 22 21          
y1 = x1*h1: 0 12 40 32        
(y0 + y1)­ 2: 5 0 34 0 61 0 32 0
delay by 1: 0 5 0 34 0 61 0 32
add previous result: 0 0 16 0 60 0 52 0
y(n): 0 5 16 34 60 61 52 32

 

Coding Gain

In this case, coding gain applies to the accuracy of the convolution calculation with respect to quantized coefficients. The coding gain derived in great detail in Vaidyanathan’s (1993b) paper specifically involves quantization noise and the noise model’s error variance in the multirate and single rate case. Specifically, the optimal coding gain (the ratio of the error variance of the single rate case to the error variance of the multirate case) is always greater than 1 for all M > 1. That is, this multirate method will always yield lower re-quantization noise variance in the convolution calculation.

"Running" Convolution with Multirate Filter Banks

In contrast with the orthonormal and biorthonormal method outlined above, Vetterli (1988) has proposed a method of reducing computational complexity associated with FIR and IIR filtering using multirate filter banks. Consider the implementation shown in Figure 23.

 

Figure 23. Multirate running convolution example (adapted from Vetterli 1988).

 

With the help of Equations 17 and 19, the output can be expressed as

(65)

After simplifying, the alias term, X(-z), cancels and the resultant output is

(66)

Now by choosing

(67)

(68)

The output reduces to the desired result with unit delay,

(69)

If H(z) is of length N, then the two filters H0(z) and H1(z), and their sum are of length N/2. So instead of filtering with H(z), this scheme filters with three filters of half the length at half the speed. At the cost of added implementation complexity, the computational complexity is reduced by about 25%. The method can be expanded into a tree structure or a generalized parallel M-channel structure derived from the tree. But the savings begin to disappear when the number of extra additions required by extended versions of Equation 66 catch up to the savings in multiplications. Table 2 illustrates the computational breakdown for various structures using this method.

 

Table 2. Comparison of computational complexity for various extensions of Vetterli’s running convolution method. H(z) is a 32 point FIR filter.

Number
of channels
Filter
length
Decimation
factor
Delay Mults per
sample
Adds per
sample
1 32 1 0 32 31
3 16 2 1 24 24.5
9 8 4 3 18 17.5
27 4 8 7 13.5 19.6
81 2 16 15 10.1 21.3
243 1 32 31 7.6 26.4

(adapted from Vetterli 1988)

 

Multirate Convolution with Cross-Convolved Channels

Consider the multirate convolution system shown in Figure 24. A and S are the analysis and synthesis filters for x(n) and B and T are the analysis and synthesis filters for h(n). In this example, both sets of filters are defined by Equation 64.

 

Figure 24. Two-channel cross-convolution for (a) a simple delay chain filter bank and (b) the generalized case.

 

Although rather complex, this filter bank does produce the correct convolution result with a two sample delay (due to the addition of a second synthesis filter) as illustrated in Table 3. The desired impulse response must also be filtered and decimated, as shown in Figure 24.

Recall that the (M = 2) decimation operation doubles the effective width of each filtered signal’s spectrum. Filtering the impulse response ensures that the proper input and impulse response values and subsequent frequency ranges correspond to each other. Decimating the filtered impulse response ensures that the input and impulse frequency resolutions are equivalent. Also note the addition of the T synthesis filters to each band, which compensates for the "hidden" analysis filter applied to the impulse.

 

 

Table 3. Time domain signals in a perfect reconstruction filter bank with cross-convolution.

x(n) = [ 1 2 3 4 ] h(n) = [ 5 6 7 8 ] x*h = [ 5 16 34 60 61 52 32 ]

n = 00 1 2 3 4 5 6 7 8
h0(n): 5 6 7 8          
h1(n): 0 5 6 7 8 0 5 6 7 8        
f0(n): 5 7 5 7              
f1(n): 0 6 8 0 6 8            
x0(n): 1 2 3 4 1 2 3 4          
x1(n): 0 1 2 3 4 0 1 2 3 4        
d0(n): 1 3 1 3              
d1(n): 0 2 4 0 2 4            
c00(n): 5 22 21 5 22 21            
c01(n): 0 6 26 24 0 6 26 24          
c10(n): 0 10 34 28 0 10 34 28          
c11(n): 0 0 12 40 32 0 0 12 40 32        
u00(n): 5 0 22 0 21 5 0 22 0 21        
u01(n): 0 0 6 0 26 0 24 0 0 6 0 26 0 24    
u10(n): 0 0 10 0 34 0 28 0 0 10 0 34 0 28    
u11(n): 0 0 0 0 12 0 40 0 32 0 0 0 0 12 0 40 0 32
u01(n-1): 0 0 0 6 0 26 0 24 0 0 0 6 0 26 0 24  
u10(n-1): 0 0 0 10 0 34 0 28 0 0 0 10 0 34 0 28  
u00(n-2): 0 0 5 0 22 0 21 0 0 5 0 22 0 21    
y(n): 0 0 5 16 34 60 61 52 32 0 0 5 16 34 60 61 52 32

 

The cross-convolutions in each channel are necessary for alias cancellation. That is, the terms in Y(z) containing the alias components X(-z) or H(-z) cancel each other out. After some simplification, the expression for Y(z) is

(70)

Then substituting in the analysis and synthesis filters as defined in Equation 64,

(71)

Indeed, for any pair of alias canceling filter banks, A-S and B-T, Equation 70 reduces to the desired result plus the delay introduced by the filter banks themselves. Although this method does produce the desired result, its complexity is rather equivalent to the multiple iterations of the biorthonormal case. This method, however, introduces the delay associated with multiple filter banks instead of just one.

Multirate Convolution without Cross-Convolved Channels

Now consider the case where additional convolution in each band is omitted, as shown in Figure 25.

 

Figure 25. The generalized two-channel case without cross-convolution.

 

The output of this system can be expressed as

(72)

This system maintains perfect reconstruction only when the first term of the filter matrix reduces to a delay and the other three reduce to zero. Consider the choice of A and S as chosen in Equation 46, restated here.

(46)

Again, the additional delay terms are necessary only to ensure that the filters are causal. For two real-coefficient filter banks,

(73)

(74)

Substituting these filters into Equation 72,

(75)

Now the aliasing terms do not so conveniently cancel. In the case of Equation 70, the matrix terms resulting from the cross-convolved channels cancelled with the corresponding terms from the opposite channel. In this case, the lower three matrix terms define the alias contributions from X(z)H(-z), X(-z)H(z), and X(-z)H(-z), respectively. It is worth noting, however, that the transition band is the only problematic area when selective filters are used. Figure 26 illustrates the frequency responses of the filters associated with the filter matrix in Equation 72. Both filter banks, A and B are derived from the 64-tap perfect reconstruction bank from Vaidyanathan and Hoang (1988) (Filter #64D in the reference).

 

(a)

(b)

(c)

(d)

(e)

Figure 26. Frequency responses of filters specified by Equation 75, (a) original analysis bank, (b) A0(z)B0(z)S0(z)T0(z) + A1(z)B1(z)S1(z)T1(z), (c) A0(z)B0(-z)S0(z)T0(z) + A1(z)B1(-z)S1(z)T1(z), (d) A0(-z)B0(z)S0(z)T0(z) + A1(-z)B1(z)S1(z)T1(z), and (e) A0(-z)B0(-z)S0(z)T0(z) + A1(-z)B1(-z)S1(z)T1(z).

 

Similarly, if the filter banks fit the criteria set forth in Ekanayake and Premaratne (1995), where

(76)

(77)

then Equation 72 becomes,

(78)

Again, the aliasing terms do not cancel and the transition band remains the problematic area, as shown in Figure 27. Both filter banks, A and B, are derived from the optimized IIR bank of order 32 from Ekanayake and Premaratne (1995).

 

(a)

(b)

(c)

(d)

(e)

Figure 27. Frequency responses of filters specified by Equation 78, (a) original analysis bank, (b) A0(z)B0(z)A0(z)B0(z) + A0(-z)B0(-z)A0(-z)B0(-z), (c) A0(z)B0(-z)A0(z)B0(z) + A0(-z)B0(z)A0(-z)B0(-z), (d) A0(-z)B0(z)A0(z)B0(z) + A0(z)B0(-z)A0(-z)B0(-z), and (e) A0(-z)B0(-z)A0(z)B0(z) + A0(z)B0(z)A0(-z)B0(-z).

 

These methods have the disadvantage of significant aliasing in the transition band. However, such aliasing may be acceptable in some applications, especially when the impulse response or signal has little information in the transition band. Figure 27 shows the aliasing effects for a two channel convolution of two random sequences. The filter bank used is the same as that illustrated in Figure 26. Depending on signal content and length, the calculated mean square error varies significantly (from 0.02 to well over 1.0 in simulation), but Figure 28 more accurately depicts the inherent error for a median case. The error in the transition band is evident in the frequency domain plot in Figure 28c

 

(a)

(b)

(c)

Figure 28. Aliasing from two-channel convolution. (a) Expected and actual output, (b) error and expected output, and (c) frequency content of expected and actual output.

 

Listening tests were performed in order to evaluate the audibility of the transition band aliasing. Several 44.1 kHz audio signals were convolved with four measured room impulse responses. Results indicated that inexperienced listeners detected a difference between the ideal and aliased results, but had difficulty in verbally describing or quantifying the difference. Most descriptions addressed high frequency characteristics of the result. For example, that the cymbals weren’t as "crisp" or that the vocals had "less treble." That makes sense since the aliasing occurs at around 11 kHz (p/2). The least audible differences occurred when the analysis and synthesis filters were more selective (narrower transition band) and when the input material had little or no vocal or percussive (high frequency) material. A more complete description of the listening test is given in Appendix A.

In order to further characterize the aliasing error from this method, Figure 29 illustrates the output of a swept sinusoid convolved with a unit impulse. The output frequency response should be equivalent to the input as indicated by the diagonal response in Figure 29a. The two-channel case illustrated in Figures 29b,c, and d exhibits a notched frequency response right around the expected frequency of p/2. Clearly, the width of the notch depends on the filter resolution. This corroborates the earlier claim that a narrower transition band results in less aliasing. Of note are the left-to-right diagonal side lobes that appear at frequencies around p/2. Their existence implies that positive aliasing occurs when the input has no frequency content in the transition band, while the general notch in the main right-to-left diagonal implies that negative aliasing occurs when the input does have frequency content in the transition band. Thus a simple pre- or post-equalization scheme will only be partially effective. However, an adaptive equalization scheme based on a more detailed analysis of the input signals might successfully compensate for much of the aliasing present in the output.

 

(a)

(b)

(c)

(d)

Figure 29. Magnitude responses of the convolution of a swept sinusoid with a unit impulse using (a) single channel method, (b) 2-channel method with 64th order filters, (c) 2-channel method with 32nd order filters, and (d) 2-channel method with 8th order filters.

Next...Previous...TOC...Home