Chapter 4 – A Proposed Extrapolation Algorithm Using Frequency Domain Blocking
Since digital audio is very data-intensive, an audio algorithm must deal with this important characteristic. The proposed algorithm confronts this issue by segmenting the time-domain input into smaller equivalent time-domain blocks. The time-domain input is segmented in the frequency domain and each segmented block is brought back into the time domain. Each time-domain block is then processed through the extrapolation method. Minimum weighted norm extrapolation is used in this algorithm due to its lack of bandwidth limitations and its use of power spectral information to improve its output. To improve the algorithm’s envelope estimation and signal’s amplitude consistency, an automatic gain, Root Mean Squared (RMS) factor, is included after extrapolation. After extrapolation and gain compensation, the blocks are recombined in the frequency domain. This spectral recombination is brought into the time domain to form the main output of the algorithm. A block diagram of the algorithm is shown in Figure 4.1.
Figure 4.1 Block Diagram of Algorithm
Frequency Domain Blocking, FDB, is a type of digital filterbank analysis. The frequency domain representation of the signal is realized through the Fourier Transform. A Fast Fourier Transform (FFT) is used to reduce computation. Since the FFT produces a conjugate mirror image after the Nyquist frequency, the blocking method must use two blocks within the frequency representation. The two blocks are equidistant from the Nyquist frequency, and they are reverse complex conjugates of each other. These two blocks are combined, and the inverse FFT of this combination produces a small time-domain block representation. The complete frequency blocking method involves blocking the entire spectrum.
If the input vector is N samples long, an N-point FFT is used to determine its spectrum. If the block size is B samples long, the first ^{B}/_{2} samples of the block are ^{B}/_{2} points of the input vector spectrum before the Nyquist frequency, and the last ^{B}/_{2} samples of the block is the conjugate mirror image of the first half. A B-point IFFT is then used to bring the block spectrum into the time domain. This is repeated for the entire input spectrum. One N-sample time-domain input vector, therefore, becomes ^{N}/_{B} B-sample time-domain vectors through the FDB method. The number of blocks is increased when overlapping is included, as discussed in section 4.2.2. Block size is evaluated in section 5.3.1.
An example of frequency domain blocking is illustrated in Figure 4.2. This example uses block sizes of B = ^{N}/_{4}. Therefore, four time-domain signals are outputted each with a quarter of the length of the original input signal.
Figure 4.2 An example of Frequency Domain Blocking
The FDB process is written in equation form in equation ( 4.1). The zero-padded input vector, h, is block-divided in the frequency domain. The FDB process outputs K time-domain blocks, hBlock_{x}.
The inverse of this process is simply the reverse, a digital filterbank synthesis. Each time-domain block is Fourier transformed, and its resulting spectrum is replaced in its original position in the main spectrum. The main spectrum is then inverse Fourier transformed to obtain the main time domain signal. An example of the Inverse Frequency Domain Blocking, IFDB, process is illustrated in Figure 4.3.
Figure 4.3 An example of Inverse Frequency Domain Blocking
The IFDB process is written in equation form in equation ( 4.2). Each block vector, fBlockA_{x}, is placed in its original position in the main spectrum. The IFDB process outputs the complete N-sample time-domain vector, fEst.
If a signal is simply passed through the FDB process and then through the IFDB process, the output is identical to the input. However, in this algorithm after the FDB process, each block is individually processed through an extrapolation system. This manipulation may lead to discontinuities between blocks when the blocks are placed in their original position in the main spectrum during the IFDB process. An overlapping of the blocks is therefore included in the algorithm to improve spectral continuity during the IFDB process.
During the FDB process, each frequency block is overlapped by a certain percentage. The last block is zero-padded to fill its block. During the IFDB process, each block is multiplied by a cross-fade window. The cross-fade window, CFW_{x}(n), is a tapered rectangular window where the overlapping region is half-triangular, thus emulating a linear cross-fade.
Figure 4.4 Cross-Fade Windowing
where
The cross-fade window shown in equation ( 4.3) is used for all the blocks except the first and last blocks. The first cross-fade window, CFW_{1}(n), and the last cross-fade window, CFW_{K}(n), only have tapering on one end.
( 4.4) |
||
( 4.5) |
Since the second half of each frequency block is the conjugate mirror image of the first half, only the first half of each block needs to be windowed. Therefore, the cross-fade window is only half the length of the block. After windowing, the blocks are placed in their original positions in the first half of the main spectrum. Regions of overlap in overlapping blocks are summed during this positioning. The second half of the main spectrum is then filled in with the conjugate mirror image of the first half.
The overlap percentage is usually chosen to be small. A large overlap percentage increases the number of blocks and thus increasing computational time. A small overlap percentage also enables each block to contain a larger range of full resolution. The overlap percentage is analyzed in section 5.3.2.
For computational reasons the input time domain signal is set to be the same size as the number of points in the FFT. The length of the input signal includes the length of the known data segment and the extrapolation. Therefore, the input is the known time segment zero-padded to the length of the extrapolation. The output, equivalently, is the same size as the input. This length can be chosen to be a power of two for the best resolution of the FFT.
The size of each block must then be chosen. For FFT computational reasons this size is also chosen to be a power of two. As the blocks shrink in size, their computational requirements also diminish. The percentage of overlap must then be chosen.
A summary of frequency domain blocking and its inverse are now presented:
Predefined constants:
L = Length of known data vector, g
B = Length of block, hBlock & fBlockA
Overlap = percentage of block overlap
Computed constants:
N = 2L = Length of known data zero-padded to extrapolation length, h
= Length of input vector, h, and output vector, fEst
Frequency Domain Blocking:
1. FFT the input time signal.
Fh = FFT(h), using an N-point FFT
2. Block-divide the input spectrum with decided percentage of overlap, zero-padding the last block (if only partially filled); IFFT each spectrum block.
Inverse Frequency Domain Blocking:
1. FFT each time block; multiply each block with a cross-fade window; replace blocks into their original positions in the main spectrum, summing each overlapping block segment.
2. IFFT the main spectrum.
fEst = IFFT(FfEst), using an N-point IFFT
3. Replace known data vector, g, in first half of output vector, fEst.
Minimum weighted norm extrapolation is used as the extrapolation method in this algorithm. It is used because it does not have bandwidth limitations and because it uses knowledge of the power density spectrum to weight the energy of the resulting extrapolation. Each time block is treated as an individual extrapolation problem. The power density spectrum is determined for each time block and used as the weight to extrapolate that block.
Regularization is included in the extrapolation method. The regularization parameter provides a tradeoff between accuracy to known data and numerical stability. The regularization parameter is chosen to be small so as to increase stability without varying too much from the known data. A good regularization parameter is f(0)x10^{-8}, where f(n) is the autocorrelation sequence and f(0) is the average power in the signal.
The implementation of the extrapolation method is as follows.
3. Compute the G matrix:
4. Compute vector of constants, a:
a = (G + r I)^{-1} gBlock
where r = f (0)x10^{-8} ® Regularization parameter
and I is the identity matrix
5. Determine the extrapolation of the time block:
.
The extrapolated portion of the signal often has lower amplitude than the known portion. This decrease in amplitude relates to envelop amplitude shrinkage. Even if the extrapolation is a very good approximation of the signal, an unexpected decrease in envelope amplitude could be audibly detected. An automatic gain is then desirable to keep the envelope amplitude consistent. Since the human ear is prone to notice unexpected changes, such as abrupt envelope alterations, an automatic gain will also work in favor of covering some extrapolation error. Since music is pseudo-random in nature, changes in music are more acceptable to the ear than abrupt envelope changes.
The envelope estimation of the algorithm is improved with a Root Mean Square (RMS) factor. The RMS conveyss the average length of a set of random numbers. For discrete data, RMS is defined as follows:
( 4.6)
If the RMS value is continuously taken over an interval, it can describe the envelope of that interval. If one RMS value is taken over an interval, it describes the average magnitude of that interval.
A quick improvement to the extrapolation can be obtained by calculating the RMS value of the estimated data set and comparing it with the calculated RMS value of the known data set. This comparison is combined into an RMS factor which is applied to the estimated data set. The RMS factor is
( 4.7) |
Figure 4.5 displays an example of using the RMS factor as an automatic gain. Figure 4.5(a) shows a signal before the automatic gain is applied, and Figure 4.5(b) shows the signal after the automatic gain is applied.
Figure 4.5 Example of automatic gain: (a) signal before automatic gain, (b) signal after automatic gain (second half of signal is multiplied by RMS_{factor})
This chapter introduced a minimum weighted norm extrapolation algorithm that uses frequency-domain blocking and an automatic gain. Frequency-domain blocking reduces the computational requirements of the extrapolation system and enables the processing of large input vectors. An RMS factor is used as an automatic gain, which maintains the consistency of the envelope amplitude. The following chapter analyzes different aspects of the extrapolation algorithm including block size, overlap percentage and the RMS factor.