Chapter 5 System Performance
The proposed algorithm is tested as a system on a 200MHz Pentium processor with 32 MB of RAM. The test results should be analyzed with respect to each other. Ratios are often used in place of actual computational times because computational time depends heavily on the processor on which the system is running.
An arbitrary example of the extrapolation system output is displayed in Figure 5.1. The input signal is a sum of two sinusoids.
g(t) = sin(2π 100t) + 0.5sin(2π 650t)
This arbitrary example uses a known data length of 512 samples, a block size of 128 samples, and an overlap percentage of 6.25%. The real signal is shown above the extrapolated signal for comparison, and the squared error is also shown.
The best extrapolation lengths occur when the number of data points to be estimated is less than or equal to the number of known data points. Figure 5.2 shows how error decreases as the ratio of the known data segment length to the extrapolated data segment length increases, using the sum of three sinusoids as the input.
g(t) = sin(2π 200t) + 0.5sin(2π 300t) + 0.2sin(2π 800t)

Figure 5.1 An example of the extrapolation system
To minimize the number of variables in this test, frequency domain blocking is not used. The length of the known data plus the extrapolated data is set constant at 512 samples. The ratio of known data samples to extrapolated data samples is set to vary at the following ratios: 64:448, 128:384, 192:320, and 256:256; simplified, these ratios become: 1:7, 1:3, 1:1.7, and 1:1, respectively. As the segment length of known data approaches the segment length of extrapolated (unknown) data, the error reaches a minimum. It is, therefore, optimal to limit the extrapolation length to the length of the known data segment.
An arbitrary sinusoidal test input signal is now defined. This test signal, fTEST, will be used as the input for the following tests in this chapter unless otherwise noted. This signal is sampled at 8kHz with 16 bits of resolution:
fTEST(t) = sin(2π 500t) + 0.5sin(2π 1000t) + 0.7sin(2π 2000t) + 0.2sin(2π 3000t)    (5.1)
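As a minimal sketch, the test signal of equation 5.1 can be generated as follows. The sampling rate and bit depth come from the text; the function names and the peak normalisation used before quantisation are assumptions made only for illustration.

```python
import math

FS = 8000    # sampling rate in Hz (from the text)
BITS = 16    # resolution in bits (from the text)

# (amplitude, frequency in Hz) pairs of equation 5.1
COMPONENTS = [(1.0, 500), (0.5, 1000), (0.7, 2000), (0.2, 3000)]


def f_test(n_samples, fs=FS):
    """Return n_samples of the test signal as floats."""
    return [
        sum(a * math.sin(2 * math.pi * f * n / fs) for a, f in COMPONENTS)
        for n in range(n_samples)
    ]


def quantize(signal, bits=BITS):
    """Scale to the signed integer range (peak normalisation is an assumption)."""
    peak = max(abs(x) for x in signal) or 1.0
    full_scale = 2 ** (bits - 1) - 1
    return [round(x / peak * full_scale) for x in signal]


known = f_test(512)   # e.g. a 512-sample known segment
pcm = quantize(known)
```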
The accuracy of the extrapolation increases as the number of known data points increases. Figure 5.3 displays an example of how the mean squared error of the extrapolation decreases with greater length of the input. The test signal, fTEST, is used as the known portion of the input. The input length varies in this experiment. The input length is the length of the known plus extrapolated data vectors. The extrapolation length is equal to the length of the known data vector. There is no frequency blocking used in this example in order to decrease experimental variables. Note that the x-axis is non-linear.
Computational time for the extrapolation system without frequency blocking is compared to input length in Figure 5.4. These results are the computational times of the preceding experiment. It is seen that the computational time increases with input length. Note that the x-axis is non-linear. Since computational time is heavily dependent on the computer processor, note only the relative computational times between different input lengths. Reduction in computational time can be accomplished by dividing the signal into several smaller blocks.

Figure 5.2 Known data length to unknown data length ratio vs. MSE
Figure 5.3 Input Length vs. MSE
Figure 5.4 Input Length vs. Computational Time
Time-domain windowing is not used in this extrapolation system because better results are obtained when the known time segment is kept at a maximum. Time-domain windowing reduces computation but also reduces the length and accuracy of the extrapolation segment. Since the extrapolation length should be at least the length of the known data segment and increased length reduces error, time-domain windowing is not appropriate for this application. Reduction in computation without greatly increasing error is accomplished through frequency-domain blocking.
Frequency-domain blocking reduces the computational requirements of the extrapolation system. Since an L-sample segment with b bytes per sample requires an L×L G-matrix when processed through the extrapolation system without frequency blocking, a data memory space of greater than L²b bytes needs to be available to the computer processor. This means that if a 1024-sample input segment with 2 bytes/sample (16-bit resolution) is processed through the system, a data memory space of greater than 2.1MB needs to be available. This large memory requirement slows processing time. Also, the memory size limits of the computer restrict the segment length, making it impossible to extrapolate long data segments. Along with enabling long data segments to be processed, frequency-domain blocking reduces computational time. Frequency-domain blocking does include additional FFT processing, but this additional processing time is minimal compared to the overall reduction in processing time.
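The memory bound above is simple arithmetic and can be checked with a short sketch. The function name is hypothetical, and the per-block figure assumes that each block is processed one at a time with its own B×B matrix, which the text does not state explicitly.

```python
def g_matrix_bytes(segment_len, bytes_per_sample):
    """Lower bound on data memory for an L x L matrix of b-byte entries."""
    return segment_len ** 2 * bytes_per_sample


# Unblocked segments of increasing length, 2 bytes per sample.
for length in (512, 1024, 2048):
    print(length, g_matrix_bytes(length, 2) / 1e6, "MB")

# Assumed per-block requirement when 128-sample blocks are used instead.
print(128, g_matrix_bytes(128, 2) / 1e6, "MB per block")
```

For the 1024-sample case this gives 2,097,152 bytes, which is the 2.1MB figure quoted in the text.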
For analysis of the performance of frequency-domain blocking, the fTEST signal is processed through the extrapolation system. A 1024-sample input segment is first processed through the system where the first 512 points are known valid data and the second 512 points are zero-filled. Overlap percentage is set at 6.25%. These settings are arbitrary.
Computational time ratios for the extrapolation system using different block sizes for the frequency-domain blocking are shown in Figure 5.5. The system without frequency blocking (block size of 1024) is used as the reference for the ratios. As can be seen in the figure, decreasing the block size reduces the computational time. A block size of one-fourth the input size results in a 90% reduction in computational time of the original. The largest reduction in computational time occurs with a 64-sample block size and results in a 96% reduction in the computational time of the original.
The plot also shows that the block size of 32 has a greater computational time than the 64-sample block size. This increase is due to the fact that the number of blocks, K, is greater than the size of the block, B. The number of blocks is computed as follows:
(5.2)
Since K > B, both computational time and error increase. Although not shown, smaller block sizes, B < 32, result in greater computational time and error for the same reason. Therefore, the optimal block size is the smallest block size, B, that is greater than the number of blocks, K.
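The selection rule can be sketched in code. The block-count formula below, K = ceil(N / (B(1 - p))) for input length N and overlap fraction p, is an assumption: it is used here only because it reproduces the block counts listed in Table 5.1 and may not be the exact form of equation 5.2. The function names are hypothetical.

```python
import math


def block_count(n, block, overlap):
    """Hypothetical block-count rule, chosen because it matches Table 5.1."""
    if block >= n:
        return 1  # no frequency blocking
    hop = block * (1.0 - overlap)  # samples advanced per block
    return math.ceil(n / hop)


def smallest_block_exceeding_count(n, overlap,
                                   candidates=(32, 64, 128, 256, 512, 1024)):
    """Smallest candidate block size B with B > K, per the rule in the text."""
    for block in sorted(candidates):
        if block > block_count(n, block, overlap):
            return block
    return None


for b in (512, 256, 128, 64, 32):
    print(b, block_count(1024, b, 0.0625))
print(smallest_block_exceeding_count(1024, 0.0625))
```

Under this assumed rule, the 1024-sample input selects a block size of 64, which agrees with the minimum computational time reported for that input length.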
The trade-off of frequency-domain blocking lies in the increase in error. Figure 5.6 displays the mean squared errors (MSE) from the previous experiment. Mean squared error is increased when frequency-domain blocking is applied to the extrapolation system. The MSE is still relatively small for some block sizes. Again, notice that the 32-sample blocking method has greater error than the 64-sample blocking method. This is due to the same reasons stated above.
The increase in error with frequency blocking can be seen in both the time and frequency domains. Figure 5.7 and Figure 5.8 display the time domain views of only the extrapolated signal compared to the real signal. Figure 5.9 and Figure 5.10 display the spectra of only the extrapolated portion of the signal compared to the real spectrum of that portion of the signal. Figure 5.7 and Figure 5.9 depict the 512 samples extrapolated from the sinusoidal test signal, fTEST, without using FDB. Figure 5.8 and Figure 5.10 depict the extrapolated samples using FDB with 128-size blocks and 6.25% overlap. The time domain plot of the extrapolation using FDB shows a decrease in amplitude and a poor ending when compared to the non-frequency blocking extrapolation. In the frequency domain the four sinusoids are detected well in both methods, but the extrapolation using the FDB process has a raised noise floor. The signal-to-noise ratio of the non-frequency blocking extrapolation is 89.5dB, whereas the signal-to-noise ratio of the frequency blocking extrapolation is 26.7dB.
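For reference, the two error measures can be sketched as below. The SNR definition used here (signal power over error power, in dB) is one common convention and is an assumption; the text does not state how the reported signal-to-noise ratios were computed. The toy signals are illustrative and are not the thesis data.

```python
import math


def mse(true, estimate):
    """Mean squared error between two equal-length sequences."""
    return sum((t - e) ** 2 for t, e in zip(true, estimate)) / len(true)


def snr_db(true, estimate):
    """Signal power over error power, in dB (assumed definition)."""
    signal_power = sum(t * t for t in true) / len(true)
    return 10.0 * math.log10(signal_power / mse(true, estimate))


# Toy check: a unit sinusoid and a copy attenuated by 10 percent.
true = [math.sin(2 * math.pi * 500 * n / 8000) for n in range(512)]
estimate = [0.9 * x for x in true]
print(mse(true, estimate), snr_db(true, estimate))
```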
Figure 5.5 Computational Time Ratios for 1024 Sample Segment
Figure 5.6 Mean Square Error Ratios of 1024 Sample Segment
The increase in error when using the FDB process is mainly due to the fact that the power density spectrum used as a weight in the extrapolation method is affected by blocking in some instances. If a large spectral power density lies between two blocks, the extrapolation in both blocks loses resolution. Wide distribution of spectral power density reduces the amount of lost extrapolation resolution. Since most music signals have a wide distribution of spectral power density, the error does not greatly increase when the FDB process is used.
Table 5.1 displays the results of processing the sinusoidal signal described above.
Extrapolation results of fTEST (overlap = 6.25%)
| Sample size | # blocks | Block size | MSE | Time | Time ratio |
|---|---|---|---|---|---|
| 512 | 1 | 512 | 0.00014 | 4.56 | 1 |
| 512 | 3 | 256 | 0.09536 | 1.32 | 0.289474 |
| 512 | 5 | 128 | 0.19542 | 0.55 | 0.120614 |
| 512 | 9 | 64 | 0.14389 | 0.44 | 0.096491 |
| 512 | 18 | 32 | 0.13384 | 0.39 | 0.085526 |
| 1024 | 1 | 1024 | 0.00003 | 20.26 | 1 |
| 1024 | 3 | 512 | 0.08307 | 15.32 | 0.75617 |
| 1024 | 5 | 256 | 0.16915 | 2.3 | 0.113524 |
| 1024 | 9 | 128 | 0.0499 | 1.04 | 0.051333 |
| 1024 | 18 | 64 | 0.05935 | 0.77 | 0.038006 |
| 1024 | 35 | 32 | 0.11157 | 0.88 | 0.043435 |
| 2048 | 1 | 2048 | 8E-06 | 295.94 | 1 |
| 2048 | 3 | 1024 | 0.07453 | 76.84 | 0.259647 |
| 2048 | 5 | 512 | 0.14677 | 17.36 | 0.058661 |
| 2048 | 9 | 256 | 0.03191 | 5 | 0.016895 |
| 2048 | 18 | 128 | 0.03066 | 2.14 | 0.007231 |
| 2048 | 35 | 64 | 0.03304 | 1.6 | 0.005407 |
| 2048 | 69 | 32 | 0.08262 | 2.09 | 0.007062 |
Table 5.1 Frequency Blocking of Sinusoidal Test Signal
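The time ratios in Table 5.1 are each time divided by the time of the unblocked system for the same sample size. A short sketch for the 1024-sample rows follows; the times are copied from the table, while the variable and function names are hypothetical.

```python
# (block size, time) pairs copied from the 1024-sample rows of Table 5.1
TIMES_1024 = [(1024, 20.26), (512, 15.32), (256, 2.3),
              (128, 1.04), (64, 0.77), (32, 0.88)]


def time_ratios(rows):
    """Ratio of each time to the first row, the unblocked reference."""
    reference = rows[0][1]
    return [(block, t / reference) for block, t in rows]


for block, ratio in time_ratios(TIMES_1024):
    print(f"B={block:5d}  ratio={ratio:.6f}  reduction={100 * (1 - ratio):.1f}%")
```

The 256-sample and 64-sample rows give reductions of roughly 89% and 96%, in line with the figures quoted for Figure 5.5.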
Figure 5.7 Extrapolation without frequency blocking
Figure 5.8 Extrapolation with 128-size block and 6.25% overlap
Figure 5.9 Spectrum of real signal compared to extrapolation without blocking
Figure 5.10 Spectrum of 128-size blocks with 6.25% overlap
When the same tests are performed with a music signal rather than the sinusoidal signal, the computational times are roughly the same, but the mean squared error of the blocking systems is relatively consistent with the mean squared error of the non-blocking system. Error does not greatly fluctuate when music is used as the input because music has a semi-flat spectral density and a pseudo-random nature. The benefit of the semi-flat spectral density was described above. The pseudo-random nature of music follows from the fact that the progression of music can take numerous paths. An extrapolation of music based on a known data segment is merely one of the many paths that the music stream could follow. Therefore, the error of the extrapolation relative to the true signal cannot truly be analyzed. Figure 5.11 displays the mean squared error of a 1024-sample music signal sampled at 8kHz.
Figure 5.11 Mean Squared Error Ratios of 1024-Sample Music Signal
The number of blocks used in the extrapolation system is determined by the application requirements. Although error increases in some forms of frequency blocking, the decreased computational time and memory make this method desirable for some applications. Also, since FDB extrapolation used with musical inputs does not have great fluctuations in error, it becomes more attractive. Another important benefit of the FDB process is that large sets of data can be processed, which may be a requirement for some applications.
The percentage of overlap affects the computational time and mean squared error. A percentage of 50% would require almost twice as many blocks as 0% overlapping, whereas a percentage of 3% may only require one extra block. Figure 5.12 displays the ratios of computational time compared to 0% overlap for different overlap percentages. Figure 5.13 displays the mean squared error of each corresponding overlap percentage. These two plots are generated from the extrapolation system with fTEST used as the known portion of the input, an input size of 1024, and a block size of 128 samples.
The plot shows that small overlap percentages minimize both computational time and mean squared error. The computational time is minimized because the number of extra blocks is minimized. The mean squared error is minimized because each block contains a large number of full-resolution samples. As the overlap percentage increases, there are a greater number of samples in each block that have decreased resolution due to the cross-fade window. Small overlap percentages only have a few samples of decreased resolution.
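The effect of overlap on the number of full-resolution samples per block can be illustrated with a small sketch. It assumes that an interior block shares round(pB) samples with each neighbour and that the cross-fade is linear; the text gives neither the window shape nor the exact sample counts, and the function names are hypothetical.

```python
def overlap_samples(block, overlap):
    """Samples shared with one neighbouring block."""
    return round(block * overlap)


def full_resolution_fraction(block, overlap):
    """Fraction of an interior block untouched by the cross-fade on either side."""
    shared = overlap_samples(block, overlap)
    return max(block - 2 * shared, 0) / block


def linear_crossfade(shared):
    """Fade-out weights for one block; its neighbour uses 1 - w (assumed linear)."""
    return [1.0 - (i + 1) / (shared + 1) for i in range(shared)]


for p in (0.0, 0.03125, 0.0625, 0.25, 0.5):
    print(p, overlap_samples(128, p), full_resolution_fraction(128, p))
```

With 128-sample blocks, a 6.25% overlap leaves 87.5% of an interior block at full resolution under these assumptions, whereas a 50% overlap leaves none.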
The change in error can be seen most readily in the frequency domain. Figure 5.14 displays the extrapolated spectrum of the 1024-sample input vector using fTEST as the known portion, 128-size blocks, and zero percent overlap. Figure 5.15 displays the same input signal and extrapolation method with a 6.25% overlap in frequency blocks. Notice that the sinusoids are much more narrowly defined in the overlapped example, whereas the non-overlapped example shows wide sinusoids, or spectral leakage. The non-overlapped extrapolation has a signal-to-noise ratio of 6.9dB, whereas the overlapped extrapolation has a 26.7dB signal-to-noise ratio. There is a 33dB increase in signal-to-noise ratio when overlapping is used in this example.
Figure 5.12 Computational Time vs. Overlap Percentage
Figure 5.13 MSE vs. Overlap Percentage
Figure 5.14 Extrapolated Spectrum with 0% overlap

Figure 5.15 Extrapolated Spectrum with 6.25% overlap
The fTEST signal is extrapolated with 128-size blocks and 6.25% overlap. The result of the extrapolation without and with the RMS factor automatic gain is shown in Figure 5.16. A slight gain is noticed on the bottom plot when the automatic gain is applied. The mean squared error does not vary much whether the gain is used or not. The benefit is psychoacoustical. By keeping the gain of the extrapolation consistent with the gain of the known data, the listener may not hear irregularities in the time-domain envelope of the signal that may be noticed more than inaccurate data.
A better example is shown with a musical signal in Figure 5.17. This example shows that although the error is minimal, the amplitude of the envelope is only about one-third the amplitude of the real signal's envelope. The automatic gain, in this case, greatly improves the amplitude of the extrapolated portion of the signal. The version with the consistent envelope amplitude will be more psychoacoustically acceptable.
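A minimal sketch of an RMS-based automatic gain is given below. The rule used, scaling the extrapolated segment so that its RMS matches the RMS of the known segment, is an assumption about how the RMS factor is formed; the function names and the toy signal are illustrative only.

```python
import math


def rms(x):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def apply_rms_gain(known, extrapolated):
    """Scale the extrapolation so its RMS equals that of the known data (assumed rule)."""
    target, actual = rms(known), rms(extrapolated)
    if actual == 0.0:
        return list(extrapolated)  # nothing to scale
    gain = target / actual
    return [gain * v for v in extrapolated]


known = [math.sin(2 * math.pi * 500 * n / 8000) for n in range(512)]
weak = [x / 3.0 for x in known]   # envelope about one-third of the known data
restored = apply_rms_gain(known, weak)
print(rms(known), rms(weak), rms(restored))
```

A single gain factor changes only the overall level of the segment, which is consistent with the observation that the mean squared error changes little while the envelope becomes consistent.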

Figure 5.16 Comparison of sinusoidal signal with/without RMS factor

Figure 5.17 Music signal with/without RMS factor
The fTEST signal is now pulsed every 100 samples.

This signal is used to test the dynamics of the extrapolation system. Music can be represented as a sum of sinusoids. A pulsed sinusoid could represent strong dynamics in a musical signal. This signal, fPULSE, is processed through the extrapolation system with 128-size blocks, 6.25% overlap, and a known length of 512 samples. The input and extrapolation are plotted in Figure 5.18. The first 512 samples are known and the second 512 samples are extrapolated. The pulses extrapolate fairly well, but the intermediate silences are extrapolated as lower-power sinusoids. Although not attempted, this type of extrapolation may be improved by increasing the emphasis of the power density spectrum. This added emphasis could increase the extrapolation amplitude during pulse intervals and decrease the extrapolation amplitude during silent intervals based on the spectral power density in the corresponding intervals.
Figure 5.19 shows a sinusoidal signal with an additional sinusoid appearing later in time. This is useful to test because a musical signal could be thought of as a sinusoidal signal with additional sinusoids being added and subtracted as time progresses. The signal originates as a 100Hz and a 600Hz sinusoidal signal sampled at 8kHz. A 500Hz sinusoid is added at sample number 256 (32ms later in time). The added sinusoid lasts 512 samples (64ms).
The input, true signal and extrapolation are shown in Figure 5.19. The 1024-sample input is processed through the extrapolation system with a block size of 256 and a 3.125% overlap. The first 512 samples are known data, and the second 512 samples are extrapolated data. The extrapolation estimates the primary sinusoids fairly well. The third additive sinusoid is seen to dissipate slightly before the true signal dissipates that sinusoid.
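The signal with the added sinusoid can be reconstructed approximately as follows. The frequencies, sampling rate, onset and duration come from the text; unit amplitudes for all three sinusoids are an assumption, since the text does not state them, and the function name is hypothetical.

```python
import math

FS = 8000  # Hz, from the text


def added_sinusoid_signal(n_samples=1024, start=256, duration=512, fs=FS):
    """100 Hz + 600 Hz throughout; 500 Hz present for `duration` samples from `start`.

    Unit amplitudes are an assumption; the text does not state them.
    """
    out = []
    for n in range(n_samples):
        t = n / fs
        value = math.sin(2 * math.pi * 100 * t) + math.sin(2 * math.pi * 600 * t)
        if start <= n < start + duration:
            value += math.sin(2 * math.pi * 500 * t)
        out.append(value)
    return out


signal = added_sinusoid_signal()
known, target = signal[:512], signal[512:]   # known data and true continuation
print(1000 * 256 / FS, 1000 * 512 / FS)      # onset and duration in ms
```

With these settings the added sinusoid ends at sample 768, which is 256 samples into the extrapolated half of the 1024-sample input.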

Figure 5.18 Pulsed sinusoidal signal through extrapolation system

Figure 5.19 Sinusoidal example with an additional sinusoid appearing later in time
An example of a pulsed white noise burst processed through the extrapolation system is displayed in Figure 5.20. This could be a rhythmic cymbal crash or drum hit. The extrapolation system spreads the pulsed noise throughout its estimation of the time-domain data. When compared with the tonal equivalent example, Figure 5.18, this non-tonal example is not estimated well by the minimum weighted norm extrapolation method. As stated in section 5.5, this may be improved if a greater emphasis were placed on the power density spectrum within the extrapolation method.

Figure 5.20 Example of extrapolated pulsed noise bursts
Another issue regarding the extrapolation method is its tendency to produce mirror-like images with some inputs. This mirror imaging is due to the circular autocorrelation used to weight the extrapolation. Figure 5.21 displays a sinusoidal input multiplied by a ramp function and its extrapolation.
fRAMP(t) = t [sin(2π 100t) + 0.5sin(2π 600t)]
The extrapolation is seen to mirror the input in amplitude. This mirroring effect also occurs when frequencies are added to parts of the input signal. The extrapolation presents these added frequencies in a mirror-like fashion.

Figure 5.21 Example of mirror-like tendency of extrapolation method
The extrapolation method also presents problems with chirp signals. When using large input segment lengths, the large continuous frequency change cannot be well represented in a single power spectrum. In order to extrapolate chirp signals, short time-domain segments must be used. The use of short time-domain segments, however, restricts the length of valid extrapolations.
Since the extrapolation method is simply a "black box" in the overall system, an improved extrapolation method can be put in place of the minimum weighted-norm extrapolation method. The issues noted could be resolved by simply using an improved extrapolation method. Each extrapolation method has its benefits and issues, and each application must weigh these pros and cons to determine the most applicable method. Minimum weighted-norm extrapolation is used in this extrapolation system due to its lack of bandwidth requirements and use of a power spectrum weight.
An alternative method of reducing the computational requirements of the extrapolation was attempted. This alternative method consisted of down-sampling the input signal by n, processing the down-sampled version, and then up-sampling the resulting extrapolation.
Although this method reduced the input vector from length N to length N/n, it was found to be inferior to the frequency blocking. Down-sampling, taking every nth sample, reduces the bandwidth from s to s/n. The extrapolation method could therefore only extrapolate frequencies in this reduced bandwidth. Cubic interpolation was then used between samples to up-sample the resulting extrapolation. Since spectral resolution is diminished through down-sampling, this method is deemed inappropriate.
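The bandwidth loss can be made concrete with the components of fTEST. The sketch below assumes that every nth sample is kept with no anti-aliasing filter beforehand (the text does not say whether one was used) and reports where each tone would appear after down-sampling. The function names are hypothetical.

```python
def nyquist_after_decimation(fs, n):
    """Highest representable frequency after keeping every nth sample."""
    return fs / (2 * n)


def alias_frequency(f, fs):
    """Frequency at which a tone of f Hz appears when sampled at fs Hz."""
    f = f % fs
    return fs - f if f > fs / 2 else f


FS = 8000
TONES = (500, 1000, 2000, 3000)   # components of fTEST
for n in (2, 4):
    fs_down = FS / n
    limit = nyquist_after_decimation(FS, n)
    print(n, limit, [(f, alias_frequency(f, fs_down)) for f in TONES])
```

For n = 2 the representable band shrinks to 2000Hz, so under these assumptions the 3000Hz component would fold onto 1000Hz and could not be recovered by interpolation afterwards.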
As an improvement to the automatic gain block, envelope estimation was attempted. Since the RMS factor is created based on the average amplitude of both the known and extrapolated portions of the signal, a more advanced method was tested. This other method consisted of obtaining five uniformly-spaced RMS values in the known data vector. These RMS values were then used to linearly extrapolate five uniformly-spaced RMS values in the extrapolated data vector.
(5.3)
This method seemed to improve some extrapolations and worsen others. Since music is unpredictable, the success of the RMS estimation is also unpredictable. Better estimation algorithms could improve the RMS estimation but would add to the computation. Since the main contribution of this algorithm is its reduction in computation, including complex RMS estimators would only oppose the intent of the algorithm. The RMS factor is chosen as the method of automatic gain control due to the fact that it performed as well as the linear RMS extrapolator and required very little computation.
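The envelope estimation experiment can be sketched as follows. This is an illustration under stated assumptions, not the form of equation 5.3: the five RMS values are taken over five equal, contiguous pieces of the known data, the linear extrapolation is a least-squares line fit, and negative predictions are clamped to zero. All function names are hypothetical.

```python
import math


def segment_rms(x, segments=5):
    """RMS of `segments` equal-length, contiguous pieces of x."""
    size = len(x) // segments
    return [
        math.sqrt(sum(v * v for v in x[i * size:(i + 1) * size]) / size)
        for i in range(segments)
    ]


def extrapolate_linear(values, count=5):
    """Least-squares line through values, evaluated at the next `count` indices."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    slope = sxy / sxx
    # Clamp at zero because an RMS value cannot be negative.
    return [max(mean_y + slope * (i - mean_x), 0.0) for i in range(n, n + count)]


# Toy input: a decaying 500 Hz sinusoid, 510 samples (divisible by five).
known = [(1 - n / 1024) * math.sin(2 * math.pi * 500 * n / 8000) for n in range(510)]
print(extrapolate_linear(segment_rms(known)))
```

Even in this simple form the estimator needs an extra pass over the known data and a line fit, which illustrates the added computation weighed against the single RMS factor.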
Since this algorithm favors reduction in computation over precision, it could be used for applications that aim to give an average listener continuous music even in the event of a dropout or empty audio data segments. Internet audio streaming is a good example of this type of application. This application is described in the following chapter. Other possible applications include minor repair to audio media with large temporal defects and audio on cellular telephones. Cellular telephones often have dropouts in audio due to interference or bad reception. This algorithm could be implemented to cure some of these dropouts. It should be noted that since clarity is important in verbal communication, this algorithm might not be sufficient unless only used for small dropouts. Also, since speech is abrupt in nature and filled with silent intervals, this algorithm is more suitable for music applications.