Missing data in an audio stream can result from a number of errors, such as packet loss during Internet streaming, transmission errors, or defective media. If the gap of missing audio is short enough, it is inaudible. However, a gap of only a few milliseconds can already be audible. When a gap is audible, it should preferably be filled with relevant information. If valid data both precedes and follows the gap, interpolation can be used. If the data following the gap is unknown, or if the gap is too large for valid interpolation, extrapolation can be used to fill the gap.
Extrapolation of data has been investigated for many years. It has been used to enhance the resolution of images and to complete fragmented data sets ([1],[8]-[16]). Current extrapolation theories can also be applied successfully to estimate missing audio data. The major problem, however, is the large number of audio samples required for even short time segments. These large data sets lead to long computation times and large memory requirements, which often exceed the constraints of the application (see section 5.3). If the input data segment is divided into a group of smaller data segments, the computational requirements can be greatly reduced. This reduction shortens processing time and makes it possible to process large data segments that previously exceeded application limits.
Audio signals are usually viewed in either the time domain or the frequency domain. If the signal is segmented in the frequency domain, each of these block segments has a time domain representation. Through frequency domain blocking, a set of smaller time domain signals is obtained. Each of these signals can then be processed through an extrapolation algorithm with much lower computational requirements than that of the original signal. The frequency domain views of the resulting extrapolations can then be recombined into one frequency domain view. The time domain view of this recombination is the resulting extrapolation for the original audio signal.
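The blocking and recombination steps described above can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions: the function names `split_into_frequency_blocks` and `recombine_frequency_blocks` are hypothetical, the blocks are assumed to be contiguous and of equal size, and no extrapolation is performed between the two steps, so the sketch only shows that the split and the recombination are consistent with each other.

```python
import numpy as np


def split_into_frequency_blocks(signal, num_blocks):
    """Return the time domain view of each contiguous block of the DFT.

    Hypothetical helper: equal-sized, contiguous blocks are an assumption.
    """
    spectrum = np.fft.fft(signal)
    if spectrum.size % num_blocks != 0:
        raise ValueError("signal length must be divisible by num_blocks")
    blocks = np.split(spectrum, num_blocks)
    # Each block is treated as the complete spectrum of a shorter signal,
    # so its inverse DFT is a (generally complex) short time domain signal.
    return [np.fft.ifft(block) for block in blocks]


def recombine_frequency_blocks(block_signals):
    """Concatenate the DFTs of the short signals and return the time domain view."""
    spectrum = np.concatenate([np.fft.fft(b) for b in block_signals])
    return np.fft.ifft(spectrum)


# Example: a 256-sample test tone split into 4 blocks of 64 samples each.
n = np.arange(256)
tone = np.sin(2 * np.pi * 5 * n / 256)
parts = split_into_frequency_blocks(tone, 4)
# An extrapolation algorithm would be applied to each entry of `parts` here.
rebuilt = recombine_frequency_blocks(parts)
```

Without an extrapolation step in between, `rebuilt` matches `tone` up to floating-point rounding. In an actual application, each short signal would be lengthened by the extrapolation algorithm before recombination, so the recombined signal would be longer than the input.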
Minimum weighted norm extrapolation [1] is chosen as the extrapolation algorithm to be used with the frequency domain blocking method. This extrapolation method has no bandwidth limitations and uses the known power spectrum information to improve the resulting extrapolation. A root mean square (RMS) factor is included in the algorithm to act as an automatic gain that improves amplitude consistency in the signal. If a delay is acceptable in the chosen application, the algorithm could be implemented in pseudo-real-time.
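The idea of an RMS-based automatic gain can be illustrated with a short sketch. How the RMS factor is actually computed and applied is not specified here, so the following is an assumption: the extrapolated segment is simply rescaled so that its RMS level equals that of the known segment. The names `rms` and `match_rms` are hypothetical.

```python
import numpy as np


def rms(x):
    """Root mean square level of a real or complex signal."""
    return float(np.sqrt(np.mean(np.abs(x) ** 2)))


def match_rms(extrapolated, reference, eps=1e-12):
    """Scale `extrapolated` so that its RMS equals the RMS of `reference`.

    Assumed form of the gain; `eps` guards against division by zero
    when the extrapolated segment is silent.
    """
    gain = rms(reference) / max(rms(extrapolated), eps)
    return extrapolated * gain


# Example: an extrapolated segment that came out too quiet.
t = np.arange(512) / 512.0
known = np.sin(2 * np.pi * 8 * t)
too_quiet = 0.25 * np.sin(2 * np.pi * 8 * t + 0.3)
corrected = match_rms(too_quiet, known)
```

A single gain per segment keeps the waveform shape of the extrapolation unchanged and only adjusts its overall level.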
Although frequency domain blocking greatly reduces computational requirements, it increases the extrapolation error. This increase can be kept to a minimum if proper dimensions are chosen. The extrapolation method is shown to work better with tonal music than with non-tonal music. Non-tonal sounds, i.e., percussive sounds, are noise-like and thus do not extrapolate well with the minimum weighted norm extrapolation method (see section 5.6.1).
Chapter 2 begins by discussing digital audio theory, frequency domain representation, and properties of the audio signal. A brief look at the perception of auditory restoration is then presented. Extrapolation theory is discussed in Chapter 3, along with the contributions of important algorithms. Chapter 4 presents a new algorithm that extends minimum weighted norm extrapolation with frequency domain blocking. The performance of the new algorithm is evaluated in Chapter 5, which shows its various tradeoffs. Finally, Chapter 6 presents a possible application of this algorithm to Internet audio streaming.