4. Hybrid Model Design
The purpose of this algorithm is to produce realistic artificial reverberation without the burden of impractical processing. However, an impractical design of any one building block can become the weak point of the whole algorithm. The success of the design therefore relies on a practical implementation of each of its components:
4.1 Truncation
Both methods of impulse response truncation were implemented (section 3.2.1). No significant difference in output was noticed; however, the processor load was greatly reduced when using the time-based method. This allowed for a static truncation time of 150 ms (6615 samples at a sampling rate of 44.1 kHz) and the use of 8192-point FFTs for the block convolution stage. Because of this modest FFT demand, the processor (500 MHz Intel Celeron) was able to execute the convolution portion in less than one sample period, and no additional latency was experienced.
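The sample counts quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, not the original implementation; the function names are hypothetical, and the derived block length assumes the usual Overlap-Add constraint (block length + impulse response length - 1 must not exceed the FFT size), which the text does not state explicitly.

```python
FS = 44_100        # sampling rate in Hz (consistent with 150 ms = 6615 samples)
TRUNC_MS = 150     # static truncation time from the text


def truncation_samples(trunc_ms: float, fs: int) -> int:
    """Number of impulse-response samples kept by a time-based truncation."""
    return round(trunc_ms * fs / 1000)


def next_pow2(n: int) -> int:
    """Smallest power of two that is >= n."""
    return 1 << (n - 1).bit_length()


m = truncation_samples(TRUNC_MS, FS)   # truncated impulse-response length
n_fft = next_pow2(m)                   # FFT size that can hold the truncated response
# Assumption: longest input block that avoids circular wrap-around in Overlap-Add.
block_len = n_fft - m + 1
print(m, n_fft, block_len)
```

Under these assumptions, the 8192-point FFT leaves room for input blocks of up to 1578 samples alongside the 6615-sample impulse response.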
Figure 4.1 shows the waveforms of the two full and truncated impulse responses that are used in the testing section (Chapter 5). These signals are in the WAV format and are designated as IR1.wav and IR2.wav.

Figure 4.1: Original and truncated impulse responses
4.2 Convolution
The direct FIR implementation is not a practical realization for this reverberation process. Typical room impulse responses are too long, and sampled at too high a rate, for current processors to execute as a linear convolution in real time. Where this hurdle prevents real-time operation, faster, low-latency alternatives are readily available. The truncated impulse response will generally be longer than 100 ms (4410 samples at fs = 44.1 kHz). The Overlap-Add method introduces an initial input latency while samples are gathered, but a quasi real-time process can be realized if the block convolution can be performed in less than one sampling period.
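As an illustration of the block convolution described here, the following is a minimal offline sketch of the Overlap-Add method using NumPy. It is not the implementation used in this work: the function name `overlap_add` and its parameters are hypothetical, and it processes a whole signal at once rather than streaming blocks in real time.

```python
import numpy as np


def overlap_add(x, h, n_fft=8192):
    """Convolve signal x with impulse response h using FFT-based Overlap-Add."""
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    m = len(h)
    block = n_fft - m + 1                  # input block length L, so that L + M - 1 <= N
    if block < 1:
        raise ValueError("impulse response is too long for this FFT size")
    h_spec = np.fft.rfft(h, n_fft)         # zero-padded IR spectrum, computed once
    y = np.zeros(len(x) + m - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        # Linear convolution of one block, computed as a product of spectra.
        seg_out = np.fft.irfft(np.fft.rfft(seg, n_fft) * h_spec, n_fft)
        stop = min(start + n_fft, len(y))
        y[start:stop] += seg_out[:stop - start]   # block tails overlap and are summed
    return y
```

Because each block is zero-padded to the FFT length before multiplication, the summed output matches a direct linear convolution to within floating-point error.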
The convolution stage is used to implement the early reflection portion of the reverberation effect. By using the Overlap-Add method for the convolution stage, the early reflection portion of the hybrid algorithm will be exactly the same as that of the linear convolution. In Figures 4.2 and 4.3, a simple percussion signal is convolved with both the full and the truncated impulse response; the "residue" is the difference between the two outputs. Because the residue is zero throughout the early reflection portion, it is evident that the truncated and full convolution processes are identical in this region.

Figure 4.2: Full and truncated impulse response convolution (using IR1.wav)

Figure 4.3: Full and truncated impulse response convolution (using IR2.wav)
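The residue comparison shown in the figures can be reproduced in outline as follows. This sketch uses synthetic data, not IR1.wav or IR2.wav: the decaying-noise impulse response, the noise burst standing in for the percussion signal, and the truncation point `m` are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a room impulse response: exponentially decaying noise.
h_full = rng.standard_normal(2000) * np.exp(-np.arange(2000) / 400.0)
m = 500                          # truncation point in samples (hypothetical)
h_trunc = h_full[:m]

# Short decaying noise burst as a stand-in for the percussion signal.
x = rng.standard_normal(200) * np.exp(-np.arange(200) / 40.0)

y_full = np.convolve(x, h_full)
y_trunc = np.convolve(x, h_trunc)

# Residue: full-response output minus truncated-response output.
residue = y_full.copy()
residue[:len(y_trunc)] -= y_trunc
```

The first `m` output samples depend only on the first `m` impulse response samples, so the residue is zero there and becomes nonzero only once the discarded tail starts to contribute.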
4.3 Windowing
The generally exponential decay of the impulse response describes its envelope only; the actual decay is extremely jagged and random. Thus, smoothing the truncation with a windowing function has little effect on these types of signals. In fact, windowing served only to add computational expense, with negligible effect on the high-frequency content of the truncated response. Of much greater importance is the low-frequency attenuation caused by this premature truncation.
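For reference, one common way to smooth a truncation is to fade out the end of the truncated response with the falling half of a Hann window. The text does not say which window was tried, so the window type, the 10 ms fade length, and the synthetic impulse response below are assumptions made only for this sketch.

```python
import numpy as np


def fade_out(h_trunc, n_fade):
    """Apply the falling half of a Hann window to the last n_fade samples."""
    h = np.array(h_trunc, dtype=float)          # copy so the input is left untouched
    taper = np.hanning(2 * n_fade)[n_fade:]     # falls from just under 1 down to 0
    h[-n_fade:] *= taper
    return h


rng = np.random.default_rng(0)
m, n_fade = 6615, 441        # 150 ms response; 10 ms fade length is hypothetical
# Synthetic stand-in for a truncated room impulse response.
h_trunc = rng.standard_normal(m) * np.exp(-np.arange(m) / 3000.0)
tapered = fade_out(h_trunc, n_fade)
```

Only the final `n_fade` samples are altered, which is consistent with the observation above that the taper changes little in a response that is already jagged.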
4.4 Reverberation Tail
The Moorer diffuse reverberator exhibits the time and frequency density requirements of a proper room reverberation and can be used to simulate rooms of different sizes simply by adjusting the feedback control and relative gain of the filter network. In addition, the Moorer topology includes low-pass filters in the feedback paths to account for the air and material absorption of the room. While prior research has found that a first-order low-pass filter with fc set to 12 kHz is suitable for modeling a general room [1], it is not sufficient for modeling a specific room. Impulse responses can be measured in rooms with drastically different absorptive materials and humidity conditions, so a single fixed cutoff frequency is not appropriate. Instead, the cutoff frequency of the low-pass filters was estimated through listening and visual inspection of the frequency rolloff.
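To make the role of the feedback low-pass filter concrete, the following is a minimal sketch of a single comb filter with a first-order low-pass filter in its feedback path, the kind of element such a network is built from. It is not the full reverberator and not the code used in this work. The function names, the delay and feedback values, and the coefficient formula a = exp(-2*pi*fc/fs) (a common one-pole approximation) are assumptions.

```python
import math


def one_pole_coeff(fc: float, fs: float) -> float:
    """Coefficient a of y[n] = (1 - a) * x[n] + a * y[n-1] for cutoff fc.

    Uses the common approximation a = exp(-2*pi*fc/fs) (an assumption here).
    """
    return math.exp(-2.0 * math.pi * fc / fs)


def lowpass_comb(x, delay, feedback, fc, fs=44_100):
    """Comb filter with a first-order low-pass filter in its feedback path."""
    a = one_pole_coeff(fc, fs)
    buf = [0.0] * delay      # circular delay line
    lp_state = 0.0           # low-pass filter memory
    idx = 0
    out = []
    for sample in x:
        delayed = buf[idx]
        # Damp the recirculating signal before feeding it back.
        lp_state = (1.0 - a) * delayed + a * lp_state
        buf[idx] = sample + feedback * lp_state
        idx = (idx + 1) % delay
        out.append(delayed)
    return out


# Impulse response of one comb: hypothetical 100-sample delay and 0.8 feedback.
impulse = [1.0] + [0.0] * 299
echoes = lowpass_comb(impulse, delay=100, feedback=0.8, fc=12_000.0)
```

Lowering `fc` makes each successive echo duller, which is the parameter that had to be tuned per room rather than fixed at 12 kHz.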
4.5 Equalization
As stated in section 3.2.1, an EQ
component is needed to compensate for the low frequency attenuation of the
truncation process. Figure 4.4 depicts the frequency response of a full impulse
response and its 150 ms truncated counterpart.

Figure 4.4: Full and truncated frequency response of Bergamo Cathedral, Italy
The low-frequency attenuation is as much as 15 dB at some frequencies and must be corrected to achieve transparency. This compensation can be found experimentally, by employing common low-frequency boosting filters, or adaptively, by deriving it from the spectrum of the entire impulse response.
The EQ stage should be equal to the ratio of the full impulse response spectrum to the truncated impulse response spectrum so that the effects of the truncation cancel.
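The relationship stated in the sentence above can be written out as follows. The subscripted symbols are illustrative labels chosen for this sketch and are not necessarily the symbols used elsewhere in this work.

```latex
H_{\mathrm{EQ}}(z) = \frac{H_{\mathrm{full}}(z)}{H_{\mathrm{trunc}}(z)}
\quad\Longrightarrow\quad
H_{\mathrm{trunc}}(z)\,H_{\mathrm{EQ}}(z) = H_{\mathrm{full}}(z)
```

Cascading the truncated response with this EQ stage therefore restores the spectrum of the full response, which is the sense in which the effects of truncation cancel.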

In practice, the lengths of the spectra must be equal to maintain accurate frequency equalization. The spectra X(z), Y(z), and H(z) all have length 8192 (2^13) samples. To conserve memory, the full impulse response is limited to 5.9 seconds (2^18 samples). To maintain frequency domain accuracy, the FFT of the original impulse response is also 2^18 samples long. A simple decimation algorithm reduces this spectrum to length 8192 by keeping every 32nd frequency bin and discarding the rest. While this reduces the frequency resolution from 0.17 Hz/bin to 5.38 Hz/bin, the envelope of the response remains intact and can be used to calculate the EQ stage.
This decimation eliminates the need for an additional filter stage:

Figure 4.5 shows the truncated frequency response of the Bergamo Cathedral and the FFT decimated frequency response (both of length 8192):

Figure 4.5: Truncated and FFT decimated frequency response of Bergamo Cathedral, Italy
Decimating the FFT, in effect, smooths the frequency response of the room. The sharp transitions between frequency bins of the full FFT are not maintained in the decimated version; however, the same amplitude envelope remains. It can be seen from Figure 4.5 that, relative to the truncated response, the decimated FFT frequency response has a significant low-frequency boost while maintaining the high-frequency content.
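The FFT decimation described above, together with the spectral ratio for the EQ stage, can be sketched as follows. This is an illustrative outline and not the original code: the function names are hypothetical, only magnitudes are compared, and the small `floor` value that guards against division by zero is an added assumption.

```python
import numpy as np

FS = 44_100
N_FULL = 2**18                 # FFT length of the full impulse response
N_SHORT = 2**13                # 8192-point length used by the block convolution
STEP = N_FULL // N_SHORT       # keep every 32nd bin


def decimate_spectrum(h_full):
    """Keep every 32nd bin of the 2^18-point FFT of the full impulse response."""
    spectrum = np.fft.fft(h_full, N_FULL)    # zero-pads or truncates to 2^18 samples
    return spectrum[::STEP]


def eq_magnitude(h_full, h_trunc, floor=1e-12):
    """Magnitude ratio of the decimated full spectrum to the truncated spectrum."""
    full_dec = np.abs(decimate_spectrum(h_full))
    trunc = np.abs(np.fft.fft(h_trunc, N_SHORT))
    return full_dec / np.maximum(trunc, floor)   # floor avoids division by zero


print(FS / N_FULL, FS / N_SHORT)   # frequency resolution in Hz/bin before and after
```

Both spectra passed to the ratio have 8192 bins, which is the equal-length requirement noted earlier in this section.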
Decimating in this way does not preserve an impulse response that is 150 ms (6615 samples) long. The decimated frequency response has 8192 points, so to preserve time-domain accuracy the corresponding impulse response is also 8192 samples long. This leaves no room for the expansion (zero-padding) that the Overlap-Add block convolution discussed in section 2.1.2.3 requires. Therefore, the implementation used a simpler first-order shelving filter that provides a low-frequency boost while leaving the high frequencies unchanged. The specific parameters of the filter must be selected manually for each impulse response, recognizing that some may not require one at all. The decimated approach would also require a significant initial process (including a 2^18-point FFT and storage of the full and decimated buffers) that the shelving filter approach does not.
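A first-order low shelving filter of the kind described can be sketched as follows. This is one standard design, obtained by applying the bilinear transform to the analog prototype H(s) = (s + G*w0)/(s + w0); it is not necessarily the design used in this work. The function names and the example settings (15 dB boost, 200 Hz corner) are hypothetical, since the text states that the parameters were chosen manually for each impulse response.

```python
import cmath
import math


def low_shelf_coeffs(gain_db, fc, fs=44_100):
    """First-order low shelf from the bilinear transform of (s + G*w0)/(s + w0)."""
    g = 10.0 ** (gain_db / 20.0)         # linear low-frequency gain
    k = math.tan(math.pi * fc / fs)      # pre-warped corner frequency
    b = [(1.0 + g * k) / (1.0 + k), (g * k - 1.0) / (1.0 + k)]
    a = [1.0, (k - 1.0) / (1.0 + k)]
    return b, a


def response_db(b, a, freq, fs=44_100):
    """Magnitude response in dB at a single frequency."""
    z1 = cmath.exp(-2j * math.pi * freq / fs)    # z^-1 evaluated on the unit circle
    h = (b[0] + b[1] * z1) / (a[0] + a[1] * z1)
    return 20.0 * math.log10(abs(h))


def apply_filter(b, a, x):
    """Direct form I: y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1]."""
    y, x1, y1 = [], 0.0, 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 - a[1] * y1
        y.append(yn)
        x1, y1 = xn, yn
    return y


# Hypothetical settings: 15 dB boost below roughly 200 Hz.
b, a = low_shelf_coeffs(gain_db=15.0, fc=200.0)
```

With these coefficients the gain equals the requested boost at DC and falls to 0 dB at the Nyquist frequency, so only the low end of the truncated response is raised.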