5  IMPLEMENTATION AND RESULTS

The general time-invariant FDN we used as a starting point for our experiments was described in detail in chapter 1 and is shown in Fig. 3.8.  Before adding any modulation to the network, we had to find some basic parameters such as a feedback matrix, the delay lines’ lengths, coefficients of vectors b and c, and absorption and the transfer function of the tone correction filter.  The selection procedure we used to find these parameters will be detailed in section 5.1.

Once we had a time-invariant algorithm with sufficient echo and frequency density to simulate a good sounding room, we added modulation to the network’s delay lines and investigated the effect of each component participating in the modulation process.  All our tests were performed with a variety of input signals, including dry recordings of piano, saxophone, and impulse responses.  Section 5.2 will describe the effect of a change in the modulation source rate and depth.  Section 5.3 will then evaluate different interpolation types, using the best sounding modulation rate and depth determined in section 5.2.

We will then analyze each modulation and interpolation method’s computation requirement, and propose some algorithm optimizations to reduce the overall processing requirements.  The chapter will conclude with the presentation of the results of a listening test evaluating the proposed algorithm.

5.1         Choice of the general FDN parameters

The choice of the feedback matrix is important in order to get a good echo density, while keeping the computations to a minimum.  The most common lossless feedback matrices were presented in section 3.3.3.  We saw that some specific matrices, such as the identity matrix and triangular matrices, lead to the well known parallel comb filter and series all-pass filters, respectively.  However, as we saw previously, these matrices do not provide a sufficient level of echo density (using a reasonable number of delay lines), because they contain many null coefficients.

Rocchesso and Smith’s circulant matrices and Jot’s Householder matrices described previously can both provide a maximum echo density while enabling relatively fast matrix multiplications.  We chose to use Jot’s class of Housholder matrix (defined in ( 3.25 ), where J is a circular permutation matrix) in our tests because it was the one that allowed the most efficient matrix multiplication.

We used the unit vector b = [1 1 … 1]T as the injection matrix to obtain the maximum echo density.  We generated a stereo output by using c as an N x 2 matrix as suggested in section 3.3.1, instead of an N x 1 vector:
 
 
( 5.1 )

We set up the algorithm so that the first column of c corresponds to the left channel output of the reverberation algorithm and the second corresponds to the right channel.  The first column was chosen to avoid clicks in the reverberation tail (as explained in section 3.3.3).  Since listening tests [15] [39] suggested that improved spatial impression can be obtained by decorrelating the reverberation presented to the listeners’ ears, the second column of c (corresponding to the right channel output) was chosen to be as different as possible from the first one.  This will produce two outputs that will be perceived as being uncorrelated, thus improving the spatial image of the reverberation.

The common way is to choose the delay lengths to be incommensurate to avoid superposition of peaks in the time and frequency responses, as mentioned in section 3.2.  We found that prime numbers helped in selecting delay line lengths that did not overlap in the time domain.  To avoid superposition of peaks in the frequency domain, we plotted the frequency response of the reverberation using different prime delay line lengths combinations and picked the set of delay length that produced the flatter frequency response.

At least 12 delay lines were needed to obtain the echo and time density for a reverberation of two seconds using a small amount of memory:
 

( 5.2 )

Note that these delay lengths lead to a frequency density Df = 0.26 and a memory consumption of less than 12k words (about one quarter of a second on our 44.1 kHz sampling rate system).  To put this memory consumption in perspective, a complete reverberation algorithm on a Sony PlayStation I and II can consume up to 100 kBytes.

By using modulation, we would like to achieve the same perceived reverberation quality as this 12 delay line system, using only 8 delay lines selected as:
 
 
( 5.3 )

This system has a frequency density Df  = 0.16 and uses a total memory of less than 7k words (less than 160 ms).  This represents about 60% of the memory used by the 12 delay line algorithm.

Selecting the delay lines’ length was an important factor in the overall sound of the reverberation algorithm.  According to ( 3.5 ), the total amount of delay in the algorithm corresponds to the frequency density of the algorithm.  Section 2.4 also mentioned that the reverberation of a medium sized hall with an RT of 1.8 seconds has a frequency response consisting of peaks spaced by an average of 2.2 Hz over 20 Hz.  This leads to a frequency density Df  = 0.45 and would thus require a total delay length of 450 ms to simulate artificially (about 20k words at a sampling rate of 44.1 kHz).  However, by selecting the delay lengths judiciously, we were able to achieve a time-invariant reverberation algorithm with RT = 2 seconds that had a limited amount of resonant frequencies with almost half of this frequency density (Df  = 0.26).  Our objective is to achieve the same perceived frequency density using a time-variant reverberation with a modal density of only 0.16 modes per Hertz.

       We already described in section 3.2.4 the absorption filters and tone correction filters used in section 3.2.4.  During the development process, we used a reverberation time of 3 seconds for the low frequency and 1.25 seconds for the high frequency, as these values are close to the average reverberation times of good sounding halls.  The tone correction filter was set to correct the effects of the absorption filters using the method discussed in section 3.2.4.  The EDR of the non-modulated algorithm with 12 delay lines is shown in Fig. 5.1.

Fig. 5.1.  Energy decay relief of a time invariant FDN with 12 delay lines.


5.2         Choice of Modulation

5.2.1        Modulation Rate and Depth

To clearly see the effect of different modulation rates and depth, we first tried a simple network consisting of only one delay line.  We used a sinusoidal tone generator as the modulation source, combined with high-quality interpolation.  We tested nominal delay lengths of 600, 1000, and 1500 samples.

The efficiency of modulation to avoid build-up of resonant frequency is proportional to the modulation depth.  However, the depth of modulation is also directly related to the amount of pitch change in the reverberation.  It is thus important to select the modulation depth carefully to avoid chorusing effects or vibrato in the reverberation tail.  We found that a modulation depth of 8 or more samples resulted in a perceivable pitch change on input signals that had sustained tones such as saxophone or piano.    On the other hand, we found that a modulation depth of 4 samples did not help to break the resonances in the reverberation tail.  We thus chose a modulation depth of 6 samples in our final implementation since it is a good compromise between modulation efficiency and perceivable pitch change.

The optimum modulation rate was more difficult to determine, since every delay length seemed to have a specific modulation frequency that worked better than the others.  The only common rule was that a modulation of less than 0.5 Hz resulted in almost no improvement to the equivalent unmodulated output.  On the other hand, modulation rates of more than 2 Hz could be perceived quite easily on some combinations of delay lines and input signals.

We then tried a complete FDN with 8 delay lines to see if these hypotheses would still hold.  We found that when we combined all modulators in phase, we were able to hear a clear pitch change, even with moderate modulation depths and rates.  However, with a phase difference of 45° or 90° between each modulator, the pitch modulation was more confused, and there was no noticeable change in the overall pitch.  We found that even with a modulation depth of 6 samples, there was no apparent pitch change.

5.2.2        Modulation Type

All the previous tests where done with sinusoids, however pseudo-random sequences where also used as modulators.  We found that we were able to get a smoother reverberation tail, with no specific pitch change, when using a different seed in the modulator of each delay line.  However, even with different seeds, there is always a risk that all modulators move in the same direction at the same time.  This punctual behavior creates a distinct pitch change in the output.  There is also a risk of having low amplitude output from all modulators during a certain period of time.  This would cause the modulation to be less efficient.  Using sinusoidal modulators solves both problems since all the modulators are always offset by the same phase amount.  It also creates a more consistent result in reducing resonant frequencies since the amplitudes of the modulators are constant.

5.3         Choice of Interpolation Type

Using a sinusoidal modulator with a rate of 2 Hz and a depth of 6 samples, we then tried different interpolation types.  We saw in the previous chapter that Lagrange interpolation (an FIR design) caused the signal’s high frequencies to be attenuated.  Clearly, for order N = 1 (linear interpolation) and N = 2, this roll-off is not acceptable since it is too pronounced for a natural sounding reverberation.  A fourth order or higher Lagrange interpolation is required to obtain an almost flat perceived frequency response.

Luckily, an IIR all-pass design, such as the maximally flat group delay interpolation, enables us to achieve an all-pass response with a first-order design.  However, this design may cause some distortion due to transients arising from the time-varying filter coefficients.  However, this distortion is low enough to be acceptable in most cases.  We tried both the original design (equation ( 4.28 )), and the approximation of the original design (equation ( 4.29 )).  In all the inputs we tried, we were not able to hear any difference between the two versions of the all-pass interpolation algorithm.

5.4         Implementation Details

This section will look at some of the interesting optimizations and implementation methods that were used to make the overall reverberation algorithms more efficient.  The algorithm was coded as a 32-bit application in C++, designed for the Intel Pentium processor family.  However, we should note that a commercial application for this platform would typically require further optimization using assembly language.

5.4.1        Sinusoidal Tone Generator Implementation

The sinusoidal algorithm we used (described in section 4.1.1) consists of only one multiplication and one addition.  Fig. 5.2 shows the implementation of this algorithm.  Note that , where fosc is the tone frequency and fs is the sampling frequency.

Fig. 5.2.  Implementation of a sinusoidal tone generator.
It is important to note that this algorithm is not stable for all frequencies, even with double precision, due to round-off errors.  For example, coefficient a must be stored in memory using a finite word length.  For low frequencies, the value of the cosine used to compute coefficient a is closed to 1.  For example, if fosc = 2 Hertz and fs = 44.1 kHz, we have .  The equivalent binary representation of coefficient “a” is then: 1.1111 1111 1111 1111 1111 1110 1010 0010.  Thus, a processor using 16 or 24-bit coefficients would not be able to store this value without introducing a large error in the oscillation frequency.  Since obtaining the exact frequency is not critical in our application, this error may be acceptable.  There would also be large errors in the storage of intermediate values such as buffer1 and buffer2.  Unlike the error on coefficient a, these errors would accumulate every period of the oscillator.  In practice, these round-off errors can cause the system to be unstable for some frequencies (even with 32 or 64-bit word lengths).

It is thus desirable to ensure that the system will be stable for any frequency.  Also, the output of the system has to stay in the interval [-1.0, 1.0].  The easiest way to accomplish these requirements is to reset the algorithm when the output goes above 1.0 or below -1.0 as shown in Fig. 5.3, where , fosc is the tone frequency, and fs is the sampling frequency.

Fig. 5.3.  Implementation of a sinusoidal tone generator with overflow prevention.
This modification to the original algorithm causes some abrupt, but slight changes in the frequency of the tone when the stabilization mechanism has to reset the algorithm.  This causes a change in the modulation rate.  However, this has no effect since it is too small to be noticed when used in the context of our application.

5.4.2        Filtered Random Number Generator Implementation

The RNG algorithm itself consists of 2 multiplications and 2 additions.  Note that the modulo operation is done automatically by overflowing the processor’s 32-bit register.  The resulting algorithm is shown in Fig. 5.4.

Fig. 5.4.  Implementation of a linear RNG.
As mentioned in the previous chapter, this sequence must be filtered by a low-pass filter.  The filter we used, described in section 4.1.2, was implemented with two stages of second order biquad filters.  Each biquad section can be implemented in a direct form I implementation, as shown in Fig. 5.5.

Fig. 5.5.  Block diagram of the direct form 1 implementation of a biquad filter.
However, in our specific filter, coefficients b0 and b2 were the same so we were able to optimize the algorithm.  The final implementation (shown in Fig. 5.6) requires only 4 multiplication and 4 additions:

Fig. 5.6.  Modified direct form 1 a) block diagram and b) implementation of a biquad filter (where k = b0 = b2, and b1= b1/k).
5.4.3        Interpolation Implementation

As we saw in section 4.2.2, Lagrange coefficients can be computed using ( 4.26 ), where the denominator factors are pre-stored.  This requires a number of multiplications proportional to (N+1)2, where N is the filter order.  However, Murphy [28] proposed a way to compute the coefficients that requires only 4(N+1)-8 multiplications, saving about 50% of the computation time for a 4th order filter.  Using this method, the numerator factors are computed by first calculating the partial products:
 
 
( 5.4 )

where ppf0…N-1and ppr0…N-1 represents the forward and reverse partial products that have to be temporarily stored.  These are then combined to give the numerators of ( 4.26 ) using:
 
 
( 5.5 )

One way to optimize any interpolation process is to update the interpolation coefficients every Nth samples.  For example, we can set the modulation source frequency to 0.2 Hz, instead of 2 Hz.  We can then take its output at 1/10th of the sampling rate to compute the interpolation coefficients once every 10 audio samples.  We found that using this method to update the coefficients every 25 samples or less does not result in any audible distortion.  Updating them at a rate of 1 update every 50 samples creates a very subtle noise during the reverberation decay.  A rate higher than 75 adds audible noise during decay.

5.5         Computation Requirements

The algorithms were executed using floating-point precision on an Intel Pentium II 266MHz processor, with 64 MBytes of RAM.  Although every algorithm module such as delay lines and modulators were C++ objects, all critical functions small enough were computed inline for ease of design.  Therefore, there was basically no overhead associated with function calls.  All compiler optimizations were set towards maximum execution speed.  The reverberation algorithm processes blocks of 5512 audio samples and an average processing time of more than 200 blocks was used in the following tables as the algorithm’s average processing time.  We should also note that the algorithm always includes a constant pre-delay, but no early reflections and no tone-correction filter.  In a real-world situation, we would add those modules to the algorithm and this would increase the total computation time averages of the algorithm by a constant factor.

Table I shows the average processing time of the original algorithm (without modulation), using 8, 12, and 16 delay lines. We can see that the total computation time of the algorithm is roughly proportional to the number of delay lines, which is about 0.24  per delay line.

TABLE  I – PROCESSING TIMES OF THE FILTER WITHOUT MODULATION
 Algorithm Description
Total Processing Time
 N = 16, with no modulation
3.7
 N = 12, with no modulation
2.8 
 N = 8, with no modulation
1.9 

Table II shows computation time of the same basic algorithm, with 8 modulated delay lines.  As we can see, the required processing time of the entire algorithm has increased dramatically compared to the original non-modulated algorithm.  We also computed the modulator’s algorithm processing time by themselves (shown in the second column).  We found out that the sinusoidal tone modulator and RNG modulators took about 30% and 50% of the added computation time while the interpolation algorithms took the remaining 70% and 50%, respectively.

TABLE  II – PROCESSING TIMES OF DIFFERENT MODULATOR TYPES
 Algorithm Description
Modulators Processing Time
Total Processing Time*
 Sinusoidal tone generator
8 x 0.18  = 1.4 
6.9
 Filtered RNG sequence
8 x 0.43  = 3.4 
8.7 
 * with N = 8 and approximated 1st order all-pass interpolation.

Since we found in section 5.2.2 that the sinusoidal tone modulator gave more consistent results (and was more efficient) than the RNG modulator, we decided to keep it in our final implementation.

Table III shows computation time of different interpolation methods, using the sinusoidal tone modulator.  We selected the approximated 1st order all-pass IIR interpolation, because it sounds almost as good as the 4th order Lagrange FIR interpolation and requires about 55% of the computation needed for the latter.

TABLE  III – PROCESSING TIMES OF DIFFERENT INTERPOLATION TYPES
 Algorithm Description
Interpolation Time
Total Processing Time*
 1st order Lagrange FIR
4.0 
7.3 
 2nd order Lagrange FIR
5.0 
8.3 
 3rd order Lagrange FIR
5.4 
8.7 
 4th order Lagrange FIR
6.5 
9.8 
 1st order All-Pass IIR (Approx.)
3.6 
6.9
 1st order All-Pass IIR
5.0 
8.3 
 * with N = 8 and a  sinusoidal tone modulator.

One way to optimize the interpolation process is to update the interpolation coefficients only every Nth samples.  Table IV shows the computation time of the algorithm using approximated 1st order all-pass IIR interpolation, where the coefficients are updated every 1, 10, 50, and 100 samples.  We found that updating the interpolation coefficients every 50 samples was a good compromise between the resulting added noise and the required computation efficiency.

TABLE  IV – PROCESSING TIMES WITH DIFFERENT COEFFICIENT UPDATE RATES
 Coefficients Update Rate
Total Processing Time*
 Every sample
6.9 
 Every 10 samples
4.1 
 Every 50 samples
3.7 
 Every 100 samples
3.6 
 * with N = 8, a sinusoidal tone modulator, and approximated 1st order all-pass interpolation.

Finally, we can make a compromise in sound quality to further reduce the algorithm’s total processing time by modulating only few delay lines of the complete network.  However, the benefits of the modulation will be decreased almost proportionally.  Table V shows the total processing time of the algorithm using 8, 4, and 0 modulated delay lines, out of 8 delay lines.

TABLE  V – PROCESSING TIMES WITH A VARIABLE RATIO OF MODULATED DELAY LINES
 Number of modulated delay lines
Total Processing Time*
 8 modulated delay lines out of 8
3.7 
 4 modulated delay lines out of 8
2.7 
 0 modulated delay lines out of 8
1.9 
 * with N = 8, a sinusoidal tone modulator, and approximated 1st order all-pass interpolation.

It is interesting to note that the processing time of the algorithm where all 8 delay lines are modulated is equivalent to the processing time of an algorithm that would use 16 non-modulated delay lines.  However, the important result is that the required computation time of the algorithm where 4 out of 8 delay lines are modulated (2.7 ) is a little bit less than an algorithm using 12 non-modulated delay lines (2.8 ).  Using the final algorithm, we have thus gained 60% reduction in memory consumption and about 4% reduction in processing time.  In the next section, we will present results of the listening test in which the sound quality of these last two algorithms was compared.

5.6         Listening Test

A listening test was conducted to determine if a reverberation algorithm where 4 out of 8 delay lines are modulated (that we will refer to as algorithm B) would sound as good or better than our reference – an algorithm using 12 non-modulated delay lines (algorithm A).  Algorithm B was found to be slightly faster than the reference (section 5.5), while using 60% of the memory required by algorithm A (section 5.1).  We would thus like to prove that an algorithm using modulation can achieve a reverberation quality that is equivalent (or even better) than a non-modulating algorithm containing more delay lines.

5.6.1        Procedure

To test different aspects of the algorithms, four different studio recordings were used in the test.  The first audio sample used is an acoustic guitar recording that sounds natural and contains a wide range of frequencies.  The second recording is a solo male vocal.  The third audio sample is an alto saxophone playing a jazz solo with arpeggios.  The last recording is a full drum set playing a rock beat ending with a snare hit.  All these instruments were recorded digitally at 44.1 kHz where microphones were positioned close to the instruments.  The audio samples (without reverberation) were about 3 to 9 seconds long.

The two reverberation types were then added to each sample.  Our purpose was to use the artificial reverberation as it would be used in a typical professional studio session.  Reverberation parameters such as reverberation time and reverberation level were chosen differently for each instrument in order to produce a natural ambiance.  Of course, for a given instrument, both reverberation types were created using the same parameters.  Early reflections were also added to both algorithms to make the reverberation more realistic.

A CD was then compiled with 10 tracks per instrument, where each track contained three audio segments.  The first segment used reverberation type A, the second one used type B, and the third one used type X; where X was randomly chosen to be either A or B.

The first part of the test procedure, a typical ABX listening test, was used to determine if the subjects could tell the difference between reverberation type A and B.  We asked the subjects to listen to the 40 tracks and to identify in each track which reverberation type was the third audio segment.  In other words, they had 10 trials per instrument to identify reverberation type X.  Once this first part was completed, we proceeded to the second part of the test procedure.  The objective of the second part was to find which reverberation type the listeners preferred.  For each of the instruments, listeners were asked to write few words about the reverberation type they preferred (A or B) and why.

The CD was played on a portable Sony CD player, using Sony MDR-V600 studio headphones, in a quiet room.  A total of 20 subjects where asked to take the listening test.  All of them were university students (both graduate and undergraduate) having a musical background and some studio experience.  They were allowed to listen to the audio samples as many times as they wanted.

5.6.2        Results Significance

To determine the significance of the test results, we had to determine if a specific score (for example, 8 out of 10 correct identifications) was due to an audible difference as opposed to chance.  This requires choosing between two hypotheses.  The first one, the null hypothesis (H0), holds that the score was due to chance.  The other hypothesis (H1) holds that the score was due to an audible difference.  For any score, statistics supplies the probability that it was due to chance, termed the significance level .  If the significance level  of a score is low enough, we conclude that the result was not the result of chance (we reject H0), and infer that it was due to an audible difference (we accept H1).  The threshold below which we consider that  is low enough is called the criterion of significance ’.  It is determined using common sense, and a value of ’ = 0.05 is the most commonly used value.

When we make the decision of accepting or rejecting H0, there are two kinds of error risks.  The first one, type 1 error, is also labeled ’ and is the risk of rejecting H0 (because a listener had a score of 8/10 or more, for example) when it was actually true.  The second one, type 2 error, is labeled  and is the risk of accepting hypothesis H0 (because a listener had a score of 7/10 or less, for example), when H0 in fact was true.

The probability that c correct identifications out of n total trials are due to chance is given by a binomial distribution of parameter n and p [2]:
 
 
 
( 5.6 )

where , and p is the probability of a correct identification due to chance alone (p = 0.5 in ABX testing).  Thus, the significance level is the probability that c or more correct identifications in n trials are due to chance:
 
 
 
( 5.7 )

Using ( 5.6 ) and ( 5.7 ) with n = 10 trials, we find that the significance level of a score of 8/10 is  = 0.058.  The same computation for a score of 9/10 gives  = 0.011.  We decided to use ’ = 0.058 as our criterion of significance (corresponding to 8 correct answers out of 10), since it is the closest to the usual value ’ = 0.05.  That means that if a listener is able to identify the correct reverberation type 8 times or more for a given instrument, we will assume that he or she was able to tell the difference between reverberation algorithms A and B (hypothesis H1 will be accepted).

5.6.3        Results

The results of the ABX test are shown in Table VI.  For each instrument, the number of listeners that were able to tell the difference between reverberation algorithms A and B is shown.  We can see that on average, 60% of the subjects were able to perceive a difference in the guitar and drum samples, while only 40% and 30% differentiated the reverberation types on the vocal and saxophone recordings, respectively.  Only 3 subjects were able to identify the reverberation algorithm correctly in all 4 instruments, while 5 subjects did not identify any algorithm.  We should also note that even those who where able to tell the difference between the algorithms said that the two reverberation types were really hard to differentiate.

TABLE  VI – RESULTS OF THE ABX LISTENING TEST: Number of listeners that were able to find a difference.
Guitar
Vocal
Saxophone
Drums
12/20  (60%)
8/20  (40%)
6/20  (30%)
12/20  (60%)

In the second part of the test procedure, listeners that were able to perceive a difference in identifying the reverberation type on an instrument were then asked to tell which algorithm sounded better.  The results of this second part are shown in Table VII.

TABLE  VII – RESULTS OF THE SECOND PART OF THE TEST: Number of listeners that preferred reverberation type B*.
Guitar
Vocal
Saxophone
Drums
7/12  (58%)
6/8  (75%)
5/6  (83%)
4/12  (33%)
 * Only listeners that could hear the difference in the corresponding instrument where asked to give their opinion.

We can see that for 3 of the 4 instruments, the listeners that were able to differentiate the reverberation algorithms preferred type B.  Most listeners had different preferences depending on the source type.  For example, some listeners preferred reverberation A on the drums, but reverberation B on everything else.  Listeners that preferred algorithm A found that it sounded in general more natural and full, and that algorithm B sounded a bit more artificial.  For the vocal and saxophone recordings especially, they almost all agreed that algorithm B had a smoother decay and that they were able to hear some beats or fluttering in the tail of reverberation A.  In other words, reverberation B may sound more artificial, but it helps attenuate the resonant tones present in the tail of reverberation A.  That explains the fact that listeners preferred reverberation A on the drums, since reverberation A had no audible resonant frequencies in its decay for this specific recording.  Interestingly, only one of the 20 listeners heard a slight pitch variation in the tail of algorithm B (associated to an excessive modulation depth) in one of the instruments.

The fact that the modulated algorithm (type B) was judged to sound generally better than the reference algorithm may seem surprising, but we have to keep in mind that the reference algorithm is not perfect.  Even an algorithm using 12 delay lines (or even more) will not provide a flat frequency response.  That is why the use of modulation is so powerful in reducing the audible effects of peaks in the frequency response of an artificial reverberation algorithm.

[ Chapter 4 ] [ Chapter 6 ] [ Table of Contents ] [ Title Page ]