2  ARTIFICIAL REVERBERATION

Reverberation is a natural acoustical effect.  When a sound is emitted in a reverberant room, it is reinforced by a large number of closely spaced echoes.  These echoes occur because the emitted sound bounces off the reflecting surfaces of the room.  Artificial reverberation algorithms attempt to recreate these echoes using different techniques with varying computational requirements.

This chapter is an overview of artificial reverberation algorithms.  Section 2.1 will introduce the two main approaches to artificial reverberation design, sections 2.2 through 2.4 will then review the acoustical properties of real rooms, and the remaining sections will finally present several types of artificial reverberation algorithms.

2.1         Physical vs. Perceptual Approach to Artificial Reverberation

Artificial reverberation can be achieved by using two different approaches.  The first one, the physical approach, attempts to artificially recreate the exact reverberation of a real room.  To achieve this level of detail, a reverberated signal is usually obtained by convolving the impulse response of a room with a dry source signal.  The impulse response can be recorded directly from a real room, or it can be obtained from the geometric model of a virtual room.  In this latter case, the geometric properties of the room (such as dimensions and wall materials) can be used to compute the coefficients of the impulse response.

Although this approach allows a precise rendering of the reverberation given the source and the listener’s position, it is often not flexible or efficient enough for real-time virtual reality or gaming applications.  For example, the time-domain convolution of a three-second audio signal with a two-second room impulse response (both sampled at 44.1 kHz) would require approximately 11.7 billion multiplications and nearly as many additions to produce the roughly 220,500 output samples.  The equivalent frequency-domain convolution would require about 220,500 complex multiplications plus the overhead due to the transform and inverse transform operations.
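As a rough check of these figures, the following Python sketch counts the products needed by a direct time-domain convolution and the length of its output.  The function name is ours and purely illustrative.

```python
def direct_convolution_cost(signal_seconds, ir_seconds, sample_rate):
    """Return (multiplications, output_samples) for a direct convolution."""
    n_signal = round(signal_seconds * sample_rate)
    n_ir = round(ir_seconds * sample_rate)
    # One product for every (input sample, impulse response tap) pair.
    multiplications = n_signal * n_ir
    # A full linear convolution has len(x) + len(h) - 1 output samples.
    output_samples = n_signal + n_ir - 1
    return multiplications, output_samples


mults, out_len = direct_convolution_cost(3.0, 2.0, 44100)
print(f"{mults:,} multiplications, {out_len:,} output samples")
```

The product count is on the order of 12 billion, while a frequency-domain product only needs one complex multiplication per output bin.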

The second approach, called the perceptual approach, tries to generate artificial reverberation that is perceptually indistinguishable from natural reverberation.  The purpose of these algorithms is to reproduce only the salient parts of natural reverberation.  This approach is generally much more efficient than the physical approach and, ideally, the resulting algorithm could be completely parameterized.  This thesis focuses on this approach.

2.2         Perceptual Approach

The impulse response of the St-John Lutheran Church (Madison, WI) is shown in Fig. 2.1.  As we can see, the first part of the waveform (from 0 to about 150 ms) is composed of discrete peaks, while the later part is more homogeneous and decreases almost exponentially.  We can model this impulse response by splitting it into three distinct parts, as shown in Fig. 2.2.  According to this model, the impulse response consists of the direct signal followed by discrete echoes called early reflections (coming from walls, floor and ceiling) and the late reverberation.

Fig. 2.1.  Church impulse response (Sonic Foundry Inc, 1997).
Fig. 2.2.  Distinction between the direct signal, early reflections and late reverberation.
This suggests that an artificial reverberation algorithm could be divided into two parts.  The first part would produce a finite number of discrete echoes that would coincide with the ones found in the real impulse response, and the second part would generate a dense series of echoes whose amplitude decreases exponentially.

The early reflections are generally computed using a geometric model of the room to be simulated.  The most widely used methods are the source-image method and the ray-tracing method.  These can both be combined with head-related transfer functions (HRTF).  A discussion of these techniques is beyond the scope of this thesis.  However, a good overview can be found in [10] and [51].  Sections 2.3 and 2.4 will now review the properties of late reverberation in real rooms.  Modeling of late reverberation will be discussed in detail in sections 2.5 through 2.7.

2.3         Reverberation Time and EDR

A room is often characterized by its reverberation time (RT), a concept first established by Sabine [33] in his pioneering work on room acoustics in 1900.  The reverberation decay time is proportional to the volume of the room and inversely proportional to the amount of sound absorption of the walls, floor and ceiling of the room:

$$ T_r \approx 0.161\,\frac{V}{A} $$
( 2.1 )

where Tr is the time (in seconds) for the reverberation to decay by 60 dB, V is the volume of the room (in m³), and A is the overall absorption of the room.  Since the absorption of a room is frequency dependent, the reverberation time of a room is also frequency dependent.  For example, a room whose walls are made of porous materials that absorb high frequencies will have a shorter RT as frequency increases.
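As an illustration, Sabine’s relation can be evaluated as follows.  This is a minimal sketch assuming the common metric form Tr ≈ 0.161 V/A, with A computed as the sum of each surface area multiplied by its absorption coefficient.  The room dimensions and the coefficients are hypothetical.

```python
def total_absorption(surfaces):
    """Sum area * absorption coefficient over (area_m2, coefficient) pairs."""
    return sum(area * alpha for area, alpha in surfaces)


def sabine_rt60(volume_m3, absorption_m2, constant=0.161):
    """Reverberation time in seconds (0.161 s/m is the assumed metric constant)."""
    if absorption_m2 <= 0.0:
        raise ValueError("absorption must be positive")
    return constant * volume_m3 / absorption_m2


# Hypothetical 10 m x 8 m x 4 m room.
surfaces = [
    (80.0, 0.3),    # floor
    (80.0, 0.6),    # ceiling
    (144.0, 0.1),   # four walls
]
absorption = total_absorption(surfaces)
rt60 = sabine_rt60(10.0 * 8.0 * 4.0, absorption)
print(f"A = {absorption:.1f} m^2, RT = {rt60:.2f} s")
```

Evaluating the same function with a different set of coefficients per frequency band gives the frequency-dependent RT described above.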

To measure the RT, Schroeder [40] proposed integrating the squared impulse response of the room backward in time to get the room’s energy decay curve (EDC):

$$ \mathrm{EDC}(t) = \int_{t}^{\infty} h^{2}(\tau)\, d\tau $$
( 2.2 )

where h(t) is the impulse response of the room, which can be filtered beforehand to obtain the EDC of a particular frequency range.
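For a sampled impulse response, Schroeder’s integral becomes a backward running sum of the squared samples.  The sketch below is one possible discretization; estimating the RT by extrapolating the slope between -5 dB and -25 dB is a common practice that we assume here, not something prescribed by the text.

```python
import math


def energy_decay_curve_db(h):
    """Backward-integrated energy of h, in dB relative to the total energy."""
    remaining = 0.0
    edc = [0.0] * len(h)
    for n in range(len(h) - 1, -1, -1):
        remaining += h[n] * h[n]
        edc[n] = remaining
    total = edc[0]
    if total <= 0.0:
        raise ValueError("impulse response has no energy")
    return [10.0 * math.log10(e / total) if e > 0.0 else float("-inf")
            for e in edc]


def rt60_from_edc(edc_db, sample_rate, start_db=-5.0, stop_db=-25.0):
    """Extrapolate the decay slope between start_db and stop_db to 60 dB."""
    i1 = next(i for i, v in enumerate(edc_db) if v <= start_db)
    i2 = next(i for i, v in enumerate(edc_db) if v <= stop_db)
    slope_db_per_sample = (edc_db[i2] - edc_db[i1]) / (i2 - i1)
    return -60.0 / slope_db_per_sample / sample_rate


# Synthetic exponential decay with a known RT of 0.5 s (fs = 1 kHz).
fs, true_rt = 1000, 0.5
h = [10.0 ** (-3.0 * n / (fs * true_rt)) for n in range(2000)]
edc_db = energy_decay_curve_db(h)
estimated_rt = rt60_from_edc(edc_db, fs)
```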

Jot [17] and Griesinger [14] extended this concept to help visualize the frequency-dependent nature of the reverberation.  Jot proposed a variation of the EDC that he called the energy decay relief, or EDR(t, ω).  The EDR represents the reverberation decay as a function of time and frequency in a 3D plot.  To compute it, we divide the impulse response into multiple frequency bands, compute Schroeder’s integral for each band, and plot the result as a 3D surface.  As an example, the EDR of a typical hall is shown in Fig. 2.3.  We can see that the impulse response decays slowly at low frequencies.  However, the reverberation time at high frequencies is much shorter because the walls absorb high frequencies more than low frequencies.

Fig. 2.3.  Energy decay relief of a large hall.
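One way to approximate an EDR numerically is sketched below.  It swaps the filter bank for a short-time DFT (rectangular frames, naive DFT) and applies the backward sum to each bin; this is an illustrative simplification, not necessarily the procedure used by Jot.  The test signal, frame length and hop size are hypothetical.

```python
import cmath
import math


def energy_decay_relief(h, frame_len, hop):
    """Return edr[i][k]: energy left in DFT bin k from frame i onward."""
    # Short-time energy spectrum of each rectangular frame.
    frames = []
    for start in range(0, len(h) - frame_len + 1, hop):
        segment = h[start:start + frame_len]
        energies = []
        for k in range(frame_len // 2 + 1):
            acc = sum(segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            energies.append(abs(acc) ** 2)
        frames.append(energies)
    # Schroeder-style backward sum along time, independently for each bin.
    edr = [row[:] for row in frames]
    for i in range(len(edr) - 2, -1, -1):
        for k in range(len(edr[i])):
            edr[i][k] += edr[i + 1][k]
    return edr


# Slowly decaying low tone (bin 2) plus quickly decaying high tone (bin 10).
h = [math.exp(-n / 200.0) * math.cos(2 * math.pi * 2 * n / 32)
     + math.exp(-n / 20.0) * math.cos(2 * math.pi * 10 * n / 32)
     for n in range(512)]
edr = energy_decay_relief(h, frame_len=32, hop=16)
low_remaining = edr[10][2] / edr[0][2]      # low band, after 160 samples
high_remaining = edr[10][10] / edr[0][10]   # high band, after 160 samples
```

In this example the high band loses its energy far sooner than the low band, which is the behavior visible in the EDR of a hall.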


2.4         Modal Density and Echo Density

A room can be characterized by its normal modes of vibration: the frequencies that are naturally amplified by the room.  The number Nf of normal modes below frequency f is nearly independent of the room shape [24].  It is given by:

$$ N_f = \frac{4\pi V}{3c^{3}}\,f^{3} + \frac{\pi S}{4c^{2}}\,f^{2} + \frac{L}{8c}\,f $$
( 2.3 )

where V is the volume of the room (in m³), c is the speed of sound (in m/s), S is the area of all walls (in m²), and L is the sum of all edge lengths of the room (in m).  The modal density, defined as the number of modes per Hertz, is:

$$ D_f = \frac{dN_f}{df} \approx \frac{4\pi V}{c^{3}}\,f^{2} $$
( 2.4 )

Thus, the modal density of a room grows proportionally to the square of the frequency.  According to this formula, a medium-sized hall (18,100 m³ with an RT of 1.8 seconds) has a modal density of about 5800 modes per Hertz at a frequency of 1 kHz.
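The figure quoted for the medium-sized hall can be checked numerically.  The sketch below assumes the usual form of the mode count, Nf = (4πV/3c³)f³ + (πS/4c²)f² + (L/8c)f, keeps only the leading term for the density, and takes c = 340 m/s, a value we chose because it reproduces the quoted result.  The hall dimensions are hypothetical.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed


def mode_count(f, volume, surface, edge_length, c=SPEED_OF_SOUND):
    """Approximate number of normal modes below frequency f (Hz)."""
    return (4.0 * math.pi * volume / (3.0 * c ** 3) * f ** 3
            + math.pi * surface / (4.0 * c ** 2) * f ** 2
            + edge_length / (8.0 * c) * f)


def modal_density(f, volume, c=SPEED_OF_SOUND):
    """Leading-term modal density, in modes per Hertz."""
    return 4.0 * math.pi * volume * f ** 2 / c ** 3


# Hypothetical rectangular hall of 18,100 m^3.
length, width = 40.0, 30.0
height = 18100.0 / (length * width)
surface = 2.0 * (length * width + length * height + width * height)
edges = 4.0 * (length + width + height)

density = modal_density(1000.0, 18100.0)
# Central difference of the full mode count, for comparison.
exact = (mode_count(1000.5, 18100.0, surface, edges)
         - mode_count(999.5, 18100.0, surface, edges))
print(f"leading term: {density:.0f} modes/Hz, full expression: {exact:.0f} modes/Hz")
```

At 1 kHz the surface and edge terms change the result by only about one percent, which is why the leading term is usually sufficient.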

However, above a critical frequency, the modes start to overlap.  Above this frequency, the modes are excited simultaneously and interfere with each other.  This creates a frequency response that can be modeled statistically [24] [42].  According to this model, the frequency response of a room is characterized by frequency maxima whose mean spacing is:

$$ \Delta f_{max} \approx \frac{4}{T_r} \quad \mathrm{Hz} $$
( 2.5 )

where Tr is the reverberation time.  This statistical model is justified only above a critical frequency:

$$ f_c \approx 2000\,\sqrt{\frac{T_r}{V}} \quad \mathrm{Hz} $$
( 2.6 )

where Tr is the reverberation time and V is the volume of the room.  According to this model, a medium-sized hall (18,100 m³, with an RT of 1.8 seconds) would have a frequency response consisting of frequency peaks separated by an average of Δfmax = 2.2 Hz above a critical frequency fc = 20 Hz.
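The two values quoted for the hall are consistent with a mean spacing of about 4/Tr and a critical frequency of about 2000·sqrt(Tr/V).  The short sketch below evaluates these two expressions, which we assume are the ones intended by ( 2.5 ) and ( 2.6 ); the function names are ours.

```python
import math


def mean_peak_spacing_hz(rt60):
    """Mean spacing between frequency-response maxima, in Hz."""
    return 4.0 / rt60


def critical_frequency_hz(rt60, volume_m3):
    """Frequency above which the statistical model applies, in Hz."""
    return 2000.0 * math.sqrt(rt60 / volume_m3)


spacing = mean_peak_spacing_hz(1.8)
f_c = critical_frequency_hz(1.8, 18100.0)
print(f"spacing = {spacing:.1f} Hz, critical frequency = {f_c:.0f} Hz")
```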

Another major characteristic of a room is the density of echoes in the time domain.  The echo density of a room is defined as the number of echoes reaching the listener per second.  Kuttruff [24] has shown (using the source-image method with a sphere to model a room) that the number of echoes increases as the cube of the time, so that the echo density increases as its square:

$$ N_t = \frac{4\pi (ct)^{3}}{3V} $$
( 2.7 )

where Nt is the number of echoes, t is the time (in s), ct is the radius of the sphere (in m), and V is the volume of the room (in m³).  Differentiating with respect to t, we obtain the density of echoes:

$$ \frac{dN_t}{dt} = \frac{4\pi c^{3}}{V}\,t^{2} $$
( 2.8 )

where Nt is the number of echoes that will occur before time t (in s), c is the speed of sound (in m/s), and V is the volume of the room (in m³).

The time after which the echo response becomes a statistical clutter depends on the width of the input signal.  For a pulse of width Δt, the critical time after which the echoes start to overlap is about [38]:

$$ t_c \approx 5 \times 10^{-5}\,\sqrt{\frac{V}{\Delta t}} $$
( 2.9 )

For example, the echoes excited by a 1 ms pulse in a room of 10,000 m³ would start overlapping after about 150 ms.  After this time, we can no longer perceive the individual echoes.
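The echo count, the echo density and the critical time can be evaluated with a few lines of Python.  This sketch assumes the sphere model described above (a sphere of radius ct), c = 340 m/s, and a critical time of about 5e-5·sqrt(V/Δt), which matches the example to within rounding.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s, assumed


def echo_count(t, volume, c=SPEED_OF_SOUND):
    """Number of echoes that have reached the listener before time t."""
    return 4.0 * math.pi * (c * t) ** 3 / (3.0 * volume)


def echo_density(t, volume, c=SPEED_OF_SOUND):
    """Echoes per second at time t (time derivative of echo_count)."""
    return 4.0 * math.pi * c ** 3 * t ** 2 / volume


def critical_time(volume, pulse_width):
    """Time (s) after which echoes of a pulse of the given width overlap."""
    return 5e-5 * math.sqrt(volume / pulse_width)


t_c = critical_time(volume=10_000.0, pulse_width=1e-3)
print(f"critical time: {1000.0 * t_c:.0f} ms")
print(f"echo density at that time: {echo_density(t_c, 10_000.0):.0f} echoes/s")
```

The computed critical time is close to 158 ms, in line with the rounded value of 150 ms given above.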

Another important characteristic of large rooms is that adjacent frequency modes decay at almost the same rate.  In other words, even if the higher frequencies decay faster than the lower frequencies, all frequencies in the same region decay at the same rate.  We should also note that in a good-sounding room, there are no “flutter” echoes (periodic echoes caused when the sound moves back and forth between two parallel hard walls).

Now that we have reviewed the properties of reverberation in real rooms, we will focus on the different methods that have been developed to generate artificial reverberation.

2.5         Unit Comb and All-pass filters

Schroeder [38] pioneered digital reverberation while he was working at Bell Laboratories.  The first prototype he tried, called a comb filter (illustrated in Fig. 2.4), consisted of a single delay line of m samples with a feedback loop containing an attenuation gain g.

Fig. 2.4.  Comb filter flow diagram.
Fig. 2.5 shows a comb filter similar to the one shown in Fig. 2.4.  However, in this design, the attenuation gain is in the direct path and not in the feedback path.  We will use this configuration for the remainder of this thesis because it is the building block of the larger designs.  Note that the two designs will produce a similar impulse response, but one will be softer than the other by a factor g.

Fig. 2.5.  Comb filter flow diagram.
The transfer function of the comb filter of Fig. 2.5 is given by:

$$ H(z) = \frac{g\,z^{-m}}{1 - g\,z^{-m}} $$
( 2.10 )

where m is the delay length in samples and g is the attenuation gain.  Note that to achieve stability, the magnitude of g must be less than unity.  For every trip around the feedback loop, which takes mT seconds, the sound is attenuated by -20 log10|g| dB.  Thus, the reverberation time (defined by a decay of 60 dB) of the comb filter is given by:

$$ T_r = \frac{-3\,m\,T}{\log_{10}|g|} $$
( 2.11 )

where g is the attenuation gain, m is the delay length in samples, and T is the sampling period.
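A direct implementation of the comb filter helps make these relations concrete.  The sketch below assumes that the configuration of Fig. 2.5 corresponds to the difference equation y[n] = g·(x[n-m] + y[n-m]), i.e. a gain placed after the delay line and inside the loop; the function names are ours.

```python
import math


def comb_filter(x, m, g):
    """Feedback comb filter: y[n] = g * (x[n - m] + y[n - m])."""
    y = [0.0] * len(x)
    for n in range(m, len(x)):
        y[n] = g * (x[n - m] + y[n - m])
    return y


def comb_rt60(m, g, sample_period):
    """Time (s) for the comb filter echoes to decay by 60 dB."""
    return -3.0 * m * sample_period / math.log10(abs(g))


impulse = [1.0] + [0.0] * 49
response = comb_filter(impulse, m=10, g=0.5)
# Echoes appear every m samples, each one g times the previous one.
print([response[k] for k in (10, 20, 30)])
print(f"RT = {1000.0 * comb_rt60(10, 0.5, 1e-3):.1f} ms")
```

With m = 10 and a 1 kHz sampling rate there is only one echo every 10 ms, which illustrates how sparse the response of a single comb filter is.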

The properties of the comb filter are shown in Fig. 2.6.  We see that the echo amplitude decreases exponentially as time increases.  This is good because real rooms have a reverberation tail decaying somewhat exponentially.  However, the echo density is really low, causing a “fluttering” sound on transient inputs.  Also, the density of echoes does not increase with time as it does in real rooms.

Fig. 2.6.  Comb filter (with Fs = 1 kHz, m = 10, and g = - ).
The pole-zero map of the comb filter shows that a delay line of m samples creates a total of m poles equally spaced inside the unit circle.  Half of the poles are located between 0 Hz and the Nyquist frequency f = fs/2 Hz, where fs is the sampling frequency.  That is why the frequency response has m distinct frequency peaks between 0 Hz and fs (half of them below the Nyquist frequency), giving a “metallic” sound to the reverberation tail.  We perceive this sound as being metallic because we only hear the few decaying tones that correspond to the peaks in the frequency response.

Reducing the delay length m to increase the echo density will result in a lower modal density because there will be fewer peaks in the frequency domain.  Thus, increasing the echo density in order to produce a richer reverberation will result in a sound that resonates at specific frequencies.  The last important thing to note about this filter is that increasing the feedback gain g to get a slower decay (and thus a longer reverberation time) gives even more pronounced peaks in the frequency domain, since the minima and maxima of the frequency response magnitude are:

$$ |H|_{min} = \frac{|g|}{1 + |g|}, \qquad |H|_{max} = \frac{|g|}{1 - |g|} $$
( 2.12 )
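The growth of the peaks with g can be verified by sampling the frequency response.  The sketch below assumes the comb transfer function H(z) = g z^-m / (1 - g z^-m); for this form the ratio between the largest and smallest magnitude is (1 + g)/(1 - g), whatever the overall scaling of the filter.

```python
import cmath
import math


def comb_magnitude(omega, m, g):
    """|H| at normalized frequency omega (radians per sample)."""
    z_m = cmath.exp(-1j * omega * m)
    return abs(g * z_m / (1.0 - g * z_m))


def peak_to_valley_db(m, g, points=2000):
    """Ratio of the largest to the smallest magnitude between 0 and Nyquist."""
    mags = [comb_magnitude(math.pi * k / points, m, g)
            for k in range(points + 1)]
    return 20.0 * math.log10(max(mags) / min(mags))


for gain in (0.5, 0.7, 0.9):
    print(f"g = {gain}: {peak_to_valley_db(10, gain):.1f} dB peak-to-valley")
```

Raising g from 0.7 to 0.9 increases the peak-to-valley ratio by more than 10 dB, which is the trade-off between decay time and coloration described above.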

To solve the frequency problem of the previous design, Schroeder came up with what he called the all-pass unit, shown in Fig. 2.7.  Although the original flow diagram of the filter was different from the one shown here, the properties of both filters are equivalent.

Fig. 2.7.  All-pass filter flow diagram.
The transfer function of the all-pass filter is given by:

$$ H(z) = \frac{z^{-m} - g}{1 - g\,z^{-m}} $$
( 2.13 )

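The poles of the all-pass filter are thus at the same locations as those of the comb filter, but zeros have now been added at the conjugate reciprocal locations.  The frequency response of this design is given by:

$$ H(e^{j\omega}) = \frac{e^{-j\omega m} - g}{1 - g\,e^{-j\omega m}} = e^{-j\omega m}\,\frac{1 - g\,e^{j\omega m}}{1 - g\,e^{-j\omega m}} $$
( 2.14 )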

We can see that the magnitude of this frequency response is unity for all ω since e^(-jωm) has unit magnitude, and the quotient of complex conjugates also has unit magnitude.  This leads to:

$$ |H(e^{j\omega})| = 1 $$
( 2.15 )

The properties of the all-pass filter are shown in Fig. 2.8.  As we can see, by using a feed-forward path, Schroeder was able to obtain a reverberation unit that has a flat frequency response.  Thus, a steady state signal comes out of the reverberator free of added coloration.  However, for transient signals, the frequency density of the unit is not high enough and the comb filter’s timbre can still be heard.

Fig. 2.8.  All-pass filter (with Fs = 1 kHz, m = 10, and g = - ).
It is interesting to notice that both the comb and the all-pass filters have the same impulse response (except the first pulse) for a gain |g| = .  Even with other gains g, we find that both designs sound similar for short transient inputs.  It is thus sometimes tricky to look at the frequency response plot of a filter, since its frequency response over a short period of time may be completely different from its overall frequency response.  That is why we often use impulses as an input signal to judge the quality of a reverberation algorithm – it gives us a good indication of the quality of the reverberator both in the time and in the frequency domain.
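The all-pass unit can be sketched in the same way as the comb filter.  The code below assumes the transfer function H(z) = (z^-m - g)/(1 - g z^-m), which gives the difference equation y[n] = -g·x[n] + x[n-m] + g·y[n-m], and checks numerically that the magnitude response is flat; the function names are ours.

```python
import cmath
import math


def allpass_filter(x, m, g):
    """All-pass unit: y[n] = -g*x[n] + x[n - m] + g*y[n - m]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        delayed_x = x[n - m] if n >= m else 0.0
        delayed_y = y[n - m] if n >= m else 0.0
        y[n] = -g * x[n] + delayed_x + g * delayed_y
    return y


def allpass_magnitude(omega, m, g):
    """|H| at normalized frequency omega (radians per sample)."""
    z_m = cmath.exp(-1j * omega * m)
    return abs((z_m - g) / (1.0 - g * z_m))


impulse = [1.0] + [0.0] * 49
response = allpass_filter(impulse, m=10, g=0.7)
print([round(response[k], 3) for k in (0, 10, 20, 30)])
flat = all(abs(allpass_magnitude(math.pi * k / 500, 10, 0.7) - 1.0) < 1e-12
           for k in range(501))
print("flat magnitude response:", flat)
```

Apart from the first pulse, the echoes still form a geometric series spaced m samples apart, which is why the unit can sound like a comb filter on short transients even though its steady-state magnitude response is flat.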

2.6         Filter Networks

By combining these two unit filters in different ways, we can create more complex structures that will hopefully provide a resulting reverberation with greater time density of echoes and smoother frequency response – even for inputs with sharp transients.

2.6.1        All-Pass Filter Networks

To increase the time density of the artificial reverberation, Schroeder cascaded multiple all-pass filters to achieve a resulting filter that would also have an all-pass frequency response.  The result is shown in Fig. 2.9.

Fig. 2.9.  Unit all-pass filters in series.
In this configuration, the echoes provided by the first all-pass unit are used to produce even more echoes at the second stage, and so on for the remaining stages.  However, as Moorer [27][1] pointed out, the unnatural coloration of the reverberation still remains for sharp transient inputs.
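The increase in echo density obtained by cascading can be observed on a small example.  In the sketch below, the all-pass unit uses the difference equation y[n] = -g·x[n] + x[n-m] + g·y[n-m]; the three delay lengths and the gain are hypothetical values chosen only for illustration.

```python
def allpass_filter(x, m, g):
    """All-pass unit: y[n] = -g*x[n] + x[n - m] + g*y[n - m]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        delayed_x = x[n - m] if n >= m else 0.0
        delayed_y = y[n - m] if n >= m else 0.0
        y[n] = -g * x[n] + delayed_x + g * delayed_y
    return y


def allpass_cascade(x, delays, g):
    """Feed the output of each all-pass unit into the next one."""
    y = list(x)
    for m in delays:
        y = allpass_filter(y, m, g)
    return y


def count_echoes(y, upto):
    """Number of non-zero samples among the first `upto` samples."""
    return sum(1 for v in y[:upto] if abs(v) > 1e-12)


impulse = [1.0] + [0.0] * 3999
single = allpass_filter(impulse, 7, 0.7)
cascade = allpass_cascade(impulse, [7, 11, 13], 0.7)
print(count_echoes(single, 100), "echoes for one unit,",
      count_echoes(cascade, 100), "for three units in series")
energy = sum(v * v for v in cascade)  # stays at 1 for an all-pass system
```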

2.6.2        Comb Filter Networks
Even if the unit comb filter has some frequency peaks, the combination of several unit comb filters in parallel can provide a filter having an overall frequency response that looks almost like a real room’s frequency response.  In this design, each unit comb filter has a different delay length, to avoid having more than one frequency peak at a given location.  An example of a parallel comb filter design is shown in Fig. 2.10.

Fig. 2.10.  Unit comb filters in parallel.
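To achieve good results, each unit comb filter must be properly weighted.  Also, the reverberation time of every filter must be the same.  In order to do that, we can generalize ( 2.11 ) and select the gains gp according to:

$$ g_p = 10^{-3\,m_p T / T_r} \qquad \text{for } p = 1, 2, \ldots, N $$
( 2.16 )

where mp is the delay length (in samples) of the p-th comb filter, T is the sampling period, Tr is the desired reverberation time, and N is the number of comb filters.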

Schroeder suggested choosing the delay lengths such that the ratio of the largest to the smallest is about 1.5.  This parallel comb filter design leads to the general feedback delay network (FDN) design, which will be discussed in depth in chapter 3.
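The gain selection can be sketched as follows, assuming that each gain is obtained by solving the comb filter reverberation time for g, i.e. gp = 10^(-3·mp·T/Tr).  The four delay lengths are hypothetical; they span 30 ms to 45 ms at 44.1 kHz, a ratio of about 1.5.

```python
import math


def comb_gains(delays, rt60, sample_period):
    """Gain of each comb filter so that all decay by 60 dB in rt60 seconds."""
    return [10.0 ** (-3.0 * m * sample_period / rt60) for m in delays]


def comb_rt60(m, g, sample_period):
    """Reverberation time (s) of a comb filter of delay m and gain g."""
    return -3.0 * m * sample_period / math.log10(abs(g))


sample_rate = 44100.0
delays = [1323, 1559, 1777, 1985]   # hypothetical delay lengths in samples
gains = comb_gains(delays, rt60=2.0, sample_period=1.0 / sample_rate)
for m, g in zip(delays, gains):
    print(f"m = {m}: g = {g:.4f}, RT = {comb_rt60(m, g, 1.0 / sample_rate):.2f} s")
```

Longer delay lines receive smaller gains, so that every comb filter loses the same number of decibels per second.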

2.6.3        Combination of Comb and All-Pass Filter

To achieve a good echo density while minimizing the reverberation coloration, Schroeder combined comb filters (in parallel) with all-pass filters (in series) to give the reverberator shown in Fig. 2.11.  He chose delay lengths ranging from 30 to 45 ms for the comb filters and two shorter delay lengths (5 and 1.7 ms) for the two all-pass filters.  The attenuation gains of the comb filters were chosen according to ( 2.16 ) and the all-pass gains were both set to 0.7.  In this design, the comb filters provide the long reverberation delay, and the all-pass filters multiply their echoes to provide a denser reverberation.  However, this design sounds artificial because it still does not provide a high enough echo density.  Audible resonating frequencies are also present in the reverberation tail.
 
 

Fig. 2.11.  Schroeder’s reverberator made of comb and all-pass filters.
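A compact sketch of this structure is given below.  It is not a transcription of Fig. 2.11: the use of four comb filters, the two intermediate comb delays (34 and 39 ms), the equal mixing weights and the comb difference equation y[n] = g·(x[n-m] + y[n-m]) are all assumptions made for illustration.  Only the 30 to 45 ms range, the 5 and 1.7 ms all-pass delays and the all-pass gain of 0.7 come from the text.

```python
def comb_filter(x, m, g):
    """Feedback comb filter: y[n] = g * (x[n - m] + y[n - m])."""
    y = [0.0] * len(x)
    for n in range(m, len(x)):
        y[n] = g * (x[n - m] + y[n - m])
    return y


def allpass_filter(x, m, g):
    """All-pass unit: y[n] = -g*x[n] + x[n - m] + g*y[n - m]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        delayed_x = x[n - m] if n >= m else 0.0
        delayed_y = y[n - m] if n >= m else 0.0
        y[n] = -g * x[n] + delayed_x + g * delayed_y
    return y


def schroeder_reverb(x, sample_rate, rt60,
                     comb_delays_ms=(30.0, 34.0, 39.0, 45.0),
                     allpass_delays_ms=(5.0, 1.7),
                     allpass_gain=0.7):
    """Parallel comb filters followed by all-pass filters in series."""
    period = 1.0 / sample_rate
    wet = [0.0] * len(x)
    for delay_ms in comb_delays_ms:
        m = round(delay_ms * 1e-3 * sample_rate)
        g = 10.0 ** (-3.0 * m * period / rt60)   # same RT for every comb
        for n, v in enumerate(comb_filter(x, m, g)):
            wet[n] += v / len(comb_delays_ms)    # equal weights (assumed)
    for delay_ms in allpass_delays_ms:
        m = round(delay_ms * 1e-3 * sample_rate)
        wet = allpass_filter(wet, m, allpass_gain)
    return wet


impulse = [1.0] + [0.0] * 7999
tail = schroeder_reverb(impulse, sample_rate=8000, rt60=1.0)
first_echo = next(n for n, v in enumerate(tail) if v != 0.0)
print(f"first echo after {1000.0 * first_echo / 8000:.0f} ms")
```

The low sampling rate is used only to keep the example fast in pure Python.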
Piirilä [29] also suggested using a combination of comb and all-pass filters to produce non-exponentially decaying reverberation.  He was able to produce several different reverberation envelopes to achieve interesting musical effects and to enhance reverberated speech.

2.7         Other types of artificial reverberation

This section discusses different types of reverberator designs that have not been implemented in this thesis.  However, some of them could easily be combined together with a feedback delay network to produce an even more natural or efficient artificial reverberator.

2.7.1        Nested All-Pass Designs

To achieve a more natural-sounding reverberation network, it would be desirable to combine the unit filters to produce a buildup of echoes, as would occur in real rooms.  As suggested by Vercoe [49] and used later by Gardner [5] [7], one solution is to use nested all-pass filters.  In Fig. 2.12, we substitute the delay line of a unit all-pass filter with another all-pass filter having a transfer function G(z).

Fig. 2.12.  Nested all-pass filters.
If G(z) has an all-pass frequency response, then the resulting filter will also have an all-pass response.  That enables the use of any depth of nested all-pass filters while ensuring the overall stability and frequency response of the system.  However, Gardner found that this design still provided a somewhat colored response.  He thus suggested the addition of a global feedback path (having a gain |g| < 1) to the system.  With this new addition, the resulting reverberation is much smoother, probably because of the increased echo density provided by the feedback loop.
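The nesting can be sketched by replacing the delay line of the outer unit with G(z), which gives H(z) = (G(z) - g)/(1 - g G(z)).  To keep the loop computable sample by sample, the example below assumes that G(z) is a plain delay followed by an inner all-pass unit; this particular choice, the parameter values and the function name are ours, and Gardner’s global feedback path is not included.

```python
def nested_allpass(x, outer_gain, pre_delay, inner_delay, inner_gain):
    """Outer all-pass whose delay line is G(z) = z^-pre_delay * inner all-pass."""
    if pre_delay < 1 or inner_delay < 1:
        raise ValueError("delays must be at least one sample")
    count = len(x)
    w = [0.0] * count   # signal entering G(z): w = x + outer_gain * u
    u = [0.0] * count   # signal leaving G(z)
    y = [0.0] * count

    def past(signal, index):
        return signal[index] if index >= 0 else 0.0

    for n in range(count):
        inner_in = past(w, n - pre_delay)
        inner_in_old = past(w, n - pre_delay - inner_delay)
        # Inner all-pass unit applied to the delayed loop signal.
        u[n] = (-inner_gain * inner_in + inner_in_old
                + inner_gain * past(u, n - inner_delay))
        w[n] = x[n] + outer_gain * u[n]
        y[n] = -outer_gain * w[n] + u[n]
    return y


impulse = [1.0] + [0.0] * 2999
response = nested_allpass(impulse, outer_gain=0.5, pre_delay=3,
                          inner_delay=5, inner_gain=0.5)
energy = sum(v * v for v in response)
print(f"impulse response energy: {energy:.6f}")
```

The impulse response energy stays at unity, as expected for an all-pass system, while the echoes are no longer equally spaced.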

2.7.2        Dattorro’s Plate Reverberator

Dattorro [3] presented the flow diagram of a plate reverberator (Fig. 2.13) inspired by the work of Griesinger.

Fig. 2.13.  Dattorro’s plate reverberator.
The first part of the reverberator consists of a pre-delay unit, a low-pass filter, and four all-pass filters used as diffusers.  The diffusers are used to decorrelate the incoming signal quickly, preparing it to enter the second part of the reverberator.  This second section, which Dattorro calls the “tank,” consists of two different paths that are fed back into each other.  Each path is made of two all-pass filters, two delay lines, and a low-pass filter.  We create the output of the reverberator by summing together (with different weights) the output taps from the tanks’ delay lines and all-pass units.  As we will discuss in section 4.3.1, the all-pass filters of the tank can be modulated to smooth the resulting reverberation.

2.7.3        Convolution-based Reverberation

As we mentioned in the beginning of this chapter, creating reverberation by convolving a signal with a real room’s impulse response is not flexible or efficient enough for some applications.  However, it can be done when high quality reverberation is needed.  For example, commercial software such as Sonic Foundry’s “Acoustic Modeler” DirectX plug-in lets the user convolve any audio material with real room impulse responses to produce a realistic reverberation.  Since time-domain convolution requires a huge amount of processing power, several methods such as block-convolution [9] or multi-rate filtering [48] have been developed to decrease the processing requirement and input-output delay of the process.

It should be noted that convolution is not always done with a real room’s impulse response.  It is known that convolving the input with an exponentially decaying Gaussian white noise gives good results [26].
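A synthetic impulse response of this kind is easy to generate.  The sketch below shapes Gaussian white noise with an envelope that falls by 60 dB over the chosen reverberation time and applies it with a direct (and deliberately naive) convolution; the function names, the seed and the parameter values are ours.

```python
import random


def decaying_noise_ir(rt60, sample_rate, duration, seed=0):
    """Gaussian white noise whose amplitude decays by 60 dB in rt60 seconds."""
    rng = random.Random(seed)
    count = int(duration * sample_rate)
    return [rng.gauss(0.0, 1.0) * 10.0 ** (-3.0 * n / (rt60 * sample_rate))
            for n in range(count)]


def convolve(x, h):
    """Direct time-domain convolution (quadratic cost, for short signals only)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        if xi == 0.0:
            continue
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


ir = decaying_noise_ir(rt60=0.5, sample_rate=2000, duration=0.5)
dry = [1.0, 0.0, 0.0, 0.0, 0.5]        # two clicks
wet = convolve(dry, ir)
early_energy = sum(v * v for v in ir[:500])
late_energy = sum(v * v for v in ir[500:])
```

In practice the convolution would be carried out with one of the fast methods mentioned above rather than with this direct loop.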

Granular reverberation is also a type of convolution-based reverberation.  By convolving the input signal with sound “grains” generated by a technique called asynchronous granular synthesis (AGS), the virtual “reflections” contributed by each grain smear the input signal in time.  The color of the resulting reverberation is determined by the spectrum of the grains, which depends on the grains' duration, envelope, and waveform.  More details on granular reverberation can be found in [30].

2.7.4        Multiple-stream Reverberation

To create a more natural sounding reverberation, we can also split the reverberation into multiple streams, where each stream is tuned to simulate a particular region of a room.  Any of the filters mentioned earlier can be used for this purpose, and each filter can be assigned to a different output of the reverberator.  They can also be mixed together by the use of other methods such as head-related transfer function (HRTF) models, to give each filter a virtual location in the stereo field.

For example, Stautner and Puckette [47] designed a reverberator made of four comb filters (corresponding to left, right, center and back speakers) that were networked together.  With this specific design, a signal first heard in the left speaker would then be heard in the front and back speakers and would finally be heard in the right speaker.

2.7.5        Multi-rate Algorithms

Multi-rate algorithms split the signal into different frequency ranges.  By combining a bank of all-pass filters with a bank of comb filters such that each comb filter processes a different frequency range, the reverberation time of each specific frequency band can be set by adjusting the corresponding comb filter’s feedback gain. In this configuration, it is thus possible to use a different sampling rate for each band according to its bandwidth [50].

2.7.6        Waveguide Reverberation

Digital waveguides are widely used in the physical modeling of musical instruments.  Julius O. Smith [45] suggested the use of lossless digital waveguide networks (DWN) to create artificial reverberation.  A waveguide is a bi-directional delay line that propagates waves simultaneously in two opposite directions.  When a wave reaches a waveguide termination, it is reflected back to the origin.  When multiple waveguides are connected together in a closed lossless network, the reflections occurring in a room can be simulated, thus creating a reverberator.  Also, with the addition of low-pass filters to the lossless network prototype, frequency dependent reverberation time can be obtained.  More details on digital waveguide networks will be given in Chapter 3.

Waveguides can also be connected in a more structured way to form a 2D mesh [35] [36] or even a more complex 3D mesh [34].  A digital waveguide rectangular mesh is an array of digital waveguides arranged along each dimension, interconnected at their intersections.  For example, the resulting mesh of a 3D space is a rectangular grid in which each node is connected to its six neighbors by unit delays.  One of the problems of this configuration is that the dispersion of the waves is direction dependent.  This effect can be reduced by using structures other than the rectangular mesh, such as triangular or tetrahedral meshes, or by using interpolation methods.
