Chapter 4: Auditory Scene Analysis

In the presence of multiple sound sources, listeners
can sometimes focus their attention on one source while ignoring the
others. At other times, sources might
be perceptually integrated to create auditory illusions (images) that occur
away from any of the actual source locations.
This occurs often when listening to stereo loudspeakers. Bregman (1999) has studied these behaviors
and offers the term “auditory scene analysis” to generically describe the perceptual
mechanisms which allow us to interact with complex listening environments.
The most important of these mechanisms is probably the
“precedence effect.” This unique characteristic of our auditory system allows
us to hear a sound while ignoring any closely following repetitions of it, most commonly the echoes of a
reverberant acoustic space. An understanding of the precedence
effect supports a higher-level discussion of auditory stream
segregation. Each sound source is
considered to be a stream of information to the listener. Depending on the characteristics of those
sound streams, varied levels of integration or segregation will occur. Finally, the influences
of an acoustic space will be introduced.
The topic of auditory scene analysis applies to this research in that one must understand the consequences of spatially relocating frequencies. Moreover, studying the characteristics of this behavior will provide insight as to how far the sources might be separated, or which frequencies might be more easily relocated before segregation occurs.
Discussions of the
“precedence effect” (Wallach, 1949), the “Haas effect” (named after Helmut Haas’
1949 dissertation), or the “law of the first wavefront” all refer to
the human tendency to perceive only one sound event when a sound is
closely followed by one (or more) repetitions of it.
The most common example of this phenomenon is found when listening in a
reverberant room. Here, sound reaches
the listener both directly from the sound source and from varied
directions due to wall reflections.
Yet only one sound is typically heard, and from only one direction.
Blauert (1999)
as well as Litovsky, Colburn, Yost, & Guzman (1999) have presented
summaries of the theory and results of many precedence effect experiments. In these experiments, “clicks” tend to be the most commonly
used test signal because they provide a wide spectral bandwidth and a well-defined
temporal presentation to the listener.
Litovsky et al. have further categorized precedence effect experiments
as auditory fusion, localization dominance, or lag discrimination. Lag discrimination is not particularly
relevant to this research and will not be discussed.
Auditory fusion
experiments present sequential clicks with a silent interval between
them. The goal is to identify the signal
characteristics that change the perceived number of sound events from one
to two. Clicks separated by less than 5 ms
create only one auditory event, whereas the two-event threshold is around
8-10 ms (Litovsky et al., 1999). This threshold
is more commonly called the “echo threshold,” referring to
the time difference necessary to create an audible echo.
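The click-pair stimulus described above can be sketched in a few lines. This is only a minimal illustration, not the procedure used in the cited experiments: the function names, the 48 kHz sample rate, the 20 ms buffer, and the single-sample click are all assumptions made for the example. The category labels simply restate the figures quoted from Litovsky et al. (1999); treating lags above 10 ms as "two events" is an inference from the quoted 8-10 ms threshold.

```python
def click_pair(lag_ms, sample_rate=48000, duration_ms=20.0):
    """Return samples with a lead click at t = 0 and a lag click lag_ms later.

    Each click is a single unit sample (an assumption for illustration).
    """
    n = int(round(duration_ms * sample_rate / 1000.0))
    lag_samples = int(round(lag_ms * sample_rate / 1000.0))
    if lag_samples < 0 or lag_samples >= n:
        raise ValueError("lag does not fit inside the buffer")
    samples = [0.0] * n
    samples[0] += 1.0            # leading click
    samples[lag_samples] += 1.0  # lagging click
    return samples


def expected_percept(lag_ms):
    """Map a lead-lag delay to the ranges quoted in the text."""
    if lag_ms < 5.0:
        return "one event"
    if lag_ms > 10.0:
        return "two events"          # inferred: beyond the quoted threshold
    if lag_ms >= 8.0:
        return "near echo threshold"
    return "not specified in the text"  # 5-8 ms is not characterized here


stimulus = click_pair(5.0)
print(len(stimulus), stimulus.index(1.0), expected_percept(2.0))
```

Varying `lag_ms` across trials is the basic manipulation of a fusion experiment; the listener's reports, not the code, determine where the echo threshold actually falls.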
With the two sequential
sounds fused into one image, it is interesting to next consider the conditions
that can affect the overall localization of the image. This topic is probably the most relevant to this
research and is called “localization dominance.” Scientists study localization
dominance by varying the interaural differences (ITD and ILD) between two
sequential sounds and asking listeners to report the perceived
location of the fused image. These interaural cues have already been
discussed in earlier sections and will not be presented in detail here.
However, one particularly
interesting aspect of localization dominance is the “Franssen effect.” With a standard stereo loudspeaker setup, a
short burst is played from the left speaker, followed by a longer burst
from the right speaker. Under this
condition, listeners will hear a short burst at the left, followed by a
diffuse sound coming from between the loudspeakers. In essence, the short leading sound has caused the longer lagging
image to be perceived as spatially diffuse, even though it comes from only the
right speaker. Litovsky et al. (1999, p. 1638)
additionally comment that this illusion will not occur with tones of high or
low frequency, but rather requires sounds that are difficult to localize
(suggesting mid frequencies).
Conceptualize
each sound source in a multi-source setup as a stream of information to the
listener. Under certain conditions, the
streams can be perceptually integrated into one overall auditory event. However, each stream contributes
characteristics to the event and can influence the sound in many ways.
In general, streaming research is
classified as either sequential or concurrent auditory streaming (Yost,
1994). Sequential streaming experiments
present several non-overlapping sound events whereas concurrent streaming
presents simultaneous events. For this
thesis, concurrent streaming is more applicable because the spatially relocated
frequencies are presented at the same time as the other sounds. However, sequential streaming is a more
severe condition and can better exemplify the signal characteristics that will
cause stream segregation.
Through psychoacoustic experiments, it has been found that
those characteristics most affecting auditory streaming include the temporal
interrelationship, relative similarity of fundamental frequencies, spectral
distribution (harmonics) and the perceptual location of the sound sources
(Bregman, 1999). Obviously, those
events that occur at the same point in time are likely to belong to the same
stream source. Conversely, increasing the time between events makes it more likely that
the second sound is either suppressed (precedence effect) or eventually associated with a
separate auditory event.
In addition, those sounds with common spectral
characteristics are more likely to be integrated. On the other hand, having different fundamental frequencies or
timbres (harmonics) will most likely lead to stream segregation. This point is particularly relevant to this
thesis, because the greater the perceived spectral change of the signal, the
greater the chance the auditory image will split. However, the spatially relocated sounds will still share the same
basic harmonic structure of the original signal.
Finally, the influence of the perceived location of the
sound sources should be considered.
Bregman (1999) states that “one of the most powerful strategies for
grouping spectral components is to group those that have come from the same
spatial direction and to segregate those groups that have come from different
directions” (p. 658). In a
multi-source situation, the brain will use the physical interaural and monaural
localization cues in an attempt to interpret a spatial location for each stream
of sound. If the cues for the streams are the same
due to physical proximity, or otherwise present a perceived image with a shared
spatial origin (i.e., correlated left and right ear signals, as in stereo
listening), the brain will integrate them into one event. However, strong differences in localization
cues will cause the two streams to segregate.
The term for these kinds of phenomena is summing
localization (Blauert, 1971).
Summing localization research performed by Gockel, Carlyon,
and Micheyl (1999) used several sequential band-limited harmonically complex
sounds over headphones. Their focus was
to determine if perceived location (through changing interaural differences)
would have an impact on auditory streaming.
They found that presenting the second sound with interaural differences
increased the subjects’ segregation tendencies. However, the high frequency region (3900-5400 Hz) showed the
least segregation compared to the mid (1375-1875 Hz) and low
(125-625 Hz) frequency regions. This
suggests that interaural changes at higher frequencies are less likely to cause
stream segregation than those same changes at middle or lower frequencies.
Along similar lines, Thurlow
and Marten (1962) performed sequential streaming experiments on high-pass noise
(>2000 Hz) coming from two loudspeakers in free space. They progressively increased speaker
separation and found that listeners perceived one source of sound until
approximately 6.4° of separation, where 50% of
the listeners perceived one steady sound and one intermittent sound (suggesting
partial fusion) (p. 1858). Further
increase in speaker separation eventually caused the sound to split into two
unique streams.
Finally, some interesting commentary on concurrent streaming experiments can be found in Gardner (1973). Here, he provides a technical review of auditory illusions caused by multiple sound sources radiating similar signals. He discusses how spatially separated loudspeakers radiating signals of similar quality will create fused images (auditory integration). He continues, stating that even if one speaker radiates low-pass noise and the other radiates high-pass noise, both will fuse into the perception of a single source of full-bandwidth noise. This example illustrates that the “quality” and general agreement of the sound sources can result in auditory fusion despite differences in the sources’ spatial and spectral content. Gardner reasons that these streaming and fusion effects are largely due to the precedence effect. As will be seen later, this result has a direct implication for this research, where spatially relocated frequency bands of white noise are integrated into one defined image.
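Gardner's two-loudspeaker example relies on the low-pass and high-pass signals being complementary parts of one full-bandwidth noise. The sketch below illustrates that signal relationship only; it says nothing about perception, and it is not Gardner's method. The one-pole filter, the smoothing coefficient `alpha`, and the function names are assumptions chosen for brevity.

```python
import random


def split_bands(signal, alpha=0.1):
    """Split a signal into complementary low and high bands.

    The low band is a one-pole low-pass of the input; the high band is
    the remainder (input minus low band), so the two sum back to the input.
    """
    low, high = [], []
    state = 0.0
    for x in signal:
        state += alpha * (x - state)  # one-pole low-pass update
        low.append(state)
        high.append(x - state)        # complementary high band
    return low, high


def roughness(signal):
    """Mean absolute sample-to-sample change (a crude high-frequency measure)."""
    return sum(abs(b - a) for a, b in zip(signal, signal[1:])) / (len(signal) - 1)


rng = random.Random(0)                       # fixed seed for repeatability
noise = [rng.uniform(-1.0, 1.0) for _ in range(1000)]  # white noise
low, high = split_bands(noise)

recombined = [l + h for l, h in zip(low, high)]
max_error = max(abs(r - x) for r, x in zip(recombined, noise))
print(max_error, roughness(low), roughness(high))
```

In a listening setup, `low` and `high` would be routed to different loudspeakers; the reconstruction error shows that, acoustically summed, the two bands carry the same information as the original noise.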
Discussing all the nuances of acoustics is not necessary
or relevant to this research. However,
it is important to realize that an acoustic space can influence the results of
listening experiments. Any
reverberation or spectral variations caused by the acoustic space will
essentially change the perceived listening material. This is why many of the discussed localization experiments have
been performed in anechoic (relatively free from reverberation)
environments. However, it is
interesting to consider how the room itself affects localization.
Hartmann
(1983) concedes that localization is predominantly determined by the interaural
and monaural characteristics of the incident waveform. In addition, any secondary waveforms such as
echo reflections are most likely suppressed by the precedence effect. However, he points out that the historical
body of research on precedence has been performed either in free field, with
headphones, or via paired loudspeakers in anechoic chambers. His research therefore contributes
experimental data regarding the influence of room geometry and wall absorption
on horizontal localization.
Using
the Espace de Projection (ESPRO) variable-acoustic concert hall in Paris, Hartmann
(1983) ran several test signals through different room conditions (absorbing,
reflecting, and low ceiling). Of
particular interest is that for impulsive sounds (50 ms sine bursts),
localization showed no significant changes due to the different room
types. However, for non-impulsive
sinusoids (6-10 sec. rise times), significant localization blur did occur. With tones of 200, 500 and 5000 Hz, he
showed that only the 5000 Hz signal could be localized with moderate
accuracy. Both the 200 Hz and 500 Hz
signals had errors suggesting random guesses.
Other non-impulsive complex tones and broadband noises were also used in
his experiments and were localized significantly better than the
steady-state tones.
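The difference between the two kinds of sinusoidal stimuli is in how quickly they reach full level. The following sketch generates both under stated assumptions: a rectangular 50 ms burst, a linear onset ramp for the slow tone, a 16 kHz sample rate, and the function name `tone` are all illustrative choices, since the source gives only the burst length and the 6-10 s range of rise times.

```python
import math


def tone(freq_hz, duration_s, rise_s, sample_rate=16000):
    """Sine tone with a linear onset ramp lasting rise_s seconds.

    rise_s = 0 gives an abrupt (impulsive) onset.
    """
    n = int(round(duration_s * sample_rate))
    rise_n = int(round(rise_s * sample_rate))
    samples = []
    for i in range(n):
        gain = 1.0 if rise_n == 0 or i >= rise_n else i / rise_n
        samples.append(gain * math.sin(2.0 * math.pi * freq_hz * i / sample_rate))
    return samples


def peak(samples):
    """Largest absolute sample value."""
    return max(abs(s) for s in samples)


burst = tone(500.0, 0.050, 0.0)  # impulsive: 50 ms sine burst
slow = tone(500.0, 8.0, 6.0)     # non-impulsive: 6 s rise time

first_50_ms = int(0.050 * 16000)
print(peak(burst[:first_50_ms]), peak(slow[:first_50_ms]))
```

Within its first 50 ms the burst has already reached full amplitude, while the slowly rising tone is still below one percent of it, so the slow tone offers the listener essentially no onset transient before room reflections arrive.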
Hartmann (1983) concludes that the localization of non-impulsive low frequencies appears to be degraded by room acoustics. This is in agreement with other papers which have stated that non-impulsive sinusoids are in general the most difficult test signals to localize (Wagenaars, 1990; Rakerd & Hartmann, 1985; Hartmann, 1993). Thus, the influence of room acoustics is probably the best explanation for why most people claim that low frequencies are “hard to localize.”

Created February 2003 by Rob Hartman
Copyright (C) 2003