Chapter 4: Auditory Scene Analysis


           In the presence of multiple sound sources, listeners can sometimes focus their attention on one source while ignoring the others.  At other times, sources may be perceptually integrated to create auditory illusions (images) that are located away from any of the actual sources.  This often occurs when listening over stereo loudspeakers.  Bregman (1999) has studied these behaviors and offers the term “auditory scene analysis” to describe, generically, the perceptual mechanisms that allow us to interact with complex listening environments.

            The most important of these mechanisms is probably the “precedence effect.”  This characteristic of our auditory system allows us to hear a sound while ignoring closely spaced repetitions of it, most commonly the echoes of a reverberant acoustic space.  An understanding of the precedence effect allows a higher-level discussion of auditory stream segregation, in which each sound source is considered a stream of information to the listener.  Depending on the characteristics of those streams, varying degrees of integration or segregation will occur.  Finally, the influence of the acoustic space is introduced.

            Auditory scene analysis applies to this research because one must understand the consequences of spatially relocating frequencies.  Moreover, studying this behavior will provide insight into how far the sources might be separated, and which frequencies might be relocated most easily, before segregation occurs.



Precedence Effect

Discussions of the “precedence effect” (Wallach, 1949), the “Haas effect” (named after Helmut Haas’ 1949 dissertation), or the “law of the first wavefront” all refer to the human tendency to perceive only one sound event when two (or more) closely spaced repetitions of a sound arrive in sequence.  The most common example of this phenomenon occurs when listening in a reverberant room.  Here, sound reaches the listener directly from the source and also from various directions via wall reflections.  Yet typically only one sound is heard, and from only one direction.

Blauert (1999) as well as Litovsky, Colburn, Yost, & Guzman (1999) have summarized the theory and results of many precedence effect experiments.  In these experiments, clicks are the most common test signal because they have a wide spectral bandwidth and a well-defined temporal onset.  Litovsky et al. further categorize precedence effect experiments as studies of auditory fusion, localization dominance, or lag discrimination.  Lag discrimination is not particularly relevant to this research and will not be discussed.

Auditory fusion experiments present a pair of sequential clicks separated by a silent interval.  The goal is to identify the signal characteristics that change the perceived number of sound events from one to two.  Clicks separated by less than about 5 ms create only one auditory event, whereas the threshold for hearing two events is around 8-10 ms (Litovsky et al., 1999).  This threshold is more commonly called the “echo threshold,” referring to the time difference necessary to create an audible echo.
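
The fusion and echo thresholds above can be expressed as a simple lookup. The Python sketch below builds a mono lead/lag click pair and labels a delay with the percept that the quoted click thresholds would suggest. Several details are illustrative assumptions and do not come from the cited experiments:

- the sample rate and the signal length;
- the names `click_pair` and `expected_percept`;
- treating the whole 5-10 ms span as a single ambiguous region.

```python
# Sketch: a lead/lag click pair plus a rough label for the expected percept.
# The boundaries simplify the approximate click values quoted from
# Litovsky et al. (1999); real thresholds vary with listener and stimulus.

FUSION_LIMIT_MS = 5.0    # below ~5 ms, two clicks are typically heard as one
ECHO_UPPER_MS = 10.0     # top of the ~8-10 ms two-event (echo) threshold


def click_pair(delay_ms: float, fs: int = 48_000, length_ms: float = 20.0) -> list[float]:
    """Mono signal: a unit click at sample 0 and a second unit click delay_ms later."""
    n = int(round(length_ms * fs / 1000))
    lag = int(round(delay_ms * fs / 1000))
    if not 0 < lag < n:
        raise ValueError("delay must be positive and shorter than the signal")
    signal = [0.0] * n
    signal[0] = 1.0    # leading click
    signal[lag] = 1.0  # lagging click (the "echo")
    return signal


def expected_percept(delay_ms: float) -> str:
    """Label an inter-click delay using the approximate thresholds above."""
    if delay_ms < FUSION_LIMIT_MS:
        return "one fused event"
    if delay_ms < ECHO_UPPER_MS:
        return "near echo threshold"
    return "two events"


for d in (2.0, 8.0, 15.0):
    print(f"{d:4.1f} ms -> {expected_percept(d)}")
```

The single ambiguous band stands in for real listener-to-listener variability. A psychometric function fitted to listening data would replace the hard boundaries.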

With the two sequential sounds fused into one image, it is natural to consider next the conditions that affect the perceived location of that image.  This aspect, called “localization dominance,” is probably the most relevant to this research.  Localization dominance is studied by varying the interaural differences (ITD and ILD) of the two sequential sounds and asking listeners to report the perceived location of the image.  This has already been discussed in earlier sections and will not be presented in detail here.
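
A localization-dominance stimulus differs from a fusion stimulus in one respect: the lead and the lag each carry their own interaural difference. The Python sketch below builds left and right channels for a lead click that favors one ear and a lag click that favors the other. The function names, the sign convention and the use of whole-sample delays are assumptions. Only ITD is varied here; an ILD could be added by scaling one channel.

```python
# Sketch of a two-channel lead/lag click stimulus for localization-dominance
# tests. Each click carries its own ITD. Whole-sample delays are an assumption
# that quantizes the ITD to 1/fs (about 20.8 us at 48 kHz).


def add_click(left, right, onset, itd_samples):
    """Place one click; positive itd_samples delays the right ear (image toward the left)."""
    if itd_samples >= 0:
        left[onset] += 1.0
        right[onset + itd_samples] += 1.0
    else:
        left[onset - itd_samples] += 1.0
        right[onset] += 1.0


def lead_lag_stimulus(lead_itd_us, lag_itd_us, delay_ms, fs=48_000, length_ms=20.0):
    """Return (left, right) lists with a lead click at t=0 and a lag click delay_ms later."""
    n = int(round(length_ms * fs / 1000))
    left, right = [0.0] * n, [0.0] * n

    def to_samples(us):
        return int(round(us * 1e-6 * fs))

    add_click(left, right, 0, to_samples(lead_itd_us))
    add_click(left, right, int(round(delay_ms * fs / 1000)), to_samples(lag_itd_us))
    return left, right


# Lead favors the left ear, lag favors the right ear, 4 ms apart.
left, right = lead_lag_stimulus(lead_itd_us=500, lag_itd_us=-500, delay_ms=4.0)
```

Because delays are rounded to whole samples, ITDs are quantized to 1/fs. Experiments that need finer ITD steps would use a higher sample rate or fractional-delay filtering.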

One particularly interesting aspect of localization dominance, however, is the “Franssen effect.”  With a standard stereo loudspeaker setup, a short burst is played from the left speaker, followed by a longer burst from the right speaker.  Under this condition, listeners hear a short burst at the left, followed by a diffuse sound coming from between the loudspeakers.  In essence, the short leading sound causes the longer lagging sound to be perceived as spatially diffuse, even though it comes only from the right speaker.  Litovsky et al. (1999) additionally note that this illusion does not occur with tones of high or low frequency, but rather requires sounds that are difficult to localize (suggesting mid frequencies) (p. 1638).



Auditory Stream Segregation

            Conceptualize each sound source in a multi-source setup as a stream of information to the listener.  Under certain conditions, the streams can be perceptually integrated into one overall auditory event.  However, each stream contributes characteristics to the event and can influence the sound in many ways. 

In general, streaming research is classified as either sequential or concurrent auditory streaming (Yost, 1994).  Sequential streaming experiments present several non-overlapping sound events whereas concurrent streaming presents simultaneous events.  For this thesis, concurrent streaming is more applicable because the spatially relocated frequencies are presented at the same time as the other sounds.  However, sequential streaming is a more severe condition and can better exemplify the signal characteristics that will cause stream segregation.

Psychoacoustic experiments have found that the characteristics most affecting auditory streaming include the temporal relationships between events, the similarity of their fundamental frequencies, their spectral distribution (harmonics), and the perceived location of the sound sources (Bregman, 1999).  Events that occur at the same point in time are likely to be assigned to the same stream.  As the time between events increases, the second sound is first suppressed (the precedence effect) and is eventually associated with a separate auditory event.

In addition, sounds with common spectral characteristics are more likely to be integrated, whereas different fundamental frequencies or timbres (harmonics) will most likely lead to stream segregation.  This point is particularly relevant to this thesis: the greater the perceived spectral change of the signal, the greater the chance that the auditory image will split.  However, the spatially relocated sounds will still share the same basic harmonic structure as the original signal.
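
The role of a shared harmonic structure can be illustrated with a toy harmonic sieve. Components that lie close to integer multiples of a common fundamental are grouped, and the others are set aside. This Python sketch is not a model taken from Bregman (1999). The function names and the 3% mistuning tolerance are hypothetical values chosen for illustration.

```python
# Toy harmonic sieve: group components that fit one harmonic series.
# The 3% tolerance is a hypothetical value, not a published threshold.


def fits_harmonic_series(freq_hz, f0_hz, tolerance=0.03):
    """True if freq_hz is within a relative tolerance of the nearest harmonic of f0_hz."""
    n = max(1, round(freq_hz / f0_hz))      # index of the nearest harmonic
    nearest = n * f0_hz
    return abs(freq_hz - nearest) <= tolerance * nearest


def split_by_harmonicity(freqs, f0_hz, tolerance=0.03):
    """Return (grouped, ungrouped) component lists under the toy sieve."""
    grouped = [f for f in freqs if fits_harmonic_series(f, f0_hz, tolerance)]
    ungrouped = [f for f in freqs if not fits_harmonic_series(f, f0_hz, tolerance)]
    return grouped, ungrouped


# 850 Hz sits between the 4th and 5th harmonics of 200 Hz, so it is set aside.
print(split_by_harmonicity([200, 400, 600, 850, 1000], f0_hz=200))
```

Relocating a harmonic in space does not change its frequency, so in this simplified view it still passes the sieve. Any segregation would then have to come from the other cues discussed in this section, such as perceived location.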

Finally, the influence of the perceived location of the sound sources should be considered.  Bregman (1999) states that “one of the most powerful strategies for grouping spectral components is to group those that have come from the same spatial direction and to segregate those groups that have come from different directions” (p. 658).  In a multi-source situation, the brain uses the physical interaural and monaural localization cues in an attempt to assign a spatial location to each stream of sound.  The brain will integrate the streams into one event in either of two cases. The first is when the streams coincide because of physical proximity. The second is when they otherwise present an image with a shared spatial origin (i.e., correlated left and right ear signals, as in stereo listening).  However, strong differences in localization cues will cause the streams to segregate.  The term used to describe these phenomena is summing localization (Blauert, 1971).
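
The “correlated left and right ear signals” mentioned above are often quantified with the normalized interaural cross-correlation. This is a standard model of binaural processing, not a method taken from the studies cited here. The Python sketch below searches lags within ±1 ms, an assumed range that roughly covers natural ITDs. It returns two values:

- the lag of the correlation peak, as an ITD estimate;
- the peak coefficient. A coefficient near 1 indicates highly correlated ear signals.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def interaural_xcorr(left, right, fs, max_lag_us=1000.0):
    """Return (lag_us, peak_r) of the normalized interaural cross-correlation.

    A positive lag means the right-ear signal lags the left (image toward the left).
    """
    max_lag = int(round(max_lag_us * 1e-6 * fs))
    n = len(left)
    best_lag, best_r = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Compare left[i] with right[i + lag] over the overlapping samples only.
        lo, hi = max(0, -lag), min(n, n - lag)
        r = pearson(left[lo:hi], right[lo + lag:hi + lag])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag * 1e6 / fs, best_r


# Demo: white noise reaching the right ear 10 samples (~208 us) after the left.
random.seed(0)
fs, d, n = 48_000, 10, 2000
base = [random.gauss(0.0, 1.0) for _ in range(n + d)]
left, right = base[d:], base[:n]   # right[i] == left[i - d]
lag_us, r = interaural_xcorr(left, right, fs)
print(f"ITD estimate: {lag_us:.1f} us, peak correlation: {r:.3f}")
```

In the demo, the right-ear signal is an exact delayed copy, so the peak is essentially 1. Adding independent noise to one channel lowers the peak, which serves as a rough stand-in for less coherent, more diffuse ear signals.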

Summing localization research by Gockel, Carlyon, and Micheyl (1999) used sequences of band-limited harmonic complex tones presented over headphones.  Their goal was to determine whether perceived location (manipulated by changing interaural differences) affects auditory streaming.  They found that presenting the second sound with interaural differences increased the subjects’ tendency to segregate.  However, the high frequency region (3900-5400 Hz) showed the least segregation compared with the mid (1375-1875 Hz) and low (125-625 Hz) frequency regions.  This suggests that interaural changes at higher frequencies are less likely to cause stream segregation than the same changes at middle or lower frequencies.

Along similar lines, Thurlow and Marten (1962) performed sequential streaming experiments with high pass noise (>2000 Hz) presented from two loudspeakers in free space.  As they progressively increased the speaker separation, listeners perceived one sound source up to approximately 6.4° of separation. At that point, 50% of the listeners perceived one steady sound and one intermittent sound, suggesting partial fusion (p. 1858).  Further increases in speaker separation eventually caused the sound to split into two distinct streams.
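
To put the 6.4° figure in perspective, Woodworth's spherical-head approximation estimates the interaural time difference for a source at azimuth θ as ITD = (a/c)(θ + sin θ). The sketch below rests on several assumptions:

- a head radius of 8.75 cm;
- a speed of sound of 343 m/s;
- loudspeakers placed symmetrically at ±3.2° about the median plane. Thurlow and Marten's actual geometry is not given here.

The sketch ignores interaural level differences, which carry much of the localization information for noise above 2000 Hz.

```python
import math

HEAD_RADIUS_M = 0.0875    # assumed average head radius
SPEED_OF_SOUND = 343.0    # m/s, assumed (air at roughly 20 °C)


def woodworth_itd_s(azimuth_deg):
    """Woodworth spherical-head ITD in seconds; azimuth 0 = straight ahead, valid to ±90°."""
    theta = math.radians(azimuth_deg)
    return HEAD_RADIUS_M / SPEED_OF_SOUND * (theta + math.sin(theta))


separation_deg = 6.4
# Loudspeakers assumed symmetric about the median plane at ±3.2°.
itd_difference_s = woodworth_itd_s(separation_deg / 2) - woodworth_itd_s(-separation_deg / 2)
print(f"ITD difference: {itd_difference_s * 1e6:.0f} us")
```

Under these assumptions, the two loudspeakers differ by roughly 57 µs of ITD.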

Finally, some interesting commentary on concurrent streaming can be found in Gardner (1973), who provides a technical review of auditory illusions caused by multiple sound sources radiating similar signals.  He discusses how spatially separated loudspeakers radiating signals of similar quality create fused images (auditory integration).  He continues, stating that even if one speaker radiates low pass noise and the other radiates high pass noise, both will fuse into the perception of a single source of full-bandwidth noise.  This example illustrates that the “quality” and general agreement of the sound sources can produce auditory fusion despite differences in the sources’ spatial and spectral content.  Gardner reasons that these streaming and fusion effects are largely due to the precedence effect.  As will be seen later, this result has a direct implication for this research, where spatially relocated frequency bands of white noise are integrated into one defined image.
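
Gardner's low pass/high pass example relies on two bands that together cover the full spectrum. One simple way to derive such a pair from a single noise signal is a complementary split: the high band is defined as the input minus a low-pass output, so the two bands always sum back to the original. The Python sketch below uses a one-pole low-pass filter and a 1 kHz crossover. Both are hypothetical choices, not the filters used by Gardner (1973) or in this research. A one-pole filter also gives only a gentle 6 dB/octave slope.

```python
import math
import random


def complementary_split(x, fc_hz, fs):
    """Split x into (low, high) bands with a one-pole low-pass; high = x - low."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * fc_hz / fs)
    low, high, y = [], [], 0.0
    for sample in x:
        y += alpha * (sample - y)   # one-pole low-pass state update
        low.append(y)
        high.append(sample - y)     # complementary high band
    return low, high


random.seed(1)
noise = [random.uniform(-1.0, 1.0) for _ in range(4800)]
low, high = complementary_split(noise, fc_hz=1000.0, fs=48_000)
error = max(abs(lo + hi - s) for lo, hi, s in zip(low, high, noise))
print(f"max reconstruction error: {error:.2e}")
```

Sending `low` to one loudspeaker and `high` to the other would reproduce the kind of split Gardner describes. Whether the result fuses is, of course, a perceptual question that the code cannot answer.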



The Acoustic Space

            Discussing all the nuances of acoustics is neither necessary nor relevant to this research.  However, it is important to realize that an acoustic space can influence the results of listening experiments.  Any reverberation or spectral coloration caused by the space effectively changes the listening material that reaches the listener.  This is why many of the localization experiments discussed earlier were performed in anechoic (relatively reflection-free) environments.  Still, it is interesting to consider how the room itself affects localization.

Hartmann (1983) acknowledges that localization is predominantly determined by the interaural and monaural characteristics of the incident waveform. He also accepts that secondary waveforms, such as wall reflections, are most likely suppressed by the precedence effect.  However, he points out that the historical body of precedence research was performed in the free field, with headphones, or with paired loudspeakers in anechoic chambers.  His research therefore contributes experimental data on the influence of room geometry and wall absorption on horizontal localization.
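
Whether a wall reflection falls within the precedence effect's fusion region depends on how much later it arrives than the direct sound. A common way to estimate this is the image-source construction, which treats a reflection off a flat wall as sound from a mirror image of the source. The Python sketch below models a single rigid wall in two dimensions and assumes a speed of sound of 343 m/s. The positions are hypothetical examples.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 °C)


def reflection_delay_ms(source, listener, wall_x=0.0):
    """Delay (ms) of one reflection off the plane x = wall_x, relative to the direct sound.

    Image-source construction: the reflection travels as if emitted from the source
    mirrored across the wall. Points are (x, y) in metres on the same side of the wall.
    """
    image = (2.0 * wall_x - source[0], source[1])
    direct = math.dist(source, listener)
    reflected = math.dist(image, listener)
    return (reflected - direct) / SPEED_OF_SOUND * 1000.0


for offset in (1.0, 5.0):
    # Source and listener 3 m apart, both `offset` metres from the wall.
    delay = reflection_delay_ms((offset, 0.0), (offset, 3.0))
    print(f"{offset:.0f} m from wall: reflection arrives {delay:.2f} ms after direct sound")
```

Consider a source and listener 3 m apart. When both are 1 m from the wall, the reflection arrives about 1.8 ms after the direct sound, which is inside the roughly 5 ms fusion region quoted earlier for clicks. When both are 5 m from the wall, the lag grows to about 22 ms. The click thresholds may not carry over directly to longer, more complex sounds.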

Using the Espace de Projection (ESPRO) variable-acoustics concert hall in Paris, Hartmann (1983) presented several test signals under different room conditions (absorbing, reflecting, and low ceiling).  Of particular interest, localization of impulsive sounds (50 ms sine bursts) showed no significant change across room conditions.  However, for non-impulsive sinusoids (6-10 sec. rise times), significant localization blur did occur.  With tones of 200, 500, and 5000 Hz, only the 5000 Hz signal could be localized with moderate accuracy; errors for the 200 Hz and 500 Hz signals suggested random guessing.  Non-impulsive complex tones and broadband noises were also tested, and these were localized significantly better than the steady-state sine tones.
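
The distinction between impulsive and non-impulsive signals is mainly one of onset rise time. The sketch below generates a sine tone whose amplitude ramps up over a chosen rise time. The raised-cosine ramp shape, the 48 kHz sample rate, the 500 Hz frequency and the 8 s duration are assumptions. Beyond the durations and rise times quoted above, the envelopes Hartmann (1983) used are not described here.

```python
import math


def tone_with_rise(freq_hz, duration_s, rise_s, fs=48_000):
    """Sine tone whose amplitude ramps from 0 to 1 over rise_s seconds (raised-cosine ramp)."""
    n = int(round(duration_s * fs))
    n_rise = int(round(rise_s * fs))
    out = []
    for i in range(n):
        # Envelope rises during the first n_rise samples, then holds at 1.
        env = 0.5 * (1.0 - math.cos(math.pi * i / n_rise)) if i < n_rise else 1.0
        out.append(env * math.sin(2.0 * math.pi * freq_hz * i / fs))
    return out


impulsive = tone_with_rise(500.0, duration_s=0.05, rise_s=0.001)  # 50 ms burst, 1 ms onset
gradual = tone_with_rise(500.0, duration_s=8.0, rise_s=6.0)       # 6 s onset, no abrupt attack
```

The gradual tone offers almost no onset transient. That is the feature the precedence effect would otherwise use to favor the direct sound over room reflections.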

Hartmann (1983) concludes that the localization of non-impulsive low frequencies appears to be degraded by room acoustics.  This agrees with other papers stating that non-impulsive sinusoids are, in general, the most difficult test signals to localize (Wagenaars, 1990; Rakerd & Hartmann, 1985; Hartmann, 1993).  Thus, the influence of room acoustics is probably the best explanation for the common claim that low frequencies are “hard to localize.”



Created February 2003 by Rob Hartman

Copyright (C) 2003