Chapter 3: Localization Cue Salience


It is also important to consider the relative weight that each localization cue carries in forming the resulting auditory event.  First, the acoustical nature of sound waves dictates that the physical level of a localization cue will vary with spatial position and frequency.  Perceptually, research has shown that localization cues vary in importance relative to one another and with frequency.  Understanding both the physical and perceptual significance of the localization cues should help suggest which frequencies could be spatially relocated with a minimal impact on the position of the resulting auditory image.


Physical Aspects

To understand the relationship between free field listening conditions and the physical localization cues they create, first consider the interaction of tones of two different frequencies with a listener (Figure 10).  The sound pressure wave travels towards the subject and encounters the body and head as an acoustic barrier.  Depending on the frequency content and angle of incidence, portions of the wave may be reflected off the listener or may diffract around them with minimal interaction.



Figure 10: ITD and ILD variations with speaker position (azimuth)

As shown in Figure 10, a sound source located on the median plane (position 1) produces no ILD or ITD in either frequency case, because the path from the source to each ear is essentially the same.  When the source is moved counter-clockwise on the horizontal plane, however, the two interaural cues change in different ways.

In particular, note that the ITD gradually increases because of the increased path length difference, reaching its maximum at position 3.  In this simplified view, ITDs depend only on path length and not on how the sound interacts with the head, which is why they are the same in both frequency cases.  However, because ILDs are caused by the complex interaction of sound with the listener, position 2 will generally exhibit a greater level difference than position 3.  In fact, position 3 is unique because under some conditions the diffraction of sound can cause the opposing ear to be louder than the incident ear (Blauert, 1999, p. 71).  Also notice that the ILD is considerably smaller for low frequencies (Figure 10a) than for high frequencies (Figure 10b).  This is because low-frequency sound diffracts around the head, whereas high-frequency sound is reflected, creating an acoustic shadow at the opposing ear.
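The geometric trend described above can be sketched numerically.  The example below is a minimal illustration and not the model behind Figure 10: it assumes a rigid spherical head and uses Woodworth's classic approximation, ITD = (a/c)(theta + sin theta), with an assumed head radius of 8.75 cm and an assumed speed of sound of 343 m/s.  It also compares wavelength with head diameter as a rough indicator of when the head begins to shadow the far ear.  All names and constants are assumptions chosen for illustration.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average head radius (not from the text)
SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air at about 20 degrees C


def woodworth_itd_s(azimuth_deg):
    """Approximate ITD in seconds for a source at 0 to 90 degrees azimuth."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


def wavelength_to_head_ratio(freq_hz):
    """Wavelength divided by head diameter; large values suggest little shadowing."""
    wavelength_m = SPEED_OF_SOUND_M_S / freq_hz
    return wavelength_m / (2.0 * HEAD_RADIUS_M)


for azimuth in (0, 45, 90):
    itd_us = woodworth_itd_s(azimuth) * 1e6
    print(f"azimuth {azimuth:2d} deg: ITD ~ {itd_us:.0f} us")

for freq in (200, 5000):
    ratio = wavelength_to_head_ratio(freq)
    print(f"{freq} Hz: wavelength / head diameter ~ {ratio:.2f}")
```

Under these assumptions the ITD is zero straight ahead and grows to roughly 0.66 ms for a source directly to the side, while a 200 Hz tone has a wavelength several times the head diameter and a 5 kHz tone has a wavelength smaller than the head.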

A similar analysis of the individual ITD cues shows how each varies with frequency and spatial position (see Figure 11).  Notice that IATD depends only on spatial position and is independent of frequency.  On the other hand, despite the same spatial position, the IPD in Figure 11(a) differs from that in (b) because of the decreased signal wavelength.  The IPD cue will also change with position alone, as shown in Figure 11(a) and (c).  Although not shown, IETD is similar to IPD in that it varies only with changes in source position and modulation frequency.
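The relationship between the two cues can be made concrete for a pure tone, where a fixed arrival-time difference dt corresponds to a phase difference of 360 * f * dt degrees.  The sketch below assumes a steady sinusoid and an illustrative 300 microsecond arrival-time difference; the function names are hypothetical.  It reports the phase both unwrapped and wrapped to the range of -180 to 180 degrees, since only the wrapped value can be observed from a steady tone.

```python
def ipd_degrees(freq_hz, iatd_s):
    """Unwrapped interaural phase difference of a pure tone, in degrees."""
    return 360.0 * freq_hz * iatd_s


def wrap_degrees(phase_deg):
    """Wrap a phase to the interval [-180, 180)."""
    return (phase_deg + 180.0) % 360.0 - 180.0


IATD_S = 300e-6  # assumed arrival-time difference, fixed by source position

for freq in (250, 1000, 2500):
    unwrapped = ipd_degrees(freq, IATD_S)
    wrapped = wrap_degrees(unwrapped)
    print(f"{freq} Hz: IPD = {unwrapped:.0f} deg (wrapped {wrapped:.0f} deg)")
```

Changing IATD_S models a change in source position, which shifts the phase at every frequency, whereas changing only the frequency leaves the arrival-time difference untouched.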

Figure 11: IATD varies only with position, while IPD varies with position and signal frequency.  Shown for (a) low frequency source, (b) high frequency source, and (c) high frequency source with new spatial location

The generic results shown in Figure 10 and Figure 11 are greatly simplified compared with actual in-ear measurements (i.e. recordings taken with microphones placed at the entrance to the ear canals).  Many frequency dependent influences, such as variations from the pinnae and torso reflections, are not represented.  In practice, both ILD and ITD follow complex patterns as azimuth and frequency change.  These patterns are shown from Blauert (1999) in Figure 12.  Note that ILD (left) and ITD (right) are presented, where each plot shows the variation versus frequency for a fixed azimuth.  The azimuth varies from 150° in the top plot to 30° in the bottom plot, where azimuth is measured counter-clockwise from directly in front.

Figure 12: Complex patterns of ILD (left) and ITD (right) with varied horizontal plane positions (azimuth).  Reprinted from Blauert (1999), with permission from the MIT Press.

As for monaural cues, it is most relevant to consider the frequency range where they physically occur.  Because of the wavelength of sound relative to the size of the pinnae folds, monaural cues exist in only a small portion of the audible spectrum.  One of the most common ways to determine their range of occurrence is by analyzing in-ear recordings.  Batteau (1967) appears to have been one of the first to make in-ear recordings; he speculated that more research of this type had not been done because “the extraordinary fidelity needed in all aspects of this system, microphones, amplifiers, headphones, acoustic isolation, perhaps has prevented construction of the requisite systems until now” (p. 161).

Butler and Belendiuk (1977) furthered the use of in-ear recordings by comparing the amplitudes of the signals at the two ears.  With this analysis, they showed that as a sound source moves from above to below the interaural axis, a spectral notch moves continuously from high (approx. 7 kHz at 15° elevation) to low (approx. 5.5 kHz at -30° elevation) (p. 1267).  Also, Musicant and Butler’s (1984) experiments used various low-pass and high-pass noises to show that spectral cues above 4 kHz originate from the pinnae and help avoid front/rear confusion.  They also found spectral cues in the 1-4 kHz range, which are caused by the interaction of sound with the torso.  The upper end of the monaural cue range reaches 10-12 kHz, where Hebrank and Wright (1974) measured a small peak responsible for an upper-rear sense of direction.

Additionally, Middlebrooks (1997) calculated the “directional transfer function (DTF),” obtained by taking in-ear measurements and then subtracting a signal representing the average of 360° microphone measurements with no subject present.  The DTFs showed that a spectral notch increases in center frequency from 7 kHz to 9.5 kHz as azimuth goes from 0° to 160°, and from 6 kHz to 10 kHz as elevation goes from -60° to 60°.
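As a rough illustration of the subtraction described above, the sketch below works on magnitude spectra in decibels: it averages a set of reference spectra bin by bin and subtracts that average from one in-ear spectrum, then locates the deepest notch.  The data values are invented placeholders rather than measurements, the function names are hypothetical, and averaging directly in decibels is a simplification; Middlebrooks' actual processing is not reproduced here.

```python
def mean_spectrum_db(spectra_db):
    """Average several dB magnitude spectra, bin by bin."""
    count = len(spectra_db)
    return [sum(bins) / count for bins in zip(*spectra_db)]


def directional_transfer_db(in_ear_db, reference_db):
    """Subtract the reference spectrum from an in-ear spectrum (both in dB)."""
    return [ear - ref for ear, ref in zip(in_ear_db, reference_db)]


def notch_frequency_hz(freqs_hz, spectrum_db):
    """Return the frequency of the deepest minimum in the spectrum."""
    lowest = min(range(len(spectrum_db)), key=lambda i: spectrum_db[i])
    return freqs_hz[lowest]


# Placeholder data for illustration only (not measured values).
freqs_hz = [5000, 6000, 7000, 8000, 9000, 10000]
reference_measurements_db = [
    [60.0, 60.0, 61.0, 60.0, 59.0, 60.0],
    [62.0, 60.0, 59.0, 60.0, 61.0, 60.0],
]
in_ear_db = [63.0, 62.0, 50.0, 61.0, 62.0, 60.0]

reference_db = mean_spectrum_db(reference_measurements_db)
dtf_db = directional_transfer_db(in_ear_db, reference_db)
print(dtf_db)
print(notch_frequency_hz(freqs_hz, dtf_db))
```

Repeating the last step for in-ear spectra taken at several source directions would show how the notch frequency moves with direction.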
 



Perceptual Aspects

The human auditory system is more sensitive to some localization cues than others, and at times will even ignore a physical cue (Buell & Trahiotis, 1994).  Most of what is known comes from lateralization (headphone) tests, where the physical cues can be independently controlled.  A particular kind of test, the “trading experiment,” is widely used: the cues are placed in conflict to see which will dominate.

Through these trading experiments, summarized by Blauert (1999, p. 172), it has been shown that signals with most content below 1.6 kHz are dominated by IATD/IPD.  In this condition, 40 µs of time difference is equivalent to 1 dB of ILD.  For signals above 1.6 kHz, IPD no longer has an impact.  In fact, only IETD and ILD will create changes in localization, where up to 200 µs of IETD is needed to offset only 1 dB of ILD.
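The trading ratios quoted above can be turned into a small worked calculation.  The sketch below is only an illustration: it assumes the trade scales linearly with ILD, treats 1.6 kHz as a hard boundary, and uses the 200 µs figure as an upper bound, none of which is stated this strictly in the source.  The function and constant names are hypothetical.

```python
CROSSOVER_HZ = 1600.0            # boundary between the two regimes in the text
LOW_FREQ_US_PER_DB = 40.0        # IATD/IPD trading ratio below the crossover
HIGH_FREQ_MAX_US_PER_DB = 200.0  # upper bound for IETD above the crossover


def equivalent_time_difference_us(ild_db, dominant_freq_hz):
    """Time difference (microseconds) that would offset the given ILD.

    Above the crossover the result is an upper bound, not an exact value.
    """
    if dominant_freq_hz < CROSSOVER_HZ:
        return ild_db * LOW_FREQ_US_PER_DB
    return ild_db * HIGH_FREQ_MAX_US_PER_DB


print(equivalent_time_difference_us(3.0, 500.0))   # low-frequency signal
print(equivalent_time_difference_us(3.0, 4000.0))  # high-frequency signal
```

For a 3 dB level difference this gives 120 µs for the low-frequency signal and up to 600 µs for the high-frequency one, which shows how much weaker the time cue becomes above 1.6 kHz.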

It is also insightful to discuss localization blur (MAA) because it essentially summarizes the perceptual salience of the localization cues.  In other words, a smaller MAA suggests more dominant localization cues, and a larger MAA suggests weaker ones.  Note, however, that MAA is considerably smaller for complex signals with wide bandwidths than it is for simple tones or narrow band sounds.  This is because as bandwidth increases, the number and agreement of the localization cues can also increase.  This creates a stronger, more definite sense of where a sound is coming from.

With regard to frequency, it has already been discussed that MAA is largest in the middle frequency range.  In fact, for tones in the horizontal plane, Mills (1958) found the MAA to be at a maximum between 1000 and 3000 Hz and at a minimum from 250-1000 Hz and 3000-6000 Hz.  Similarly, Stevens and Newman (1936) showed the middle frequency range of 2000-4000 Hz to be the range in which changes in azimuth are hardest to notice.  Other experimenters have made similar comments regarding the degradation of localization performance in the middle frequency bands as opposed to higher or lower frequency ranges (Perrott, 1969; Perrott and Tucker, 1988; Pulkki and Karjalainen, 2001; Grantham, 1984).

Regardless, it is probably most important to keep perspective on the relative salience of all the available localization cues.  Middlebrooks (1997) states that localization research has “led to a general acceptance of the notion that interaural difference cues provide the principal cues to the horizontal dimension and that spectral shape [monaural] cues provide the principal cues to vertical and front/back localization” (p. 78).  Also, Fisher and Freedman (1968) mention that while pinnae cues may be useful for motionless (fixed head) experiments, they are of little importance in realistic conditions for binaural individuals.  They showed that listeners who are free to move their heads, yet without pinnae cues (listening through tubes), can correctly localize free field sounds.

Wightman and Kistler (1997) also support the dominance of interaural cues.  In particular, they have warned against performing monaural experiments using binaural individuals with an occluded ear.  Their experiments show that even slight levels present in the occluded ear cause interaural time differences to dominate.  Thus, subjects tend to ignore both the monaural spectral cues under investigation and the unnatural ILD cues created by the occlusion, and instead rely solely on ITDs.

Therefore, the three types of localization cues seem to rank perceptually with interaural time differences as the most dominant, followed by interaural level differences, and finally monaural spectral cues.  With respect to localization versus frequency, the middle frequency range is the most difficult to localize, with low frequencies being dominated by ITD cues and high frequencies by ILD cues.  Also, monaural pinnae cues present in the 5-12 kHz range are useful in some situations, mostly for avoiding front/back confusion and estimating the distance of a sound.

All of the localization cues vary both in physical and perceptual significance over the audible frequency range (see Figure 13).  While their variations do have complex patterns that change with frequency and azimuth, generalities can be made from the results of localization experiments and physical measurements.  Specifically, physical ITDs sharply drop while ILDs sharply rise with increasing frequency.  Perceptually, the influence of IPD begins to decrease around 800 Hz and has no effect above 1.6 kHz.  ILD has a lesser but relatively stable influence, with a slight peak around 2 kHz (Blauert, 1999, p. 158).  Pinna cues are of minimal importance in most real world conditions, but occur in the range of 5-12 kHz.


Figure 13: Generic (a) Physical and (b) Perceptual localization cue salience versus frequency
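The generalities of this chapter can be collected into a coarse lookup.  The sketch below is a simplification for illustration only: it treats the frequency limits quoted in the text (1.6 kHz for the ITD/ILD crossover, 1-4 kHz for torso cues, 5-12 kHz for pinna cues) as hard thresholds, although the underlying transitions are gradual.  The function name and labels are hypothetical.

```python
def salient_cues(freq_hz):
    """Coarse list of cues the chapter associates with a given frequency."""
    cues = []
    if freq_hz < 1600:
        # Below 1.6 kHz the text describes IATD/IPD as dominant.
        cues.append("ITD (IATD/IPD)")
    else:
        # Above 1.6 kHz only ILD and envelope time differences remain effective.
        cues.append("ILD")
        cues.append("ITD (IETD)")
    if 1000 <= freq_hz <= 4000:
        cues.append("torso spectral cues")
    if 5000 <= freq_hz <= 12000:
        cues.append("pinna spectral cues")
    return cues


for freq in (500, 2000, 8000):
    print(freq, "Hz:", ", ".join(salient_cues(freq)))
```

A lookup of this kind ignores the relative weighting of the cues, so it should be read alongside the ranking given in the text rather than in place of it.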


 
 
  
 

Created February 2003 by Rob Hartman
Copyright (C) 2003