
It is also important to consider the relative hierarchy of the localization cues in forming the resulting auditory event. First, the acoustical nature of sound waves dictates that the physical level of a localization cue will vary with spatial position and frequency. Perceptually, research has shown that localization cues vary in importance relative to one another and with frequency. Understanding both the physical and perceptual significance of the localization cues should help suggest which frequencies could be spatially relocated with a minimal impact on the position of the resulting auditory image.
Physical Aspects
To understand the relationship between free field listening conditions and the physical localization cues they create, first consider the interaction of two tones of different frequency with a listener (Figure 10). The sound pressure wave travels towards the subject and interacts with the body and head, which act as an acoustic barrier. Depending on the frequency content and angle of incidence, portions of the wave may be reflected off the listener or may diffract around them with minimal interaction.
Figure 10: ITD and ILD variations with speaker position (azimuth)
As shown in Figure 10, a sound source located on the median plane (position 1) has no ILD or ITD in either frequency case, because the sound travels paths of equal length to both ears. When the source is moved counter-clockwise on the horizontal plane, however, the two interaural cues change in different ways. In particular, note that the ITD gradually increases because of the increased path length difference, reaching its maximum at position 3. ITDs are independent of acoustic effects, which is why they are the same in both frequency cases. However, because ILDs are caused by the complex interaction of sound with the listener, position 2 will generally exhibit a greater level difference than position 3. In fact, position 3 is unique because in some conditions the diffraction of sound can cause the opposing ear to be louder than the incident ear (Blauert, 1999, p. 71). Also notice that the ILD is considerably less for low frequencies (Figure 10a) than for high frequencies (Figure 10b). This is because at low frequencies sound will diffract around the head, whereas high frequencies will be reflected, creating an acoustic shadow on the opposing ear.
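The path length argument for ITD, and the wavelength argument for ILD, can be sketched numerically. The following is only an illustration under stated assumptions: it uses the rigid-sphere (Woodworth) approximation for a distant source, an assumed head radius of 8.75 cm, and an assumed speed of sound of 343 m/s. None of these values come from the text above, and the function names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly room temperature
HEAD_RADIUS_M = 0.0875      # assumption: a commonly quoted average head radius


def itd_seconds(azimuth_deg, head_radius_m=HEAD_RADIUS_M, c=SPEED_OF_SOUND_M_S):
    """Approximate ITD for a distant source with the rigid-sphere model.

    azimuth_deg runs from 0 (straight ahead) to 90 (directly beside one ear).
    The extra path to the far ear is modeled as radius * (theta + sin(theta)).
    """
    theta = math.radians(azimuth_deg)
    return (head_radius_m / c) * (theta + math.sin(theta))


def wavelength_m(frequency_hz, c=SPEED_OF_SOUND_M_S):
    """Wavelength of a tone, for comparison with the size of the head."""
    return c / frequency_hz


# ITD grows with azimuth and does not depend on frequency in this model.
for azimuth in (0, 45, 90):
    print(f"azimuth {azimuth:2d} deg: ITD ~ {itd_seconds(azimuth) * 1e6:.0f} us")

# A tone whose wavelength is much longer than the head bends around it;
# a much shorter wavelength is shadowed, which is where large ILDs arise.
for frequency in (200, 5000):
    ratio = wavelength_m(frequency) / (2 * HEAD_RADIUS_M)
    print(f"{frequency} Hz: wavelength is {ratio:.1f} x head diameter")
```

The azimuths and frequencies in the loops are arbitrary and are not meant to correspond to the numbered positions in Figure 10.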
A similar analysis of individual ITD cues shows how each varies with frequency and spatial position (see Figure 11). Notice that IATD is dependent only on spatial position and is independent of frequency changes. On the other hand, despite the same spatial position, the IPD shown in Figure 11(a) is more than (b) because of the decreased signal wavelength. The IPD cue will also change due to differences in position only, as is shown in Figure 11(a) and (c). Although not shown, IETD is similar to IPD, meaning that it only varies with changes in spatial origin and modulation frequency.
Figure 11: IATD varies only with position, while IPD varies with position and signal frequency. Shown for (a) low frequency source, (b) high frequency source, (c) high frequency source with new spatial location
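The dependence of IPD on both position and frequency can be made concrete with a short sketch. It assumes a pure tone and treats the arrival time difference as a given input. The function names are hypothetical, and the delay used in the example is an arbitrary illustrative value rather than a measurement.

```python
import math


def ipd_radians(time_difference_s, frequency_hz):
    """Unwrapped interaural phase difference of a pure tone.

    A fixed arrival time difference corresponds to a phase difference of
    2 * pi * f * dt, so the same position gives a different IPD per frequency.
    """
    return 2.0 * math.pi * frequency_hz * time_difference_s


def wrapped_ipd_radians(time_difference_s, frequency_hz):
    """The same phase difference folded into the range (-pi, pi]."""
    phase = ipd_radians(time_difference_s, frequency_hz)
    return math.atan2(math.sin(phase), math.cos(phase))


delay_s = 0.0005  # assumption: an arbitrary 500 us arrival time difference
for frequency in (250, 500, 1000, 2000):
    print(
        f"{frequency:4d} Hz: unwrapped {ipd_radians(delay_s, frequency):5.2f} rad, "
        f"wrapped {wrapped_ipd_radians(delay_s, frequency):5.2f} rad"
    )
```

The wrapped value is included because a phase difference is only observable modulo one full cycle, so the same delay can become ambiguous once the period of the tone is comparable to the delay.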
The generic results shown in Figure 10 and Figure 11 are greatly simplified compared with actual in-ear measurements (i.e., recordings taken with microphones placed at the entrance to the ear canals). Many frequency dependent influences, such as variations from the pinnae and torso reflections, are not represented. In fact, both ILD and ITD follow complex patterns as azimuth and frequency change. These patterns are shown in Figure 12, from Blauert (1999). ILD (left) and ITD (right) are presented, where each plot shows the variation versus frequency for a fixed azimuth. The azimuth is varied from the top plot (φ = 150º) to the bottom plot (φ = 30º), where azimuth is measured counter-clockwise from directly in front.
Figure 12: Complex patterns of ILD (left) and ITD (right) with varied horizontal plane positions (azimuth). Reprinted from Blauert (1999), with permission from the MIT Press.
As for monaural cues, it is most relevant to consider the frequency range where they physically occur. Because of the wavelength of sound relative to the size of the pinnae folds, monaural cues exist in only a small portion of the audible spectrum. One of the most common ways to determine their range of occurrence is by analyzing in-ear recordings. Batteau (1967) appears to have been one of the first to make in-ear recordings; he speculated that more research of this type had not been done because “the extraordinary fidelity needed in all aspects of this system, microphones, amplifiers, headphones, acoustic isolation, perhaps has prevented construction of the requisite systems until now” (p. 161). Butler and Belendiuk (1977) furthered the use of in-ear recordings by comparing the amplitudes of the signals at the two ears. With this analysis, they showed that as a sound source moves from above to below the interaural axis, a spectral notch moves continuously from high (approx. 7 kHz at 15º elevation) to low (approx. 5.5 kHz at -30º elevation) (p. 1267). Also, Musicant and Butler’s (1984) experiments used various low pass and high pass noises to show that spectral cues above 4 kHz originate from the pinnae and help avoid front/rear confusion. They also found spectral cues in the 1-4 kHz range, which are caused by the interaction of sound with the torso. The higher end of monaural cues reaches 10-12 kHz, where Hebrank and Wright (1974) measured the effect of a small peak responsible for an upper-rear sense of direction. Additionally, Middlebrooks (1997) calculated the “directional transfer function (DTF),” obtained by taking in-ear measurements and then subtracting a signal representing the average of 360º microphone measurements with no subject present. The DTFs showed that a spectral notch increases in center frequency from 7 kHz to 9.5 kHz for 0º to 160º azimuth and from 6 kHz to 10 kHz from -60º to 60º elevation.
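The subtraction step behind a DTF, followed by locating the notch, can be sketched as below. This is not Middlebrooks' procedure in detail. It assumes the spectra are already expressed in dB (so that subtraction corresponds to division of linear magnitudes), the function names are hypothetical, and every number in the example is invented purely to exercise the code.

```python
def directional_transfer_functions(in_ear_db, reference_db):
    """Subtract one common reference spectrum (dB) from each in-ear spectrum (dB).

    in_ear_db maps a direction label to a list of levels, one per frequency bin.
    reference_db holds the direction-independent reference for the same bins.
    """
    return {
        direction: [level - ref for level, ref in zip(spectrum, reference_db)]
        for direction, spectrum in in_ear_db.items()
    }


def notch_frequency_hz(frequencies_hz, spectrum_db):
    """Return the frequency of the deepest minimum in a spectrum."""
    deepest = min(range(len(spectrum_db)), key=spectrum_db.__getitem__)
    return frequencies_hz[deepest]


# Invented placeholder values, not measurements.
frequencies_hz = [6000, 7000, 8000, 9000, 10000]
reference_db = [1.0, 1.0, 2.0, 2.0, 1.0]
in_ear_db = {
    "direction_a": [3.0, -12.0, 2.0, 4.0, 1.0],
    "direction_b": [2.0, 3.0, 1.0, -9.0, 0.0],
}

dtfs = directional_transfer_functions(in_ear_db, reference_db)
for direction, spectrum in dtfs.items():
    print(direction, "notch near", notch_frequency_hz(frequencies_hz, spectrum), "Hz")
```

Removing the common reference leaves only the direction-dependent part of each spectrum, which is what makes a notch that moves with source direction easy to track.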
Perceptual Aspects
The human auditory system is more sensitive to some localization cues than others, and at times will even ignore a physical cue (Buell & Trahiotis, 1994). What is known has been taken mostly from lateralization (headphone) tests, where the physical cues can be independently controlled. One kind of test, the “trading experiment,” is especially popular: the cues are put into conflict to see which will dominate. Through these trading experiments, summarized by Blauert (1999, p. 172), it has been shown that signals with most of their content below 1.6 kHz are dominated by IATD/IPD. In this condition, 40 µs of time difference is equivalent to 1 dB of ILD. For signals above 1.6 kHz, IPD no longer has an impact. Instead, only IETD and ILD create changes in localization, and up to 200 µs of IETD is needed to offset only 1 dB of ILD.
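The two trading ratios quoted above can be turned into a small conversion, shown here only as a sketch. It treats the 1.6 kHz boundary as a hard switch and uses the 200 µs figure as a fixed ratio, even though the text gives it as an upper bound, so both are simplifying assumptions. The function and constant names are hypothetical.

```python
CROSSOVER_HZ = 1600.0        # boundary quoted in the text, treated here as a hard switch
LOW_BAND_US_PER_DB = 40.0    # time difference equivalent to 1 dB of ILD below the boundary
HIGH_BAND_US_PER_DB = 200.0  # assumption: the quoted upper bound used as a fixed ratio


def equivalent_ild_db(time_difference_us, dominant_frequency_hz):
    """Convert an interaural time difference into the ILD that would trade against it."""
    if dominant_frequency_hz < CROSSOVER_HZ:
        us_per_db = LOW_BAND_US_PER_DB
    else:
        us_per_db = HIGH_BAND_US_PER_DB
    return time_difference_us / us_per_db


# The same 200 us offsets far more level difference for a low frequency signal.
print(equivalent_ild_db(200.0, 500.0))   # low band: 200 / 40
print(equivalent_ild_db(200.0, 4000.0))  # high band: 200 / 200
```

The comparison shows why time differences are described as dominant at low frequencies: a given delay is worth five times as many decibels there as it is above the boundary.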
It is also insightful to discuss localization blur (MAA) because it essentially summarizes the perceptual salience of the localization cues. In other words, a smaller MAA suggests more dominant localization cues, and a larger MAA suggests weaker ones. The MAA is considerably smaller for complex signals with wide bandwidths than it is for simple tones or narrow band sounds. This is because as bandwidth increases, the number and agreement of the localization cues can also increase, creating a stronger, more definite sense of where a sound is coming from. With regard to frequency, it has already been discussed that MAA is largest in the middle frequency range. In fact, for tones in the horizontal plane, Mills (1958) found the MAA to be at its maximum between 1000-3000 Hz and at its minimum from 250-1000 Hz and 3000-6000 Hz. Similarly, Stevens and Newman (1936) showed the middle frequency range of 2000-4000 Hz to be the range in which changes in azimuth are hardest to notice. Other experimenters have made similar comments regarding the degradation of localization performance in the middle frequency bands as opposed to higher or lower frequency ranges (Perrott, 1969; Perrott and Tucker, 1988; Pulkki and Karjalainen, 2001; Grantham, 1984).
Regardless, it is probably most important to keep perspective on the relative salience of all the available localization cues. Middlebrooks (1997) states that localization research has “led to a general acceptance of the notion that interaural difference cues provide the principal cues to the horizontal dimension and that spectral shape [monaural] cues provide the principal cues to vertical and front/back localization” (p. 78). Also, Fisher and Freedman (1968) mention that while pinnae cues may be useful in motionless (fixed head) experiments, they are of little importance in realistic conditions for binaural individuals. They showed that listeners who are free to move their heads, yet without pinnae cues (listening through tubes), can correctly localize free field sounds. Wightman and Kistler (1997) also support the dominance of interaural cues. In particular, they have warned against performing monaural experiments using binaural individuals with an occluded ear. Their experiments show that even slight signal levels present in the occluded ear cause interaural time differences to dominate. Thus, subjects tend to ignore both the monaural spectral cues under investigation and the unnaturally created ILD cues due to occlusion, and instead rely solely on ITDs.
Therefore, the three types of localization cues seem to rank perceptually with interaural time differences as the most dominant, followed by interaural level differences, and finally monaural spectral cues. With respect to localization versus frequency, the middle frequency range is the most difficult to localize, with low frequencies dominated by ITD cues and high frequencies by ILD cues. Also, monaural pinnae cues present in the 5-12 kHz range are useful in some situations, mostly for avoiding front/back confusion and estimating the distance of a sound.
All of the localization cues vary in both physical and perceptual significance over the audible frequency range (see Figure 13). While their variations do have complex patterns that change with frequency and azimuth, generalities can be made from the results of localization experiments and physical measurements. Specifically, physical ITDs drop sharply while ILDs rise sharply with increasing frequency. Perceptually, the influence of IPD begins to decrease around 800 Hz and has no effect above 1.6 kHz. ILD has a lesser but relatively stable influence, with a slight peak around 2 kHz (Blauert, 1999, p. 158). Pinna cues are of minimal importance in most real world conditions, but occur in the range of 5-12 kHz.
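The frequency boundaries summarized in this section can be collected into a simple lookup. The sketch below is only a restatement of those quoted boundaries. It treats each one as a sharp threshold, which is an assumption made for illustration, since the text describes gradual changes. The function name, constant names, and labels are hypothetical.

```python
IPD_DECLINE_START_HZ = 800.0     # IPD influence begins to decrease around here
IPD_CUTOFF_HZ = 1600.0           # IPD has no effect above this frequency
PINNA_RANGE_HZ = (5000.0, 12000.0)


def cue_summary(frequency_hz):
    """List the cues the section describes as active at a given frequency."""
    cues = []
    if frequency_hz <= IPD_DECLINE_START_HZ:
        cues.append("IPD")
    elif frequency_hz <= IPD_CUTOFF_HZ:
        cues.append("IPD (declining)")
    else:
        cues.append("IETD")  # envelope timing takes over from fine-structure phase
    cues.append("ILD")       # described as a lesser but relatively stable influence
    if PINNA_RANGE_HZ[0] <= frequency_hz <= PINNA_RANGE_HZ[1]:
        cues.append("pinna")
    return cues


for frequency in (500, 1000, 3000, 8000):
    print(f"{frequency} Hz: {', '.join(cue_summary(frequency))}")
```

A lookup like this says nothing about the relative strength of the cues; it only marks where each one is reported to apply.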
Created February 2003 by Rob Hartman
Copyright (C) 2003