
One can rarely read a publication on localization without seeing some
reference to the early twentieth-century work of Lord Rayleigh (1907) and
his duplex theory of sound. Rayleigh proposed that humans rely on two types
of cues for localization: interaural time differences (ITD) and interaural
level differences (ILD). However, his theory could not explain how
localization occurs when ITD and ILD are zero or equivalent, which happens
in several situations. Thus, a complement to Rayleigh’s explanation is that
monaural cues from the pinnae must provide additional help with localization
when the duplex theory does not apply.
Since Rayleigh’s time, many experiments have been performed to further our
understanding of this topic. The experimental setup typically falls into one
of two broad categories: free-field testing (localization) or headphone
testing (lateralization). While free-field testing is obviously more natural,
headphone testing tends to be more popular because it allows each
localization cue to be isolated and removes any effects of the room. Yet
headphone testing also has its drawbacks, such as the creation of
internalized auditory images (i.e., headphone images are perceived inside
the head).
For either type of testing, it is important to introduce some basic terms
used to describe the position of sound events and auditory events (see
Blauert, 1999). This position is typically described by a horizontal angle
(azimuth, gamma) and a vertical angle (elevation, delta) measured from a
point directly in front of the listener. Note from Figure 3 that azimuth and
elevation both start at zero directly in front and increase in a
counterclockwise fashion.
The
starting point is the intersection of two imaginary planes, the horizontal
and median planes. The horizontal
plane is the extension of the interaural axis, containing
Figure 3: Views of the auditory planes, azimuth angle and elevation angle
Interaural Difference Cues
As mentioned above, two major localization cues involve the interaural level
and time differences between the left and right ear canal signals. Yet the
term “interaural time difference” is somewhat ambiguous because it can refer
to arrival-time, phase, or envelope differences. The terms are therefore
defined as follows:
·Interaural Arrival Time Difference (IATD): The difference in arrival time between the left and right ear signals. Because the speed of sound is constant, this difference results from the different path lengths to the two ears (see Figure 4). The sound typically reaches one ear first and must then travel around the head to reach the opposite ear.
Figure 4: ITDs are caused primarily by path length differences
·Interaural Phase Difference (IPD): The difference in phase between the left
and right ear signals caused by different arrival times. For a periodic sound
with period T, a given IPD can correspond to two different physical time
differences: either IATD or T - IATD.
Figure 5: Interaural Phase Difference (IPD) has two physical values due to periodicity
·Interaural Envelope Time Difference (IETD): The temporal difference of the
modulation pattern (envelope) between the two ears. IETD is independent of
the carrier frequency. Similar to IPD, it also exhibits two physical values
(see Figure 6).
·Interaural Time Difference (ITD): A generic term used to describe any of the
above time differences. It typically refers to whichever one dominates at the
signal frequencies under discussion.
According to Blauert (1999), localization of continuous sounds below 1.6 kHz
is dominated by IPD, while IETD has a definite influence above 1.6 kHz.
Although IATD determines IPD, IATD itself acts directly as a cue only for
impulsive sounds.
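The relationship between arrival-time and phase differences can be illustrated numerically. The following Python sketch is only an illustration and is not drawn from the sources cited here: it assumes Woodworth's spherical-head approximation for a distant source, a head radius of 8.75 cm, and a speed of sound of 343 m/s. All function and constant names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s, assumed (air at roughly room temperature)
HEAD_RADIUS = 0.0875     # m, assumed: half of a 17.5 cm head diameter


def iatd_seconds(azimuth_deg, radius=HEAD_RADIUS, c=SPEED_OF_SOUND):
    """Approximate IATD for a distant source on the horizontal plane.

    Uses Woodworth's spherical-head formula, (radius / c) * (theta + sin(theta)),
    which is an assumption of this sketch. Only azimuths within +/-90 degrees
    of straight ahead are handled; the sign of the result follows the azimuth.
    """
    if abs(azimuth_deg) > 90:
        raise ValueError("this sketch only covers azimuths within +/-90 degrees")
    theta = math.radians(abs(azimuth_deg))
    delay = (radius / c) * (theta + math.sin(theta))
    return math.copysign(delay, azimuth_deg)


def ipd_degrees(iatd, frequency_hz):
    """Phase difference (degrees, wrapped to [0, 360)) that a pure tone of the
    given frequency acquires from an arrival-time difference."""
    return (360.0 * frequency_hz * abs(iatd)) % 360.0


def ipd_time_candidates(iatd, frequency_hz):
    """Return the two physical time differences consistent with one IPD for a
    pure tone of period T = 1 / f: the lag itself and T minus the lag."""
    period = 1.0 / frequency_hz
    lag = abs(iatd) % period
    return lag, period - lag


if __name__ == "__main__":
    itd = iatd_seconds(45)
    lag, lead = ipd_time_candidates(itd, 500.0)
    print(f"IATD at 45 degrees: {itd * 1e6:.0f} microseconds")
    print(f"IPD at 500 Hz: {ipd_degrees(itd, 500.0):.1f} degrees")
    print(f"Equivalent time differences: {lag * 1e6:.0f} and {lead * 1e6:.0f} microseconds")
```

Under these assumptions the largest IATD, for a source on axis with one ear, comes out at roughly 0.66 ms. The two values returned by `ipd_time_candidates` always sum to one period, which is the ambiguity described for IPD.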
With regard to interaural level differences (ILD), frequencies whose wavelengths are long compared to the 17.5 cm diameter of the head are relatively undisturbed. As the frequency of the sound increases (and the wavelength decreases), the sound begins to either reflect off or diffract around the head (see Figure 7). ILD is additionally dependent on source position. This is because of the asymmetrical characteristics of the head and body, and also the properties of acoustical waves and the barriers they encounter.
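To give a sense of scale for "long wavelengths as compared to the head," the short Python sketch below compares wavelength to the 17.5 cm head diameter quoted above. It assumes a speed of sound of 343 m/s, and the function names are hypothetical. The frequency it reports is only a rough marker of where the wavelength matches the head size, not a sharp boundary for the onset of ILD.

```python
SPEED_OF_SOUND = 343.0   # m/s, assumed
HEAD_DIAMETER = 0.175    # m, the 17.5 cm figure quoted in the text


def wavelength_m(frequency_hz, c=SPEED_OF_SOUND):
    """Wavelength in metres of a sound wave at the given frequency."""
    return c / frequency_hz


def wavelength_to_head_ratio(frequency_hz, diameter=HEAD_DIAMETER):
    """How many head diameters fit into one wavelength.

    Values well above 1 mean the head is small relative to the wave.
    """
    return wavelength_m(frequency_hz) / diameter


def matching_frequency_hz(diameter=HEAD_DIAMETER, c=SPEED_OF_SOUND):
    """Frequency at which one wavelength equals the head diameter."""
    return c / diameter


if __name__ == "__main__":
    for f in (200.0, 1000.0, 4000.0, 8000.0):
        print(f"{f:6.0f} Hz: wavelength / head diameter = "
              f"{wavelength_to_head_ratio(f):.2f}")
    print(f"Wavelength equals head diameter near {matching_frequency_hz():.0f} Hz")
```

With these assumed values the wavelength equals the head diameter at about 1.96 kHz, so tones of a few hundred hertz are many head diameters long while tones of several kilohertz are shorter than the head.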

Monaural Cues
Having discussed the interaural cues, it is also important to recognize the
role played by monaural cues. These cues are important because for every
sound source position, there is a group of points that shares the same path
length difference to the two ears (Durrant & Lovrinic, 1984). This set of
points is more commonly known as the “cone of confusion,” and is represented
by a hyperbola in the horizontal plane and a cone in three-dimensional space
(see Figure 8). Two sound sources located on this cone would provide
identical interaural cues; thus the monaural cues allow listeners to
differentiate between them.
Figure 8: The cone of confusion is a set of points that provide identical interaural cues
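The geometry behind the cone of confusion can be checked with a small Python sketch. It is a deliberate simplification and not the model used in the cited sources: the ears are treated as two points 17.5 cm apart with no head between them, and the coordinate convention (x forward, y to the left along the interaural axis, z up) and all names are assumptions. Rotating a source about the interaural axis leaves the difference between the two path lengths unchanged, so every rotated position yields the same interaural arrival-time cue.

```python
import math

EAR_OFFSET = 0.0875  # m, assumed distance from head centre to each ear


def path_difference(source, ear_offset=EAR_OFFSET):
    """Left-ear path length minus right-ear path length, in metres.

    Simplified model: the ears are points at (0, +/-ear_offset, 0) and the
    head itself is ignored.
    """
    left_ear = (0.0, ear_offset, 0.0)
    right_ear = (0.0, -ear_offset, 0.0)
    return math.dist(source, left_ear) - math.dist(source, right_ear)


def rotate_about_interaural_axis(source, angle_deg):
    """Rotate a point about the y (interaural) axis by the given angle."""
    x, y, z = source
    a = math.radians(angle_deg)
    return (x * math.cos(a) - z * math.sin(a),
            y,
            x * math.sin(a) + z * math.cos(a))


if __name__ == "__main__":
    # Source 2 m away, 30 degrees to the left of straight ahead.
    azimuth = math.radians(30.0)
    front = (2.0 * math.cos(azimuth), 2.0 * math.sin(azimuth), 0.0)
    for angle in (0, 45, 90, 180, 270):
        position = rotate_about_interaural_axis(front, angle)
        print(f"rotation {angle:3d} degrees: "
              f"path difference = {path_difference(position) * 100:.3f} cm")
```

A rotation of 180 degrees maps the frontal source to its mirror image behind the listener, which is the front/back case that interaural cues alone cannot resolve.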
Scientists have known for some time that it is possible to localize sounds
with only one ear. Angell and Fite (1901) compared the localization abilities
of an individual with normal binaural hearing to those of one who was
entirely deaf in one ear. They found that the monaural individual’s
localization ability on the side of the non-deaf ear was “not greatly
inferior” (p. 236) to that of the normal-hearing individual. However,
localization on the side of the deaf ear was “extremely uncertain” (p. 243).
For both subjects, front/back confusions occurred often, and in general,
complex sounds (whistles and bells) were more accurately localized than pure
tones (tuning forks). Their final conclusion stated that in monaural hearing,
the external ear was responsible for contributing “qualitative peculiarities”
(Angell & Fite, 1901, p. 246) to the sound, which allowed proper localization
to occur. A more detailed analysis of the origin of monaural cues was not
published until Batteau (1967) and Blauert (1969). Blauert described the
operation of the pinna as a directionally dependent filter. He stated that it
enhanced or reduced various spectral portions of the input signal depending
on the angle of vertical and horizontal incidence.
Blauert’s experiments suggested that these spectral influences dominated
localization in fixed-head experiments, and that the actual location of the
sound source had little to do with its perceived location. For instance,
because sounds originating from overhead exhibit a peak in the 7 kHz range, a
sound that is played in the horizontal plane with an artificial peak at 7 kHz
is perceived to originate from overhead. Blauert defined several sections of
the auditory frequency range that behave this way. He called them “preference
bands,” and showed that the relative intensity of these bands is what
dictates fixed-head localization. Batteau (1967) also discussed the
influences of the pinna, but in terms of time-based reflections. He showed
that sound will reflect off the individual folds and cavities of the pinna,
producing replicas of the original signal with very small time delays.
Batteau measured an almost linear relationship between azimuth and monaural
pinna delay, ranging from 10 µsec (on axis with the ears) to 90 µsec
(directly in front) (p. 163). He also showed that changing the elevation of
the sound source influenced the amount and concentration of pinna delay.
Wright, Hebrank, and Wilson (1974) reinforced the plausibility of Batteau’s
theory by showing that humans are sensitive to time delays as short as
20 µsec (p. 960). However, Middlebrooks (1997) points out that time delays
essentially cause spectral amplitude modifications due to phase interactions
of the original and delayed signals.
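The equivalence between a short delay and a spectral modification can be sketched in Python by summing a direct sound with a single delayed copy. This is a textbook comb-filter model, not a reconstruction of Batteau's or Middlebrooks' analysis. The reflection gain of 0.5 is an arbitrary assumption, and the function names are hypothetical.

```python
import cmath
import math


def comb_magnitude_db(frequency_hz, delay_s, reflection_gain=0.5):
    """Magnitude in dB of a direct sound plus one delayed copy.

    Models H(f) = 1 + g * exp(-j * 2 * pi * f * delay), where g is the
    (assumed) relative strength of the reflection, with 0 <= g < 1.
    """
    phase = -2j * math.pi * frequency_hz * delay_s
    response = 1.0 + reflection_gain * cmath.exp(phase)
    return 20.0 * math.log10(abs(response))


def first_notch_hz(delay_s):
    """Lowest frequency at which the delayed copy arrives in antiphase."""
    return 1.0 / (2.0 * delay_s)


if __name__ == "__main__":
    for delay_us in (10.0, 20.0, 90.0):
        delay = delay_us * 1e-6
        notch = first_notch_hz(delay)
        print(f"delay {delay_us:4.0f} microseconds: first notch near {notch:8.0f} Hz, "
              f"depth {comb_magnitude_db(notch, delay):.1f} dB")
```

In this model a 90 microsecond delay places its first spectral notch near 5.6 kHz, while a 10 microsecond delay places it near 50 kHz. Changing the delay therefore moves the peaks and notches in frequency, which is one way to see why delay-based and spectrum-based descriptions of the pinna overlap.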
Thus, researchers since Batteau’s time have focused on “spectral
modifications, rather than on time delays per se” (Middlebrooks, p. 78).
Localization Blur
Our ability to detect changes in a sound source’s position is experimentally
measured as the minimum audible angle (MAA), also called “localization blur.”
There are various experimental methods, but in essence, the localization cues
are varied from a fixed point and the MAA is taken to be the smallest change
that a statistically significant number of listeners can detect. The MAA can
be measured for both horizontal and vertical directions. Blauert (1999) has
summarized many of the localization blur experiments, including influential
work from Stevens and Newman (1936) and Mills (1958). From this summary,
Blauert suggests that our most acute sense of localization is directly in
front (0° azimuth). In that position he states “the absolute lower limit for
the localization blur is, as shown, about 1°” (p. 38). Schmidt, Vangemert,
De Vries, and Duyff (1953) also state that detectable changes in azimuth for
pure tones close to the median plane were “considerably less than one degree”
(p. 16). Precision in locating a source is affected by its spatial location
and frequency content.
With regard to source location, MAA in the horizontal direction (azimuth) is
generally smaller than in the vertical direction (elevation); that is,
horizontal localization is generally more accurate (Strybel and Fujimoto,
2000). On the horizontal plane, MAA is smallest directly in front of the
listener, where it intersects the median plane. As the source moves around
the head, MAA slowly increases to a maximum on axis with the ears, and then
decreases again as the sound continues towards the rear of the listener. In
the vertical plane, MAA is again smallest directly in front, near the
horizontal plane. It similarly increases to its maximum directly above the
listener’s head before decreasing to a secondary minimum directly behind the
listener.
MAA also varies with the signal’s spectral content. Testing such as that of
Mills (1958) has shown that for pure tones, the middle frequency range
generally has a larger MAA than either low or high frequencies. In addition,
narrowband signals and sinusoids are intrinsically more difficult to localize
than wideband signals because of the limited number of localization cues
available to the brain. The discussion above describes the general nature of
localization blur, but in reality it is more complex. To get an idea of this
complexity, consider Figure 9 from Blauert (1999). This is the result of test
subjects aligning the azimuth of a sinusoidal (solid) and octave-band
(dotted) sound source to that of three wideband sources fixed at azimuths of
0° and ±40° from midline. Notice how the perceived azimuth varies with
frequency and also between the two narrowband test signals.
While the results seem to change somewhat unpredictably with frequency, the
0° incidence typically shows smaller variations than either the 40° or 320°
(i.e., -40°) positions. Also, the results fall into a finite area around the
wideband source, which is an indication of localization blur. The white noise
should be considered the absolute position of the source, whereas the
difference in azimuth for the sinusoid/octave band represents the
localization blur. For example, a 5 kHz sinusoid (solid) at ~44° azimuth is
perceived to share the same location as the white noise source at 40°. This
suggests that the MAA for a 5 kHz sinusoid is approximately 4°. In
comparison, a 5 kHz octave band (dotted) seems to have an MAA around 14°
(located at ~26° azimuth) when compared to the same white noise source.
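The arithmetic in this example can be written as a small Python helper that takes the difference between a test-signal azimuth and the reference azimuth, wrapping around 360 degrees so that 320 degrees and -40 degrees are treated as the same direction. The function name is hypothetical, and the input values are simply the approximate readings quoted in the text, not new measurements.

```python
def signed_azimuth_difference(test_deg, reference_deg):
    """Smallest signed angle from the reference azimuth to the test azimuth.

    The result lies in the range (-180, 180] degrees; positive values mean
    the test signal sits counterclockwise of the reference.
    """
    diff = (test_deg - reference_deg) % 360.0
    return diff - 360.0 if diff > 180.0 else diff


if __name__ == "__main__":
    reference = 40.0  # azimuth of the white noise source
    for label, test in (("5 kHz sinusoid", 44.0), ("5 kHz octave band", 26.0)):
        offset = signed_azimuth_difference(test, reference)
        print(f"{label}: offset {offset:+.0f} degrees, "
              f"magnitude {abs(offset):.0f} degrees")
```

For the readings quoted above, the magnitudes are 4 and 14 degrees. The wrap-around matters only for the source on the other side of the midline, where an azimuth of 320 degrees must be compared as -40 degrees.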
Figure 9: Horizontal plane localization of sinusoidal (solid) and narrowband
noise (dotted) as compared to a reference sound of wideband noise at 0, 40,
and 320 degree azimuth locations. Shown versus frequencies to 5 kHz.
Reprinted from Blauert (1999) with permission from the MIT Press.
Created February 2003 by Rob Hartman
Copyright (C) 2003