
Over
the last century, auditory scientists have made great progress in understanding
the complexities of human hearing.
However,
as with any sensory behavior, there will always be uncertainty between
what is perceived and what can be physically measured. Physically,
the analysis of human hearing can be reduced to studying the signals entering
the left and right ear canals. Yet
perceptually, these two complex signals contain many encoded messages that
the brain deciphers into audible qualities such as loudness, pitch, timbre
and spatial origin.
This
ability for complex processing is especially relevant to localization,
that is, determining the spatial origin of a sound event.
It
has been shown through experiments that humans do not have an absolute
sense of a sound’s location.
Because
of this, scientists explicitly differentiate between the “sound event,”
where the sound physically originates, and the “auditory event,” where
the sound is perceived to originate.
For
a single sound source, the auditory and sound events often occur in close
proximity. However, for multiple
sound sources, the overall perceived event depends on many factors.
For
instance, if the sound events are each unique in location, pitch and timbre,
they are typically perceived as independent sounds with their own spatial
origins. Yet, if the sounds are similar
enough, they might be integrated into one perceived auditory event with
one collective spatial location.
In
this multi-source situation, the overriding perception is based on the
agreement of the localization cues and the correlation of the sound sources.
The
research for this thesis has focused on studying the localization of a
stereo image when certain frequencies of the signal have been spatially
relocated in a multi-source
sound system. This is a fairly common
occurrence in consumer electronic systems.
Often
the signal is split into several frequency bands, each sent to a transducer
in a different spatial position.
Consider
the “three-way” loudspeaker systems shown in Figure 1,
which illustrates the spatial relocation of low (L), middle (M), and high (H)
frequencies for stereo, automotive, and home theater surround audio systems.
Note that in the home stereo system of Figure 1a, the left loudspeaker enclosure has three transducers that share the same horizontal position but have different vertical positions. The low frequencies come from the vertically lowest portion of the enclosure, whereas the tweeter is significantly above it at the top. An automotive audio system (Figure 1b) represents an even more severe condition, having different vertical and horizontal origins for all three frequency bands. Moreover, in surround sound home theater systems (Figure 1c), the lowest frequencies are often separated to a powered subwoofer in an altogether different horizontal and vertical position. This is deemed acceptable because low frequencies are said to be “hard to localize.” This popular, yet somewhat oversimplified, observation will be discussed in more detail later.
Figure 1: Views of a listener from the rear (left) and top (right), showing the spatial relocation of high (H), mid (M), and low (L) frequencies. Shown are typical (a) stereo, (b) automotive, and (c) home theater surround loudspeaker systems.
As
mentioned, the focus of this research was to study the perceived location
of a stereo image when portions of the signal are moved to a different
spatial location. This idea originated
with a suggested reading: a paper on Digital Theater Systems’ (DTS) surround
sound encoding called “Coherent Acoustics.”
In
the paper, Smyth (1999) specifically mentions, “experimental evidence suggests
that it is difficult to localize mid-to-high frequency signals above about
2.5 kHz, and therefore any stereo imagery is largely dependent on the accurate
reproduction of only the low-frequency components of the audio signal”
(p. 18). This seems to
contradict popular opinion, and it warranted a preliminary investigation into
the claims Smyth made.
Early
investigation into this topic included well-known texts such as Blauert
(1999), Yost (1987) and Begault (1994), as well as some previous UM graduate
research by West (1998) and Ballman (1990). There
are also many good summary articles, such as those from Hartmann (1999)
and Kendall (1995). These and other
sources suggest that Smyth’s (1999) hypothesis, that a sound’s spatial location
is dominated by low-frequency over high-frequency localization cues, is well
established among auditory scientists.
To
further study this idea, it was investigated whether high frequencies could
be limited to a mono-tweeter, the same way that low frequencies are sent
to a mono-subwoofer. A typical stereo
loudspeaker system was set up, adding a separate tweeter directly in front
of the listener (see Figure 2). The
stereo signal was processed so that all frequencies above 10 kHz were sent
to the central tweeter, while frequencies below 10 kHz were played from
their respective left/right speakers. Listening
to a wide range of music made it clear that, while the spectral balance
was kept mostly intact, the image localization of high-pitched instruments
was sometimes different.
Figure 2: Preliminary experiments tested “mono-ized” high frequencies
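As a rough illustration of the processing described above, the following Python sketch splits each stereo channel at the crossover and routes the summed high bands to a single center feed. The filter actually used in the preliminary experiment is not specified here, so the ideal brick-wall split (computed with a naive DFT), the 48 kHz sample rate, the simple summation of the two high bands, and all function names are assumptions made only for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform; adequate for short blocks."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part of the reconstructed signal."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def split_at(x, crossover_hz, sample_rate):
    """Split x into (low, high) parts with an ideal brick-wall crossover."""
    n = len(x)
    low_spectrum = []
    for k, value in enumerate(dft(x)):
        # Bins above n/2 hold negative frequencies, so fold them back.
        freq = min(k, n - k) * sample_rate / n
        low_spectrum.append(value if freq < crossover_hz else 0)
    low = idft(low_spectrum)
    high = [a - b for a, b in zip(x, low)]  # complementary high band
    return low, high


def mono_ize_highs(left, right, crossover_hz=10_000.0, sample_rate=48_000.0):
    """Return (left_low, right_low, center_high) loudspeaker feeds.

    Assumption: the center tweeter feed is the plain sum of both high bands.
    """
    left_low, left_high = split_at(left, crossover_hz, sample_rate)
    right_low, right_high = split_at(right, crossover_hz, sample_rate)
    center_high = [a + b for a, b in zip(left_high, right_high)]
    return left_low, right_low, center_high
```

For a hard-left-panned test signal containing a 1 kHz and a 12 kHz component, this split leaves the 1 kHz component in the left feed and moves the 12 kHz component to the center feed. A practical crossover would use gradual filter slopes rather than a brick-wall split.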
In
fact, the image’s sound stage position seemed to depend on the spectral
energy distribution of its frequencies relative to the 10 kHz crossover.
Essentially,
the more high-frequency energy the instrument had, the farther towards
the tweeter (center sound stage) it was pulled.
For
example, a hard-left-panned cello was correctly localized at the left because
of dominant lower frequency energy, while a left-panned cymbal crash was
now heard somewhere between left and center.
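The notion of an instrument's spectral energy distribution relative to the crossover can be made concrete with a simple measure: the fraction of a signal's energy that lies at or above the crossover frequency. The Python sketch below is only an illustration. The function name, the 48 kHz default sample rate, and the use of a naive DFT are assumptions, and the measure is not claimed to predict the perceived image position.

```python
import cmath
import math


def high_band_energy_fraction(x, crossover_hz=10_000.0, sample_rate=48_000.0):
    """Fraction of the energy of x at or above crossover_hz (0.0 to 1.0)."""
    n = len(x)
    total = 0.0
    high = 0.0
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        # DC and Nyquist bins occur once; all others have a mirrored twin.
        is_unpaired = k == 0 or (n % 2 == 0 and k == n // 2)
        power = (1.0 if is_unpaired else 2.0) * abs(coeff) ** 2
        total += power
        if k * sample_rate / n >= crossover_hz:
            high += power
    return high / total if total > 0 else 0.0
```

Applied to synthetic two-tone signals, a mix dominated by a 1 kHz component yields a fraction near zero, while a mix dominated by a 12 kHz component yields a fraction near one. This mirrors the cello and cymbal cases only in a qualitative sense.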
This
result mandated a new direction for the thesis.
One
option would have been to investigate how noticeable this type of image
shift actually was. After all, would
most people even realize the cymbal image had shifted towards the center?
While
this line of questioning might yield interesting marketing data (and was
actually pursued for a small portion of this thesis), it would be difficult
to scientifically explain the results because of the many potential variables.
Instead,
this thesis compared the relative impact that spatially relocating low
versus high frequencies has on the perceived horizontal location of a
stereo image. The experimental setup
would be a less severe version of that shown in Figure 2,
bringing a full-range center speaker much closer to one of the stereo pair.
Also,
rather than moving only high frequencies, the objective would be to move
various portions of the audible spectrum to this offset speaker to see
which had the greatest effect on the perceived location.
The
details of the experiment are covered later in this thesis.
To
present this research, the topic of localization is first introduced.
This
includes a discussion of the various localization cues and the results
of historical experiments measuring human accuracy, precision, and sensitivity
to those cues. Yet for practical
purposes, it is more important to consider the physical nature of the cues
and their relative perceptual significance.
Also,
the topic of auditory scene analysis is introduced as it relates to this
research. Scene analysis shows how
conflicting spatial and spectral cues, as well as the acoustic space, might
impact the “integration” (fusion) and localization of sound events. Next,
the specific experimental setup and methodologies used in this research
are reviewed. This is followed by
the results, which are presented and analyzed in various forms.
Finally,
conclusions are drawn and future areas of research involving this topic
are suggested.
Created February 2003 by Rob
Hartman
Copyright (C) 2003