Chapter 1: Introduction


Over the last century, auditory scientists have made great progress in understanding the complexities of human hearing.  However, as with any sensory behavior, there will always be uncertainty between what is perceived and what can be physically measured.  Physically, the analysis of human hearing can be reduced to studying the signals entering the left and right ear canals.  Yet perceptually, these two complex signals contain many encoded messages that the brain deciphers into audible qualities such as loudness, pitch, timbre, and spatial origin.

This ability for complex processing is especially relevant to localization, the task of determining the spatial origin of a sound event.  Experiments have shown that humans do not have an absolute sense of a sound’s location.  Because of this, scientists explicitly differentiate between the “sound event,” where the sound physically originates, and the “auditory event,” where the sound is perceived to originate.

For a single sound source, the auditory and sound events often occur in close proximity.  However, for multiple sound sources, the overall perceived event depends on many factors.  For instance, if each sound event is unique in location, pitch, and timbre, they are typically perceived as independent sounds with their own spatial origins.  Yet if the sounds are similar enough, they might be integrated into one perceived auditory event with one collective spatial location.  In this multi-source situation, the overriding perception is based on the agreement of the localization cues and the correlation of the sound sources.

The research for this thesis focused on the localization of a stereo image when certain frequencies of the signal have been spatially relocated in a multi-source sound system.  This is a fairly common occurrence in consumer electronic systems.  Often the signal is split into several frequency bands, each sent to a transducer in a different spatial position.  Consider the “three-way” loudspeaker systems in Figure 1, which shows the spatial relocation of low (L), middle (M), and high (H) frequencies for stereo, automotive, and home theater surround audio systems.

Note that in the home stereo system of Figure 1a, the left loudspeaker enclosure has three transducers which share the same horizontal position but have different vertical positions.  The low frequencies come from the vertically lowest portion of the enclosure, whereas the tweeter is significantly above it at the top.  An automotive audio system (Figure 1b) represents an even more severe condition, having different vertical and horizontal origins for all three frequency bands.  Moreover, in surround sound home theater systems (Figure 1c), the lowest frequencies are often sent to a separate powered subwoofer in an altogether different horizontal and vertical position.  This is deemed acceptable because low frequencies are said to be “hard to localize.”  This popular yet somewhat oversimplified observation will be discussed in more detail later.



Figure 1: Views of a listener from the rear (left) and top (right), showing the spatial relocation of high (H), mid (M), and low (L) frequencies.  Shown are typical (a) stereo, (b) automotive, and (c) home theater surround loudspeaker systems.

As mentioned, the focus of this research was to study the perceived location of a stereo image when portions of the signal are moved to a different spatial location.  This idea originated from a suggested reading, a paper on Digital Theater Systems’ (DTS) surround sound encoding called “Coherent Acoustics.”  In the paper, Smyth (1999) specifically mentions, “experimental evidence suggests that it is difficult to localize mid-to-high frequency signals above about 2.5 kHz, and therefore any stereo imagery is largely dependent on the accurate reproduction of only the low-frequency components of the audio signal” (p. 18).  This seems to contradict popular opinion and warranted a preliminary investigation into Smyth’s claims.

Early investigation into this topic included well-known texts such as Blauert (1999), Yost (1987), and Begault (1994), as well as some previous UM graduate research by West (1998) and Ballman (1990).  There are also many good summary articles, such as those from Hartmann (1999) and Kendall (1995).  These and other sources suggest that Smyth’s (1999) hypothesis, that a sound’s spatial location is dominated by low-frequency rather than high-frequency localization cues, is well established among auditory scientists.

To study this idea further, a preliminary experiment investigated whether high frequencies could be limited to a mono tweeter in the same way that low frequencies are sent to a mono subwoofer.  A typical stereo loudspeaker system was set up, with a separate tweeter added directly in front of the listener (see Figure 2).  The stereo signal was processed so that all frequencies above 10 kHz were sent to the central tweeter, while frequencies below 10 kHz were played from their respective left/right speakers.  Using a wide range of music, it became clear that while the spectral balance was kept mostly intact, the image localization of high-pitched instruments was sometimes different.

Figure 2: Preliminary experiments tested “mono-ized” high frequencies
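The routing used in this preliminary experiment can be illustrated with a short Python sketch.  The thesis does not state which crossover filter was used, so the first-order (one-pole) complementary split below is purely an assumption chosen for brevity, as are the function names split_bands and mono_tweeter_feeds and the 0.5 scaling applied when the two high bands are summed to mono.  A real crossover would likely use a much steeper filter.

```python
import math


def split_bands(samples, sample_rate, crossover_hz=10_000.0):
    """Split one channel into complementary low and high bands.

    Assumption: a one-pole low-pass defines the low band, and the high
    band is the remainder, so low[i] + high[i] equals the input sample.
    """
    # One-pole smoothing coefficient for the chosen crossover frequency.
    alpha = 1.0 - math.exp(-2.0 * math.pi * crossover_hz / sample_rate)
    low, high = [], []
    state = 0.0
    for x in samples:
        state += alpha * (x - state)  # low-pass filter update
        low.append(state)
        high.append(x - state)        # everything the low band left out
    return low, high


def mono_tweeter_feeds(left, right, sample_rate, crossover_hz=10_000.0):
    """Return (left_low, right_low, center_high) loudspeaker feeds."""
    left_low, left_high = split_bands(left, sample_rate, crossover_hz)
    right_low, right_high = split_bands(right, sample_rate, crossover_hz)
    # Assumption: the two high bands are averaged into one mono feed.
    center_high = [0.5 * (l + r) for l, r in zip(left_high, right_high)]
    return left_low, right_low, center_high
```

With a hard-left-panned input (the right channel all zeros), the left speaker feed carries only the low band, and whatever high-band content the signal has is sent to the center feed.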

In fact, the image’s sound stage position seemed to depend on the spectral energy distribution of its frequencies relative to the 10 kHz crossover.  Essentially, the more high-frequency energy the instrument had, the farther towards the tweeter (center sound stage) it was pulled.  For example, a hard-left-panned cello was correctly localized at the left because of its dominant lower-frequency energy, while a left-panned cymbal crash was now heard somewhere between left and center.
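One rough way to quantify the “spectral energy distribution relative to the crossover” is the fraction of a signal’s energy that lies at or above 10 kHz.  The Python sketch below is only an illustration of that idea, not a measure used in this thesis; the function name high_band_energy_fraction is hypothetical, and a naive discrete Fourier transform is used to keep the example self-contained.

```python
import cmath
import math


def high_band_energy_fraction(samples, sample_rate, crossover_hz=10_000.0):
    """Fraction of spectral energy at or above the crossover frequency.

    Assumption: energy is summed over the one-sided spectrum of a naive
    DFT (O(n^2)), which is adequate only for short illustrative signals.
    """
    n = len(samples)
    total = 0.0
    high = 0.0
    for k in range(n // 2 + 1):
        # DFT coefficient for frequency bin k.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        energy = abs(coeff) ** 2
        total += energy
        if k * sample_rate / n >= crossover_hz:
            high += energy
    return high / total if total > 0.0 else 0.0
```

Under this measure, a signal dominated by low frequencies (such as the cello in the example above) would yield a fraction near zero, while a signal with substantial content above 10 kHz (such as the cymbal crash) would yield a larger fraction.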

This result mandated a new direction for the thesis.  One option would have been to investigate how noticeable this type of image shift actually was.  After all, would most people even realize the cymbal image had shifted towards the center?  While this line of questioning might yield interesting marketing data (and was actually pursued for a small portion of this thesis), it would be difficult to explain the results scientifically because of the many potential variables.

Instead, this thesis compared the relative impact that spatially relocating low versus high frequencies has on the perceived horizontal location of a stereo image.  The experimental setup would be a less severe version of that shown in Figure 2, bringing a full-range center speaker much closer to one of the stereo pair.  Also, rather than moving only high frequencies, the objective would be to move various portions of the audible spectrum to this offset speaker to see which had the greatest effect on the perceived location.  The details of the experiment will be covered later in this thesis.

To present this research, the topic of localization is first introduced.  This includes a discussion of the various localization cues and the results of historical experiments measuring human accuracy, precision, and sensitivity to those cues.  Yet for practical purposes, it is more important to consider the physical nature of the cues and their relative perceptual significance.  The topic of auditory scene analysis is also introduced as it relates to this research.  Scene analysis shows how conflicting spatial and spectral cues, as well as the acoustic space, might impact the “integration” (fusion) and localization of sound events.  Next, the specific experimental setup and methodologies used in this research are reviewed.  This is followed by the results, which are presented and analyzed in various forms.  Finally, conclusions are drawn and future areas of research involving this topic are suggested.

 Created  February 2003 by Rob Hartman
Copyright (C) 2003