Having detailed the background theory on the localization topics
surrounding this thesis, it is now important to lay out the experiments
used in this research. This includes an explanation of the listening
tests and the reasoning behind their design. The following sections
present the experimental conditions, the test signals and variables,
and the equipment and test methodology.
Experimental Conditions
The goal of the listening tests was to determine whether low or high frequencies have a greater impact on the localization of a stereo image. In essence, the listeners were asked to comment on the change in position of the image between two different conditions. The two conditions were created by relocating different frequency bands (i.e., low vs. high) of the left stereo signal to a new spatial position.
With this in mind, the devised test setup consisted of a right speaker (R) and its complementary left speaker (L) at the same distance (2 m) and symmetrical angles from midline. A third speaker, called the “Spatially Relocated” (SR) speaker, was positioned at the same distance from the listener, but closer to midline than the left speaker by a small azimuth delta (see Figure 14).
Figure 14: Physical setup of test
The angular separation of the stereo speakers was partially based on this thesis’s ties to automotive applications. Thus, it was desirable to place them wider than a typical stereo system, which Blauert (1999) suggested as ±30º from midline. With some additional constraints from room and positioning factors, the L/R speakers were placed at ±40º from midline. The relative position of the SR speaker was also important. There needed to be enough angular separation to shift the stereo image, yet not so much that the L and SR signals might segregate. The separation between the L and SR speakers therefore had to clearly exceed the typical minimum audible angle (MAA). Several papers on this topic place this threshold in the area of 5-10º (Mills, 1958; Stevens & Newman, 1936). It was therefore felt that 15º would be far enough to be noticed without splitting the L and SR auditory streams.
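As a rough illustration of the geometry described above, the following sketch converts the speaker azimuths into floor coordinates. It assumes the listener sits at the origin facing the +y direction, and it reads the 15º delta as placing the SR speaker at 25º left of midline. The function and variable names are hypothetical and not part of the original test setup.

```python
import math

DISTANCE_M = 2.0   # all three speakers are 2 m from the listener
LR_ANGLE = 40.0    # L/R speakers at +/-40 degrees from midline
SR_DELTA = 15.0    # SR speaker is 15 degrees closer to midline than L

def speaker_xy(azimuth_deg, distance_m=DISTANCE_M):
    """Return (x, y) in metres; negative azimuth is to the listener's left."""
    a = math.radians(azimuth_deg)
    return (distance_m * math.sin(a), distance_m * math.cos(a))

# Assumption: the SR speaker sits on the left side, at -(40 - 15) = -25 degrees.
speakers = {
    "L": speaker_xy(-LR_ANGLE),
    "SR": speaker_xy(-(LR_ANGLE - SR_DELTA)),
    "R": speaker_xy(LR_ANGLE),
}

for name, (x, y) in speakers.items():
    print(f"{name}: x = {x:+.2f} m, y = {y:.2f} m")
```

Under these assumptions the SR speaker lies between the L speaker and the midline, at the same 2 m radius as the other two.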
Test Signals
With the setup decided, the next step was to choose test signals that would give useful data. The spectral content and distribution of the signal was the most crucial factor. It was felt that a wideband signal with a strong center image would make an ideal test track. However, this is difficult to find in popular music, which is why white noise was chosen as the primary test signal (track3_white_noise_bursts.wav). Because white noise is not particularly interesting to listen to, a music track was also used in an attempt to collect more data while keeping the listener’s attention. Of course, it was expected that the results of the music track might be difficult to explain because of the many variables that music introduces, such as time-varying spectral content.
The listener was tested with the music passage first because it contained stereo images that were easier to conceptualize than those created by white noise. It was thus desirable to find a music track with a strong, consistent central stereo image and a reasonably wide bandwidth. A fairly simple, almost monophonic sound stage would also make it easier to notice shifts in the image’s position. The female voice seemed to comply reasonably well with all of these constraints, so an 8 second clip (0:27-0:34) from Joan Baez’s “Diamonds and Rust” was used (track7_lady_sing.wav).
In order to present a comprehensive description of this track, consider the spectrogram in Figure 15. The amplitude of the signal is represented by relative color intensity as shown in the color bar indicator, while time is shown on the x-axis and frequency on the y-axis. This analysis was performed using Matlab (see Appendix C for code). The total energy of the L/R channels was also calculated for the entire signal (Figure 16) and the various subbands (Figure 17), including the grouped subband representations of CD and CDE (Figure 18).
Figure 15: Spectrogram of music passage L (top) and R (bottom)
Figure 16: Music passage temporal (top) and total energy (bottom)
Figure 17: Music passage’s subband energy
Figure 18: Music passage’s combined subband energy
Moving on to the white noise test signal, it was decided that short noise bursts would be even easier to localize than a continuous noise segment, because of the additional transient localization cues that bursts provide. Therefore, six 250 ms bursts of white noise, with 20 ms onset/offset ramps and 300 ms silent intervals, were used as the primary test signal. The spectrogram of the noise bursts can be seen in Figure 19.
Figure 19: Spectrogram of white noise passage
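The burst train described above can be sketched in a few lines. This is only an illustration, not the code used to produce the test track. The 44.1 kHz sample rate, the raised-cosine ramp shape, the uniform noise distribution, and the choice to count the ramps as part of each 250 ms burst are all assumptions, since the text does not specify them.

```python
import math
import random

FS = 44100  # assumed sample rate (CD audio); not stated in the text

def ramped_burst(rng, dur_ms=250, ramp_ms=20, fs=FS):
    """One white noise burst with raised-cosine onset/offset ramps."""
    n = round(fs * dur_ms / 1000)
    n_ramp = round(fs * ramp_ms / 1000)
    burst = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for i in range(n_ramp):
        gain = 0.5 * (1.0 - math.cos(math.pi * i / n_ramp))
        burst[i] *= gain            # fade in
        burst[n - 1 - i] *= gain    # fade out (mirror image)
    return burst

def burst_train(n_bursts=6, gap_ms=300, fs=FS, seed=0):
    """Concatenate bursts with silent intervals between them."""
    rng = random.Random(seed)
    gap = [0.0] * round(fs * gap_ms / 1000)
    signal = []
    for k in range(n_bursts):
        signal.extend(ramped_burst(rng, fs=fs))
        if k < n_bursts - 1:        # assumed: no trailing silence
            signal.extend(gap)
    return signal

train = burst_train()
print(f"{len(train)} samples, {len(train) / FS:.2f} s")
```

With six bursts and five gaps, the train lasts 6 × 250 ms + 5 × 300 ms = 3 s under these assumptions.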
The next decision involved choosing which portion of the test signals would
be relocated to the SR channel. During the background localization
research, it was noticed that the localization cues were often discussed
in terms of the range of frequencies in which they were most effective.
Therefore, it seemed reasonable to use the boundaries of these ranges to
define bands of frequencies, each known to have a dominant localization cue.
This resulted in the following bands:
Band A = 20-800 Hz
Band B = 800-1600 Hz
Band C = 1600-5000 Hz
Band D = 5000-12000 Hz
Band E = 12000-20000 Hz
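For reference, the band table above can be written as a small lookup. This is a minimal sketch. The half-open intervals (lower edge inclusive, upper edge exclusive) are an assumption made here so that shared edges such as 800 Hz map to exactly one band. The text does not say how the edges were handled.

```python
# Band edges in Hz, taken from the list above.
BANDS = {
    "A": (20, 800),
    "B": (800, 1600),
    "C": (1600, 5000),
    "D": (5000, 12000),
    "E": (12000, 20000),
}

def band_of(freq_hz):
    """Return the band letter containing freq_hz, or None if out of range."""
    for name, (low, high) in BANDS.items():
        if low <= freq_hz < high:   # assumed: lower edge inclusive
            return name
    return None

def span_of(group):
    """Return the (low, high) range covered by a group such as 'CDE'."""
    return (BANDS[group[0]][0], BANDS[group[-1]][1])

print(band_of(1000), span_of("DE"), span_of("CDE"))
```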
It is necessary to provide some insight into the variables used in the experiments. Potential variables included loudspeaker position (horizontal vs. vertical), signal intensity, listening material, acoustic space, and loudspeakers. As mentioned, this thesis would focus on localization cues versus frequency. Thus, it was decided that the SR frequency bands would be the lone variables. Yet, five frequency bands still create an excessively large number of combinations to potentially test in a span of thirty minutes. Again keeping with the theme of low vs. high frequencies, the following ten test conditions were implemented:
E vs. A
E vs. AB
DE vs. Stereo (STR)
DE vs. A
DE vs. AB
DE vs. ABC
CDE vs. Stereo (STR)
CDE vs. A
CDE vs. AB
For each of the conditions listed above, the listener would be sequentially
presented with two versions of the signal and asked to comment on the movement
of the stereo image. Thus “E vs. A” would first present “band E”
playing from the SR speaker while “Left - E” played from the L speaker.
This would be followed by a short silent interval. The listener would
then be presented, in this case, with “band A” playing from the SR speaker
while “Left - A” played from the left speaker. Again, the R speaker
always played the original right stereo signal.
To illustrate this technique, consider Figure 20, which shows how certain
bands of the L signal are relocated to the SR channel. In fact, the speaker
configuration previously shown in Figure 14 is ideal because, without the
SR channel, a solid center image can also be presented to the listener.
However, the test’s primary function was to move spectral energy of the
L channel to the SR speaker in order to move the stereo image. The
presumption was that the stereo image would shift to the right to varying
degrees, based on the competing L and SR localization cues.
Figure 20: Division of Frequencies between L and SR speaker
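The routing for each half of a trial can be summarized with a short sketch. The helper name `route` and the string encoding of the band groups are hypothetical, and "STR" is taken here to mean that no bands are sent to the SR speaker, so the L speaker plays the full left signal.

```python
ALL_BANDS = "ABCDE"

def route(sr_bands):
    """Split the left channel's bands between the SR and L speakers.

    sr_bands is a string such as "DE", or "STR" for plain stereo.
    The R speaker always plays the original right channel.
    """
    sr = "" if sr_bands == "STR" else sr_bands
    unknown = set(sr) - set(ALL_BANDS)
    if unknown:
        raise ValueError(f"unknown band(s): {sorted(unknown)}")
    left = "".join(b for b in ALL_BANDS if b not in sr)  # "Left - bands"
    return {"SR": sr, "L": left}

# The "E vs. A" condition: first presentation, then second presentation.
for label in ("E", "A"):
    r = route(label)
    print(f"SR plays {r['SR'] or 'nothing'}, L plays {r['L']}")
```

For "E vs. A", this gives SR = E with L = ABCD for the first presentation, then SR = A with L = BCDE for the second.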
With the experimental details and variables decided, it was necessary to
also standardize the process of the experiment. Thus, before the
test subject entered the room, the speaker configuration was hidden behind
a curtain. Upon entering the room, the listeners were asked to sit
in a fixed chair and wear sunglasses with opaque lenses. A laser-pointing
device was used to align the listener’s ear canal with the top of the speaker’s
woofer. The curtain was then removed, leaving the listener positioned at the
midline of the stereo (L/R) speaker array (see Figure 14).
They were then read the following passage:
I have put you in an ideal listening position. Please try to keep
from rotating the chair, or moving your head or body. These listening
tests are investigating the human ability to locate sounds. This
is generically called localization. As you may have noticed in your
casual music listening, a stereo system can recreate realistic audio “images”
where the singer, or instrument, sounds as if it is coming from some point
in front of you, but between the actual loudspeakers.
For instance, I will play you a short passage of music to illustrate.
Notice that the piano sound is directly in front of you, the triangle comes
from your left and the cello comes from your right. [play track 1: track1_introimages.wav]
Now listen to a modified version of that track. Notice that the piano
is no longer directly in front of you, but has moved off to your right.
These are the kinds of differences I am going to ask you to pay attention
to [play track 2: track2_introshift.wav].
To begin with, I will give you a couple of practice runs. I ask that
you pay particular attention to the female voice in the center. I
will play you two versions of this clip, with a short pause in between
them. Then I will ask you to comment on the position of the singer
in the second clip relative to the first.
Realize that it is the RELATIVE position, left/right/up/down/far/near or any combination
of these, which I am asking you for. Thus, if both images sound as
if they are to the right of center, I am asking you to tell me if the second
image you hear has moved to the left or right, or other direction, of the
first image. If they don’t change, simply tell me they are the same.
You may also experience a feeling where part of the sound comes from one location
and another part of that same image seems to come from a different location.
This is called a “split” image, and I would ask that you tell me when you
notice this occurring.
As mentioned, the musical passage was played first because it was easier
to identify localization shifts. This was followed by the white noise
bursts test signal. For each test subject, the variables (varied bands
played from the SR channel) were randomized within each set of trials.
The Arcade DSP Amplifier presets were used to switch between the variables
while the CD player would repeat the same track twice per trial.
For a brief description of the Arcade Amplifier’s capabilities, see the
following section.
However, because the amplifier was limited to 8 presets (and 10 variables
were desired), the tests were further broken into two sub-trials (see Table 1).
The “DE vs.” variables were repeated partially to even out the two sub-trials,
but also because it was felt this comparison was the crux of the thesis.
More trials also provide higher statistical confidence in the results.
Table 1: Trial subband variables
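The per-subject randomization described above might be sketched as follows. This is an illustration only. It uses the conditions as listed earlier, plays the music set before the noise set, and seeds the shuffle from a hypothetical subject number so that an order can be reproduced. It does not model the split into two sub-trials, because the contents of Table 1 are not reproduced here.

```python
import random

# Conditions as listed earlier: (first SR bands, second SR bands).
CONDITIONS = [
    ("E", "A"), ("E", "AB"),
    ("DE", "STR"), ("DE", "A"), ("DE", "AB"), ("DE", "ABC"),
    ("CDE", "STR"), ("CDE", "A"), ("CDE", "AB"),
]
SIGNALS = ["music", "noise"]  # music passage first, then noise bursts

def session_plan(subject_id):
    """Return (signal, first, second) tuples, shuffled within each signal set."""
    plan = []
    for signal in SIGNALS:
        # Assumed: a per-subject, per-signal seed for reproducibility.
        rng = random.Random(f"{subject_id}-{signal}")
        order = list(CONDITIONS)
        rng.shuffle(order)
        plan.extend((signal, first, second) for first, second in order)
    return plan

for signal, first, second in session_plan(subject_id=1):
    print(f"{signal}: {first} vs. {second}")
```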
The room used throughout these experiments was located on the top floor
of the Gusman Concert Hall on the university’s campus. It is affectionately
called the “dead room” because of its pseudo-anechoic response, created
by acoustical treatment on the walls. The general dimensions and
shape of the room can be seen in Figure 21, while Appendix D has a
measured frequency response.
Figure 21: Gusman “dead” room basic dimensions
The following equipment was used for the listening experiments:
Loudspeakers: Miller & Kreisel (M&K) model MPS-1610, a 2-way near/mid-field monitor with a 1” tweeter and 6.5” (4?) woofer. Frequency response is 80 Hz-20 kHz ±2 dB, with a passive crossover at 1.2 kHz. MSRP is $650 each. See Appendix E for a measured frequency response.
Speaker Stands: Studio Tech SN-A adjustable metal satellite speaker stands
CD Player: “HP CD-Writer Plus” CD-ROM drive was used to play audio CDs directly to the Power Amp via line level output.
Amplifier: Proprietary DSP-based power amplifier intended for multi-channel car audio. Accompanied by Windows-based “Arcade v1.2” software provided by Kevin Heber of Delphi Automotive Systems. Allowed using stereo input to distribute and manipulate multi-channel audio outputs.
Power Supply: Hewlett Packard (HP) supply, up to 10 A.
Windows Personal Computer (PC): Windows-based PC to run Power Amp software and CD player.
The electrical setup of the experiment is shown in Figure 22.
Figure 22: Electrical schematic of test setup
The DSP-based amplifier was a critical piece of equipment, used for both developing and testing ideas as well as the final listening experiments (see Figure 23 and Figure 24). It has the ability to perform multi-input and multi-output frequency and temporal signal processing of up to four stereo inputs and twelve stereo outputs.
Besides signal processing, it is also a power amplifier with the capability of 12x35 watts into 8 ohms at < 1% THD+N. The amplifier had a serial interface to a PC, which ran a software program called “Arcade.” The user interface allowed the input/output and signal processing parameters to be varied and tested. It also allowed up to 8 different preset amplifier configurations to be instantly changed in real time. This was used during listening tests to change the variables of the experiment between test tracks.
Figure 23: Arcade’s Main Program Screen
Created February 2003 by Rob Hartman
Copyright (C) 2003