Chapter 5: Experimentation



         Having detailed the background theory for the localization topics this thesis addresses, it is now important to lay out the experiments used in this research.  This includes an explanation of the listening tests and the reasoning behind their design.  The following sections present the experimental conditions, the test signals, and the variables used for this research.  The test methodology and the equipment are then discussed.



Experimental Conditions

 The goal of the listening tests was to identify whether low or high frequencies have a greater impact on the localization of a stereo image.  In essence, the listeners would be asked to comment on the change in position of the image between two different conditions.  The two conditions were created by relocating different frequency bands (i.e. low vs. high) of the left stereo signal to a new spatial position.

            With this in mind, the devised test setup consisted of a right speaker (R) and its complementary left speaker (L) at the same distance (2 m) and symmetrical angle from midline.  A third speaker, called the “Spatially Relocated” (SR) speaker, was positioned at the same distance from the listener, but closer to midline than the left speaker by a small azimuth delta (see Figure 14).

Figure 14: Physical setup of test

                The stereo speaker angular separation was partially based on this thesis’ ties to automotive applications.  Thus, it was desirable to have the speakers wider than a typical stereo system, which Blauert (1999) suggested as ±30° from midline.  With some additional constraints from room and positioning factors, the L/R speakers were placed at ±40° from midline.  The relative position of the SR speaker was also important.  There needed to be enough angular separation to cause shifts of the stereo image, yet not so much that the L and SR signals might segregate.  Thus, the SR speaker had to be clearly outside the typical minimum audible angle (MAA).  Several papers on this topic showed a threshold in the area of 5-10° (Mills, 1958; Stevens & Newman, 1936).  Thus it was felt that 15° would be far enough to be noticed without splitting the L and SR auditory streams.
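The geometry described above can be checked with a short calculation.  The following is only an illustrative sketch: the coordinate frame (listener at the origin facing +y, negative azimuth to the left) and all names are assumptions made for this example, not part of the original test procedure.

```python
import math

DISTANCE_M = 2.0          # all three speakers sit 2 m from the listener
STEREO_ANGLE_DEG = 40.0   # L/R speakers at +/-40 degrees from midline
SR_OFFSET_DEG = 15.0      # SR is 15 degrees closer to midline than L

def speaker_xy(azimuth_deg, distance_m=DISTANCE_M):
    """Return (x, y) in metres; listener at origin facing +y, left is negative x."""
    a = math.radians(azimuth_deg)
    return (distance_m * math.sin(a), distance_m * math.cos(a))

# Azimuth of each speaker (assumed sign convention: left is negative)
azimuths = {
    "L": -STEREO_ANGLE_DEG,
    "SR": -STEREO_ANGLE_DEG + SR_OFFSET_DEG,
    "R": STEREO_ANGLE_DEG,
}
positions = {name: speaker_xy(az) for name, az in azimuths.items()}

# Straight-line spacing between the L and SR speakers
l_sr_spacing_m = math.dist(positions["L"], positions["SR"])
```

Under these assumptions the SR speaker sits at -25° and the L and SR speakers are roughly 0.52 m apart.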



Test Signals

    With the setup decided, it was next important to choose test signals that would give useful data.  The spectral content and distribution of the signal were the most crucial factors.  It was felt that a wideband signal with a strong center image would make an ideal test track.  However, this is difficult to find in popular music, which is why white noise was chosen as the primary test signal (track3_white_noise_bursts.wav).

    Because white noise is not particularly interesting to listen to, a music track was also used in an attempt to collect more data while keeping the listener’s attention.  Of course, it was expected that the results of the music track might be difficult to explain because of the many variables that music introduces, such as time-varying spectral content.

     The listener was tested with the music passage first because it contained stereo images which were easier to conceptualize than those created by white noise.  It was thus desirable to find a music track with a strong, consistent central stereo image and a reasonably wide bandwidth.  A fairly simple, almost monophonic sound stage would also make it easier to notice shifts in the image’s position.  The female voice seemed to reasonably satisfy all of these constraints, so an 8-second clip (0:27-0:34) from Joan Baez’s “Diamonds and Rust” was used (track7_lady_sing.wav).

                In order to present a comprehensive description of this track, consider the spectrogram in Figure 15.  The amplitude of the signal is represented by relative color intensity as shown in the color bar indicator, while time is shown on the x-axis and frequency on the y-axis.  The spectrogram was computed using Matlab (see Appendix C for code).  The total energy of the L/R channels was also calculated for the entire signal (Figure 16) and the various subbands (Figure 17), including the grouped subband representations of CD and CDE (Figure 18).

Figure 15: Spectrogram of music passage L (top) and R (bottom)
 
 
 
 

Figure 16: Music passage temporal (top) and total energy (bottom)
 


Figure 17: Music passage’s subband energy

Figure 18: Music passage’s combined subband energy

Moving on to the white noise test signal, it was decided that short noise bursts would be even easier to localize than a continuous noise segment.  This is because of the additional transient localization cues that occur.  Therefore, six 250 ms bursts of white noise, with 20 ms onset/offset ramps and 300 ms silent intervals, were used as the primary test signal.  The spectrogram of the noise bursts can be seen in Figure 19.

Figure 19: Spectrogram of white noise passage
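The burst timing described above can be sketched in a few lines.  This is not the code used to create the actual test track; it is a minimal illustration that assumes a 44.1 kHz sample rate, linear ramps that fall inside each 250 ms burst, uniformly distributed noise samples, and silence only between bursts.  All function names are hypothetical.

```python
import random

SAMPLE_RATE = 44100   # assumed CD sample rate
BURST_MS, RAMP_MS, GAP_MS, N_BURSTS = 250, 20, 300, 6

def ms_to_samples(ms, rate=SAMPLE_RATE):
    """Convert a duration in milliseconds to a whole number of samples."""
    return int(round(ms * rate / 1000))

def make_burst(rng, rate=SAMPLE_RATE):
    """One white-noise burst with linear onset/offset ramps (assumed shape)."""
    n = ms_to_samples(BURST_MS, rate)
    ramp = ms_to_samples(RAMP_MS, rate)
    burst = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for i in range(ramp):
        gain = i / ramp            # rises from 0 toward 1 across the ramp
        burst[i] *= gain           # onset ramp
        burst[n - 1 - i] *= gain   # mirrored offset ramp
    return burst

def make_burst_train(seed=0, rate=SAMPLE_RATE):
    """Concatenate the bursts, separated by silent gaps."""
    rng = random.Random(seed)
    gap = [0.0] * ms_to_samples(GAP_MS, rate)
    signal = []
    for k in range(N_BURSTS):
        signal.extend(make_burst(rng, rate))
        if k < N_BURSTS - 1:
            signal.extend(gap)
    return signal

signal = make_burst_train()
duration_s = len(signal) / SAMPLE_RATE
```

With these assumptions the whole passage lasts 3.0 s (six 250 ms bursts plus five 300 ms gaps).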


        The next decision involved choosing which portion of the test signals would be relocated to the SR channel.  During the background localization research, it was noticed that the localization cues were often discussed in terms of the range of frequencies they were most effective in.  Therefore, it seemed reasonable to use the edges of these ranges as boundaries, creating bands of frequencies that were each known to have a dominant localization cue.  This resulted in defining the following bands:

  • Band A = 20-800 Hz

  • Band B = 800-1600 Hz

  • Band C = 1600-5000 Hz

  • Band D = 5000-12000 Hz

  • Band E = 12000-20000 Hz 

                Recall that localization cues are generically categorized into ITDs, ILDs, and pinnae effects.  Blauert (1999) states that the localization effects of IATD/IPD greatly decrease above 800 Hz, ultimately having no effect above 1600 Hz.  This led to the development of bands A and B.  Next, the pinnae effects are thought to have the most impact from 5-12 kHz, which is what band D represents.  Band E represents the highest frequency band and is also conveniently close to the range that Smyth (1999) discussed in his paper on DTS’ Coherent Acoustics.  Band C was a remnant of the other bands, but is also known to contain monaural cues caused by the torso.
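As a compact restatement of the band definitions, the sketch below stores the five bands in a lookup table and maps a frequency to its band.  The names are hypothetical, and assigning a shared edge frequency (for example 800 Hz) to the higher band is an assumption made here for illustration; the text does not specify it.

```python
# Band edges in Hz, as defined in the text
BANDS_HZ = {
    "A": (20, 800),
    "B": (800, 1600),
    "C": (1600, 5000),
    "D": (5000, 12000),
    "E": (12000, 20000),
}

def band_of(freq_hz):
    """Return the band label containing freq_hz, or None outside 20 Hz-20 kHz.

    Shared edge frequencies go to the higher band (an assumption).
    """
    for label, (lo, hi) in BANDS_HZ.items():
        if lo <= freq_hz < hi:
            return label
    # Include the very top edge in band E
    return "E" if freq_hz == 20000 else None

def group_range(labels):
    """Frequency span of a contiguous band group such as 'DE' or 'ABC'."""
    los, his = zip(*(BANDS_HZ[b] for b in labels))
    return (min(los), max(his))
```

For example, `group_range("CDE")` gives the 1600-20000 Hz span used for the grouped CDE subband.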



Experimental Variables

It is necessary to provide some insight into the variables used in the experiments.  Potential variables included loudspeaker position (horizontal vs. vertical), signal intensity, listening material, acoustic space, and loudspeakers.  As mentioned, this thesis would focus on localization cues versus frequency.  Thus, it was decided that the SR frequency bands would be the lone variables.  Yet, five frequency bands still create an excessively large number of combinations to potentially test in a span of thirty minutes.  Again keeping with the theme of low vs. high frequencies, the following ten test conditions were implemented:

  • E vs. A

  • E vs. AB

  • DE vs. Stereo (STR)

  • DE vs. A

  • DE vs. AB

  • DE vs. ABC

  • CDE vs. Stereo (STR)

  • CDE vs. A

  • CDE vs. AB





Test Methodology

                For each of the conditions listed above, the listener would be sequentially presented with two versions of the signal and asked to comment on the movement of the stereo image.  Thus “E vs. A” would first present “band E” playing from the SR speaker while “Left - E” played from the L speaker.  This would be followed by a short silent interval.  The listener would then be presented, in this case, with “band A” playing from the SR speaker while “Left - A” played from the left speaker.  Again, the R speaker always played the original right stereo signal.   To reiterate this technique, consider Figure 20.  This shows how certain bands of the L signal are relocated to the SR channel.  In fact, the speaker configuration previously shown in Figure 14 is ideal because without the SR channel, a solid center image can also be presented to the listener.  However, the test’s primary function was to move spectral energy of the L channel to the SR speaker in order to move the stereo image. Of course, the presumption was that the stereo image would shift to varying degrees to the right, based on the competing L and SR localization cues.

Figure 20: Division of frequencies between L and SR speaker
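The band routing for a single trial can be written out explicitly.  The sketch below is only a bookkeeping illustration of which bands of the left signal go to which speaker; it does not perform any filtering, and the function names and the "STR" shorthand for the plain stereo condition are assumptions for this example.

```python
ALL_BANDS = "ABCDE"

def route_left_channel(sr_bands):
    """Split the left signal's bands between the SR and L speakers.

    sr_bands is a string such as "DE"; "STR" means plain stereo,
    with nothing relocated to the SR speaker.
    """
    if sr_bands == "STR":
        sr = ""
    else:
        unknown = set(sr_bands) - set(ALL_BANDS)
        if unknown:
            raise ValueError(f"unknown band(s): {sorted(unknown)}")
        sr = "".join(b for b in ALL_BANDS if b in sr_bands)
    # Whatever is not relocated stays on the left speaker ("Left - band")
    left = "".join(b for b in ALL_BANDS if b not in sr)
    # The right speaker always plays the full, unmodified right signal
    return {"SR": sr, "L": left, "R": ALL_BANDS}

def describe_trial(condition):
    """Return the routing for the first and second presentation of a trial."""
    first, second = (part.strip() for part in condition.split(" vs. "))
    return [route_left_channel(first), route_left_channel(second)]

trial = describe_trial("E vs. A")
```

Here `trial[0]` places band E on SR with bands ABCD on L, and `trial[1]` places band A on SR with bands BCDE on L, matching the "E vs. A" description above.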

                With the experimental details and variables decided, it was necessary to also standardize the process of the experiment.  Thus, before the test subject entered the room, the speaker configuration was hidden behind a curtain.  Upon entering the room, the listeners were asked to sit in a fixed chair and wear sunglasses with opaque lenses.  A laser-pointing device was used to align the listener’s ear canal with the top of the speaker’s woofer.  The curtain was then removed, placing the listener at the midline of the stereo (L/R) speaker array (see Figure 14).  They were then read the following passage:
 

 

    I have put you in an ideal listening position.  Please try to keep from rotating the chair, or moving your head or body.  These listening tests are investigating the human ability to locate sounds.  This is generically called localization.  As you may have noticed in your casual music listening, a stereo system can recreate realistic audio “images” where the singer, or instrument, sounds as if it is coming from some point in front of you, but between the actual loudspeakers.

For instance, I will play you a short passage of music to illustrate.  Notice that the piano sound is directly in front of you, the triangle comes from your left and the cello comes from your right. [play track 1: track1_introimages.wav]  Now listen to a modified version of that track.  Notice that the piano is no longer directly in front of you, but has moved off to your right.  These are the kinds of differences I am going to ask you to pay attention to [play track 2: track2_introshift.wav].

To begin with, I will give you a couple of practice runs.  I ask that you pay particular attention to the female voice in the center.  I will play you two versions of this clip, with a short pause in between them.  Then I will ask you to comment on the position of the singer in the second clip relative to the first.

Realize that it is the RELATIVE position, left/right/up/down/far/near or any combination of these, which I am asking you for.  Thus, if both images sound as if they are to the right of center, I am asking you to tell me if the second image you hear has moved to the left or right, or other direction, of the first image.  If they don’t change, simply tell me they are the same.

You may also experience a feeling where part of the sound comes from one location and another part of that same image seems to come from a different location.  This is called a “split” image, and I would ask that you tell me when you notice this occurring.

    As mentioned, the musical passage was played first because it was easier to identify localization shifts.  This was followed by the white noise bursts test signal. For each test subject, the variables (varied bands played from the SR channel) were randomized within each set of trials.  The Arcade DSP Amplifier presets were used to switch between the variables while the CD player would repeat the same track twice per trial.  For a brief description of the Arcade Amplifier’s capabilities, see the following section.

    However, because the amplifier was limited to 8 presets (and 10 variables were desired), the tests were further broken into two sub-trials (see Table 1).  The “DE vs.” variables were repeated partially to even out the two sub-trials, but also because it was felt this comparison was the crux of the thesis.  The additional trials also provide higher statistical confidence in the results.

Table 1: Trial subband variables




Test Equipment

                 The room used throughout these experiments was located on the top floor of the Gusman Concert Hall on the university’s campus.  It is affectionately called the “dead room” because of its pseudo-anechoic response created by acoustical treatment on the walls.  The general dimensions and shape of the room can be seen in Figure 21, while Appendix D has a measured frequency response.

Figure 21: Gusman "dead" room basic dimensions

Next, the following equipment was used for the listening experiments:

  • Loudspeakers: Miller & Kreisel (M&K) model MPS-1610, a 2-way near/mid-field monitor with a 1” tweeter and 6.5” (4?) woofer.  Their frequency response is 80 Hz-20 kHz ± 2 dB, with a passive crossover at 1.2 kHz.  MSRP is $650 each.  See Appendix E for a measured frequency response.

  • Speaker Stands: Studio Tech SN-A adjustable metal satellite speaker stands

  • CD Player: “HP CD-Writer Plus” CD-ROM drive was used to play audio CDs directly to the Power Amp via line level output.

  • Amplifier: Proprietary DSP-based power amplifier intended for multi-channel car audio.  Accompanied by Windows-based “Arcade v1.2” software provided by Kevin Heber of Delphi Automotive Systems.  Allowed using stereo input to distribute and manipulate multi-channel audio outputs.

  • Power Supply: Hewlett Packard (HP) supply, up to 10 A.

  • Windows Personal Computer (PC): Windows-based PC to run Power Amp software and CD player.


The electrical setup of the experiment is shown in Figure 22.


Figure 22: Electrical schematic of test setup

The DSP-based amplifier was a critical piece of equipment, used both for developing and testing ideas and for the final listening experiments (see Figure 23 and Figure 24).  It has the ability to perform multi-input and multi-output frequency and temporal signal processing of up to four stereo inputs and twelve stereo outputs.  Besides signal processing, it is also a power amplifier with the capability of 12x35 watts into 8 ohms at < 1% THD+N.  The amplifier had a serial interface to a PC, which ran a software program called “Arcade.”  The user interface allowed the input/output and signal processing parameters to be varied and tested.  It also allowed switching instantly, in real time, among up to 8 different preset amplifier configurations.  This was used during listening tests to change the variables of the experiment between test tracks.
 

Figure 23: Arcade’s Main Program Screen

Figure 24: Arcade’s Amplifier Configuration screen with 8 selector bars (at top)



 
 


Created  February 2003 by Rob Hartman
Copyright (C) 2003