Chapter 6: Results and Analysis


        Fourteen test subjects completed the listening tests previously described.  They were all students in the Music Engineering or Audio Engineering programs at the University of Miami.  All were assumed to have normal hearing, but should be considered untrained listeners.  However, most were musicians and could thus be expected to have above-average listening abilities compared with typical untrained listeners.

        The test subjects were asked to comment on the relative movement of a centrally located audio image caused by spatially relocating various bands of the left stereo channel.  Their vocabulary was not restricted; however, their answers were interpreted and entered into eight image movement categories (or combinations of them): No Shift, Right, Left, Up, Down, Near, Far, and Split.  Occasionally, because of time constraints, a few trials were skipped, so the total number of trials varies somewhat between comparisons.


Experimental Results

    The following plots include a graphical representation of each listening test (see Appendix A for complete results).  They begin with a summary of the “music track” results, followed by a plot for each individual SR band that was tested with the music track.  Then, the summary plot of the “noise burst track” is presented, similarly followed by the results of each band for the noise track.  Obviously, the numbers of “no shift,” “left,” and “right” responses were of primary interest for this research.  However, “up” and “down” responses were occasionally mentioned, which is why they were included in the two summary plots.

    The x-axis groups are arranged so that each group shows the image shift caused by the second listed frequency band.  For example, Figure 26 shows the results of first moving band E to the SR channel and then comparing that image position with the one caused by relocating nothing (stereo, STR), band A, or band AB.  The first grouping in this plot shows the results of relocating E vs. STR for 14 test subjects.  In this case, 3 subjects felt the stereo image was to the “left” of the E image, 1 subject said it was to the “right,” while the majority (9) felt the image did not shift.

Figure 25: Music Image Summary


 


Figure 26 : Band E vs. Music Image
 
 


Figure 27: Band DE vs. Music Image
 

 


Figure 28: Band CDE vs. Music Image
 

 


Figure 29: Noise Image Summary
 
 
 


Figure 30: Band E vs. Noise Image
 
 
 


Figure 31: Band DE vs. Noise Image
 


Figure 32: Band CDE vs. Noise Image



Analysis

    The overall results of the listening tests suggest that noticeable shifts in the stereo image did occur when portions of the spectrum were relocated to the SR speaker.  Also, the direction and amount of the image’s movement seem to depend on which frequency band was moved to the SR channel.

    It is first necessary to view the collection of results and rule out responses that could be explained by chance.  This is accomplished by calculating significance levels.  The following analytical discussion is based upon an interpretation of the test subjects’ free-form verbal responses.  Specifically, their answers were regrouped into three horizontal-plane categories (no shift, right, or left).  For example, a response of “up and to the left” was interpreted as “left,” whereas a response of “closer” was interpreted as “no shift,” because the subject did not mention any left/right movement.

    Once the data were interpreted, the statistical analysis was performed as guided by Burstein (1990).  Essentially, for any listening test of limited sample size, one must discuss the results in terms of probability, not certainty.  In fact, when providing test results, it is informative to include the “criterion of significance” (α) along with the calculated significance level (α').  For this thesis, two values for the criterion of significance were used: an α of .05 was considered the predominant indicator of positive statistical significance, while an α of .1 was considered weaker, yet still suggestive.

    Note, however, that in this testing, an image that shifted just beyond the minimum audible angle to the right might produce roughly equal numbers of “no shift” and “right” responses.  In this case, both may fall short of the criterion of significance.  Yet, despite two unconvincing significance levels (for right and for no shift), it is apparent from the results that the image did not shift left.  Any cases like this will be explicitly pointed out during this analysis.

    The significance level, P(r), was determined by assuming that the collected data follow the binomial distribution expected from random guessing.  It can be calculated directly using:


 

$$P(r) = \sum_{k=r}^{N} \binom{N}{k}\, x^{k} (1-x)^{N-k}$$

where

N = number of trials

r = number of successes

x = probability of success for one trial based on guessing

    For these results “x” was assumed to be one-third (1/3), allowing equal probability for each of the three categorized responses (no shift, right, left).
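The significance level defined above can be computed directly. The following Python sketch sums the upper tail of the binomial distribution, i.e. the chance of obtaining r or more matching responses out of N trials by guessing alone. The function name `significance_level` is hypothetical, and treating P(r) as this one-tailed sum is an assumption based on the definitions of N, r, and x given here.

```python
from math import comb


def significance_level(r: int, n: int, x: float) -> float:
    """Probability of r or more 'successes' in n trials when each trial
    succeeds by pure guessing with probability x (upper binomial tail)."""
    if not 0 <= r <= n:
        raise ValueError("r must lie between 0 and n")
    return sum(comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(r, n + 1))


# Example: 10 of 14 subjects give the same categorized response, x = 1/3.
alpha_prime = significance_level(10, 14, 1 / 3)
print(f"{alpha_prime:.3f}")  # 0.004
```

With three response categories, x = 1/3; a two-choice test such as ABX would instead use x = 1/2.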

    The music track will be analyzed first in its entirety, followed by the white noise track.  The results for the music passage can be seen in Table 2, where the two criteria of significance (.05 and .1) are highlighted.

Table 2: Significance levels for Music test results

    Considering the results, it was initially surprising to see that no shift was observed for either band E vs. STR (stereo) or DE vs. STR (α'=.004 and .001).  Notice, however, that STR was perceived to the left of band CDE (α'=.000), suggesting that band C must contribute significant energy.  Most of the other highly confident shift results involved the lower frequency bands A, AB, and ABC.  However, one borderline case should also be noted: band CDE vs. A appears to fall somewhere between “no shift” and “right.”  It could thus be considered that A is just right of band CDE.  A visual summary of these results can be seen in Figure 33.

Figure 33: Shift hierarchy of SR bands for Music track

         
    As was mentioned, a low significance level suggests that the change in localization cues was definite and noticeable.  For the music test track, shifts were strong for the lower frequency bands.  However, from the previous analysis of the music track (see Figures 15 through 18), it is apparent that the signal’s energy in the high frequency bands is fairly insignificant.  This most likely explains why listeners did not perceive shifts when the higher frequency bands were relocated.  While this is somewhat disappointing from the perspective of supporting this thesis, the results do support the validity of the listening tests: listeners seem to have reported only shifts that could reasonably have occurred.

    On the other hand, the white noise test track does not have this lack of high frequency energy, because its energy is evenly distributed over the audible frequency range.  The white noise track’s test results are presented in Table 3.  As before, band E shows no shift compared to STR (α'=.001).  Thus, even for the spectrally balanced white noise, moving band E alone produces no noticeable shift.  However, STR is now noticeably to the left of bands DE and CDE (α' of .033 and .000), suggesting that the high frequency bands do have some effect on the overall perceived location.

    More importantly, moving band E versus the lower frequency bands A and AB still produces noticeable shifts towards the right (α' of .000 and .000).  Bands DE and CDE versus the low frequency bands produce similar results, with the image shifting noticeably further to the right for the low frequency bands.  These results alone provide strong support for this thesis: high frequency cues are not as important as low frequency cues for the perception of the horizontal position of a stereo image.  The resulting relative positions of the images can also be seen in Figure 34.

Table 3: Significance levels for Noise test results
 
 

Figure 34: Shift hierarchy of SR bands for Noise track

        One further point of discussion concerns the differences between the results from the music track and the white noise track.  These differences were expected.  The white noise test signal represents the ideal test condition, having a flat energy distribution across the audible spectrum.  The music passage, however, has an uneven distribution of energy across the frequency bands and between the left and right signals, and this distribution also varies with time (see Figure 15).  As was mentioned, this makes it difficult to draw conclusions from the music track’s trials.


Analysis - Loudness Calculations

        With the results disclosed, one might argue that a “low frequency” band from 20-800 Hz (band A) is not a fair comparison against a “high frequency” band from 12-20 kHz (band E).  The more abstract question is what constitutes a fair comparison, given that dominance is relative to the frequency range defined for each band.  What characteristics might have favored the low frequency dominance shown in this testing?  One factor to consider is the loudness of the spatially relocated frequency bands.

        Before calculating loudness, the basic concepts and terminology should be presented.  From the fundamentals of acoustics, a vibrating body creates pressure variations in the surrounding medium, which radiate out from the source and are ultimately transferred to the listener’s eardrums, creating a sensation of loudness.  The magnitude of these pressure variations determines physically measurable quantities such as sound pressure and intensity.  Loudness, a perceptual quality, is related to intensity, but in a non-linear and frequency-dependent fashion.  Other important terms include “loudness level” (measured in Phon) and “loudness” (measured in Sone).  The relationships among these terms are given below (also see ISO 131-1979 (E)):


 

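Sound Pressure Level [dB SPL]:

$$\text{SPL} = 20\log_{10}\frac{p}{p_0}$$

where $p_0 = 20\ \mu\text{Pa}$ and $p \equiv$ RMS pressure at a particular point,

$$\text{or} \quad \text{SPL} = 10\log_{10}\frac{I}{I_0}$$

where $I_0 = 10^{-12}\ \text{W/m}^2$ and $I \equiv$ average intensity,

$$\text{or} \quad \text{SPL} = 10\log_{10}\frac{P}{P_0}$$

where $P_0 = 10^{-12}\ \text{W}$ and $P \equiv$ average acoustical power.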
 


 


        The equations above enable the dB SPL of a sound to be determined from a measurable quantity.  However, to determine the loudness level, the sound in question must be compared with the loudness of a 1000 Hz tone.  For instance, if the test sound is judged to be as loud as a 1000 Hz tone at 70 dB SPL, then the sound has a loudness level of 70 Phon.
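The three forms of the level equation can be evaluated with a few lines of Python. This is a minimal sketch and the function names are hypothetical. The inverse of the power form shows how a 70 dB level corresponds to 0.01 mW of acoustical power, the figure used in the loudness calculations below.

```python
from math import log10

P_REF = 20e-6   # reference RMS pressure p0 [Pa]
I_REF = 1e-12   # reference intensity I0 [W/m^2]
W_REF = 1e-12   # reference acoustical power P0 [W]


def level_from_pressure(p_rms: float) -> float:
    """Level in dB from an RMS pressure in pascals (20 log10 form)."""
    return 20 * log10(p_rms / P_REF)


def level_from_intensity(intensity: float) -> float:
    """Level in dB from an average intensity in W/m^2 (10 log10 form)."""
    return 10 * log10(intensity / I_REF)


def level_from_power(power: float) -> float:
    """Level in dB from an average acoustical power in watts."""
    return 10 * log10(power / W_REF)


def power_from_level(level_db: float) -> float:
    """Invert the power form: acoustical power in watts for a given level."""
    return W_REF * 10 ** (level_db / 10)


print(round(level_from_pressure(0.02), 1))       # 60.0
print(f"{power_from_level(70) * 1e3:.3f} mW")    # 0.010 mW
```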

         Unfortunately, loudness levels are not proportional to perceived loudness: a sound at 60 Phon is not twice as loud as a sound at 30 Phon.  To compare loudness directly, one must use the Sone scale, which was determined experimentally.  This unit allows relative loudness comparisons to be made, such that 10 Sone is twice as loud as 5 Sone, 4 Sone is half as loud as 8 Sone, and so on.  Also notice that 1 Sone is defined as the loudness of a 1000 Hz tone at 40 dB SPL (i.e., 40 Phon).
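The Phon and Sone scales are commonly linked by the textbook approximation that loudness doubles for every 10 Phon increase, anchored at 1 Sone = 40 Phon. The thesis does not state this relation explicitly, so the sketch below should be read as that standard approximation (reasonable above roughly 40 Phon) rather than an exact curve; the function names are hypothetical.

```python
from math import log2


def phon_to_sone(loudness_level_phon: float) -> float:
    """Loudness in Sone from loudness level in Phon, assuming loudness
    doubles per 10 Phon above the 40 Phon = 1 Sone anchor."""
    return 2 ** ((loudness_level_phon - 40) / 10)


def sone_to_phon(loudness_sone: float) -> float:
    """Inverse of phon_to_sone under the same doubling assumption."""
    return 40 + 10 * log2(loudness_sone)


print(phon_to_sone(40), phon_to_sone(50), phon_to_sone(60))  # 1.0 2.0 4.0
print(sone_to_phon(8))                                        # 70.0
```

Under this rule, a 10 Phon step corresponds to a tenfold intensity change and a doubling of loudness, which is the same ratio-preserving behavior described by the power law discussed below.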

         Determining the perceived loudness of complex sounds is a well-published topic.  The historical development of loudness models can be found in most audio engineering or psychoacoustics textbooks, such as Hartmann (1997) or Zwicker and Fastl (1990), which were used for this research.  From these texts, it is understood that Fletcher and Munson’s (1933) work on loudness was the origin of much of the later research.  The tradition was continued by two influential scientists, Stevens (1955, 1961) and Zwicker (1961) (also see Zwicker, Flottorp, & Stevens, 1957).  Each published a similar, but slightly different, method of calculating loudness, and both were ultimately included in the International Organization for Standardization (ISO) standard method for calculating loudness levels (see ISO 532-1975 (E)).

           In the ISO standard, Stevens’ Mark VI method (method A) is considered simpler and more effective for measurements taken in octave band increments, while Zwicker’s method (method B) is more accurate for one-third octave band measurements.  Since the ISO standard was published, both methods have been further improved: Stevens (1971) himself refined his Mark VI method into Mark VII, while Moore and Glasberg (1996) have suggested improvements to Zwicker’s method.

        These loudness models are based on the facts that the ear has a non-linear frequency response, exhibits regions of masking (or inefficient excitation) called critical bands, and seems to follow the power law (Weber’s or Stevens’ law) of perceptually measurable qualities (Hartmann, 1997).  The power law states that perceived loudness grows as a fixed power of intensity, so that equal ratios of intensity produce equal ratios of loudness.  Thus, increasing intensity by 10%, at any loudness level, will create a similar proportional increase in perceived loudness.

          For simplicity, only Stevens’ (1972) most recent Mark VII method will be used to calculate the loudness of the SR bands.  It begins by dividing the sound’s spectrum into octave bands.  ISO recommends that each octave be represented by its center frequency, the geometric mean of its band edges, as shown in Table 4 (also see ISO 266-1975 (E)).

 Table 4: ISO 266 Octave Band Frequency Centers
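Octave band edges follow from the center frequencies, since each center is the geometric mean of its two edges. The sketch below assumes the base-2 octave series centered on 1 kHz (exact centers of 1000·2^k Hz, e.g. 62.5 Hz where the nominal label is 63 Hz); ISO 266 also defines a base-10 series with slightly different exact values, so this choice is an assumption. Function names are hypothetical.

```python
from math import sqrt

# ISO 266 preferred (nominal) octave-band center frequencies in Hz.
NOMINAL_CENTERS = [31.5, 63, 125, 250, 500, 1000, 2000, 4000, 8000, 16000]


def exact_center(index_from_1k: int) -> float:
    """Exact base-2 octave center, indexed relative to 1 kHz (0 -> 1000 Hz)."""
    return 1000.0 * 2.0 ** index_from_1k


def octave_edges(center: float) -> tuple[float, float]:
    """Lower and upper edges f_c / sqrt(2) and f_c * sqrt(2), so the center
    is the geometric mean of the edges."""
    return center / sqrt(2), center * sqrt(2)


for k, nominal in zip(range(-5, 5), NOMINAL_CENTERS):
    lo, hi = octave_edges(exact_center(k))
    print(f"{nominal:>7} Hz: {lo:8.1f} - {hi:8.1f} Hz")
```

Because each upper edge equals the next band’s lower edge, the octaves tile the spectrum without gaps or overlaps.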

    The power in each octave band is then used to calculate the band’s absolute level (in dB SPL).  These level values are then converted to a “loudness index” using lookup tables provided by Stevens (1972).  Finally, the total loudness (in Sone) is calculated using the following formula:


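$$S_t = S_m + F\left(\sum_i S_i - S_m\right)$$

where

$S_t$ = total loudness [Sone],

$S_m$ = the largest loudness index among the octave bands,

$\sum_i S_i$ = the sum of the loudness indexes of all the octave bands,

and $F$ = the fractional factor.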


 

            Stevens’ method requires the sound level of each of the six spatially relocated bands (A, AB, ABC, E, DE, CDE).  As mentioned, the level of a sound can be calculated directly from its acoustical power (P).  Because the test signal was white noise, the total power of the signal is uniformly distributed over the entire spectrum, so the power in each SR band can be represented as:


 
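$$P_{SRBAND} = P_{TOT}\,\frac{BW_{SRBAND}}{BW_{TOT}}$$

where $P_{SRBAND}$ = acoustical power in the spatially relocated band [W],

$P_{TOT}$ = total acoustical power [W],

$BW_{SRBAND}$ = bandwidth of the spatially relocated band [Hz],

and $BW_{TOT}$ = total bandwidth [Hz].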

    For the loudness experiments, the amplifier volume was adjusted so that full-bandwidth white noise playing from one speaker measured 70 dB SPL on an A-weighted sound level meter.  The power form of the sound pressure level equation then yields a total acoustical power of .01 mW.  Then, assuming the reproduction equipment is ideal from 80-20,000 Hz ($BW_{TOT}$ = 19,920 Hz), the resulting band levels were calculated and can be seen in Table 5.

Table 5: Level of Spatially Relocated Bands
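The band-level calculation can be sketched in Python. Because the power in a white-noise band scales with its bandwidth, each SR band sits 10·log10(BW_SRBAND / BW_TOT) dB below the 70 dB full-band level. The band edges below are taken from values quoted elsewhere in this chapter; the 1600 Hz lower edge of CDE is an assumption inferred from band AB’s upper edge, so the printed numbers are illustrative and may not match Table 5 exactly.

```python
from math import log10

TOTAL_LEVEL_DB = 70.0      # full-band white noise level from one speaker
BW_TOTAL = 20000 - 80      # assumed ideal reproduction bandwidth [Hz]

# Band edges in Hz; CDE's 1600 Hz lower edge is an inferred assumption.
SR_BANDS = {
    "A": (80, 800),
    "AB": (80, 1600),
    "ABC": (80, 5000),
    "E": (12000, 20000),
    "DE": (5000, 20000),
    "CDE": (1600, 20000),
}


def band_level(f_lo: float, f_hi: float) -> float:
    """Level of a white-noise slice: total power scaled by BW_SR / BW_TOT,
    expressed relative to the full-band level in dB."""
    return TOTAL_LEVEL_DB + 10 * log10((f_hi - f_lo) / BW_TOTAL)


for name, (lo, hi) in SR_BANDS.items():
    print(f"{name:>3}: {band_level(lo, hi):5.1f} dB")
```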

 

However, to calculate the loudness of the SR frequency bands with Stevens’ method, the band levels must be further broken down into octave bands.  Thus, the same power-based approach and the equation above were used to calculate the sound level in each octave, shown in Table 6.

Table 6: Calculated Octave Band Levels
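Breaking an SR band into octaves amounts to intersecting it with each octave band and applying the same bandwidth-proportional power rule to each overlap. This Python sketch assumes base-2 octave edges (f_c/√2 to f_c·√2 with f_c = 1000·2^k Hz); the edges actually used for Table 6 are not stated, so its numbers are illustrative. As a check, power-summing the octave levels recovers the level of the whole band.

```python
from math import log10, sqrt

TOTAL_LEVEL_DB = 70.0
BW_TOTAL = 20000 - 80   # assumed ideal reproduction bandwidth [Hz]


def slice_level(f_lo, f_hi):
    """Level of a white-noise slice, proportional to its bandwidth."""
    return TOTAL_LEVEL_DB + 10 * log10((f_hi - f_lo) / BW_TOTAL)


def octave_levels(f_lo, f_hi):
    """Split an SR band into base-2 octaves centered on 1000 * 2**k Hz and
    return {center: level_dB} for every octave the band overlaps."""
    levels = {}
    for k in range(-5, 5):
        center = 1000.0 * 2.0 ** k
        lo, hi = center / sqrt(2), center * sqrt(2)
        overlap_lo, overlap_hi = max(lo, f_lo), min(hi, f_hi)
        if overlap_hi > overlap_lo:
            levels[center] = slice_level(overlap_lo, overlap_hi)
    return levels


def combine(levels_db):
    """Power-sum a collection of levels back into a single level."""
    return 10 * log10(sum(10 ** (level / 10) for level in levels_db))


band_a = octave_levels(80, 800)
for center, level in band_a.items():
    print(f"{center:8.1f} Hz octave: {level:5.1f} dB")
print(f"recombined: {combine(band_a.values()):.1f} dB")
```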

 

These level values were then matched with Stevens’ (1972) tabulated loudness indexes, which essentially account for the non-linear, frequency-dependent nature of human hearing as established by experiment.  The resulting loudness index values can be seen in Table 7.

Table 7: Loudness Index values from table lookup

With the loudness indexes obtained, the next step is to calculate the factor F.  Each of the six spatially relocated bands has its own F value.  F is determined by subtracting 4.9 dB from the level of the loudest octave in the spatially relocated band and then looking up a new index value in the tables.  This new index is then converted to an F value using a separate factor table in Stevens (1972).  Also, because octave bands were used in the analysis, the final F value is double that listed in the table.  All of this follows Stevens’ Mark VII instructions, and the resulting F values are shown in Table 8.

    Finally, the loudness (in Sone) of each band can be calculated using the previously mentioned equation.  These results are also shown in Table 8, which provides some insight into the relative loudness of the SR bands.  Note that the low frequency bands A and AB are both calculated to be softer than the high frequency band E.  In fact, band A is only two-thirds as loud as band E.  This alone should rule out loudness as the cause of the low frequency SR bands’ dominance in localization.

Table 8: Final calculated loudness levels
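The final combination step of Stevens’ method is simple once the indexes and F are known: the loudest band counts fully and the remaining bands are weighted by F. The Python sketch below illustrates only that step; the index and F values in the example are hypothetical and are not taken from Tables 7 or 8, and the table lookups themselves are not reproduced.

```python
def total_loudness(indices: list[float], f_factor: float) -> float:
    """Combine per-band loudness indexes with S_t = S_m + F * (sum(S) - S_m).

    `indices` are loudness indexes read from Stevens' tables, one per octave
    band; `f_factor` is the fractional factor F for this sound.
    """
    if not indices:
        raise ValueError("need at least one band")
    s_max = max(indices)
    return s_max + f_factor * (sum(indices) - s_max)


# Hypothetical indexes and F, for illustration only.
print(round(total_loudness([1.0, 2.0, 4.0], 0.3), 2))  # 4.9
```

Since F is less than one, adding bandwidth to a band increases its computed loudness more slowly than the plain sum of its indexes would suggest.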



Analysis - Loudness Experiments

    It would have been interesting to compare each SR band with a reference 1000 Hz tone.  That method would have directly measured the relative loudness of each band and would also have produced a Phon level for each band.  However, it would not have helped determine what SR bandwidths would make the bands equally loud.  In other words, knowing that band A was 1.2 times softer (or louder) than band E would not have revealed what band E’s lower cutoff frequency should have been in order to make the two equally loud.

    This question of relative bandwidth is also relevant and interesting, and it prompted a second round of listening tests (see Appendix B for complete results).  A different group of sixteen test subjects participated, all Music Engineering or Audio Engineering students at the University of Miami.  The physical configuration was the same as in the first tests and is repeated in Figure 35.
 

Figure 35: Listening Test II Setup


    For these experiments, the test subjects were asked to compare the loudness of the same white noise bursts used in the first listening test.  This time, however, the bursts were sent only to the SR channel, since it was the main variable of the first round of experiments and directly caused the shifting of the stereo image.  The L and R channels reproduced no sound.  The goal of the experiment was to compare the loudness of each SR band with an adjustable high pass or low pass filtered version of the noise.  The cutoff frequency (and thus the signal bandwidth) was adjusted to match the perceived loudness of the SR band (see Figure 36).

    For example, the noise bursts representing band A (80-800 Hz) were first played for the test subjects.  This was followed by a high pass filtered version of the noise bursts with an arbitrary lower frequency cutoff point.  The subject was then asked if the second sound was “louder, softer, or about the same” as the first.  Subsequent trials would ask the same question, while using a different lower cutoff frequency for the high pass noise.  Obviously, the only variable for these tests is the frequency cutoff of the second filtered noise.

Figure 36: Loudness Comparisons of Low Frequency Bands with High pass (left) and High Frequency Bands with Low pass (right)

 

In other words, the low pass filtered spatially relocated bands (A, AB, and ABC) were compared with high pass noise of varying lower cutoff frequency.  Similarly, the high pass spatially relocated bands (E, DE, and CDE) were compared with low pass noise of varying upper cutoff frequency (see Figure 36 for further clarification).  The results of this experiment can be seen for each SR band in Tables 9 through 13.

A close analysis of the results suggests that the higher frequency bands are louder than the lower bands.  Table 9 shows the results of the band A (80-800 Hz) versus high pass filtered noise comparison.  Of course, as the lower cutoff frequency of the high pass noise is raised from 6 kHz to 15 kHz, the noise gets softer.  The results suggest that the high pass noise is significantly louder than band A until around 14 kHz.  Above this point, the high pass noise seems to approach the loudness of band A.  However, there is no predominant indication that it is ever considered softer than band A.  Particular notice should be given to the 12 kHz point (representing band E), which most subjects considered louder than band A.  Since bands DE and CDE contain band E plus additional bandwidth, they should also be considered louder than band A.

Table 9: Band A vs. Bandwidth Loudness


    Similarly, comparing band AB (80-1600 Hz) with the high pass noise in Table 10 shows that the subjects considered the high pass noise to be significantly louder until about 10 kHz.  Around this point, the noise seems closer in loudness to band AB.  Yet above this point, most considered the high pass noise louder again.  In particular, at 12 kHz (band E), four listeners considered it to be louder than band AB.  The results for 10 kHz could thus simply be an anomaly.  Regardless, these results also suggest that bands E, DE, and CDE were louder than bands AB and A.
 

Table 10: Band AB vs. Bandwidth Loudness

    The high frequency versus low pass noise comparisons show similar indications (Table 11).  When comparing SR band E, most subjects reported that the low pass noise was softer even up to 2,000 Hz (a cutoff that includes both bands A and AB).  Only at around 3 kHz did some listeners suggest that the low pass noise became louder than band E.

Table 11: Band E vs. Bandwidth Loudness

    For band DE (5,000-20,000 Hz), listeners did not describe the low pass noise as being significantly louder, even up to a cutoff of 7 kHz (see Table 12).  Instead, they considered the low pass noise softer, or sometimes equally loud.  One discrepancy is their sense of equal loudness in the 2000-4000 Hz range, while the low pass noise at 7 kHz was considered softer.  Regardless, these results suggest that band DE is louder than bands A or AB, but about as loud as band ABC (80-5000 Hz).

Table 12: Band DE vs. Bandwidth Loudness

    Finally, band CDE can also be reviewed despite having limited data points.  The results seem to again suggest that listeners did not consider the low pass noise to be louder than CDE, even at 7 kHz.  Instead, they seemed to consider the low pass noise softer up to around 4 kHz, above which it appeared equally loud.  This, as well as the results from the other tests, suggests that band CDE is most likely the loudest of all the spatially relocated bands. 

Table 13: Band CDE vs. Bandwidth Loudness

 


Analysis - ABX Testing

    It has been established that the lower frequency SR bands did dominate the localization of a stereo image, and not because they were louder.  However, these results are based primarily on white noise bursts rather than a more realistic music track.  Therefore, during the second round of listening tests, ABX tests were also performed using two different music tracks.  The same 16 test subjects from the second round each performed a set of ABX tests on one of the two music tracks (the tracks were assigned alternately).

    ABX testing is often used to study the perceivable differences between two items.  The test subject is played three versions of a test track: A, B, and X.  Sounds A and B are different from each other, but each stays the same from trial to trial.  For each trial, however, X is a random selection of either A or B.  The subjects are repeatedly asked to identify whether X is A or B.  The results produce a level of statistical confidence, which suggests whether the listener can reliably tell the difference between A and B.  Naturally, more correct identifications of X produce higher confidence.  However, low confidence does not prove that the sounds are the same; it usually implies that they are more similar than dissimilar.
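The randomization that makes ABX testing work can be illustrated with a small simulation. The listener model here, which hears the difference with probability `p_detect` and otherwise guesses, is a hypothetical assumption for illustration only, as is the function name. A pure guesser scores about 50% correct, which is why x = 50% is the chance probability for ABX results.

```python
import random


def simulate_abx(n_trials: int, p_detect: float, seed: int = 0) -> int:
    """Return the number of correct answers in n_trials simulated ABX trials.

    X is drawn at random from {"A", "B"} on every trial; the simulated
    listener identifies X correctly with probability p_detect and otherwise
    guesses uniformly between A and B.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        x = rng.choice("AB")             # hidden identity of X
        if rng.random() < p_detect:
            answer = x                   # listener hears the difference
        else:
            answer = rng.choice("AB")    # listener guesses
        correct += answer == x
    return correct


print(simulate_abx(20, 1.0))             # 20
print(simulate_abx(1000, 0.0) / 1000)    # close to 0.5 for a pure guesser
```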

In these experiments, the listeners were asked to discern between regular stereo music (condition A) and the SR frequency setup shown previously in Figure 14 (condition B).  During condition B, the SR channel would play the left stereo signal’s frequencies above 10 kHz.  Thus, the R channel played the same right stereo signal for both trials A and B.  The L channel played the full bandwidth left signal on trial A, and a low pass filtered (at 10 kHz) version for trial B.  The SR channel played nothing for trial A and played the high pass filtered (at 10 kHz) version of the left stereo signal for trial B.
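The routing for conditions A and B can be sketched in Python. The thesis states only that second order filters at 10 kHz were used; the Butterworth alignment (Q = 1/√2), the bilinear-transform biquad design from the widely used RBJ “Audio EQ Cookbook,” and the 44.1 kHz sample rate are assumptions, and all function names are hypothetical.

```python
import math


def biquad_coeffs(kind: str, f0: float, fs: float, q: float = 1 / math.sqrt(2)):
    """Second-order low- or high-pass coefficients (bilinear-transform design);
    q = 1/sqrt(2) gives a Butterworth response. Returns (b0, b1, b2, a1, a2)
    normalized by a0."""
    w0 = 2 * math.pi * f0 / fs
    cos_w0, alpha = math.cos(w0), math.sin(w0) / (2 * q)
    if kind == "low":
        b = ((1 - cos_w0) / 2, 1 - cos_w0, (1 - cos_w0) / 2)
    elif kind == "high":
        b = ((1 + cos_w0) / 2, -(1 + cos_w0), (1 + cos_w0) / 2)
    else:
        raise ValueError("kind must be 'low' or 'high'")
    a0 = 1 + alpha
    return (b[0] / a0, b[1] / a0, b[2] / a0, -2 * cos_w0 / a0, (1 - alpha) / a0)


def biquad(signal, coeffs):
    """Filter a sequence of samples with a direct-form I biquad."""
    b0, b1, b2, a1, a2 = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, x, y1, y
        out.append(y)
    return out


def route(left, right, condition, fs=44100.0, crossover=10000.0):
    """Return (L, R, SR) speaker feeds for ABX condition 'A' or 'B'."""
    if condition == "A":      # ordinary stereo; SR speaker silent
        return list(left), list(right), [0.0] * len(left)
    if condition == "B":      # left highs moved to SR; R unchanged
        low = biquad(left, biquad_coeffs("low", crossover, fs))
        high = biquad(left, biquad_coeffs("high", crossover, fs))
        return low, list(right), high
    raise ValueError("condition must be 'A' or 'B'")
```

Only the left signal is filtered; the right signal reaches the R speaker unchanged in both conditions, matching the description above.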

            One of the chosen music tracks was a 10 second clip (0:10-0:20) of the introduction to Madonna’s “Candy Perfume Girl” (track4_madonna_short.wav).  It was used because it contains a distinctive wide-bandwidth, noise-like sound that moves from the center to the right of the sound stage.  Note that for the listening tests, the left and right stereo signals were swapped relative to the original recording in order to maximize the influence of the SR channel.  Thus, listeners should have heard the noise moving from center to left stage during trial A (stereo) of these experiments.  The spectrogram of the test signal can be seen in Figure 37.  Recall that amplitude is represented by intensity, with time on the x-axis and frequency on the y-axis.

The other test track was an eleven second clip of Tower of Power’s “What is Hip?” (track5_whatship.wav).  This section was chosen because it contains a high-note (squealing) riff from a right-panned trumpet section, while the other band members (sax, brass, percussion, singer, etc.) play simultaneously.  It had a greater amount of high frequency energy than most of the other test tracks available, as discussed in more detail later.  Again, for the ABX test, the left and right stereo signals were swapped relative to the original recording.  The spectrogram of this test track can be seen in Figure 38.

Figure 37: Spectrogram of Track One - Madonna
 
 

Figure 38: Spectrogram of Track Two - What is Hip?


        Each subject performed 5-10 ABX trials with one of the two test tracks.  If all subjects and trials are assumed to be equivalent and are pooled, the tests produced 62 trials for track one and 83 trials for track two.  The results can be seen in Table 14.


 

Table 14: ABX Test Results

 Considering the high frequency content of the tracks, one might predict that trial B would result in a noticeable shift of the high frequency stereo images towards the right.  Some change in the spectral balance of the signal, caused by the spatial relocation of the left signal’s high frequencies, should also have been noticeable.

However, the results suggest that the test subjects could not reliably tell the difference between the stereo and SR versions.  In fact, calculating the significance levels (as previously discussed, but using x = 50%) gives fairly high values for both tracks (α' = .972 for track one, α' = .587 for track two).  These are far above the typically accepted criterion of α = .05.  A high significance level means low confidence that the listeners could hear a difference, which ultimately suggests that the two versions were hard to differentiate.
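To put the pooled ABX figures in perspective, one can ask how many correct answers would have been needed to reach α = .05 with x = 50%. The Python sketch below assumes the same one-tailed binomial calculation and treats the 62 and 83 pooled trials as independent; it computes the required counts only and reproduces no result from Table 14. Function names are hypothetical.

```python
from math import comb


def tail(r, n, x=0.5):
    """Chance probability of r or more correct answers out of n."""
    return sum(comb(n, k) * x**k * (1 - x) ** (n - k) for k in range(r, n + 1))


def correct_answers_needed(n, alpha=0.05, x=0.5):
    """Smallest number of correct answers whose chance probability does not
    exceed alpha, or None if even a perfect score falls short."""
    for r in range(n + 1):
        if tail(r, n, x) <= alpha:
            return r
    return None


for n in (62, 83):
    print(n, "trials ->", correct_answers_needed(n), "correct needed")
```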

However, note that some subjects could tell the difference between the two versions (see Table 14).  Subjects 2 and 11 on track one, and subjects 1, 6, and 8 on track two, performed above average.  Also notice that the listeners seemed to perform better on track two than on track one.  Of course, varied results should be expected.  After all, the perceptual significance of the SR channel depends heavily on the amount of high frequency information and its proportion to concurrent lower frequency information.

A closer look at the two test tracks might provide further insight as to why they were difficult to discern from stereo, and even why track two was easier than track one.  When considering these tracks, it is important to focus on the spectral content above 10 kHz of the left stereo signal (right signal of the original recording).  After all, this was the only difference between trial A (stereo) and B (the SR version).

The amount of relocated energy is most likely a major factor.  More energy implies a louder SR speaker, which is the most readily measurable perceptual impact.  Loudness, however, is not necessarily the most important factor, as has been previously shown.  Thus, the energy distribution above and below 10 kHz was calculated and plotted in the lower right plot of Figure 39 and Figure 40.  Recall that the left and right channels were swapped during testing, so the relevant portion is the energy in the “right” signal of each original track.

 Note that the second music track has a greater amount of high frequency energy than the first.  However, these energy plots assume an ideal brick-wall filter at 10 kHz, whereas the actual tests used a second order high pass filter.  This means that the SR channel could be expected to carry slightly more energy than the figures show.
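The brick-wall energy split described above can be sketched with NumPy’s FFT routines: the energy in all spectral bins below the cutoff is compared with the energy at or above it. This is an independent illustration, not the Matlab code of Appendix C, and the function name is hypothetical. It would be applied to the original “right” channel of each track.

```python
import numpy as np


def energy_split(signal, fs, cutoff=10_000.0):
    """Split a clip's energy at `cutoff` with an ideal brick-wall filter.

    Returns (energy below, energy at or above, fraction at or above). The
    one-sided FFT spectrum is used as is; the slightly different weighting of
    the DC and Nyquist bins is ignored, since only the ratio matters here.
    """
    x = np.asarray(signal, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    below = float(power[freqs < cutoff].sum())
    above = float(power[freqs >= cutoff].sum())
    return below, above, above / (below + above)
```

A real second order high pass would also pass some energy below 10 kHz, which is why the brick-wall fraction slightly underestimates what the SR channel carried.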

Figure 39: Spectral Energy for Track One - Madonna
 
 
 
 

Figure 40: Spectral Energy for Track Two - What is Hip?

            A logical follow-up question is how much energy typically lies above 10 kHz in music.  This is obviously difficult to say for certain; however, evaluating a sample of varied music tracks should be insightful.  In fact, this analysis was performed prior to the second round of listening tests in order to choose a music track that would hopefully create a significant image shift during testing.

Fifty-two tracks from a critical listening music CD covering a wide selection of genres were evaluated.  For each track, the ten second segment containing the phrase with the most high frequency energy was chosen, as determined by monitoring Sound Forge’s spectrum analysis during playback.  These shorter clips were then each analyzed using short-time frequency analysis techniques in Matlab (see Appendix C for code).  This produced short-time calculations of total energy, energy below 10 kHz, and energy above 10 kHz, which were ultimately used to find an average value of energy above 10 kHz.
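A short-time version of the same energy split can be sketched in Python with NumPy. This is not the Appendix C Matlab code; the rectangular, non-overlapping frames and the default frame length of 4096 samples are assumptions. The sketch reports both the energy-weighted fraction above 10 kHz for the whole clip and the mean of the per-frame fractions, since it is not stated which averaging produced the “average energy” figures.

```python
import numpy as np


def short_time_energy(signal, fs, cutoff=10_000.0, frame_len=4096, hop=None):
    """Per-frame total energy and energy at or above `cutoff`, using
    rectangular frames (non-overlapping unless `hop` is given)."""
    x = np.asarray(signal, dtype=float)
    hop = frame_len if hop is None else hop
    high = np.fft.rfftfreq(frame_len, d=1.0 / fs) >= cutoff
    totals, aboves = [], []
    for start in range(0, len(x) - frame_len + 1, hop):
        power = np.abs(np.fft.rfft(x[start:start + frame_len])) ** 2
        totals.append(power.sum())
        aboves.append(power[high].sum())
    return np.array(totals), np.array(aboves)


def average_fraction_above(signal, fs, **kwargs):
    """Return (energy-weighted fraction above cutoff for the whole clip,
    mean of the per-frame fractions, skipping silent frames)."""
    totals, aboves = short_time_energy(signal, fs, **kwargs)
    if totals.size == 0:
        raise ValueError("signal is shorter than one frame")
    overall = aboves.sum() / totals.sum()
    voiced = totals > 0
    per_frame = aboves[voiced] / totals[voiced]
    return float(overall), float(per_frame.mean())
```

For a clip whose high frequency content comes in short bursts, the two numbers can differ considerably, which is one reason a short-time ratio may be the more representative metric.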

Surprisingly, only seven of the fifty-two clips had more than 3% of their average energy above 10 kHz (see Table 15).  The final music tracks for ABX testing were chosen because they had a high frequency image that was stationary on the sound stage during playback and was non-impulsive.  Track eight was also used in the testing, despite having only 1% average energy above 10 kHz.

Of course, it was assumed that “average energy” is a representative metric for determining which track would exhibit shifts due to SR high frequencies.  In actuality, the short-time ratio of energy above and below 10 kHz is probably more representative, but also more difficult to calculate.

Table 15: Energy Analysis of misc. music tracks at 10 kHz

   Another influential difference between the tracks could have been that track two had a stationary image, whereas track one was moving during playback.  Thus, during ABX testing, the listener would have had to identify the panning movement and then notice that it became more stationary in the SR version.  This is perhaps a lot to expect for an unfamiliar piece of music.


 

 

 

Created  February 2003 by Rob Hartman

Copyright (C) 2003