
4. Listening Test Formulation

Previous research on valid listening test design is reviewed in this chapter. It is vital that a sound psychoacoustic test design be adopted so that the most accurate data possible can be acquired. It is also paramount that listening tests be conducted with the most accurate audio equipment available if the test subject is to hear any audible effects.

4.1 Previous Research

An interesting study by Sergeant and Boyle [23] compared various psychoacoustic tests of pitch discrimination, examining their test structures, stimulus differences, and response formats, to determine which is "the ‘best’ measure in relation to efficiency, or validity for general musical behavior." Although pitch discrimination was not tested in the present research, Sergeant and Boyle’s study is relevant because its results can significantly improve the implementation of the phase distortion listening test. The five pitch tests compared (Bentley, Colwell, Kwalwasser-Dykema (K-D), Seashore, and Sergeant) are summarized in Table 4.1. Of interest here is the variability of task structures among the tests. The K-D test and the Sergeant pitch test require only the determination of the presence or location of a pitch change, whereas the Bentley, Colwell, and Seashore tests require the determination of both the presence or location of a pitch change and its direction. A two-step task structure would therefore appear to be more difficult than a one-step task structure. The response formats used in the various tests also reflect the different task structures.

Table 4.1

Table 4.1. Comparison of five pitch discrimination tests. [D. Sergeant and J. D. Boyle, "Contextual Influences on Pitch Judgement," J. Soc. for Res. in Psych. of Music and Music Ed., vol. 8, pp. 3-15 (1980), p. 7, Table 1.]

Table 4.2 displays the means and standard deviations for the scores of 65 subjects on each of the five tests.

Table 4.2 (rows: Bentley, Colwell, K-D, Seashore, and Sergeant models; columns: mean and standard deviation)

Table 4.2. Means and standard deviations of subjects’ responses to five tests (n = 65). [D. Sergeant and J. D. Boyle, "Contextual Influences on Pitch Judgement," J. Soc. for Res. in Psych. of Music and Music Ed., vol. 8, pp. 3-15 (1980), p. 12, Table 4.]

It can be seen that the Kwalwasser-Dykema model has the highest test mean (out of 30 total) and the lowest standard deviation. Because subjects’ scores on the K-D test were both the highest and the most consistent, a phase distortion audibility test based on the K-D test was expected to provide accurate and reliable data. The Sergeant model was also considered but was not implemented, since it was judged that it might be excessively long for some of the music-based test signals.
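The comparison above rests on two summary statistics per test. As a minimal sketch, assuming each test’s raw scores are available as a list of per-subject totals out of 30, the following Python computes the mean and sample standard deviation for each test and picks the test with the smallest spread. The function names and the scores in the usage lines are hypothetical placeholders, not the values behind Table 4.2, and whether Sergeant and Boyle used the sample or the population standard deviation is not stated.

```python
import statistics

MAX_SCORE = 30  # each pitch test is scored out of 30, per the text


def summarize(scores_by_test):
    """Return {test name: (mean, sample standard deviation)} for each test."""
    summary = {}
    for name, scores in scores_by_test.items():
        if any(not 0 <= s <= MAX_SCORE for s in scores):
            raise ValueError(f"{name}: scores must lie between 0 and {MAX_SCORE}")
        summary[name] = (statistics.mean(scores), statistics.stdev(scores))
    return summary


def most_consistent(summary):
    """Name of the test whose scores have the smallest standard deviation."""
    return min(summary, key=lambda name: summary[name][1])


# Hypothetical per-subject totals, NOT the data behind Table 4.2.
example = {
    "test_x": [28, 29, 27, 30],
    "test_y": [20, 25, 15, 30],
}
stats = summarize(example)
print(most_consistent(stats))  # test_x
```

A low standard deviation only says that subjects scored consistently; it is one ingredient of reliability rather than a full reliability analysis.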

4.2 Test Equipment

The audio equipment used in the implementation consisted of

1. Personal computer
2. Yamaha 01V digital mixing console
3. Genelec 1030A loudspeakers
4. AKG K270 headphones.

The personal computer contained a studio-quality sound card (Micro Technology Unlimited, Krystal) that was used to play back the test signals. The Yamaha 01V mixing console provided preamplifier gain for the self-powered loudspeakers and amplification for the headphones. The studio-quality Genelec loudspeakers have a free-field frequency response of 55 Hz to 18 kHz (±2.5 dB). This on-axis (0°) response is shown in Fig. 4.1, together with the off-axis responses at 15°, 30°, and 45° and the 1/3-octave power response. The off-axis responses show a smooth, well-behaved transition. The harmonic distortion at 90 dB SPL at 1 meter is specified as less than 3% from 60 to 150 Hz and less than 0.5% above 150 Hz. The loudspeakers incorporate a waveguide on the tweeter, which focuses its dispersion and helps minimize the contribution of unwanted first-order reflections to the sound heard at the listening position.
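To put the loudspeaker specifications on a common scale, a distortion percentage can be expressed in decibels relative to the fundamental, and a ±2.5 dB tolerance as a range of amplitude ratios. This sketch assumes the percentages are amplitude (not power) ratios, as is usual for harmonic distortion figures; the function names are illustrative.

```python
import math


def percent_to_db(percent):
    """Distortion in percent (amplitude ratio) expressed in dB relative to the fundamental."""
    return 20 * math.log10(percent / 100)


def tolerance_to_ratios(db):
    """Lowest and highest amplitude ratios allowed by a +/- db tolerance band."""
    return 10 ** (-db / 20), 10 ** (db / 20)


print(round(percent_to_db(3), 1))    # -30.5  (limit from 60 to 150 Hz)
print(round(percent_to_db(0.5), 1))  # -46.0  (limit above 150 Hz)
low, high = tolerance_to_ratios(2.5)
print(round(low, 2), round(high, 2))  # 0.75 1.33
```

Under that assumption, the distortion products stay at least about 30 dB below the fundamental in the low-frequency band and about 46 dB below it above 150 Hz.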

Figure 4.1

Fig. 4.1. Anechoic frequency response (0°, 15°, 30°, and 45°) and 1/3-octave power response for the Genelec 1030A loudspeaker. [Genelec Data Sheet for 1030A Bi-amplified Monitoring System, p. 3, Genelec Oy, Finland (1999)]

Phase characteristics for this loudspeaker were not investigated. The AKG headphones have a published frequency response of 20 Hz to 28,000 Hz; again, their phase response was not investigated. In terms of frequency response, both the headphones and the loudspeakers were adequate to reproduce the content of the test signals.

4.3 Test Implementation

All test subjects were given clear oral instructions on how to proceed with the test. Fig. 4.2 reproduces the written instructions handed to the subjects for reference.


    1. Please locate and open (double click) the folder with the appropriate sound folder corresponding to the scoresheet.
    2. Please locate and open (double click) the folder with the appropriate version number folder corresponding to the scoresheet.
    3. Lower the headphone or loudspeaker output initially so that excessive levels are not produced.
    4. Two sounds, played one after another, are contained in each sound file (a) ~ (d).
    5. Play back the sound file (a) ~ (d) by double clicking on them and adjust headphone or loudspeaker output to comfortable levels. (sounds may be played as many times as desired)
    6. Locate the corresponding sound (a) ~ (d) on the scoresheet.
    7. Please indicate if a difference (i.e. in timbre, loudness, etc.) was heard between the two sounds by checking on the appropriate ‘Yes’ or ‘No’ box.
    8. When finished with the headphone test, please inform me to set up the loudspeaker test.

IMPORTANT: Take care to make sure that you are in the correct sound and version folder.

Thank you very much for your participation.

Fig. 4.2. Instruction sheet for the listening test implementation.

In the listening test, each item consisted of A, the unfiltered test signal, and B, the all-pass-filtered test signal, presented in one of four randomized arrangements: AA, AB, BA, or BB. The synthesized test signals were presented in mono and the acoustic test signals in stereo. The test subject was free to adjust both the headphone and loudspeaker levels for each test signal to a comfortable level. The randomized presentation order of the test signals was different for the headphone and loudspeaker tests. Care was taken to ensure that the onset portion of each test signal, on which the perception of timbre is highly dependent, was audible to the subjects. A ‘blank’ (silent) interval of 50 msec was placed at the beginning of and between all test signals except the jazz vocal, which received 500 msec. The longer ‘blank’ time was judged necessary for the relatively long and complex jazz vocal test signal to facilitate the detection of phase distortion.
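The assembly of one test file can be sketched in Python. One reading of the instruction sheet is that the four sound files (a) to (d) for a given test signal hold the four arrangements in shuffled order; the sketch assumes that reading, a 44.1 kHz sample rate, and signals stored as lists of floating-point samples, none of which the text specifies. The names `build_trial` and `assign_arrangements` are hypothetical, and the stand-in signals are placeholders rather than real all-pass-filtered audio.

```python
import random

SAMPLE_RATE = 44100  # Hz; assumed, the text does not state the sample rate
ARRANGEMENTS = ("AA", "AB", "BA", "BB")


def silence(ms, rate=SAMPLE_RATE):
    """Zero-valued samples lasting `ms` milliseconds."""
    return [0.0] * round(rate * ms / 1000)


def build_trial(a, b, arrangement, blank_ms=50, rate=SAMPLE_RATE):
    """Blank, first sound, blank, second sound, for one arrangement such as "AB"."""
    signals = {"A": a, "B": b}
    gap = silence(blank_ms, rate)
    return gap + list(signals[arrangement[0]]) + gap + list(signals[arrangement[1]])


def assign_arrangements(seed=None):
    """Map sound files (a) to (d) onto the four arrangements in a random order."""
    order = list(ARRANGEMENTS)
    random.Random(seed).shuffle(order)
    return dict(zip("abcd", order))


# Usage with stand-in signals: unfiltered A and "all-pass-filtered" B.
a = [0.5, -0.5, 0.25]
b = [0.25, 0.5, -0.5]
files = assign_arrangements(seed=7)
trials = {name: build_trial(a, b, arr) for name, arr in files.items()}
jazz_trial = build_trial(a, b, "AB", blank_ms=500)  # longer blank for the jazz vocal
```

Passing a different seed for the headphone and loudspeaker sessions would give each session its own presentation order, as the text describes.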

Presenting test items entirely in increasing order of difficulty raises the possibility that subjects develop response expectancies regarding the relative phase distortion differences among subsequent items. On the other hand, Sergeant and Boyle [23] found that wholly random sequencing of test items had "the result that a subject has no chance to acclimatize himself to the test before critical judgements are demanded of him," which can place difficult items at the very beginning of a test. Therefore, to optimize data acquisition, the first test items were sequenced in increasing order of difficulty and the remaining items were randomized.
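The sequencing compromise can be sketched as follows, assuming each item has a numeric difficulty rating and that the graded opening consists of the easiest items. The text does not say how difficulty was ranked or how many items were graded, so `difficulty`, `n_ordered`, and the ratings in the usage lines are hypothetical.

```python
import random


def sequence_items(items, difficulty, n_ordered, seed=None):
    """Order items for presentation: a graded opening, then a random remainder.

    The n_ordered easiest items come first, from easiest to hardest; the
    remaining items follow in a random order. `difficulty` maps an item to a
    number, with larger values meaning harder to detect.
    """
    ranked = sorted(items, key=difficulty)
    opening, remainder = ranked[:n_ordered], ranked[n_ordered:]
    random.Random(seed).shuffle(remainder)
    return opening + remainder


# Hypothetical items and difficulty ratings, for illustration only.
ratings = {"item1": 5, "item2": 1, "item3": 3, "item4": 6, "item5": 2, "item6": 4}
order = sequence_items(list(ratings), ratings.get, n_ordered=3, seed=0)
print(order[:3])  # ['item2', 'item5', 'item3']
```

The graded opening gives the subject time to acclimatize, while the shuffled remainder avoids a predictable progression.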

After listening to each test signal sequence, the test subjects indicated whether or not they heard a difference by checking a ‘Yes’ or ‘No’ box, as shown in Fig. 4.3. This format is similar to that of the K-D test, except that it simplifies the response further by asking only whether a difference existed. This procedure was repeated for all six test signals on both headphones and loudspeakers.
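With a Yes/No format, the correct answer for each item follows directly from its arrangement: a difference exists only for AB and BA. The text does not describe how responses were scored, so the tally below, which borrows the hit and false-alarm vocabulary of signal detection theory, is one possible approach rather than the method used; the function names are hypothetical.

```python
def expected_response(arrangement):
    """'Yes' when the two sounds differ (AB or BA), 'No' when identical (AA or BB)."""
    return "Yes" if arrangement[0] != arrangement[1] else "No"


def tally(arrangements, responses):
    """Count hits, misses, false alarms and correct rejections."""
    if len(arrangements) != len(responses):
        raise ValueError("exactly one response is needed per arrangement")
    counts = {"hit": 0, "miss": 0, "false_alarm": 0, "correct_rejection": 0}
    for arrangement, response in zip(arrangements, responses):
        if expected_response(arrangement) == "Yes":
            counts["hit" if response == "Yes" else "miss"] += 1
        else:
            counts["false_alarm" if response == "Yes" else "correct_rejection"] += 1
    return counts


print(tally(["AA", "AB", "BA", "BB"], ["No", "Yes", "No", "Yes"]))
# {'hit': 1, 'miss': 1, 'false_alarm': 1, 'correct_rejection': 1}
```

Keeping false alarms separate matters because the AA and BB items act as catch trials: a subject who always answers ‘Yes’ scores every AB and BA item correctly but is exposed on the identical pairs.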

Figure 4.3

Fig. 4.3. Task response format used for listening test. Version numbers refer to randomized maximum group delay times of either 4 or 8 msec.

The loudspeakers used in the listening test were set up for near-field listening (approximately 1 meter from each loudspeaker to the corresponding ear), with the acoustic centers of the loudspeakers in the same horizontal plane as the test subject’s ears. The test subject sat equidistant from the left and right loudspeakers. In this way, the sound reaching the test subject was predominantly the on-axis direct sound from the loudspeakers.
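Seating the subject equidistant from the loudspeakers keeps the direct sound from the two channels time-aligned at the listening position. The sketch below illustrates this geometry under stated assumptions: the loudspeakers are placed at ±30° from the median line (a conventional stereo layout; the text does not give the angle), each 1 m from the nominal seat, and sound travels at 343 m/s. Distances are measured to the center of the head rather than to each ear, which is a simplification, and the function name is hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (dry air at about 20 degrees C)


def arrival_delays_ms(distance_m=1.0, half_angle_deg=30.0, lateral_offset_m=0.0):
    """Direct-sound delays (left, right) in ms from each loudspeaker to the head.

    The loudspeakers sit at +/- half_angle_deg from the median line, distance_m
    from the nominal seat; the head may be shifted sideways by lateral_offset_m
    (positive = toward the right loudspeaker).
    """
    theta = math.radians(half_angle_deg)
    x, y = distance_m * math.sin(theta), distance_m * math.cos(theta)
    d_left = math.hypot(-x - lateral_offset_m, y)
    d_right = math.hypot(x - lateral_offset_m, y)
    return 1000 * d_left / SPEED_OF_SOUND, 1000 * d_right / SPEED_OF_SOUND


left, right = arrival_delays_ms()
print(round(left, 2), round(right, 2))  # 2.92 2.92
left, right = arrival_delays_ms(lateral_offset_m=0.1)
print(round(left - right, 2))  # 0.29
```

Under these assumptions, a centered listener receives both channels after about 2.9 ms, whereas a 10 cm sideways shift introduces an interchannel arrival difference of roughly 0.29 ms.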

The listening room, a control booth for a concert hall, had carpeted floors and concrete walls. The corners closest to the audio equipment were treated with acoustic diffusers/absorbers. The listening environment was therefore semi-reverberant.

All test subjects appeared to have normal hearing, although their hearing was not tested. The majority of test subjects were musicians with recording studio experience and were therefore critical listeners. Listener fatigue was minimized as far as possible through a clean and simple test design and a relatively short overall length. The average listening test lasted 40 minutes, and the longest lasted 1 hour.

This chapter introduced the listening test formulation process. A study comparing the validity of various psychoacoustic listening test implementations was reviewed, the test equipment used in this thesis research was presented, and the listening test implementation used in the research was outlined. It is important that a sound psychoacoustic test design be implemented so that accurate data acquisition is possible. It is also stressed that the most accurate audio equipment available should be used in listening tests if the test subject is to hear any audible effects. The results of the listening test will be presented and discussed in the next chapter.
