APPENDIX A
Qualitative Judgements for Multirate Convolution without Cross-Convolved Channels
Listening tests were performed in order to evaluate the audibility of the transition band aliasing present in the proposed algorithm. Results indicated that inexperienced listeners detected a difference between the ideal and aliased results, but had difficulty in verbally describing or quantifying it.
Procedure
Each of ten ten-second audio samples was convolved with one of five impulse responses, once using overlap-save convolution and once using the proposed multirate variation. One of four filter banks, differing in transition bandwidth and stopband attenuation, was used in each multirate convolution. Table 4 summarizes the test battery. Audio samples were recorded from compact disc to a personal computer using Sound Forge 4.0d, manipulated in Matlab, and digitally transferred to a digital audio tape (DAT) recorder. The left channel contained the overlap-save convolution result, and the right channel contained the multirate convolution result. Each channel of the DAT was routed through a Sony MXP-3056 mixing console so that each track could be monitored separately in a centrally panned position. The test samples were monitored in the near field via two Event 20/20 near-field monitors positioned approximately five feet from the listener, forty-five degrees to the left and right of straight ahead.
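For reference, the exact (overlap-save) convolution used as the comparison baseline can be sketched as follows. This is a minimal illustration in Python with NumPy, not the Matlab code used for the tests; the function name `overlap_save` and the `block_len` parameter are assumptions introduced here, and the proposed multirate variation is not shown.

```python
import numpy as np

def overlap_save(x, h, block_len=1024):
    """Linear convolution of x with h using FFT-based overlap-save.

    Illustrative sketch only; block_len is a hypothetical parameter
    and must be at least len(h).
    """
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    m = len(h)
    if block_len < m:
        raise ValueError("block_len must be at least len(h)")

    hop = block_len - m + 1            # valid output samples per block
    out_len = len(x) + m - 1           # length of the full linear convolution
    n_blocks = -(-out_len // hop)      # ceiling division

    # Prepend m-1 zeros as initial history and pad the tail so that
    # every block holds exactly block_len samples.
    tail = n_blocks * hop - len(x)
    padded = np.concatenate([np.zeros(m - 1), x, np.zeros(tail)])

    H = np.fft.rfft(h, block_len)      # filter spectrum, computed once
    y = np.empty(n_blocks * hop)
    for b in range(n_blocks):
        seg = padded[b * hop : b * hop + block_len]
        circ = np.fft.irfft(np.fft.rfft(seg) * H, block_len)
        # The first m-1 samples are corrupted by circular wrap-around
        # and are discarded; the remainder is valid linear convolution.
        y[b * hop : (b + 1) * hop] = circ[m - 1 :]
    return y[:out_len]

# Usage: compare against direct convolution on random data.
rng = np.random.default_rng(0)
signal = rng.standard_normal(5000)
impulse = rng.standard_normal(50)
result = overlap_save(signal, impulse, block_len=256)
```

Up to floating-point rounding, the result should match direct convolution (`np.convolve(signal, impulse)`), which is why overlap-save serves as the "ideal" reference in the comparison.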
Table 4. Listening test samples.
| Number | Style | Impulse response | Multirate filter and order |
| 1 | solo violin | large steel stairwell | FIR - 8 |
| 2 | light vocal rock | medium recital hall - rear | FIR - 16 |
| 3 | hard rock | large recital hall - rear | FIR - 8 |
| 4 | chamber orchestra | medium recital hall - middle | IIR - 32 |
| 5 | instrumental jazz | large recital hall - rear | IIR - 32 |
| 6 | electronic/world | large recital hall - rear | FIR - 16 |
| 7 | classical symphony | large recital hall - front | FIR - 8 |
| 8 | heavy metal | large recital hall - front | IIR - 32 |
| 9 | religious choral | large recital hall - front | IIR - 64 |
| 10 | classical guitar | large steel stairwell | FIR - 16 |
Test subjects included six undergraduate university students between the ages of 18 and 21. Subjects were instructed to listen to each excerpt and to rate, on a scale of 1 to 5, the similarity of the second channel to the first. Subjects were also asked to verbalize any audible differences. Subjects directly controlled which channel they listened to via the channel mute buttons on the mixing console. Subjects were allowed to listen to each excerpt up to five times.
Results
Table 5 summarizes the results of the listening tests. Any differences heard between the exact and multirate convolutions should be expected to concern the frequency range where aliasing occurs. For audio sampled at 44.1 kHz, this is the range from approximately 10 to 12 kHz. Although most audio material has much more information in the lower frequency ranges, high-frequency content critically affects the timbre of many instruments, especially percussion instruments and vocals.
Table 5. Listening test results with respect to mean similarity rating. In the comments, Channel A is the exact convolution and Channel B is the multirate approximation.
| Number | Mean similarity rating (1-5) | Variance | Comments |
| 1 | 4.17 | 0.57 | B had more ring; B slightly muffled compared to A; B had less depth than A |
| 2 | 4.17 | 0.57 | B's low frequency range is expanded; B mid highs different; kick drum more resonant |
| 3 | 3.50 | 1.10 | B had more depth; high hat is brighter in A; cymbal "sizzle" was distorted in B; weird high-freq stuff (cymbals) in B; cymbals more distorted, fuzzy in B |
| 4 | 4.33 | 0.27 | B has expanded low freq response; less brittle; more cello in B; slight difference in bass notes |
| 5 | 4.67 | 0.27 | A sounded muted; slight distortion on trumpet high riff; a bit on drums |
| 6 | 4.33 | 0.27 | drums on B sound weird (phase like), high freq was rounder, less brittle than A; cymbal crash is more definite in B; loss of definition on snare and high hat |
| 7 | 4.50 | 0.30 | less depth in A; subtle change in B's presence; B seems louder |
| 8 | 4.50 | 0.30 | B has change in presence; loss of definition on drumset |
| 9 | 4.33 | 0.27 | female voice more prevalent on A; hear a bit more male voice in B |
| 10 | 4.17 | 0.17 | B seemed cleaner; B has more bass, rounder; bell doesn't echo as long in B, loss of sharpness in B |
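The mean and variance columns of Table 5 can be computed from the per-subject ratings as sketched below. The individual ratings are not reported, so the list used here is hypothetical: it is merely one set of six integer ratings that happens to reproduce the values listed for item 3. The use of the sample (n - 1) variance is also an inference from the tabulated numbers, not something the text states.

```python
import statistics

# Hypothetical ratings from six subjects on the 1-5 similarity scale.
# These are NOT the recorded responses; they are chosen only so that
# the summary statistics match the item 3 row of Table 5.
ratings = [5, 4, 4, 3, 3, 2]

mean_rating = statistics.mean(ratings)        # arithmetic mean
sample_var = statistics.variance(ratings)     # divides by n - 1
population_var = statistics.pvariance(ratings)  # divides by n, for contrast

print(f"mean = {mean_rating:.2f}")
print(f"sample variance = {sample_var:.2f}")
print(f"population variance = {population_var:.2f}")
```

With only six subjects, the two variance definitions differ noticeably, which is one more reason to treat the tabulated spreads as rough indicators.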
Discussion
The subjects' comments indicate that the differences are generally audible. For the most part, the comments use terms associated with high-frequency content, such as presence, sharpness, and definition. The high mean similarity scores, however, suggest that subjects either (a) had difficulty hearing the differences or (b) did not consider the differences very significant with respect to overall similarity. As one would expect, the items with the highest mean similarity scores had the fewest comments. For items 5, 7, and 8, only two of the listeners detected a difference. Items 5 and 8 used a more selective 32nd-order IIR filter. Item 7 used a very simple 8th-order FIR filter, but applied it to a musical selection with no percussion or vocals. The poorest performance occurred on item 3, which used a simple 8th-order FIR filter in conjunction with drum- and cymbal-laden material.
Although too few subjects were tested to support any general claims, it seems reasonable to assume that multirate convolution using highly selective filters on non-percussive, instrumental material will differ only slightly in audible terms from the exact single-channel convolution. Whether the difference is detectable will depend significantly on listening conditions and listener experience.