7 Tests and Results

*7.1 Train of Pulses

*7.2 Train of Pulses With White Noise

*7.3 MIDI Files

*7.4 Polyphonic Music

*7.5 Train of Pulses with Two Different Intensities

*7.6 Autocorrelation Comparison

7    Tests and Results

7.1         Train of Pulses

The tests started with the simplest kind of audio signal that carries rhythm: a train of pulses. Several pulse trains at different frequencies (1, 2, 3, and 4 Hz) were created with Sound Forge, using fs = 8 kHz and a duration of 5 sec. All the pulse trains presented here were decimated by a factor of 5, and a bin parameter of 64 was used for the AMI calculation. The beat detection algorithm presented in Chapter 6 was applied to the pulse trains in Figures 7.1, 7.3, 7.5, and 7.7, and the resulting AMI curves are presented in Figures 7.2, 7.4, 7.6, and 7.8. The lag of the maximum AMI peak is taken as the beat period, whose reciprocal is the beat rate.
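The pulse-train experiment can be sketched in a few lines of Python. This is a minimal sketch, not the Chapter 6 implementation: it assumes AMI(τ) is the histogram (plug-in) mutual information between x[n] and x[n+τ] with 64 equal-width amplitude bins, uses plain downsampling instead of filtered decimation, and adds a 0.1 s minimum lag and an earliest-peak tie-break that the text does not specify. The 10 ms rectangular pulses and all function names are hypothetical. To keep pure Python fast, the usage below uses fs = 1200 Hz rather than 8 kHz.

```python
import math
from collections import Counter


def pulse_train(freq_hz, fs, duration_s, width_s=0.01, start_s=0.0):
    """Hypothetical test signal: unit rectangular pulses every 1/freq_hz s."""
    n = int(round(fs * duration_s))
    step = fs / freq_hz
    width = max(1, int(round(width_s * fs)))
    x = [0.0] * n
    for k in range(int(n / step) + 1):
        start = int(round(start_s * fs + k * step))
        for i in range(start, min(start + width, n)):
            x[i] = 1.0
    return x


def quantize(x, bins):
    """Map each sample to one of `bins` equal-width amplitude bins."""
    lo, hi = min(x), max(x)
    span = (hi - lo) or 1.0
    return [min(int((v - lo) / span * bins), bins - 1) for v in x]


def mutual_information(a, b):
    """Plug-in (histogram) estimate of I(A;B) in bits from paired bin indices."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log2(c * n / (pa[i] * pb[j]))
               for (i, j), c in pab.items())


def ami_curve(x, fs, max_tau_s, bins=64):
    """AMI between x[n] and x[n + lag] for lag = 1 .. max_tau_s * fs samples."""
    q = quantize(x, bins)
    max_lag = min(int(max_tau_s * fs), len(q) - 1)
    return [mutual_information(q[:-lag], q[lag:]) for lag in range(1, max_lag + 1)]


def detect_beat_period(x, fs, decimation=5, max_tau_s=2.0, min_tau_s=0.1,
                       bins=64, tol=1e-9):
    """Beat period in seconds = lag of the highest AMI peak above min_tau_s."""
    y = x[::decimation]  # plain downsampling; no anti-aliasing filter here
    fs_d = fs / decimation
    ami = ami_curve(y, fs_d, max_tau_s, bins)
    first = max(1, int(min_tau_s * fs_d))
    best = max(ami[first - 1:])
    # Multiples of the period can tie with it, so keep the earliest near-maximal lag.
    for lag in range(first, len(ami) + 1):
        if ami[lag - 1] >= best - tol:
            return lag / fs_d
```

For example, `detect_beat_period(pulse_train(3, 1200, 5.0), 1200)` returns the 3 Hz period of 1/3 sec, and shifting the pulses with `start_s=0.0625` (as in Section 7.2.1) leaves the 1 Hz result unchanged. On a perfectly periodic signal every multiple of the period gives the same AMI, which is why the sketch keeps the earliest lag among tied maxima.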

Figure 7.1 Train of pulses at 1 Hz. A beat occurs every 1 sec.

Figure 7.2 AMI. The maximum peak is at 1 sec, corresponding to 1 Hz.

Figure 7.3 Train of pulses at 2 Hz. A beat occurs every 0.5 sec.

Figure 7.4 AMI. The maximum peak is at 0.5 sec, corresponding to 2 Hz.

Figure 7.5 Train of pulses at 3 Hz. A beat occurs every 0.33 sec.

Figure 7.6 AMI. The maximum peak is at 0.33 sec, corresponding to 3 Hz.

Figure 7.7 Train of pulses at 4 Hz. A beat occurs every 0.25 sec.

Figure 7.8 AMI. The maximum peak is at 0.25 sec, corresponding to 4 Hz.

In all four cases the maximum AMI peak falls at the expected beat period, so AMI detects the beat of a pulse train accurately. However, the autocorrelation function gives very similar results for a train of pulses; AMI performs better on more complex signals, as described in Section 7.6.
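For comparison, the autocorrelation baseline can be sketched the same way. This is a hypothetical, minimal version (raw autocorrelation normalized by r[0], no decimation, a 0.1 s minimum lag, and 10 ms rectangular pulses); the thesis does not state the settings of its own autocorrelation test, and the function names are illustrative only.

```python
def pulse_train(freq_hz, fs, duration_s, width_s=0.01):
    """Hypothetical test signal: unit rectangular pulses every 1/freq_hz s."""
    n = int(round(fs * duration_s))
    step = fs / freq_hz
    width = max(1, int(round(width_s * fs)))
    x = [0.0] * n
    for k in range(int(n / step) + 1):
        start = int(round(k * step))
        for i in range(start, min(start + width, n)):
            x[i] = 1.0
    return x


def autocorrelation(x, max_lag):
    """Raw autocorrelation r[lag] = sum_n x[n] * x[n + lag], normalized so r[0] = 1."""
    r0 = sum(v * v for v in x)
    return [sum(x[n] * x[n + lag] for n in range(len(x) - lag)) / r0
            for lag in range(max_lag + 1)]


def autocorr_beat_period(x, fs, max_tau_s=2.0, min_tau_s=0.1):
    """Beat period in seconds = lag of the largest autocorrelation value above min_tau_s."""
    r = autocorrelation(x, int(max_tau_s * fs))
    first = int(min_tau_s * fs)
    best_lag = max(range(first, len(r)), key=lambda lag: r[lag])
    return best_lag / fs
```

With `fs = 600`, `autocorr_beat_period(pulse_train(f, 600, 5.0), 600)` returns 1/f for f = 1, 2, 3 and 4 Hz. Because the raw sum at lag kP covers fewer overlapping pulse pairs as k grows, the first period P always has the largest value, so no tie-breaking is needed on a clean pulse train.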

7.2         Train of Pulses With White Noise

7.2.1      Train of Pulses (1 Hz) with White Noise

This time, a 1 Hz train of pulses was created with white noise added in the silent parts (the segments close to zero). The pulse sections were mixed with a voice signal and white noise, and this pattern was repeated throughout the signal. In addition, the pulses did not start at t=0 but at t=0.0625 sec, to check whether the phase could affect the result; it did not.

 

Figure 7.9 Train of pulses at 1 Hz, fs = 8000 Hz, 5 sec.

Figure 7.10 The train of pulses plus white noise.

Figure 7.11 Train of pulses at 1 Hz added to a white noise signal (filtered).

Figure 7.12 AMI. Maximum peak at 1 sec, corresponding to 1 Hz.

In Figure 7.10, even after the white noise is added, the shape of the pulse train is still clearly visible. For that reason, the signal was filtered with a fourth-order Chebyshev low-pass filter (3 dB ripple, 700 Hz cut-off), giving the signal in Figure 7.11, which looks more like a real-world signal. Figure 7.12 shows the AMI, computed with a decimation factor N=5, maxtau=2 and bin=64. Despite the added noise, the result is as expected: AMI still detects the beat rate. White noise has no memory (its samples are independent of each other), so AMI applied to white noise alone yields no peak.

7.2.2      Train of Pulses (2 Hz) with White Noise

This test is identical to the 1 Hz test, except that the pulse train frequency is 2 Hz.

Figure 7.13 Signal plus white noise.

Figure 7.14 Train of pulses at 2 Hz added to a white noise signal.

Figure 7.15 AMI. Maximum peak at 0.5 sec, corresponding to 2 Hz.

The result is good again: despite all the added noise, AMI found the beat period at 0.5 sec, corresponding to 2 Hz.

7.2.3      Train of Pulses (1 Hz) with Much More White Noise

This time the level of white noise added to the pulses was higher than in the previous tests. The results are still good, although the AMI shows some degradation. The highest noise level that still gave good results was half of the level applied to the silent parts. The signal in Figure 7.16 has already been filtered; note that the noise level in the pulses is not the same as in the silent parts, so the shape of the pulses can still be seen. This signal looks very similar to a real-world music signal.

Figure 7.16 Train of pulses at 1 Hz added to a white noise signal.

Figure 7.17 AMI. Maximum peak at 1 sec, corresponding to 1 Hz.

In Figure 7.17 the AMI maximum is still at 1 sec (1 Hz), but because of the high level of noise added to the original signal, the peak is less prominent than in Figure 7.12.

7.3         MIDI Files

MIDI files are a good option for testing because they have a fixed beat rate; they were converted to WAV files so they could be tested. The files were sampled at fs = 8 kHz with a duration of 5 sec, and processed with decimation factor N=2, maxtau=2 and bin=64.

7.3.1      MIDI file (80 bpm)

Figure 7.18 AMI of the 80 bpm MIDI file (t80bpm.wav). The maximum peak is at t=0.74 sec, i.e. 81 bpm.

This particular file has 20 beats in 15 sec, approximately 79.5 bpm (counted manually). The piece has no drums, but the beat is clearly marked by a strong piano. The AMI result of 81 bpm is therefore a good estimate.
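The tempo figures in this section follow from two conversions: a peak lag of τ seconds corresponds to 60/τ bpm, and b beats counted in a window of W seconds correspond to 60·b/W bpm. The helper below (function names are hypothetical) reproduces several of the values quoted in Sections 7.3 and 7.4.

```python
def lag_to_bpm(lag_s):
    """Convert a beat period (AMI peak lag) in seconds to beats per minute."""
    return 60.0 / lag_s


def bpm_to_lag(bpm):
    """Convert a tempo in beats per minute to the expected AMI peak lag in seconds."""
    return 60.0 / bpm


def count_to_bpm(beats, window_s):
    """Tempo from a beat count over a listening window (count / duration)."""
    return beats * 60.0 / window_s


print(round(lag_to_bpm(0.74), 1))   # 81.1  (80 bpm MIDI file)
print(round(lag_to_bpm(0.27)))      # 222   (failed 110 bpm detection)
print(round(lag_to_bpm(0.546), 2))  # 109.89
print(round(bpm_to_lag(124), 2))    # 0.48
print(count_to_bpm(6, 5))           # 72.0  (ballad, 6 beats in 5 sec)
print(count_to_bpm(7, 5))           # 84.0  (reggae, 7 strong beats in 5 sec)
```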

7.3.2      MIDI file (110 bpm)

This file has 29 beats in 15 seconds, approximately 110.91 bpm. As shown in Figure 7.19, the detection failed: the maximum AMI peak is at t=0.27 sec (222 bpm), roughly half of the expected period of 0.55 sec. To find out whether more data was needed for the detection, the length of the audio excerpt was increased from 5 to 10 sec, but the maximum AMI peak was again at t=0.27 sec (222 bpm).

Figure 7.19 AMI of the 110 bpm MIDI file (t110bpm.wav). The maximum peak is at t=0.27 sec, so the detection failed; it should be at t=0.55 sec to correspond to 110 bpm.

Listening to the file reveals no drums or any other cue marking the beats. For this reason, another 5 sec excerpt of the same song was selected, this time with some percussion marking the beats. The result, shown in Figure 7.20, was t=0.546 sec, very close to the expected 0.55 sec.

Figure 7.20 AMI of t110bpmkick.wav; this time the file contains kicks. The maximum peak is at t=0.546 sec, i.e. 109.89 bpm ≈ 110 bpm.

7.3.3      MIDI file (124 bpm)

Figure 7.21 MIDI file 124 bpm (t124bpm.wav). The detection failed; the peak should be at t=0.48 sec.

This MIDI file has 10 beats in 5 sec, approximately 120 bpm. A maximum was expected close to t=0.48 sec, but there was no prominent peak in the detection range, probably because this song has no drums at all. The tests in Sections 7.3.2 and 7.3.3 suggest that this method has difficulties when no drums or other instrument mark the beat.

7.4         Polyphonic Music

Excerpts from popular music compact discs were sampled for these tests.

7.4.1      Ballad

This is the Beatles song “Free as a Bird” from the album Anthology (2). Even though the beat is well marked by Ringo’s drums, the AMI does not present a clear peak. The maximum is at t=0.83 sec, corresponding to the 6 beats in 5 sec of the song. The detection worked, but it is not very conclusive: although the maximum is at the right lag, the AMI pattern does not show a clear peak around this value. Figure 7.22 shows the AMI for this audio file.

Figure 7.22 AMI of pballad8k5.wav (6 beats in 5 sec). Maximum peak at t=0.83 sec, corresponding to 72 bpm.

7.4.2      Reggae

Reggae music has equally spaced beats that alternate between one strong and one weak beat. The file preggae8k5.wav has 7 strong beats in 5 sec (84 bpm), but it can also be described as having 13 equally spaced beats in 5 sec (the quarter-note level; see Figure 4.7). The maximum AMI peak is at t=0.71 sec, corresponding to 7 beats in 5 sec.

Figure 7.23 AMI of preggae8k5.wav. The maximum peak is at t=0.713 sec, corresponding to 7 strong beats in 5 sec; the quarter-note level is also detected at t=0.382 sec, corresponding to 13 beats in 5 sec.

Rhythm perception varies between listeners. I asked a former drummer how many beats he counted in this 5 sec file: he said 13, whereas I counted only 7. Interestingly, AMI places its maximum at the strong-beat period, t=0.713 sec (7 beats in 5 sec), but also shows a second, smaller peak around t=0.382 sec, corresponding to the quarter note (13 beats in 5 sec).
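The strong-beat and quarter-note peaks can be read off an AMI curve by ranking its local maxima. This is a minimal sketch, assuming the two highest local maxima are the ones of interest (the thesis does not state its peak-picking rule); the function names are hypothetical, and the curve used below is a synthetic stand-in with two bumps placed at the reggae lags, not the measured AMI of preggae8k5.wav.

```python
import math


def local_maxima(curve):
    """Interior indices i with curve[i-1] < curve[i] >= curve[i+1]."""
    return [i for i in range(1, len(curve) - 1)
            if curve[i - 1] < curve[i] >= curve[i + 1]]


def two_main_peaks(lags_s, curve):
    """Lags of the highest and the second-highest local maxima of an AMI curve."""
    peaks = sorted(local_maxima(curve), key=lambda i: curve[i], reverse=True)
    return [lags_s[i] for i in peaks[:2]]


def beats_in_window(lag_s, window_s=5.0):
    """Number of beats of period lag_s that fit in a window of window_s seconds."""
    return window_s / lag_s


# Synthetic curve (NOT measured data): a tall bump at the strong-beat lag
# and a smaller one at the quarter-note lag, on a 1 ms lag grid.
lags = [i * 0.001 for i in range(100, 1001)]
curve = [math.exp(-((t - 0.713) / 0.02) ** 2)
         + 0.6 * math.exp(-((t - 0.382) / 0.02) ** 2) for t in lags]

strong, quarter = two_main_peaks(lags, curve)
print(round(strong, 3), round(beats_in_window(strong)))    # 0.713 7
print(round(quarter, 3), round(beats_in_window(quarter)))  # 0.382 13
```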

7.4.3      Merengue

Merengue alternates strong and weak beats: one strong, one weak, one strong, one weak, and so on. The 5 sec excerpt of Sergio Vargas’s song “Que Sera” (pmerengue8k5s.wav) has 6 strong beats and 12 equally spaced beats in 5 sec. As expected, the AMI yielded a larger peak around 0.83 sec and a smaller peak around 0.41 sec. Again, AMI detects both the strong-beat rate and the quarter-note rate.

Figure 7.24 AMI of pmerengue8k5.wav. The maximum peak is at t=0.882 sec (6 beats in 5 sec); the second prominent peak is at t=0.442 sec (12 beats in 5 sec).

7.4.4      Latin Pop

Ricky Martin’s song “Living la Vida Loca”, from the album with the same name, was tested in both its English and Spanish versions, with very similar results, since both versions have the same beat rate. The strong-beat rate is 7 beats in 5 sec (t=0.71 sec) and the quarter-note rate is 14 beats in 5 sec (t=0.35 sec). AMI detected the beat successfully in both cases, as shown in Figures 7.25, 7.26, 7.27, and 7.28.

Figure 7.25 File platinsp.wav, “Living la Vida Loca”, Spanish version.

Figure 7.26 File platinen.wav, “Living la Vida Loca”, English version.

Figure 7.27 AMI of the Spanish version.

Figure 7.28 AMI of the English version.

 

Beat level           Spanish version              English version
Strong beat          t = 0.6853 sec, 87.55 bpm    t = 0.6855 sec, 87.52 bpm
Quarter-note beat    t = 0.33 sec, 181.8 bpm      t = 0.34 sec, 176.4 bpm

Table 7.1 Results from the song “Living la Vida Loca”, English and Spanish versions.

7.5         Train of Pulses with Two Different Intensities

To verify whether AMI can detect strong and weak beats separately, a special train of pulses was created with Sound Forge: a train of strong pulses at 2 Hz and a train of lower-intensity pulses at 4 Hz were combined in one file, as presented in Figure 7.29. The AMI result was as expected: the maximum peak at t=0.5 sec marks the strong beats, and the second peak at t=0.25 sec corresponds to the weak beats.

Figure 7.29 Train of pulses at 4 and 2 Hz.

Figure 7.30 AMI. The maximum peak is at t=0.5 sec (2 Hz) and the second peak at t=0.25 sec (4 Hz).

7.6         Autocorrelation Comparison

Columns: audio signal, auto mutual information, autocorrelation.

Train of pulses at 4 Hz

Figure 7.31 AMI, peak at t=0.25 sec

Figure 7.32 Autocorrelation, peak at t=0.25 sec

Train of pulses at 1 Hz plus noise

Figure 7.33 AMI, peak at t=1 sec

Figure 7.34 Autocorrelation, peak at t=1 sec

MIDI file 80 bpm

Figure 7.35 AMI, peak at t=0.74 sec ≈ 81 bpm

Figure 7.36 Autocorrelation, peak at t=0.5 sec, 120 bpm (detection failed)

Figures 7.31, 7.33, and 7.35 present beat detection with AMI for three different kinds of audio file, and Figures 7.32, 7.34, and 7.36 present autocorrelation applied to the same files, so that the two methods can be compared. Autocorrelation worked well only for the pulse trains; when the audio became more complex, as in the MIDI file, it failed to detect the beat. Unlike autocorrelation, AMI can detect beats in more complex signals, because mutual information captures nonlinear as well as linear dependence between x[n] and x[n+τ], whereas autocorrelation measures only linear dependence.
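The distinction between linear and nonlinear dependence can be checked on toy data (hypothetical, not from the thesis). Autocorrelation is a correlation-type measure of the pairs (x[n], x[n+τ]), while AMI is their mutual information; here the same two measures are applied to a pair (x, y) where y = x² is fully determined by x, yet their correlation is exactly zero.

```python
import math
from collections import Counter


def correlation(a, b):
    """Pearson correlation coefficient (sensitive to linear dependence only)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def mutual_information(a, b):
    """Mutual information in bits between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * math.log2(c * n / (pa[i] * pb[j]))
               for (i, j), c in pab.items())


x = [-2, -1, 0, 1, 2] * 20  # symmetric, zero-mean toy sequence
y = [v * v for v in x]      # deterministic but nonlinear function of x

print(correlation(x, y))                   # 0.0: no linear dependence
print(round(mutual_information(x, y), 3))  # 1.522: y is fully determined by x
```

Because y is a function of x, the mutual information equals the entropy of y (about 1.522 bits), while the correlation cannot see the dependence at all.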