7.2 Train of Pulses With White Noise
7.5 Train of Pulses with Two Different Intensities
7.6 Autocorrelation Comparison
The tests started with the simplest kind of audio file involving rhythm: a train of pulses. Several trains of pulses at different frequencies (1, 2, 3, and 4 Hz) were created with the program Sound Forge using fs = 8 kHz and a duration of 5 sec. All the trains of pulses presented here were decimated by a factor of 5, and the AMI was calculated with a bin parameter of 64. The beat detection algorithm presented in Chapter 6 was applied to the trains of pulses in Figures 7.1, 7.3, 7.5, and 7.7. The resulting AMI curves are presented in Figures 7.2, 7.4, 7.6, and 7.8. The lag of the maximum AMI peak is taken as the beat period, and its reciprocal as the beat rate.
|
Figure 7.1.1
Train of pulses at 1 Hz. Beat occurs every 1 sec. |
Figure 7.1.2
AMI. The maximum peak is at 1 sec, corresponding to 1 Hz. |
|
Figure 7.1.3
Train of pulses at 2 Hz. Beat occurs every 0.5 sec. |
Figure 7.1.4
AMI. The maximum peak is at 0.5 sec, corresponding to 2 Hz. |
|
Figure 7.1.5
Train of pulses at 3 Hz. Beat occurs every 0.33 sec. |
Figure 7.1.6
AMI. The maximum peak is at 0.33 sec, corresponding to 3 Hz. |
|
Figure 7.1.7
Train of pulses at 4 Hz. Beat occurs every 0.25 sec. |
Figure 7.1.8
AMI. The maximum peak is at 0.25 sec, corresponding to 4 Hz. |
According to the previous figures, the maximum AMI peak falls at the expected period in all four cases, which shows that AMI works well for beat detection on this kind of signal. However, the autocorrelation function gives very similar results for a train of pulses. AMI gives better results than autocorrelation for more complex signals, as described in section 7.5.
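The procedure applied to these trains of pulses can be sketched in a few lines. This is a minimal illustration and not the Chapter 6 implementation: it assumes rectangular pulses 10 ms wide, plain subsampling in place of a proper decimator, a histogram estimate of the mutual information with 64 bins, and a search range of 0.1 to 2 sec. The names `pulse_train`, `ami`, and `beat_period`, and the parameters `width_s`, `min_lag_s`, and `tol`, are hypothetical. Because an ideal periodic signal gives the same AMI at every multiple of its period, the sketch reports the earliest lag whose AMI is within `tol` of the maximum.

```python
import numpy as np

def pulse_train(rate_hz, fs=8000, duration_s=5.0, width_s=0.01):
    """Rectangular pulses (assumed shape and width) repeating at rate_hz."""
    n = np.arange(int(round(fs * duration_s)))
    return ((n % (fs / rate_hz)) < width_s * fs).astype(float)

def ami(x, max_lag, bins=64):
    """Histogram estimate of I(x[n]; x[n+lag]) in nats for lag = 0..max_lag."""
    lo, hi = x.min(), x.max()          # assumes x is not constant
    # Quantize the signal once into `bins` equal-width cells.
    q = np.minimum(((x - lo) / (hi - lo) * bins).astype(int), bins - 1)
    curve = np.zeros(max_lag + 1)
    for lag in range(max_lag + 1):
        # Joint histogram of (x[n], x[n+lag]) via a combined cell index.
        pair = q[:q.size - lag] * bins + q[lag:]
        pxy = np.bincount(pair, minlength=bins * bins).reshape(bins, bins)
        pxy = pxy / pxy.sum()
        px = pxy.sum(axis=1, keepdims=True)   # marginal of x[n]
        py = pxy.sum(axis=0, keepdims=True)   # marginal of x[n+lag]
        nz = pxy > 0
        curve[lag] = np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz]))
    return curve

def beat_period(x, fs, min_lag_s=0.1, max_lag_s=2.0, bins=64, tol=0.01):
    """Earliest lag (sec) whose AMI is within tol of the maximum (assumed rule)."""
    curve = ami(x, int(round(max_lag_s * fs)), bins)
    start = int(round(min_lag_s * fs))
    search = curve[start:]
    first = int(np.argmax(search >= (1.0 - tol) * search.max()))
    return (start + first) / fs

fs, n_dec = 8000, 5                     # values stated in the text
periods = {}
for rate in (1, 2, 4):
    x = pulse_train(rate, fs)[::n_dec]  # plain subsampling (assumption)
    periods[rate] = beat_period(x, fs / n_dec)
    print(f"{rate} Hz train: period {periods[rate]:.3f} sec")
```

The minimum lag keeps the search away from lag zero, where any signal is fully informative about itself.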
This time, a train of pulses at 1 Hz was created with white noise added in the silent parts (the parts close to zero). The pulse sections were mixed with a voice signal and white noise, and this pattern was repeated throughout the signal. In addition, the pulses did not start at t = 0; they started at t = 0.0625 sec to ascertain whether phase could affect the result. It did not.
|
Figure 7.1.9
Train of pulses of 1 Hz, fs=8000, 5 sec. |
Figure 7.1.10 The train of pulses plus white noise |
|
Figure 7.1.11 Train of pulses of 1 Hz added to a white
noise signal |
Figure 7.1.12 AMI.
Maximum peak at 1 sec corresponding to 1 Hz. |
In Figure 7.10, after the white noise is added to the signal, the shape of the train of pulses can still be seen. For that reason the signal was filtered with a fourth-order Chebyshev low-pass filter (3 dB of ripple, cut-off frequency at 700 Hz), as presented in Figure 7.11, which looks more like a real-world signal. Figure 7.12 shows the AMI, computed with a decimation factor N = 5, maxtau = 2, and bin = 64. After the noise was added the result is still as expected: AMI could still detect the beat rate in a signal with white noise. White noise has no memory; if AMI is applied to white noise alone it will not yield any peak.
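The filtering step can be illustrated with SciPy. This is a sketch under several assumptions: the filter is taken to be a Chebyshev Type I design (the text quotes a ripple and a cut-off but does not name the type), the pulses are 10 ms wide, and white noise of an arbitrary level is simply added to the whole signal rather than mixed with voice as in the actual test files.

```python
import numpy as np
from scipy import signal

fs = 8000                        # sampling rate stated in the text
t = np.arange(5 * fs) / fs       # 5 sec of samples
rng = np.random.default_rng(0)   # fixed seed, only for repeatability

# 1 Hz pulses starting at t = 0.0625 sec; the 10 ms width is an assumption.
pulses = (((t - 0.0625) % 1.0) < 0.01).astype(float)
noisy = pulses + 0.2 * rng.standard_normal(t.size)   # assumed noise level

# Fourth-order low-pass, 3 dB ripple, 700 Hz cut-off (Type I assumed).
b, a = signal.cheby1(4, 3, 700, btype="low", fs=fs)
filtered = signal.lfilter(b, a, noisy)

# Check the response at the cut-off and well inside the stopband.
freqs = np.array([700.0, 2000.0])
_, h = signal.freqz(b, a, worN=freqs, fs=fs)
gain_db = 20 * np.log10(np.abs(h))
print(gain_db)
```

For a Type I design the gain at the cut-off equals the ripple, so the first value should be close to -3 dB, while the gain at 2 kHz is far lower.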
This test is exactly the same as the one for 1 Hz, but the frequency of the train of pulses is 2 Hz.
|
|
Figure 7.1.14 Train of pulses at 2 Hz added to a white
noise signal |
|
The results are good again: despite all the added noise, AMI found the beat period at 0.5 sec, corresponding to 2 Hz.
This time the level of white noise added to the pulses was higher than in the previous tests. The results are still good, but some degradation is seen in the AMI because the peaks are not as prominent as in Figure 7.12. The maximum level of noise added to the pulses that still gave good results was half of the level applied to the silences. In Figure 6.21 the signal has already been filtered, but note that the level of noise is not the same as in the pulses, so the shape of the pulses can still be seen. This picture looks very similar to a real-world music signal.
|
Figure 7.1.16 Train of pulses at 1 Hz added to a white
noise signal |
|
In Figure 7.17 the AMI maximum is still at 1 sec (1 Hz), but due to the high level of noise added to the original signal the AMI peak is not as prominent as in Figure 7.12.
MIDI files are a good option for testing because they have a fixed beat rate. They were converted to WAV files in order to be tested. Files were sampled at fs = 8 kHz and analyzed with decimation factor N = 2, maxtau = 2, bin = 64, and a duration of 5 sec.
Figure 7.1.18
AMI of MIDI file at 80 bpm (t80bpm.wav). The max peak is at t = 0.74 sec, corresponding to 81 bpm.
This particular file has 20 beats in 15 sec, approximately 79.5 bpm (my own measurement). The piece doesn’t have drums, but the beat is very clearly marked by a strong piano. The AMI result of 81 bpm is very acceptable.
This file has 29 beats in 15 seconds, approximately 110.91 bpm. As shown in Figure 7.19, the detection failed. To find out whether more data was necessary for the detection, the length of the audio file was increased from 5 to 10 sec. The max peak in the AMI was again at t = 0.27 sec (222 bpm).

Figure 7.1.19 AMI of MIDI file at 110 bpm (t110bpm.wav). The max peak is at t = 0.27 sec, so the detection failed. It should be at t = 0.55 sec to correspond to 110 bpm.
Listening to the file, no drums or any other cues mark the beats. For this reason, another 5 sec from the same song was selected, this time with some percussion marking the beats. The result was t = 0.546 sec, very close to the expected 0.55 sec. Figure 7.20 shows the result.

Figure 7.1.20 AMI of t110bpmkick.wav; this time the file contains kicks. The max peak is at t = 0.546 sec = 109.89 bpm ≈ 110 bpm.
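The conversions between peak lag, tempo, and hand-counted beats used in these MIDI tests are simple reciprocals. The helper names below are hypothetical; the numbers are the ones quoted in the text.

```python
def lag_to_bpm(lag_s):
    """Tempo in beats per minute for a beat period of lag_s seconds."""
    return 60.0 / lag_s

def bpm_to_lag(bpm):
    """Beat period in seconds for a tempo given in beats per minute."""
    return 60.0 / bpm

def count_to_bpm(beats, window_s):
    """Tempo from a number of beats counted in window_s seconds."""
    return 60.0 * beats / window_s

for lag in (0.74, 0.27, 0.546):
    print(f"lag {lag} sec -> {lag_to_bpm(lag):.2f} bpm")
print(f"110 bpm -> expected lag {bpm_to_lag(110):.3f} sec")
print(f"20 beats in 15 sec -> {count_to_bpm(20, 15):.1f} bpm")
print(f"detected/nominal tempo for t110bpm.wav: {lag_to_bpm(0.27) / 110:.2f}")
```

The last ratio is close to 2, which is consistent with the failed detection locking onto about half the beat period rather than onto an unrelated lag.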

Figure 7.1.21 MIDI file at 124 bpm (t124bpm.wav). The detection failed; the peak should be at t = 0.48 sec.
This MIDI file has 10 beats in 5 sec, approximately 120 bpm. A maximum was expected close to t = 0.48 sec, but there was no prominent value in our range of detection, probably because this song did not have any drums at all. After the tests in 7.3.2 and 7.3.3, it can be said that this method has difficulties when no drums or other instrument marks the beat.
Some music files from compact discs of popular music were sampled for these tests.
This is a Beatles song, “Free as a Bird”, from the album Anthology (2). Even though the beat is well marked by Ringo’s drums, the AMI doesn’t present a clear peak. The max is at t = 0.83 sec, corresponding to 6 beats in 5 sec in the song. The detection worked but was not very conclusive: the maximum value is right, but the pattern doesn’t show a clear peak around this value. Figure 7.22 shows the AMI for this audio file.

Figure 7.1.22
AMI of file pballad8k5.wav, 6 beats in 5 sec, max peak at t= 0.83 sec corresponding
to 72 bpm.
Reggae music has equally spaced beats, but with a structure of one strong beat followed by one weak beat. This file, preggae8k5.wav, has 7 strong beats in 5 sec = 84 bpm. But it could also be said that this file has 13 equally spaced beats in 5 sec (quarter-note level, see Figure 4.7). Note that the AMI has its max peak at t = 0.71 sec, corresponding to 7 beats in 5 sec.

Figure 7.1.23
AMI of preggae8k5.wav. The max peak is at t = 0.713 sec, corresponding to 7 strong beats in 5 sec. The quarter-note level is also detected at t = 0.382 sec, corresponding to 13 beats in 5 sec.
It is very interesting to see how people perceive rhythm. I asked a former drummer how many beats he counted in this 5 sec file. He said 13 beats, but for me the rate was just 7 beats. It is even more interesting that AMI puts its maximum at the strong-beat period, t = 0.713 sec (7 beats in 5 sec), but also presents a second, smaller peak around t = 0.382 sec, corresponding to the quarter note (13 beats in 5 sec).
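Reading both rhythmic levels from an AMI curve amounts to picking its two highest local maxima. The sketch below shows one way to do that. The curve is a synthetic stand-in (two Gaussian bumps placed at the lags quoted above, with arbitrary heights), not measured data, and `top_peaks`, `bump`, and `min_lag_s` are hypothetical names.

```python
import numpy as np

def top_peaks(curve, fs, min_lag_s=0.1, count=2):
    """Return (lag_sec, height) for the `count` highest local maxima."""
    c = np.asarray(curve, dtype=float)
    start = max(int(round(min_lag_s * fs)), 1)
    i = np.arange(start, c.size - 1)
    # A local maximum rises from the left and does not rise to the right.
    is_peak = (c[i] > c[i - 1]) & (c[i] >= c[i + 1])
    peaks = i[is_peak]
    best = peaks[np.argsort(c[peaks])[::-1][:count]]
    return [(k / fs, c[k]) for k in best]

fs = 1600.0                                   # lag resolution (assumed)
lags = np.arange(int(2.0 * fs) + 1) / fs      # lags from 0 to 2 sec

def bump(center, height, width=0.02):
    """Gaussian bump used only to build the synthetic curve."""
    return height * np.exp(-0.5 * ((lags - center) / width) ** 2)

# Synthetic curve: strong-beat peak at 0.713 sec, smaller one at 0.382 sec.
curve = bump(0.713, 1.0) + bump(0.382, 0.6)

(strong, _), (quarter, _) = top_peaks(curve, fs)
print(f"strong beat {strong:.3f} sec, quarter note {quarter:.3f} sec, "
      f"ratio {strong / quarter:.2f}")
```

A ratio near 2 between the two lags indicates that the smaller peak marks a subdivision of the strong beat.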
The Merengue’s structure has one strong beat, one weak beat, one strong, one weak, and so on. The 5 sec excerpt of Sergio Vargas’s song “Que Sera” (pmerengue8k5s.wav) has 6 strong beats in 5 sec and 12 equally spaced beats in 5 sec. In this case, the AMI yielded a bigger peak around 0.83 sec and a minor peak around 0.41 sec, as expected. Again, AMI detects both the strong beat rate and the quarter-note beat rate.

Figure 7.1.24
AMI of pmerengue8k5.wav. The max peak is at t = 0.882 sec (6 beats in 5 sec); the second notable peak is at t = 0.442 sec (12 beats in 5 sec).
The Ricky Martin song “Living la Vida Loca”, from the album of the same name, was tested in both its English and Spanish versions, with very similar results. The song has the same beat rate in both versions. The strong beat rate is 7 beats in 5 sec (t = 0.71 sec) and the quarter-note beat rate is 14 beats in 5 sec (t = 0.35 sec). The detection with AMI was successful in both cases, as shown in Figures 7.25, 7.26, 7.27, and 7.28.
|
Spanish Version |
English Version |
||
|
Figure 7.1.25 File platinsp.wav, “Living la Vida Loca” Spanish version
|
Figure 7.1.26 File platinen.wav, “Living la Vida Loca” English version |
||
|
Figure 7.1.27 AMI from Spanish version |
|||
| Version | Strong beat rate | Quarter-note beat rate |
| Spanish | t = 0.6853 sec, 87.55 bpm | t = 0.33 sec, 181.8 bpm |
| English | t = 0.6855 sec, 87.52 bpm | t = 0.34 sec, 176.4 bpm |
Table 7.1 Results from the song “Living la Vida Loca”, English and Spanish versions.
To verify whether AMI is able to detect strong and weak beats separately, a special train of pulses was created with Sound Forge. One train of strong pulses was created at 2 Hz, another train of pulses of lower intensity was created at 4 Hz, and the two were combined in one file, as presented in Figure 7.29. The AMI result was as expected: the maximum peak at t = 0.5 sec marks the strong beats and the second peak at t = 0.25 sec corresponds to the weak beats.
|
Figure 7.1.29
Train of pulses at 4 and 2 Hz |
Figure 7.1.30
AMI. The max peak is at t = 0.5 sec (2 Hz); the second peak is at t = 0.25 sec (4 Hz). |
| Audio Signal | Auto Mutual Information | Autocorrelation |
| | Figure 7.1.31 AMI t = 0.25 sec | Figure 7.1.32 Autocorrelation t = 0.25 sec |
| 1 Hz plus noise | Figure 7.1.33 AMI t = 1 sec | Figure 7.34 Autocorrelation t = 1 sec |
| MIDI file 80 bpm | Figure 7.1.35 AMI t = 0.74 sec ≈ 81 bpm | Figure 7.36 Autocorrelation t = 0.5 sec, 120 bpm (did not work) |
Figures 7.31, 7.33, and 7.35 present beat detection with AMI for three different kinds of audio files. Figures 7.32, 7.34, and 7.36 present autocorrelation applied to the same audio files. Together these figures compare AMI and autocorrelation in beat detection. Detection by autocorrelation worked well only for the train-of-pulses cases; when the audio files became more complex, the autocorrelation was unable to detect the beat. Unlike autocorrelation, AMI can detect beats in more complex signals because it also captures non-linear dependence in the signal, whereas autocorrelation measures only linear correlation.
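For reference, the autocorrelation side of this comparison can be sketched for the simplest case, a clean train of pulses at 4 Hz, whose period is 0.25 sec. This is an illustration under assumptions, not the code used for Figures 7.32 to 7.36: the pulses are rectangular and 10 ms wide, decimation is plain subsampling, the autocorrelation is the unnormalized sum of lagged products after mean removal, and `autocorr_period` and `min_lag_s` are hypothetical names.

```python
import numpy as np

def autocorr_period(x, fs, min_lag_s=0.1, max_lag_s=2.0):
    """Lag (sec) of the largest autocorrelation value in the search range."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                       # remove the DC offset
    lags = np.arange(int(round(min_lag_s * fs)),
                     int(round(max_lag_s * fs)) + 1)
    # Unnormalized sum of lagged products for each candidate lag.
    r = np.array([np.dot(x[:x.size - k], x[k:]) for k in lags])
    return lags[np.argmax(r)] / fs

fs, n_dec = 8000, 5                        # values stated in the text
n = np.arange(5 * fs)                      # 5 sec of samples
# 4 Hz train: one 10 ms pulse (assumed width) every fs/4 samples.
train = ((n % (fs // 4)) < 80).astype(float)

period = autocorr_period(train[::n_dec], fs / n_dec)
print(f"autocorrelation peak at {period:.3f} sec, {1.0 / period:.1f} Hz")
```

Because the unnormalized sum has fewer overlapping terms at larger lags, the first period scores higher than its multiples, so no tie-breaking rule is needed in this case.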