6 Proposing Average Mutual Information (AMI) for Beat Detection. 1

*  6.1 Envelope Detection. 2

*  6.2 Decimation. 3

*  6.3 Average Mutual Information Calculation. 4

 

 

6    Proposing Average Mutual Information (AMI) for beat detection

In this chapter a simple beat detection system based on AMI is proposed.  Beats are considered the fundamental component of rhythm and they correspond to the perception of equally spaced temporal units that define tempo in music.  In other words, tempo refers to the rate at which beats occur.  On the other hand, meter imposes an accent structure on beats and refers to the most basic level of rhythmic organization, e.g. “ one, two, three, one two, three...”  This study tries only to detect equally spaced beats.  Music with variable tempo is not considered.

The method attempts to prove that when AMI function is applied to audio signals the yielded result is the musical tempo (rate at which beats occur).  As described in Chapter 5, the proposed method assumes that an excerpt of music is the non-lineal system and all the instruments and voices are the variables, but they sound coherently during performance because they are following the same tempo marked for the beat.  If AMI is applied to some music excerpt, then the obtained value must be the common link or periodicity in the system, the musical tempo.

The system was implemented with the software TSTOOLS for Matlab available on the Internet [43].  It is a software package for signal processing with emphasis on nonlinear time-series analysis.  The TSTOOL package is written partly in Matlab and partly in C++ for computationally demanding algorithms.  All the information regarding this software is placed in the Appendix.

In Figure 6.1 the proposed beat detection algorithm is presented.  The process starts with an audio file in WAV format as input that is read and normalized.  Then the amplitude envelope is detected using the Hilbert transform (section 6.1).  This is done to make the onset/offsets of the signal prominent. These onset and offset are perceived as rhythm (Section 3.2.3 and suggested by [13] [31][33][37]).  Thus, to optimize the data set and consequently diminish the time in the AMI calculations, the signal is decimated by a factor of N as is explained in Section 6.2.  Then, AMI is calculated using the function provided by TSTOOLS which is further explained in section 6.3.  From the AMI result, the maximum peak in the range between 60 bpm and 240 bmp is selected as the beat rate.  This range was selected based on the typical music beat rates, and also because it is used in other similar studies [31].

Figure 6.1 Proposed beat detection algorithm based on AMI

6.1   Envelope detection

As was discussed in section 3.2.3, our auditory systems are sensitive to the signal’s onsets and offsets, perceiving rhythm as an AM envelope.  The Hilbert transform is a common and efficient technique for envelope detection.  Figure 6.2 presents the envelope of one audio file using this technique.

Figure 6.2 Implemented envelope detection based on Hilbert transform

The diagram in 6.3 gives an idea of how this envelope approximation process works.

 

a)      Original signal with envelope.

 

 

 

 

b)      Original signal overlayed with Hilbert transform.

 

 

 

c)      Absolute values of signals shown in b).

 

 

 

d)      Peak hold and sum of previous signals.

 

Figure 6.3 Envelope detection using Hilbert Transform [41]

The Hilbert Transform (t) of a signal x(t) is defined by the equation

Equation 61

where the integral is the Cauchy principal value integral. The reconstruction formula (equation 6-2) defines the Hilbert inverse transform.

Equation 6‑2

A Hilbert transformer produces a -90 degree phase shift for the positive frequency components of the input x(t), then the Hilbert transform of (t) is -x(t) and x(t) and (t) are orthogonal.  But on the other hand, the signal x(t) and its Hilbert transform (t) have the same amplitude spectrum and the same autocorrelation function.Then if Hilbert is applied the beat detection is not affected.

Matlab has already implemented the Hilbert transform. The syntax to use this function to obtain the envelope of a signal s1 is:

Envelope_S1=abs(hilbert(s1));

hilbert(x) returns a complex sequence called the analytic signal from a real data sequence.  The analytic signal has a real part, which is the original data, and an imaginary part, which contains the Hilbert transform.  The imaginary part is a version of the original real sequence with a 90° phase shift.  Sines are therefore transformed to cosines and vice versa.  The Hilbert transformed series has the same amplitude and frequency content as the original real data and includes phase information that depends on the phase of the original data.  The analytic signal for a sequence x has a one-sided Fourier transform, that is, negative frequencies are 0. To approximate the analytic signal, hilbert calculates the FFT of the input sequence, replaces those FFT coefficients that correspond to negative frequencies with zeros, and calculates the inverse FFT of the result.  This algorithm is fully detailed in the Matlab reference toolbox [21].

6.2 Decimation

Decimation facilitates the reduction of data information without losses when it is applied adequately.  Decimation by N is defined as taking every N sample starting with sample 0.  For example, {[0,1,2,3,4,5,6,7,8,9]}=[0,2,4,8].  In Figure 6.5 a) sin(x) is presented.  In Figure 6.5 b) the sin(x) is decimated by factor of 2.  The shape of the signal is still the same but it is only represented with half of the points used in a).  This is equivalent to reducing the sample rate by a factor of 2.

Figure 6.4 a) Signal sin(x) (b) Signal sin(x) decimated by 2

When decimation is applied, two factors must be considered: the scaling and the decimation factors.  If decimation is applied, the change in the frequency scale must be considered when the final result is obtained.  On the other hand, if the decimation factor is too high the degradation of the result is notorious.  In practice the decimation factor must be one that simplifies the calculations minimizing the amount of data and not degrade the final result.  When the decimation factor N increases, the AMI result becomes unclear and the peak disappears.  Figures from 6.5 to 6.9 present the results obtained using AMI over the same audio file with decimation factor between N=1 and N=20.

 

 

Figure 6.5 AMI using no decimation N=1

 

 

Figure 6.6 AMI using decimation N=2

 

 

Figure 6.7 AMI using decimation N=5

 

 

Figure 6.8 AMI using decimation N=10

 

 

 

Figure 6.9 AMI using decimation N=20

6.3   Average Mutual Information Calculation

As was explained in section 5.3, average mutual information (AMI) is a generalization of the correlation function.  It measures the state predictability or the hysteresis of a system and establishes a nonlinear relationship between measurements  drawn from a distributed set of variables ={} and measurementdrawn from another distributed set of variables ={}.  So AMI is the amount learned by the measurement of  about the measurement of .  The algorithm evaluates the normalized histogram of variables , , the normalized histogram of variables ,  and the joint normalized histogram of variables , , .

The time delay T must be large enough so that independent information about the system is in each component of the vector.  However, T must not be so large that the components of the vectors x(t) are independent with respect to each other.  Conversely, if the time delay is too short, the vector components will be independent and will not contain any new information.  AMI is applied to the audio signal as Auto Average Mutual Information, the signal is compared to itself delayed a time T.

AMI is calculated by the function provided by TSTOOLS, which requires the following syntax:

a=amutual(ts,maxtau,#partitions)

Where ts is the vector holding time series data, in this case the WAV file.

maxtau or maximal delay that should be much smaller than the length of the signal.  In all cases the value selected was 2, equivalent to the half of the set data.

#partitions is the number of bins used for histogram calculation, 64 and 128 were used depending of the complexity of the signal.

The output a is the vector of length maxtau-1, holding AMI.

The AMI algorithm (TSTOOLS) uses equidistant histogram boxes, so results are poor in a mathematical sense.  However, it is a fast algorithm based on ternary search trees to store only nonempty boxes.  More information about TSTOOLS is available in Appendix and at the TSTOOLS website [43].

After AMI is calculated the program re-scales the time axis according to the decimation parameter used previously.  From the results obtained, only the data corresponding from 1 Hz to 4 Hz (0.25 sec –1 sec in time) are considered to be in the range of response.  The maximum value or peak in this range is taken as the beat rate in seconds.  This range is equivalent to 60 bpm to 240 bpm; the common beat rates used in popular music.  In addition, the Matlab codes of some of the programs used in this study are compiled in the Appendix.