ESTIMATION OF MUSICAL TEMPO THROUGH WAVELET SCALOGRAMS AND CONVOLUTIONAL NEURAL NETWORKS
Audio tempo estimation is one of the most fundamental tasks in Music Information Retrieval (MIR). In this work, the wavelet scalogram is used as a two-dimensional image representation of the audio signal. Different ways of generating the scalogram were tested by varying the mother wavelet and the scale levels. The resulting images were used to train a Convolutional Neural Network (CNN) via supervised learning, mapping each image to a target tempo value. K-fold cross-validation was applied both to assess the generalization of the proposed model and to select the most suitable scalogram configuration for the problem. Data augmentation was performed online, modifying the scalograms at training time. Finally, the model was evaluated on databases widely used in the literature, and the results were compared with the state of the art. Good results were achieved on the "GiantSteps" evaluation databases using the Morlet and Shannon mother wavelets.
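The abstract's core input representation can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the PyWavelets library (`pywt.cwt`) for the continuous wavelet transform, and it uses a synthetic click track in place of the real music excerpts; the sampling rate, scale range, and wavelet name (`'morl'`, the Morlet wavelet mentioned above) are illustrative choices.

```python
import numpy as np
import pywt

# Synthetic stand-in for an audio excerpt: a 2 s click track at 120 BPM,
# sampled at 8 kHz (real inputs would be music-dataset excerpts).
sr = 8000
signal = np.zeros(2 * sr)
signal[::sr // 2] = 1.0  # one click every 0.5 s -> 120 BPM

# Continuous wavelet transform with a Morlet mother wavelet over 64 scale
# levels; the coefficient magnitudes form the 2-D scalogram "image" that
# would be fed to the CNN.
scales = np.arange(1, 65)
coeffs, freqs = pywt.cwt(signal, scales, 'morl', sampling_period=1 / sr)
scalogram = np.abs(coeffs)  # shape: (n_scales, n_samples) = (64, 16000)

print(scalogram.shape)
```

Varying the mother wavelet (e.g. `'morl'` vs. a Shannon wavelet such as `'shan1.5-1.0'`) and the `scales` array reproduces, in miniature, the configuration search the abstract describes.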