Three feature sets representing the timbral texture, rhythmic content and pitch content of music signals were evaluated with statistical pattern recognition classifiers over ten musical genres, achieving classification accuracies of 61% (non-real-time) and 44% (real-time) [1]. Improved cepstrum feature extraction for optimal music and speech discrimination is reported in [2], where dynamic time warping based classification resulted in approximately 85% accuracy. A comparison between features based on models of auditory perception and generic audio features is carried out using four audio feature sets to distinguish between five audio classes [3].
EMD has been employed in [4] to extract significant musical structures from audio, where it is demonstrated that a rhythmic and harmonic analysis of the signal may be carried out using EMD. The suitability of ensemble EMD for musical tempo estimation is demonstrated on over 450 songs in [5]. In [6] and [7], a database of six audio types is used; features based on linear predictive coefficients gave classification accuracies of 92.1% and 93.7% with a Support Vector Machine and a Radial Basis Function Neural Network respectively, while an Auto-Associative Neural Network model and a Gaussian Mixture Model yielded 93.1% and 92.9% respectively. In [8], time-domain and frequency-domain characteristics of voice and music data are extracted, and an SVM audio classifier evaluated on them yields roughly 90% accuracy. Tempo and modulation spectra of timbral features are extracted from a dataset of Hindustani music to discriminate between the bhajan and qawwali genres, yielding a classification accuracy of 92.86% with SVM and GMM models [9]. Five genres of Indian music are classified using GMM and k-NN models with an accuracy of only 91.25% [10]. A highest classification accuracy of 96.96% was achieved for the classification of Tamil classical and Tamil folk music using SVM and k-NN models [11]. Using Classical, Folk, Ghazal and Sufi songs of Indian music as the dataset, spectral features are extracted and a classification accuracy of 90.8% was obtained with an SVM model; Naïve Bayes and k-NN based models showed even lower accuracy [12]. Among the most recent works, [13] used a single-layer RNN-LSTM deep neural network to classify Hindustani and Carnatic music, achieving a highest classification accuracy of 96.08% using MFCC features alone.
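As a minimal illustration of the decomposition that these EMD based approaches rely on, the sketch below implements basic empirical mode decomposition with cubic-spline envelopes and the standard deviation stopping criterion. The thresholds, iteration limits and boundary handling here are illustrative choices, not those used in the cited works.

```python
import numpy as np
from scipy.signal import argrelextrema
from scipy.interpolate import CubicSpline

def sift(x, t, sd_thresh=0.2, max_iter=50):
    """Extract one intrinsic mode function (IMF) by iterative sifting."""
    h = x.copy()
    for _ in range(max_iter):
        maxima = argrelextrema(h, np.greater)[0]
        minima = argrelextrema(h, np.less)[0]
        if len(maxima) < 2 or len(minima) < 2:
            break  # not enough extrema to build envelopes
        # Upper/lower envelopes via cubic splines through the extrema.
        upper = CubicSpline(t[maxima], h[maxima])(t)
        lower = CubicSpline(t[minima], h[minima])(t)
        m = 0.5 * (upper + lower)       # local mean of the envelopes
        h_new = h - m
        # Stop sifting when successive iterates barely change.
        sd = np.sum((h - h_new) ** 2) / (np.sum(h ** 2) + 1e-12)
        h = h_new
        if sd < sd_thresh:
            break
    return h

def emd(x, t, n_imfs=4):
    """Decompose x into IMFs plus a residual trend."""
    imfs, residual = [], x.copy()
    for _ in range(n_imfs):
        imf = sift(residual, t)
        imfs.append(imf)
        residual = residual - imf
        # Stop once the residual has too few oscillations to sift.
        if len(argrelextrema(residual, np.greater)[0]) < 2:
            break
    return np.array(imfs), residual

# Demo: a 40 Hz tone plus a 4 Hz tone; EMD separates the two scales,
# with the fast component appearing in the first IMF.
t = np.linspace(0, 1, 2000)
high, low = np.sin(2 * np.pi * 40 * t), np.sin(2 * np.pi * 4 * t)
imfs, residual = emd(high + low, t, n_imfs=3)
```

In practice, library implementations (with proper boundary extension and ensemble variants) would be used; this sketch only conveys the sifting idea behind the rhythmic and harmonic analyses surveyed above.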
Five different genres of Bangla music are classified using spectral features alone, giving an accuracy of 90.8% with a deep neural network model named BMNet-5 [14].
An extensive survey of the literature on both WAM and Indian music reveals a trade-off among the dataset used, the audio features extracted and the classification model applied. Moreover, comparatively little work has been done so far on Indian music classification. We therefore propose EMD based feature extraction and evaluate its performance using k-NN and SVM classifiers.
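The proposed evaluation can be sketched as follows. The feature matrix here is synthetic, a hypothetical stand-in for EMD-derived descriptors (e.g. per-IMF energies or spectral statistics), and the classifier hyper-parameters are illustrative defaults rather than tuned values from this work.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Synthetic stand-in for EMD-derived feature vectors: 300 clips,
# 12 features each, 3 genre labels (hypothetical setup).
X, y = make_classification(n_samples=300, n_features=12, n_informative=8,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

# Fit both proposed classifiers and compare held-out accuracy.
results = {}
for name, clf in [("SVM", SVC(kernel="rbf", C=1.0)),
                  ("k-NN", KNeighborsClassifier(n_neighbors=5))]:
    clf.fit(X_tr, y_tr)
    results[name] = accuracy_score(y_te, clf.predict(X_te))
print(results)
```

The same loop applies unchanged once the synthetic matrix is replaced by features computed from the IMFs of real recordings, which is what allows a like-for-like comparison of the two classifiers.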