Music Genre Classification Analysis

Project Overview

This project explores music genre classification using machine learning applied to Spotify audio features. By combining Principal Component Analysis (PCA) with K-Nearest Neighbors (KNN) and Logistic Regression, we reached test accuracies of 89% (KNN) and 90% (Logistic Regression) across five distinct musical genres.

Songs Analyzed: 500
Music Genres: 5
Variance Captured: 91%
Best Accuracy: 90%

Musical Genres Studied

We analyzed five distinct genres, each with unique audio characteristics that make them suitable for classification:

Heavy Metal

Highest energy, high tempo and loudness

Latino Pop

Highest valence (positiveness) and danceability

Neo Soul

Balanced features, moderate danceability

Punk Pop

High energy, similar to heavy metal

Study Music

Highest acousticness and instrumentalness

Methodology

Our approach combined dimensionality reduction with supervised learning techniques:

1. Data Collection - Gathered 100 songs per genre using Spotify Web API
2. Feature Extraction - Extracted 12 audio features including energy, danceability, and valence
3. Data Preprocessing - Standardized features and removed irrelevant variables
4. PCA Reduction - Reduced dimensions from 10 to 6 while retaining 91% variance
5. Model Training - Trained KNN (k=7) and Logistic Regression classifiers
6. Evaluation - Cross-validation and confusion matrix analysis
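
The steps above can be sketched as a single scikit-learn pipeline. This is a minimal illustration, not the project's actual code: the data is a synthetic stand-in generated with make_classification (500 rows, 10 features, 5 classes), and the 20% hold-out is an assumption inferred from the confusion matrices, which each sum to 100 songs.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Spotify features (assumption: the real data
# would be a 500 x 10 table of audio features with 5 genre labels).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=6,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# Hold out 20% of the songs for testing (assumed split size).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Standardize -> PCA down to 6 components -> KNN with k=7.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=6),
    KNeighborsClassifier(n_neighbors=7),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"Test accuracy on synthetic data: {accuracy:.2f}")
```

Keeping the scaler and PCA inside the pipeline means both are fitted on the training songs only, so no information from the test set leaks into preprocessing.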

Principal Component Analysis Results

PCA effectively reduced our dataset dimensionality while preserving most of the variance:

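Key Finding: The first 6 principal components captured 91.3% of the total variance, allowing us to reduce computational complexity while maintaining predictive power. The first component alone explains 40.21% of the variance, roughly four times the 10% it would explain if the 10 standardized features were uncorrelated, so the features are noticeably correlated. The other five retained components still carry about 51% of the variance between them, which means no single direction dominates and several components are needed for classification.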
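
One way to arrive at a component count like this is to look at the cumulative explained variance and keep the smallest number of components that reaches a target. The sketch below uses synthetic data as a placeholder, so its numbers will not match the 91.3% and 40.21% reported here; the 0.91 threshold is taken from the text, everything else is an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder for the 10 preprocessed audio features.
X, _ = make_classification(
    n_samples=500, n_features=10, n_informative=6,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# PCA is scale-sensitive, so standardize first.
X_scaled = StandardScaler().fit_transform(X)

# Fit PCA with all components to inspect the full variance spectrum.
pca = PCA().fit(X_scaled)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative variance reaches the target.
threshold = 0.91
n_components = int(np.searchsorted(cumulative, threshold) + 1)

for i, (ratio, total) in enumerate(zip(pca.explained_variance_ratio_, cumulative), start=1):
    print(f"PC{i}: {ratio:.2%} (cumulative {total:.2%})")
print(f"Components needed for {threshold:.0%} variance: {n_components}")
```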

Model Performance

[Charts: Overall Accuracy; K-Value Optimization]

Classification Results

KNN Confusion Matrix (89% Accuracy; rows are actual genres, columns are predicted genres)

	Heavy Metal	Latino Pop	Neo Soul	Punk Pop	Study Music
Heavy Metal	25	0	1	2	0
Latino Pop	0	11	3	0	0
Neo Soul	0	1	9	0	0
Punk Pop	4	0	0	20	0
Study Music	0	0	0	0	24

Logistic Regression Confusion Matrix (90% Accuracy; rows are actual genres, columns are predicted genres)

	Heavy Metal	Latino Pop	Neo Soul	Punk Pop	Study Music
Heavy Metal	25	0	2	1	0
Latino Pop	0	12	2	0	0
Neo Soul	0	1	9	0	0
Punk Pop	2	2	0	20	0
Study Music	0	0	0	0	24
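
The headline accuracies and the per-genre behaviour can be recomputed directly from the two matrices. The sketch below copies the counts from the tables and assumes rows are actual genres and columns are predicted genres, which is consistent with both matrices having identical row totals (28, 14, 10, 24, 24). The function names are illustrative only.

```python
GENRES = ["Heavy Metal", "Latino Pop", "Neo Soul", "Punk Pop", "Study Music"]

# Counts copied from the tables (assumption: rows = actual, columns = predicted).
KNN = [
    [25, 0, 1, 2, 0],
    [0, 11, 3, 0, 0],
    [0, 1, 9, 0, 0],
    [4, 0, 0, 20, 0],
    [0, 0, 0, 0, 24],
]
LOGREG = [
    [25, 0, 2, 1, 0],
    [0, 12, 2, 0, 0],
    [0, 1, 9, 0, 0],
    [2, 2, 0, 20, 0],
    [0, 0, 0, 0, 24],
]


def accuracy(matrix):
    """Fraction of all songs on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall(matrix):
    """Per genre: correct predictions divided by the row total (actual songs)."""
    return {g: matrix[i][i] / sum(matrix[i]) for i, g in enumerate(GENRES)}


def precision(matrix):
    """Per genre: correct predictions divided by the column total (predicted songs)."""
    return {g: matrix[i][i] / sum(row[i] for row in matrix) for i, g in enumerate(GENRES)}


for name, matrix in [("KNN", KNN), ("Logistic Regression", LOGREG)]:
    rec, prec = recall(matrix), precision(matrix)
    print(f"{name}: accuracy = {accuracy(matrix):.2f}")
    for genre in GENRES:
        print(f"  {genre}: recall = {rec[genre]:.2f}, precision = {prec[genre]:.2f}")
```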

Key Insights

🎯 Perfect Classification

Study Music achieved 100% accuracy in both models due to its distinct instrumental characteristics and high acousticness values.

🔄 Genre Confusion

Heavy Metal and Punk Pop were confused with each other in 6 of KNN's 11 errors and 3 of Logistic Regression's 10 errors, likely due to similar instrumentation, loudness, and tempo characteristics.

📊 Model Comparison

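Logistic Regression slightly outperformed KNN (90% vs 89%), which may reflect better handling of multi-class boundaries, although on a 100-song test set this gap amounts to a single song and is too small to be conclusive.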
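
To put the one-point gap in context, a rough binomial standard error can be computed for each accuracy. This is a back-of-the-envelope sketch: it assumes a test set of 100 songs (the total of each confusion matrix) and treats each prediction as an independent trial, which is only an approximation.

```python
import math

n_test = 100  # assumed test set size: each confusion matrix sums to 100
accuracies = {"KNN": 0.89, "Logistic Regression": 0.90}

# Standard error of a proportion: sqrt(p * (1 - p) / n)
standard_errors = {
    name: math.sqrt(acc * (1 - acc) / n_test)
    for name, acc in accuracies.items()
}

for name, acc in accuracies.items():
    print(f"{name}: accuracy = {acc:.2f}, standard error = {standard_errors[name]:.3f}")
```

Both standard errors come out at about 0.03, so a difference of 0.01 sits well within the noise of a test set this size. Because both models were scored on the same songs, a paired comparison would be the more careful check.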

🎵 Feature Importance

Energy, Acousticness, and Instrumentalness were the most distinguishing features across genres.

Technical Implementation

🔧 Tools & Technologies

Python - Core programming language
Spotify Web API - Data collection
Scikit-learn - Machine learning models
Pandas & NumPy - Data manipulation
Matplotlib - Data visualization

📈 Model Optimization

Cross-validation for robust evaluation
Grid search for optimal k-value (k=7)
Feature standardization for PCA
One-vs-rest for multi-class classification
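
A minimal sketch of how these optimization steps could look in scikit-learn is shown below. The synthetic data, the candidate k values (odd numbers from 1 to 15), the 5-fold setting, and the step names are all assumptions for illustration; the original search grid is not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 500-song, 10-feature, 5-genre dataset.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=6,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)

# Grid search over k, with scaling and PCA refitted inside every fold.
knn_pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=6)),
    ("knn", KNeighborsClassifier()),
])
param_grid = {"knn__n_neighbors": list(range(1, 16, 2))}  # hypothetical grid
search = GridSearchCV(knn_pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
best_k = search.best_params_["knn__n_neighbors"]
print(f"Best k: {best_k} (mean CV accuracy {search.best_score_:.2f})")

# One-vs-rest logistic regression: one binary classifier per genre.
ovr_pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=6)),
    ("logreg", OneVsRestClassifier(LogisticRegression(max_iter=1000))),
])
ovr_scores = cross_val_score(ovr_pipeline, X, y, cv=5, scoring="accuracy")
print(f"One-vs-rest logistic regression CV accuracy: {ovr_scores.mean():.2f}")
```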

Future Improvements

Several enhancements could further improve the classification accuracy:

📊 Larger Dataset

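Increasing the number of songs per genre could improve model generalization and reduce overfitting, especially for weaker genres like Neo Soul, which had only 10 songs in the test set and the lowest precision in both models (9 of 13 Neo Soul predictions were correct).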

🧠 Advanced Models

Implementing deep learning approaches like CNNs, RNNs, or LSTMs could capture more complex patterns in audio features.

🎵 Additional Features

Including rhythm patterns, temporal dynamics, and artist metadata could provide richer feature representations.

⚡ Real-time Classification

Developing a web application for real-time genre prediction would demonstrate practical applications of the model.
