Music Genre Classification

Using Principal Component Analysis Combined with K-Nearest Neighbors and Logistic Regression

Amanda Li, Chris Hyorok Lee, Dominick Vaske

Project Overview

This project explores music genre classification by applying machine learning to Spotify audio features. By combining Principal Component Analysis (PCA) with K-Nearest Neighbors (KNN) and Logistic Regression, we reached test accuracies of 89% (KNN) and 90% (Logistic Regression) across five musical genres.

  • 500 songs analyzed
  • 5 music genres
  • 91% variance captured
  • 90% best accuracy

Musical Genres Studied

We analyzed five distinct genres, each with unique audio characteristics that make them suitable for classification:

  • Heavy Metal: highest energy, high tempo and loudness
  • Latino Pop: highest valence (positiveness) and danceability
  • Neo Soul: balanced features, moderate danceability
  • Punk Pop: high energy, similar to Heavy Metal
  • Study Music: highest acousticness and instrumentalness
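Genre profiles like these come from comparing average feature values across genres. A minimal pandas sketch of that comparison follows; the six rows and their values are invented placeholders, not the project's Spotify data, and the column names are assumed to follow Spotify's audio-feature names.

```python
import pandas as pd

# Hypothetical toy rows standing in for Spotify audio features.
# The values are invented for illustration and are NOT the project's data.
songs = pd.DataFrame({
    "genre": ["Heavy Metal", "Heavy Metal", "Study Music", "Study Music", "Latino Pop", "Latino Pop"],
    "energy": [0.95, 0.90, 0.10, 0.15, 0.75, 0.70],
    "acousticness": [0.01, 0.02, 0.95, 0.90, 0.20, 0.25],
    "valence": [0.30, 0.35, 0.20, 0.15, 0.85, 0.80],
})

# Average each feature within a genre (one row per genre).
genre_means = songs.groupby("genre").mean()

# For every feature column, the genre with the highest mean value.
top_genre_per_feature = genre_means.idxmax()
print(genre_means)
print(top_genre_per_feature)
```

With the real dataset, the same two calls would show which genre ranks highest on each feature, which is how statements such as "highest valence" can be verified.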

Methodology

Our approach combined dimensionality reduction with supervised learning techniques:

1. Data Collection: Gathered 100 songs per genre using the Spotify Web API.
2. Feature Extraction: Extracted 12 audio features, including energy, danceability, and valence.
3. Data Preprocessing: Standardized the features and removed irrelevant variables, leaving the 10 features used for PCA.
4. PCA Reduction: Reduced the dimensions from 10 to 6 while retaining 91% of the variance.
5. Model Training: Trained KNN (k=7) and Logistic Regression classifiers.
6. Evaluation: Assessed both models with cross-validation and confusion-matrix analysis.
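The steps above can be sketched end to end with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' code: `make_classification` generates a synthetic stand-in for 500 songs with 10 features, and the 20% hold-out split and `random_state` values are assumptions (a 20% split would match the 100 songs counted in each confusion matrix).

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 500 songs, 10 features, 5 genres.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=6, n_classes=5,
    n_clusters_per_class=1, random_state=0,
)

# Hold out 20% of the songs (100) for testing; the split ratio is an assumption.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Standardize, project onto 6 principal components, then classify.
models = {
    "KNN (k=7)": make_pipeline(
        StandardScaler(), PCA(n_components=6), KNeighborsClassifier(n_neighbors=7)
    ),
    "Logistic Regression": make_pipeline(
        StandardScaler(), PCA(n_components=6), LogisticRegression(max_iter=1000)
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the held-out songs
    print(f"{name}: test accuracy = {scores[name]:.2f}")
```

Placing `StandardScaler` and `PCA` inside the pipeline means both are fit on the training split only, so no information from the test songs leaks into the scaling or the projection.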

Principal Component Analysis Results

PCA reduced the dimensionality of our dataset while preserving most of the variance.

Key Finding: The first 6 principal components captured 91.3% of the total variance, which reduced computational cost while maintaining predictive power. The first component alone explained only 40.21% of the variance, suggesting that the audio features are not highly correlated with one another; each feature therefore contributes some independent information, which is beneficial for classification.
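To decide how many components to keep, one can fit PCA on the standardized features and find the smallest number whose cumulative explained variance crosses a threshold. The sketch below assumes a 90% threshold and uses synthetic data in place of the Spotify features, so its printed numbers will not match the 40.21% and 91.3% reported above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 10 audio features (assumption, not the real data).
X, _ = make_classification(n_samples=500, n_features=10, n_informative=6, random_state=0)

# PCA is scale-sensitive, so standardize each feature to zero mean and unit variance first.
X_std = StandardScaler().fit_transform(X)

# Fit PCA with all components and accumulate the share of variance each one explains.
pca = PCA().fit(X_std)
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative explained variance reaches 90%.
n_keep = int(np.argmax(cumulative >= 0.90)) + 1
print("variance explained by PC1:", round(float(pca.explained_variance_ratio_[0]), 4))
print("components kept:", n_keep, "cumulative variance:", round(float(cumulative[n_keep - 1]), 4))
```

Passing a fraction directly, as in `PCA(n_components=0.90)`, asks scikit-learn to make the same selection in one step.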

Model Performance

(Charts: Overall Accuracy; K-Value Optimization)

Classification Results

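KNN Confusion Matrix (89% Accuracy; rows: actual genre, columns: predicted genre)

Actual \ Predicted | Heavy Metal | Latino Pop | Neo Soul | Punk Pop | Study Music
Heavy Metal | 25 | 0 | 1 | 2 | 0
Latino Pop | 0 | 11 | 3 | 0 | 0
Neo Soul | 0 | 1 | 9 | 0 | 0
Punk Pop | 4 | 0 | 0 | 20 | 0
Study Music | 0 | 0 | 0 | 0 | 24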

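Logistic Regression Confusion Matrix (90% Accuracy; rows: actual genre, columns: predicted genre)

Actual \ Predicted | Heavy Metal | Latino Pop | Neo Soul | Punk Pop | Study Music
Heavy Metal | 25 | 0 | 2 | 1 | 0
Latino Pop | 0 | 12 | 2 | 0 | 0
Neo Soul | 0 | 1 | 9 | 0 | 0
Punk Pop | 2 | 2 | 0 | 20 | 0
Study Music | 0 | 0 | 0 | 0 | 24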
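The reported accuracies can be checked directly from the two matrices: accuracy is the sum of the diagonal divided by the total, and per-genre recall and precision divide each diagonal entry by its row or column total. The sketch below copies the matrices verbatim and assumes, as the identical row totals in both tables suggest, that rows are actual genres and columns are predicted genres.

```python
import numpy as np

genres = ["Heavy Metal", "Latino Pop", "Neo Soul", "Punk Pop", "Study Music"]

# Matrices copied from the tables (rows: actual genre, columns: predicted genre).
knn_cm = np.array([
    [25, 0, 1, 2, 0],
    [0, 11, 3, 0, 0],
    [0, 1, 9, 0, 0],
    [4, 0, 0, 20, 0],
    [0, 0, 0, 0, 24],
])
logreg_cm = np.array([
    [25, 0, 2, 1, 0],
    [0, 12, 2, 0, 0],
    [0, 1, 9, 0, 0],
    [2, 2, 0, 20, 0],
    [0, 0, 0, 0, 24],
])


def summarize(cm):
    """Return overall accuracy plus per-genre recall and precision."""
    accuracy = np.trace(cm) / cm.sum()        # correct predictions / all test songs
    recall = np.diag(cm) / cm.sum(axis=1)     # divide by row totals (actual genre)
    precision = np.diag(cm) / cm.sum(axis=0)  # divide by column totals (predicted genre)
    return accuracy, recall, precision


def largest_confusion(cm):
    """Return (actual, predicted, count) for the largest off-diagonal cell.

    Ties are broken by taking the first such cell in row order.
    """
    off_diag = cm - np.diag(np.diag(cm))  # zero out the correct predictions
    i, j = np.unravel_index(np.argmax(off_diag), cm.shape)
    return genres[i], genres[j], int(off_diag[i, j])


for name, cm in [("KNN", knn_cm), ("Logistic Regression", logreg_cm)]:
    accuracy, recall, precision = summarize(cm)
    print(f"{name}: accuracy = {accuracy:.2f}, largest confusion = {largest_confusion(cm)}")
    for genre, r, p in zip(genres, recall, precision):
        print(f"  {genre:<12} recall = {r:.2f}  precision = {p:.2f}")
```

Run as-is, it prints accuracies of 0.89 and 0.90, shows Study Music with recall and precision of 1.00 in both models, and gives Neo Soul the lowest precision in both (9/13 ≈ 0.69). For KNN, the largest single confusion is Punk Pop predicted as Heavy Metal (4 songs).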

Key Insights

🎯 Perfect Classification

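Study Music was classified perfectly by both models: all 24 Study Music test songs were labeled correctly, and no song from another genre was predicted as Study Music. We attribute this to its distinct instrumental character and high acousticness values.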

🔄 Genre Confusion

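Heavy Metal and Punk Pop showed some confusion due to their similar instrumentation, loudness, and tempo. KNN labeled 4 Punk Pop songs as Heavy Metal and 2 Heavy Metal songs as Punk Pop; Logistic Regression made 2 and 1 such errors, respectively.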

📊 Model Comparison

Logistic Regression slightly outperformed KNN (90% vs 89%, i.e., one more of the 100 test songs classified correctly), possibly because it handles the boundaries between multiple classes somewhat better.

🎵 Feature Importance

Energy, Acousticness, and Instrumentalness were the most distinguishing features across genres.
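One common way to rank features by how well they separate genres is a one-way ANOVA F-test, available in scikit-learn as `f_classif`. The sketch below is not necessarily how the importance ranking above was obtained; it applies the test to synthetic data in which, by construction, energy and acousticness differ across three genres while liveness does not. The genre means are invented for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import f_classif

rng = np.random.default_rng(0)
genres = np.repeat(["Heavy Metal", "Study Music", "Latino Pop"], 50)

# Hypothetical genre-level means (invented): energy and acousticness depend on genre,
# while liveness is drawn from the same distribution for every genre.
energy_mean = {"Heavy Metal": 0.9, "Study Music": 0.2, "Latino Pop": 0.7}
acoustic_mean = {"Heavy Metal": 0.05, "Study Music": 0.9, "Latino Pop": 0.3}
X = pd.DataFrame({
    "energy": [rng.normal(energy_mean[g], 0.08) for g in genres],
    "acousticness": [rng.normal(acoustic_mean[g], 0.08) for g in genres],
    "liveness": rng.normal(0.2, 0.08, size=len(genres)),
})

# F statistic per feature: between-genre variance relative to within-genre variance.
f_stats, p_values = f_classif(X, genres)
ranking = pd.Series(f_stats, index=X.columns).sort_values(ascending=False)
print(ranking)
```

A large F statistic means a feature's average value varies much more between genres than within them, which is one concrete sense in which a feature is "distinguishing".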

Technical Implementation

🔧 Tools & Technologies

  • Python - Core programming language
  • Spotify Web API - Data collection
  • Scikit-learn - Machine learning models
  • Pandas & NumPy - Data manipulation
  • Matplotlib - Data visualization

📈 Model Optimization

  • Cross-validation for robust evaluation
  • Grid search for optimal k-value (k=7)
  • Feature standardization for PCA
  • One-vs-rest for multi-class classification
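These optimization steps map onto standard scikit-learn tools. In the sketch below, `GridSearchCV` runs 5-fold cross-validation over odd k values for a scaler, PCA, and KNN pipeline, and `OneVsRestClassifier` wraps logistic regression so that one binary classifier is trained per genre. The data are synthetic, and the search range (k from 1 to 21) and fold count are assumptions, so the selected k will not necessarily be 7.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 500 songs with 10 features and 5 genres (assumption).
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=6, n_classes=5,
    n_clusters_per_class=1, random_state=0,
)

# Standardization and PCA are refit inside every cross-validation fold.
knn_pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=6)),
    ("knn", KNeighborsClassifier()),
])

# 5-fold cross-validated grid search over odd k values from 1 to 21.
search = GridSearchCV(
    knn_pipeline,
    param_grid={"knn__n_neighbors": list(range(1, 22, 2))},
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)
print("best k:", search.best_params_["knn__n_neighbors"],
      "CV accuracy:", round(search.best_score_, 3))

# One-vs-rest logistic regression: one binary "this genre vs. the rest" model per genre.
ovr_pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=6)),
    ("ovr", OneVsRestClassifier(LogisticRegression(max_iter=1000))),
])
ovr_scores = cross_val_score(ovr_pipeline, X, y, cv=5, scoring="accuracy")
print("OvR logistic regression CV accuracy:", round(ovr_scores.mean(), 3))
```

Odd k values avoid tied votes between two classes; with five classes, ties are still possible but less frequent.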

Future Improvements

Several enhancements could further improve the classification accuracy:

📊 Larger Dataset

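Increasing the number of songs per genre could improve generalization and reduce overfitting, especially for weaker genres like Neo Soul, which had the fewest test songs (10) and the lowest precision: in both models, 4 of the 13 songs predicted as Neo Soul belonged to other genres.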

🧠 Advanced Models

Deep learning approaches such as CNNs or recurrent networks (e.g., LSTMs) could capture more complex patterns in audio features.

🎵 Additional Features

Including rhythm patterns, temporal dynamics, and artist metadata could provide richer feature representations.

⚡ Real-time Classification

Developing a web application for real-time genre prediction would demonstrate practical applications of the model.

Explore the Implementation

Dive deeper into the technical details and see the code in action:

  • Google Colab notebook
  • GitHub repository
  • Paper (PDF)