Opendatabay APP

Music Popularity Classification Dataset

Tags and Keywords

Music, Spotify, Hit, Prediction, Audio

Free

About

This dataset, Spotify Hit Predictor, comprises over 40,000 music tracks from 1960 to 2019, each with audio features extracted through Spotify's Web API. Every track is labelled as either a 'Hit' (1) or a 'Flop' (0) according to criteria defined by the dataset's author. The dataset is intended for building classification models that predict whether a song will achieve mainstream popularity. It also supports investigating which characteristics distinguish successful tracks from less popular ones, and whether pop music follows a discernible formula.

Columns

  • track: The name of the track.
  • artist: The name of the artist who performed the track.
  • uri: The unique resource identifier for the track on Spotify.
  • danceability: A measure from 0.0 to 1.0 indicating how suitable a track is for dancing, considering elements like tempo, rhythm, and regularity.
  • energy: A perceptual measure (0.0 to 1.0) of intensity and activity, with higher values typically indicating fast, loud, and noisy tracks.
  • key: The estimated musical key of the track, represented by integers (e.g., 0 for C, 1 for C♯/D♭; -1 if no key was detected).
  • loudness: The average loudness of the track in decibels (dB), typically ranging from -60 to 0 dB.
  • mode: Indicates the modality of the track, with 1 representing major and 0 representing minor.
  • speechiness: Detects the presence of spoken words (0.0 to 1.0); values above 0.66 suggest entirely spoken words, while values between 0.33 and 0.66 may indicate mixed music and speech.
  • acousticness: A confidence measure (0.0 to 1.0) of whether the track is acoustic, with 1.0 indicating high confidence.
  • instrumentalness: Predicts the absence of vocals (0.0 to 1.0); values above 0.5 suggest instrumental tracks.
  • liveness: Detects the presence of an audience in the recording; values above 0.8 strongly suggest a live performance.
  • valence: A measure (0.0 to 1.0) describing the musical positiveness conveyed by the track, where high values denote happy or cheerful moods.
  • tempo: The estimated overall tempo of the track in beats per minute (BPM).
  • duration_ms: The duration of the track in milliseconds.
  • time_signature: An estimated overall time signature of the track, specifying beats per bar.
  • chorus_hit: The author's estimate of the chorus start time, specifically the timestamp of the beginning of the third section of the track in milliseconds.
  • sections: The total number of distinct sections within the track.
  • target: The classification variable. '1' signifies a 'hit': the song appeared on Billboard's Hot-100 list at least once in its decade. '0' signifies a 'flop': the track did not appear on the hit list, its artist did not appear on the hit list, it belongs to a non-mainstream or avant-garde genre that is not represented on the hit list, and it was available in the US market. All four conditions must hold for a track to be labelled a flop.
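
The encodings and thresholds in the column descriptions above can be turned into small helpers for reading individual rows. The following is a minimal Python sketch, not something shipped with the dataset. The function names `key_name`, `speechiness_band`, and `row_flags` are hypothetical. Pitch classes 2 to 11 follow standard pitch-class notation (the listing only spells out 0 and 1), and the "mostly music" band for speechiness below 0.33 is an assumption based on Spotify's general description of that feature.

```python
# Standard pitch-class notation; the listing itself only names 0 (C) and 1 (C#/Db).
PITCH_CLASSES = ["C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B"]


def key_name(key: int, mode: int) -> str:
    """Turn the integer `key` and `mode` columns into a label such as 'C major'."""
    if key == -1:
        return "no key detected"
    if not 0 <= key <= 11:
        raise ValueError(f"key must be -1 or 0..11, got {key}")
    if mode not in (0, 1):
        raise ValueError(f"mode must be 0 or 1, got {mode}")
    return f"{PITCH_CLASSES[key]} {'major' if mode == 1 else 'minor'}"


def speechiness_band(value: float) -> str:
    """Bucket `speechiness` using the 0.33 and 0.66 thresholds from the listing."""
    if value > 0.66:
        return "entirely spoken"
    if value >= 0.33:
        return "mixed music and speech"
    return "mostly music"  # assumption: this band is not described in the listing


def row_flags(row: dict) -> dict:
    """Apply the documented 0.5 (instrumentalness) and 0.8 (liveness) cut-offs."""
    return {
        "likely_instrumental": row["instrumentalness"] > 0.5,
        "likely_live": row["liveness"] > 0.8,
    }
```

These cut-offs are interpretive guides rather than hard rules, so the helpers are best used for exploration and not as ground truth.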

Distribution

The dataset contains over 40,000 tracks, each described by a set of features and a 'hit' or 'flop' label. The data is typically provided in CSV format, which makes it straightforward to load for analysis and model building. Its columns include text identifiers (track name, artist, and URI) and numerical audio features derived from Spotify's API.
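
As a rough illustration of that structure, the sketch below reads CSV rows with Python's standard `csv` module and separates the text identifiers from the numerical features and the label. The listing does not name the CSV file(s), so the example parses an in-memory sample instead. The two rows in `SAMPLE_CSV` are made up for illustration and are not taken from the dataset, and `load_rows` is a hypothetical helper name.

```python
import csv
import io

ID_COLUMNS = ("track", "artist", "uri")   # text identifiers
TARGET_COLUMN = "target"                  # 1 = hit, 0 = flop

# Synthetic rows for illustration only; NOT real entries from the dataset.
SAMPLE_CSV = """\
track,artist,uri,danceability,energy,key,loudness,mode,speechiness,acousticness,instrumentalness,liveness,valence,tempo,duration_ms,time_signature,chorus_hit,sections,target
Example Song A,Example Artist A,spotify:track:EXAMPLE_A,0.62,0.81,1,-5.2,1,0.04,0.12,0.0,0.09,0.73,120.0,210000,4,32000.0,10,1
Example Song B,Example Artist B,spotify:track:EXAMPLE_B,0.31,0.22,9,-14.8,0,0.05,0.88,0.71,0.11,0.20,84.0,305000,3,41000.0,14,0
"""


def load_rows(handle):
    """Return a list of (identifiers, features, label) tuples from a CSV handle."""
    rows = []
    for raw in csv.DictReader(handle):
        identifiers = {name: raw[name] for name in ID_COLUMNS}
        # Every remaining column except the label is treated as a numeric feature.
        features = {
            name: float(value)
            for name, value in raw.items()
            if name not in ID_COLUMNS and name != TARGET_COLUMN
        }
        rows.append((identifiers, features, int(raw[TARGET_COLUMN])))
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
for identifiers, features, label in rows:
    print(identifiers["track"], len(features), "features, label =", label)
```

To read a real file, the same function can be passed an open file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to wherever the downloaded CSV was extracted.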

Usage

This dataset is ideally suited for:
  • Developing and testing machine learning classification models to predict music popularity.
  • Data analysis to uncover patterns and relationships between audio features and a song's commercial success.
  • Researching the science behind popular music and identifying common characteristics of hit songs.
  • Creating tools or applications that can forecast a track's potential for mainstream success.
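
To make the first use case concrete, here is a minimal sketch of a binary classifier: logistic regression trained with plain batch gradient descent, written with the standard library only. It is one possible baseline, not a method prescribed by the dataset's author. The training data is synthetic: `make_synthetic` is a hypothetical helper that fabricates two features in [0, 1] with an artificially clean split between classes, so the resulting accuracy says nothing about how well hits can be predicted from the real audio features.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X, y, lr=1.0, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, xi):
    """Return 1 (hit) if the predicted probability is at least 0.5, else 0 (flop)."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0


def accuracy(w, b, X, y):
    return sum(predict(w, b, xi) == yi for xi, yi in zip(X, y)) / len(y)


def make_synthetic(n, rng):
    """Fabricated data: label-1 rows draw both features from [0.5, 1.0], label-0 from [0.0, 0.5]."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        low = 0.5 if label == 1 else 0.0
        X.append([rng.uniform(low, low + 0.5), rng.uniform(low, low + 0.5)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_synthetic(200, rng)
X_test, y_test = make_synthetic(100, rng)
w, b = train_logistic(X_train, y_train)
print("held-out accuracy on synthetic data:", accuracy(w, b, X_test, y_test))
```

On the real columns, features such as `tempo`, `loudness`, and `duration_ms` sit on very different scales from the 0.0 to 1.0 measures, so they would need to be standardised before gradient descent with a fixed learning rate behaves well.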

Coverage

The dataset spans 1960 to 2019, offering six decades of music data for analysis. The audio features are broadly applicable, but the labels are US-centric: the 'hit' label depends on Billboard's Hot-100, a US chart, and the 'flop' criteria require that a track was available in the US market. No demographic scope is specified for the dataset; it simply covers music across these decades.

License

CC BY-NC-SA 4.0 license

Who Can Use It

This dataset is suitable for:
  • Data scientists and machine learning engineers keen on building predictive models.
  • Music industry professionals seeking to understand factors contributing to song popularity.
  • Researchers and academics studying music trends, audio analysis, or cultural phenomena.
  • Students and hobbyists interested in exploring large music datasets and applying data science techniques.

Dataset Name Suggestions

  • Spotify Hit Predictor: Decades of Tracks
  • Music Popularity Classification Dataset
  • Global Music Hit/Flop Analysis
  • Audio Features for Music Success
  • Track Popularity Prediction (1960-2019)

Attributes

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 14/07/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
