Music Popularity Features Dataset
About
This dataset is designed for predicting song popularity from various musical attributes. Humans have a deep connection with songs and music, which can improve mood, reduce pain and anxiety, and support emotional expression. Research has documented benefits of music for both physical and mental health. This dataset supports studies of what makes songs popular by analysing specific audio parameters. The core objective is to predict song popularity: a straightforward yet challenging regression problem, notable for strong multicollinearity among its features.
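A quick way to see multicollinearity is to compute the Pearson correlation between every pair of features and flag pairs whose absolute correlation exceeds a threshold. The sketch below is a minimal, standard-library version of that pairwise screen. It is a simpler proxy than the variance inflation factor (VIF) and can miss collinearity that involves three or more features at once. The 0.7 threshold, the function names, and the toy values are illustrative assumptions, not figures from song_data.csv.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlated_pairs(columns, threshold=0.7):
    """Return (feature_a, feature_b, r) for every pair with |r| >= threshold.

    `columns` maps a feature name to its list of values. The default
    threshold of 0.7 is an illustrative assumption, not a fixed rule.
    """
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Toy values (hypothetical, not from the dataset): loudness rises in step with energy.
features = {
    "energy": [0.2, 0.4, 0.6, 0.8],
    "loudness": [-20.0, -15.0, -10.0, -5.0],
    "acousticness": [0.9, 0.1, 0.5, 0.3],
}
print(correlated_pairs(features))  # [('energy', 'loudness', 1.0)]
```

Only the energy/loudness pair is flagged here; acousticness correlates at roughly -0.53 with both, which stays under the assumed threshold.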
Columns
- song_name: The name of the song.
- song_popularity: A numerical value representing the song's popularity, ranging from 0 to 100.
- song_duration_ms: The duration of the song in milliseconds.
- acousticness: A confidence measure from 0.0 to 1.0 indicating whether the track is acoustic.
- danceability: A measure from 0.0 to 1.0 describing how suitable a track is for dancing based on musical elements like tempo, rhythm stability, beat strength, and overall regularity.
- energy: A perceptual measure of intensity and activity, from 0.0 to 1.0.
- instrumentalness: Predicts whether a track contains no vocals. Values closer to 1.0 indicate a greater likelihood of the track being instrumental.
- key: The key the track is in, represented as integers (e.g., 0 for C, 1 for C#, etc.).
- liveness: Detects the presence of an audience in the recording. Values above 0.8 indicate a strong likelihood the track was performed live.
- loudness: The overall loudness of a track in decibels (dB), typically ranging from -60 to 0 dB.
- audio_mode: Indicates the modality (major or minor) of a track, with 0 typically representing minor and 1 representing major.
- speechiness: Detects the presence of spoken words in a track. Values above 0.66 indicate tracks that are probably made entirely of spoken words; values between 0.33 and 0.66 indicate tracks that may contain both music and speech; values below 0.33 most likely represent music and other non-speech-like tracks.
- tempo: The overall estimated tempo of a track in beats per minute (BPM).
- time_signature: An estimated overall time signature of a track, expressed as the number of beats per bar (e.g., 4 for 4/4 time).
- audio_valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track.
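Several columns above are encoded (key as a pitch-class integer, audio_mode as 0/1, duration in milliseconds, speechiness as a score with documented bands). The helpers below are a minimal sketch for turning those encodings into readable values during exploration. The function names are hypothetical. Treating out-of-range key codes (such as -1) as "unknown" and the exact handling of values that fall precisely on 0.33 or 0.66 are assumptions.

```python
# Pitch-class names for the integer `key` column (0 = C, 1 = C#, ..., 11 = B).
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def describe_key(key: int, audio_mode: int) -> str:
    """Combine `key` and `audio_mode` (1 = major, 0 = minor) into a label like 'C# minor'."""
    if not 0 <= key < len(PITCH_CLASSES):
        # Assumption: codes outside 0-11 (e.g. -1) mean no key was detected.
        return "unknown"
    return f"{PITCH_CLASSES[key]} {'major' if audio_mode == 1 else 'minor'}"


def speechiness_band(speechiness: float) -> str:
    """Bucket speechiness with the 0.33 / 0.66 thresholds.

    Assumption: a value of exactly 0.33 or 0.66 is placed in the middle band.
    """
    if speechiness > 0.66:
        return "spoken word"
    if speechiness >= 0.33:
        return "music and speech"
    return "music"


def duration_minutes(song_duration_ms: int) -> float:
    """Convert song_duration_ms to minutes (60,000 ms per minute)."""
    return song_duration_ms / 60_000


print(describe_key(1, 0), speechiness_band(0.05), duration_minutes(180_000))
# C# minor music 3.0
```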
Distribution
The dataset is typically provided as a CSV file; the sample file is named song_data.csv. It contains 15 columns and 18,800 records (rows), and the file size is 2.22 MB. All columns are fully populated, with no missing or mismatched values.
Usage
This dataset is well suited to developing and evaluating regression models. It can be used to predict song popularity from features such as energy, acousticness, instrumentalness, liveness, and danceability. Users can clean the data if necessary, build several regression models, and then compare their performance using metrics such as R-squared (R²) and Root Mean Squared Error (RMSE).
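The two evaluation metrics named above can be computed directly from actual and predicted popularity scores. The sketch below implements their standard definitions: RMSE is the square root of the mean squared residual (in the same 0 to 100 units as song_popularity, lower is better), and R² is 1 minus the ratio of the residual sum of squares to the total sum of squares (closer to 1 is better). The scores used are hypothetical. In practice, scikit-learn's mean_squared_error and r2_score compute the same quantities.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared residual."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical popularity scores and model predictions, for illustration only.
actual = [50, 60, 70, 80]
predicted = [52, 58, 71, 79]
print(round(rmse(actual, predicted), 3), round(r_squared(actual, predicted), 3))
# 1.581 0.98
```

When comparing models, compute both metrics on held-out data rather than on the rows used for fitting, so the numbers reflect predictive performance.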
Coverage
The dataset's geographic location, time range, and demographic scope are not documented in the available information. The dataset originates from Kaggle. All columns are complete, with 100% valid data across all 18,800 records.
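Before modelling, it is worth confirming the completeness claims on your own copy of the file. The sketch below uses Python's csv.DictReader to count rows and columns, list any expected columns missing from the header, and count empty cells. The expected column names come from the Columns list; the function name is hypothetical, and the in-memory two-row sample stands in for the real file.

```python
import csv
import io

# Column names as documented in the Columns section.
EXPECTED_COLUMNS = [
    "song_name", "song_popularity", "song_duration_ms", "acousticness",
    "danceability", "energy", "instrumentalness", "key", "liveness",
    "loudness", "audio_mode", "speechiness", "tempo", "time_signature",
    "audio_valence",
]


def completeness_report(csv_file):
    """Report row count, column count, missing expected columns, and empty cells."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing_columns = [c for c in EXPECTED_COLUMNS if c not in header]
    rows = 0
    empty_cells = 0
    for row in reader:
        rows += 1
        for value in row.values():
            # Short rows yield None for absent fields; blank strings count as empty.
            # Extra fields on long rows arrive as a list and are not counted here.
            if value is None or (isinstance(value, str) and not value.strip()):
                empty_cells += 1
    return {"rows": rows, "columns": len(header),
            "missing_columns": missing_columns, "empty_cells": empty_cells}


# Hypothetical two-row sample; the second row has a blank song_popularity.
sample = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    "Song A,50,200000,0.1,0.5,0.7,0.0,1,0.1,-5.0,1,0.05,120.0,4,0.6\n"
    "Song B,,180000,0.2,0.6,0.8,0.0,2,0.2,-6.0,0,0.04,100.0,4,0.5\n"
)
print(completeness_report(sample))
```

To check the actual file, open song_data.csv with newline="" and pass the file object to the same function; for the documented data you would expect 18,800 rows, 15 columns, no missing columns, and zero empty cells.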
License
CC0: Public Domain
Who Can Use It
This dataset is suitable for beginners in data science and machine learning, particularly those interested in regression problems such as linear regression. Users can apply it to understand which factors contribute to song popularity and to build predictive models.
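For a first linear regression, a single feature is the easiest starting point. The sketch below fits ordinary least squares for popularity = intercept + slope × feature using the closed-form solution (slope = covariance of x and y divided by the variance of x; intercept = mean of y minus slope × mean of x). The feature choice and the values are hypothetical. With the full dataset's correlated features, a multi-feature model such as scikit-learn's LinearRegression is the natural next step.

```python
def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one feature."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def predict(intercept, slope, xs):
    """Apply the fitted line to new feature values."""
    return [intercept + slope * x for x in xs]


# Hypothetical danceability values and popularity scores (not from song_data.csv).
danceability = [0.2, 0.4, 0.6, 0.8]
popularity = [40.0, 50.0, 60.0, 70.0]
intercept, slope = fit_simple_linear(danceability, popularity)
print(round(intercept, 6), round(slope, 6))  # 30.0 50.0
```

The toy data lie exactly on a line, so the fit recovers it perfectly; real popularity data will leave large residuals, which the RMSE and R² metrics quantify.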
Dataset Name Suggestions
- Song Popularity Prediction Data
- Music Popularity Features Dataset
- Audio Characteristics for Popularity
- Predicting Song Hits Data
Attributes
Original Data Source: Music Popularity Features Dataset