Opendatabay APP

Streaming Music Bias Research Dataset

Product Reviews & Feedback

Tags and Keywords

Spotify, Ratings, Music, Bias, Popularity

Free

About

This dataset presents a large-scale collection of Spotify music popularity data designed to support research into recommendation system biases. It includes 1,017,947 binary ratings gathered from 6,696 anonymised users, covering 181,663 unique songs. Its primary goal is to let researchers and practitioners study the impact of data collection mechanisms, specifically noisy implicit feedback (e.g., assuming that a background stream implies enjoyment) and the self-selection biases found in explicit rating systems.

Columns

The dataset features 7 primary columns:
  • ID: A sequential numerical identifier assigned to each rating record.
  • liked: Indicates whether the song was enjoyed. The listing describes this as a binary rating but reports a value range of 0 to 2, so the exact set of values should be checked in the file.
  • personalized: Reflects the level of personalisation applied to the song recommendation or interaction, with values typically ranging from 0 to 2.
  • song_id: An anonymised numerical identifier for the specific music track.
  • spotify_popularity: The popularity score of the track, ranging from 0 (minimum) to 100 (maximum). The average popularity across the data is 54.8.
  • timestamp: The date and time the user interaction or rating event was recorded.
  • user_id: An anonymised numerical identifier for the user who provided the rating.
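
As a minimal sketch of how the seven columns above might be read, the following uses only the Python standard library. The header names are assumed to match the column names in this listing exactly, and the timestamp is kept as text because its format is not documented here. The function name `read_ratings` and the sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io

# Column names as given in the listing; the real CSV header may differ.
EXPECTED_COLUMNS = ["ID", "liked", "personalized", "song_id",
                    "spotify_popularity", "timestamp", "user_id"]


def read_ratings(handle):
    """Parse rating rows from an open CSV file handle into typed dicts."""
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for raw in reader:
        # Assumes no empty cells, as the listing reports no missing values.
        rows.append({
            "ID": int(raw["ID"]),
            "liked": int(float(raw["liked"])),
            "personalized": int(float(raw["personalized"])),
            "song_id": int(raw["song_id"]),
            "spotify_popularity": float(raw["spotify_popularity"]),
            "timestamp": raw["timestamp"],  # kept as text; format not documented
            "user_id": int(raw["user_id"]),
        })
    return rows


# Made-up sample rows for illustration only (not taken from the dataset).
SAMPLE = """ID,liked,personalized,song_id,spotify_popularity,timestamp,user_id
0,1,0,101,62,2019-06-21 10:00:00,7
1,0,1,102,35,2019-06-22 11:30:00,7
"""

rows = read_ratings(io.StringIO(SAMPLE))
print(len(rows), rows[0]["liked"], rows[1]["spotify_popularity"])
```

For the real file, the same function could be called on `open("piki_dataset.csv", newline="")`, provided the header matches the assumed names.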

Distribution

This tabular dataset is distributed as a CSV file (piki_dataset.csv) of approximately 49.93 MB. It has 7 columns and 1,017,947 valid records, and the listing reports no mismatched or missing values in the rating counts. Data collection is reported as ongoing, with updates expected quarterly.
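
Because updates are reported as quarterly, a downloaded copy may not match the figures quoted in this listing. The sketch below shows one way to compare a loaded file against the stated record, user, and song counts. It assumes rows are dicts with `user_id` and `song_id` keys; the helper names `summarize` and `mismatches` and the sample rows are hypothetical.

```python
# Figures stated in the listing.
STATED = {"records": 1_017_947, "users": 6_696, "songs": 181_663}


def summarize(rows):
    """Count records, distinct users, and distinct songs."""
    return {
        "records": len(rows),
        "users": len({r["user_id"] for r in rows}),
        "songs": len({r["song_id"] for r in rows}),
    }


def mismatches(summary, stated=STATED):
    """Return {name: (observed, stated)} for every figure that differs."""
    return {k: (summary[k], v) for k, v in stated.items() if summary[k] != v}


# Made-up rows for illustration only (not taken from the dataset).
sample = [
    {"user_id": 7, "song_id": 101},
    {"user_id": 7, "song_id": 102},
    {"user_id": 9, "song_id": 101},
]
summary = summarize(sample)
print(summary)
print(mismatches(summary))
```

An empty result from `mismatches` would indicate that the file agrees with the listing on all three counts.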

Usage

This data is ideal for model development and academic studies in the following areas:
  • Training and evaluating next-generation music recommendation systems that are resilient to feedback biases.
  • Researching methods to quantify and mitigate the influence of noisy implicit user feedback.
  • Analysing how self-selection biases in explicit rating capture impact the fairness and transparency of algorithms.
  • Developing new metrics for evaluating recommendation system quality beyond standard industry measures.
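
As one possible starting point for the bias analyses listed above, the sketch below computes the share of positive ratings per popularity bucket. This is a simple descriptive check, not a method prescribed by the dataset authors. It assumes that a `liked` value of 1 or more counts as positive, which should be verified given the value range noted under Columns. The function name `like_rate_by_popularity` and the sample rows are hypothetical.

```python
from collections import defaultdict


def like_rate_by_popularity(rows, width=20):
    """Return {bucket_start: share of positive ratings} per popularity bucket."""
    liked = defaultdict(int)
    total = defaultdict(int)
    for r in rows:
        # Cap at 99 so a score of exactly 100 falls into the top bucket.
        start = min(int(r["spotify_popularity"]), 99) // width * width
        total[start] += 1
        # Assumption: any value >= 1 counts as a positive rating.
        if r["liked"] >= 1:
            liked[start] += 1
    return {b: liked[b] / total[b] for b in sorted(total)}


# Made-up rows for illustration only (not taken from the dataset).
sample = [
    {"spotify_popularity": 10, "liked": 1},
    {"spotify_popularity": 15, "liked": 0},
    {"spotify_popularity": 85, "liked": 1},
    {"spotify_popularity": 100, "liked": 1},
]
print(like_rate_by_popularity(sample))
```

A like rate that rises steadily with the bucket start would be one sign that popularity and positive feedback are entangled in the data.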

Coverage

The data spans 21 June 2019 to 12 June 2022. The geographic scope is listed as North America. All songs and users have been anonymised to protect privacy.
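
The stated date range can be used as a sanity check on the timestamp column. The sketch below assumes timestamps are ISO 8601 style strings such as `2019-06-21 10:00:00`; the listing does not document the actual format, so the parsing step may need adjusting. The function name `in_coverage` and the sample values are hypothetical.

```python
from datetime import date, datetime

# Coverage window stated in the listing (inclusive on both ends).
COVERAGE_START = date(2019, 6, 21)
COVERAGE_END = date(2022, 6, 12)


def in_coverage(timestamp_text):
    """Return True if the timestamp's date falls inside the stated window."""
    # Assumption: the timestamp text is parseable by datetime.fromisoformat.
    day = datetime.fromisoformat(timestamp_text).date()
    return COVERAGE_START <= day <= COVERAGE_END


# Made-up timestamps for illustration only (not taken from the dataset).
sample = ["2019-06-21 00:00:05", "2021-01-15 12:00:00", "2022-06-13 08:00:00"]
outside = [t for t in sample if not in_coverage(t)]
print(outside)
```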

License

CC0: Public Domain

Who Can Use It

  • Machine Learning Engineers: Seeking real-world, large-scale music interaction data for building robust algorithms.
  • Data Bias and Ethics Researchers: Focusing on quantifying systemic unfairness in digital platforms.
  • Students and Academics: Utilising a publicly available, high-quality dataset for dissertations and research papers on recommender systems.

Dataset Name Suggestions

  • Streaming Music Bias Research Dataset
  • Spotify User Ratings and Popularity (2019-2022)
  • Anonymised Music Recommendation Feedback
  • North American Spotify Interaction Data

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 10/10/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
