Synthetic Song Popularity Data
Free
About
This dataset contains fictional records for 50,000 songs, intended for exploring music trends and song popularity. It was generated by ChatGPT and contains no real-world data, which makes it suitable for educational uses such as music analysis, trend forecasting, and song popularity studies.
Columns
- song_id: A unique identifier for each song.
- song_title: The title of the song.
- artist: The artist who performs the song.
- album: The album on which the song is featured.
- genre: The music genre of the song.
- release_date: The date the song was released.
- duration: The length of the song, measured in seconds.
- popularity: A popularity score for the song, ranging from 1 to 100.
- stream: The total count of streams the song has received.
- language: The language in which the song is performed.
- explicit_content: Indicates whether the song contains explicit material, such as inappropriate language.
- label: The record label responsible for publishing the song.
- composer: The individual who composed the song.
- producer: The producer of the song.
- collaboration: Denotes whether the song features a collaboration with other artists.
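As a rough illustration of how these columns might be typed after loading, here is a minimal sketch using only the Python standard library. The two sample rows are made up, and several details are assumptions not stated in this listing: that song_id is an integer, that release_date uses the YYYY-MM-DD format, that explicit_content is stored as "True"/"False", and that missing values appear as empty cells. The helper name parse_row is hypothetical.

```python
import csv
import io
from datetime import date

# Two made-up rows in the documented column order; values are illustrative only.
SAMPLE_CSV = """song_id,song_title,artist,album,genre,release_date,duration,popularity,stream,language,explicit_content,label,composer,producer,collaboration
1,Example Song,Example Artist,Example Album,Pop,2001-05-17,215,64,1200345,English,True,Example Label,Example Composer,Example Producer,
2,Another Song,Another Artist,Another Album,Electronic,2019-11-02,,12,50321,,False,Example Label,Example Composer,Example Producer,Guest Artist
"""


def parse_row(row):
    """Convert one raw CSV row (all strings) into typed values."""

    def optional(value, convert):
        # Empty cells are treated as missing values (an assumption).
        return convert(value) if value != "" else None

    return {
        **row,
        "song_id": int(row["song_id"]),                            # assumed integer
        "release_date": date.fromisoformat(row["release_date"]),   # assumed YYYY-MM-DD
        "duration": optional(row["duration"], int),                # seconds
        "popularity": int(row["popularity"]),                      # score from 1 to 100
        "stream": int(row["stream"]),                              # total stream count
        "language": optional(row["language"], str),
        "explicit_content": row["explicit_content"] == "True",     # assumed encoding
        "collaboration": optional(row["collaboration"], str),
    }


songs = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(songs[0]["release_date"].year, songs[1]["duration"])
```

If the actual file encodes dates, booleans, or missing values differently, the conversions above would need to be adjusted accordingly.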
Distribution
The dataset comprises 50,000 unique song records across 15 columns and is typically distributed as a CSV file. Most fields are fully populated, but some values are missing: 10% for duration, 5% for language, and 70% for collaboration.
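The missing-value shares quoted above could be checked with a short sketch like the following. It assumes missing values are stored as empty cells, which this listing does not confirm; the four sample rows are made up, and missing_share is a hypothetical helper name.

```python
import csv
import io

# Made-up rows covering only the three columns reported as incomplete.
SAMPLE_CSV = """song_id,duration,language,collaboration
1,215,English,
2,,English,
3,180,,Guest Artist
4,242,Spanish,
"""


def missing_share(rows, columns):
    """Return the fraction of empty cells for each requested column."""
    rows = list(rows)
    if not rows:
        return {col: 0.0 for col in columns}
    return {
        col: sum(1 for r in rows if (r[col] or "").strip() == "") / len(rows)
        for col in columns
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
shares = missing_share(rows, ["duration", "language", "collaboration"])
for col, share in shares.items():
    print(f"{col}: {share:.0%} missing")
```

Run against the full file, shares close to 0.10, 0.05, and 0.70 would be consistent with the figures in this section.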
Usage
This dataset is well-suited for a variety of analytical and educational purposes, including:
- Music analysis and trend forecasting.
- Song popularity studies.
- Data analytics and data visualization exercises.
- Exploratory data analysis.
- Data cleaning practice.
Coverage
Release dates range from 6 October 1994 to 5 October 2024. The dataset features 9 unique music genres: Electronic and Pop each account for 25% of the entries, and the remaining 50% fall under "Other" genres. Songs are primarily in English (67%), followed by Spanish (9%), with other languages making up 24% across 7 unique languages. Explicit content is evenly split, with roughly 50% of songs marked as explicit.
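The coverage figures above are simple category shares and a minimum and maximum date, so they can be reproduced with a sketch along these lines. The sample rows are made up (including the "Rock" genre label), the YYYY-MM-DD date format is an assumption, and value_shares and date_range are hypothetical helper names.

```python
from collections import Counter
from datetime import date

# Made-up rows; only the two columns needed for the check are included.
SAMPLE = [
    {"genre": "Pop", "release_date": "1994-10-06"},
    {"genre": "Electronic", "release_date": "2010-03-14"},
    {"genre": "Rock", "release_date": "2024-10-05"},
    {"genre": "Pop", "release_date": "2003-07-21"},
]


def value_shares(rows, column):
    """Return each distinct value's share of all rows, most common first."""
    counts = Counter(r[column] for r in rows)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.most_common()}


def date_range(rows, column="release_date"):
    """Return the earliest and latest date in the column (assumes YYYY-MM-DD)."""
    dates = [date.fromisoformat(r[column]) for r in rows]
    return min(dates), max(dates)


print(value_shares(SAMPLE, "genre"))
print(date_range(SAMPLE))
```

The same value_shares call could be pointed at the language or explicit_content column to compare against the percentages listed here.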
License
CC BY-SA 4.0
Who Can Use It
This dataset is designed for individuals and organisations engaged in creative and educational endeavours. It is particularly valuable for:
- Data analysts and scientists seeking to practice their skills on a large, varied dataset.
- Researchers studying music trends or data patterns.
- Students undertaking projects in data analytics, statistics, or musicology.
Dataset Name Suggestions
- AI-Generated Fictional Music Dataset
- Synthetic Song Popularity Data
- Music Trend Simulation Records
- ChatGPT Music Dataset
- Fictional 50K Songs Catalogue
Attributes
Original Data Source: Synthetic Song Popularity Data