Multi-Lingual Lyrics for Genre Classification
Data Science and Analytics
Free
About
This dataset is a resource for classifying songs by genre based on their lyrical content. It contains labeled examples drawn from multiple sources. A language detection library was used to label the language of each lyric automatically, identifying 34 distinct languages in the collection.
Columns
- Song: The title of the track.
- Song year: The specific year in which the song was originally released.
- Artist: The name of the performing artist.
- Genre: The identified musical category of the song (e.g., Rock or Pop). This feature was generated via a custom labeling function utilizing the Spotify API, where the most frequent genre associated with an artist was chosen as the dominant genre.
- Lyrics: The unprocessed, raw lyrical content of the song.
- Track_id: A unique identifier for the song.
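The dominant-genre rule described for the Genre column can be sketched as follows. This is a minimal illustration, not the dataset author's actual labeling function: the helper name `dominant_genre` and its input list of genre tags are hypothetical, and the step that would fetch tags from the Spotify API is left out. The source does not say how ties were broken, so this sketch assumes the tag seen first wins.

```python
from collections import Counter
from typing import Optional, Sequence


def dominant_genre(genre_tags: Sequence[str]) -> Optional[str]:
    """Return the most frequent genre tag, or None if there are no tags.

    `genre_tags` is a hypothetical input: the genre labels collected for one
    artist. Ties go to the tag that appears first (an assumption).
    """
    # Normalise case and whitespace so "Rock" and " rock" count together.
    cleaned = [tag.strip().lower() for tag in genre_tags if tag.strip()]
    if not cleaned:
        return None
    # Counter.most_common orders equal counts by first occurrence.
    return Counter(cleaned).most_common(1)[0][0]
```

For example, `dominant_genre(["Rock", "pop", "rock "])` picks the tag that occurs twice after normalisation.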
Distribution
The dataset is typically distributed as a CSV file. The included sample file (test.csv) is approximately 10.14 MB and contains 6 columns and 7,935 valid records. Although the sample size is moderate, the overall product contains over 290,000 samples. The data structure is tailored for multiclass classification tasks involving text analysis.
Usage
This data product is ideal for developing models for genre classification. It supports various computational linguistics projects, including experimenting with different feature engineering techniques, text mining, and natural language processing applications focusing on lyric structure.
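As a starting point for the genre classification use case, the sketch below reads lyric and genre pairs from the CSV and fits a small multinomial Naive Bayes baseline using only the Python standard library. It is one possible baseline, not a method prescribed by the dataset. The header names `Lyrics` and `Genre` are assumed from the column list and may be spelled differently in the actual file; `read_examples`, `tokenize`, and `NaiveBayesGenre` are hypothetical names introduced here.

```python
import csv
import math
import re
from collections import Counter, defaultdict
from typing import Iterable, List, Optional, TextIO, Tuple


def read_examples(handle: TextIO) -> List[Tuple[str, str]]:
    """Read (lyrics, genre) pairs from an open CSV file.

    Assumes header names "Lyrics" and "Genre"; rows missing either are skipped.
    """
    rows = csv.DictReader(handle)
    return [(r["Lyrics"], r["Genre"]) for r in rows
            if r.get("Lyrics") and r.get("Genre")]


def tokenize(text: str) -> List[str]:
    # \w+ is Unicode-aware in Python 3, which matters for multilingual lyrics.
    return re.findall(r"\w+", text.lower())


class NaiveBayesGenre:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, examples: Iterable[Tuple[str, str]]) -> "NaiveBayesGenre":
        self.doc_counts = Counter()               # songs per genre
        self.word_counts = defaultdict(Counter)   # word counts per genre
        self.vocab = set()
        for lyrics, genre in examples:
            tokens = tokenize(lyrics)
            self.doc_counts[genre] += 1
            self.word_counts[genre].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, lyrics: str) -> Optional[str]:
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        # Words never seen in training carry no signal, so drop them.
        tokens = [t for t in tokenize(lyrics) if t in self.vocab]
        best_genre, best_score = None, -math.inf
        for genre, n_docs in self.doc_counts.items():
            counts = self.word_counts[genre]
            denom = sum(counts.values()) + vocab_size
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_genre, best_score = genre, score
        return best_genre
```

To try it on the sample file, open `test.csv` with `newline=""` and an explicit encoding (UTF-8 is an assumption here), pass the handle to `read_examples`, hold out part of the rows for evaluation, and call `fit` on the rest. Whitespace-style tokenization is a rough fit for languages written without spaces, so per-language preprocessing is a natural next experiment.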
Coverage
The lyrical content spans releases occurring between 1970 and 2016. The geographic and linguistic scope is wide, with lyrics available across 34 different languages. The data’s foundation comes from several distinct sources, contributing to its broad coverage of artists and time periods.
License
CC BY-SA 4.0
Who Can Use It
Intended users include data scientists, academic researchers, and students focused on information retrieval and text analysis. It is highly suited for anyone undertaking classification challenges where the goal is to predict musical genre based purely on lyrical input.
Dataset Name Suggestions
- Multi-Lingual Lyrics for Genre Classification
- Global Music Lyrics Corpus
- Music Genre Prediction Lyrics
Attributes
Original Data Source: Multi-Lingual Lyrics for Genre Classification
