Multi-Lingual Lyrics for Genre Classification

Data Science and Analytics

Tags and Keywords

Lyrics, Genre, Text, Classification, Multilingual

Free

About

This dataset is a resource for classifying songs by genre from their lyrics. It contains labeled examples drawn from multiple sources. The lyrics were labeled automatically with a language detection library, which identified 34 distinct languages in the collection.

Columns

  • Song: The title of the track.
  • Song year: The specific year in which the song was originally released.
  • Artist: The name of the performing artist.
  • Genre: The musical category assigned to the song (e.g., Rock or Pop). This label was generated by a custom labeling function that uses the Spotify API: the genre most frequently associated with an artist was taken as the dominant genre.
  • Lyrics: The unprocessed, raw lyrical content of the song.
  • Track_id: A unique identifier for the song.
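
The "most frequent genre" rule described for the Genre column can be sketched as follows. This is a minimal illustration, not the dataset author's labeling function: the listing does not publish that code, the Spotify API call is omitted, and the function name `dominant_genre` and the sample tags are hypothetical.

```python
from collections import Counter

def dominant_genre(artist_genres):
    """Return the most frequent genre label in a list, or None if the list is empty."""
    if not artist_genres:
        return None
    counts = Counter(artist_genres)
    # most_common(1) returns [(label, count)]; ties resolve to the label seen first.
    return counts.most_common(1)[0][0]

# Hypothetical genre tags collected for one artist (illustrative values only).
tags = ["Rock", "Pop", "Rock", "Indie", "Rock"]
print(dominant_genre(tags))
```

How fine-grained genre tags were mapped to broad categories such as Rock or Pop is not described in the listing, so that step is left out here.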

Distribution

The dataset is distributed as a CSV file, downloadable as a ZIP archive. The included sample file (test.csv) is approximately 10.14 MB and contains 6 columns and 7,935 valid records. The sample is moderate in size; the full dataset contains over 290,000 samples. The structure is suited to multiclass text classification tasks.
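
As a rough sketch of how the CSV might be read and checked, the example below uses Python's standard `csv` module. The header spellings are taken from the Columns list and are an assumption about the actual file; the helper `load_records` and the in-memory rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Column names as given in the listing; the exact header spelling in test.csv is an assumption.
EXPECTED_COLUMNS = ["Song", "Song year", "Artist", "Genre", "Lyrics", "Track_id"]

def load_records(handle):
    """Read rows from an open CSV file handle and check that the expected columns are present."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)

# Tiny in-memory stand-in for test.csv (made-up rows, not real data).
sample = io.StringIO(
    "Song,Song year,Artist,Genre,Lyrics,Track_id\n"
    'Example Song,1999,Example Artist,Rock,"first line\nsecond line",id-001\n'
    "Another Song,2005,Other Artist,Pop,la la la,id-002\n"
)
records = load_records(sample)
genre_counts = Counter(row["Genre"] for row in records)
print(len(records), dict(genre_counts))
```

For the real file, the handle would come from something like `open("test.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is assumed, not stated in the listing. Passing `newline=""` lets the `csv` module handle line breaks inside quoted lyrics fields.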

Usage

This dataset is suited to developing genre classification models. It also supports computational linguistics projects such as experiments with feature engineering techniques, text mining, and natural language processing applications that focus on lyric structure.
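
One possible baseline for the genre classification task is a bag-of-words multinomial Naive Bayes model. The sketch below is a pure-Python illustration under stated assumptions: the class `NaiveBayesGenre`, the `tokenize` helper, and the toy lyrics are all hypothetical, and the listing does not prescribe any particular model.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into runs of word characters (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())

class NaiveBayesGenre:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, lyrics, genres):
        self.class_counts = Counter(genres)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, genre in zip(lyrics, genres):
            tokens = tokenize(text)
            self.word_counts[genre].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.class_counts.values())
        best_genre, best_score = None, -math.inf
        for genre, n_docs in self.class_counts.items():
            counts = self.word_counts[genre]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior for this genre
            for tok in tokens:
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_genre, best_score = genre, score
        return best_genre

# Made-up toy lyrics, only to show the call pattern.
train_lyrics = ["guitar riff loud guitar", "dance baby dance tonight", "loud drums and guitar"]
train_genres = ["Rock", "Pop", "Rock"]
model = NaiveBayesGenre().fit(train_lyrics, train_genres)
print(model.predict("loud guitar solo"))
```

Splitting on `\w+` is a simplification. For a corpus covering 34 languages, scripts written without spaces between words would likely need a language-specific tokenizer or character n-grams instead.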

Coverage

The lyrics come from songs released between 1970 and 2016. The geographic and linguistic scope is wide, with lyrics in 34 different languages. The data is drawn from several distinct sources, which broadens its coverage of artists and time periods.

License

CC BY-SA 4.0

Who Can Use It

Intended users include data scientists, academic researchers, and students working on information retrieval and text analysis. It is well suited to classification tasks where the goal is to predict musical genre from lyrics alone.

Dataset Name Suggestions

  • Multi-Lingual Lyrics for Genre Classification
  • Global Music Lyrics Corpus
  • Music Genre Prediction Lyrics

Attributes

Listing Stats

  • Views: 4
  • Downloads: 0
  • Listed: 18/11/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
