Lyrical Content Dataset
Entertainment & Media Consumption
Free
About
This dataset is a resource for projects focused on lyrical analysis. It was originally compiled for a personal project and contains lyrics from eleven artists: Drake, J. Cole, Kendrick Lamar, Eminem, Nas, Skepta, Rapsody, Nicki Minaj, Dave, 2Pac, and Future. The data was gathered using the Spotify and Genius APIs. It is suited to text analysis, text pre-processing, exploratory data analysis (EDA), and text classification. The dataset is listed under the Entertainment & Media Consumption category and is relevant to AI and ML applications.
Columns
The dataset comprises the following key columns:
- track_name: This column provides the name of each song.
- artist: This column lists the name of each artist.
- raw_lyrics: The raw lyric text as scraped from the Genius website. Some entries are formatted differently from others, so text presentation is inconsistent across this column.
- artist_verses: Text extracted from raw_lyrics that contains only the verses performed by the listed artist.
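The artist_verses column implies a step that separates one artist's verses from the rest of each song. The listing does not describe how this was done, so the following is only a sketch under stated assumptions: section headers in raw_lyrics are assumed to follow the common Genius style of a bracketed line such as "[Verse 1: Artist]", and the function name extract_artist_verses is hypothetical.

```python
import re

# Assumption: Genius-style section headers sit on their own line in square
# brackets, e.g. "[Verse 1: Artist]" or "[Chorus]".
HEADER_RE = re.compile(r"^\[(?P<label>[^\]]*)\]\s*$")


def extract_artist_verses(raw_lyrics: str, artist: str) -> str:
    """Return the lines of verse sections credited to `artist` (hypothetical helper).

    A section is kept only when its header label contains "verse" and the
    part after the colon mentions the artist (case-insensitive substring).
    Choruses, intros, and uncredited sections are skipped.
    """
    kept: list[str] = []
    keep = False
    for line in raw_lyrics.splitlines():
        stripped = line.strip()
        match = HEADER_RE.match(stripped)
        if match:
            # Split "Verse 1: Artist" into the section kind and the credit.
            kind, _, credit = match.group("label").partition(":")
            keep = "verse" in kind.lower() and artist.lower() in credit.lower()
            continue
        if keep and stripped:
            kept.append(stripped)
    return "\n".join(kept)
```

Because the credit check is a plain substring match, a short name can match a longer one, and entries whose headers do not follow the assumed pattern yield an empty string; this is one reason the formatting inconsistencies in raw_lyrics matter.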
Distribution
The data is provided in CSV format. The dataset contains 530 records, with 530 unique track_name values and 530 unique raw_lyrics values. The file size is not stated.
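Because lyric fields contain embedded line breaks, a CSV reader that respects quoted fields is needed to recover the records correctly. A minimal sketch using Python's standard csv module follows; load_records and summarize are hypothetical helper names, and the expected column set is taken from the Columns section.

```python
import csv
from typing import Iterable

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = {"track_name", "artist", "raw_lyrics", "artist_verses"}


def load_records(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse CSV rows into dicts, raising ValueError if a column is missing.

    `lines` can be an open file (opened with newline="") or any iterable of
    strings; quoted fields may span several lines, as lyrics usually do.
    """
    reader = csv.DictReader(lines)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def summarize(records: list[dict[str, str]]) -> dict[str, int]:
    """Count records and distinct values in the key columns."""
    return {
        "records": len(records),
        "unique_tracks": len({r["track_name"] for r in records}),
        "unique_artists": len({r["artist"] for r in records}),
    }
```

To run it on the actual file, open it with newline="" as the csv documentation recommends, e.g. `with open("lyrics.csv", newline="", encoding="utf-8") as f: print(summarize(load_records(f)))`. The filename lyrics.csv and the UTF-8 encoding are assumptions, since the listing names neither.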
Usage
This dataset is ideally suited for a variety of applications and use cases, including but not limited to:
- Text analysis: For in-depth study of lyrical patterns, themes, and styles.
- Text pre-processing: As a source for cleaning and preparing text data for machine learning models.
- Text EDA (Exploratory Data Analysis): For initial investigation and understanding of the dataset's characteristics.
- Text classification: For building models that classify lyrics by artist.
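For the pre-processing and EDA uses listed above, a typical first pass removes section headers, normalises case, apostrophes, and whitespace, and counts word frequencies. The sketch below uses only the Python standard library; it assumes bracketed headers such as "[Chorus]" mark song sections, and its tokenisation rule (lowercase letters, digits, and apostrophes) is an illustrative choice rather than anything prescribed by the dataset. All function names are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed Genius-style section headers, e.g. "[Verse 1: Artist]" or "[Chorus]".
SECTION_HEADER_RE = re.compile(r"\[[^\]]*\]")
# Illustrative token rule: runs of lowercase letters, digits, and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def clean_lyrics(raw: str) -> str:
    """Remove bracketed headers, unify curly apostrophes, lowercase, collapse whitespace."""
    text = SECTION_HEADER_RE.sub(" ", raw).replace("\u2019", "'")
    return " ".join(text.lower().split())


def tokenize(cleaned: str) -> list[str]:
    """Split already-cleaned (lowercased) text into word tokens."""
    return TOKEN_RE.findall(cleaned)


def top_words(
    texts: Iterable[str], n: int = 10, stopwords: frozenset[str] = frozenset()
) -> list[tuple[str, int]]:
    """Return the n most frequent tokens across texts, skipping stopwords."""
    counts: Counter[str] = Counter()
    for raw in texts:
        counts.update(t for t in tokenize(clean_lyrics(raw)) if t not in stopwords)
    return counts.most_common(n)
```

Applied to one artist's artist_verses values, top_words gives a quick vocabulary profile; a stop-word list (not supplied with the dataset) is usually needed before the counts become informative.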
Coverage
The dataset's regional coverage is listed as global. It includes lyrics from 11 artists. Drake and Nicki Minaj each account for about 9% of the records, and the remaining 81% (430 records) belong to the other nine artists. No time range is given for the lyrical content.
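The coverage percentages are per-artist record counts divided by the total number of records. The small helper below, artist_shares (a hypothetical name), reproduces that calculation from the artist column so the stated figures can be checked against the file itself.

```python
from collections import Counter
from typing import Iterable


def artist_shares(artists: Iterable[str]) -> dict[str, tuple[int, float]]:
    """Map each artist to (record count, percentage of all records).

    Percentages are rounded to one decimal place, so they may not sum to
    exactly 100.
    """
    counts = Counter(artists)
    total = sum(counts.values())
    return {
        artist: (count, round(100 * count / total, 1))
        for artist, count in counts.most_common()
    }
```

Pass it the artist value of every row, for example as read with Python's csv module; rounding explains why shares such as 9%, 9%, and 81% sum to 99% rather than 100%.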
License
CC0
Who Can Use It
This dataset is beneficial for a wide range of users, including:
- Data scientists and machine learning engineers: For developing and testing natural language processing (NLP) models, particularly for text analysis and classification tasks.
- Researchers: Those studying music, linguistics, cultural trends in hip-hop, or the evolution of lyrical content over time.
- Academics: For educational purposes in fields related to data science, text mining, and digital humanities.
- Developers: Creating applications that require analysis or classification of lyrical data.
Dataset Name Suggestions
- Rap Lyrics Collection
- Multi-Artist Hip-Hop Verses
- Lyrical Content Dataset
- Music Text Analysis Data
- Global Hip-Hop Lyrics
Attributes
Original Data Source: Rap Lyrics Dataset