YouTube Content Classification Dataset

Tags: Social Media and Networking, Tags and Keywords, Arts, Tabular, Classification, Intermediate, NLP


About

This dataset provides YouTube video metadata for practising text classification with Natural Language Processing (NLP) techniques. Each record includes a video ID, title, description, category, and link. The data was scraped from YouTube, so it offers a realistic exercise in data cleaning and analysis, including challenges such as missing values and class imbalance.

Columns

Video ID: A unique identifier for each YouTube video. Note that this column contains some missing data.
title: The title of the YouTube video.
description: The textual description associated with the YouTube video.
category: The category under which the video was classified when scraped.
link: A direct URL to the YouTube video.
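
As a minimal sketch of a first inspection pass, the snippet below reads CSV rows and counts missing values per column. The header spellings are assumed to match the column names listed above, and the three sample rows (including the example.com links) are hypothetical placeholders, not records from the dataset.

```python
import csv
import io
from collections import Counter

# Hypothetical sample; the real header spelling and values are assumptions.
SAMPLE_CSV = """Video ID,title,description,category,link
abc123,Street food tour,Trying noodles at a market,Food,https://example.com/v/abc123
,Rome in a day,,History,https://example.com/v/def456
ghi789,Watercolour basics,Brushes and washes,Art and Music,https://example.com/v/ghi789
"""

def load_rows(handle):
    """Read CSV rows as dicts, mapping empty or absent fields to None."""
    rows = []
    for raw in csv.DictReader(handle):
        rows.append({
            column: (value.strip() or None) if value is not None else None
            for column, value in raw.items()
        })
    return rows

def missing_counts(rows):
    """Count missing (None) values per column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None:
                counts[column] += 1
    return dict(counts)

rows = load_rows(io.StringIO(SAMPLE_CSV))
print(missing_counts(rows))
```

To run this against the downloaded file, replace the `io.StringIO` handle with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved.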

Distribution

The dataset is provided as a CSV file. It contains approximately 3,400 video records, derived from an initial scrape of 3,600 videos. The data is untidy: it has missing values and imbalanced classes across its categories, which makes it useful for data cleaning and preprocessing exercises.
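
One way to quantify the imbalance mentioned above is to tabulate the category counts and compare the largest class to the smallest. This is a sketch on a hypothetical list of labels standing in for the category column; the real proportions are not stated in this listing.

```python
from collections import Counter

# Hypothetical labels standing in for the category column; None marks a missing value.
labels = ["Food", "Food", "Food", "History", "Travel Vlogs",
          "Food", "Art and Music", None]

def class_distribution(labels):
    """Return {label: (count, share)} over the non-missing labels."""
    present = [label for label in labels if label is not None]
    counts = Counter(present)
    total = len(present)
    return {label: (n, n / total) for label, n in counts.most_common()}

def imbalance_ratio(labels):
    """Ratio of the most frequent class count to the least frequent one."""
    counts = Counter(label for label in labels if label is not None)
    return max(counts.values()) / min(counts.values())

for label, (count, share) in class_distribution(labels).items():
    print(f"{label}: {count} ({share:.0%})")
print("imbalance ratio:", imbalance_ratio(labels))
```

A ratio close to 1 indicates balanced classes; larger values suggest that plain accuracy will be a misleading metric.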

Usage

This dataset is ideally suited for:

Practising basic text classification using various NLP techniques.
Learning how to handle common data issues such as missing values and imbalanced classes.
Developing and applying data cleaning and preprocessing methods.
Experimenting with different machine learning algorithms for text analysis.
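
As one possible baseline for the text classification task above, here is a small multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. The training texts are hypothetical examples, the `NaiveBayes` class and `tokenize` helper are illustrative names, and in practice a library such as scikit-learn would usually be a better choice.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and keep alphabetic runs; missing text yields no tokens."""
    return re.findall(r"[a-z]+", (text or "").lower())

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training data: title and description joined into one string.
texts = [
    "Street food tour trying noodles",
    "Easy pasta recipe for dinner",
    "Ancient Rome documentary on the empire",
    "Medieval castles and their history",
]
labels = ["Food", "Food", "History", "History"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("quick noodles recipe"))
print(model.predict("documentary about the roman empire"))
```

Joining title and description into a single string is a design choice made here for simplicity; treating them as separate features is equally valid.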

Coverage

The dataset has global reach, since it comprises YouTube videos accessible worldwide. It was listed on 08/06/2025. Videos were collected by querying four main categories: Travel Vlogs, Food, Art and Music, and History. The data includes missing values and is imbalanced across these categories.
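
A common way to compensate for imbalance across the four categories is inverse-frequency class weighting, where each class gets weight total / (number of classes × class count). The sketch below applies that heuristic to hypothetical counts; the actual per-category counts are not given in this listing.

```python
# Hypothetical per-category counts; the real distribution is an assumption.
category_counts = {
    "Travel Vlogs": 4,
    "Food": 3,
    "Art and Music": 2,
    "History": 1,
}

def balanced_weights(counts):
    """Inverse-frequency weights: total / (n_classes * class_count)."""
    total = sum(counts.values())
    n_classes = len(counts)
    return {label: total / (n_classes * n) for label, n in counts.items()}

weights = balanced_weights(category_counts)
for label, weight in weights.items():
    print(f"{label}: {weight:.3f}")
```

With these weights every class contributes the same total weight during training, so rare categories are not drowned out by frequent ones.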

License

CC0

Who Can Use It

This dataset is intended for individuals and researchers, particularly those at an intermediate skill level, who want to practise text classification and NLP. It also suits anyone seeking practical experience with data cleaning, missing data, and class imbalance in real-world datasets.

Dataset Name Suggestions

YouTube Video Classification Data
NLP YouTube Metadata Dataset
YouTube Content Classification Dataset
Video Description Text Analysis Dataset

Attributes

Original Data Source: Youtube Videos Dataset (~3400 videos)

Listing Stats

Listed: 08/06/2025
Region: Global
Quality: 5 / 5
Version: 1.0
Price: Free
