
Medium Post Titles and Categories

Data Science and Analytics

Tags and Keywords

Medium, Text, NLP, Classification, Analytics


Free

About

A collection of approximately 126,000 post titles, subtitles, and categories scraped from the online publishing platform Medium. The data is intended for learning text analytics, the basics of Natural Language Processing (NLP), text classification, and other machine learning applications that work with text. It offers a large sample of content from a popular blogging site and can be used to study online writing trends and how content is organised.

Columns

  • category: The category assigned to the Medium blog post.
  • title: The main title of the Medium blog post.
  • subtitle: The secondary title or subtitle of the Medium blog post.
  • subtitle_truncated_flag: A boolean value indicating if Medium truncated the subtitle (true if truncated, false otherwise).

Distribution

The data is available in a single CSV file named medium_post_titles.csv, with a file size of approximately 20.59 MB. The dataset is structured with four columns and contains around 126,000 records.
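A minimal sketch of how the file described above might be read and checked, using only the Python standard library. The function name `load_posts` is hypothetical, the sample rows are invented for illustration, and the assumption that the flag is stored as the text "true" or "false" (in any casing) should be verified against the real file.

```python
import csv
import io

# Column names as given in the Columns section of this listing.
EXPECTED_COLUMNS = ["category", "title", "subtitle", "subtitle_truncated_flag"]


def load_posts(source):
    """Read rows from an open CSV file object and parse the boolean flag."""
    reader = csv.DictReader(source)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    posts = []
    for row in reader:
        # Assumption: the flag is stored as text such as "true" or "false".
        raw_flag = (row["subtitle_truncated_flag"] or "").strip().lower()
        row["subtitle_truncated_flag"] = raw_flag == "true"
        posts.append(row)
    return posts


# Invented rows for illustration only; they are not taken from the dataset.
SAMPLE_CSV = """category,title,subtitle,subtitle_truncated_flag
ai,Intro to neural networks,A gentle start,False
food,Baking bread at home,"Flour, water, and patience",True
"""

posts = load_posts(io.StringIO(SAMPLE_CSV))
print(len(posts), posts[1]["subtitle_truncated_flag"])
```

To read the real file, the same function could be called on `open("medium_post_titles.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption, since the listing does not state it.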

Usage

Ideal applications for this dataset include:
  • Training machine learning models for text classification and content categorisation.
  • Performing text analytics to understand trends in blog post titles and subtitles.
  • Learning the basics of Natural Language Processing (NLP) techniques.
  • Developing content recommendation systems.
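As one illustration of the text classification use case, the sketch below trains a small multinomial Naive Bayes model that predicts a category from a title. It is a dependency-free baseline under stated assumptions, not a recommended pipeline: the class name `NaiveBayesTitleClassifier`, the tokenizer, and the category labels "ai" and "food" are all hypothetical, and the training titles are invented rather than taken from the dataset.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayesTitleClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, titles, categories):
        # titles and categories are parallel lists of equal length.
        self.class_counts = Counter(categories)
        self.total_docs = len(categories)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for title, category in zip(titles, categories):
            tokens = tokenize(title)
            self.word_counts[category].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, title):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(title) if t in self.vocab]
        best_category, best_score = None, -math.inf
        for category, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / self.total_docs)
            denom = sum(self.word_counts[category].values()) + len(self.vocab)
            for token in tokens:
                score += math.log((self.word_counts[category][token] + 1) / denom)
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Invented training examples; category names are hypothetical.
train_titles = [
    "Intro to neural networks",
    "Training deep learning models",
    "Easy pasta recipes for dinner",
    "Baking bread at home",
]
train_categories = ["ai", "ai", "food", "food"]

model = NaiveBayesTitleClassifier().fit(train_titles, train_categories)
print(model.predict("Deep neural models"))
print(model.predict("Pasta for dinner"))
```

On the full dataset, the `title` and `category` columns would take the place of the two lists, with a held-out split to measure accuracy. A library implementation such as a TF-IDF vectorizer with a linear model would be a more typical choice at that scale.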

Coverage

The dataset consists of content scraped from the Medium platform. It has no specific geographical or demographic focus and reflects the variety of topics and authors present on the site. The time range over which the posts were scraped is not specified.

License

Creative Commons Attribution 3.0 Unported (CC BY 3.0)

Who Can Use It

  • Data Science Students and Beginners: Can use this dataset to practice fundamental NLP and text mining skills.
  • Researchers: Can analyse content trends, writing styles, and categorisation patterns on the Medium platform.
  • Machine Learning Engineers: Can use this data to build and test models for text classification and analysis.
  • Content Strategists: Can explore popular topics and effective title/subtitle structures.
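For the trend and title-structure analysis mentioned above, a short sketch of per-category summary statistics might look like the following. The function name `title_stats` and the sample rows are hypothetical; the rows only mimic the `category` and `title` columns and are not taken from the dataset.

```python
from collections import Counter, defaultdict
from statistics import mean


def title_stats(rows, top_n=3):
    """Summarise title length and most frequent words for each category."""
    by_category = defaultdict(list)
    for row in rows:
        by_category[row["category"]].append(row["title"])

    stats = {}
    for category, titles in by_category.items():
        # Whitespace splitting is a deliberate simplification.
        word_lists = [title.lower().split() for title in titles]
        counts = Counter(word for words in word_lists for word in words)
        stats[category] = {
            "posts": len(titles),
            "mean_title_words": mean(len(words) for words in word_lists),
            "top_words": [word for word, _ in counts.most_common(top_n)],
        }
    return stats


# Invented rows for illustration only.
sample_rows = [
    {"category": "ai", "title": "Intro to neural networks"},
    {"category": "ai", "title": "Neural networks in practice"},
    {"category": "food", "title": "Baking bread at home"},
]

summary = title_stats(sample_rows, top_n=2)
for category, info in summary.items():
    print(category, info)
```

In a real analysis, removing common stop words before counting would likely give more informative top words.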

Dataset Name Suggestions

  • Medium Post Titles and Categories
  • Text Analytics Dataset for Medium Articles
  • Medium Blog Post Content Analysis
  • NLP Dataset: Medium Post Titles
  • Medium Article Metadata

Attributes

Original Data Source: Medium Post Titles and Categories

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 03/10/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
