Medium Post Titles and Categories
Data Science and Analytics
Free
About
A collection of approximately 126,000 post titles, subtitles, and categories scraped from the online publishing platform Medium. The data is intended for learning text analytics, the basics of Natural Language Processing (NLP), text classification, and other machine learning applications focused on textual data. It provides a large sample of content from a popular blogging site, offering insights into online writing trends and content organisation.
Columns
- category: The category assigned to the Medium blog post.
- title: The main title of the Medium blog post.
- subtitle: The secondary title or subtitle of the Medium blog post.
- subtitle_truncated_flag: A boolean value indicating whether Medium truncated the subtitle (true if truncated, false otherwise).
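As a rough illustration of how these four columns might be read, the sketch below uses Python's standard csv module. The column names come from the list above; the sample rows are made up, and the exact spelling of the flag values in the real file (for example True versus true) is an assumption, so the parser accepts either case.

```python
import csv
import io


def parse_flag(value):
    """Convert the textual flag to a bool (assumes 'true'/'false' in any letter case)."""
    return value.strip().lower() == "true"


def read_posts(source):
    """Yield one dict per row from an open text file or file-like object."""
    for row in csv.DictReader(source):
        yield {
            "category": row["category"],
            "title": row["title"],
            "subtitle": row["subtitle"],
            "subtitle_truncated_flag": parse_flag(row["subtitle_truncated_flag"]),
        }


# Made-up rows standing in for the real file; only the header names are from the listing.
sample = io.StringIO(
    "category,title,subtitle,subtitle_truncated_flag\n"
    "technology,An Example Title,An example subtitle,False\n"
    'work,"Another, Quoted Title",A subtitle that was cut off,True\n'
)
posts = list(read_posts(sample))
```

To read the actual file, the same function could be given a handle opened with open("medium_post_titles.csv", newline="", encoding="utf-8"); the UTF-8 encoding is a guess and may need adjusting.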
Distribution
The data is available in a single CSV file named medium_post_titles.csv, with a file size of approximately 20.59 MB. The dataset is structured with four columns and contains around 126,000 records.
Usage
Ideal applications for this dataset include:
- Training machine learning models for text classification and content categorisation.
- Performing text analytics to understand trends in blog post titles and subtitles.
- Learning the basics of Natural Language Processing (NLP) techniques.
- Developing content recommendation systems.
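To give a feel for the text classification use case, here is a minimal sketch of a multinomial Naive Bayes baseline that predicts a category from a title, written with the Python standard library only. The class name NaiveBayesTitles, the tokenizer, and the example titles and category labels are all hypothetical and chosen for illustration; a real experiment would train on the title and category columns of the dataset and would more likely use an established library.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesTitles:
    """Hypothetical baseline: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, titles, categories):
        self.word_counts = defaultdict(Counter)  # category -> token counts
        self.doc_counts = Counter(categories)    # category -> number of titles
        self.vocab = set()
        for title, category in zip(titles, categories):
            tokens = tokenize(title)
            self.word_counts[category].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = sum(self.doc_counts.values())
        return self

    def predict(self, title):
        tokens = tokenize(title)
        best_category, best_score = None, -math.inf
        for category, n_docs in self.doc_counts.items():
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / self.total_docs)
            denom = sum(self.word_counts[category].values()) + len(self.vocab)
            for token in tokens:
                score += math.log((self.word_counts[category][token] + 1) / denom)
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Made-up training examples; labels are placeholders, not actual dataset categories.
titles = [
    "How to train a neural network",
    "Deep learning for image models",
    "Ten recipes for quick dinners",
    "Baking bread at home",
]
categories = ["machine-learning", "machine-learning", "food", "food"]

model = NaiveBayesTitles().fit(titles, categories)
prediction = model.predict("Training a neural network model")
```

With only four toy titles the result says nothing about accuracy on the real data; the point is the shape of the pipeline (tokenize, count per category, score with smoothing), which could serve as a baseline before trying stronger models.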
Coverage
The dataset consists of content scraped from the Medium platform. It does not have a specific geographical or demographic focus but reflects the variety of topics and authors present on the site. There is no specified time range for when the posts were scraped.
License
Attribution 3.0 Unported (CC BY 3.0)
Who Can Use It
- Data Science Students and Beginners: Can use this dataset to practice fundamental NLP and text mining skills.
- Researchers: Can analyse content trends, writing styles, and categorisation patterns on the Medium platform.
- Machine Learning Engineers: Can use this data to build and test models for text classification and analysis.
- Content Strategists: Can explore popular topics and effective title/subtitle structures.
Dataset Name Suggestions
- Medium Post Titles and Categories
- Text Analytics Dataset for Medium Articles
- Medium Blog Post Content Analysis
- NLP Dataset: Medium Post Titles
- Medium Article Metadata
Attributes
Original Data Source: Medium Post Titles and Categories