Digital Content Performance Dataset
Entertainment & Media Consumption
Free
About
This dataset provides details about articles published on Medium between January 2020 and August 2020. It includes information from a selection of prominent publications such as The Startup, Towards Data Science, and HackerNoon. The data was collected by picking random dates, which means some publications may appear more frequently than others. It is suitable for analysing article engagement, content characteristics, and trends in online publishing.
Columns
- Title: The main title of the article.
- SubTitle: The subtitle of the article, if available. A missing subtitle is recorded as 'nan'.
- Link: The direct URL to the article.
- Claps: The total number of claps an article received, indicating reader engagement.
- Reading_Time: The estimated reading time of the article.
- Responses: The count of comments or responses received on the article.
- Publication: The name of the publication where the article was featured.
- Title_clean: A cleaned version of the title, addressing any unsupported characters.
- SubTitle_clean: A cleaned version of the subtitle, addressing any unsupported characters.
- Title_wc: The word count of the article's title.
- SubTitle_wc: The word count of the article's subtitle.
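A minimal sketch of reading the file and checking it against the columns listed above, using only the Python standard library. The sample rows, the example.com links, and the names `load_articles`, `EXPECTED_COLUMNS`, and `SAMPLE_CSV` are assumptions made for illustration; they are not part of the dataset.

```python
import csv
import io

# Column names as listed in the description above.
EXPECTED_COLUMNS = [
    "Title", "SubTitle", "Link", "Claps", "Reading_Time", "Responses",
    "Publication", "Title_clean", "SubTitle_clean", "Title_wc", "SubTitle_wc",
]


def load_articles(handle):
    """Read article rows and fail early if a documented column is missing."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        # A missing subtitle is marked as the literal string 'nan'.
        if row["SubTitle"] == "nan":
            row["SubTitle"] = None
        rows.append(row)
    return rows


# Hypothetical two-row sample, invented for illustration only.
SAMPLE_CSV = """\
Title,SubTitle,Link,Claps,Reading_Time,Responses,Publication,Title_clean,SubTitle_clean,Title_wc,SubTitle_wc
A Short Title,nan,https://example.com/a,120,5,3,The Startup,A Short Title,nan,3,1
Another Example Title,With a subtitle,https://example.com/b,40,7,0,Towards Data Science,Another Example Title,With a subtitle,3,3
"""

articles = load_articles(io.StringIO(SAMPLE_CSV))
```

With the real file, the same function can be given a handle from `open(path, newline="", encoding="utf-8")`, where `path` is wherever the downloaded CSV was saved.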
Distribution
The dataset is typically provided as a CSV file, with each row representing a single Medium article. The number of rows is not stated. The data includes quantitative fields (Claps, Reading_Time, and Responses) and categorical fields such as Publication. It supports analysis of how claps, reading times, and responses are distributed, as well as title and subtitle word counts and how often each publication appears.
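The spread of the quantitative fields could be summarised along these lines. This is a sketch under the assumption that Claps, Reading_Time, and Responses are stored as plain numbers; the rows and the helper name `summarize` are hypothetical.

```python
from statistics import mean, median


def summarize(rows, field):
    """Return min, median, mean, and max of one numeric column."""
    values = [float(row[field]) for row in rows]
    return {
        "min": min(values),
        "median": median(values),
        "mean": mean(values),
        "max": max(values),
    }


# Hypothetical rows, invented for illustration; CSV readers yield strings.
rows = [
    {"Claps": "120", "Reading_Time": "5", "Responses": "3"},
    {"Claps": "40", "Reading_Time": "7", "Responses": "0"},
    {"Claps": "20", "Reading_Time": "3", "Responses": "0"},
]

summaries = {
    field: summarize(rows, field)
    for field in ("Claps", "Reading_Time", "Responses")
}
```

If a field turns out to hold formatted text rather than plain numbers, `float` will raise a `ValueError`, which signals that the column needs cleaning first.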
Usage
This dataset is ideal for various applications, including:
- Analysing article engagement: Understanding factors influencing claps, reading time, and responses.
- Identifying popular content trends: Discovering what topics and formats perform well on Medium.
- Publication performance analysis: Evaluating the reach and impact of different Medium publications.
- Natural Language Processing (NLP) research: Utilising article titles and subtitles for text analysis and topic modelling.
- Content strategy development: Informing decisions for content creators and digital marketers.
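As one illustration of the engagement and publication analyses listed above, the sketch below compares publications by their median clap count. The rows and the function name `median_claps_by_publication` are hypothetical, and the median is a choice made here because engagement counts are often skewed by a few very popular articles.

```python
from collections import defaultdict
from statistics import median


def median_claps_by_publication(rows):
    """Group rows by Publication and return the median Claps of each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["Publication"]].append(float(row["Claps"]))
    return {name: median(values) for name, values in groups.items()}


# Hypothetical rows, invented for illustration only.
rows = [
    {"Publication": "The Startup", "Claps": "120"},
    {"Publication": "The Startup", "Claps": "40"},
    {"Publication": "The Startup", "Claps": "20"},
    {"Publication": "HackerNoon", "Claps": "10"},
    {"Publication": "HackerNoon", "Claps": "30"},
]

claps_by_publication = median_claps_by_publication(rows)
```

The same grouping pattern works for Reading_Time or Responses by changing the column that is collected.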
Coverage
The data covers January 2020 to August 2020. No geographic scope is specified, though the content comes from globally accessible publications on Medium. Because collection relied on random date selection, publications may be unevenly represented.
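Since publications may be unevenly represented, it can help to check each publication's share of rows before comparing them. A minimal sketch, with hypothetical rows and a hypothetical helper name `publication_shares`:

```python
from collections import Counter


def publication_shares(rows):
    """Return each publication's fraction of all rows, largest first."""
    counts = Counter(row["Publication"] for row in rows)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.most_common()}


# Hypothetical rows, invented for illustration only.
rows = [
    {"Publication": "The Startup"},
    {"Publication": "The Startup"},
    {"Publication": "The Startup"},
    {"Publication": "Towards Data Science"},
]

shares = publication_shares(rows)
```

A publication with a very small share has few articles behind any statistic computed for it, so comparisons involving it deserve extra caution.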
License
CC0
Who Can Use It
This dataset is valuable for:
- Data scientists and analysts: For machine learning projects, statistical analysis, and predictive modelling of content performance.
- Content strategists and marketers: To understand audience engagement, popular topics, and optimise content creation.
- Researchers: In fields such as digital humanities, media studies, and computational linguistics.
- Students and educators: For learning data analysis, NLP, and web scraping concepts.
- Publishers: To benchmark their content performance and identify growth opportunities.
Dataset Name Suggestions
- Medium Articles Dataset 2020 Edition
- Medium Article Engagement Metrics 2020
- Digital Content Performance Data
- Online Article Analytics
Attributes
Original Data Source: Medium Articles Dataset 2020 Edition