Towards Data Science Publishing Archive

Tags and Keywords

Data, Science, Articles, Metrics, Publishing

Free

About

This dataset captures detailed metrics for articles published on Towards Data Science (TDS), a major platform where thousands of individuals share ideas and advance the understanding of data science. The collection includes over 45,000 articles and provides insights into publishing trends and reader engagement from late 2010 to mid-2021. It is useful for anyone who wants to analyse the growth and popularity of data science content over that period.

Columns

The dataset contains 8 distinct features for each article:
  • publish_date: The date the article went live. The data spans from 21 November 2010 to 31 July 2021.
  • title: The title of the article.
  • author: The individual credited with writing the article.
  • url: The unique web link to access the article.
  • claps: The number of claps the article received, a measure of reader support similar to 'likes'. The maximum observed value is 52,000.
  • responses: The total number of comments or responses generated by the article. The maximum observed value is 298.
  • reading_time: An estimate of the time required to read the article, based on an assumed average adult reading speed of approximately 265 words per minute.
  • paid: A binary field indicating participation in the Medium Partner Program (1 signifies paid/member content, 0 signifies free content).
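
As an illustration of the column layout above, the following is a minimal sketch of reading the file into typed records using only the Python standard library. Several details are assumptions rather than documented facts: that the CSV header uses exactly the column names listed, that publish_date is stored as YYYY-MM-DD, and that reading_time is expressed in minutes. The sample rows are invented placeholders, and the helper estimate_reading_minutes is a hypothetical simplification of the 265 words-per-minute rule (rounding up is also an assumption).

```python
import csv
import io
import math
from dataclasses import dataclass
from datetime import date, datetime

WORDS_PER_MINUTE = 265  # reading speed quoted in the reading_time description


@dataclass
class Article:
    publish_date: date
    title: str
    author: str
    url: str
    claps: int
    responses: int
    reading_time: float  # assumed to be minutes
    paid: bool


def parse_article(row: dict) -> Article:
    """Convert one CSV row (all strings) into a typed Article."""
    return Article(
        # Assumption: dates are ISO formatted (YYYY-MM-DD).
        publish_date=datetime.strptime(row["publish_date"], "%Y-%m-%d").date(),
        title=row["title"],
        author=row["author"],
        url=row["url"],
        claps=int(float(row["claps"])),
        responses=int(float(row["responses"])),
        reading_time=float(row["reading_time"]),
        paid=int(float(row["paid"])) == 1,  # 1 = paid/member, 0 = free
    )


def estimate_reading_minutes(word_count: int) -> int:
    """Hypothetical helper: whole minutes at 265 words per minute, rounded up."""
    return math.ceil(word_count / WORDS_PER_MINUTE)


# Invented sample rows standing in for tds_data.csv.
SAMPLE_CSV = """publish_date,title,author,url,claps,responses,reading_time,paid
2019-03-14,Example title A,Author One,https://example.com/a,120,3,5,1
2020-07-02,Example title B,Author Two,https://example.com/b,45,0,8,0
"""

articles = [parse_article(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```

To run this against the real file, replace io.StringIO(SAMPLE_CSV) with open("tds_data.csv", newline="", encoding="utf-8") and adjust the date format if the file uses a different one.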

Distribution

The dataset is provided as a single CSV file (tds_data.csv) of 8.42 MB. It consists of 48,060 unique article records, each described by 8 features. The data is extracted from the platform archive and is scheduled for monthly updates.

Usage

This dataset is suited to several analytical applications:
  • Content Performance Modelling: Building models to predict article success (high claps or responses) based on structural features like title and reading time.
  • Trend Analysis: Tracking the evolution of popular data science topics and sub-disciplines over the covered time frame.
  • Platform Research: Studying the impact of monetisation strategies, such as the Medium Partner Program, on content production and engagement.
  • Natural Language Processing (NLP): Utilising article titles and features for topic clustering and classification tasks.
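
Two of the applications above, trend analysis and platform research, can be sketched with simple aggregations. The example below is a rough illustration only: the rows are invented placeholders reduced to (publish year, claps, paid), and the function names are hypothetical. It counts articles per year and compares median claps for paid and free articles, using the median because the clap counts are likely to be skewed by a few very popular articles.

```python
from collections import Counter, defaultdict
from statistics import median

# Invented sample rows: (publish_year, claps, paid), where paid is 1 or 0.
rows = [
    (2018, 100, 0),
    (2018, 300, 1),
    (2019, 50, 1),
    (2019, 150, 1),
    (2019, 20, 0),
]


def articles_per_year(rows):
    """Count how many articles were published in each year."""
    counts = Counter(year for year, _, _ in rows)
    return dict(sorted(counts.items()))


def median_claps_by_paid(rows):
    """Median claps for free (0) and paid (1) articles."""
    groups = defaultdict(list)
    for _, claps, paid in rows:
        groups[paid].append(claps)
    return {paid: median(values) for paid, values in sorted(groups.items())}


yearly = articles_per_year(rows)
by_paid = median_claps_by_paid(rows)
```

A difference in medians between the two groups would not by itself show that monetisation causes higher or lower engagement, since paid and free articles may differ in topic, author, and publication date.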

Coverage

The articles were published between November 2010 and July 2021. The content reflects publications on Towards Data Science, a platform associated with a Canadian-registered corporation, and focuses on articles about data science, computer science, and related business topics.

License

CC0: Public Domain

Who Can Use It

  • Data Scientists: For developing predictive algorithms related to content virality and reader behaviour.
  • Journalism Researchers: To study metrics and publishing patterns on high-volume online platforms.
  • Content Managers: To benchmark article performance and inform future editorial strategies within the data science domain.

Dataset Name Suggestions

  • TDS Article Engagement Metrics 2010–2021
  • Towards Data Science Publishing Archive
  • Data Science Content Performance Index

Listing Stats

  • Views: 2
  • Downloads: 1
  • Listed: 13/11/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
