MachineLearning Subreddit Post Data

Tags and Keywords

  • Reddit
  • Machinelearning
  • Deeplearning
  • Ai
  • Text

Free

About

This collection features Reddit posts dedicated to the field of Machine Learning (ML), serving as a primary resource for tracking current news, research developments, and ongoing discussions related to deep learning and Artificial Intelligence (AI). The data captures the activity from a central online community, allowing users to gain insights into trending topics and public engagement metrics within the AI sector.

Columns

The dataset contains nine columns detailing various aspects of each post:
  • title: The headline text of the Reddit submission.
  • score: The aggregated popularity score assigned to the post by users, ranging from 0 to 2943, with a mean score of approximately 61.9.
  • id: The unique identification string assigned to the post.
  • subreddit: The name of the community the post belongs to (exclusively 'MachineLearning').
  • url: The web link associated with the post.
  • num_comments: The total number of comments generated by the post, with a maximum of 211 and a mean of 12.2.
  • body: The text content of the submission. Approximately 17% of entries are missing or null, which is typically expected for link-only posts.
  • created: The Unix timestamp marking the creation time of the post.
  • timestamp: A date-time representation of when the post was created.
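As a minimal sketch of how the nine columns might be read and sanity-checked, the snippet below uses only the Python standard library. The two sample rows, the example.com URLs, and the helper names load_posts and summarize are hypothetical and exist only for illustration; the exact formats of the created and timestamp columns in the real file are assumptions.

```python
import csv
import io
from statistics import mean

# Column names as listed in this dataset description.
EXPECTED_COLUMNS = [
    "title", "score", "id", "subreddit", "url",
    "num_comments", "body", "created", "timestamp",
]


def load_posts(fileobj):
    """Read posts from a CSV file object and convert the numeric columns."""
    reader = csv.DictReader(fileobj)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    posts = []
    for row in reader:
        row["score"] = int(row["score"])
        row["num_comments"] = int(row["num_comments"])
        # Assumption: 'created' holds Unix seconds, possibly written as a float.
        row["created"] = float(row["created"])
        # Treat an empty body (e.g. a link-only post) as missing.
        row["body"] = row["body"] or None
        posts.append(row)
    return posts


def summarize(posts):
    """Compute a few of the summary figures quoted in the column list."""
    return {
        "records": len(posts),
        "mean_score": mean(p["score"] for p in posts),
        "max_comments": max(p["num_comments"] for p in posts),
        "missing_body_share": sum(p["body"] is None for p in posts) / len(posts),
    }


# Hypothetical sample rows, NOT taken from the real dataset.
SAMPLE_CSV = """title,score,id,subreddit,url,num_comments,body,created,timestamp
[D] Example discussion,10,abc123,MachineLearning,https://example.com/a,4,Some text,1609459200.0,2021-01-01 00:00:00
[R] Example paper link,30,def456,MachineLearning,https://example.com/b,2,,1612137600.0,2021-02-01 00:00:00
"""

posts = load_posts(io.StringIO(SAMPLE_CSV))
summary = summarize(posts)
print(summary)
```

To run the same check against the downloaded file, the in-memory sample can be replaced with open("MachineLearning_reddit.csv", newline="", encoding="utf-8"); the encoding is an assumption.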

Distribution

The data is provided as a single CSV file, MachineLearning_reddit.csv, with a total size of 479.18 kB. It contains 9 columns and 469 valid records. Updates are anticipated quarterly to keep the data current with ongoing trends.

Usage

This dataset is ideal for:
  • Sentiment analysis on public opinion regarding specific AI advancements.
  • Identifying discussion topics and clustering related content.
  • Following technological trends, such as Deep Learning, Data Augmentation, AutoML, and Reinforcement Learning, within the community.
  • Training Natural Language Processing (NLP) models focused on technical domain language.
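The trend-following use case above could be approximated with simple keyword counting over post titles. The sketch below is one rough approach, not a method prescribed by the dataset: it uses case-insensitive substring matching, the function name count_topic_mentions is hypothetical, and the sample titles are invented for illustration.

```python
from collections import Counter

# Topics named in the usage list above, lowercased for matching.
TOPICS = ["deep learning", "data augmentation", "automl", "reinforcement learning"]


def count_topic_mentions(titles, topics=TOPICS):
    """Count how many titles mention each topic (case-insensitive substring match)."""
    counts = Counter({topic: 0 for topic in topics})
    for title in titles:
        lowered = title.lower()
        for topic in topics:
            if topic in lowered:
                counts[topic] += 1
    return counts


# Hypothetical titles, NOT taken from the real dataset.
sample_titles = [
    "[D] Is deep learning hitting a wall?",
    "[P] AutoML toolkit for tabular data",
    "[R] Data augmentation for reinforcement learning",
    "[D] Deep Learning vs. AutoML in production",
    "[N] Conference deadline extended",
]

counts = count_topic_mentions(sample_titles)
for topic, n in counts.most_common():
    print(f"{topic}: {n}")
```

Substring matching is deliberately crude: it misses abbreviations such as "RL" and cannot separate distinct senses of a phrase, so a tokenizer or topic model would be a natural next step for serious analysis.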

Coverage

The data scope is strictly limited to posts from the MachineLearning subreddit on Reddit. The temporal coverage spans December 28, 2020, through November 14, 2021. No geographic or demographic segmentation is available beyond the aggregated community activity.
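A quick way to check a record against the stated coverage window is to convert its created value to a date. The sketch below assumes that created is a Unix timestamp in seconds and that the coverage dates are interpreted in UTC; the listing does not state either, and the helper names created_to_utc and in_coverage are hypothetical.

```python
from datetime import date, datetime, timezone

# Coverage window as stated in this listing.
COVERAGE_START = date(2020, 12, 28)
COVERAGE_END = date(2021, 11, 14)


def created_to_utc(created):
    """Convert a Unix timestamp in seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(float(created), tz=timezone.utc)


def in_coverage(created):
    """Return True if the post's creation date falls inside the stated window."""
    day = created_to_utc(created).date()
    return COVERAGE_START <= day <= COVERAGE_END


# Hypothetical timestamps: 1 January 2021 and 1 January 2022 (both 00:00 UTC).
print(created_to_utc(1609459200).isoformat())
print(in_coverage(1609459200))
print(in_coverage(1640995200))
```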

License

CC0: Public Domain

Who Can Use It

  • Data Scientists: For training classifiers, performing time-series analysis on scores, and executing advanced text mining operations.
  • Researchers and Academics: To study the evolution of interest in specific AI subfields.
  • AI/ML Developers: To gauge the popularity and relevance of new tools or concepts.
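For the time-series analysis on scores mentioned above, one simple starting point is a per-month average. The sketch below assumes (created, score) pairs with created in Unix seconds and groups them by UTC calendar month; the function name monthly_mean_score and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def monthly_mean_score(posts):
    """Group (created, score) pairs by UTC calendar month and average the scores."""
    buckets = defaultdict(list)
    for created, score in posts:
        month = datetime.fromtimestamp(created, tz=timezone.utc).strftime("%Y-%m")
        buckets[month].append(score)
    # Sort by the "YYYY-MM" key so months appear in chronological order.
    return {month: mean(scores) for month, scores in sorted(buckets.items())}


# Hypothetical (created, score) pairs, NOT taken from the real dataset.
sample_posts = [
    (1609459200, 10),  # 2021-01-01 UTC
    (1610000000, 30),  # 2021-01-07 UTC
    (1612137600, 50),  # 2021-02-01 UTC
]

monthly = monthly_mean_score(sample_posts)
print(monthly)
```

With only 469 records spread over roughly eleven months, monthly averages will be noisy, so medians or counts per month may be worth comparing alongside the mean.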

Dataset Name Suggestions

  • Machine Learning Community Trends
  • Reddit AI Post Metrics
  • Deep Learning Discussion Archive
  • MachineLearning Subreddit Post Data

Attributes

Listing Stats

  • Views: 2
  • Downloads: 0
  • Listed: 20/11/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
