Kaggle Dataset Engagement Metadata

Data Science and Analytics

Tags and Keywords

Kaggle

Metadata

Engagement

Trends

Medals

Free

About

This collection holds metadata on 42,955 public datasets published on Kaggle between December 2015 and November 2021, giving a detailed look at community engagement and content trends. It captures the lifecycle and reception of shared data: the time of creation, the licence used, and the engagement metrics that determine popularity. Key indicators such as upvotes, downloads, and the resulting medal awards (Gold, Silver, Bronze) are included to support analysis of what drives successful data contributions on the platform. Alongside the total vote count, a separate count excludes votes from Novice-rank users, so that it reflects endorsement by the more active data science community.

Columns

  • Medal: The colour of the medal received (Gold, Silver, Bronze) based on community upvotes.
  • Created: The date and time when the dataset was originally published.
  • URL: The direct web address to the dataset page on Kaggle.
  • Views: The total count of page views the dataset has received.
  • Votes: The total number of upvotes, including those from users with the 'Novice' rank.
  • Votes_Advanced: The vote count excluding those from users with the 'Novice' rank, used for medal calculation.
  • Downloads: The total number of times the dataset has been downloaded.
  • Kernels: The count of kernels (notebooks/scripts) associated with the dataset.
  • Title: The display title of the dataset.
  • Description: The text description provided by the author.
  • Tags: Keywords or categories assigned to the dataset (e.g., Business, Internet).
  • License: The specific licence under which the dataset is published (e.g., CC0, CC BY-SA 4.0).

Distribution

This dataset is a single tabular file (CSV format) containing approximately 43,000 records (42,955 datasets) and 12 columns. It aggregates metadata from a six-year period, a sample large enough for statistical analysis of platform activity and user behaviour.
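As a rough sketch of how the file described above might be read, the snippet below uses only the Python standard library to check for the 12 listed columns and convert the count columns to integers. The listing does not state the date format of Created, how a missing medal is encoded, or the file name, so the code leaves Created as text, assumes an empty Medal field means no medal, and reads from an in-memory sample. The sample rows and the `load_rows` helper are made up for illustration and are not taken from the dataset.

```python
import csv
import io

# Column names as given in the Columns section of this listing.
EXPECTED_COLUMNS = [
    "Medal", "Created", "URL", "Views", "Votes", "Votes_Advanced",
    "Downloads", "Kernels", "Title", "Description", "Tags", "License",
]
NUMERIC_COLUMNS = ["Views", "Votes", "Votes_Advanced", "Downloads", "Kernels"]


def load_rows(handle):
    """Read the CSV from an open text handle and convert count columns to int."""
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        for name in NUMERIC_COLUMNS:
            # Assumption: counts are plain integers; blank cells are treated as 0.
            row[name] = int(row[name]) if row[name] else 0
        rows.append(row)
    return rows


# Made-up rows for illustration only; they are NOT real records from the dataset.
SAMPLE_CSV = """Medal,Created,URL,Views,Votes,Votes_Advanced,Downloads,Kernels,Title,Description,Tags,License
Gold,2020-01-01,https://example.invalid/a,1000,60,50,300,12,Sample A,Demo,Business,CC0
,2021-06-15,https://example.invalid/b,40,1,0,5,0,Sample B,Demo,Internet,CC BY-SA 4.0
"""

rows = load_rows(io.StringIO(SAMPLE_CSV))
print(len(rows), "rows loaded;", "first Votes_Advanced =", rows[0]["Votes_Advanced"])
```

To use this on the downloaded file, pass an open file handle instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was extracted.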

Usage

  • Trend Analysis: Identify which topics or tags generate the most engagement and downloads over time.
  • Engagement Prediction: Build models to predict the likelihood of a dataset receiving a medal based on its description, licence, or initial views.
  • Platform Research: Analyse the growth of open data sharing and the distribution of licence types within the data science community.
  • User Behaviour Study: Understand the correlation between views, downloads, and advanced votes to gauge data utility versus popularity.
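Two of the uses listed above can be sketched in a few lines: the correlation between engagement counts, and the share of datasets with a medal per licence. The example below is a minimal standard-library sketch, not a prescribed method. The helper names `pearson` and `medal_rate_by` are hypothetical, the rows are made up for illustration, and an empty Medal value is assumed to mean that no medal was awarded.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant column
    return cov / sqrt(var_x * var_y)


def medal_rate_by(rows, key):
    """Fraction of rows with a non-empty Medal value, grouped by rows[key]."""
    totals = defaultdict(int)
    medals = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        if row["Medal"]:  # assumption: empty string means no medal
            medals[row[key]] += 1
    return {group: medals[group] / totals[group] for group in totals}


# Made-up rows for illustration only; they are NOT real records from the dataset.
demo_rows = [
    {"Medal": "Gold", "Views": 1000, "Downloads": 300, "Votes_Advanced": 50, "License": "CC0"},
    {"Medal": "", "Views": 40, "Downloads": 5, "Votes_Advanced": 0, "License": "CC0"},
    {"Medal": "Bronze", "Views": 400, "Downloads": 90, "Votes_Advanced": 6, "License": "CC BY-SA 4.0"},
]

views = [r["Views"] for r in demo_rows]
downloads = [r["Downloads"] for r in demo_rows]
views_vs_downloads = pearson(views, downloads)
rates = medal_rate_by(demo_rows, "License")
print("views vs downloads:", round(views_vs_downloads, 3))
print("medal rate by licence:", rates)
```

The same `medal_rate_by` helper could be grouped on another column, although Tags would first need splitting into individual keywords, and the listing does not say which delimiter the column uses.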

Coverage

The data covers public activity on the Kaggle platform globally from 18 August 2016 to 21 November 2021. It encompasses a wide variety of domains as indicated by tags, including Computer Science, Internet, and Online Communities.

License

CC0: Public Domain

Who Can Use It

  • Data Analysts investigating community trends and content popularity.
  • Community Managers seeking to understand user engagement mechanics.
  • Researchers studying the ecosystem of open data platforms.
  • Machine Learning Engineers creating recommendation systems for data discovery.

Dataset Name Suggestions

  • Kaggle Dataset Engagement Metadata
  • Data Science Community Trends 2015-2021
  • Kaggle Medals and Metrics Archive
  • Public Dataset Popularity Attributes

Attributes

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 10/12/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0