Top Kaggle Datasets Analysed
Data Science and Analytics
About
This dataset provides a curated list of Kaggle's most popular datasets, offering insights into their characteristics and community engagement. Kaggle, a Google LLC subsidiary, functions as an online community for data scientists and machine learning practitioners, facilitating the discovery and publication of datasets, model building, collaborative work, and participation in data science challenges. Launched in 2010 with machine learning competitions, Kaggle has since expanded to include a public data platform, a cloud-based data science workbench, and artificial intelligence education resources.
Columns
- id: An identifier for each dataset, with values ranging from 0 to 998. It has a mean of 499 and a standard deviation of 288, and 999 valid entries.
- title: The title of the dataset, featuring 987 unique titles out of 999 valid entries.
- uploaded_by: Indicates the entity or user who uploaded the dataset. There are 712 unique uploaders among 999 valid entries, with 'UCI Machine Learning' and 'Kaggle' being among the most frequent.
- last_updated: Specifies when each dataset was last updated, as a relative timestamp rather than an absolute date. There are 33 distinct values among 999 valid entries, with "Updated 4 years ago" being the most common.
- usability: Represents the usability score of the dataset, ranging from 1.2 to 10.0, with a mean of 8.12 and a standard deviation of 1.62 across 999 valid entries.
- files: Describes the number and type of files included in the dataset, with 299 unique configurations. "1 File (CSV)" is the most frequent (32%) among 999 valid entries.
- size: Denotes the size of the dataset. There are 466 unique size values out of 999 valid entries, with 1 MB (4%) and 3 MB (3%) being common.
- upvotes: Shows the number of upvotes received by the dataset, ranging from 114 to 9710, with a mean of 458 and a standard deviation of 789 across 999 valid entries.
- badge: Indicates any badge earned by the dataset, such as 'Gold' (47%) or 'Silver' (42%), with 4 unique badge types among 999 valid entries.
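Several columns (size, files, last_updated) store human-readable strings rather than numbers, so they usually need parsing before analysis. The sketch below is a minimal illustration that assumes the formats quoted above ("1 MB", "1 File (CSV)", "Updated 4 years ago"). The helper names `parse_size`, `parse_files` and `approx_age_days` are hypothetical. Decimal size units (1 kB = 1000 B) are an assumption, because the listing does not say which convention it uses. Months and years are approximated as 30 and 365 days.

```python
import re
from typing import Optional, Tuple

# Assumed decimal multipliers; the listing does not state decimal vs binary units.
_SIZE_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}

# Approximate day counts per unit, used only for rough age comparisons.
_DAYS_PER_UNIT = {"day": 1, "week": 7, "month": 30, "year": 365}


def parse_size(text: str) -> Optional[int]:
    """Convert a size string such as '1 MB' or '108 kB' to bytes, else None."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", text)
    if not match:
        return None
    number, unit = match.groups()
    factor = _SIZE_UNITS.get(unit.upper())
    return None if factor is None else round(float(number) * factor)


def parse_files(text: str) -> Optional[Tuple[int, str]]:
    """Split a string such as '1 File (CSV)' into (file_count, file_type)."""
    match = re.fullmatch(r"\s*(\d+)\s+Files?\s*\((.+)\)\s*", text)
    if not match:
        return None
    return int(match.group(1)), match.group(2)


def approx_age_days(text: str) -> Optional[int]:
    """Turn 'Updated 4 years ago' or 'Updated a month ago' into approximate days."""
    match = re.fullmatch(
        r"\s*Updated\s+(\d+|an?)\s+(day|week|month|year)s?\s+ago\s*", text
    )
    if not match:
        return None  # e.g. hour-level values or an unexpected format
    count = 1 if match.group(1) in ("a", "an") else int(match.group(1))
    return count * _DAYS_PER_UNIT[match.group(2)]
```

Unrecognised strings return `None` rather than raising an exception. This lets you apply the helpers to a whole column and then count how many values fell outside the assumed formats.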
Distribution
The dataset is provided as a single CSV file, kaggle_-1000.csv (108 kB), with 9 columns and 999 valid records. Among the listed datasets, the most common file configuration is 1 CSV file (32% of entries), and 1 MB is among the most frequent sizes (4% of entries).
Usage
This dataset is ideal for:
- Identifying and exploring popular public datasets on Kaggle.
- Understanding trends in data science and machine learning datasets.
- Sourcing data for machine learning model development and analysis.
- Gaining insights into community engagement and dataset popularity metrics.
- Supporting research into dataset characteristics and usability on data platforms.
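As a starting point for the first and fourth uses above, the sketch below loads the CSV with Python's standard `csv` module and ranks the listed datasets by upvotes. It also computes the share of each badge. It assumes the CSV header uses exactly the column names listed under Columns and that `upvotes` and `usability` are stored as plain numbers. The function names `load_listing`, `top_by_upvotes` and `badge_shares` are hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, List, TextIO

# Column names as documented under "Columns"; assumed to match the CSV header.
EXPECTED_COLUMNS = {"id", "title", "uploaded_by", "last_updated", "usability",
                    "files", "size", "upvotes", "badge"}


def load_listing(stream: TextIO) -> List[Dict[str, str]]:
    """Read the CSV rows as dicts, failing early if a documented column is missing."""
    reader = csv.DictReader(stream)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


def top_by_upvotes(rows: List[Dict[str, str]], min_usability: float = 0.0,
                   n: int = 10) -> List[str]:
    """Titles of the n most-upvoted datasets whose usability is >= min_usability."""
    eligible = [r for r in rows if float(r["usability"]) >= min_usability]
    # Assumes upvotes are plain integers (no thousands separators).
    eligible.sort(key=lambda r: int(r["upvotes"]), reverse=True)
    return [r["title"] for r in eligible[:n]]


def badge_shares(rows: List[Dict[str, str]]) -> Dict[str, float]:
    """Fraction of rows carrying each badge value; empty input gives {}."""
    counts = Counter(r["badge"] for r in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {badge: count / total for badge, count in counts.items()}
```

For example, open kaggle_-1000.csv with `newline=""` and pass the file object to `load_listing`. Then call `top_by_upvotes(rows, min_usability=8.0, n=5)` to list the five most-upvoted datasets with a usability score of at least 8.0. The standard library is used here so the sketch runs without extra dependencies. A pandas DataFrame would be a natural alternative for heavier analysis.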
Coverage
The dataset has a global scope, reflecting Kaggle's international community of data scientists and machine learning practitioners. Its temporal coverage is defined only by the relative last-update times of the listed datasets, which go back as far as four years. All listed datasets are hosted on Kaggle, a platform established in 2010.
License
CC0: Public Domain
Who Can Use It
- Data Scientists: To discover trending datasets, inform their projects, and analyse dataset attributes.
- Machine Learning Engineers: For sourcing data to train and test models, and for understanding dataset quality indicators.
- Academics and Researchers: To study data usage patterns, dataset popularity, and the evolution of data science resources.
- Platform Administrators: To gain insights into platform content and user behaviour.
- Data Enthusiasts: To explore a vast array of high-quality datasets.
Dataset Name Suggestions
- Kaggle Top 1000 Datasets List
- Popular Kaggle Datasets Overview
- Kaggle Dataset Metrics
- Top Kaggle Datasets Analysed
Attributes
Original Data Source: Top Kaggle Datasets Analysed