Highly Voted Kaggle Code Metrics
Data Science and Analytics
Free
About
This collection contains metadata on highly voted kernels from the Kaggle platform. Kernels are a central feature of Kaggle: they let users learn, share insights, and contribute to the community. The data was gathered so that users can explore trends and success factors among top-performing kernels, for example what drives high vote counts. It covers more than 900 of the most frequently upvoted kernels.
Columns
The dataset includes 12 specific columns describing the kernels:
- Votes: The specific number of votes received by the kernel.
- Owner: The individual or entity responsible for creating the kernel.
- Kernel: The name or title given to the code submission.
- Dataset: Information relating to the datasets utilised by the kernel.
- Version History: Details concerning the iteration history of the kernel.
- Tags: Keywords or descriptors associated with the kernel's subject matter.
- Output: A description of the outputs generated by the kernel.
- Code type: Indicates whether the content is classified as a script or a notebook.
- Language: The main programming language used, typically Python or R.
- Comments: The total number of comments associated with the kernel post.
- Views: The total number of times the kernel has been viewed.
- Forks: The count of times the kernel has been copied or branched by other users.
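As a rough sketch, the columns listed above could be loaded and checked with pandas as follows. The header names in EXPECTED_COLUMNS are copied from this listing and are an assumption; the actual CSV header may use different spelling or casing. The helper name load_kernels and the sample rows are made up for illustration.

```python
import io

import pandas as pd

# Column names as written in the listing above. Assumption: the real CSV
# header may differ, so treat this list as a starting point to adjust.
EXPECTED_COLUMNS = [
    "Votes", "Owner", "Kernel", "Dataset", "Version History", "Tags",
    "Output", "Code type", "Language", "Comments", "Views", "Forks",
]


def load_kernels(source):
    """Read the kernel metadata and report which expected columns are absent.

    `source` can be a file path or any file-like object accepted by
    pandas.read_csv.
    """
    df = pd.read_csv(source)
    # Strip stray whitespace around header names before comparing.
    df.columns = [str(c).strip() for c in df.columns]
    absent = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    return df, absent


# Made-up sample standing in for the real file, for illustration only.
sample = io.StringIO(
    "Votes,Owner,Kernel,Language\n"
    "10,alice,Demo kernel,Python\n"
)
df, absent = load_kernels(sample)
print("Columns not found:", absent)
```

With the real file, passing its path to load_kernels in place of the sample would show whether the header matches the names used here.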
Distribution
The data is available in CSV format; as an example of file size, voted-kaggle-kernels.csv is 621.85 kB. It contains 971 records across 12 columns. Many fields are fully populated, but some have missing values. For example, the Tags field is missing in 63% of records, and a small share of records (2%) is missing data for Version History, Views, and Forks. By contrast, the Language field is complete (100% valid), with Python the dominant language at 75%. Notebook is the most frequent code type at 61%.
Usage
This dataset is well suited to analysing community engagement and data science trends on Kaggle. Ideal applications include:
- Investigating the criteria or attributes that lead to a high volume of votes for a kernel.
- Analysing which kernel owners have secured the largest cumulative number of votes.
- Studying the popularity distribution of programming languages (such as Python versus R) or code types (script versus notebook) among the most successful kernels.
- Exploratory data analysis of general popularity metrics including Views, Comments, and Forks.
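The applications above can be sketched with a few pandas helpers. This is a minimal illustration, not a prescribed method: the column names (Owner, Votes, Language, Tags) are assumed to match the listing, the function names are hypothetical, and the toy rows are invented so the example runs without the real file.

```python
import pandas as pd


def votes_by_owner(df):
    """Total votes per owner, largest first."""
    return df.groupby("Owner")["Votes"].sum().sort_values(ascending=False)


def category_share(df, column):
    """Fraction of records per category, e.g. for Language or Code type."""
    return df[column].value_counts(normalize=True)


def missing_share(df):
    """Fraction of missing values in each column."""
    return df.isna().mean()


# Invented toy data for illustration; replace with the loaded CSV.
toy = pd.DataFrame({
    "Owner": ["a", "b", "a", "c"],
    "Votes": [10, 5, 7, 1],
    "Language": ["Python", "R", "Python", "Python"],
    "Tags": ["eda", None, None, None],
})

print(votes_by_owner(toy))
print(category_share(toy, "Language"))
print(missing_share(toy))
```

On the real data, numeric columns such as Votes or Views may be read as text if they contain formatting characters; in that case converting them with pd.to_numeric(..., errors="coerce") before aggregating is one option.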
Coverage
The scope is strictly limited to metadata collected from the Kaggle platform. The data was captured on 26 February 2018 and is a snapshot of highly voted kernels as of that date. The expected update frequency is listed as 'Never'.
License
CC0: Public Domain
Who Can Use It
- Kaggle Contributors: To gain inspiration and insight into the creation of popular and highly engaging kernels.
- Data Researchers: To study community behaviour and content performance metrics within a major data science ecosystem.
- Platform Analysts: To examine how different content characteristics, such as language or output type, influence visibility and user interaction.
Dataset Name Suggestions
- Upvoted Kaggle Kernels
- Highly Voted Kaggle Code Metrics
- Kaggle Kernel Popularity Statistics
- Favorited Kernel Metadata
Attributes
Original Data Source: Highly Voted Kaggle Code Metrics
