Opendatabay APP

Kaggle User Geographical Data

Data Science and Analytics

Tags and Keywords

Kaggle, Users, Location, Country, Region

Free

About

This dataset augments the official Meta-Kaggle Users.csv file with location data, specifically the country and region of each Kaggle user. The original Meta-Kaggle dataset contains user details such as Username, DisplayName, RegisterDate, and PerformanceTier but no geographical context, so this supplement is intended to support location-based analysis of the Kaggle user base. It covers "active" users only: those who have received or given an upvote, created content (forum threads, posts, notebooks, or datasets), or made a competition submission, and who also appear in the Meta-Kaggle Users dataset. Usernames and display names are deliberately excluded; only the userId is provided, which allows joining with the main Meta-Kaggle file. Known limitations: location data may be missing where users did not enter it, and may be out of date where users changed their profile after the data was scraped.
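The join described above can be sketched with the Python standard library. This is a minimal illustration, not the dataset author's method: the function names are hypothetical, and the key column of Users.csv is assumed to be named `Id`. Only the `userId`, `country`, and `region` column names come from this listing.

```python
import csv
from typing import Dict, Iterator, Optional, TextIO


def load_locations(handle: TextIO) -> Dict[str, Dict[str, str]]:
    """Index the rows of UserCountries.csv by their userId value."""
    return {row["userId"]: row for row in csv.DictReader(handle)}


def join_users(
    users: TextIO,
    locations: Dict[str, Dict[str, str]],
    id_column: str = "Id",  # assumed name of the key column in Users.csv
) -> Iterator[Dict[str, Optional[str]]]:
    """Left-join user rows to locations; unmatched users get None."""
    for user in csv.DictReader(users):
        location = locations.get(user[id_column], {})
        yield {
            **user,
            "country": location.get("country"),
            "region": location.get("region"),
        }
```

A left join keeps users that have no location row, which makes it easy to see how many Meta-Kaggle users fall outside the "active" definition. Filtering out rows whose country is None gives the equivalent of an inner join.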

Columns

  • userId: The numerical Kaggle User ID, a unique identifier for each Kaggle user.
  • country: The country specified by the Kaggle user. Examples include India (24%) and United States (17%), alongside a wide range of other countries.
  • region: The region within the user's country. Values include Karnataka (4%); 14% of entries are null, indicating users who may not have specified a region.

Distribution

The dataset is provided as a CSV file, UserCountries.csv, with a file size of 11.02 MB. It is structured with 3 columns: userId, country, and region. The userId column contains 347,000 valid records, with values ranging from 368 to 18.9 million and an average of 5.64 million. The country column also has 347,000 valid records, with only 1 missing entry and 240 unique country entries, India being the most frequent. The region column contains 297,000 valid records, with 49,600 (14%) missing entries and 2,073 unique regions, Karnataka being the most frequent.
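The per-column figures above (valid, missing, unique, most frequent) can be re-derived after download with a small profiling helper. The sketch below uses a hypothetical function name and assumes that a missing value appears as an empty or whitespace-only field in the CSV, which this listing does not confirm.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Optional, Union


def profile_column(
    rows: Iterable[Mapping[str, Optional[str]]], column: str
) -> Dict[str, Union[int, Optional[str]]]:
    """Count valid, missing, and unique values for one column."""
    values = [(row.get(column) or "").strip() for row in rows]
    # Assumption: an empty field means the value is missing.
    present = [value for value in values if value]
    counts = Counter(present)
    top = counts.most_common(1)
    return {
        "valid": len(present),
        "missing": len(values) - len(present),
        "unique": len(counts),
        "most_frequent": top[0][0] if top else None,
    }
```

Passing the rows produced by `csv.DictReader` over UserCountries.csv together with the column name `region` should give counts that can be compared with the numbers quoted in this section.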

Usage

This dataset is ideally suited for various analytical and research applications related to the Kaggle community. It can be used for geographical analysis of data science trends, understanding the global distribution of active Kaggle users, and exploring regional engagement patterns. Researchers can leverage it to study the demographics of data science participation, while community managers can use it to tailor regional content and support strategies. It is particularly useful for augmenting existing user data from the official Meta-Kaggle dataset to gain deeper insights into user locations and their associated activities.
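As one example of the geographical analysis mentioned above, the share of users per country can be computed as follows. This is a sketch under stated assumptions: the function name is hypothetical, and rows with an empty country are excluded from the denominator.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Optional, Tuple


def country_shares(
    rows: Iterable[Mapping[str, Optional[str]]], top_n: int = 10
) -> List[Tuple[str, float]]:
    """Return the top_n countries with their percentage of located users."""
    counts = Counter(
        row["country"].strip()
        for row in rows
        if (row.get("country") or "").strip()
    )
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        (country, round(100 * count / total, 1))
        for country, count in counts.most_common(top_n)
    ]
```

The same counting approach applies to the `region` column, although the higher proportion of missing region values should be kept in mind when reading the percentages.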

Coverage

The dataset provides global geographical coverage, detailing countries and regions as self-reported by Kaggle users. The data has a cut-off date of 01 January 2019. It focuses on Kaggle users deemed "active," which includes those who have engaged with the platform by giving or receiving upvotes, creating various content forms (threads, posts, notebooks, datasets), or submitting to competitions. Data availability notes indicate that some users may not have provided location details, leading to missing data, especially in the 'region' field (14% missing). Furthermore, updates to user profiles post-scraping may introduce inconsistencies. As of 15 February 2024, the scrapers used to collect this data are no longer functional due to changes in the Kaggle UI layout.

License

CC BY-NC-SA 4.0

Who Can Use It

This dataset is beneficial for a range of users:
  • Data Scientists: For conducting geographical analysis of user behaviour and trends within the data science community.
  • Researchers: To study the global distribution, activity, and engagement patterns of online communities.
  • Community Managers: To understand the geographical spread of their user base and develop region-specific engagement strategies.
  • Platform Analysts: To enrich existing user data and gain deeper, location-based insights into platform usage and demographics.

Dataset Name Suggestions

  • Kaggle User Geographical Data
  • Global Kaggle Community Locations
  • Kaggle Users by Country and Region
  • Meta-Kaggle Location Augmentation
  • Kaggle User Geodemographics

Attributes

Original Data Source: Kaggle User Geographical Data

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 26/08/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
