Global Name Gender Data

Data Science and Analytics

Tags and Keywords

Gender, Name, Probability, Demographics, Babies

Free

About

This dataset offers a mapping of first names to genders, providing both the raw counts and calculated probabilities for each gender association. It integrates data on male and female baby names from official government sources across several countries, including the United States, the United Kingdom (specifically England and Wales), Canada (British Columbia), and Australia. The primary purpose is to provide a reliable basis for gender attribution based on first names.

Columns

Name: A string field holding a first (given) name. The column contains 133,910 unique names across approximately 147,000 records, all of them valid. 'James' is the most frequently occurring name.
Gender: A categorical string field with two possible values, 'M' for male and 'F' for female. 61% of records are labelled 'F' and 39% 'M'. The column is 100% valid.
Count: An integer field giving the total number of occurrences of a specific name-gender combination. Values range from 1 to approximately 5.3 million, with an average of around 2,480.
Probability: A float field giving the calculated probability for the name-gender combination. Most values fall between 0.00 and 0.01.
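
The listing does not spell out how the Probability column is calculated, so the following is a minimal sketch that derives a per-name gender share directly from Count instead. It assumes the CSV header uses the four column names listed above; the function name gender_shares and the sample rows are hypothetical and only illustrate the shape of the data.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; real values come from name_gender_dataset.csv.
SAMPLE_CSV = """Name,Gender,Count,Probability
Alex,M,900,0.009
Alex,F,100,0.001
Maria,F,500,0.005
"""

def gender_shares(csv_file):
    """Return {name: {gender: share}}, where share = count / total count for that name."""
    counts = defaultdict(dict)
    for row in csv.DictReader(csv_file):
        counts[row["Name"]][row["Gender"]] = int(row["Count"])
    shares = {}
    for name, by_gender in counts.items():
        total = sum(by_gender.values())
        shares[name] = {gender: count / total for gender, count in by_gender.items()}
    return shares

shares = gender_shares(io.StringIO(SAMPLE_CSV))
print(shares["Alex"])
print(shares["Maria"])
```

To run the same function on the real file, pass a file object opened with newline="" in place of the io.StringIO sample.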

Distribution

The dataset is provided as a single CSV file, name_gender_dataset.csv, with a file size of 3.77 MB. It has 4 columns and approximately 147,000 records, all of which are valid.
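
A quick structural check after download can confirm that a copy of the file matches this description. The sketch below is one possible approach, not an official validation routine: the header names and their order are assumed from the Columns section, check_rows is a hypothetical helper, and the sample rows are invented (the second one is deliberately broken).

```python
import csv
import io

# Assumed header, taken from the column order given in the listing.
EXPECTED_COLUMNS = ["Name", "Gender", "Count", "Probability"]

def check_rows(csv_file):
    """Return (row_number, problem) tuples; an empty list means every row passed."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [(1, f"unexpected header: {reader.fieldnames}")]
    problems = []
    # Row 1 is the header, so data rows start at 2
    # (assuming no blank lines or multi-line fields in the file).
    for row_number, row in enumerate(reader, start=2):
        if not row["Name"]:
            problems.append((row_number, "empty Name"))
        if row["Gender"] not in ("M", "F"):
            problems.append((row_number, "Gender is not 'M' or 'F'"))
        count = row["Count"] or ""
        if not count.isdecimal() or int(count) < 1:
            problems.append((row_number, "Count is not a positive integer"))
        try:
            probability = float(row["Probability"])
        except (TypeError, ValueError):
            probability = None
        if probability is None or not 0.0 <= probability <= 1.0:
            problems.append((row_number, "Probability is not in [0, 1]"))
    return problems

# Made-up rows for illustration; the second data row is deliberately broken.
SAMPLE_CSV = """Name,Gender,Count,Probability
Maria,F,500,0.005
Sam,X,0,abc
"""

problems = check_rows(io.StringIO(SAMPLE_CSV))
for row_number, problem in problems:
    print(row_number, problem)
```

For the real file, open name_gender_dataset.csv with newline="" and pass the file object to check_rows; the file's encoding is not stated in the listing, so UTF-8 is an assumption.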

Usage

This dataset suits a range of analytical and application development purposes, including text analysis, classification, and clustering. Specific use cases include building predictive models for gender identification based on names, conducting demographic research, supporting market segmentation, and enriching natural language processing applications.
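
As an illustration of the gender identification use case, a simple lookup can return a label only when one gender clearly dominates the counts for a name. This is a sketch under stated assumptions rather than a recommended model: predict_gender, the 0.9 default threshold, the case-insensitive keys, and the sample counts are all hypothetical choices.

```python
def predict_gender(name, counts, threshold=0.9):
    """Return 'M' or 'F' if one gender holds at least `threshold` of the
    name's total count, otherwise None (unknown or ambiguous name).

    counts maps a casefolded name to {gender: count}.
    """
    by_gender = counts.get(name.strip().casefold())
    if not by_gender:
        return None
    total = sum(by_gender.values())
    gender, top_count = max(by_gender.items(), key=lambda item: item[1])
    return gender if top_count / total >= threshold else None

# Made-up counts for illustration; real values would be built from the Count column.
sample_counts = {
    "alex": {"M": 600, "F": 400},
    "maria": {"F": 500},
}

print(predict_gender("Maria", sample_counts))                 # F
print(predict_gender("Alex", sample_counts))                  # None (60% is below 0.9)
print(predict_gender("Alex", sample_counts, threshold=0.5))   # M
print(predict_gender("Zed", sample_counts))                   # None (name not present)
```

Returning None for ambiguous or unseen names keeps uncertain cases out of downstream results; a lower threshold trades accuracy for coverage.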

Coverage

The data spans several key geographic regions and timeframes:

United States: Baby Names from Social Security Card Applications, covering 1880 to 2019.
United Kingdom: Baby names in England and Wales Statistical bulletins, covering 2011 to 2018.
Canada: British Columbia's 100 Years of Popular Baby names, covering 1918 to 2018.
Australia: Popular Baby Names from the Attorney-General's Department, covering 1944 to 2019.

The dataset's scope is limited to the first (given) names of male and female babies born within these periods.

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

This dataset is an ideal resource for:

Data Scientists and Machine Learning Engineers: For developing and refining models that predict gender based on textual name data.
Researchers and Academics: For studies of demographics, social trends, and linguistic patterns in names.
Marketers and Business Analysts: For segmenting audiences and personalising communication strategies by inferring gender from names.
Software Developers: For integrating name-gender attribution functionalities into diverse applications and services.

Dataset Name Suggestions

Global Name Gender Data
First Name Gender Probabilities
Gender By Name Attributes
Multinational Baby Names Gender
Name Gender Classifier Data

Attributes

Original Data Source: Global Name Gender Data

Listing Stats

Listed: 13/08/2025
Region: Global
Quality: 5 / 5
Version: 1.0
