Gendered Naming Patterns Dataset
Data Science and Analytics
Free
About
This dataset explores first-name trends for babies across the United States, the United Kingdom, Canada, and Australia. It combines raw counts of male and female names from various government sources and, from the aggregated counts, calculates the probability that a name is associated with a specific gender. It offers insight into naming patterns over different periods in these countries and is useful for tasks such as classification and clustering within the social sciences.
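The listing does not spell out the formula behind the probability. A minimal sketch of one plausible reading follows, assuming the probability is a name's count for one gender divided by that name's total count across both genders. The function name `gender_probabilities` and the sample counts are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict


def gender_probabilities(rows):
    """Aggregate (name, gender, count) tuples and derive a per-gender share.

    Assumption: probability = count for (name, gender) / total count for name.
    Returns {(name, gender): (aggregated_count, probability)}.
    """
    counts = defaultdict(int)   # aggregated count per (name, gender)
    totals = defaultdict(int)   # aggregated count per name, both genders
    for name, gender, count in rows:
        counts[(name, gender)] += count
        totals[name] += count
    return {key: (c, c / totals[key[0]]) for key, c in counts.items()}


# Hypothetical counts for illustration only; repeated (name, gender) pairs
# stand in for the same name appearing in several sources or years.
sample = [
    ("Alex", "M", 300),
    ("Alex", "F", 100),
    ("Alex", "M", 100),
    ("Maria", "F", 50),
]
result = gender_probabilities(sample)
print(result[("Alex", "M")])  # (400, 0.8)
print(result[("Alex", "F")])  # (100, 0.2)
```

Under this reading, the probabilities for the 'M' and 'F' rows of one name sum to 1, and a name recorded for only one gender gets a probability of 1.0.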
Columns
- Name: The first or given name (String).
- Gender: The gender associated with the name, indicated as 'M' for male or 'F' for female (Category/String).
- Count: The total number of occurrences for the name (Integer).
- Probability: The calculated probability of the name belonging to the specified gender (Float).
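The four columns above map onto a small typed record. The sketch below reads rows with Python's standard `csv` module. It assumes the file has a header row whose names match the column names listed here exactly, and that Probability is stored as a fraction between 0 and 1. The `NameRecord` class, the `read_records` helper, and the sample rows are hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class NameRecord:
    name: str
    gender: str         # 'M' or 'F'
    count: int
    probability: float


def read_records(fileobj):
    """Parse rows from an open CSV file object into NameRecord instances."""
    records = []
    for row in csv.DictReader(fileobj):
        gender = row["Gender"]
        if gender not in ("M", "F"):
            raise ValueError(f"unexpected gender code: {gender!r}")
        records.append(
            NameRecord(
                name=row["Name"],
                gender=gender,
                count=int(row["Count"]),
                probability=float(row["Probability"]),
            )
        )
    return records


# Hypothetical rows for illustration only (header names are an assumption).
sample_csv = io.StringIO(
    "Name,Gender,Count,Probability\n"
    "Alex,M,400,0.8\n"
    "Alex,F,100,0.2\n"
)
records = read_records(sample_csv)
print(records[0])
```

For the real file, the same helper could be called as `read_records(open("data.csv", newline="", encoding="utf-8"))`, provided the header assumption holds.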
Distribution
The data is provided in a tabular CSV format (data.csv) with a file size of 3.77 MB. It contains 147,270 rows (instances) and 4 columns (features). There are no missing values in any of the columns.
Usage
Ideal applications for this data include social science research, onomastic studies (the study of names), and developing machine learning models for gender classification based on names. It can also be used for data cleaning tasks, trend analysis of popular names over time, and clustering names based on their characteristics.
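As a baseline for the gender classification use case, a simple lookup on the Probability column may be enough before training any model. The sketch below is one such baseline, not a recommended method. The helper names `build_lookup` and `predict_gender`, the `threshold` parameter, and the sample rows are hypothetical.

```python
def build_lookup(rows):
    """Map each lowercased name to its most probable (gender, probability).

    rows: iterable of (name, gender, probability) tuples.
    """
    best = {}
    for name, gender, prob in rows:
        key = name.strip().lower()
        if key not in best or prob > best[key][1]:
            best[key] = (gender, prob)
    return best


def predict_gender(lookup, name, threshold=0.5):
    """Return 'M' or 'F', or None if the name is unknown or too ambiguous."""
    entry = lookup.get(name.strip().lower())
    if entry is None or entry[1] < threshold:
        return None
    return entry[0]


# Hypothetical rows for illustration only (not taken from the dataset).
rows = [
    ("Alex", "M", 0.8),
    ("Alex", "F", 0.2),
    ("Maria", "F", 1.0),
]
lookup = build_lookup(rows)
print(predict_gender(lookup, "alex"))                 # M
print(predict_gender(lookup, "Alex", threshold=0.9))  # None
print(predict_gender(lookup, "Zed"))                  # None
```

Raising `threshold` trades coverage for confidence: names used for both genders fall back to None instead of being forced into a class.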
Coverage
- Geographic Scope: The dataset includes data from four countries:
  - United States: 1880 to 2019
  - United Kingdom (England and Wales): 2011 to 2018
  - Canada (British Columbia): 1918 to 2018
  - Australia: 1944 to 2019
- Time Range: The data spans various time periods, with the earliest records from 1880 and the most recent from 2019, depending on the country.
- Demographic Scope: The data pertains to the first names of male and female babies.
License
Creative Commons Attribution 4.0 International (CC BY 4.0)
Who Can Use It
- Social Scientists and Researchers: To study cultural trends, naming conventions, and demographic patterns.
- Data Scientists and Analysts: For building predictive models for gender classification, performing cluster analysis, and practising data preparation techniques.
- Marketing Professionals: To understand name popularity for personalised marketing campaigns or product development.
- Genealogists and Historians: To analyse historical naming trends within specific geographic regions.
Dataset Name Suggestions
- Cross-Country Naming Trends and Gender Probability
- Historical Baby Names: US, UK, Canada & Australia
- Gendered Naming Patterns Dataset
- First Name Gender Statistics (1880-2019)
- International Baby Name Counts and Probabilities
Attribution
Original Data Source: Gendered Naming Patterns Dataset