Historical Figures Mortality Dataset

NLP / Natural Language Processing

Tags and Keywords

Mortality

Demographics

History

Biographical

Occupation

Free

About

This dataset provides structured information on the life, work, and death of over 1.22 million notable deceased individuals [1]. It is the largest dataset of its kind and was developed using a five-step method to infer birth and death years, binary gender, and occupation from community-submitted data across various language versions of Wikipedia [1]. The technical approach employed text mining to clean historical data and minimise missing values [1]. This resource offers new insights into the demographics of mortality in relation to gender and profession throughout history [1].

Columns

  • Id: A unique Wikidata identifier for each individual [2].
  • Name: The full name of the person [2].
  • Short description: A brief description of the individual. Approximately 6% of entries are missing this information [2].
  • Gender: The gender or sex of the individual. Male is the most common gender, representing 80% of entries, with around 11% of gender data missing [3].
  • Country: The country or historical region associated with the individual. Approximately 27% of this data is missing, with the United States of America being the most common country, representing 12% of entries [3].
  • Occupation: The occupation title of the individual. Around 17% of occupation data is missing, and Artist is the most frequent occupation, accounting for 23% of entries [3].
  • Birth year: The year of birth, ranging from -2700 to 2016, with a mean of 1840 [4-6]. All entries have birth year data [5].
  • Death year: The year of death, ranging from -2659 to 2021, with a mean of 1910 [6-8]. Almost all entries include death year data, with only one missing value [8].
  • Manner of death: Describes how the individual died (e.g., suicide, natural causes, or capital punishment). This field is largely incomplete, with 96% of entries missing this information. Natural causes is the most common recorded manner of death, representing 3% of entries [8].
  • Age of death: The age at which the individual died, ranging from 0 to 169 years, with a mean age of 69.3 years [8-10]. Nearly all entries have age of death data, with only one missing value [10].
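The column list above can be turned into a typed record when reading the file. The following is a minimal sketch, assuming the CSV header uses exactly the column names listed here and that empty cells mark missing values. The `Person` class, the helper functions, and the sample rows are hypothetical and for illustration only.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


def to_int(cell: Optional[str]) -> Optional[int]:
    """Convert a CSV cell to int; empty or absent cells become None."""
    text = (cell or "").strip()
    if not text:
        return None
    # Accept both "70" and "70.0" in case whole numbers are stored as floats.
    return int(float(text))


def to_text(cell: Optional[str]) -> Optional[str]:
    """Return the stripped cell text, or None when the cell is empty."""
    text = (cell or "").strip()
    return text or None


@dataclass
class Person:
    id: str
    name: str
    short_description: Optional[str]
    gender: Optional[str]
    country: Optional[str]
    occupation: Optional[str]
    birth_year: Optional[int]   # negative values are allowed (range starts at -2700)
    death_year: Optional[int]
    manner_of_death: Optional[str]
    age_of_death: Optional[int]


def parse_row(row: dict) -> Person:
    """Map one csv.DictReader row onto a Person (header names are assumed)."""
    return Person(
        id=(row.get("Id") or "").strip(),
        name=(row.get("Name") or "").strip(),
        short_description=to_text(row.get("Short description")),
        gender=to_text(row.get("Gender")),
        country=to_text(row.get("Country")),
        occupation=to_text(row.get("Occupation")),
        birth_year=to_int(row.get("Birth year")),
        death_year=to_int(row.get("Death year")),
        manner_of_death=to_text(row.get("Manner of death")),
        age_of_death=to_int(row.get("Age of death")),
    )


# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE = (
    "Id,Name,Short description,Gender,Country,Occupation,"
    "Birth year,Death year,Manner of death,Age of death\n"
    "X1,Person A,,Male,,Artist,-500,-430,,70\n"
    "X2,Person B,example description,Female,Exampleland,Researcher,"
    "1900,1980,natural causes,80.0\n"
)

people = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
```

Keeping missing values as `None` rather than empty strings makes the heavily incomplete fields, such as Manner of death, easy to filter out later.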

Distribution

The dataset is provided as a CSV file, specifically AgeDataset-V1.csv, and has a file size of 116.78 MB [2]. It contains 10 columns and includes data on 1.22 million people [2].
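Because the file is over 100 MB, it can be inspected by streaming it row by row instead of loading it all at once. The sketch below uses only the Python standard library; the function name `describe_csv` is hypothetical, and the sample text is made up for illustration.

```python
import csv
import io
from typing import List, TextIO, Tuple


def describe_csv(handle: TextIO) -> Tuple[List[str], int]:
    """Return (header, number of data rows) while reading one row at a time."""
    reader = csv.reader(handle)
    header = next(reader, [])                    # empty input yields an empty header
    row_count = sum(1 for row in reader if row)  # skip completely blank lines
    return header, row_count


# Made-up sample for illustration; the real file is expected to have 10 columns.
SAMPLE = "Id,Name,Birth year\nX1,Person A,-500\nX2,Person B,1900\n"
header, rows = describe_csv(io.StringIO(SAMPLE))
print(len(header), "columns,", rows, "rows")
```

For the real file, the same function could be called as `describe_csv(open("AgeDataset-V1.csv", newline="", encoding="utf-8"))`; the UTF-8 encoding is an assumption and may need adjusting.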

Usage

This dataset is ideal for:
  • Conducting demographic research, particularly on historical mortality trends [1].
  • Analysing the relationship between gender, profession, and life span [1].
  • Studying the life cycles and work patterns of notable individuals throughout history [1].
  • Social science and computer science research applications [11].
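As one way to approach the gender, profession, and life span analysis mentioned above, the sketch below computes the mean age at death per group. It assumes the header names "Gender", "Occupation", and "Age of death" from the column list; `mean_age_by` is a hypothetical helper, and the sample rows are invented.

```python
import csv
import io
from collections import defaultdict
from statistics import mean
from typing import Dict, TextIO


def mean_age_by(handle: TextIO, column: str) -> Dict[str, float]:
    """Mean of 'Age of death' per value of `column`, ignoring incomplete rows."""
    ages = defaultdict(list)
    for row in csv.DictReader(handle):
        key = (row.get(column) or "").strip()
        age = (row.get("Age of death") or "").strip()
        if not key or not age:
            continue  # skip rows where the group or the age is missing
        ages[key].append(float(age))
    return {key: mean(values) for key, values in ages.items()}


# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE = (
    "Gender,Occupation,Age of death\n"
    "Male,Artist,70\n"
    "Female,Artist,50\n"
    "Female,Researcher,80\n"
    ",Researcher,60\n"
    "Male,,90\n"
)

by_occupation = mean_age_by(io.StringIO(SAMPLE), "Occupation")
by_gender = mean_age_by(io.StringIO(SAMPLE), "Gender")
```

Rows with a missing group value are dropped here, which matters for this dataset: with 17% of occupations and 11% of genders missing, group means describe only the labelled subset.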

Coverage

The dataset covers individuals from more than 300 contemporary or historical regions globally [1]. Birth years span from -2700 to 2016, and death years from -2659 to 2021 [4-8]. Demographically, it includes 107,000 females and 124 non-binary people; males make up about 80% of entries [1, 3]. It also features 90,000 researchers [1]. Several fields have substantial missing data: 'Manner of death' (96%), 'Country' (27%), 'Occupation' (17%), and 'Gender' (11%) [2, 3, 8].
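The missing-data percentages quoted above can be re-checked against a downloaded copy. This is a minimal sketch, assuming that a missing value appears as an empty cell in the CSV; `missing_share` is a hypothetical helper and the sample rows are invented.

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO


def missing_share(handle: TextIO) -> Dict[str, float]:
    """Fraction of rows with an empty cell, per column."""
    reader = csv.DictReader(handle)
    missing = Counter()
    total = 0
    for row in reader:
        total += 1
        for column in reader.fieldnames or []:
            if not (row.get(column) or "").strip():
                missing[column] += 1
    if total == 0:
        return {}
    return {column: missing[column] / total for column in reader.fieldnames or []}


# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE = (
    "Name,Country,Manner of death\n"
    "Person A,Exampleland,\n"
    "Person B,,\n"
    "Person C,Exampleland,natural causes\n"
    "Person D,Exampleland,\n"
)

shares = missing_share(io.StringIO(SAMPLE))
```

Multiplying each share by 100 gives percentages comparable to the figures listed in this section.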

License

CC BY-NC-SA 4.0

Who Can Use It

  • Academic researchers in history, sociology, demography, and computer science for studying historical patterns and social phenomena [1, 11].
  • Data scientists and analysts interested in large-scale historical datasets and text mining applications [1, 11].
  • Anyone with an interest in biographical data or the lives of notable historical figures.

Dataset Name Suggestions

  • Historical Figures Mortality Dataset
  • Notable Deceased Persons Life & Work Data
  • Global Historical Demographics Dataset
  • Prominent Individuals Life History Dataset
  • Wikipedia Notable Lives Dataset

Attributes

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 14/07/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
