Titanic Survival Prediction Dataset

LLM Fine-Tuning Data

Tags and Keywords

Titanic, Survival, Prediction, Classification, Passengers

Titanic Survival Prediction Dataset on the Opendatabay data marketplace

Free

About

This dataset describes passengers aboard the RMS Titanic, whose sinking is one of the most infamous shipwrecks in history. On 15 April 1912, during its maiden voyage, the Titanic struck an iceberg and sank. There were too few lifeboats for everyone aboard, and 1,502 of the 2,224 passengers and crew died. Luck played a role, but some groups of people were more likely to survive than others. The primary goal for users of this dataset is to build a predictive model that identifies which kinds of passengers were more likely to survive, using details such as name, age, gender, and socio-economic class. Secondary goals are understanding and preparing the data, building robust classification models, tuning their hyperparameters, and comparing algorithms across several evaluation metrics.

Columns

The dataset contains the following columns:
  • PassengerId: A unique identifier for each passenger.
  • Survived: Indicates whether the passenger survived (1) or not (0).
  • Pclass: The passenger's ticket class (1 = 1st, 2 = 2nd, 3 = 3rd class).
  • Name: The full name of the passenger.
  • Sex: The gender of the passenger (male or female).
  • Age: The age of the passenger in years.
  • SibSp: The number of siblings or spouses aboard the Titanic with the passenger.
  • Parch: The number of parents or children aboard the Titanic with the passenger.
  • Ticket: The ticket number.
  • Fare: The passenger's fare.
  • Cabin: The cabin number.
  • Embarked: The port from which the passenger embarked (C = Cherbourg, Q = Queenstown, S = Southampton).
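
As a rough illustration of how the columns above could be read into typed values, here is a minimal sketch using only the Python standard library. It assumes the CSV header uses exactly the column names listed above and that Survived, Pclass, SibSp and Parch are stored as integers; the helper names (parse_row, load_passengers) and the two sample rows are made up for this example and are not taken from the real file.

```python
import csv
import io

# Column names as listed in the Columns section.
EXPECTED_COLUMNS = [
    "PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
    "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked",
]


def parse_row(row):
    """Convert one csv.DictReader row (all strings) into typed values.

    Empty strings become None so that missing Age, Cabin or Embarked
    entries stay visible instead of being silently coerced.
    """
    def opt(value, cast):
        return cast(value) if value != "" else None

    return {
        "PassengerId": int(row["PassengerId"]),
        "Survived": int(row["Survived"]),   # assumed encoding: 1 = survived, 0 = did not
        "Pclass": int(row["Pclass"]),       # assumed encoding: 1, 2 or 3
        "Name": row["Name"],
        "Sex": row["Sex"],
        "Age": opt(row["Age"], float),
        "SibSp": int(row["SibSp"]),
        "Parch": int(row["Parch"]),
        "Ticket": row["Ticket"],
        "Fare": opt(row["Fare"], float),
        "Cabin": opt(row["Cabin"], str),
        "Embarked": opt(row["Embarked"], str),
    }


def load_passengers(handle):
    """Read all rows from an open text handle, checking the header first."""
    reader = csv.DictReader(handle)
    if set(reader.fieldnames or []) != set(EXPECTED_COLUMNS):
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return [parse_row(r) for r in reader]


# Made-up rows for illustration only (not real passenger records).
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Doe, Mr. John",male,30,0,0,T1,8.05,,S
2,1,1,"Roe, Mrs. Jane",female,,1,0,T2,71.5,C10,C
"""

passengers = load_passengers(io.StringIO(SAMPLE_CSV))
print(passengers[1]["Age"], passengers[0]["Cabin"])
```

To try this on the downloaded file, the same function could be called on a handle opened with open("Titanic-Dataset.csv", newline="", encoding="utf-8"), assuming the file sits in the working directory.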

Distribution

The dataset is provided as a CSV file named Titanic-Dataset.csv, with a size of 61.19 kB. It has 12 columns and 891 rows, one per passenger in the file. Most columns are complete, but 'Age' has 177 missing values (about 20%), 'Cabin' has 687 missing values (about 77%), and 'Embarked' has 2 missing values (under 1%).
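
The missing-value figures above can be checked with a short helper. The sketch below is one possible way to count missing entries per column over a list of dict rows; the function name missing_summary and the four sample rows are hypothetical. The final loop simply redoes the arithmetic for the counts quoted above against 891 rows.

```python
def missing_summary(rows, columns):
    """Return {column: (missing_count, missing_percent)} for a list of dict rows.

    A value counts as missing when it is None or an empty string.
    """
    total = len(rows)
    summary = {}
    for col in columns:
        missing = sum(1 for r in rows if r.get(col) in (None, ""))
        pct = 100.0 * missing / total if total else 0.0
        summary[col] = (missing, round(pct, 1))
    return summary


# Hypothetical mini-sample, not real passenger records.
rows = [
    {"Age": 30.0, "Cabin": None, "Embarked": "S"},
    {"Age": None, "Cabin": "C10", "Embarked": "C"},
    {"Age": 41.0, "Cabin": None, "Embarked": None},
    {"Age": 19.0, "Cabin": None, "Embarked": "Q"},
]
print(missing_summary(rows, ["Age", "Cabin", "Embarked"]))

# Cross-check of the quoted percentages (missing counts out of 891 rows).
for name, count in [("Age", 177), ("Cabin", 687), ("Embarked", 2)]:
    print(name, round(100 * count / 891, 1))
```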

Usage

This dataset is ideally suited for:
  • Developing classification models to predict passenger survival.
  • Conducting data clean-up and exploratory data analysis.
  • Experimenting with hyperparameter tuning for machine learning algorithms.
  • Comparing the performance of various classification algorithms to determine the most effective predictive approach.
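
To make the idea of comparing classifiers concrete, the following sketch pits two very simple rule-based baselines against each other on accuracy. These baselines are stand-ins chosen for this example, not the models the listing has in mind; a real project would more likely use a machine learning library and tune hyperparameters. All function names and the train/test rows are made up for illustration, so the printed accuracies say nothing about the real data.

```python
from collections import Counter, defaultdict


def fit_majority(train):
    """Baseline 1: always predict the most common label in the training rows."""
    label = Counter(r["Survived"] for r in train).most_common(1)[0][0]
    return lambda row: label


def fit_group_rule(train, keys=("Sex", "Pclass")):
    """Baseline 2: predict the majority label within each (Sex, Pclass) group,
    falling back to the overall majority for groups unseen in training."""
    fallback = fit_majority(train)
    votes = defaultdict(Counter)
    for r in train:
        votes[tuple(r[k] for k in keys)][r["Survived"]] += 1
    table = {group: c.most_common(1)[0][0] for group, c in votes.items()}
    return lambda row: table.get(tuple(row[k] for k in keys), fallback(row))


def accuracy(model, rows):
    """Fraction of rows where the model's prediction matches Survived."""
    return sum(model(r) == r["Survived"] for r in rows) / len(rows)


# Made-up rows for illustration only (not real passenger records).
train_rows = [
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Pclass": 2, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 1, "Survived": 0},
    {"Sex": "male", "Pclass": 2, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
]
test_rows = [
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
    {"Sex": "female", "Pclass": 2, "Survived": 1},
    {"Sex": "male", "Pclass": 2, "Survived": 0},
]

majority_model = fit_majority(train_rows)
group_model = fit_group_rule(train_rows)
print("majority baseline:", accuracy(majority_model, test_rows))
print("group-rule baseline:", accuracy(group_model, test_rows))
```

Scoring every candidate on the same held-out rows, as done here, is what keeps the comparison fair; the same pattern carries over when the baselines are replaced by library models.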

Coverage

The dataset covers 891 passengers aboard the RMS Titanic's maiden voyage, which ended when the ship sank on 15 April 1912. The columns describe passengers only and contain no crew-specific fields. The demographic scope includes individuals across different ages, genders, socio-economic classes, and family structures. Geographic relevance is tied to the ports of embarkation: Cherbourg, Queenstown, and Southampton. Data availability has significant gaps for passenger age (about 20% missing) and cabin number (about 77% missing).

License

This dataset is under a CC0: Public Domain license.

Who Can Use It

This dataset is useful for:
  • Machine Learning Engineers: To build, train, and evaluate predictive models.
  • Data Scientists: For in-depth statistical analysis and feature engineering.
  • Students and Beginners in Data Science: It is classified as a "Beginner" dataset, making it an excellent resource for learning classification tasks and data pre-processing.
  • Researchers: Interested in historical data analysis and factors influencing survival in disaster scenarios.

Dataset Name Suggestions

  • Titanic Survival Prediction Dataset
  • Titanic Passenger Survival Data
  • RMS Titanic Survival Analytics
  • Historical Titanic Survival Factors

Attributes

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 08/07/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0