Superherodb Analytics Set

Entertainment & Media Consumption

Tags and Keywords

Arts

Computer

NLP

Comics

Free

About

This dataset is a collection of superhero profiles intended to make text analytics and Natural Language Processing (NLP) practice more engaging. Users can build their NLP skills by analysing superhero histories and powers. The data was collected from Superherodb and is presented in a clean, tabular format. It combines categorical and numerical features (such as overall score, intelligence score, creator, alignment, gender, and eye colour) with rich text features (superhero history and power descriptions), so structured attributes can be analysed alongside free text.

Columns

  • name: The superhero's common name.
  • real_name: The superhero's real identity.
  • full_name: The superhero's full name.
  • overall_score: An overall score derived from the superhero's power statistics.
  • history_text: A textual description of the superhero's background and history.
  • powers_text: A textual description of the superhero's unique abilities and powers.
  • intelligence_score: A numerical rating of the superhero's intelligence.
  • strength_score: A numerical rating of the superhero's strength.
  • speed_score: A numerical rating of the superhero's speed.
  • durability_score: A numerical rating of the superhero's durability.
  • power_score: A numerical rating of the superhero's power level.
  • combat_score: A numerical rating of the superhero's combat prowess.
  • creator: The creator of the superhero.
  • alignment: The superhero's moral alignment (e.g., good, bad, neutral).
  • gender: The gender of the superhero.
  • eye_colour: The eye colour of the superhero.
  • alter_egos: Other identities used by the superhero.
  • occupation: The superhero's profession or role.
  • base: The superhero's operational base.
  • teams: The teams or groups the superhero belongs to.
  • type_race: The superhero's race or species.
  • height: The height of the superhero.
  • weight: The weight of the superhero.

Distribution

The dataset is provided in a tabular format, typically a CSV file. It contains 1447 distinct superhero entries. Each row represents a single superhero, encompassing various features including detailed textual descriptions and quantifiable scores for attributes like intelligence and strength. Some columns, such as 'name', 'real_name', 'full_name', 'history_text', and 'powers_text', may contain null or unknown values.
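
Because several text columns may be empty, it can help to count the gaps before any analysis. The following is a minimal sketch using only Python's standard csv module. It assumes that missing values appear as empty cells, which the listing does not confirm. The count_missing helper is hypothetical, and the sample rows are made-up placeholders rather than real entries from the dataset.

```python
import csv
import io

# Columns the listing says may contain null or unknown values.
TEXT_COLUMNS = ["name", "real_name", "full_name", "history_text", "powers_text"]


def count_missing(rows, columns):
    """Count rows where each column is absent, empty, or whitespace only."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is None or not value.strip():
                counts[col] += 1
    return counts


# Made-up placeholder rows for illustration only (not real dataset entries).
SAMPLE_CSV = """name,real_name,full_name,history_text,powers_text
Hero A,Alex Example,Alex Q. Example,Grew up in a lab.,Can fly.
Hero B,,,,Very strong.
,Casey Sample,,Origin unknown to the public.,
"""

sample_rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
missing = count_missing(sample_rows, TEXT_COLUMNS)
print(missing)
```

To run the same check on the downloaded file, the in-memory sample could be replaced with open(path, newline="", encoding="utf-8"), where path is wherever the CSV was saved. If the file marks unknown values with a placeholder string instead of an empty cell, the check inside count_missing would need to be adjusted accordingly.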

Usage

This dataset is ideal for a variety of applications, including:
  • Conducting text analytics and NLP experiments.
  • Developing formulas to identify character traits, such as determining the "coolest" or "strongest" superhero by combining text features with power statistics.
  • Building text classification models to predict a superhero's creator using only their history and powers descriptions.
  • Exploring unsupervised learning techniques for clustering superhero data.
  • Analysing character demographics, such as identifying the top-ranked female superheroes within the collection.
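
As one illustration of the demographic use case above, the sketch below ranks characters of a given gender by overall_score. It is a sketch under stated assumptions, not a definitive method: it assumes the gender column holds labels such as "Female" and that overall_score is usually numeric, neither of which the listing specifies. The helpers parse_score and top_ranked are hypothetical names, and the sample rows are made-up placeholders.

```python
import csv
import io


def parse_score(text):
    """Return the score as a float, or None if it is missing or non-numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def top_ranked(rows, gender, limit=3):
    """Return (name, overall_score) pairs for one gender, highest score first."""
    scored = []
    for row in rows:
        # Compare case-insensitively; the exact label spelling is an assumption.
        if (row.get("gender") or "").strip().lower() != gender.lower():
            continue
        score = parse_score(row.get("overall_score"))
        if score is not None:  # skip rows without a usable score
            scored.append((row.get("name") or "", score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]


# Made-up placeholder rows for illustration only (not real dataset entries).
SAMPLE_CSV = """name,overall_score,gender
Hero A,12,Female
Hero B,30,Male
Hero C,25,Female
Hero D,,Female
Hero E,8,female
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
top_female = top_ranked(rows, "Female", limit=2)
print(top_female)  # [('Hero C', 25.0), ('Hero A', 12.0)]
```

Rows with a missing or non-numeric score are skipped rather than treated as zero, so an unknown score cannot push a character to the bottom of the ranking.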

Coverage

The dataset's scope is global, encompassing a diverse range of 1447 different superheroes. While it includes demographic details like gender, with 23% of the superheroes being women, specific time ranges for the data's origin or collection are not provided in the sources. Information for certain attributes, such as history or powers, can be unknown for some entries.
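
A figure such as the share of women can be checked directly from the gender column. The sketch below is a hedged example: the gender_shares helper is hypothetical, grouping empty values under "Unknown" is an assumption, and the rows are made-up placeholders rather than real data.

```python
from collections import Counter


def gender_shares(rows):
    """Return each gender label's share of all rows; empty labels become 'Unknown'."""
    counts = Counter((row.get("gender") or "").strip() or "Unknown" for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Made-up placeholder rows for illustration only (not real dataset entries).
sample_rows = [
    {"name": "Hero A", "gender": "Female"},
    {"name": "Hero B", "gender": "Male"},
    {"name": "Hero C", "gender": "Male"},
    {"name": "Hero D", "gender": ""},
]

shares = gender_shares(sample_rows)
print(shares)
```

Whether unknown entries are counted in the denominator changes the resulting percentage, so it is worth stating which convention is used when comparing against a published figure.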

License

CC0

Who Can Use It

This dataset is suitable for:
  • Data scientists and NLP practitioners: For refining text analytics and machine learning competencies.
  • Researchers: Studying character development, narrative structures, and fictional demographics.
  • Students: As an engaging resource for learning about data analysis, text processing, and machine learning principles.
  • Content creators and developers: Building applications, quizzes, or fan-based analyses related to superheroes.

Dataset Name Suggestions

  • Superhero Power & Profile Data
  • Comic Character NLP
  • Superherodb Analytics Set
  • Heroic Text Dataset
  • NLP Hero Compendium

Attributes

Original Data Source: Superheroes NLP Dataset

Listing Stats

VIEWS

1

DOWNLOADS

0

LISTED

08/06/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0