Opendatabay APP

Superheroes NLP Dataset

Entertainment & Media Consumption

Free

About

Context
The aim of this dataset is to make text analytics and NLP even more fun. All of us have dreamed of being a superhero and saving the world, yet we are still on Kaggle figuring out how Python works. So why not improve our NLP skills by analyzing superheroes' histories and powers?
What sets this dataset apart is that it combines categorical and numerical features, such as overall_score, intelligence_score, creator, alignment, gender, and eye_color, with the text features history_text and powers_text. Combining the two can yield many interesting insights!
Content
We collected all the data from superherodb and prepared it in a clean tabular format.
The dataset contains 1447 different superheroes. Each superhero row has:
  • overall_score - derived by superherodb from the power stats features. Can you find the relationship?
  • history_text - history of the superhero (text feature)
  • powers_text - description of the superhero's powers (text feature)
  • intelligence_score, strength_score, speed_score, durability_score, power_score and combat_score (power stats features)
  • "Origin" (full_name, alter_egos, …)
  • "Connections" (occupation, base, teams, …)
  • "Appearance" (gender, type_race, height, weight, eye_color, …)
Your turn
There are numerous ways you can have fun with this dataset. Now it is up to you!
Some ideas to start:
  • Who is the coolest superhero? Given only the two text columns, can you devise a formula that identifies the coolest superhero?
  • Who is the strongest superhero of all time? By combining the text features with the power stats features, can you work out who is the strongest superhero of all time?
  • Text classification: can you predict a superhero's creator using only the text columns? (Yes, you can!) Moreover, can you find a good way to cluster the data in an unsupervised manner?
  • Who are the top 10 women superheroes? 23% of the superheroes are women; can you identify the top 10?
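As a starting point for the text classification idea, here is a minimal, dependency-free sketch of a multinomial Naive Bayes classifier that guesses creator from history_text and powers_text. It is only one possible approach, not the method used by the dataset authors. The column names come from the description above; the CSV path passed to load_rows and the function names (load_rows, tokenize, train, predict) are hypothetical, and the tokenizer is a deliberately crude assumption.

```python
import csv
import math
import re
from collections import Counter, defaultdict


def load_rows(path):
    """Read the CSV into a list of dicts (the path is supplied by the caller)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def tokenize(row):
    """Lowercase word tokens from the two text columns named in the listing."""
    text = f"{row.get('history_text') or ''} {row.get('powers_text') or ''}"
    return re.findall(r"[a-z']+", text.lower())


def train(rows):
    """Fit a multinomial Naive Bayes model; rows without a creator are skipped."""
    class_counts = Counter()            # number of rows per creator
    word_counts = defaultdict(Counter)  # token frequencies per creator
    vocab = set()
    for row in rows:
        label = (row.get("creator") or "").strip()
        if not label:
            continue
        tokens = tokenize(row)
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, row):
    """Return the creator with the highest log-posterior for this row."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    # Tokens never seen during training carry no information, so drop them.
    tokens = [t for t in tokenize(row) if t in vocab]
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior
        for t in tokens:
            # Laplace (add-one) smoothing avoids log(0) for words unseen in this class.
            score += math.log((word_counts[label][t] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

To try it, load the rows with load_rows, hold out a portion of them, call train on the rest, and compare predict against the held-out creator values. A library such as scikit-learn would be the more usual choice for serious experiments; this sketch only shows the mechanics.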
Acknowledgements
The following GitHub repository contains the code used to scrape this dataset.
Original Data Source: Superheroes NLP Dataset

Listing Stats

VIEWS

1

DOWNLOADS

0

LISTED

08/06/2025

REGION

GLOBAL

UDQSSQUALITY

5 / 5

VERSION

1.0
