Superheroes NLP Dataset
About
Context
The aim of this dataset is to make text analytics and NLP even more fun. All of us have dreamed of being a superhero and saving the world, yet we are still on Kaggle figuring out how Python works. So why not improve our NLP skills by analyzing superheroes' histories and powers?
What sets this dataset apart is that it contains categorical and numerical features such as overall_score, intelligence_score, creator, alignment, gender and eye_color, alongside the text features history_text and powers_text. Combining the two kinds of features can yield a lot of interesting insights!
Content
We collected all the data from superherodb and prepared it for you in a nice, clean tabular format.
The dataset contains 1447 different Superheroes. Each superhero row has:
overall_score - derived by superherodb from the power stats features. Can you find the relationship?
history_text - History of the superhero (text feature)
powers_text - Description of the superhero's powers (text feature)
intelligence_score, strength_score, speed_score, durability_score, power_score and combat_score. (power stats features)
"Origin" (full_name, alter_egos, …)
"Connections" (occupation, base, teams, …)
"Appearance" (gender, type_race, height, weight, eye_color, …)
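As a starting point for the overall_score question, here is a minimal sketch that measures how strongly each power stat correlates with overall_score. It uses only the Python standard library. The rows in SAMPLE_CSV and the `name` column are made-up placeholders that only mimic the column layout, not real values from the dataset; the helpers `to_float`, `pearson` and `stat_correlations` are hypothetical names introduced for illustration. A correlation is only a first hint, not the formula superherodb uses.

```python
import csv
import io
import math

# Made-up placeholder rows, NOT real dataset values; they only mimic the columns.
SAMPLE_CSV = """name,overall_score,intelligence_score,strength_score,speed_score,durability_score,power_score,combat_score
Hero A,10,80,60,50,70,90,65
Hero B,6,50,30,40,45,35,55
Hero C,20,95,100,90,100,100,85
Hero D,8,70,40,60,50,55,60
"""

STAT_COLUMNS = [
    "intelligence_score", "strength_score", "speed_score",
    "durability_score", "power_score", "combat_score",
]


def to_float(value):
    """Return value as a float, or None when the cell is not a plain number."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return float("nan")  # undefined when one column is constant
    return cov / (spread_x * spread_y)


def stat_correlations(rows):
    """Correlate each power stat with overall_score, skipping non-numeric cells."""
    result = {}
    for column in STAT_COLUMNS:
        pairs = [(to_float(r[column]), to_float(r["overall_score"])) for r in rows]
        pairs = [(x, y) for x, y in pairs if x is not None and y is not None]
        if len(pairs) < 2:
            result[column] = float("nan")
            continue
        xs, ys = zip(*pairs)
        result[column] = pearson(xs, ys)
    return result


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
correlations = stat_correlations(rows)
for column, r in correlations.items():
    print(f"{column}: {r:.2f}")
```

To try it on the real data, replace `io.StringIO(SAMPLE_CSV)` with an open handle to the downloaded CSV file.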
Your turn
There are numerous ways you can have fun with this dataset. Now it is up to you!
Some ideas to start:
- Who is the coolest superhero? Given only the two text columns, can you come up with a formula to find the coolest superhero?
- Who is the strongest superhero of all time? By combining the text features with the power stats features, can you say who is the strongest superhero of all time?
- Text classification: can you predict a superhero's creator using only the text columns? (Yes, you can!) Moreover, can you find a good way to cluster the data in an unsupervised manner?
- Who are the top 10 women superheroes? 23% of the superheroes are women; can you spot the top 10?
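For the text classification idea, one possible baseline is a small multinomial Naive Bayes classifier over word counts. The sketch below is only an illustration written with the Python standard library: the training snippets and the labels "Creator A" and "Creator B" are made-up placeholders, not rows from the dataset, and `tokenize`, `train_naive_bayes` and `predict` are hypothetical helper names. On the real data you would feed in history_text and powers_text as the text and the creator column as the label.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(texts, labels):
    """Count documents per label and word occurrences per label."""
    label_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return label_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-probability for the text."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, doc_count in label_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen during training
            count = word_counts[label][token]
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up placeholder snippets and labels, NOT taken from the dataset.
train_texts = [
    "A billionaire inventor builds a powered armor suit",
    "A scientist is exposed to radiation and gains great strength",
    "An alien orphan raised on a farm can fly",
    "A detective trains for years and fights crime in a dark city",
]
train_labels = ["Creator A", "Creator A", "Creator B", "Creator B"]

model = train_naive_bayes(train_texts, train_labels)
print(predict(model, "an armor suit built by an inventor"))
print(predict(model, "an alien who can fly"))
```

A word-count baseline like this is easy to inspect; TF-IDF weighting or embeddings are natural next steps once it works.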
Acknowledgements
The following GitHub repository contains the code used to scrape this dataset.
Original Data Source: Superheroes NLP Dataset