Wikia Comic Character Archive
Data Science and Analytics
Free
About
This dataset features detailed information on comic book characters sourced from Marvel Wikia and DC Wikia. The material was initially collected to analyze demographic trends and patterns in character creation within the comic industry, particularly concerning gender, alignment, and physical attributes. It includes key character identifiers, physical attributes such as eye and hair color, status (alive or deceased), and first appearance dates. The character appearance counts reflect statistics recorded as of September 2, 2014.
Columns
The dataset is split into two files, with dc-wikia-data.csv containing 13 columns across 6,896 records:
- page_id: The unique identifier for that character's page within the wikia. This field is 100% valid, with a mean value of 147,000.
- name: The name of the character. This field is 100% valid, with 6,896 unique values.
- urlslug: The unique URL component for the character within the wikia. This field is 100% valid.
- ID: The identity status of the character (e.g., Public Identity, Secret Identity). This field is 71% valid, with Public Identity accounting for 36% of entries.
- ALIGN: Indicates the character's moral alignment (Good, Bad, or Neutral). This field is 91% valid, with Bad Characters and Good Characters making up the majority of entries (42% and 41%, respectively).
- EYE: The eye color of the character. This field is 47% valid, missing 53% of records, with Blue Eyes being the most common color (16%).
- HAIR: The hair color of the character. This field is 67% valid, with Black Hair being the most common color (23%).
- SEX: The character's sex (e.g., Male or Female). Male Characters account for 69% of the data. This field is 98% valid.
- GSM: Indicates if the character belongs to a gender or sexual minority. This field is highly incomplete, with 99% of data missing.
- ALIVE: Status indicating if the character is Living (75%) or Deceased (25%). This field is nearly 100% valid.
- APPEARANCES: The total count of the character's appearances in comic books. Values range up to 3,093. This field is 95% valid.
- FIRST APPEARANCE: The month and year of the character's first appearance in a comic book. This field is 99% valid.
- YEAR: The year of the first appearance, ranging from 1935 to 2013. This field is 99% valid.
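The "% valid" figures above can be re-derived from the file itself. The following is a minimal sketch using only the Python standard library, assuming that "valid" means a non-empty cell. The helper name percent_valid and the sample rows are hypothetical and are not taken from the real file; in particular, the formatting of the FIRST APPEARANCE and ALIVE values is a placeholder.

```python
import csv
import io


def percent_valid(rows, columns):
    """Return {column: percent of rows whose value is non-empty}.

    Assumption: "valid" is treated as "non-empty after stripping whitespace".
    """
    total = len(rows)
    result = {}
    for col in columns:
        filled = sum(1 for row in rows if (row.get(col) or "").strip())
        result[col] = round(100.0 * filled / total, 1) if total else 0.0
    return result


# Made-up sample in the documented 13-column layout (not real records).
SAMPLE_CSV = """page_id,name,urlslug,ID,ALIGN,EYE,HAIR,SEX,GSM,ALIVE,APPEARANCES,FIRST APPEARANCE,YEAR
1,Example Hero,/wiki/Example_Hero,Secret Identity,Good Characters,Blue Eyes,Black Hair,Male Characters,,Living,120,May 1940,1940
2,Example Villain,/wiki/Example_Villain,,Bad Characters,,,Male Characters,,Deceased,15,June 1985,1985
3,Example Bystander,/wiki/Example_Bystander,Public Identity,,,Brown Hair,Female Characters,,Living,,March 2001,2001
4,Example Rogue,/wiki/Example_Rogue,Public Identity,Bad Characters,Green Eyes,,Female Characters,,Living,3,July 2010,2010
"""

reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
rows = list(reader)
validity = percent_valid(rows, reader.fieldnames)

for column, pct in validity.items():
    print(f"{column}: {pct}% valid")
```

To run the same check on the real data, replace io.StringIO(SAMPLE_CSV) with an open file handle for dc-wikia-data.csv, for example open("dc-wikia-data.csv", newline="", encoding="utf-8"). The encoding is an assumption and may need adjusting.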
Distribution
The material is organized into two primary files: dc-wikia-data.csv (1.11 MB) and marvel-wikia-data.csv. The DC file contains 6,896 records. Identifying fields such as the character name (name) and page identifier (page_id) are 100% valid. However, the fields for eye color (EYE) and gender or sexual minority status (GSM) have substantial missing data. The data reflects a snapshot collected in 2014. The expected update frequency is annual.
Usage
This resource is suitable for analyzing character demographics, investigating alignment statistics (Good versus Bad), and studying how character attributes change over time using the YEAR field. It also enables comparative analysis between the two major comic universes, Marvel and DC.
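As an illustration of the alignment-over-time analysis mentioned above, here is a small standard-library sketch that counts ALIGN values per decade of first appearance. The function name alignment_by_decade and the sample rows are hypothetical. The sketch assumes YEAR may be stored either as an integer string or as a float string such as "1948.0", and it skips rows where YEAR or ALIGN is missing.

```python
from collections import Counter, defaultdict


def alignment_by_decade(rows):
    """Count ALIGN values per decade of YEAR, skipping incomplete rows."""
    table = defaultdict(Counter)
    for row in rows:
        year = (row.get("YEAR") or "").strip()
        align = (row.get("ALIGN") or "").strip()
        if not year or not align:
            continue  # missing data: leave the row out of the tally
        try:
            # int(float(...)) tolerates values such as "1948.0"
            decade = int(float(year)) // 10 * 10
        except ValueError:
            continue  # unparseable year
        table[decade][align] += 1
    return {decade: dict(counts) for decade, counts in sorted(table.items())}


# Made-up rows for illustration only (not real records).
sample_rows = [
    {"YEAR": "1940", "ALIGN": "Good Characters"},
    {"YEAR": "1941", "ALIGN": "Bad Characters"},
    {"YEAR": "1948.0", "ALIGN": "Good Characters"},
    {"YEAR": "1985", "ALIGN": "Bad Characters"},
    {"YEAR": "", "ALIGN": "Good Characters"},
    {"YEAR": "2001", "ALIGN": ""},
]

by_decade = alignment_by_decade(sample_rows)
for decade, counts in by_decade.items():
    print(decade, counts)
```

The same function accepts the list of rows produced by csv.DictReader over dc-wikia-data.csv. Column names in marvel-wikia-data.csv are not documented here, so they should be checked before reusing the function on that file.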
Coverage
The scope covers characters derived from the Marvel and DC comic universes. The temporal span of character introductions ranges from 1935 up to 2013. The content includes biographical details, alignment, physical attributes, and appearance counts.
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
The dataset is intended for researchers studying media representation, comic book historians, and data scientists interested in text and demographic analysis of fictional content.
Dataset Name Suggestions
- Marvel and DC Character Demographics
- Wikia Comic Character Archive
- Comic Book Character Attributes
Attributes
Original Data Source: Wikia Comic Character Archive
