HappyWhale Classification Data
Data Science and Analytics
Free
About
This resource provides detailed taxonomic classification for the cetacean species involved in the HappyWhale competition. The information, curated from Wikipedia, organises species by infraorder, family, genus, and specific species name. It serves as a foundational reference for understanding genetic relationships and predicting fin variations among marine mammals. The data accounts for identified spelling errors in the original source material and clarifies species entries that correspond to a single taxon, ensuring accuracy for analysis.
Columns
The dataset contains eight fields, essential for defining the classification structure:
- specy_id: The identifier used for species within the competition metadata (26 unique values).
- infraorder: The highest level of classification provided; uniformly listed as Cetacea.
- family: The family classification level (e.g., Delphinidae, Balaenopteridae).
- genus: The genus classification level (18 unique values).
- specy: The specific species name (26 unique values).
- wikipedia: A URL link directly to the Wikipedia article defining the species.
- image: A URL link to an image of an individual specimen.
- size: A URL link to an image comparing the size of the species to a human. Two records have no value in this field.
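The column list above can be checked against the file before any analysis. The following is a minimal sketch using only the Python standard library; the helper names `load_species` and `count_missing` are hypothetical, and the two sample rows (including the example.org URLs and the full binomial in the specy column) are illustrative stand-ins rather than values copied from species.csv.

```python
import csv
import io

# Column names as documented above; the order in the real file is an assumption.
EXPECTED_COLUMNS = [
    "specy_id", "infraorder", "family", "genus",
    "specy", "wikipedia", "image", "size",
]


def load_species(handle):
    """Read all rows and fail early if a documented column is absent."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def count_missing(rows, column):
    """Count rows whose value in `column` is empty or whitespace."""
    return sum(1 for row in rows if not (row[column] or "").strip())


# Illustrative sample only. For the real file, pass
# open("species.csv", newline="", encoding="utf-8") instead.
SAMPLE = """specy_id,infraorder,family,genus,specy,wikipedia,image,size
humpback_whale,Cetacea,Balaenopteridae,Megaptera,Megaptera novaeangliae,https://example.org/wiki,https://example.org/img,https://example.org/size
beluga,Cetacea,Monodontidae,Delphinapterus,Delphinapterus leucas,https://example.org/wiki,https://example.org/img,
"""

rows = load_species(io.StringIO(SAMPLE))
print(len(rows), "rows;", count_missing(rows, "size"), "without a size image")
```

Run against the real file, the same two calls should report 26 rows and two records without a size image, if the description above is accurate.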
Distribution
The information is delivered in a single CSV file, species.csv, with a file size of 6.99 kB. It has 8 columns and 26 entries, one per distinct species identifier. All fields are fully populated except for the size comparison image field, which has two missing records.
Usage
This classification system is ideal for enhancing machine learning models, particularly those attempting to identify individual whales or dolphins. It can be used to generate species catalogues that include large images and taxonomic hierarchy. It is specifically beneficial for research focusing on fin variation prediction and the structural relationships between marine mammal species.
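As a rough sketch of the feature engineering use described above, taxonomy levels can be attached to each competition image by looking up its species label in this table. The code assumes the competition metadata has a `species` column whose values match `specy_id`; the helper names, image names, and individual IDs are hypothetical, and the sample CSV text is illustrative.

```python
import csv
import io


def build_lookup(species_rows):
    """Index taxonomy rows by their specy_id."""
    return {row["specy_id"]: row for row in species_rows}


def attach_taxonomy(train_rows, lookup, levels=("family", "genus")):
    """Return copies of train_rows with the requested taxonomy levels added.

    Labels that are absent from the lookup get None for every level.
    """
    enriched = []
    for row in train_rows:
        taxon = lookup.get(row["species"])
        extra = {level: (taxon[level] if taxon else None) for level in levels}
        enriched.append({**row, **extra})
    return enriched


# Illustrative data only: a subset of columns and made-up image/individual IDs.
SPECIES_CSV = """specy_id,family,genus
bottlenose_dolphin,Delphinidae,Tursiops
humpback_whale,Balaenopteridae,Megaptera
"""
TRAIN_CSV = """image,species,individual_id
img_0001.jpg,humpback_whale,id_a
img_0002.jpg,bottlenose_dolphin,id_b
img_0003.jpg,label_not_in_table,id_c
"""

lookup = build_lookup(csv.DictReader(io.StringIO(SPECIES_CSV)))
enriched = attach_taxonomy(csv.DictReader(io.StringIO(TRAIN_CSV)), lookup)
for row in enriched:
    print(row["image"], row["family"], row["genus"])
```

Returning None for unmatched labels, rather than raising, makes it easy to spot labels in the metadata that still need to be mapped to a `specy_id`.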
Coverage
The data focuses exclusively on the taxonomy of the 26 cetacean species relevant to the HappyWhale project. The taxonomy spans four levels: infraorder (always Cetacea), family, genus, and species (the specy column). Specific taxonomic notes are included, such as the grouping of the pilot_whale and globis entries under the Globicephala melas taxon.
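One way to surface identifiers that point at the same taxon is to group `specy_id` values by genus and species name. This is a sketch under a stated assumption: that a shared taxon appears as identical `genus` and `specy` values on both rows. How the file actually encodes the grouping is not stated here, and the function name and sample rows are illustrative.

```python
from collections import defaultdict


def ids_sharing_a_taxon(rows):
    """Map (genus, specy) to the sorted specy_ids that share it, if more than one."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["genus"], row["specy"])].append(row["specy_id"])
    return {taxon: sorted(ids) for taxon, ids in groups.items() if len(ids) > 1}


# Illustrative rows; the value format of the specy column is an assumption.
sample_rows = [
    {"specy_id": "pilot_whale", "genus": "Globicephala", "specy": "Globicephala melas"},
    {"specy_id": "globis", "genus": "Globicephala", "specy": "Globicephala melas"},
    {"specy_id": "beluga", "genus": "Delphinapterus", "specy": "Delphinapterus leucas"},
]

shared = ids_sharing_a_taxon(sample_rows)
for taxon, ids in shared.items():
    print(taxon, ids)
```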
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
- Data Scientists: Utilising species classification as a feature engineering step for identification algorithms.
- Marine Biologists: Researchers needing a verified and structured list of cetacean taxonomy.
- Educators: Creating educational materials on marine mammal classification and biology.
- Competition Participants: Individuals in the HappyWhale competition needing supplementary data on taxonomic relationships.
Dataset Name Suggestions
- Cetacean Species Taxonomy List
- HappyWhale Classification Data
- Marine Species Hierarchy
- Whale Taxonomy Reference CSV
Attributes
Original Data Source: HappyWhale Classification Data
