BirdCLEF Unified Training Metadata 2021-2023

Data Science and Analytics

Tags and Keywords

Birdclef

Bioacoustics

Ornithology

Audio

Metadata

Free

About

This dataset unifies the training metadata from the Kaggle BirdCLEF 2021, 2022, and 2023 competitions into a single resource for avian bioacoustics research. It contains cleaned training labels and file paths, and supports the development of machine learning models that identify bird species by sound. To reduce redundancy, 6,686 duplicates that overlapped between competitions were excluded, and ambiguous samples found in multiple class folders were removed.

Columns

  • primary_label: The code representing the primary bird species identified in the recording (e.g., 'houspa').
  • secondary_labels: A list of codes for any background bird species audible in the recording.
  • type: The classification of the sound, such as 'song' or 'call'.
  • filename: The unique identifier for the audio file (e.g., XC316684.ogg).
  • filepath: The relative location path to the audio file within the directory structure.
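
As a rough illustration of how these five columns might be read, the sketch below parses rows with Python's standard csv module and converts secondary_labels into a list. It assumes that secondary_labels is stored as a Python-style list string such as "['amerob']", which the listing does not confirm. The two sample rows, the second filename, and the folder layout in filepath are hypothetical.

```python
import ast
import csv
import io


def parse_label_list(raw):
    """Turn a cell such as "['amerob', 'norcar']" into a list of strings."""
    raw = (raw or "").strip()
    if not raw:
        return []
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Not a Python literal: treat the cell as one plain label.
        return [raw]
    if isinstance(value, (list, tuple)):
        return [str(v) for v in value]
    return [str(value)]


def read_metadata(handle):
    """Read all rows, replacing secondary_labels with a parsed list."""
    rows = []
    for row in csv.DictReader(handle):
        row["secondary_labels"] = parse_label_list(row.get("secondary_labels"))
        rows.append(row)
    return rows


# Hypothetical sample; the list encoding and filepath layout are assumptions.
SAMPLE = """primary_label,secondary_labels,type,filename,filepath
houspa,"['amerob']",song,XC316684.ogg,houspa/XC316684.ogg
amerob,[],call,XC000001.ogg,amerob/XC000001.ogg
"""

rows = read_metadata(io.StringIO(SAMPLE))
for row in rows:
    print(row["primary_label"], row["secondary_labels"], row["filepath"])
```

For the real file, the same function could be called on a handle opened with open("train_21_22_23.csv", newline="", encoding="utf-8").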

Distribution

  • Format: CSV (train_21_22_23.csv)
  • Size: 9.64 MB
  • Rows: Approximately 87,900 records
  • Structure: 5 columns
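
After downloading, a quick sanity check against the figures above can catch a truncated or mismatched file. The following is a minimal sketch that compares the header with the five documented column names and counts the records; the function name check_schema and the in-memory sample are illustrative only.

```python
import csv
import io

EXPECTED_COLUMNS = ["primary_label", "secondary_labels", "type", "filename", "filepath"]


def check_schema(handle):
    """Return the number of data rows, or raise if the header is unexpected."""
    reader = csv.DictReader(handle)
    # Compare as sorted lists so that column order does not matter.
    if sorted(reader.fieldnames or []) != sorted(EXPECTED_COLUMNS):
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return sum(1 for _ in reader)


# Hypothetical two-row sample standing in for train_21_22_23.csv.
SAMPLE = """primary_label,secondary_labels,type,filename,filepath
houspa,[],song,XC316684.ogg,houspa/XC316684.ogg
amerob,[],call,XC000001.ogg,amerob/XC000001.ogg
"""

n_rows = check_schema(io.StringIO(SAMPLE))
print(n_rows)
```

On the real CSV the returned count should be close to the 87,900 records stated above.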

Usage

The dataset is suited to training and testing audio classification models, particularly in ornithology and wildlife monitoring. It supports applications in:
  • Bioacoustics research
  • Automated bird identification systems
  • Educational tools for biology
  • Environmental monitoring analysis
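
For the training and testing use case, one common approach is to split the metadata per species so that each primary_label appears in the training set. The sketch below is one possible way to do this with the standard library; the 20% validation fraction, the seed, and the function name stratified_split are assumptions, not part of the dataset.

```python
import random
from collections import defaultdict


def stratified_split(rows, val_fraction=0.2, seed=0):
    """Split rows into (train, val), sampling validation rows per label."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["primary_label"]].append(row)

    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        # Rounds down, so labels with very few recordings stay in training.
        n_val = int(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val


# Hypothetical rows: ten recordings of one species, one of another.
demo_rows = [{"primary_label": "houspa", "filename": f"a{i}.ogg"} for i in range(10)]
demo_rows.append({"primary_label": "amerob", "filename": "b0.ogg"})

train_rows, val_rows = stratified_split(demo_rows)
print(len(train_rows), len(val_rows))
```

Rare species end up only in the training set under this scheme, which may or may not suit a given evaluation protocol.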

Coverage

  • Geographic/Taxonomic Scope: Covers 768 unique primary bird species labels.
  • Time Range: Aggregates data from the 2021, 2022, and 2023 competition cycles.
  • Demographic/Data Notes: 'houspa' is the most common primary label, at about 1% of records. In the 'type' field, song accounts for 31% of records and call for 24%.
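
Figures like these can be recomputed from the CSV by counting values per column. The sketch below shows one way to do so with collections.Counter; the function name label_summary and the four sample rows are hypothetical, and the shares it produces apply only to that sample.

```python
from collections import Counter


def label_summary(rows, column):
    """Return (number of distinct values, {value: share}) for one column."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return 0, {}
    # most_common() orders values from most to least frequent.
    shares = {value: n / total for value, n in counts.most_common()}
    return len(counts), shares


# Hypothetical sample rows, not taken from the dataset.
sample_rows = [
    {"primary_label": "houspa", "type": "song"},
    {"primary_label": "houspa", "type": "call"},
    {"primary_label": "amerob", "type": "song"},
    {"primary_label": "norcar", "type": "song"},
]

n_unique, shares = label_summary(sample_rows, "primary_label")
print(n_unique, shares)
```

Run on the full file, the first value for primary_label would be expected to match the 768 unique labels stated above.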

License

CC BY-NC-SA 4.0

Who Can Use It

  • Data Scientists and Machine Learning Engineers
  • Ornithologists and Biologists
  • Conservationists
  • Audio Signal Processing Researchers
  • Educators in Life Sciences

Dataset Name Suggestions

  • BirdCLEF Unified Training Metadata 2021-2023
  • Consolidated Avian Bioacoustics Labels
  • Kaggle BirdCLEF 3-Year Aggregate Metadata
  • Cleaned Bird Sound Classification Index

Attributes

Listing Stats

  • Views: 2
  • Downloads: 0
  • Listed: 07/12/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
