Opendatabay APP

WinoBias Resolution Dataset

Social Media and Networking

Tags and Keywords

Social

Science

Data

Visualization

NLP


Free

About

This dataset supports coreference resolution research, with a specific focus on gender bias. It contains Winograd-schema style sentences in which entities are referred to by occupation, such as the nurse, the doctor, or the carpenter. The goal is to test whether a model resolves coreference in these sentences correctly regardless of the gender stereotypes attached to each occupation. By examining how a model links words to their referents in context, users can identify cases where it perpetuates gender stereotypes. Each entry includes part-of-speech tags, parse bits, word senses, speaker information, named entity recognition tags, verbal predicates, predicate lemmas, and coreference clusters.

Columns

The dataset includes several CSV files. Some columns are shared across files; the key columns and their descriptions are as follows:
  • part_number: The number of the sentence part in the dataset (Integer).
  • word_number: The position of the word in the sentence (Integer).
  • tokens: The individual words in each sentence (Text).
  • pos_tags: Part-of-speech tags associated with each token (Text).
  • parse_bit: Syntactic structure information for each token (Text).
  • predicate_lemma: The lemma of the verb used in the sentence (Text).
  • word_sense: The sense of each word in context (Text).
  • speaker: The speaker in each sentence (Text).
  • ner_tags: Named entity recognition tags that identify specific types like organisations or locations (Text).
  • verbal_predicates: Verbal predicates in sentences identified by their corresponding verbs (Text).
  • coreference_clusters: Groups of words that refer to the same entity (Text).
  • document_id: Document identifier (specific to type1_anti_test.csv).
  • predicate_framenet_id: Predicate FrameNet ID (specific to type1_anti_test.csv).
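Because the column set can differ between files, it may help to compare a file's header against the documented columns before processing it. The following is a minimal sketch using only the Python standard library. The function name check_columns and the in-memory sample are hypothetical, and the sample cell values are made up for illustration; they are not taken from the dataset.

```python
import csv
import io

# Column names as documented in the list above.
CORE_COLUMNS = [
    "part_number", "word_number", "tokens", "pos_tags", "parse_bit",
    "predicate_lemma", "word_sense", "speaker", "ner_tags",
    "verbal_predicates", "coreference_clusters",
]


def check_columns(csv_file):
    """Return (missing, extra) column names relative to CORE_COLUMNS."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [c for c in CORE_COLUMNS if c not in header]
    extra = [c for c in header if c not in CORE_COLUMNS]
    return missing, extra


# Hypothetical in-memory sample standing in for a real file.
# The header is deliberately incomplete and the values are invented.
sample = io.StringIO(
    "document_id,part_number,word_number,tokens,pos_tags\n"
    "doc0,0,0,The,DT\n"
)

missing, extra = check_columns(sample)
print("missing:", missing)
print("extra:", extra)
```

For a real file, pass an open file handle instead, for example open("type1_anti_test.csv", newline="", encoding="utf-8"). The encoding is an assumption; the listing does not state it.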

Distribution

The dataset is provided as CSV files containing Winograd-schema style sentences. Row counts per file are not available in the provided details. The files include type2_anti_validation.csv, type2_pro_test.csv, type1_pro_validation.csv, and type1_anti_test.csv. Each file name indicates the sentence type (type1 or type2), whether the sentences are pro-stereotypical or anti-stereotypical (pro or anti), and the split (validation or test).
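The file names listed above appear to follow a type, pro/anti, split pattern. As a sketch under that assumption, the parts can be extracted with a regular expression so that files are easy to group for evaluation. The pattern is inferred only from the four names mentioned in this listing; the FileInfo type and parse_filename function are hypothetical helpers, and other files in the archive may not match.

```python
import re
from typing import NamedTuple


class FileInfo(NamedTuple):
    sentence_type: str  # "type1" or "type2"
    stereotype: str     # "pro" or "anti"
    split: str          # "validation" or "test"


# Pattern inferred from the file names in the listing (an assumption).
_PATTERN = re.compile(r"^(type[12])_(pro|anti)_(validation|test)\.csv$")


def parse_filename(name):
    """Split a file name such as 'type1_anti_test.csv' into its parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    return FileInfo(*match.groups())


files = [
    "type2_anti_validation.csv",
    "type2_pro_test.csv",
    "type1_pro_validation.csv",
    "type1_anti_test.csv",
]

for name in files:
    print(name, "->", parse_filename(name))
```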

Usage

This dataset is ideal for several applications and use cases:
  • Bias Detection: Evaluate and measure the presence of gender bias in coreference resolution models.
  • Model Improvement: Enhance existing coreference resolution models by training them on examples designed to expose gender bias.
  • Algorithm Development: Develop new algorithms or techniques specifically for addressing gender bias in coreference resolution.
  • Evaluation: Assess the performance of coreference resolution models, gaining insights into potential areas of bias within algorithms.
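One simple way to quantify bias for the detection and evaluation use cases above is to score a model separately on the "pro" and "anti" files and compare the results. The sketch below uses plain accuracy over predicted antecedents. The function names, the choice of accuracy as the metric, and all labels are assumptions made for illustration, not part of the dataset or of any particular evaluation protocol.

```python
def accuracy(gold, predicted):
    """Fraction of predictions that match the gold labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty set")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)


def bias_gap(pro_gold, pro_pred, anti_gold, anti_pred):
    """Accuracy on the 'pro' set minus accuracy on the 'anti' set.

    A value near 0 suggests similar performance on both sets; a large
    positive value suggests the model leans on the stereotype.
    """
    return accuracy(pro_gold, pro_pred) - accuracy(anti_gold, anti_pred)


# Made-up antecedent labels, one per sentence (illustration only).
pro_gold = ["nurse", "doctor", "carpenter", "doctor"]
pro_pred = ["nurse", "doctor", "carpenter", "doctor"]
anti_gold = ["nurse", "doctor", "carpenter", "doctor"]
anti_pred = ["nurse", "nurse", "carpenter", "carpenter"]

gap = bias_gap(pro_gold, pro_pred, anti_gold, anti_pred)
print(f"pro/anti accuracy gap: {gap:.2f}")
```

In practice the gold labels would come from the coreference_clusters column and the predictions from the model under test; how to derive an antecedent from that column depends on its exact encoding, which the listing does not specify.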

Coverage

The dataset's scope is global. While it was listed on 21/06/2025, no specific time range for the data collection itself is provided. Its demographic scope focuses on gender bias as it relates to occupations, using examples like "the nurse" or "the doctor" to explore how gender stereotypes might be perpetuated or addressed in language processing.

License

CC0

Who Can Use It

This dataset is primarily intended for researchers, developers, and evaluators who are working on improving coreference resolution systems. Users can leverage this data to:
  • Analyse data and perform necessary pre-processing steps based on specific research or analysis goals, such as gender bias detection.
  • Develop new coreference resolution models or evaluate existing ones.
  • Gain a better understanding of gender biases present in coreference resolution and find ways to mitigate such biases.

Dataset Name Suggestions

  • Gender-Biased Coreference Data
  • Occupation Coreference Bias Dataset
  • NLP Gender Fairness Data
  • WinoBias Resolution Dataset

Attributes

Original Data Source: WinoBias Coreference Dataset

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 21/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
