Subjectivity Question Answering Dataset

Education & Learning Analytics

Tags and Keywords

Education

NLP

Psychology

Free

About

This dataset focuses on question answering (QA) with an emphasis on subjectivity. It aims to explore how extractive QA systems perform when dealing with less factual answers and how modelling subjectivity can enhance their performance. Subjectivity refers to the expression of internal opinions or beliefs that cannot be objectively observed. The dataset comprises approximately 10,000 questions paired with reviews from six diverse domains: books, movies, grocery, electronics, hotels (TripAdvisor), and restaurants. Each question and its corresponding answer span (highlighted within a review) are assigned a subjectivity label by annotators. This allows for studying the intricate interactions between subjectivity and QA performance, as a subjective question may not always lead to a subjective answer.
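One way to study the interaction between subjectivity and QA performance is to score predictions separately for subjective and non-subjective answers. The sketch below assumes a SQuAD-style token-overlap F1, which is a common choice for extractive QA but is not stated in this listing as the dataset's official metric. The `prediction` field and the example rows are made up for illustration; `human_ans_spans` and `is_ans_subjective` are the column names described under Columns.

```python
from collections import Counter, defaultdict


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted span and a reference span."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def f1_by_subjectivity(examples):
    """Average token F1, grouped by the boolean answer-subjectivity label."""
    buckets = defaultdict(list)
    for ex in examples:
        score = token_f1(ex["prediction"], ex["human_ans_spans"])
        buckets[ex["is_ans_subjective"]].append(score)
    return {label: sum(s) / len(s) for label, s in buckets.items()}


# Made-up examples; "prediction" is a hypothetical model output.
examples = [
    {"prediction": "the staff were friendly",
     "human_ans_spans": "the staff were friendly",
     "is_ans_subjective": True},
    {"prediction": "friendly staff",
     "human_ans_spans": "the staff were friendly",
     "is_ans_subjective": True},
    {"prediction": "two pools",
     "human_ans_spans": "two pools",
     "is_ans_subjective": False},
]
results = f1_by_subjectivity(examples)
```

Comparing the two averages gives a first indication of whether a system handles subjective answers worse than factual ones.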

Columns

  • domain: The category or domain of the review, such as hotels or books.
  • question: The question, crafted based on a query opinion.
  • review: The review that mentions the neighbouring opinion.
  • human_ans_spans: The text span labelled by annotators as the answer.
  • human_ans_indices: The character-level start and end indices of the highlighted answer span.
  • question_subj_level: The subjectivity level of the question, on a 1 to 5 scale (1 being the most subjective).
  • ques_subj_score: The subjectivity score of the question, computed using the TextBlob package.
  • is_ques_subjective: A boolean subjectivity label for the question, derived from question_subj_level (levels below 4 are considered subjective).
  • answer_subj_level: The subjectivity level of the answer span, on a 1 to 5 scale (1 being the most subjective).
  • ans_subj_score: The subjectivity score of the answer span, computed using the TextBlob package.
  • is_ans_subjective: A boolean subjectivity label for the answer, derived from answer_subj_level (levels below 4 are considered subjective).
  • nn_mod: The modifier of the neighbouring opinion, which appears in the review.
  • nn_asp: The aspect of the neighbouring opinion, which appears in the review.
  • query_mod: The modifier of the query opinion around which a question is manually written.
  • query_asp: The aspect of the query opinion around which a question is manually written.
  • item_id: The unique identifier of the item or business discussed in the review.
  • review_id: A unique identifier associated with the review.
  • q_review_id: A unique identifier assigned to the question-review pair.
  • q_reviews_id: A unique identifier assigned to all question-review pairs sharing a common question.
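The relationships between several of these columns can be sketched in a few lines of Python. The example below makes three assumptions that the listing does not confirm: that human_ans_indices is stored as a string such as "(4, 21)", that its end offset is exclusive, and that the TextBlob scores were produced with the library's default sentiment analyser. The row is made up, and the TextBlob helper returns None when the package is not installed.

```python
import ast

# From the column descriptions: levels below 4 count as subjective.
SUBJECTIVE_BELOW = 4


def is_subjective(level: int) -> bool:
    """Derive the boolean label from a 1 to 5 subjectivity level."""
    return level < SUBJECTIVE_BELOW


def extract_span(review: str, indices: str) -> str:
    """Slice the answer span out of a review.

    Assumes `indices` looks like "(start, end)" with character-level
    offsets and an exclusive end offset.
    """
    start, end = ast.literal_eval(indices)
    return review[start:end]


def textblob_subjectivity(text: str):
    """Return TextBlob's subjectivity score in [0, 1], or None if unavailable."""
    try:
        from textblob import TextBlob
    except ImportError:
        return None
    return TextBlob(text).sentiment.subjectivity


# Made-up row for illustration; not taken from the dataset.
row = {
    "review": "The room was spotless and the staff were friendly.",
    "human_ans_indices": "(4, 21)",
    "answer_subj_level": "2",
}
span = extract_span(row["review"], row["human_ans_indices"])
label = is_subjective(int(row["answer_subj_level"]))
score = textblob_subjectivity(span)
```

If the real files store the indices in a different form, only extract_span needs to change.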

Distribution

The dataset files are provided in standard CSV format and together contain approximately 10,000 questions over reviews. Row counts for individual files are not available.
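Since per-file row counts are not published, a quick first step after downloading is to count the records yourself. The sketch below uses Python's standard csv module on a small in-memory sample; the sample rows are made up and include only three of the documented columns, and the real file names are not given in this listing.

```python
import csv
import io
from collections import Counter

# Stand-in for one of the CSV files; the rows are invented for illustration.
SAMPLE_CSV = """domain,question,review
hotels,How is the staff?,The staff were friendly and helpful.
books,Is the plot good?,The plot drags in the middle.
hotels,Is the room clean?,The room was spotless.
"""


def count_by_domain(handle) -> Counter:
    """Count rows per value of the `domain` column in an open CSV file."""
    reader = csv.DictReader(handle)
    return Counter(row["domain"] for row in reader)


counts = count_by_domain(io.StringIO(SAMPLE_CSV))
total = sum(counts.values())
```

For a downloaded file, the same function can be called on a handle opened with open(path, newline="", encoding="utf-8"), where path is wherever you saved the file.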

Usage

This dataset is ideal for:
  • Developing and evaluating extractive Question Answering (QA) systems, particularly those dealing with subjective content.
  • Research into modelling subjectivity in text.
  • Improving performance in sentiment analysis and word sense disambiguation.
  • Extracting opinions from user-generated reviews, where opinions are modelled as (modifier, aspect) pairs (e.g., "good, hotel" or "terrible, acting").
  • Applying Matrix Factorisation techniques to identify implication relationships between expressed opinions, such as "responsive keys" implying "good keyboard".
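The (modifier, aspect) representation maps directly onto the query_mod, query_asp, nn_mod, and nn_asp columns. The sketch below is not a matrix factorisation; it only builds the co-occurrence counts between query opinions and neighbouring opinions that such a method could take as input. The rows are made up, reusing the examples mentioned above.

```python
from collections import Counter


def opinion_pairs(rows) -> Counter:
    """Count how often each query opinion co-occurs with a neighbouring opinion.

    Each opinion is a (modifier, aspect) tuple built from the dataset columns.
    """
    counts = Counter()
    for row in rows:
        query = (row["query_mod"], row["query_asp"])
        neighbour = (row["nn_mod"], row["nn_asp"])
        counts[(query, neighbour)] += 1
    return counts


# Made-up rows for illustration; not taken from the dataset.
rows = [
    {"query_mod": "good", "query_asp": "keyboard",
     "nn_mod": "responsive", "nn_asp": "keys"},
    {"query_mod": "good", "query_asp": "keyboard",
     "nn_mod": "responsive", "nn_asp": "keys"},
    {"query_mod": "terrible", "query_asp": "acting",
     "nn_mod": "wooden", "nn_asp": "performance"},
]
pairs = opinion_pairs(rows)
```

Arranging these counts as a matrix, with query opinions as rows and neighbouring opinions as columns, would give one possible starting point for a factorisation.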

Coverage

The dataset's geographic scope is global. It encompasses content from six different domains: books, movies, grocery, electronics, TripAdvisor (hotels), and restaurants. The dataset was listed on 17 June 2025. Specific demographic information or a fixed time range for the reviews themselves is not provided.

License

CC0

Who Can Use It

This dataset is valuable for researchers, developers, and data scientists working in Natural Language Processing (NLP) and Artificial Intelligence. It can be used by those aiming to:
  • Build more nuanced QA systems capable of handling subjective queries.
  • Analyse and understand opinions expressed in online reviews.
  • Improve algorithms for sentiment analysis and text comprehension.
  • Develop tools for automated opinion extraction and summarisation.

Dataset Name Suggestions

  • Subjectivity Question Answering Dataset
  • SubjQA for Subjective Review Comprehension
  • Opinion-Based QA Dataset
  • Review Subjectivity QA Corpus
  • Subjective Review Question Answering

Attributes

Listing Stats

  • Views: 0
  • Downloads: 0
  • Listed: 17/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
