Subjectivity Question Answering Dataset
About
This dataset focuses on question answering (QA) with an emphasis on subjectivity. It aims to explore how extractive QA systems perform when dealing with less factual answers and how modelling subjectivity can enhance their performance. Subjectivity refers to the expression of internal opinions or beliefs that cannot be objectively observed. The dataset comprises approximately 10,000 questions paired with reviews from six diverse domains: books, movies, grocery, electronics, hotels (TripAdvisor), and restaurants. Each question and its corresponding answer span (highlighted within a review) are assigned a subjectivity label by annotators. This allows for studying the intricate interactions between subjectivity and QA performance, as a subjective question may not always lead to a subjective answer.
Columns
- domain: The category or domain of the review, such as hotels or books.
- question: The question, crafted based on a query opinion.
- review: The review that mentions the neighbouring opinion.
- human_ans_spans: The text span labelled by annotators as the answer.
- human_ans_indices: The character-level start and end indices of the highlighted answer span within the review (see the loading sketch after this list).
- question_subj_level: The subjectivity level of the question, on a 1 to 5 scale (1 being the most subjective).
- ques_subj_score: The subjectivity score of the question, computed using the TextBlob package.
- is_ques_subjective: A boolean subjectivity label for the question, derived from question_subj_level (scores below 4 are considered subjective).
- answer_subj_level: The subjectivity level of the answer span, on a 1 to 5 scale (1 being the most subjective).
- ans_subj_score: The subjectivity score of the answer span, computed using the TextBlob package.
- is_ans_subjective: A boolean subjectivity label for the answer, derived from answer_subj_level (scores below 4 are considered subjective).
- nn_mod: The modifier of the neighbouring opinion, which appears in the review.
- nn_asp: The aspect of the neighbouring opinion, which appears in the review.
- query_mod: The modifier of the query opinion around which a question is manually written.
- query_asp: The aspect of the query opinion around which a question is manually written.
- item_id: The unique identifier of the item or business discussed in the review.
- review_id: A unique identifier associated with the review.
- q_review_id: A unique identifier assigned to the question-review pair.
- q_reviews_id: A unique identifier assigned to all question-review pairs sharing a common question.
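To make the column semantics concrete, here is a minimal loading sketch in Python. It is illustrative only: the per-domain file name is hypothetical, and it assumes human_ans_indices is stored as a "(start, end)" string, which should be checked against the actual CSV export. It also recomputes the TextBlob subjectivity score that the ques_subj_score and ans_subj_score columns are described as being based on.

```python
# Minimal sketch: load one domain's CSV and sanity-check the answer spans.
# The file name and the "(start, end)" string format of human_ans_indices
# are assumptions about this particular export, not guarantees.
import ast

import pandas as pd
from textblob import TextBlob  # used to reproduce the *_subj_score columns

df = pd.read_csv("subjqa_books.csv")  # hypothetical per-domain file name

row = df.iloc[0]
start, end = ast.literal_eval(str(row["human_ans_indices"]))  # e.g. "(67, 90)"
span_from_indices = row["review"][start:end]

# The character offsets should recover the annotated answer span text.
print("annotated span:", row["human_ans_spans"])
print("sliced span:   ", span_from_indices)

# TextBlob subjectivity ranges from 0 (objective) to 1 (subjective).
print("question subjectivity:", TextBlob(row["question"]).sentiment.subjectivity)
print("answer subjectivity:  ", TextBlob(str(row["human_ans_spans"])).sentiment.subjectivity)
```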
Distribution
The dataset files are provided in standard CSV format and together contain approximately 10,000 questions over reviews, spread across the six domains; per-file row counts are not published.
Usage
This dataset is ideal for:
- Developing and evaluating extractive Question Answering (QA) systems, particularly those dealing with subjective content (see the sketch after this list).
- Research into modelling subjectivity in text.
- Improving performance on sentiment analysis and word-sense disambiguation.
- Extracting opinions from user-generated reviews, where opinions are modelled as (modifier, aspect) pairs (e.g., "good, hotel" or "terrible, acting").
- Applying Matrix Factorisation techniques to identify implication relationships between expressed opinions, such as "responsive keys" implying "good keyboard".
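As an illustration of the extractive QA use case in the first bullet, the sketch below runs a generic SQuAD-trained reader over a question and review written in the dataset's style. The Hugging Face transformers library and the model checkpoint are assumptions for the sake of the example; neither is part of the dataset.

```python
# Minimal extractive-QA sketch: given a question and a review, a reader
# model predicts an answer span. The checkpoint is a generic SQuAD-trained
# reader chosen only for illustration; it is not distributed with SubjQA.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

question = "How comfortable are the beds?"  # invented example in the dataset's style
review = (
    "We stayed three nights. The location was perfect, although the beds "
    "were a little too soft for my taste and the breakfast was average."
)

pred = qa(question=question, context=review)
# pred holds the predicted span text, its character offsets, and a score,
# which can be compared against human_ans_spans / human_ans_indices.
print(pred["answer"], (pred["start"], pred["end"]), pred["score"])
```

Predicted spans and offsets can then be scored against the annotated answers, and results broken down by the question and answer subjectivity labels to study how subjectivity affects QA performance.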
Coverage
The dataset's geographic scope is global. It encompasses content from six different domains: books, movies, grocery, electronics, TripAdvisor (hotels), and restaurants. The dataset was listed on 17 June 2025. Specific demographic information or a fixed time range for the reviews themselves is not provided.
License
CC0
Who Can Use It
This dataset is valuable for researchers, developers, and data scientists working in Natural Language Processing (NLP) and Artificial Intelligence. It can be used by those aiming to:
- Build more nuanced QA systems capable of handling subjective queries.
- Analyse and understand opinions expressed in online reviews.
- Improve algorithms for sentiment analysis and text comprehension.
- Develop tools for automated opinion extraction and summarisation.
Dataset Name Suggestions
- Subjectivity Question Answering Dataset
- SubjQA for Subjective Review Comprehension
- Opinion-Based QA Dataset
- Review Subjectivity QA Corpus
- Subjective Review Question Answering
Attributes
Original Data Source: NLP SubjQA: Question Answering Dataset