Oncology Patient Experience Data
Health Information Systems & Technology
Free
About
This dataset collects patient comments and experiences about cancer treatments, prepared for sentiment analysis. It offers insight into patient feedback, including concerns and preferences. It is suited to developing machine learning models that classify sentiment, identify trends, and surface recurring themes in patient discussions of cancer therapies [1].
Columns
- cancer: This column identifies the specific type of cancer or acts as a general cancer indicator associated with the patient comment. For example, it may specify 'breast cancer' or be marked as 'none' if not specified [2-15].
- Comment: Contains the original, raw patient comment or experience regarding their cancer treatment [2-10, 12-15].
- tokens: Represents the tokenised version of the patient comment, where the text has been broken down into individual words or linguistic units [2].
- stopwordremove_tokens: Provides the tokenised text after common stopwords (e.g., 'the', 'is', 'and') have been removed, aiding in focused text analysis [2].
- lemmatized_text: Features the lemmatised version of the text, meaning words have been reduced to their base or dictionary form (lemma), which is useful for linguistic analysis [2].
- none: This column appears to be a placeholder or a field that was not populated with specific information in the provided excerpts [2-15].
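The first two processing stages described in these columns can be illustrated with a minimal Python sketch. The stopword list and the tokenisation rule below are assumptions made for illustration; the source does not document which tools or word lists produced the `tokens` and `stopwordremove_tokens` columns. Lemmatisation usually needs a linguistic library such as NLTK or spaCy and is not reproduced here.

```python
import re

# Small illustrative stopword list (an assumption; the dataset's
# actual stopword list is not documented in the source).
STOPWORDS = {"the", "is", "and", "a", "of", "to", "was", "my", "i"}

def tokenize(comment: str) -> list[str]:
    """Lowercase the comment and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", comment.lower())

def remove_stopwords(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]

# Invented example comment, not taken from the dataset.
comment = "The treatment was hard and the fatigue is constant"
tokens = tokenize(comment)
filtered = remove_stopwords(tokens)
```

Here `tokens` plays the role of the tokens column and `filtered` the role of stopwordremove_tokens for a single comment.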
Distribution
The dataset comprises 14,419 patient comments. The "Sentiments in Oncology" source does not specify the exact file format, although data files of this kind are typically provided in CSV format [1, 16]. Each comment is stored alongside several linguistic processing stages, such as tokenisation and lemmatisation. Details on the number of records per file are not available.
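If the data is delivered as CSV, which is an assumption since the format is not confirmed, a loader along the following lines could check that the documented columns are present before analysis. The function name `load_comments` is hypothetical, and the sample rows are invented for illustration, including the space-separated layout of the processed columns.

```python
import csv
import io

# Column names as listed in the Columns section.
EXPECTED_COLUMNS = [
    "cancer", "Comment", "tokens",
    "stopwordremove_tokens", "lemmatized_text",
]

def load_comments(handle):
    """Read rows from an open CSV handle and verify the expected columns."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)

# Invented two-row sample standing in for the real file.
sample = io.StringIO(
    "cancer,Comment,tokens,stopwordremove_tokens,lemmatized_text\n"
    "breast cancer,The nausea was bad,the nausea was bad,nausea bad,nausea bad\n"
    "none,Feeling tired,feeling tired,feeling tired,feel tired\n"
)
rows = load_comments(sample)
```

With the real file, the in-memory sample would be replaced by a handle from `open(path, newline="", encoding="utf-8")`, and `len(rows)` could be compared against the stated 14,419 comments.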
Usage
The dataset can be used for the following applications:
- Developing machine learning models: Useful for classifying patient sentiments and identifying trends in feedback on cancer treatments [1].
- Gaining insights for healthcare professionals: Provides a valuable resource for understanding patient concerns and preferences, thereby enhancing patient care and support [1].
- Informing pharmaceutical companies: Enables the collection of real-world feedback on treatments, which can assist in drug development and improving patient outcomes [1].
- Text mining and natural language processing (NLP): Ideal for exploring common themes within patient comments [1].
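As a rough illustration of the text-mining use above, the sketch below counts the most frequent lemmas per value of the cancer column. It assumes that lemmatized_text holds whitespace-separated words, which the source does not confirm, and the function name `top_terms_by_cancer` and the sample rows are invented for the example.

```python
from collections import Counter, defaultdict

def top_terms_by_cancer(rows, n=3):
    """Return the n most frequent lemmas for each cancer label."""
    counts = defaultdict(Counter)
    for row in rows:
        # Assumes whitespace-separated lemmas (not confirmed by the source).
        counts[row["cancer"]].update(row["lemmatized_text"].split())
    return {
        cancer: [term for term, _ in counter.most_common(n)]
        for cancer, counter in counts.items()
    }

# Invented rows using the documented column names.
rows = [
    {"cancer": "breast cancer", "lemmatized_text": "nausea fatigue nausea"},
    {"cancer": "breast cancer", "lemmatized_text": "fatigue nausea"},
    {"cancer": "none", "lemmatized_text": "fatigue sleep fatigue"},
]
themes = top_terms_by_cancer(rows, n=2)
```

Raw frequency counts are only a starting point; a weighting scheme such as TF-IDF or a topic model would usually give a sharper picture of recurring themes.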
Coverage
The dataset's geographic scope is global [17]. It includes patient comments collected from various multilingual websites in English, French, and German [1]. The data spans 2005 to 2024 [1]. Demographically, it reflects the experiences of a broad spectrum of cancer patients.
License
CC-BY
Who Can Use It
- Data scientists and machine learning engineers: For building sentiment classification models and conducting text analytics on patient feedback [1].
- Healthcare researchers and professionals: To better understand the patient experience, identify common concerns, and improve support systems [1].
- Pharmaceutical companies: For real-world evidence gathering, product development insights, and understanding treatment perceptions [1].
- Academics and students: For research projects in medical informatics, NLP, and public health [1].
Dataset Name Suggestions
- Cancer Treatment Patient Sentiments
- Oncology Patient Experience Data
- Global Cancer Treatment Feedback
- Afinitor Treatment Patient Comments
- Multilingual Cancer Sentiment Analysis Data
Attributes
Original Data Source: Sentiments in Oncology