Medium User Feedback Classification Data
Product Reviews & Feedback
Free
About
This dataset contains Google Play Store reviews of the Medium app, prepared for text classification tasks. The reviews were collected with the google_play_scraper library and filtered to English-language reviews. A portion of the data was manually labeled and then used to fine-tune a RoBERTa model for category and sentiment prediction. The resource is well suited to training Natural Language Processing (NLP) models for user feedback analysis and categorization.
Columns
- reviewId: The unique identifier for the specific review.
- content: The full text of the user's review.
- score: The numerical rating given to the application, typically ranging from 1 to 5, with a mean score around 4.35.
- thumbsUpCount: The total number of upvotes or likes received by the review.
- reviewCreatedVersion: The version of the application running when the review was submitted (approximately 10% missing).
- at: The date and time the review was posted.
- replyContent: The text of any reply provided by the developer (approximately 96% missing).
- repliedAt: The date the developer's reply was posted (if applicable).
- predicted_category: The predicted classification category, which includes USER_EXPERIENCE, CONTENT, INTERFACE, or SUBSCRIPTION.
- sentiment: The predicted sentiment using an NLP model, classified as POSITIVE, NEUTRAL, or NEGATIVE.
- appVersion: Another field tracking the application version at the time of the review (approximately 78% missing).
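The column layout above can be read with Python's standard csv module. The following is a minimal sketch, not the publisher's loading code: the two sample rows are invented for illustration, the timestamp format of the at column is an assumption, and empty cells are treated as missing values.

```python
import csv
import io
from datetime import datetime

# Hypothetical two-row sample following the 11 columns listed above.
SAMPLE_CSV = """reviewId,content,score,thumbsUpCount,reviewCreatedVersion,at,replyContent,repliedAt,predicted_category,sentiment,appVersion
r-001,Great reading experience,5,12,4.5.1,2024-03-01 10:15:00,,,USER_EXPERIENCE,POSITIVE,
r-002,Paywall everywhere,1,3,,2024-03-02 08:00:00,,,SUBSCRIPTION,NEGATIVE,
"""

# Assumed timestamp layout; adjust to whatever the real file uses.
AT_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_row(row):
    """Convert one raw CSV row (all strings) into typed values."""
    return {
        "reviewId": row["reviewId"],
        "content": row["content"],
        "score": int(row["score"]),
        "thumbsUpCount": int(row["thumbsUpCount"]),
        # Empty strings are mapped to None to mark missing values.
        "reviewCreatedVersion": row["reviewCreatedVersion"] or None,
        "at": datetime.strptime(row["at"], AT_FORMAT),
        "replyContent": row["replyContent"] or None,
        "repliedAt": row["repliedAt"] or None,
        "predicted_category": row["predicted_category"],
        "sentiment": row["sentiment"],
        "appVersion": row["appVersion"] or None,
    }


def load_reviews(handle):
    """Read every row from an open CSV file handle into a list of dicts."""
    return [parse_row(row) for row in csv.DictReader(handle)]


reviews = load_reviews(io.StringIO(SAMPLE_CSV))
```

For the real file, the same function can be called on a handle opened with open(path, newline="", encoding="utf-8"), where path is wherever the CSV was saved.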
Distribution
The data is provided as a CSV file of approximately 10.48 MB, containing 62,633 records and 11 columns. The application-version and developer-reply fields have substantial shares of missing values, while core fields such as the review text and predicted category are fully populated.
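The missing-value shares quoted for the version and reply fields can be checked with a short script. This is a sketch under the assumption that missing values appear as empty cells in the CSV; the four sample rows are invented and do not reproduce the real percentages.

```python
import csv
import io

# Hypothetical sample restricted to four of the columns described above.
SAMPLE_CSV = """content,reviewCreatedVersion,replyContent,appVersion
Love it,4.5.1,,4.5.1
Too many paywalls,,,
Crashes on launch,4.4.0,Thanks for the report,
Nice articles,4.5.1,,
"""


def missing_share(rows, columns):
    """Return the fraction of rows in which each column is empty."""
    total = len(rows)
    if total == 0:
        return {column: 0.0 for column in columns}
    return {
        column: sum(1 for row in rows if not (row.get(column) or "").strip()) / total
        for column in columns
    }


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
shares = missing_share(
    rows, ["content", "reviewCreatedVersion", "replyContent", "appVersion"]
)

for column, share in shares.items():
    print(f"{column}: {share:.0%} missing")
```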
Usage
This dataset is ideal for training and evaluating text classification models, particularly those focused on user feedback and mobile application reviews. It can be used for sentiment analysis modelling, exploratory data analysis on user ratings, and creating systems to automatically route feedback based on defined categories (Subscription, Content, Interface, or User Experience). It is also suitable for advanced NLP research.
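As an illustration of category-based routing, the sketch below groups review identifiers into queues keyed by predicted_category. The queue names, the "triage" fallback, and the sample reviews are hypothetical; only the category and sentiment labels come from the column descriptions.

```python
from collections import defaultdict

# Hypothetical mapping from predicted category to a receiving queue.
CATEGORY_QUEUES = {
    "USER_EXPERIENCE": "ux-team",
    "CONTENT": "editorial",
    "INTERFACE": "design",
    "SUBSCRIPTION": "billing",
}


def route_feedback(reviews, only_sentiment="NEGATIVE"):
    """Group review IDs by destination queue.

    If only_sentiment is set, reviews with any other sentiment are skipped.
    Categories not present in CATEGORY_QUEUES fall back to "triage".
    """
    queues = defaultdict(list)
    for review in reviews:
        if only_sentiment and review["sentiment"] != only_sentiment:
            continue
        queue = CATEGORY_QUEUES.get(review["predicted_category"], "triage")
        queues[queue].append(review["reviewId"])
    return dict(queues)


# Invented sample records using the dataset's label vocabulary.
sample_reviews = [
    {"reviewId": "r-001", "predicted_category": "SUBSCRIPTION", "sentiment": "NEGATIVE"},
    {"reviewId": "r-002", "predicted_category": "INTERFACE", "sentiment": "NEGATIVE"},
    {"reviewId": "r-003", "predicted_category": "CONTENT", "sentiment": "POSITIVE"},
    {"reviewId": "r-004", "predicted_category": "SUBSCRIPTION", "sentiment": "NEGATIVE"},
]

routed = route_feedback(sample_reviews)
```

Filtering on negative sentiment by default is a design choice for this sketch; passing only_sentiment=None routes every review.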
Coverage
The data was collected from the Google Play Store and filtered for English-language reviews. The records span 24 June 2015 to 24 June 2025 (the maximum date noted in the date field statistics). The geographic scope is global: it reflects users of the Medium application on Google Play anywhere in the world who submit reviews in English.
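The time coverage can be inspected by counting reviews per year from the at column. This is a minimal sketch; the timestamp format is an assumption and the sample values are invented.

```python
from collections import Counter
from datetime import datetime


def reviews_per_year(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count reviews per calendar year; fmt is an assumed timestamp layout."""
    return Counter(datetime.strptime(ts, fmt).year for ts in timestamps)


# Invented timestamps standing in for values of the at column.
sample_timestamps = [
    "2015-06-24 09:00:00",
    "2020-01-05 12:30:00",
    "2020-11-17 18:45:00",
    "2025-06-24 07:10:00",
]

counts = reviews_per_year(sample_timestamps)
first_year, last_year = min(counts), max(counts)
```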
License
CC0: Public Domain
Who Can Use It
Data scientists and machine learning engineers interested in Natural Language Processing (NLP) or text classification projects will find this valuable. Academic researchers studying consumer feedback mechanisms or app store sentiment dynamics can utilise it. Product managers can gain quantitative insights into specific user pain points across the platform.
Dataset Name Suggestions
- Google Play Medium App Text Reviews
- Medium User Feedback Classification Data
- NLP Dataset: Medium App Store Reviews
- Play Store Review Sentiment and Category Data
Attributes
Original Data Source: Medium User Feedback Classification Data