NoReC Norwegian Reviews Dataset
Free
About
This dataset is a repackaged version of the Norwegian Review Corpus (NoReC), intended for document-level sentiment analysis. Its primary purpose is to provide an accessible collection of Norwegian reviews and their ratings. The original data points are unchanged; the only modification is that the corpus has been compiled into a single CSV file for ease of use.
Columns
The dataset includes the following columns:
- id: A unique identifier for each review.
- title: The title associated with the review.
- excerpt: A brief excerpt or summary of the review content.
- text: The full text of the review.
- rating: The numerical rating assigned to the review, serving as the label for sentiment analysis. Values range from 1 to 6, and can be processed to create binary or multiclass datasets.
- split: A suggested split identifier (e.g., 'train', 'dev', 'test') for dividing the dataset into training, validation, and test sets.
- authors: The author(s) of the review.
- language: The written standard of the review: Norwegian Bokmål (nb) for about 99% of reviews and Nynorsk (nn) for about 1%.
- category: The category of the review, with prominent categories including 'screen' (36%) and 'music' (34%).
- source: The original source publication or platform of the review, such as 'vg' (30%) and 'sa' (16%).
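As a minimal sketch of how the rating column might be turned into binary or multiclass labels, the snippet below reads rows with Python's standard csv module. The cut-offs (1-2 negative, 3-4 neutral, 5-6 positive), the function names, and the sample rows are illustrative assumptions and are not defined by the dataset.

```python
import csv
import io


def rating_to_label(rating, binary=False):
    """Map a 1-6 rating to a coarse sentiment label.

    The thresholds are an assumption made for illustration only.
    With binary=True, middle ratings return None so they can be dropped.
    """
    rating = int(float(rating))  # tolerate "6" as well as "6.0"
    if not 1 <= rating <= 6:
        raise ValueError(f"rating out of range: {rating}")
    if rating <= 2:
        return "negative"
    if rating >= 5:
        return "positive"
    return None if binary else "neutral"


def load_labeled(csv_file, binary=False):
    """Yield (text, label) pairs from an open CSV file object."""
    for row in csv.DictReader(csv_file):
        label = rating_to_label(row["rating"], binary=binary)
        if label is not None:
            yield row["text"], label


# Made-up rows standing in for the real file (only some columns shown).
sample = io.StringIO(
    "id,text,rating,split\n"
    "1,Fantastisk film,6,train\n"
    "2,Helt grei plate,4,dev\n"
    "3,Kjedelig og lang,1,test\n"
)
pairs = list(load_labeled(sample, binary=True))
```

For the real file, the same function could be called on a handle opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved; `newline=""` lets the csv module handle line breaks inside quoted review text.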
Distribution
The dataset is provided as a CSV file containing approximately 43,437 records, each representing one review. The 'split' column can be used to divide the dataset into training, development, and test sets, with roughly 80% of records marked train, 10% dev, and 10% test.
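One way to use the split column is to group rows by its value, sketched below with the standard library only. The helper name `partition_by_split` and the sample rows are hypothetical; the column names follow the Columns section.

```python
import csv
import io
from collections import defaultdict


def partition_by_split(csv_file):
    """Group CSV rows into lists keyed by the value of the 'split' column."""
    parts = defaultdict(list)
    for row in csv.DictReader(csv_file):
        parts[row["split"]].append(row)
    return dict(parts)


# Made-up rows standing in for the real file (only some columns shown).
sample = io.StringIO(
    "id,text,rating,split\n"
    "1,Fantastisk film,6,train\n"
    "2,Solid album,5,train\n"
    "3,Helt grei plate,4,dev\n"
    "4,Kjedelig og lang,1,test\n"
)
parts = partition_by_split(sample)
sizes = {name: len(rows) for name, rows in parts.items()}
```

Comparing `sizes` against the total row count is a quick check that the proportions in the downloaded file match the distribution described above.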
Usage
This dataset is ideally suited for:
- Training and validating document-level sentiment analysis models in the Norwegian language.
- Developing and evaluating machine learning models for text classification based on review ratings.
- Research and development in Natural Language Processing (NLP), specifically focusing on Norwegian text.
- Creating applications that analyse user feedback and ratings.
Coverage
The dataset covers Norwegian-language reviews, primarily written in Bokmål. Its geographic scope is listed as global, so it is suitable for international research and applications involving Norwegian content. The dataset was listed on 24/06/2025; the time range of the review content itself is not stated.
License
CC-BY-NC
Who Can Use It
Intended users include:
- Data scientists and machine learning engineers working on NLP tasks and sentiment analysis in Norwegian.
- Researchers and academics interested in computational linguistics, especially for low-resource languages or specific regional dialects.
- Developers building applications that require the analysis of user reviews and feedback, such as recommendation systems or customer service tools.
Dataset Name Suggestions
- Norwegian Review Corpus for Sentiment Analysis
- NoReC Norwegian Reviews Dataset
- Nordic Review Sentiment Data
- Norwegian Document Sentiment Collection
Attributes
Original Data Source: Norwegian Review Corpus