NoReC Norwegian Reviews Dataset

Provider: FREE DATASET LIBRARY (Verified Data Provider)
Category: Reviews & Ratings
Price: Free (£0)
Tags and Keywords: Travel, Ratings, Reviews, Classification, NLP, Norwegian

About

This dataset is a modified version of the Norwegian Review Corpus (NoReC), built for document-level sentiment analysis. It provides an accessible collection of Norwegian reviews and their ratings. The original data points are unchanged; the only modification is that they have been compiled into a CSV file for ease of use.

Columns

The dataset includes the following columns:

id: A unique identifier for each review.
title: The title associated with the review.
excerpt: A brief excerpt or summary of the review content.
text: The full text of the review.
rating: The numerical rating assigned to the review, serving as the label for sentiment analysis. Values range from 1 to 6, and can be processed to create binary or multiclass datasets.
split: A suggested split identifier (e.g., 'train', 'dev', 'test') for dividing the dataset into training, validation, and test sets.
authors: The author(s) of the review.
language: The written standard of the review, primarily Norwegian Bokmål (nb) at 99%, with a smaller portion in Nynorsk (nn) at 1%.
category: The category of the review, with prominent categories including 'screen' (36%) and 'music' (34%).
source: The original source publication or platform of the review, such as 'vg' (30%) and 'sa' (16%).
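The rating column described above can be turned into binary or multiclass labels in several ways. The sketch below is one illustrative option, not the method used by the dataset's authors: the cut-offs (4 and above as positive; bins of 1-2, 3-4, 5-6) are assumptions, and the inline sample rows are hypothetical stand-ins for the downloaded CSV file.

```python
import csv
import io

# Hypothetical two-row sample in the column layout listed above;
# in practice, pass an open handle to the downloaded CSV file instead.
SAMPLE_CSV = """id,title,excerpt,text,rating,split,authors,language,category,source
1,Tittel A,Utdrag A,Full tekst A,5,train,Ola Nordmann,nb,screen,vg
2,Tittel B,Utdrag B,Full tekst B,2,test,Kari Nordmann,nn,music,sa
"""


def _check_range(rating: int) -> None:
    """Reject ratings outside the documented 1-6 scale."""
    if not 1 <= rating <= 6:
        raise ValueError(f"rating out of range: {rating}")


def to_binary(rating: int) -> str:
    """Map a 1-6 rating to two classes (assumed cut-off: 4+ is positive)."""
    _check_range(rating)
    return "positive" if rating >= 4 else "negative"


def to_three_class(rating: int) -> str:
    """Map a 1-6 rating to three classes (assumed bins: 1-2, 3-4, 5-6)."""
    _check_range(rating)
    if rating <= 2:
        return "negative"
    if rating <= 4:
        return "neutral"
    return "positive"


def load_rows(handle):
    """Read reviews from a CSV handle and attach the derived labels."""
    rows = []
    for row in csv.DictReader(handle):
        rating = int(row["rating"])
        row["binary_label"] = to_binary(rating)
        row["three_class_label"] = to_three_class(rating)
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["id"], row["rating"], row["binary_label"], row["three_class_label"])
```

Keeping the thresholds in small functions makes it easy to try other binnings, for example dropping middle ratings entirely for a cleaner binary task.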

Distribution

The dataset is provided as a CSV file containing approximately 43,437 records, each representing one review. The 'split' column can be used to divide the dataset into training, development, and test sets, with roughly 80% of records marked train, 10% dev, and 10% test.
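As a rough illustration of using the 'split' column, the sketch below groups rows by their split value and reports each split's share. The sample rows are hypothetical and trimmed to three columns for brevity; the function names are illustrative and not part of any dataset tooling.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample, trimmed to three columns; the real file has more
# columns and far more rows.
SAMPLE_CSV = """id,rating,split
1,5,train
2,2,train
3,6,train
4,1,dev
5,4,test
"""


def partition_by_split(handle):
    """Group CSV rows by the value of their 'split' column."""
    partitions = defaultdict(list)
    for row in csv.DictReader(handle):
        partitions[row["split"]].append(row)
    return dict(partitions)


def split_shares(partitions):
    """Return each split's share of the total number of rows."""
    total = sum(len(rows) for rows in partitions.values())
    return {name: len(rows) / total for name, rows in partitions.items()}


partitions = partition_by_split(io.StringIO(SAMPLE_CSV))
shares = split_shares(partitions)
for name, share in shares.items():
    print(f"{name}: {len(partitions[name])} rows ({share:.0%})")
```

Running the same check on the full file is a quick way to confirm the actual proportions before training.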

Usage

This dataset is ideally suited for:

Training and validating document-level sentiment analysis models in the Norwegian language.
Developing and evaluating machine learning models for text classification based on review ratings.
Research and development in Natural Language Processing (NLP), specifically focusing on Norwegian text.
Creating applications that analyse user feedback and ratings.

Coverage

The dataset covers Norwegian-language reviews, primarily written in Bokmål. The listing gives its geographic scope as global, so it can be used in international research and applications involving Norwegian content. The dataset was listed on 24/06/2025; the time range of the reviews themselves is not stated.

License

CC-BY-NC

Who Can Use It

Intended users include:

Data scientists and machine learning engineers working on NLP tasks and sentiment analysis in Norwegian.
Researchers and academics interested in computational linguistics, especially for low-resource languages or specific regional dialects.
Developers building applications that require the analysis of user reviews and feedback, such as recommendation systems or customer service tools.

Dataset Name Suggestions

Norwegian Review Corpus for Sentiment Analysis
NoReC Norwegian Reviews Dataset
Nordic Review Sentiment Data
Norwegian Document Sentiment Collection

Attributes

Original Data Source: Norwegian Review Corpus

Listing Stats

Listed: 24/06/2025
Region: Global
Quality: 5 / 5
Version: 1.0
