Opendatabay APP

NLP Corpus of Spanish Film Reviews

Entertainment & Media Consumption

Tags and Keywords

Movies

NLP

Spanish

Filmaffinity

Free

About

This dataset is a corpus of Spanish-language film reviews intended to support Natural Language Processing (NLP) research and development. NLP resources are heavily concentrated on English, so the collection fills a gap for work on Spanish text. It contains user-written reviews of more than 50 prominent Spanish films, sourced from Filmaffinity.com. The aim is to encourage knowledge sharing on Spanish NLP among users.

Columns

  • film_name: The title of the film.
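  • gender: The genre of the film (e.g., comedy, horror, action). The column name appears to be a literal rendering of the Spanish "género"; it does not refer to a person's gender.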
  • film_avg_rate: The average rating of the film, based on votes from all users.
  • review_rate: The specific rating assigned by the user who authored the review.
  • review_title: The title given to the individual film review.
  • review_text: The full text of the review.

Note: the data file uses a double pipe "||" as the field separator. Platforms that expect a comma-separated file, such as Kaggle, may display it with spurious extra columns.

Distribution

The dataset is tabular and typically available as a CSV file, with fields separated by a double pipe "||" rather than a comma. It contains reviews of more than 50 Spanish films. Row and record counts are not provided.
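Because the separator is two characters long, a standard comma-based CSV reader will not split the fields correctly. The following is a minimal sketch of a parser using only the Python standard library. It assumes one review per line, a header row, and the column order given in the Columns section; none of these are confirmed by the listing. The function name parse_double_pipe and the sample record are invented for illustration.

```python
# Column order is an assumption taken from the Columns section of this listing.
COLUMNS = [
    "film_name",
    "gender",
    "film_avg_rate",
    "review_rate",
    "review_title",
    "review_text",
]


def parse_double_pipe(text, columns=COLUMNS, has_header=True):
    """Parse "||"-delimited text into a list of dicts keyed by column name."""
    lines = text.splitlines()
    if has_header:
        lines = lines[1:]  # drop the header row (assumed to be present)
    rows = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        # Limiting the number of splits keeps any "||" that occurs inside
        # the last field (review_text) as part of that field.
        fields = line.split("||", len(columns) - 1)
        if len(fields) != len(columns):
            continue  # skip lines that do not have the expected field count
        rows.append(dict(zip(columns, fields)))
    return rows


# Invented sample record, not taken from the dataset.
sample = (
    "film_name||gender||film_avg_rate||review_rate||review_title||review_text\n"
    "Película de ejemplo||Comedia||6.5||7||Título de ejemplo||Texto de ejemplo || con barras\n"
)
rows = parse_double_pipe(sample)
```

If pandas is available, pd.read_csv(path, sep=r"\|\|", engine="python") should produce an equivalent table, since pandas treats multi-character separators as regular expressions and parses them with its Python engine.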

Usage

This dataset is ideally suited for various applications in Natural Language Processing (NLP) focusing on the Spanish language. It can be used for:
  • Developing and testing NLP models for sentiment analysis on Spanish text.
  • Training machine learning models for text classification or topic modelling.
  • Learning and experimenting with NLP techniques using a real-world Spanish corpus.
  • Facilitating knowledge exchange and collaborative projects on Spanish NLP.
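For the sentiment analysis use case, one common starting point is to derive coarse labels from review_rate and pair them with review_text. The sketch below assumes ratings on a 1 to 10 scale; the listing does not state the scale, and the thresholds and the function name sentiment_label are hypothetical choices that should be adjusted after inspecting the data.

```python
def sentiment_label(review_rate, negative_max=4, positive_min=7):
    """Map a numeric rating to a coarse sentiment label.

    Assumes a 1-10 rating scale (not stated in the listing). The default
    thresholds are illustrative, not part of the dataset.
    Returns None when the rating cannot be read as a number.
    """
    try:
        rate = float(review_rate)
    except (TypeError, ValueError):
        return None
    if rate <= negative_max:
        return "negative"
    if rate >= positive_min:
        return "positive"
    return "neutral"


def labelled_examples(rows):
    """Build (review_text, label) pairs, skipping rows without a usable rating."""
    examples = []
    for row in rows:
        label = sentiment_label(row.get("review_rate"))
        if label is not None:
            examples.append((row.get("review_text", ""), label))
    return examples
```

The resulting pairs can be fed to any text classifier. Keeping a "neutral" band avoids forcing mid-range ratings into one of the two polar classes.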

Coverage

The dataset covers only Spanish films and Spanish-language reviews written by Filmaffinity.com users. The films included are those considered most relevant at the time the dataset was created. No time range is specified other than the dataset's creation date.

License

CC0

Who Can Use It

This dataset is particularly beneficial for:
  • Spanish-speaking Kaggle users looking to contribute to and learn from NLP projects in their native language.
  • Researchers and students in artificial intelligence, linguistics, or data science focusing on NLP within the Spanish context.
  • Developers building applications that require understanding or processing Spanish text, especially in the entertainment or media sectors.
  • Anyone interested in analysing user-generated content and opinions on films in Spanish.

Dataset Name Suggestions

  • Spanish Film Review Dataset
  • Filmaffinity Spanish Movie Criticisms
  • NLP Corpus of Spanish Film Reviews
  • Spanish Language Movie Reviews

Attributes

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 16/06/2025
  • Region: Global
  • UDQS (Universal Data Quality Score): 5 / 5
  • Version: 1.0
