Finnish Semantic Concreteness Dataset
Synthetic Biology & Genetic Engineering
Free
About
This dataset lists Finnish words together with their concreteness values on a scale from 1 (highly abstract) to 5 (highly concrete). It was produced to support poem generation in Finnish and related research in Natural Language Generation (NLG), specifically the generation of Finnish poetry that takes aesthetic and framing elements into account, as described in a 2019 publication on the subject.
Columns
- word: A Finnish word or multi-word expression, as produced by the translation step used to build the data.
- concreteness: The average concreteness of the top translations of 'word', given as a number from 1 (abstract) to 5 (highly concrete).
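A minimal sketch of reading the two columns with Python's standard csv module, assuming the file has a header row with the column names word and concreteness. The helper name load_concreteness and the sample rows are hypothetical; the values below are made up for illustration and are not taken from the dataset.

```python
import csv
import io

# Illustrative rows only: these values are invented, not copied from the dataset.
SAMPLE_CSV = """word,concreteness
koira,4.85
rakkaus,1.80
kivi,4.90
"""


def load_concreteness(handle):
    """Read (word, concreteness) pairs and check each value is on the 1-5 scale."""
    rows = []
    for record in csv.DictReader(handle):
        value = float(record["concreteness"])
        if not 1.0 <= value <= 5.0:
            raise ValueError(f"concreteness out of range for {record['word']!r}: {value}")
        rows.append((record["word"], value))
    return rows


# For the real file, pass open(path, newline="", encoding="utf-8") instead.
rows = load_concreteness(io.StringIO(SAMPLE_CSV))
for word, value in rows:
    print(word, value)
```

Opening the real file with an explicit UTF-8 encoding is a reasonable default for Finnish text, though the actual encoding of the distributed file is not stated here.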
Distribution
The dataset is distributed as a CSV file. It contains approximately 35,805 records covering 35,780 unique words, which suggests that a small number of words appear more than once. Concreteness values range from 1.07 to 5.00, and records are spread across the whole of that range; for example, 2,688 records have concreteness values between 4.80 and 5.00.
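The figures above (record count, unique words, count within a value range) can be checked with a short script. This is a sketch under stated assumptions: the rows are already loaded as (word, concreteness) pairs, the function name summarize is hypothetical, and the sample rows are invented for illustration.

```python
from collections import Counter


def summarize(rows, low=4.80, high=5.00):
    """Count records, unique words, repeated words, and values within [low, high]."""
    counts = Counter(word for word, _ in rows)
    return {
        "records": len(rows),
        "unique_words": len(counts),
        # Words that occur in more than one record.
        "duplicates": sorted(w for w, n in counts.items() if n > 1),
        "in_range": sum(1 for _, value in rows if low <= value <= high),
    }


# Invented sample rows; "kivi" is repeated on purpose to show duplicate detection.
sample_rows = [("koira", 4.85), ("kivi", 4.90), ("rakkaus", 1.80), ("kivi", 4.70)]
summary = summarize(sample_rows)
print(summary)
```

Whether the range boundaries in the listing are inclusive is not specified, so the inclusive comparison used here is an assumption.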
Usage
This dataset is ideal for:
- Developing and evaluating Finnish poetry generation systems.
- Conducting research in Natural Language Processing (NLP), particularly in areas related to semantic properties of words.
- Analysing and generating content for Finnish literature.
- Psycholinguistic studies on word perception and concreteness in the Finnish language.
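As one possible illustration of the uses listed above, a generation or analysis pipeline might select words from a concreteness band, or score a line of text by the average concreteness of its known words. The sketch below is not the method of the 2019 publication; the function names, the lexicon values, and the example tokens are all hypothetical.

```python
def words_in_band(lexicon, low, high):
    """Return the words whose concreteness lies within [low, high], sorted alphabetically."""
    return sorted(word for word, value in lexicon.items() if low <= value <= high)


def mean_concreteness(tokens, lexicon):
    """Average the concreteness of the tokens found in the lexicon, or None if none match."""
    scores = [lexicon[token] for token in tokens if token in lexicon]
    return sum(scores) / len(scores) if scores else None


# Invented values for illustration; not taken from the dataset.
lexicon = {"koira": 4.85, "kivi": 4.90, "rakkaus": 1.80, "ajatus": 1.60}

concrete_words = words_in_band(lexicon, 4.0, 5.0)
abstract_words = words_in_band(lexicon, 1.0, 2.0)
line_score = mean_concreteness(["koira", "ja", "kivi"], lexicon)

print(concrete_words, abstract_words, line_score)
```

Tokens missing from the lexicon (such as "ja" here) are simply skipped; inflected Finnish forms would likely need lemmatization before lookup, which this sketch does not attempt.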
Coverage
The dataset covers the Finnish language only and has no geographic restriction. It was produced to support a publication from 2019 and was listed on the marketplace on 17/06/2025 as version 1.0.
License
CC-BY-NC
Who Can Use It
- AI and LLM developers creating applications that interact with the Finnish language.
- Researchers in computational linguistics, NLP, and psycholinguistics focusing on Finnish.
- Data scientists and linguists interested in word semantics and text analysis in Finnish.
- Creative technologists and artists working on generative poetry or text in Finnish.
Dataset Name Suggestions
- Finnish Concreteness Lexicon
- Finnish Word Concreteness Values
- Finnish Poetry Generation Data
- Finnish Semantic Concreteness Dataset
Attributes
Original Data Source: Finnish Words and their Concreteness Values