Opendatabay APP

Indonesian Poetry Collection Dataset

Entertainment & Media Consumption

Tags and Keywords

Text

Literature

NLP

Art

Languages

Free

About

This dataset is a collection of Indonesian poems (puisi) with their titles and authors. It contains 7,223 unique poems scraped from a dedicated poetry website using BeautifulSoup. The scraped text of each poem included header information, from which the title and author fields were then extracted using regular expressions. The dataset is intended to support natural language processing (NLP) tasks, literary analysis, and the development of machine learning models for text generation or understanding in Indonesian.
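The listing does not document the header layout or the regular expressions that were actually used. As a minimal sketch of this kind of extraction, the example below assumes a hypothetical layout: the title on the first line, an author line starting with "Oleh" or "Karya", then the poem body. The names `HEADER_PATTERN` and `split_header` are illustrative and not part of the dataset.

```python
import re

# Assumed header layout (hypothetical, not confirmed by the listing):
#   first line:  the poem title
#   second line: "Oleh: <author>" or "Karya: <author>"
#   remainder:   the poem body
HEADER_PATTERN = re.compile(
    r"\A\s*(?P<title>[^\n]+)\n"
    r"\s*(?:Oleh|Karya)\s*:?\s*(?P<author>[^\n]+)\n+"
    r"(?P<puisi>.*)\Z",
    re.DOTALL | re.IGNORECASE,
)


def split_header(puisi_with_header):
    """Split scraped text into title, author, and poem body.

    If the text does not follow the assumed layout, title and author
    are None and the whole text is returned as the poem body.
    """
    match = HEADER_PATTERN.match(puisi_with_header)
    if match is None:
        return {"title": None, "author": None, "puisi": puisi_with_header.strip()}
    return {
        "title": match.group("title").strip(),
        "author": match.group("author").strip(),
        "puisi": match.group("puisi").strip(),
    }
```

Returning None for unmatched headers, rather than guessing, keeps malformed records easy to find and inspect by hand.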

Columns

  • puisi: Contains the full text of the poem.
  • title: Represents the title of the individual poem.
  • author: Identifies the author of the poem.
  • puisi_with_header: Includes the poem text combined with its title and author information as originally scraped.

Distribution

The dataset is typically provided as a CSV file. It contains 7,223 records, one per poem, each with the columns listed above. The file size is not stated.
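A minimal sketch of loading such a file with Python's standard `csv` module is shown below. It assumes the CSV has a header row with the four column names listed above; the function name `load_poems` and the file name in the usage comment are hypothetical.

```python
import csv

# Column names taken from the listing's "Columns" section.
EXPECTED_COLUMNS = {"puisi", "title", "author", "puisi_with_header"}


def load_poems(csv_file):
    """Read poem records from an open CSV file object into a list of dicts.

    Raises ValueError if any of the expected columns is missing.
    """
    reader = csv.DictReader(csv_file)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return list(reader)


# Usage (the file name is an assumption, not given in the listing):
# with open("indonesian_poetry.csv", newline="", encoding="utf-8") as f:
#     poems = load_poems(f)
# len(poems) can then be compared with the 7,223 records the listing reports.
```

Opening the file with `newline=""` lets the `csv` module handle line breaks inside quoted fields, which matters here because poems span multiple lines.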

Usage

This dataset is ideally suited for:
  • Developing and training Natural Language Processing (NLP) models for text analysis, sentiment analysis, or language generation in Indonesian.
  • Conducting literary research and studies on Indonesian poetry.
  • Building AI applications that can generate or understand poetic text.
  • Exploratory data analysis on linguistic patterns and themes within Indonesian literature.
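As an illustration of the exploratory analysis mentioned above, the sketch below counts poems per author and word frequencies across poem texts. It assumes each record is a dict with `author` and `puisi` keys, as `csv.DictReader` would produce; the function names and the "(unknown)" label are hypothetical choices, and the tokenization is deliberately simple (lowercased runs of letters).

```python
import re
from collections import Counter


def poems_per_author(rows):
    """Count poems per author; empty author values are grouped as '(unknown)'."""
    return Counter(
        (row.get("author") or "").strip() or "(unknown)" for row in rows
    )


def word_frequencies(rows):
    """Count lowercased words (runs of letters) across all poem texts."""
    counts = Counter()
    for row in rows:
        text = (row.get("puisi") or "").lower()
        # [^\W\d_]+ matches runs of Unicode letters only.
        counts.update(re.findall(r"[^\W\d_]+", text))
    return counts


# Usage, given a list of records named `poems`:
# poems_per_author(poems).most_common(10)
# word_frequencies(poems).most_common(20)
```

For thematic work, the raw counts would likely need an Indonesian stop-word list applied first, since function words dominate frequency tables.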

Coverage

The dataset covers Indonesian-language puisi only, so its linguistic and cultural scope is specific to Indonesia. Although the listing's region is global, the content itself is language-specific. No geographic breakdown, time range, or demographic information is given for the poems or their authors.

License

CC0

Who Can Use It

  • AI and Machine Learning Researchers: For training and evaluating models on Indonesian text data, particularly for creative text generation or understanding.
  • Linguists and Literary Scholars: To analyse the structure, themes, and authorship of Indonesian poetry.
  • Data Scientists: For projects involving text mining, natural language processing, or building recommendation systems based on literary content.
  • Developers: Interested in integrating poetic data into applications or services.

Dataset Name Suggestions

  • Indonesian Poetry Collection
  • Puisi Indonesia Dataset
  • Indonesian Poem Corpus
  • Bahasa Indonesia Puisi Data

Attributes

Original Data Source: Puisi Indonesia

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 08/06/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
