Hugging Face Reddit-Prompted Narrative Collection

Reddit & Forum Data

Tags and Keywords

Gemma, Essays, Synthetic, Literature, NLP

About

This collection contains over 2,000 essays rewritten with Gemma-7b-it. The source texts derive from a varied set of Reddit writing prompts, and the rewriting instructions were generated with ChatGPT. The essays explore themes of resilience, identity, and the human condition, in settings that range from the familiar to the fantastical. Each entry is described as combining text with visual storytelling elements. The collection is a resource for studying synthetic literary text and machine-driven storytelling.

Columns

string_field_0: The primary text container holding the core essay content or rewritten narrative.
string_field_1: An additional text field typically containing metadata or secondary narrative descriptions.
string_field_2: A supplemental string column intended for categorical tags or auxiliary identifiers.
string_field_3: A contextual data field used for storing further details or relationship identifiers within the dataset.

Distribution

The data is delivered as a single CSV file, gemma_clean new.csv, of approximately 6.77 MB. It has 4 columns with varying data density: the first column contains 74,100 valid records with a 12% missing rate, while the three auxiliary columns are between 65% and 83% missing. The dataset is intended to be updated annually to reflect developments in creative text generation.
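The per-column missing rates quoted above can be checked with a short script. The following is a minimal sketch using only the Python standard library. It assumes the first row of the file is a header and that an empty or whitespace-only cell counts as missing; neither is confirmed by the listing. The helper name missing_rates and the sample rows are hypothetical stand-ins, not dataset content.

```python
import csv
import io


def missing_rates(csv_file, has_header=True):
    """Return {column_name: fraction of empty cells} for an open CSV file object."""
    rows = list(csv.reader(csv_file))
    if not rows:
        return {}
    if has_header:
        # Assumption: the first row holds the column names.
        header, rows = rows[0], rows[1:]
    else:
        # Fall back to the column naming pattern used in the listing.
        header = [f"string_field_{i}" for i in range(len(rows[0]))]
    total = len(rows)
    rates = {}
    for i, name in enumerate(header):
        # A cell is treated as missing if it is absent, empty, or whitespace only.
        empty = sum(1 for row in rows if i >= len(row) or not row[i].strip())
        rates[name] = empty / total if total else 0.0
    return rates


# Tiny in-memory stand-in for the real file (made-up rows, not dataset content).
sample = io.StringIO(
    "string_field_0,string_field_1,string_field_2,string_field_3\n"
    "An essay,meta,,\n"
    ",,,\n"
    "Another essay,,tag,\n"
    "A third essay,,,\n"
)
rates = missing_rates(sample)
for name, rate in rates.items():
    print(f"{name}: {rate:.0%} missing")
```

To run this against the downloaded file, pass an open file handle instead of the sample, for example open("gemma_clean new.csv", newline="", encoding="utf-8"). The UTF-8 encoding is an assumption.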

Usage

This resource is suited to training and evaluating large language models on creative and expressive writing. It supports natural language processing tasks such as text summarisation, sentiment analysis, and the study of narrative structure. Researchers can also use the essays to benchmark models such as Gemma-7b-it on A100 hardware, or to explore combining text and imagery in automated storytelling pipelines.

Coverage

The collection draws on over 100 distinct writing prompts curated from digital archives, covering a wide range of human experiences and fictional genres. It offers a contemporary snapshot of AI-generated literature, with content that spans ancient myths to distant futures. The themes are broad and not tied to a specific geography. On the technical side, the coverage highlights the use of bfloat16 precision and TGI (Text Generation Inference) for efficient processing.

License

CC0: Public Domain

Who Can Use It

Natural language processing researchers can use these essays to study machine-generated prose and improve the coherence of synthetic text. Educators and students in literature or computer science can use the collection to compare AI-driven narratives with human writing. Developers working with BigQuery and NLTK can use it as a source for practising data analytics and text mining techniques.
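As an illustration of the text mining mentioned above, the sketch below counts the most frequent content words across a few essays. It uses a simple regular-expression tokeniser from the standard library in place of NLTK, so that it runs without extra downloads; an NLTK tokeniser could be swapped in. The stopword list, the helper name top_terms, and the sample sentences are all hypothetical and are not taken from the dataset.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "it"}


def top_terms(texts, n=3):
    """Return the n most frequent non-stopword tokens across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase, then keep runs of letters and apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)


# Made-up sentences standing in for values from string_field_0.
essays = [
    "The dragon guarded the library.",
    "A library of stars waited for the dragon.",
    "The dragon slept.",
]
top = top_terms(essays, n=2)
print(top)  # [('dragon', 3), ('library', 2)]
```

The same function can be applied to the essay column once the CSV has been read, skipping rows where the first field is empty.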

Dataset Name Suggestions

The Gemma Replicator: A Synthetic Literary Corpus
Gemma-7b-it Creative Writing and Narrative Archive
Tapestry of AI-Generated Essays and Writing Prompts
Hugging Face Reddit-Prompted Narrative Collection
Advanced Synthetic Expression and Text Generation Registry

Attributes

Original Data Source: Hugging Face Reddit-Prompted Narrative Collection

Listing Stats

Listed: 30/12/2025
Region: Global
Quality: 5 / 5
Version: 1.0
Price: Free
Format: CSV
