
Detect AI-Generated Text

Machine Learning and AI

Tags and Keywords:

  • Natural Language Processing
  • Text Classification
  • AI Detection
  • Machine Learning Dataset
  • Language Models
  • NLP Benchmarking
  • Textual Data
  • Educational Technology

Detect AI-Generated Text Dataset on Opendatabay data marketplace

Free

About

This dataset contains over 28,000 essays, some written by humans and some generated by AI. It is designed for developing machine learning models that detect whether an essay was written by a student or generated by a large language model (LLM), and it offers a realistic text classification challenge in natural language processing (NLP).

Dataset Features:

  • AI_ID: A unique identifier for each row (one essay per row).
  • Text: The essay content, which can be written by a human or generated by an AI.
  • Generated: The target label indicating the source of the essay:
    • 0: Human-written essay.
    • 1: AI-generated essay.
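Before training anything, it helps to confirm that the file matches the schema above and to check the class balance, since a skewed 0/1 split changes how accuracy should be read. The sketch below uses only the standard library `csv` module. It assumes the CSV headers are exactly `AI_ID`, `Text`, and `Generated` as named on this listing (the downloaded file may use different casing, so check and adjust `EXPECTED_COLUMNS`); the function names and the inline sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

# Column names as given on this listing; the real CSV headers and file
# name are assumptions here, so check them against the downloaded file.
EXPECTED_COLUMNS = ("AI_ID", "Text", "Generated")


def load_essays(f):
    """Read essays from a CSV file object, validating columns and labels.

    Returns a list of (ai_id, text, label) tuples, with label as int 0 or 1.
    """
    reader = csv.DictReader(f)
    present = reader.fieldnames or []
    missing = [c for c in EXPECTED_COLUMNS if c not in present]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        value = float(row["Generated"])  # accept "1" as well as "1.0"
        if value not in (0.0, 1.0):
            raise ValueError(f"unexpected label {row['Generated']!r} in row {row['AI_ID']!r}")
        rows.append((row["AI_ID"], row["Text"], int(value)))
    return rows


def label_counts(rows):
    """Count human-written (0) versus AI-generated (1) essays."""
    return Counter(label for _, _, label in rows)


# Hypothetical inline sample standing in for the real file.
sample = io.StringIO(
    "AI_ID,Text,Generated\n"
    'a1,"Cars are useful, but they pollute.",0\n'
    'a2,"In conclusion, cars offer numerous benefits.",1\n'
    "a3,I think school should start later.,0\n"
)
rows = load_essays(sample)
print(label_counts(rows))  # Counter({0: 2, 1: 1})
```

For the real file, open it with `open(path, newline="", encoding="utf-8")` and pass the file object to `load_essays`; `newline=""` lets the `csv` module handle essays that contain line breaks inside quoted fields.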

Usage:

This dataset supports a variety of NLP applications, including:
  • Training and Testing Models: Develop machine learning models to classify essays based on their origin.
  • Text Analysis: Explore linguistic patterns and differences between human and AI-generated essays.
  • NLP Algorithm Benchmarking: Compare the performance of various algorithms for text classification.
  • Educational Technology: Develop tools to assist educators in identifying AI-generated essays.
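For training, testing, and benchmarking, a simple baseline gives more complex models a number to beat. The sketch below is a from-scratch multinomial Naive Bayes classifier over bag-of-words counts with add-one smoothing, written in plain Python only to stay dependency-free. It is an assumed baseline, not a method prescribed by this listing; the class and function names are hypothetical, and the training sentences are invented toy examples, not rows from the dataset.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        doc_counts = Counter(labels)
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def predict_one(self, text):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_prior[c]
            for tok in tokenize(text):
                if tok in self.vocab:  # words unseen in training are skipped
                    score += math.log((self.word_counts[c][tok] + 1) / (self.totals[c] + v))
            scores[c] = score
        return max(scores, key=scores.get)

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented toy examples (0 = human-written, 1 = AI-generated).
train_texts = [
    "i think school is too early and my friends agree",
    "honestly my phone helps me study but i get distracted",
    "in conclusion, technology offers numerous benefits for students",
    "furthermore, it is important to note that education fosters growth",
]
train_labels = [0, 0, 1, 1]
model = NaiveBayes().fit(train_texts, train_labels)

test_texts = [
    "in conclusion, it is important to note the benefits",
    "i think my friends are distracted",
]
preds = model.predict(test_texts)
print(preds, accuracy([1, 0], preds))  # [1, 0] 1.0
```

When benchmarking several algorithms, evaluating each one on the same held-out split, ideally with precision and recall alongside accuracy if the classes are imbalanced, keeps the comparison fair. In practice, a library implementation (for example, scikit-learn) would replace this hand-rolled version.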

Coverage:

The dataset includes essays on a mix of topics and in a range of writing styles, which makes it suitable both for training classification models and for linguistic studies.

License:

CC0 (Public Domain)

Who Can Use It:

This dataset is intended for researchers, data scientists, machine learning practitioners, educators, and students interested in exploring text classification and AI detection in writing.

How to Use It:

  • Develop predictive models for essay classification.
  • Conduct feature analysis to understand distinguishing characteristics between human and AI-generated text.
  • Benchmark different machine learning and deep learning algorithms for NLP tasks.
  • Investigate stylistic differences in AI-generated versus human-written content.
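For feature analysis and stylistic comparison, one starting point is to compute a few hand-crafted stylometric features per essay and average them by label. The four features below (word count, mean sentence length, mean word length, type-token ratio) are illustrative assumptions, not features shipped with the dataset. The function names are hypothetical, and the sentence splitter is deliberately naive: it splits on `.`, `!`, and `?`, so abbreviations such as "e.g." are miscounted.

```python
import re
import statistics

WORD_RE = re.compile(r"[A-Za-z']+")
SENT_SPLIT_RE = re.compile(r"[.!?]+")


def style_features(text):
    """Return a few simple stylometric features for one essay."""
    words = WORD_RE.findall(text)
    # Naive sentence split: drop empty fragments left after final punctuation.
    sentences = [s for s in SENT_SPLIT_RE.split(text) if s.strip()]
    n_words = len(words)
    return {
        "n_words": n_words,
        "mean_sentence_len": n_words / len(sentences) if sentences else 0.0,
        "mean_word_len": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "type_token_ratio": len({w.lower() for w in words}) / n_words if n_words else 0.0,
    }


def compare_groups(rows):
    """Average each feature separately per label (0 = human, 1 = AI).

    rows: iterable of (text, label) pairs.
    """
    by_label = {}
    for text, label in rows:
        by_label.setdefault(label, []).append(style_features(text))
    return {
        label: {k: statistics.mean(f[k] for f in feats) for k in feats[0]}
        for label, feats in by_label.items()
    }


print(style_features("The cat sat. The dog ran!"))
print(compare_groups([("The cat sat.", 0), ("A dog ran far away.", 0), ("Hello.", 1)]))
```

Differences between the per-label averages suggest, but do not establish, distinguishing characteristics; a statistical test, or a classifier trained on these features, is needed before drawing conclusions.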

Listing Stats:

  • Views: 20
  • Downloads: 5
  • Listed: 21/11/2024
  • Region: Global
  • UDQS (Universal Data Quality Score): 5 / 5
  • Version: 1.0
