Detect AI-Generated Text
Machine Learning and AI
Free
About
This dataset contains over 28,000 essays, both human-written and AI-generated. It is designed for developing machine learning models that detect whether an essay was written by a student or generated by a large language model (LLM). The dataset offers a realistic challenge for building accurate text classification models in natural language processing (NLP).
Dataset Features:
- AI_ID: A unique identifier for each Text row.
- Text: The essay content, which can be written by a human or generated by an AI.
- Generated: The target label indicating the source of the essay:
  - 0: Human-written essay.
  - 1: AI-generated essay.
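As a rough illustration of the three columns above, the sketch below reads rows and checks them against the schema. It assumes the data is delivered as a CSV file with a header row and integer labels, which this listing does not confirm. The function name `load_essays` and the sample rows are invented for illustration and are not part of the dataset.

```python
import csv
import io
from collections import Counter

# Column names as listed under "Dataset Features".
EXPECTED_COLUMNS = ("AI_ID", "Text", "Generated")


def load_essays(fileobj):
    """Read essay rows from a CSV file object and validate the schema.

    Assumption: the file is CSV with a header row and the Generated
    column holds the strings "0" or "1".
    """
    reader = csv.DictReader(fileobj)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    rows = []
    for record in reader:
        label = int(record["Generated"])
        if label not in (0, 1):
            raise ValueError(f"unexpected label {label} for AI_ID {record['AI_ID']}")
        rows.append({"AI_ID": record["AI_ID"], "Text": record["Text"], "Generated": label})
    return rows


# Tiny in-memory sample standing in for the real file (contents invented).
SAMPLE_CSV = (
    "AI_ID,Text,Generated\n"
    'a1,"I think school should start later, because students are tired.",0\n'
    'a2,"In conclusion, there are several compelling reasons to consider.",1\n'
    'a3,"My friend and I disagree about this.",0\n'
)

essays = load_essays(io.StringIO(SAMPLE_CSV))

# Class balance: how many human-written (0) vs AI-generated (1) essays.
label_counts = Counter(row["Generated"] for row in essays)
print(label_counts)
```

With the real file, the same function could be called on an open file handle, for example `open(path, newline="", encoding="utf-8")`, where `path` points to wherever the download was saved.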
Usage:
This dataset is ideal for a variety of NLP applications, including:
- Training and Testing Models: Develop machine learning models to classify essays based on their origin.
- Text Analysis: Explore linguistic patterns and differences between human-written and AI-generated essays.
- NLP Algorithm Benchmarking: Compare the performance of various algorithms for text classification.
- Educational Technology: Develop tools to assist educators in identifying AI-generated essays.
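For the training, testing, and benchmarking uses above, predictions from different models need to be scored the same way against the Generated label. The sketch below computes accuracy, precision, recall, and F1 for the AI-generated class (label 1) in plain Python. The function name `binary_metrics` and the example predictions are hypothetical; any classifier that outputs 0 or 1 could be plugged in.

```python
def binary_metrics(y_true, y_pred):
    """Score binary predictions, treating 1 (AI-generated) as the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # AI essays flagged as AI
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # human essays flagged as AI
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # AI essays missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # human essays passed

    total = len(pairs)
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented labels and predictions, for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

Reporting precision alongside recall matters for the educator-facing use case in particular, since a false positive there means a student's own essay is flagged as AI-generated.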
Coverage:
The dataset includes a diverse range of essays, with a mix of topics and writing styles. It is well-suited for training classification models and conducting linguistic studies.
License:
CC0 (Public Domain)
Who Can Use It:
This dataset is intended for researchers, data scientists, machine learning practitioners, educators, and students interested in exploring text classification and AI detection in writing.
How to Use It:
- Develop predictive models for essay classification.
- Conduct feature analysis to understand the characteristics that distinguish human-written from AI-generated text.
- Benchmark different machine learning and deep learning algorithms for NLP tasks.
- Investigate stylistic differences in AI-generated versus human-written content.
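One possible starting point for the feature and stylistic analysis listed above is a handful of surface statistics computed per essay, which can then be compared across the two values of Generated. The features below (word count, average word length, type-token ratio, average sentence length) are common stylometric measures chosen for illustration, not features prescribed by the dataset. The function name `style_features` is hypothetical, and the tokenization is deliberately naive.

```python
import re


def style_features(text):
    """Compute simple surface-level style statistics for one essay."""
    # Naive tokenization: runs of ASCII letters and apostrophes, lowercased.
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Naive sentence split on ., ! and ?; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]

    n_words = len(words)
    return {
        "word_count": n_words,
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        # Distinct words divided by total words: a rough vocabulary-diversity measure.
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
    }


features = style_features("The cat sat. The cat ran!")
print(features)
```

Type-token ratio falls as texts get longer, so comparisons between classes are more meaningful when essays of similar length are compared or the text is truncated to a fixed number of words first.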