Text Spam Classification Dataset

FREE DATASET LIBRARY

Verified Data Provider

£0

Fraud Detection & Risk Management

Tags and Keywords

Internet

NLP

Text

Binary

About

This dataset is designed for classifying text messages as either 'spam' or 'ham' (legitimate). It supports the development of fraud detection and risk management systems, particularly for platforms handling a high volume of text-based communication.

Columns

data: Represents the actual text content of a message.
text_type: Indicates the classification of the text, specifically whether it is 'ham' (legitimate) or 'spam'.
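As a rough illustration of the two-column layout above, the following sketch reads the CSV and validates it. It assumes the file's header row uses exactly the names `data` and `text_type` as shown in this listing; the function name `load_messages` and the sample rows are hypothetical placeholders, not taken from the dataset.

```python
import csv
import io

EXPECTED_COLUMNS = {"data", "text_type"}  # assumed to match the CSV header
VALID_LABELS = {"ham", "spam"}

def load_messages(handle):
    """Read (text, label) pairs from an open CSV handle, validating the schema."""
    reader = csv.DictReader(handle)
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        label = row["text_type"].strip().lower()
        if label not in VALID_LABELS:
            raise ValueError(f"line {reader.line_num}: unexpected label {label!r}")
        rows.append((row["data"], label))
    return rows

# Made-up placeholder rows for demonstration only.
SAMPLE_CSV = (
    "data,text_type\n"
    "see you at lunch tomorrow,ham\n"
    "claim your free prize now,spam\n"
)
rows = load_messages(io.StringIO(SAMPLE_CSV))
print(rows)
```

For the downloaded file, the same function should work with a handle from `open(path, newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded.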

Distribution

The dataset is provided as a CSV file. Each entry is labelled either 'ham' or 'spam', with approximately 70% of entries being 'ham' and 30% being 'spam'. The listing reports 20,334 unique values; the total number of rows is not stated.
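Because the listing gives only an approximate class split and no row count, it may be worth checking both after download. The sketch below computes the total rows, the share of each label, and the number of distinct message texts. Counting distinct texts is one possible reading of "unique values", not a confirmed definition; `summarize` and the sample rows are hypothetical.

```python
from collections import Counter

def summarize(rows):
    """Return row count, per-label share, and distinct-text count for (text, label) pairs."""
    if not rows:
        raise ValueError("no rows to summarize")
    label_counts = Counter(label for _, label in rows)
    total = len(rows)
    return {
        "total_rows": total,
        "label_share": {label: n / total for label, n in label_counts.items()},
        "unique_texts": len({text for text, _ in rows}),
    }

# Made-up placeholder rows: 7 'ham' (one a deliberate duplicate) and 3 'spam'.
sample_rows = [(f"placeholder ham message {i}", "ham") for i in range(6)]
sample_rows.append(("placeholder ham message 0", "ham"))
sample_rows += [(f"placeholder spam message {i}", "spam") for i in range(3)]

summary = summarize(sample_rows)
print(summary)
```

A large gap between `total_rows` and `unique_texts` would suggest duplicate messages, which are usually worth removing before splitting into training and test sets.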

Usage

This dataset is ideal for training and evaluating machine learning models aimed at text classification, especially for:

Building robust spam filters for messaging applications.
Developing automated fraud detection systems.
Enhancing risk management protocols by identifying malicious text patterns.
Research in Natural Language Processing (NLP) and binary classification tasks.
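As one possible starting point for the spam-filter use case above, here is a minimal multinomial Naive Bayes baseline with add-one smoothing, written in pure Python. It is a sketch chosen for brevity, not a method recommended by the listing; the class name, tokenizer, and training rows are all hypothetical placeholders.

```python
import math
import re
from collections import Counter

LABELS = ("ham", "spam")

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

class NaiveBayesSpamFilter:
    def fit(self, rows):
        """Count documents and word occurrences per label from (text, label) pairs."""
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = Counter()
        for text, label in rows:
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokenize(text))
        if any(self.doc_counts[label] == 0 for label in LABELS):
            raise ValueError("training data must contain both 'ham' and 'spam'")
        self.vocab = set().union(*self.word_counts.values())
        return self

    def _log_score(self, tokens, label):
        # Log prior plus add-one smoothed log likelihood of each known token.
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for token in tokens:
            if token in self.vocab:  # tokens never seen in training are skipped
                score += math.log((counts[token] + 1) / denom)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(LABELS, key=lambda label: self._log_score(tokens, label))

# Made-up placeholder training rows for demonstration only.
train_rows = [
    ("are we still meeting for lunch today", "ham"),
    ("call me when you get home", "ham"),
    ("thanks for the notes see you tomorrow", "ham"),
    ("win a free prize claim now", "spam"),
    ("free cash offer click now", "spam"),
]
model = NaiveBayesSpamFilter().fit(train_rows)
print(model.predict("claim your free cash prize"))
print(model.predict("see you at lunch"))
```

Given the roughly 70/30 class split, accuracy alone can look better than the filter really is, so precision and recall on the 'spam' class are worth reporting on a held-out split.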

Coverage

The dataset's regional scope is global. The time range of data collection is not specified; the dataset was listed on 11/06/2025. No notes are provided on data availability for particular demographic groups or years.

License

CC0

Who Can Use It

This dataset is valuable for:

Data Scientists and Machine Learning Engineers: To develop and refine text classification models.
Developers: To integrate spam detection functionality into applications.
Researchers: To explore new methods in NLP, binary classification, and fraud detection.
Organisations: To implement internal risk management and content moderation tools.

Dataset Name Suggestions

💬 Telegram Spam or Ham
Text Spam Classification Dataset
Ham Spam Detector Data
Fraudulent Text Identifier
Messaging Spam Corpus

Attributes

Original Data Source: 💬 Telegram Spam or Ham

Listing Stats

Listed: 11/06/2025
Region: Global
Quality: 5 / 5
Version: 1.0
