Opendatabay APP

Text Spam Classification Dataset

Fraud Detection & Risk Management

Tags and Keywords

Internet, NLP, Text, Binary

Free

About

This dataset labels text messages as either 'spam' or 'ham' (legitimate). It is intended to support the development of fraud detection and risk management systems, particularly on platforms that handle a high volume of text-based communication.

Columns

  • data: Represents the actual text content of a message.
  • text_type: Indicates the classification of the text, specifically whether it is 'ham' (legitimate) or 'spam'.

Distribution

The dataset is provided as a CSV file. Each entry is labelled either 'ham' or 'spam', with approximately 70% of entries being 'ham' and 30% being 'spam'. The listing reports 20,334 unique values; the total number of rows is not stated.
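As a minimal sketch of how the class balance could be checked after download, the snippet below reads a CSV with the two columns described above and reports label counts, label shares, and the number of distinct messages. It assumes the CSV header uses the column names `data` and `text_type` exactly as listed; the inline sample rows and the helper name `label_distribution` are invented for illustration and are not part of the dataset.

```python
import csv
import io
from collections import Counter

# Tiny inline stand-in for the real CSV. These rows are invented for
# illustration; replace the StringIO with an open file handle for real data.
SAMPLE_CSV = """text_type,data
ham,Are we still meeting at noon tomorrow
spam,Win a free prize now click this link
ham,Thanks for sending the report
ham,Call me when you get home
spam,Free entry claim your cash reward today
"""


def label_distribution(csv_file):
    """Return (label counts, label shares, number of distinct messages)."""
    reader = csv.DictReader(csv_file)
    labels = Counter()
    messages = set()
    for row in reader:
        labels[row["text_type"]] += 1  # assumed column name from the listing
        messages.add(row["data"])      # assumed column name from the listing
    total = sum(labels.values())
    if total == 0:
        return labels, {}, 0
    shares = {label: count / total for label, count in labels.items()}
    return labels, shares, len(messages)


counts, shares, n_unique = label_distribution(io.StringIO(SAMPLE_CSV))
print(counts, shares, n_unique)
```

On the real file, shares close to 0.7 for 'ham' and 0.3 for 'spam' would match the distribution stated in the listing.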

Usage

This dataset is ideal for training and evaluating machine learning models aimed at text classification, especially for:
  • Building robust spam filters for messaging applications.
  • Developing automated fraud detection systems.
  • Enhancing risk management protocols by identifying malicious text patterns.
  • Research in Natural Language Processing (NLP) and binary classification tasks.
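
To make the spam-filter use case concrete, here is a small baseline sketch: a multinomial Naive Bayes classifier with Laplace smoothing, written with the standard library only. This is one common starting point for ham/spam classification, not a method prescribed by the listing. The training messages, the `tokenize` helper, and the `NaiveBayes` class are all hypothetical and exist only for illustration; in practice the `data` and `text_type` columns would supply the texts and labels.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of messages
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            denom = (sum(self.word_counts[label].values())
                     + self.alpha * len(self.vocab))
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens unseen in training are skipped
                    count = self.word_counts[label][tok]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training messages, for illustration only.
train = [
    ("Win a free prize now", "spam"),
    ("Claim your free cash reward today", "spam"),
    ("Are we still meeting at noon", "ham"),
    ("Thanks for sending the report", "ham"),
    ("Call me when you get home", "ham"),
]
model = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])
print(model.predict("free cash prize"))
print(model.predict("meeting at noon tomorrow"))
```

Because roughly 70% of entries are 'ham', a model that always predicts 'ham' would already reach about 70% accuracy, so per-class precision and recall are likely more informative than accuracy when evaluating on this dataset.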

Coverage

The dataset's regional scope is global. The listing does not state a time range for data collection; the dataset itself was listed on 11/06/2025. No notes are provided on coverage of particular demographic groups or years.

License

CC0

Who Can Use It

This dataset is valuable for:
  • Data Scientists and Machine Learning Engineers: Developing and refining text classification models.
  • Developers: Integrating spam detection functionality into applications.
  • Researchers: Exploring new methods in NLP, binary classification, and fraud detection.
  • Organisations: Implementing internal risk management and content moderation tools.

Dataset Name Suggestions

  • 💬 Telegram Spam or Ham
  • Text Spam Classification Dataset
  • Ham Spam Detector Data
  • Fraudulent Text Identifier
  • Messaging Spam Corpus

Attributes

Original Data Source: 💬 Telegram Spam or Ham

Listing Stats

Views: 1
Downloads: 0
Listed: 11/06/2025
Region: Global
Universal Data Quality Score (UDQS): 5 / 5
Version: 1.0
