Text Spam Classification Dataset
Fraud Detection & Risk Management
Free
About
This dataset is designed for classifying text messages as either 'spam' or 'ham' (legitimate). It is intended to support the development of fraud detection and risk management systems, particularly on platforms that handle a high volume of text-based communication.
Columns
- data: Represents the actual text content of a message.
- text_type: Indicates the classification of the text, specifically whether it is 'ham' (legitimate) or 'spam'.
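As an illustration of the two-column layout, the following sketch reads such a file with Python's standard csv module and checks that both columns are present. It assumes the CSV has a header row whose names match the columns listed here. The sample rows are made up for the example and are not taken from the dataset, and load_messages is a hypothetical helper name.

```python
import csv
import io

EXPECTED_COLUMNS = {"data", "text_type"}
VALID_LABELS = {"ham", "spam"}


def load_messages(csv_file):
    """Read (text, label) pairs from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    # Fail early if the header does not contain the two documented columns.
    missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for record in reader:
        label = (record.get("text_type") or "").strip().lower()
        if label not in VALID_LABELS:
            continue  # skip rows whose label is neither 'ham' nor 'spam'
        rows.append((record.get("data") or "", label))
    return rows


# Made-up sample standing in for the real file (assumption: header row present).
SAMPLE_CSV = """text_type,data
ham,see you at lunch tomorrow
spam,win a free prize now click here
ham,can you send me the report
"""

rows = load_messages(io.StringIO(SAMPLE_CSV))
print(len(rows), rows[1])
```

To read a real file, the same function can be given a file object opened with open(path, newline="", encoding="utf-8"), where path points to the downloaded CSV.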
Distribution
The dataset is typically provided in CSV format. Each text entry is labelled either 'ham' or 'spam', with approximately 70% of entries being 'ham' and 30% being 'spam'. The listing reports 20,334 unique values but does not state the total number of rows.
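The class split matters when judging a model: under a roughly 70/30 split, a classifier that always predicts 'ham' would already score about 70% accuracy. The sketch below shows one way to compute label shares and that majority-class baseline. The label list is synthetic and only mimics the listed proportions, and label_shares is a hypothetical helper name.

```python
from collections import Counter


def label_shares(labels):
    """Return each label's share of the total as a fraction between 0 and 1."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels given")
    return {label: count / total for label, count in counts.items()}


# Synthetic labels that mimic the listed 70/30 split; not real data.
labels = ["ham"] * 70 + ["spam"] * 30

shares = label_shares(labels)
majority_label = max(shares, key=shares.get)
# Accuracy of a trivial model that always predicts the majority label.
majority_baseline = shares[majority_label]

print(shares, majority_label, majority_baseline)
```

Any trained model should be compared against this baseline, and metrics such as precision and recall on the 'spam' class are usually more informative than accuracy alone.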
Usage
This dataset is ideal for training and evaluating machine learning models aimed at text classification, especially for:
- Building robust spam filters for messaging applications.
- Developing automated fraud detection systems.
- Enhancing risk management protocols by identifying malicious text patterns.
- Research in Natural Language Processing (NLP) and binary classification tasks.
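As a rough starting point for the binary classification use case, here is a minimal multinomial Naive Bayes sketch in pure Python with add-one (Laplace) smoothing. The listing does not prescribe this or any other method; it is one common baseline, chosen for illustration. The class name, tokenizer, and training messages are all hypothetical and made up for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with additive smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing strength
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter()              # label -> number of messages
        self.vocabulary = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + self.alpha * vocab_size
            for token in tokens:
                score += math.log((self.word_counts[label][token] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training messages; not taken from the dataset.
train_texts = [
    "win a free prize now",
    "claim your free cash reward",
    "are we still meeting for lunch",
    "please send me the report",
    "see you at the meeting tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham", "ham"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
spam_guess = model.predict("free cash prize")
ham_guess = model.predict("lunch meeting tomorrow")
print(spam_guess, ham_guess)
```

In practice the 'data' column would supply the texts and 'text_type' the labels, with part of the data held out for evaluation. A library implementation would normally replace this hand-written version.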
Coverage
The dataset's regional scope is global. The time range of data collection is not specified; the dataset was listed on 11/06/2025. No notes are provided on data availability for particular demographic groups or years.
License
CC0
Who Can Use It
This dataset is valuable for:
- Data Scientists and Machine Learning Engineers: To develop and refine text classification models.
- Developers: To integrate spam detection functionality into applications.
- Researchers: To explore new methods in NLP, binary classification, and fraud detection.
- Organisations: To implement internal risk management and content moderation tools.
Dataset Name Suggestions
- 💬 Telegram Spam or Ham
- Text Spam Classification Dataset
- Ham Spam Detector Data
- Fraudulent Text Identifier
- Messaging Spam Corpus
Attributes
Original Data Source: 💬 Telegram Spam or Ham