Text Message Spam/Ham Dataset

Data Science and Analytics

Tags and Keywords

Email

NLP

Text Message Spam/Ham Dataset on the Opendatabay data marketplace

Free

About

This dataset supports training machine learning models that classify SMS messages as spam or not spam (the latter often called 'ham'). It is a collection of real, non-encoded, English-language SMS messages, each labelled as legitimate or unsolicited. That makes it useful for research into mobile phone spam: building automated tools that identify and block spam, studying the characteristics of spam messages, and devising strategies for avoiding them.

Columns

sms: The text content of the SMS message. (String)
label: The classification of each SMS message, either 'ham' (legitimate) or 'spam' (unsolicited). (String)
- There are 5,171 unique SMS message texts.
- Label counts: 4,827 messages are labelled 'ham' and 747 messages are labelled 'spam'.
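
The counts above imply a noticeable class imbalance, which matters when judging a classifier's accuracy. The short calculation below is only an illustrative sketch that uses the figures quoted in this listing; the variable names are made up for the example.

```python
# Counts quoted in this listing.
HAM = 4827
SPAM = 747
UNIQUE_TEXTS = 5171

total = HAM + SPAM                     # all labelled messages
spam_share = SPAM / total              # fraction of messages labelled 'spam'
majority_baseline = HAM / total        # accuracy of always predicting 'ham'
repeated_rows = total - UNIQUE_TEXTS   # rows whose text repeats an earlier row

print(f"total={total} spam_share={spam_share:.3f} "
      f"baseline={majority_baseline:.3f} repeated={repeated_rows}")
# prints: total=5574 spam_share=0.134 baseline=0.866 repeated=403
```

A model that always answers 'ham' would therefore already score about 86.6% accuracy, so precision and recall on the 'spam' class are usually more informative than accuracy alone.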

Distribution

The dataset is typically provided as a CSV file, such as train.csv. It contains 5,574 individual SMS messages. Each record has two fields: the message text and its corresponding label (ham or spam).
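
As a rough sketch of how such a file might be read and sanity-checked, the example below uses only Python's standard library. It assumes the CSV has a header row with the column names sms and label described above; the helper names load_messages and summarize, and the three sample rows, are invented for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter


def load_messages(handle):
    """Read (sms, label) pairs from an open CSV file that has a header row."""
    reader = csv.DictReader(handle)
    return [(row["sms"], row["label"]) for row in reader]


def summarize(rows):
    """Return the row count, number of unique texts, and per-label counts."""
    texts = [sms for sms, _ in rows]
    labels = Counter(label for _, label in rows)
    return {"rows": len(rows), "unique_sms": len(set(texts)), "labels": dict(labels)}


# Made-up sample standing in for train.csv so the sketch runs on its own.
SAMPLE = io.StringIO(
    "sms,label\n"
    "\"Are we still on for lunch, then?\",ham\n"
    "WINNER! Claim your free prize now,spam\n"
    "\"Are we still on for lunch, then?\",ham\n"
)

summary = summarize(load_messages(SAMPLE))
print(summary)
# prints: {'rows': 3, 'unique_sms': 2, 'labels': {'ham': 2, 'spam': 1}}
```

For the real file, the same helpers could be called as `with open("train.csv", newline="", encoding="utf-8") as f: rows = load_messages(f)`. If the file matches this listing, the summary should report 5,574 rows and 5,171 unique texts.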

Usage

Training machine learning models to effectively distinguish between legitimate and spam SMS messages.
Developing tools capable of automatically identifying and blocking unwanted messages on mobile phones.
Conducting academic or industry research into the evolving nature and characteristics of spam messages.
Formulating strategies and preventative measures for users to identify and avoid unsolicited communications.
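
To make the first use case concrete, here is a minimal baseline sketch: a multinomial Naive Bayes classifier with add-one smoothing, written in plain Python. Naive Bayes is one common starting point for spam filtering, not a method prescribed by this listing. The class NaiveBayes, the tokenize helper, and the five training messages are all hypothetical and stand in for the real sms and label columns.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # messages per label
        self.word_counts = defaultdict(Counter)    # token counts per label
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:             # ignore unseen words
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Invented toy messages; real training would use the dataset's rows.
train_texts = [
    "Free prize! Call now to claim your free reward",
    "URGENT: you have won cash, call now",
    "Are we still meeting for lunch today?",
    "Can you call me when you get home",
    "See you at the meeting tomorrow",
]
train_labels = ["spam", "spam", "ham", "ham", "ham"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("Claim your free cash prize now"))  # prints: spam
print(model.predict("Are you home for lunch"))          # prints: ham
```

On the full dataset, a held-out test split and per-class precision and recall would give a fairer picture than these two hand-picked messages.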

Coverage

This dataset covers SMS messages globally. The messages are in English and consist of real, non-encoded content. No specific time range for data collection is given; it is a public set collected for mobile phone spam research.

License

CC0

Who Can Use It

Data Scientists and Machine Learning Engineers: For developing and refining text classification models.
Mobile Security Developers: To create or enhance spam filtering applications.
Academic Researchers: For studies on unsolicited communication patterns and natural language processing.
Analysts: To gain insights into the properties of spam messages.

Dataset Name Suggestions

SMS Spam Collection
SMS Message Classifier Data
Mobile Spam Detection Dataset
Text Message Spam/Ham Data

Attributes

Original Data Source: SMS Spam Collection (Text Classification)

Listing Stats

Listed: 08/06/2025
Region: Global
Quality: 5 / 5
Version: 1.0