Newsletter Spam URL Classification

Category: Data Science and Analytics
Price: £0 (Free)

Tags and Keywords

URL, Spam, Classification, Security, Internet

About

This dataset contains approximately 87,500 URLs, each labeled as spam or not spam, which makes it well suited to developing binary classification models. About one-third of the URLs are labeled as spam. The URLs come from links found in more than 100 newsletters, which are parsed every half hour. A link is programmatically flagged as spam if it appears three or more times within a single newsletter or if it looks like a subscribe/unsubscribe URL. The dataset was created by The Pudding.
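The labeling rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the creators' actual code: the function name `label_newsletter_links`, the `repeat_threshold` parameter, and the keyword check for "likely subscribe/unsubscribe" links are all hypothetical, since the listing does not say how such links are detected.

```python
from collections import Counter

# Hypothetical keyword check: "subscribe" also matches "unsubscribe".
# The listing does not describe the real detection logic.
SUBSCRIBE_HINTS = ("subscribe",)


def label_newsletter_links(links, repeat_threshold=3):
    """Return {url: is_spam} for the links found in ONE newsletter."""
    counts = Counter(links)
    labels = {}
    for url, n in counts.items():
        # Rule 1: the link appears three or more times in this newsletter.
        repeated = n >= repeat_threshold
        # Rule 2 (assumed form): the link looks like a subscribe/unsubscribe URL.
        subscribe_like = any(hint in url.lower() for hint in SUBSCRIBE_HINTS)
        labels[url] = repeated or subscribe_like
    return labels


# Made-up links from a single newsletter, for illustration only.
links = [
    "https://example.com/a",
    "https://example.com/a",
    "https://example.com/a",
    "https://example.com/story",
    "https://example.com/unsubscribe?id=1",
]
labels = label_newsletter_links(links)
```

Because the labels come from a rule like this rather than from human review, is_spam is best read as a heuristic flag.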

Columns

url: The specific URL string.
is_spam: A boolean value indicating whether the URL is classified as spam (true) or not (false).

Distribution

The dataset is provided as a single CSV file (url_spam_classification.csv) with a size of 11.58 MB. It contains two columns and approximately 148,000 records.
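The listing quotes both approximately 87,500 URLs and approximately 148,000 records, so counting rows after download is a cheap sanity check. The sketch below uses only the Python standard library and runs on a tiny in-memory stand-in for the file. It assumes the CSV has a header row with the two columns listed above and that is_spam is serialized as text such as "True"/"False"; the helper names `parse_bool` and `summarize` are illustrative.

```python
import csv
import io


def parse_bool(text):
    # Assumption: is_spam is stored as "True"/"False" (any casing) or "1"/"0".
    return text.strip().lower() in ("true", "1")


def summarize(csv_file):
    """Count rows and spam rows in an open CSV file with url,is_spam columns."""
    reader = csv.DictReader(csv_file)
    total = 0
    spam = 0
    for row in reader:
        total += 1
        spam += parse_bool(row["is_spam"])
    share = spam / total if total else 0.0
    return {"rows": total, "spam": spam, "spam_share": share}


# Tiny in-memory stand-in for url_spam_classification.csv (made-up rows).
sample = io.StringIO(
    "url,is_spam\n"
    "https://a.example/x,True\n"
    "https://b.example/y,False\n"
    "https://c.example/z,False\n"
)
stats = summarize(sample)
```

For the real file, the same function could be called inside `with open("url_spam_classification.csv", newline="", encoding="utf-8") as f:`, where the UTF-8 encoding is an assumption. The resulting row count and spam share can then be compared with the figures quoted in this listing.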

Usage

Ideal applications for this dataset include training and evaluating machine learning models for spam detection, building content filtering systems, and supporting cybersecurity research. It can be used to build a binary classification model that automatically identifies and flags malicious or unwanted URLs.
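As a rough starting point for such a model, the sketch below trains a small multinomial naive Bayes classifier on URL tokens using only the standard library. Naive Bayes, the tokenization, and the class name `UrlNaiveBayes` are choices made for this illustration, not something the dataset prescribes, and the training URLs are made up. In practice the rows would come from the CSV and be split into training and test sets.

```python
import math
import re
from collections import Counter


def tokenize(url):
    # Lowercase, then split on anything that is not a letter or digit.
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


class UrlNaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over URL tokens.

    Training data must contain at least one example of each class.
    """

    def fit(self, urls, labels):
        self.token_counts = {True: Counter(), False: Counter()}
        self.class_counts = Counter(labels)
        for url, label in zip(urls, labels):
            self.token_counts[label].update(tokenize(url))
        self.vocab = set(self.token_counts[True]) | set(self.token_counts[False])
        return self

    def _log_score(self, tokens, label):
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)
        denom = sum(self.token_counts[label].values()) + len(self.vocab)
        for t in tokens:
            if t in self.vocab:  # tokens never seen in training are ignored
                score += math.log((self.token_counts[label][t] + 1) / denom)
        return score

    def predict(self, url):
        tokens = tokenize(url)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


# Made-up training rows in the dataset's (url, is_spam) shape.
train_urls = [
    "https://news.example/unsubscribe?user=1",
    "https://mail.example/subscribe/confirm",
    "https://news.example/articles/budget-analysis",
    "https://blog.example/posts/data-visualization",
]
train_labels = [True, True, False, False]

model = UrlNaiveBayes().fit(train_urls, train_labels)
pred_spam = model.predict("https://other.example/unsubscribe")
pred_ham = model.predict("https://other.example/articles/data")
```

Since the labels were generated by a rule that includes subscribe/unsubscribe links, a model trained on this data may largely relearn that rule, so held-out accuracy should be interpreted with that in mind.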

Coverage

The dataset consists of URLs collected from a wide variety of internet newsletters, with no specific geographical or demographic restrictions. It is a static snapshot of the links that appeared in these newsletters during the collection period, and it is not expected to be updated.

License

CC0: Public Domain

Who Can Use It

Data Scientists and Machine Learning Engineers: Can use this dataset to build, train, and validate spam URL classification models.
Cybersecurity Analysts: Can leverage this data for research into malicious link patterns and to enhance security protocols.
Software Developers: Can integrate models trained on this data into applications to filter spam content and protect users.

Dataset Name Suggestions

Newsletter Spam URL Classification
Spam vs. Ham URL Links
URL Spam Detection Dataset
Binary Classification of Web Links
Spam URL Collection

Attributes

Original Data Source: Newsletter Spam URL Classification

Listing Stats

Listed: 17/09/2025
Region: Global
Quality: 5 / 5
Version: 1.0
