Opendatabay APP

Newsletter Spam URL Classification

Data Science and Analytics

Tags and Keywords

  • URL
  • Spam
  • Classification
  • Security
  • Internet



Free

About

This dataset contains approximately 87,500 URLs, each classified as spam or not spam, which makes it suitable for developing binary classification models. About one-third of the URLs are labeled as spam. The data comes from links found in over 100 newsletters, which are parsed every half-hour. A link is programmatically flagged as spam if it appears three or more times within a single newsletter or if it includes a likely subscribe/unsubscribe URL. The dataset was created by The Pudding.
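The flagging rule described above can be sketched in a few lines. This is a minimal illustration, not The Pudding's actual pipeline: the function name, the regular expression used to spot a "likely subscribe/unsubscribe URL", and the example links are all assumptions.

```python
import re
from collections import Counter

# Hypothetical pattern: the listing does not say how the original
# pipeline recognizes a "likely subscribe/unsubscribe URL".
SUBSCRIBE_PATTERN = re.compile(r"(un)?subscribe", re.IGNORECASE)


def label_newsletter_links(urls, repeat_threshold=3):
    """Return {url: is_spam} for the links of a single newsletter.

    A link is flagged when it occurs at least `repeat_threshold` times
    in the newsletter or when it matches SUBSCRIBE_PATTERN.
    """
    counts = Counter(urls)
    return {
        url: counts[url] >= repeat_threshold
        or bool(SUBSCRIBE_PATTERN.search(url))
        for url in counts
    }


# Made-up links standing in for one parsed newsletter.
links = [
    "https://example.com/article",
    "https://example.com/promo",
    "https://example.com/promo",
    "https://example.com/promo",
    "https://example.com/unsubscribe?id=1",
]
labels = label_newsletter_links(links)
```

Here the article link stays unflagged, the promo link is flagged because it appears three times, and the unsubscribe link is flagged by the pattern match.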

Columns

  • url: The specific URL string.
  • is_spam: A boolean value indicating whether the URL is classified as spam (true) or not (false).

Distribution

The dataset is provided as a single CSV file (url_spam_classification.csv) with a size of 11.58 MB. It contains two columns and approximately 148,000 records.
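The listing quotes only approximate figures, and the record count given here (about 148,000) differs from the URL count in the About section (about 87,500), so it may be worth checking the row count and spam share after download. The sketch below uses only the Python standard library. It assumes the column names url and is_spam from the Columns section and assumes booleans are stored as text such as "True"/"False"; the function name is hypothetical.

```python
import csv
import io


def summarize(csv_file):
    """Count rows and spam rows in a file-like object with url,is_spam columns."""
    reader = csv.DictReader(csv_file)
    total = spam = 0
    for row in reader:
        total += 1
        # Assumption: booleans are serialized as text such as "True"/"False".
        if row["is_spam"].strip().lower() == "true":
            spam += 1
    return total, spam


# Tiny in-memory stand-in for the real file, for illustration only.
sample = io.StringIO(
    "url,is_spam\n"
    "https://example.com/a,False\n"
    "https://example.com/unsubscribe,True\n"
    "https://example.com/b,False\n"
)
total, spam = summarize(sample)
print(f"{total} rows, {spam / total:.1%} spam")
```

For the downloaded file, the same function could be called on `open("url_spam_classification.csv", newline="", encoding="utf-8")`, assuming the file is UTF-8 encoded.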

Usage

This dataset is suited to training and evaluating machine learning models for spam detection, content filtering systems, and cybersecurity research. It can be used to build a binary classification model that automatically flags unwanted URLs. Because the labels come from the programmatic rule described in the About section, such a model learns that heuristic rather than a manually verified notion of maliciousness.
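As one possible starting point for such a binary classifier, the sketch below implements a small multinomial Naive Bayes model over URL tokens using only the standard library. The tokenizer, the function names, and the four training rows are all illustrative assumptions; a real experiment would train on the CSV and hold out a test split.

```python
import math
import re
from collections import Counter


def tokenize(url):
    """Split a URL into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def train(rows):
    """rows: iterable of (url, is_spam). Returns per-class token and row counts."""
    token_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for url, is_spam in rows:
        doc_counts[is_spam] += 1
        token_counts[is_spam].update(tokenize(url))
    return token_counts, doc_counts


def predict(url, token_counts, doc_counts):
    """Return True if the spam class has the higher log-posterior.

    Assumes both classes appear at least once in the training rows.
    """
    vocab = set(token_counts[True]) | set(token_counts[False])
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (True, False):
        class_total = sum(token_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for tok in tokenize(url):
            # Add-one smoothing so unseen tokens do not zero out the score.
            score += math.log(
                (token_counts[label][tok] + 1) / (class_total + len(vocab))
            )
        scores[label] = score
    return scores[True] > scores[False]


# Made-up training rows, for illustration only.
train_rows = [
    ("https://news.example.com/unsubscribe?u=1", True),
    ("https://mail.example.com/subscribe/confirm", True),
    ("https://blog.example.org/posts/data-viz", False),
    ("https://blog.example.org/posts/maps", False),
]
token_counts, doc_counts = train(train_rows)
spam_guess = predict("https://news.example.com/unsubscribe?u=9", token_counts, doc_counts)
ham_guess = predict("https://blog.example.org/posts/charts", token_counts, doc_counts)
```

Since roughly one-third of the URLs are labeled spam, accuracy alone can be misleading; precision and recall on the spam class are likely more informative when evaluating a model like this.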

Coverage

The dataset consists of URLs collected from a wide variety of internet newsletters without specific geographical or demographic limitations. The data represents a snapshot of links appearing in these newsletters over a period of time, and it is not expected to be updated.

License

CC0: Public Domain

Who Can Use It

  • Data Scientists and Machine Learning Engineers: Can use this dataset to build, train, and validate spam URL classification models.
  • Cybersecurity Analysts: Can leverage this data for research into malicious link patterns and to enhance security protocols.
  • Software Developers: Can integrate models trained on this data into applications to filter spam content and protect users.

Dataset Name Suggestions

  • Newsletter Spam URL Classification
  • Spam vs. Ham URL Links
  • URL Spam Detection Dataset
  • Binary Classification of Web Links
  • Spam URL Collection

Attributes

Listing Stats

  • Views: 1
  • Downloads: 0
  • Listed: 17/09/2025
  • Region: Global
  • Universal Data Quality Score (UDQS): 5 / 5
  • Version: 1.0
