Phishing URL Classifier Dataset

Website Analytics & User Experience

Tags and Keywords

NLP

Deep

Data

Binary

Websites

Phishing URL Classifier Dataset on the Opendatabay data marketplace

Free

About

This dataset is a curated collection of over 800,000 URLs covering a variety of online domains. Approximately 52% are labelled legitimate and the remaining 48% are labelled phishing. The dataset has two columns, "url" and "status". The "status" column is binary: 0 marks a phishing domain and 1 marks a legitimate domain. The near-even split between the two classes reduces the risk of a classifier favouring one class simply because it dominates the training data.

Columns

url: The Uniform Resource Locator (URL) for each entry, covering both legitimate and phishing domains.
status: The class label for the URL. A value of 0 marks a phishing domain and a value of 1 marks a legitimate domain.
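
A minimal sketch of reading and validating the two-column layout described above, using only the Python standard library. The inline sample rows and the `load_rows` helper are illustrative assumptions, not part of the dataset. The sketch also assumes the CSV has a header row naming the columns `url` and `status`.

```python
import csv
import io

# Hypothetical sample in the documented two-column layout (not real dataset rows).
SAMPLE_CSV = """url,status
https://example.com/,1
http://example.test/login/verify,0
https://example.org/about,1
"""

PHISHING, LEGITIMATE = 0, 1  # encoding stated in the listing


def load_rows(handle):
    """Read (url, status) pairs and check them against the documented schema."""
    reader = csv.DictReader(handle)
    missing = {"url", "status"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, record in enumerate(reader, start=2):
        status = int(record["status"])
        if status not in (PHISHING, LEGITIMATE):
            raise ValueError(f"line {line_no}: unexpected status {status!r}")
        rows.append((record["url"].strip(), status))
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(len(rows), "rows loaded")
```

To run this against the downloaded file, pass an open file handle instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved.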

Distribution

The dataset is provided as a CSV file. The listing reports 808,042 unique entries, with 394,982 flagged as phishing (0) and 427,028 flagged as legitimate (1), a split of roughly 48% to 52%. The two status counts sum to 822,010, more than the unique-entry figure; the listing does not explain the difference.
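
The arithmetic behind the class balance can be checked directly from the reported counts. The sketch below does that, then shows one way to recompute the same figures from loaded rows. The `summarise` helper and the sample rows are hypothetical, and rows are assumed to be `(url, status)` pairs.

```python
from collections import Counter

# Counts as reported in the listing.
REPORTED_UNIQUE = 808_042
REPORTED_PHISHING = 394_982    # status 0
REPORTED_LEGITIMATE = 427_028  # status 1

labelled_total = REPORTED_PHISHING + REPORTED_LEGITIMATE
print(f"labelled total: {labelled_total:,}")
print(f"phishing share: {REPORTED_PHISHING / labelled_total:.1%}")
print(f"legitimate share: {REPORTED_LEGITIMATE / labelled_total:.1%}")
print(f"excess over unique-entry figure: {labelled_total - REPORTED_UNIQUE:,}")


def summarise(rows):
    """Return (per-status counts, number of distinct URLs) for (url, status) pairs."""
    counts = Counter(status for _, status in rows)
    distinct_urls = len({url for url, _ in rows})
    return counts, distinct_urls


# Hypothetical rows; the second URL appears twice on purpose.
sample = [
    ("https://example.com/", 1),
    ("http://example.test/login", 0),
    ("http://example.test/login", 0),
]
counts, distinct_urls = summarise(sample)
print(dict(counts), distinct_urls)
```

On the reported figures this gives shares of about 48.1% phishing and 51.9% legitimate. Running `summarise` on the full file would show whether the number of distinct URLs matches the unique-entry figure.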

Usage

This dataset suits work on detecting and mitigating online threats, in particular phishing detection and other binary classification of URLs. It can also support website analytics, data cleaning exercises, and projects involving Natural Language Processing (NLP) and Deep Learning.
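
Since the dataset contains only the URL string and a label, one common starting point for phishing detection is to derive simple lexical features from the URL itself. The sketch below is illustrative only: the feature set, the `lexical_features` helper, and the example URL are assumptions, not something the listing prescribes.

```python
import re
from urllib.parse import urlsplit

IPV4_HOST = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def lexical_features(url):
    """Compute a few simple, illustrative features from the URL string alone."""
    # Heuristic: urlsplit only recognises a host after "//", so add it for
    # entries that lack a scheme (for example "example.com/path").
    candidate = url if "://" in url else "//" + url
    try:
        parts = urlsplit(candidate)
        host = parts.hostname or ""
        scheme = parts.scheme
    except ValueError:
        # Malformed entries (such as unbalanced IPv6 brackets) get empty fields.
        host, scheme = "", ""
    return {
        "length": len(url),
        "host_dots": host.count("."),
        "digits": sum(ch.isdigit() for ch in url),
        "has_at_sign": "@" in url,
        "host_is_ipv4": bool(IPV4_HOST.match(host)),
        "uses_https": scheme == "https",
    }


print(lexical_features("http://192.0.2.10/secure/login@update"))
```

Feature dictionaries like these can be fed to any tabular classifier. Character-level models that read the raw URL as text are an alternative that fits the NLP and Deep Learning uses mentioned above.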

Coverage

The data collection for this dataset is global in scope. While a specific time range for data collection is not provided, the dataset was listed on 05/06/2025.

License

CC0

Who Can Use It

This dataset is particularly valuable for researchers and practitioners working in the fields of AI and Machine Learning. Intended users include those looking to:

Develop and train models for identifying malicious URLs.
Analyse patterns distinguishing legitimate websites from phishing attempts.
Enhance cybersecurity measures and protect users from online threats.
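
For model training, a stratified split keeps the phishing and legitimate shares roughly equal in the training and test sets. The sketch below uses only the standard library. The `stratified_split` helper, its default parameters, and the toy rows are hypothetical, and rows are assumed to be `(url, status)` pairs.

```python
import random
from collections import defaultdict


def stratified_split(rows, test_fraction=0.2, seed=0):
    """Split (url, status) pairs so each status keeps a similar share in both parts."""
    by_status = defaultdict(list)
    for url, status in rows:
        by_status[status].append((url, status))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for status in sorted(by_status):
        group = by_status[status]
        rng.shuffle(group)
        cut = round(len(group) * test_fraction)
        test.extend(group[:cut])
        train.extend(group[cut:])
    return train, test


# Hypothetical toy rows: ten legitimate (1) and ten phishing (0) URLs.
rows = [(f"https://legit{i}.example/", 1) for i in range(10)]
rows += [(f"http://phish{i}.example/login", 0) for i in range(10)]

train, test = stratified_split(rows)
print(len(train), "training rows,", len(test), "test rows")
```

Removing duplicate URLs before splitting is worth considering, since the same URL appearing in both parts would inflate test scores.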

Dataset Name Suggestions

URL Phishing Detection
Legitimate and Malicious URLs
Online Threat URL Dataset
Phishing URL Classifier Data
Web Security URL Collection

Attributes

Original Data Source: Phishing and Legitimate URLS

Listing Stats

Listed: 05/06/2025
Region: Global
Quality: 5 / 5
Version: 1.0
