Webpage Phishing Detection Data

Category: Data Science and Analytics

Tags and Keywords: Phishing, Cyber, Security, URL, Classification


Price: Free

About

This dataset provides a focused set of URL-based features for identifying phishing websites. It was created by merging two source datasets that share the same features, keeping the most relevant attributes and removing redundant information. Its purpose is to offer a compact, practical resource for web page phishing detection and analysis. Each row describes one website through features computed from its URL, together with a label marking it as legitimate or phishing.
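The listing does not document how the two source files were combined. A minimal sketch of one plausible approach, assuming both sources are CSV files with a shared schema and a hypothetical `url` column that identifies each page: rows whose URL has already been seen are skipped, and only a chosen subset of columns is kept, so the key itself does not reach the output (the published columns do not include the URL). The function name `merge_sources` and its parameters are illustrative only.

```python
import csv
import io


def merge_sources(csv_a: str, csv_b: str, keep: list[str], key: str = "url") -> list[dict[str, str]]:
    """Merge two CSV texts that share a schema (hypothetical helper).

    Rows are deduplicated on the assumed `key` column; only the columns in
    `keep` are written to the output, so the key is dropped unless listed.
    """
    merged: list[dict[str, str]] = []
    seen_keys: set[str] = set()
    for text in (csv_a, csv_b):
        reader = csv.DictReader(io.StringIO(text))
        fields = reader.fieldnames or []
        missing = [c for c in [key, *keep] if c not in fields]
        if missing:
            raise ValueError(f"source is missing columns: {missing}")
        for row in reader:
            if row[key] in seen_keys:
                continue  # the same page already came from an earlier source
            seen_keys.add(row[key])
            merged.append({c: row[c] for c in keep})
    return merged
```

Columns present in only one source (an extra attribute, for instance) are simply not selected, which is one way to drop redundant information while keeping a single schema.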

Columns

url_length: The total character length of the URL.
n_dots: The count of '.' characters present in the URL.
n_hypens: The count of '-' characters found in the URL.
n_underline: The count of '_' characters within the URL.
n_slash: The count of '/' characters in the URL.
n_questionmark: The count of '?' characters in the URL.
n_equal: The count of '=' characters in the URL.
n_at: The count of '@' characters in the URL.
n_and: The count of '&' characters in the URL.
n_exclamation: The count of '!' characters in the URL.
n_space: The count of ' ' characters in the URL.
n_tilde: The count of '~' characters in the URL.
n_comma: The count of ',' characters in the URL.
n_plus: The count of '+' characters in the URL.
n_asterisk: The count of '*' characters in the URL.
n_hastag: The count of '#' characters in the URL.
n_dollar: The count of '$' characters in the URL.
n_percent: The count of '%' characters in the URL.
n_redirection: The count of redirections associated with the URL.
phishing: The label indicating whether the URL is a phishing site (1) or a legitimate site (0).
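The character-count columns can be reproduced from a raw URL string. The sketch below assumes each count is taken over the full URL exactly as stored, which the listing does not confirm; `url_features` and `CHAR_COLUMNS` are hypothetical names. `n_redirection` is omitted because it presumably requires following the URL over the network rather than inspecting the string, and the column names keep the dataset's own spellings (`n_hypens`, `n_hastag`).

```python
# Maps each published count column to the single character it counts.
CHAR_COLUMNS = {
    "n_dots": ".",
    "n_hypens": "-",       # spelling matches the published column name
    "n_underline": "_",
    "n_slash": "/",
    "n_questionmark": "?",
    "n_equal": "=",
    "n_at": "@",
    "n_and": "&",
    "n_exclamation": "!",
    "n_space": " ",
    "n_tilde": "~",
    "n_comma": ",",
    "n_plus": "+",
    "n_asterisk": "*",
    "n_hastag": "#",       # spelling matches the published column name
    "n_dollar": "$",
    "n_percent": "%",
}


def url_features(url: str) -> dict[str, int]:
    """Compute url_length and the 17 character counts for one URL string."""
    features = {"url_length": len(url)}
    for column, char in CHAR_COLUMNS.items():
        features[column] = url.count(char)
    return features
```

For example, `url_features("http://ex-ample.com/a/b?x=1&y=2")` reports a length of 31, four slashes, two equals signs, and one each of `.`, `-`, `?` and `&`.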

Distribution

The data is provided in CSV format, with one row per website and one column per feature. The dataset contains approximately 100,000 records across 20 columns (19 URL features plus the phishing label) and has a file size of 4.22 MB.
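Before training, it can help to check that a downloaded copy matches the shape described here. A minimal sketch using only the standard `csv` module, assuming every column holds an integer and the label column is named `phishing`; `load_rows` is a hypothetical helper, and the expected count of 20 columns comes from the figure above.

```python
import csv
from collections.abc import Iterable

EXPECTED_COLUMN_COUNT = 20  # column count stated in the listing
LABEL_COLUMN = "phishing"


def load_rows(lines: Iterable[str]) -> list[dict[str, int]]:
    """Parse the CSV, check header width and label values, and convert
    every field to int (assumes all columns are integer-valued)."""
    reader = csv.DictReader(lines)
    header = reader.fieldnames or []
    if len(header) != EXPECTED_COLUMN_COUNT:
        raise ValueError(f"expected {EXPECTED_COLUMN_COUNT} columns, got {len(header)}")
    if LABEL_COLUMN not in header:
        raise ValueError(f"missing label column {LABEL_COLUMN!r}")
    rows: list[dict[str, int]] = []
    for row in reader:
        # DictReader stores surplus fields under the key None and fills
        # missing fields with the value None.
        if None in row or None in row.values():
            raise ValueError(f"line {reader.line_num}: wrong number of fields")
        parsed = {name: int(value) for name, value in row.items()}
        if parsed[LABEL_COLUMN] not in (0, 1):
            raise ValueError(f"line {reader.line_num}: label must be 0 or 1")
        rows.append(parsed)
    return rows
```

Pass it a file opened with `open(path, newline="")`, as the `csv` module documentation recommends; `collections.Counter(r["phishing"] for r in rows)` then gives the class balance.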

Usage

This dataset is suited to training and evaluating machine learning models for phishing website detection. It can be used for:

Developing and testing URL classification algorithms.
Research into the characteristics of phishing URLs.
Educational purposes in cyber security and data science.
Building predictive systems to identify malicious websites.
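For the first use listed above, a simple baseline helps calibrate more complex URL classifiers. The sketch below fits a single-feature decision stump, a technique chosen here for illustration rather than one the listing describes, and scores it with precision and recall on a seeded hold-out split. It assumes rows are already available as lists of the 19 feature values in column order, with the `phishing` column supplied separately as `y`; all function names are hypothetical.

```python
import random


def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle row indices with a fixed seed and hold out a fraction for testing."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def fit_stump(X, y):
    """Return (feature, threshold, polarity) with the best training accuracy.

    polarity 1 predicts phishing when x[feature] > threshold;
    polarity -1 predicts phishing when x[feature] <= threshold.
    """
    n, total_pos = len(y), sum(y)
    best = (0, float("-inf"), 1, -1)  # feature, threshold, polarity, correct count
    for j in range(len(X[0])):
        pairs = sorted((row[j], label) for row, label in zip(X, y))
        correct = total_pos  # threshold -inf: polarity 1 labels everything phishing
        candidates = [(float("-inf"), correct)]
        i = 0
        while i < n:
            value = pairs[i][0]
            # Raising the threshold to `value` flips these samples to a prediction
            # of 0: legitimate rows become correct, phishing rows wrong (polarity 1).
            while i < n and pairs[i][0] == value:
                correct += 1 if pairs[i][1] == 0 else -1
                i += 1
            candidates.append((value, correct))
        for threshold, c in candidates:
            if c > best[3]:
                best = (j, threshold, 1, c)
            if n - c > best[3]:  # the flipped rule gets every other sample right
                best = (j, threshold, -1, n - c)
    return best[:3]


def predict(stump, row):
    """Apply a fitted stump to one row of feature values."""
    feature, threshold, polarity = stump
    above = row[feature] > threshold
    return int(above) if polarity == 1 else int(not above)


def precision_recall(y_true, y_pred):
    """Precision and recall for the phishing class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Fitting sorts each feature once, so the cost is O(F·N log N) for F features and N rows, which stays practical at roughly 100,000 records. A library classifier can replace `fit_stump` and `predict` while reusing the same split and metrics.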

Coverage

The dataset is derived from two sources, published in 2020 and 2021, that focus on web page phishing. No specific geographic or demographic scope is stated; because the features are computed from URL characteristics, they are presented as generally applicable to English-language web content.

License

Creative Commons Attribution 4.0 International (CC BY 4.0)

Who Can Use It

Cyber Security Analysts: For understanding and identifying phishing patterns.
Data Scientists and Machine Learning Engineers: For building and refining URL classification models.
Researchers: For studying online security threats and developing detection methods.
Educators: As a practical resource for teaching cyber security and data analysis.
Developers: For creating tools and systems that protect users from phishing.

Dataset Name Suggestions

Phishing URL Features Dataset
Webpage Phishing Detection Data
Merged Phishing URL Dataset
Cyber Security Phishing Dataset
URL Phishing Indicators

Attributes

Original Data Source: Webpage Phishing Detection Data

Listing Stats

Listed: 26/08/2025
Region: Global
Quality: 5 / 5
Version: 1.0
