Opendatabay APP

Webpage Phishing Detection Data

Data Science and Analytics

Tags and Keywords

Phishing

Cyber

Security

URL

Classification

Free

About

This dataset provides a focused collection of features for identifying phishing websites. It was created by merging two distinct datasets with identical characteristics, keeping the most relevant attributes and removing redundant information. The aim is a streamlined resource for web page phishing detection and analysis. Each entry describes one website through features of its URL and labels it as either legitimate or a phishing attempt.

Columns

  • url_length: The total character length of the URL.
  • n_dots: The count of '.' characters present in the URL.
  • n_hypens: The count of '-' characters found in the URL.
  • n_underline: The count of '_' characters within the URL.
  • n_slash: The count of '/' characters in the URL.
  • n_questionmark: The count of '?' characters in the URL.
  • n_equal: The count of '=' characters in the URL.
  • n_at: The count of '@' characters in the URL.
  • n_and: The count of '&' characters in the URL.
  • n_exclamation: The count of '!' characters in the URL.
  • n_space: The count of ' ' characters in the URL.
  • n_tilde: The count of '~' characters in the URL.
  • n_comma: The count of ',' characters in the URL.
  • n_plus: The count of '+' characters in the URL.
  • n_asterisk: The count of '*' characters in the URL.
  • n_hastag: The count of '#' characters in the URL.
  • n_dollar: The count of '$' characters in the URL.
  • n_percent: The count of '%' characters in the URL.
  • n_redirection: The count of redirections associated with the URL.
  • phishing: The label indicating whether the URL is a phishing site (1) or a legitimate site (0).
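
As a rough illustration of how the character-count columns above could be derived from a raw URL, the sketch below counts each listed character with Python's built-in str.count. The helper name url_features and the CHAR_COLUMNS mapping are hypothetical and are not part of the dataset; the column names, including the spellings n_hypens and n_hastag, are copied from the list above. n_redirection and phishing are left out because they cannot be computed from the URL string alone.

```python
# Hypothetical mapping from each count column in the listing to the
# character it is described as counting. Column spellings follow the listing.
CHAR_COLUMNS = {
    "n_dots": ".",
    "n_hypens": "-",
    "n_underline": "_",
    "n_slash": "/",
    "n_questionmark": "?",
    "n_equal": "=",
    "n_at": "@",
    "n_and": "&",
    "n_exclamation": "!",
    "n_space": " ",
    "n_tilde": "~",
    "n_comma": ",",
    "n_plus": "+",
    "n_asterisk": "*",
    "n_hastag": "#",
    "n_dollar": "$",
    "n_percent": "%",
}


def url_features(url: str) -> dict:
    """Compute url_length and the 17 character-count columns for one URL."""
    features = {"url_length": len(url)}
    for column, char in CHAR_COLUMNS.items():
        features[column] = url.count(char)
    return features


# Illustrative URL, not taken from the dataset.
example = url_features("http://example.com/a-b/login?user=x&id=1")
print(example["url_length"], example["n_dots"], example["n_slash"])
```

Whether the original authors counted characters in exactly this way (for example, before or after percent-decoding) is not stated in the listing, so treat this as an approximation.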

Distribution

The data is provided in CSV format, with each row representing a unique website and each column detailing a specific feature. The dataset contains approximately 100,000 records and has a file size of 4.22 MB. It consists of 20 distinct columns.
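
A minimal sketch of reading a file with this layout using only the Python standard library is shown below. It assumes the header row uses the 20 column names exactly as listed under Columns and that every field is an integer; neither assumption is confirmed by the listing. The two sample rows are made up so that the example runs without the real file.

```python
import csv
import io

# The 20 column names as given in the listing, in listing order.
EXPECTED_COLUMNS = [
    "url_length", "n_dots", "n_hypens", "n_underline", "n_slash",
    "n_questionmark", "n_equal", "n_at", "n_and", "n_exclamation",
    "n_space", "n_tilde", "n_comma", "n_plus", "n_asterisk",
    "n_hastag", "n_dollar", "n_percent", "n_redirection", "phishing",
]

# Two made-up rows standing in for the real file (illustrative values only).
SAMPLE_ROWS = [
    [37, 2, 0, 0, 5] + [0] * 15,
    [77, 1, 0, 0, 3] + [0] * 13 + [1, 1],
]
SAMPLE_CSV = "\n".join(
    [",".join(EXPECTED_COLUMNS)]
    + [",".join(str(value) for value in row) for row in SAMPLE_ROWS]
) + "\n"


def load_rows(handle):
    """Read CSV rows, verify the header, and convert every field to int."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    return [{name: int(value) for name, value in row.items()} for row in reader]


rows = load_rows(io.StringIO(SAMPLE_CSV))
print(len(rows), "rows,", len(EXPECTED_COLUMNS), "columns")
```

For the downloaded file, the same function could be called on a handle opened with open(path, newline=""), where path is wherever the CSV was saved.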

Usage

This dataset is ideal for training and evaluating machine learning models designed for phishing website detection. It can be used for:
  • Developing and testing URL classification algorithms.
  • Research into the characteristics of phishing URLs.
  • Educational purposes in cyber security and data science.
  • Building predictive systems to identify malicious websites.
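
To make the training and evaluation idea concrete, here is a deliberately tiny baseline: a single-feature threshold rule fitted on url_length, scored with accuracy, precision and recall. This is a toy sketch and not a method suggested by the dataset authors. The function names fit_threshold and precision_recall are hypothetical, and the eight data points are made up for illustration.

```python
def fit_threshold(values, labels):
    """Pick the threshold t that maximizes accuracy of the rule value > t -> 1."""
    best_threshold, best_accuracy = None, -1.0
    for threshold in sorted(set(values)):
        predictions = [1 if v > threshold else 0 for v in values]
        correct = sum(p == y for p, y in zip(predictions, labels))
        accuracy = correct / len(labels)
        if accuracy > best_accuracy:  # strict: the lowest best threshold wins ties
            best_threshold, best_accuracy = threshold, accuracy
    return best_threshold, best_accuracy


def precision_recall(predictions, labels):
    """Compute precision and recall for the positive (phishing = 1) class."""
    pairs = list(zip(predictions, labels))
    tp = sum(p == 1 and y == 1 for p, y in pairs)
    fp = sum(p == 1 and y == 0 for p, y in pairs)
    fn = sum(p == 0 and y == 1 for p, y in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up url_length values and phishing labels, for illustration only.
url_lengths = [20, 25, 31, 40, 58, 64, 90, 120]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

threshold, accuracy = fit_threshold(url_lengths, labels)
predictions = [1 if v > threshold else 0 for v in url_lengths]
precision, recall = precision_recall(predictions, labels)
print(threshold, accuracy, precision, recall)
```

The scores here are computed on the same rows used for fitting, which overstates real performance. A realistic workflow would hold out part of the data for evaluation and would normally use all 19 features with a proper learning library.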

Coverage

The dataset is derived from two sources, published in 2020 and 2021, that focus on web page phishing. No specific geographic or demographic scope is given. Because the features are based on URL characteristics, they are generally applicable to English-language web content.

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

  • Cyber Security Analysts: For understanding and identifying phishing patterns.
  • Data Scientists and Machine Learning Engineers: For building and refining URL classification models.
  • Researchers: For studying online security threats and developing detection methods.
  • Educators: For teaching cyber security and data analysis with a practical resource.
  • Developers: For creating tools and systems that protect users from phishing.

Dataset Name Suggestions

  • Phishing URL Features Dataset
  • Webpage Phishing Detection Data
  • Merged Phishing URL Dataset
  • Cyber Security Phishing Dataset
  • URL Phishing Indicators

Attributes

Original Data Source: Webpage Phishing Detection Data

Listing Stats

LISTED

26/08/2025

REGION

GLOBAL

UDQS QUALITY (Universal Data Quality Score)

5 / 5

VERSION

1.0
