Webpage Phishing Detection Data
About
This dataset provides a focused collection of features for identifying phishing websites. It was created by merging two datasets that share the same features, keeping the most relevant attributes and removing redundant ones. Its purpose is to offer a streamlined resource for web page phishing detection and analysis. Each entry represents one website, described by features extracted from its URL and labeled as either legitimate or phishing.
Columns
- url_length: The total character length of the URL.
- n_dots: The count of '.' characters present in the URL.
- n_hypens: The count of '-' characters found in the URL.
- n_underline: The count of '_' characters within the URL.
- n_slash: The count of '/' characters in the URL.
- n_questionmark: The count of '?' characters in the URL.
- n_equal: The count of '=' characters in the URL.
- n_at: The count of '@' characters in the URL.
- n_and: The count of '&' characters in the URL.
- n_exclamation: The count of '!' characters in the URL.
- n_space: The count of space (' ') characters in the URL.
- n_tilde: The count of '~' characters in the URL.
- n_comma: The count of ',' characters in the URL.
- n_plus: The count of '+' characters in the URL.
- n_asterisk: The count of '*' characters in the URL.
- n_hastag: The count of '#' characters in the URL.
- n_dollar: The count of '$' characters in the URL.
- n_percent: The count of '%' characters in the URL.
- n_redirection: The count of redirections associated with the URL.
- phishing: The label indicating whether the URL is a phishing site (1) or a legitimate site (0).
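To make the column definitions concrete, the sketch below recomputes `url_length` and the per-character counts from a raw URL string. It assumes the counts are taken over the full URL text, as the descriptions above suggest; the dataset authors' actual extraction code may differ. `extract_url_features` and `CHAR_FEATURES` are hypothetical names, and `n_redirection` is left out because it depends on following HTTP redirects rather than on the URL text itself.

```python
from typing import Dict

# Hypothetical mapping: dataset column name -> character it counts.
# Column spellings (n_hypens, n_hastag) are kept exactly as in the dataset.
# Assumption: each count is taken over the full URL string.
CHAR_FEATURES: Dict[str, str] = {
    "n_dots": ".",
    "n_hypens": "-",
    "n_underline": "_",
    "n_slash": "/",
    "n_questionmark": "?",
    "n_equal": "=",
    "n_at": "@",
    "n_and": "&",
    "n_exclamation": "!",
    "n_space": " ",
    "n_tilde": "~",
    "n_comma": ",",
    "n_plus": "+",
    "n_asterisk": "*",
    "n_hastag": "#",
    "n_dollar": "$",
    "n_percent": "%",
}


def extract_url_features(url: str) -> Dict[str, int]:
    """Return url_length plus one count per special character.

    n_redirection is intentionally omitted: it requires requesting the URL
    and following redirects, which the string alone cannot reveal.
    """
    features = {"url_length": len(url)}
    for column, char in CHAR_FEATURES.items():
        features[column] = url.count(char)
    return features


if __name__ == "__main__":
    print(extract_url_features("http://login-secure.example.com/verify?id=1&token=a_b"))
```

Counting with `str.count` over the whole string keeps the sketch close to the plain-language definitions; a real pipeline might instead count only within specific URL components (host, path, query).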
Distribution
The data is provided in CSV format, with each row representing a unique website and each column a specific feature. The dataset contains approximately 100,000 records in 20 columns (the 19 URL features listed above plus the phishing label) and has a file size of 4.22 MB.
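A minimal loading sketch using only the Python standard library is shown below. It checks that the 20 documented columns are present and counts rows per label so class balance can be inspected before training. The function names are hypothetical, and it assumes the CSV has a header row using the column names listed above; any extra columns (such as an index) are ignored.

```python
import csv
import io
from collections import Counter
from typing import IO, Dict, List

# The 20 columns documented in the Columns section.
EXPECTED_COLUMNS = [
    "url_length", "n_dots", "n_hypens", "n_underline", "n_slash",
    "n_questionmark", "n_equal", "n_at", "n_and", "n_exclamation",
    "n_space", "n_tilde", "n_comma", "n_plus", "n_asterisk",
    "n_hastag", "n_dollar", "n_percent", "n_redirection", "phishing",
]


def load_phishing_csv(handle: IO[str]) -> List[Dict[str, int]]:
    """Read the CSV into a list of {column: int} dicts after checking the header."""
    reader = csv.DictReader(handle)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # int(float(...)) tolerates values written as either "3" or "3.0".
    return [
        {col: int(float(row[col])) for col in EXPECTED_COLUMNS}
        for row in reader
    ]


def label_counts(rows: List[Dict[str, int]]) -> Counter:
    """Count legitimate (0) vs phishing (1) rows to check class balance."""
    return Counter(row["phishing"] for row in rows)


if __name__ == "__main__":
    # Tiny in-memory sample standing in for the real file.
    header = ",".join(EXPECTED_COLUMNS)
    sample = header + "\n" + ",".join(["25"] + ["0"] * 19) + "\n"
    rows = load_phishing_csv(io.StringIO(sample))
    print(len(rows), label_counts(rows))
```

For the real file, pass an opened handle instead, for example `open("phishing.csv", newline="")`, where the filename is a placeholder for wherever the download was saved.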
Usage
This dataset is well suited to training and evaluating machine learning models for phishing website detection. It can be used for:
- Developing and testing URL classification algorithms.
- Research into the characteristics of phishing URLs.
- Educational purposes in cyber security and data science.
- Building predictive systems to identify malicious websites.
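Before fitting a full classifier, a simple baseline helps show whether the URL features carry signal. The sketch below is an illustrative technique, not the dataset authors' method: it makes a reproducible train/test split, picks the single `url_length` cut-off that maximises training accuracy (predicting phishing when the length exceeds it), and reports precision, recall, and F1. It assumes rows are dicts mapping column names to integers; all function names are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, int]


def train_test_split(rows: Sequence, test_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[list, list]:
    """Shuffle reproducibly and hold out a fraction of rows for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


def fit_threshold(rows: Sequence[Row], feature: str = "url_length") -> Optional[int]:
    """Return the t maximising training accuracy for 'phishing if feature > t'."""
    best_t, best_acc = None, -1.0
    for t in sorted({row[feature] for row in rows}):
        correct = sum((row[feature] > t) == (row["phishing"] == 1) for row in rows)
        acc = correct / len(rows)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def predict(rows: Sequence[Row], threshold: int,
            feature: str = "url_length") -> List[int]:
    """Label a row as phishing (1) when the feature exceeds the threshold."""
    return [1 if row[feature] > threshold else 0 for row in rows]


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Standard binary metrics with phishing (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    toy = [{"url_length": 20, "phishing": 0}, {"url_length": 25, "phishing": 0},
           {"url_length": 90, "phishing": 1}, {"url_length": 120, "phishing": 1}]
    t = fit_threshold(toy)
    print(t, precision_recall_f1([r["phishing"] for r in toy], predict(toy, t)))
```

The threshold search is quadratic in the worst case (rows times distinct values), which is acceptable for a one-off baseline; a model trained on all 19 features would be the natural next step, and its scores can be compared against this baseline using the same metric function.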
Coverage
The data comes from two web page phishing datasets published in 2020 and 2021. No specific geographic or demographic scope is documented; because the features are derived from URL characteristics, they are generally applicable to English-language web content.
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
- Cyber Security Analysts: For understanding and identifying phishing patterns.
- Data Scientists and Machine Learning Engineers: For building and refining URL classification models.
- Researchers: For studying online security threats and developing detection methods.
- Educators: As a practical resource for teaching cyber security and data analysis.
- Developers: For creating tools and systems that protect users from phishing.
Dataset Name Suggestions
- Phishing URL Features Dataset
- Webpage Phishing Detection Data
- Merged Phishing URL Dataset
- Cyber Security Phishing Dataset
- URL Phishing Indicators
Attributes
Original Data Source: Webpage Phishing Detection Data