Phishing URL Classifier Dataset
Website Analytics & User Experience
About
This dataset is a curated collection of over 800,000 URLs, intended to represent a variety of online domains. Roughly 52% of the URLs are labelled legitimate and the remaining 48% are labelled phishing, i.e. potential online threats. The dataset has two columns, "url" and "status". The "status" column is a binary label: 0 marks a phishing domain and 1 marks a legitimate domain. Because the split between phishing and legitimate instances is close to even, neither class dominates during analysis or model training.
Columns
- url: This field contains the Uniform Resource Locators (URLs) for each domain, including both legitimate and phishing entries.
- status: This field holds the classification of the URL. A value of 0 marks a phishing domain (a potential risk), while a value of 1 marks a legitimate domain.
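A minimal sketch for loading the file and checking that it matches the two-column layout described above, using Python's standard `csv` module. The function name `load_labelled_urls` and the file name `phishing_urls.csv` are hypothetical, and the sketch assumes the header row is exactly `url,status`.

```python
import csv
import io
from typing import Iterable, List, Tuple

# Header expected from the column descriptions: url, then status.
EXPECTED_COLUMNS = ["url", "status"]


def load_labelled_urls(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse url,status rows; status must be 0 (phishing) or 1 (legitimate).

    Malformed rows raise ValueError instead of being silently dropped.
    """
    reader = csv.DictReader(lines)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows: List[Tuple[str, int]] = []
    for row in reader:
        # Short rows yield None for missing fields, so fall back to "".
        url = (row["url"] or "").strip()
        status = (row["status"] or "").strip()
        if not url:
            raise ValueError(f"line {reader.line_num}: empty url")
        if status not in ("0", "1"):
            raise ValueError(f"line {reader.line_num}: status must be 0 or 1, got {status!r}")
        rows.append((url, int(status)))
    return rows


# Usage with an in-memory sample; for the real file (name is hypothetical):
#   with open("phishing_urls.csv", newline="", encoding="utf-8-sig") as f:
#       rows = load_labelled_urls(f)
sample = "url,status\nhttps://www.example.com/,1\nhttp://secure-login.example.net/verify,0\n"
print(load_labelled_urls(io.StringIO(sample)))
# prints [('https://www.example.com/', 1), ('http://secure-login.example.net/verify', 0)]
```

Raising on bad rows, rather than skipping them, makes any deviation from the documented 0/1 encoding visible immediately.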
Distribution
The dataset is provided as a CSV file containing 808,042 unique entries. The reported class counts are 394,982 entries labelled phishing (0) and 427,028 entries labelled legitimate (1), a split of roughly 48% to 52%, so the two categories are almost equally represented. These two counts sum to 822,010 rather than 808,042, so they should be verified against the file before being relied on.
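The class balance can be rechecked after loading. With the reported counts, the legitimate share is 427,028 / 822,010 ≈ 0.52 and the phishing share is 394,982 / 822,010 ≈ 0.48. The sketch below recomputes the class counts and also reports repeated URLs and URLs that appear under both labels, which would be one possible source of a gap between a row count and a unique-entry count. It assumes `rows` is a list of `(url, status)` pairs; `summarise` is a hypothetical helper name.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

LABEL_NAMES = {0: "phishing", 1: "legitimate"}


def summarise(rows: List[Tuple[str, int]]) -> Dict[str, object]:
    """Recompute class counts and flag duplicate or conflicting URLs."""
    class_counts = Counter(status for _, status in rows)
    labels_per_url: Dict[str, Set[int]] = defaultdict(set)
    for url, status in rows:
        labels_per_url[url].add(status)
    return {
        "rows": len(rows),
        "unique_urls": len(labels_per_url),
        # Rows beyond the first occurrence of each URL.
        "duplicate_rows": len(rows) - len(labels_per_url),
        # URLs that appear with both status 0 and status 1.
        "conflicting_urls": sum(1 for labels in labels_per_url.values() if len(labels) > 1),
        "class_counts": {LABEL_NAMES[k]: v for k, v in sorted(class_counts.items())},
        "share_legitimate": class_counts[1] / len(rows) if rows else 0.0,
    }


# Toy rows (hypothetical), including one repeated URL and one conflicting URL.
toy_rows = [
    ("https://a.example/", 1),
    ("https://a.example/", 1),
    ("http://b.example/login", 0),
    ("http://c.example/", 0),
    ("http://c.example/", 1),
]
print(summarise(toy_rows))
```

Conflicting URLs are worth resolving or dropping before training, since no classifier can fit both labels for the same input.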
Usage
This dataset is ideal for applications aimed at understanding, combating, and mitigating online threats. It can be used for developing models related to phishing detection, binary classification, and website analytics. It is also suitable for data cleaning exercises and projects involving Natural Language Processing (NLP) and Deep Learning.
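Phishing detection framed as binary classification can be prototyped without external dependencies. The sketch below is a multinomial Naive Bayes classifier over lower-cased character trigrams, a common lexical baseline for URL text; it is an illustrative choice, not a method specified by this dataset, and `char_ngrams` and `NaiveBayesURLClassifier` are hypothetical names. It assumes both labels (0 and 1) appear in the training rows.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple


def char_ngrams(url: str, n: int = 3) -> List[str]:
    """Lower-cased character n-grams; ^ and $ mark the start and end of the URL."""
    padded = f"^{url.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesURLClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over char n-grams."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.doc_counts: Counter = Counter()
        self.gram_counts: Dict[int, Counter] = {0: Counter(), 1: Counter()}
        self.gram_totals: Dict[int, int] = {0: 0, 1: 0}
        self.vocab: Set[str] = set()

    def fit(self, rows: List[Tuple[str, int]]) -> "NaiveBayesURLClassifier":
        for url, status in rows:
            grams = char_ngrams(url, self.n)
            self.doc_counts[status] += 1
            self.gram_counts[status].update(grams)
            self.gram_totals[status] += len(grams)
            self.vocab.update(grams)
        if self.doc_counts[0] == 0 or self.doc_counts[1] == 0:
            raise ValueError("training rows must contain both status 0 and status 1")
        return self

    def log_scores(self, url: str) -> Dict[int, float]:
        """Unnormalised log posterior for each label."""
        total_docs = self.doc_counts[0] + self.doc_counts[1]
        vocab_size = len(self.vocab)
        scores: Dict[int, float] = {}
        for label in (0, 1):
            score = math.log(self.doc_counts[label] / total_docs)  # log prior
            denom = self.gram_totals[label] + vocab_size
            for gram in char_ngrams(url, self.n):
                if gram in self.vocab:  # n-grams never seen in training are ignored
                    score += math.log((self.gram_counts[label][gram] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, url: str) -> int:
        """Return 0 (phishing) or 1 (legitimate), whichever scores higher."""
        scores = self.log_scores(url)
        return 0 if scores[0] >= scores[1] else 1


# Hypothetical toy training rows, for illustration only.
train = [
    ("https://www.example.com/", 1),
    ("https://docs.example.org/guide", 1),
    ("http://192.0.2.10/secure-login/verify-account.php", 0),
    ("http://account-verify.example.info/login.php?id=1", 0),
]
model = NaiveBayesURLClassifier().fit(train)
print(model.predict("http://198.51.100.7/verify-login.php"))
```

For a meaningful evaluation on the full file, deduplicate URLs first and hold out a test split stratified on `status`, so that the same URL never appears in both training and test data and the roughly 52/48 ratio is preserved in each split.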
Coverage
The data collection for this dataset is global in scope. While a specific time range for data collection is not provided, the dataset was listed on 05/06/2025.
License
CC0
Who Can Use It
This dataset is particularly valuable for researchers and practitioners working in the fields of AI and Machine Learning. Intended users include those looking to:
- Develop and train models for identifying malicious URLs.
- Analyse patterns distinguishing legitimate websites from phishing attempts.
- Enhance cybersecurity measures and protect users from online threats.
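To analyse which surface patterns separate the two classes, one starting point is to compute a few hand-crafted lexical features per URL and compare their averages for status 0 and status 1. The feature set here (length, digit and hyphen counts, an `@` sign, dots in the host, an IP-address host, an HTTPS scheme) is an assumption chosen for illustration, and the helper names are hypothetical. URLs without a scheme get `http://` prepended so that `urlsplit` can locate the host.

```python
import ipaddress
import re
from statistics import mean
from typing import Dict, List, Tuple
from urllib.parse import urlsplit

# Matches a leading scheme such as "http://" or "https://".
SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def lexical_features(url: str) -> Dict[str, float]:
    """A few hand-picked surface features of one URL (illustrative choice)."""
    try:
        parsed = urlsplit(url if SCHEME_RE.match(url) else "http://" + url)
    except ValueError:  # e.g. an unbalanced "[" in an IPv6-style host
        parsed = urlsplit("")
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)
        host_is_ip = 1.0
    except ValueError:
        host_is_ip = 0.0
    return {
        "length": float(len(url)),
        "digits": float(sum(ch.isdigit() for ch in url)),
        "hyphens": float(url.count("-")),
        "has_at": float("@" in url),
        "host_dots": float(host.count(".")),
        "host_is_ip": host_is_ip,
        "uses_https": float(parsed.scheme == "https"),  # urlsplit lower-cases the scheme
    }


def mean_features_by_class(rows: List[Tuple[str, int]]) -> Dict[int, Dict[str, float]]:
    """Average each feature separately for status 0 and status 1."""
    by_class: Dict[int, List[Dict[str, float]]] = {0: [], 1: []}
    for url, status in rows:
        by_class[status].append(lexical_features(url))
    return {
        label: {name: mean(f[name] for f in feats) for name in feats[0]}
        for label, feats in by_class.items()
        if feats  # skip a label with no rows
    }


# Hypothetical toy rows, for illustration only.
toy_rows = [
    ("https://www.example.com/", 1),
    ("https://a.example.org/", 1),
    ("http://192.0.2.10/x", 0),
]
for label, means in mean_features_by_class(toy_rows).items():
    print(label, means)
```

Comparing class means is only a first look; features whose averages differ sharply between the classes are natural candidates for a model's input.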
Dataset Name Suggestions
- URL Phishing Detection
- Legitimate and Malicious URLs
- Online Threat URL Dataset
- Phishing URL Classifier Data
- Web Security URL Collection
Attributes
Original Data Source: Phishing and Legitimate URLS