
Phishing URL Classifier Dataset

Website Analytics & User Experience

Tags and Keywords

NLP

Deep

Data

Binary

Websites

Phishing URL Classifier Dataset on the Opendatabay data marketplace


Free

About

This dataset is a curated collection of over 800,000 URLs drawn from a variety of online domains. Approximately 52% of the URLs are labelled legitimate, and the remaining 48% are labelled phishing, indicating potential online threats. The dataset has two columns, "url" and "status". The "status" column is binary encoded: 0 marks a phishing URL and 1 marks a legitimate URL. This near-even split between phishing and legitimate examples makes the dataset well suited to analysis and model development.

Columns

  • url: The Uniform Resource Locator (URL) of each entry, covering both legitimate and phishing sites.
  • status: The class label for the URL. A value of 0 marks a phishing URL (a potential risk), and a value of 1 marks a legitimate URL.
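The two-column schema and its label encoding can be checked when the file is loaded. The following is a minimal sketch using only the standard library. It assumes the CSV header is exactly `url,status`; the listing does not name the file, so an in-memory sample stands in for it. The function names `load_rows` and `to_phishing_label` are hypothetical. Because many metric functions treat label 1 as the positive class by default, the sketch also shows how to flip the encoding so that phishing becomes 1.

```python
import csv
import io
from typing import Iterable, List, Tuple

# Column names as described in the listing; the exact header in the file is assumed.
EXPECTED_COLUMNS = ["url", "status"]


def load_rows(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse (url, status) rows and check them against the documented schema."""
    reader = csv.DictReader(lines)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    rows = []
    for record in reader:
        status = int(record["status"])
        if status not in (0, 1):
            # reader.line_num is the number of source lines read so far.
            raise ValueError(f"line {reader.line_num}: status must be 0 or 1, got {status}")
        rows.append((record["url"], status))
    return rows


def to_phishing_label(status: int) -> int:
    """Flip the encoding so that phishing becomes the positive class (1)."""
    # Dataset encoding: 0 = phishing, 1 = legitimate.
    return 1 - status


# Hypothetical in-memory sample; for real data pass open("<your file>.csv", newline="").
sample = io.StringIO(
    "url,status\n"
    "https://www.example.com/,1\n"
    "http://secure-login.example.net/verify,0\n"
)
rows = load_rows(sample)
phishing_labels = [to_phishing_label(status) for _, status in rows]
print(rows)             # [('https://www.example.com/', 1), ('http://secure-login.example.net/verify', 0)]
print(phishing_labels)  # [0, 1]
```

Keeping the original `status` column and deriving the flipped label separately avoids silently changing the meaning of the source data.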

Distribution

The dataset is provided as a CSV file containing 808,042 unique entries. The listed class counts are 394,982 entries labelled phishing (0) and 427,028 entries labelled legitimate (1), giving a near-even balance between the two categories. These two counts sum to 822,010, which exceeds the stated number of unique entries, so the exact figures are best recomputed from the file itself.
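Before training, it is worth recomputing the class balance directly from the file rather than relying on the listed figures. It is also worth checking for URLs that appear more than once, or that appear with both labels. The following is a minimal standard-library sketch that assumes the `url,status` header. The function name `summarize` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict
from typing import Dict, Iterable


def summarize(lines: Iterable[str]) -> Dict[str, object]:
    """Recompute row counts, class shares, and duplicate/conflicting URLs from url,status rows."""
    status_counts: Counter = Counter()
    labels_by_url = defaultdict(set)  # url -> set of status values seen for it
    total = 0
    for record in csv.DictReader(lines):
        status = int(record["status"])
        status_counts[status] += 1
        labels_by_url[record["url"]].add(status)
        total += 1
    return {
        "rows": total,
        "unique_urls": len(labels_by_url),
        "duplicate_rows": total - len(labels_by_url),
        # URLs labelled both 0 and 1 would give contradictory training examples.
        "conflicting_urls": sum(1 for seen in labels_by_url.values() if len(seen) > 1),
        "share_by_status": {s: n / total for s, n in sorted(status_counts.items())} if total else {},
    }


# Hypothetical sample with one exact duplicate and one URL carrying both labels.
sample = io.StringIO(
    "url,status\n"
    "https://a.example/,1\n"
    "https://a.example/,1\n"
    "http://b.example/login,0\n"
    "http://c.example/,0\n"
    "http://c.example/,1\n"
)
summary = summarize(sample)
print(summary)
# {'rows': 5, 'unique_urls': 3, 'duplicate_rows': 2, 'conflicting_urls': 1, 'share_by_status': {0: 0.4, 1: 0.6}}
```

Conflicting URLs are usually dropped or resolved before training, and exact duplicates are removed so they cannot leak between training and test splits.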

Usage

This dataset suits applications aimed at understanding, detecting, and mitigating online threats. It can be used to develop phishing-detection models, binary classifiers, and website-analytics tools. It is also suitable for data-cleaning exercises and for projects involving Natural Language Processing (NLP) and Deep Learning.
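Treating each URL as a short string makes an NLP-style baseline straightforward. The following is a minimal sketch of one such baseline: a multinomial Naive Bayes classifier over character trigrams with additive smoothing, written in plain Python so it has no third-party dependencies. This technique is an illustrative choice, not something the listing prescribes. The class and function names are hypothetical, and the training URLs are toy examples rather than rows from the dataset. A real experiment would train on a held-out split of the full file and would likely compare against stronger models.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def char_ngrams(url: str, n: int = 3) -> List[str]:
    """Overlapping character n-grams of the lowercased URL, with ^ and $ boundary markers."""
    padded = f"^{url.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Multinomial Naive Bayes over character n-grams with additive (Laplace) smoothing."""

    def __init__(self, n: int = 3, alpha: float = 1.0) -> None:
        self.n = n
        self.alpha = alpha

    def fit(self, urls: Sequence[str], labels: Sequence[int]) -> "NgramNaiveBayes":
        self.classes = sorted(set(labels))
        self.counts: Dict[int, Counter] = {c: Counter() for c in self.classes}
        for url, label in zip(urls, labels):
            self.counts[label].update(char_ngrams(url, self.n))
        self.vocab = set().union(*self.counts.values())
        docs_per_class = Counter(labels)
        self.log_prior = {c: math.log(docs_per_class[c] / len(labels)) for c in self.classes}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, url: str) -> int:
        best_class, best_score = self.classes[0], -math.inf
        for c in self.classes:
            denom = self.totals[c] + self.alpha * len(self.vocab)
            score = self.log_prior[c]
            for gram in char_ngrams(url, self.n):
                if gram in self.vocab:  # n-grams never seen in training are ignored
                    score += math.log((self.counts[c][gram] + self.alpha) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Toy examples using the dataset's encoding (1 = legitimate, 0 = phishing).
train_urls = [
    "https://www.example.com/",
    "https://docs.example.org/guide",
    "http://secure-login.example.net/verify-account",
    "http://account-update.example.net/login.php",
]
train_labels = [1, 1, 0, 0]
model = NgramNaiveBayes(n=3).fit(train_urls, train_labels)
print(model.predict("http://login-verify.example.net/account"))
```

Character n-grams need no tokenizer and can pick up obfuscation patterns such as extra hyphens or look-alike words. Lowercasing, as done here, discards case information, which may or may not matter for this data.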

Coverage

The data collection for this dataset is global in scope. While a specific time range for data collection is not provided, the dataset was listed on 05/06/2025.

License

CC0

Who Can Use It

This dataset is particularly valuable for researchers and practitioners working in the fields of AI and Machine Learning. Intended users include those looking to:
  • Develop and train models for identifying malicious URLs.
  • Analyse patterns distinguishing legitimate websites from phishing attempts.
  • Enhance cybersecurity measures and protect users from online threats.
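A common first step in analysing what separates legitimate URLs from phishing ones is to compute a few hand-crafted lexical features per URL. Typical examples are length, digit count, an `@` symbol, or a raw IP address used as the host. The following is a minimal sketch using `urllib.parse` and `ipaddress` from the standard library. The feature set and the name `lexical_features` are illustrative assumptions, not part of the dataset. The sketch also assumes that URLs without a scheme can be treated as `http://`.

```python
import ipaddress
from urllib.parse import urlsplit


def lexical_features(url: str) -> dict:
    """Compute a few simple lexical features of a URL (illustrative feature set)."""
    # urlsplit only finds the host when a scheme is present; assume http:// if missing.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""
    try:
        ipaddress.ip_address(host)  # raises ValueError for ordinary hostnames
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return {
        "length": len(url),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip,
        "uses_https": parts.scheme == "https",
        "path_depth": len([segment for segment in parts.path.split("/") if segment]),
    }


print(lexical_features("http://192.168.0.1/login/verify"))
print(lexical_features("https://user@secure-login.example.net/a"))
```

Comparing the distribution of each feature between `status` 0 and `status` 1 gives a quick view of which signals separate the classes. The same features can also feed a simple classifier.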

Dataset Name Suggestions

  • URL Phishing Detection
  • Legitimate and Malicious URLs
  • Online Threat URL Dataset
  • Phishing URL Classifier Data
  • Web Security URL Collection

Attributes

Original Data Source: Phishing and Legitimate URLS

Listing Stats

Views: 0
Downloads: 0
Listed: 05/06/2025
Region: Global
Universal Data Quality Score (UDQS): 5 / 5
Version: 1.0
