Global Phishing URL Detection Dataset

FREE DATASET LIBRARY

Verified Data Provider

£0

Data Science and Analytics

Tags and Keywords

Phishing

Cybersecurity

Malicious

Url

Detection

About

A collection of 134,850 legitimate and 100,945 phishing URLs, built to support the training of machine learning models for cybersecurity. The URLs were gathered from the most recent web data available at the time of the 2024 study, and the features were extracted from both the URL strings and the source code of the corresponding webpages. The dataset provides raw values alongside derived metrics such as character probability and title match scores, and it served as the foundation for the PhiUSIIL phishing detection framework published in Computers & Security.

Columns

URL: The specific web address string.
URLLength: The character length of the URL.
Domain: The domain name associated with the URL.
DomainLength: The character length of the domain.
IsDomainIP: Binary indicator classifying if the domain is an IP address.
TLD: Top-Level Domain (e.g., com, org).
URLSimilarityIndex: A derived score measuring how closely the URL resembles a known legitimate URL.
CharContinuationRate: A derived feature describing how runs of letters, digits, and special characters continue within the URL.
TLDLegitimateProb: The probability score of the TLD being legitimate.
URLTitleMatchScore: A score derived from matching the URL to the page title.
URLCharProb: Character probability metrics derived from the URL.
Label: Classification tag where 1 corresponds to a legitimate URL and 0 to a phishing URL.
FILENAME: A system column which can be ignored during analysis.
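
The simpler URL-level columns can be approximated directly from a URL string. The sketch below is an illustration only, not the extraction code used to build the dataset. The helper name `basic_url_features` is hypothetical, and it assumes that URLLength counts every character of the URL, that Domain is the hostname, and that TLD is the last dot-separated label of the hostname (left empty for IP addresses).

```python
import ipaddress
from urllib.parse import urlsplit


def basic_url_features(url: str) -> dict:
    """Approximate a few of the listed columns from a raw URL.

    The definitions here are assumptions made for illustration; the
    dataset's own extraction rules may differ.
    """
    host = urlsplit(url).hostname or ""

    # IsDomainIP: 1 if the hostname parses as an IPv4/IPv6 address.
    try:
        ipaddress.ip_address(host)
        is_ip = 1
    except ValueError:
        is_ip = 0

    # TLD: last label of the hostname; empty for IPs or dotless hosts.
    tld = host.rsplit(".", 1)[1] if (not is_ip and "." in host) else ""

    return {
        "URL": url,
        "URLLength": len(url),
        "Domain": host,
        "DomainLength": len(host),
        "IsDomainIP": is_ip,
        "TLD": tld,
    }


named = basic_url_features("https://www.example.com/login")
numeric = basic_url_features("http://192.168.0.1/a")
print(named)
print(numeric)
```

Values computed this way can be compared against the corresponding columns in the CSV to check whether the assumed definitions match the published ones.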

Distribution

Format: CSV
Size: 56.85 MB
Structure: 56 columns (the Columns section describes a selection of them)
Records: 235,795 rows (134,850 legitimate and 100,945 phishing)
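
The record counts imply a moderate class imbalance, which is worth keeping in mind when judging model accuracy. The short calculation below uses only the counts stated in this listing; the variable names are illustrative.

```python
# Class counts as stated in the listing.
LEGITIMATE = 134_850
PHISHING = 100_945

total = LEGITIMATE + PHISHING
legit_share = LEGITIMATE / total
phishing_share = PHISHING / total

# Accuracy of a trivial model that always predicts the larger class.
majority_baseline = max(legit_share, phishing_share)

print(f"total rows:        {total}")
print(f"legitimate share:  {legit_share:.2%}")
print(f"phishing share:    {phishing_share:.2%}")
print(f"majority baseline: {majority_baseline:.2%}")
```

A classifier that always answers "legitimate" would already be right for roughly 57% of rows, so reported accuracy is best read against that baseline.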

Usage

Training incremental learning models for phishing detection.
Analysing URL structures to identify malicious patterns in social networks and email.
Cybersecurity education and simulation of attack vectors.
Evaluating feature extraction techniques from web source code.
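
To make the incremental learning use case concrete, the sketch below trains a plain online logistic regression one mini-batch at a time. It is not the PhiUSIIL method. The class name `OnlineLogisticRegression`, the two input features (IsDomainIP and URLLength divided by 100), and the toy rows are all assumptions made for illustration and are not taken from the dataset. Labels follow the listing's convention: 1 for legitimate, 0 for phishing.

```python
import math


class OnlineLogisticRegression:
    """Logistic regression updated by SGD, one sample at a time."""

    def __init__(self, n_features: int, lr: float = 0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        """Return the estimated probability that x is legitimate (label 1)."""
        z = self.b + sum(w * xi for w, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, batch_x, batch_y):
        """Update the weights using only the rows in this batch."""
        for x, y in zip(batch_x, batch_y):
            err = self.predict_proba(x) - y
            self.w = [w - self.lr * err * xi for w, xi in zip(self.w, x)]
            self.b -= self.lr * err


# Hypothetical rows: [IsDomainIP, URLLength / 100], label 1 = legitimate.
batches = [
    ([[0, 0.2], [1, 0.9]], [1, 0]),
    ([[0, 0.3], [0, 1.2]], [1, 0]),
]

model = OnlineLogisticRegression(n_features=2)
for _ in range(500):  # replay the batches as if they kept arriving
    for batch_x, batch_y in batches:
        model.partial_fit(batch_x, batch_y)

p_short_named = model.predict_proba([0, 0.25])
p_long_ip = model.predict_proba([1, 1.0])
print(f"P(legitimate | short URL, named host) = {p_short_named:.3f}")
print(f"P(legitimate | long URL, IP host)     = {p_long_ip:.3f}")
```

Because `partial_fit` only needs the current batch, the same loop could read the CSV in chunks rather than loading all rows at once.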

Coverage

Geographic/Scope: Global internet URLs.
Demographic: Covers various sectors including Social Networks, Email, Messaging, Mobile, and Wireless.
Time Range: Contains the latest URLs analysed during the construction of the 2024 PhiUSIIL framework.

License

Attribution 4.0 International (CC BY 4.0)

Who Can Use It

Cybersecurity Researchers
Machine Learning Engineers
Data Scientists specialising in fraud detection
Network Security Analysts

Dataset Name Suggestions

PhiUSIIL Phishing and Legitimate URL Repository
Malicious Webpage Feature Collection
Global Phishing URL Detection Dataset

Attributes

Original Data Source: Global Phishing URL Detection Dataset

Listing Stats

Listed: 04/12/2025
Region: Global
Quality: 5 / 5
Version: 1.0
