Malicious URL Classification Data

Data Science and Analytics

Tags and Keywords: Phishing, URL, Security, Cyber, Detection

About

This dataset supports the detection and analysis of phishing domains embedded in URLs. It contains features extracted from URL strings that are frequently associated with malicious activity, such as length, character counts, and the presence of particular symbols. The data is consolidated from other datasets, with additional features added for completeness.

Columns

The data contains 13 columns related to URL structure and content. Key columns include:

Phising (Label): Indicates whether a URL is classified as phishing (1) or not (0).
NumDots: The count of dot symbols (.) found in the URL.
UrlLength: The total length of the URL string.
AtSymbol: Registers the presence of the "@" symbol in the URL.
NumDash: The count of dash marks (-) present in the URL.
NumPercent: The count of percent marks (%) found in the URL.
NumQueryComponents: The count of question marks (?) in the URL, used to determine the number of query sections.
IpAddress: A binary indicator (1/0) noting if the URL uses a direct IP address.
HttpsInHostname: Notes the presence of 'https' within the hostname portion of the URL.
PathLevel: Defines the depth of the directory hierarchy in the path of a URL.
PathLength: Represents the total number of segments in the URL path.
NumNumericChars: The count of numeric characters (0-9) within the URL.
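
The column descriptions above can be approximated in code. The following is a minimal sketch, assuming each feature is computed directly from the raw URL string as described; the listing does not document the exact extraction rules, so the function name extract_features and every definition below are assumptions rather than the dataset's actual method. It also assumes the URL includes a scheme (such as http://), since urlsplit only recognises the hostname in that case. PathLength is left out because its exact definition is not clear from the listing.

```python
import ipaddress
from urllib.parse import urlsplit


def extract_features(url: str) -> dict:
    """Compute URL features named after the dataset's columns (definitions assumed)."""
    parts = urlsplit(url)
    host = parts.hostname or ""

    # IpAddress: 1 if the host parses as an IPv4 or IPv6 literal, else 0.
    try:
        ipaddress.ip_address(host)
        is_ip = 1
    except ValueError:
        is_ip = 0

    return {
        "NumDots": url.count("."),
        "UrlLength": len(url),
        "AtSymbol": int("@" in url),
        "NumDash": url.count("-"),
        "NumPercent": url.count("%"),
        "NumQueryComponents": url.count("?"),
        "IpAddress": is_ip,
        "HttpsInHostname": int("https" in host.lower()),
        # PathLevel: number of non-empty segments in the path.
        "PathLevel": len([seg for seg in parts.path.split("/") if seg]),
        "NumNumericChars": sum("0" <= ch <= "9" for ch in url),
    }
```

Calling extract_features("http://192.168.0.1/a/b-c/login.php?id=7") returns a dictionary keyed by column name, which can be compared against a row of the CSV to check whether these assumed definitions match the published values.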

Distribution

The data is provided as a single file, Phising_dataset_predict.csv (23.11 MB), with 13 columns and approximately 663,000 records (rows). About 5% of records have no value in the "Phising" label column, leaving roughly 630,000 labelled entries. The dataset is static, with an expected update frequency of 'Never'.
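
Because some rows lack a label, they usually need to be filtered out before supervised training. The sketch below shows one way to do this with Python's standard csv module. The helper name labelled_rows is hypothetical, and the way missing labels are encoded in the file (an empty cell or the text "NaN") is an assumption, as is the UTF-8 encoding shown in the usage comment.

```python
import csv
from typing import Iterable, Iterator


def labelled_rows(lines: Iterable[str], label_col: str = "Phising") -> Iterator[dict]:
    """Yield rows whose label cell is present, with the label cast to int."""
    for row in csv.DictReader(lines):
        raw = (row.get(label_col) or "").strip()
        if raw == "" or raw.lower() == "nan":
            continue  # assumed encodings of a missing label: skip the row
        row[label_col] = int(float(raw))  # accepts "1" as well as "1.0"
        yield row


# Usage (assumes the file has been downloaded to the working directory):
# with open("Phising_dataset_predict.csv", newline="", encoding="utf-8") as f:
#     rows = list(labelled_rows(f))
```

All other columns are left as strings, so numeric features still need converting before modelling.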

Usage

This data is suited to developing and evaluating cyber security models. Specific use cases include training machine learning classifiers to detect malicious URLs, supporting automated phishing prevention systems, and researching the structural indicators of web attacks.
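
As one illustration of the classification use case, the sketch below fits a small logistic regression by batch gradient descent using only the Python standard library. The toy rows are invented for illustration and are not taken from the dataset; the function names train_logistic and predict, and the choice of the AtSymbol and IpAddress columns as inputs, are assumptions. In practice a machine learning library would be the usual choice, and wide-ranging features such as UrlLength would need scaling first.

```python
import math


def sigmoid(z: float) -> float:
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x) -> int:
    """Return 1 (phishing) if the predicted probability is at least 0.5."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Invented toy rows: [AtSymbol, IpAddress], with label 1 meaning phishing.
X = [[0, 0], [0, 0], [1, 0], [0, 1], [1, 1], [0, 0]]
y = [0, 0, 1, 1, 1, 0]

w, b = train_logistic(X, y)
predictions = [predict(w, b, x) for x in X]
```

On the real data, a held-out test split would be needed to evaluate the model; checking predictions against the training rows, as here, only confirms that the fitting loop works.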

Coverage

The scope is limited to features derived from the structure and composition of URLs. The data contains no geographic, temporal, or demographic information; it concentrates solely on URL metrics relevant to identifying phishing attempts, such as length, character counts, and symbol presence.

License

CC0: Public Domain

Who Can Use It

Intended users include cyber security researchers who need labelled data for attack pattern analysis; machine learning engineers developing fraud detection or filtering software; data scientists building URL reputation scoring systems; and students working on academic projects in network security.

Dataset Name Suggestions

Phishing URL Detection Features
Web Attack Feature Set
Malicious URL Classification Data
URL Phishing Indicator Metrics

Attributes

Original Data Source: Malicious URL Classification Data

Listing Stats

Listed: 05/11/2025
Region: Global
Quality: 5 / 5
Version: 1.0
