Permission and API-Based Malware Data
Data Science and Analytics
Free
About
This dataset is designed to support malware research and the validation of detection methods. It contains preprocessed features extracted from the collected binaries, structured specifically for classification tasks. The primary objective is to distinguish 'Malware' from 'Goodware' based on these feature attributes. It provides a foundation for building and testing security-focused machine learning models.
Columns
The dataset consists of 241 attributes per instance. These features are grouped into two primary categories:
- Permission-based features: Attributes 1 through 214 relate to permission requests and characteristics.
- API-based features: Attributes 215 through 241 relate to Application Programming Interface calls.
The target variable is a binary class label: Malware or Goodware. All attributes are rated as highly usable. The dataset is reported to have no missing values overall, although the per-column counts suggest that a few individual columns contain a small number of missing entries, so it is worth checking for gaps before modelling.
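The column layout and the missing-value caveat described above can be checked with a few lines of Python. This is a minimal sketch, not an official loader. It assumes the columns appear in the order listed (214 permission attributes, then 27 API attributes) and that the class label is a separate final column. Neither the column order nor the label position is confirmed by the listing. The function names are hypothetical.

```python
def split_feature_groups(header, n_permission=214, n_api=27):
    """Split a header row into permission columns, API columns and the label.

    Assumption (not confirmed by the listing): permission attributes come
    first, API attributes second, and the class label is the last column.
    """
    expected = n_permission + n_api + 1
    if len(header) != expected:
        raise ValueError(f"expected {expected} columns, got {len(header)}")
    permission_cols = header[:n_permission]
    api_cols = header[n_permission:n_permission + n_api]
    label_col = header[-1]
    return permission_cols, api_cols, label_col


def count_missing(rows, header):
    """Return {column name: number of empty cells}, only for columns with gaps."""
    missing = {}
    for row in rows:
        for name, value in zip(header, row):
            if value is None or value.strip() == "":
                missing[name] = missing.get(name, 0) + 1
    return missing
```

With the real file, the header and rows could be read with `csv.reader` and passed in unchanged. An empty result from `count_missing` would confirm the "no missing values" report, while any entries would show which columns need attention.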
Distribution
The data is distributed as a static snapshot and is not expected to be updated. It consists of 4,465 instances (records) and 241 attributes. The file is expected to be in CSV format, with a size of approximately 2.17 MB (TUANDROMD.csv). This is the preprocessed version of the original TUANDROMD data. No official train/test splits are provided.
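Because no official splits are provided, users need to create their own. One common option is a stratified split, which keeps the Malware/Goodware ratio the same in both partitions. The sketch below uses only the standard library. The function name, the 20% test fraction, and the fixed seed are illustrative assumptions, not recommendations from the dataset authors.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices) that preserve class proportions."""
    # Group row indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)
```

Recording the seed alongside any reported results makes it possible for others to reproduce the same partition.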
Usage
Ideal applications for this data include:
- Developing novel malware detection methods and algorithms.
- Training and testing binary classification models (Malware vs Goodware).
- Conducting security research on permission and API-based indicators of malicious activity.
- Evaluating the effectiveness and robustness of existing malware classifiers.
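For the classification and evaluation use cases above, accuracy alone can be misleading if the two classes are not evenly represented, so precision and recall for the Malware class are usually reported as well. The following is a minimal sketch of those metrics in plain Python. The label strings "malware" and "goodware" are assumptions, since the actual encoding in the file may differ, and the function name is hypothetical.

```python
def binary_metrics(y_true, y_pred, positive="malware"):
    """Compute confusion counts, precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}
```

In a detection setting, a false negative (malware labelled as goodware) is typically costlier than a false positive, which is why recall on the Malware class is often the figure to watch.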
Coverage
The coverage is purely technical: the data describes characteristics of the collected binaries. No explicit geographic, temporal, or demographic limitations are detailed for this dataset. The scope is confined to the 241-attribute feature set.
License
Attribution 4.0 International (CC BY 4.0)
Who Can Use It
- Cybersecurity Researchers: For building state-of-the-art detection systems.
- Data Scientists/Machine Learning Engineers: For practicing binary classification and feature engineering on security data.
- Academics and Students: For teaching and coursework in computer science and programming, particularly on classification and cybercrime-related topics.
Dataset Name Suggestions
- TUANDROMD Malware Classifier
- Android Binary Security Features
- Permission and API-Based Malware Data
- Goodware vs Malware Classification Set
Attributes
Original Data Source: Permission and API-Based Malware Data
