
Fraudulent Job Posting Detection Dataset

Fraud Detection & Risk Management

Tags and Keywords

Text

NLP

Jobs

Binary

Employment

Fraudulent Job Posting Detection Dataset on the Opendatabay data marketplace


Free

About

This dataset is designed for predicting whether job postings are real or fake, addressing the growing problem of fraudulent job descriptions online. It contains 18,000 job descriptions, of which approximately 800 (about 4.4%) are identified as fraudulent. The data includes both the full text of each job description and meta-information about the job. It can be used to train machine learning models that classify job descriptions as legitimate or deceptive, and to identify distinctive traits, such as specific words, entities, or phrases, that characterise fraudulent postings. Researchers and developers can also use it to run contextual embedding models that find similar job descriptions, or to perform exploratory data analysis on employment fraud.

Columns

The dataset's columns capture both the text and the structured meta-information of each job posting. Because no data sample with column headers was provided, the columns below are inferred from the dataset's stated purpose and the typical structure of job advertisements:
  • Job ID: A unique identifier for each individual job posting.
  • Title: The advertised job title (e.g., 'Marketing Intern', 'Head of Content').
  • Location: Geographic details of the job, which may include 'Country', 'State', and 'City'.
  • Department: The specific department within the organisation where the role is situated.
  • Salary Range: The indicated remuneration for the position, typically an annual salary or hourly wage.
  • Company Profile: A descriptive overview of the hiring company.
  • Job Description: The detailed narrative of the role, encompassing responsibilities, qualifications, and benefits.
  • Requirements/Qualifications: Specific skills, prior experience, and educational background necessary for the role.
  • Employment Type: The nature of employment (e.g., 'Full-time', 'Part-time', 'Internship').
  • Experience Level: The required seniority or experience for the position (e.g., 'Entry-level', 'Mid-Senior level').
  • Education Required: The minimum educational qualification expected from candidates.
  • Industry: The sector in which the hiring company operates.
  • Function: The primary professional function of the role (e.g., 'Sales', 'Customer Service', 'Marketing').
  • Is Fake: A binary flag (e.g., 0 or 1) indicating whether the job posting is genuine or fraudulent, serving as the target variable for classification tasks.
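Because the column list above is inferred rather than taken from a real header, any loading code has to treat the names as assumptions. The sketch below, using only Python's standard library, reads a CSV stream into (text, label) pairs and counts the labels. The column names (`is_fake`, `title`, `company_profile`, `description`, `requirements`), the 0/1 label encoding, and the helper names `load_postings` and `label_counts` are all hypothetical; check them against the actual file before use.

```python
import csv
import io
from collections import Counter
from typing import Iterable, TextIO

# HYPOTHETICAL column names: the listing infers its schema, so confirm these
# against the header of the actual CSV before relying on them.
LABEL_COLUMN = "is_fake"
TEXT_COLUMNS = ("title", "company_profile", "description", "requirements")


def load_postings(
    handle: TextIO,
    label_column: str = LABEL_COLUMN,
    text_columns: Iterable[str] = TEXT_COLUMNS,
) -> list[tuple[str, int]]:
    """Read job postings from a CSV stream into (text, label) pairs.

    Non-empty text fields are joined with single spaces. The label is parsed
    as an int; 0 = genuine and 1 = fraudulent is an assumed encoding.
    """
    text_columns = tuple(text_columns)  # allow any iterable, use it twice
    reader = csv.DictReader(handle)
    missing = {label_column, *text_columns} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV is missing expected columns: {sorted(missing)}")
    rows = []
    for record in reader:
        fields = ((record.get(col) or "").strip() for col in text_columns)
        text = " ".join(field for field in fields if field)
        rows.append((text, int(record[label_column])))
    return rows


def label_counts(rows: Iterable[tuple[str, int]]) -> Counter:
    """Count how many postings carry each label value."""
    return Counter(label for _, label in rows)


if __name__ == "__main__":
    # Two invented toy rows, for illustration only.
    sample = io.StringIO(
        "title,company_profile,description,requirements,is_fake\n"
        "Data Analyst,Acme Ltd,Analyse weekly sales data,SQL,0\n"
        "Home Assistant,,Earn money fast from home,None,1\n"
    )
    postings = load_postings(sample)
    print(postings[1])  # ('Home Assistant Earn money fast from home None', 1)
    print(label_counts(postings))
```

Failing fast on missing columns is deliberate: with an inferred schema, a silent mismatch would otherwise produce empty texts or a crash deep inside the loop.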

Distribution

The dataset comprises 18,000 job descriptions, approximately 800 of which are identified as fraudulent, and combines textual content with meta-information for each posting. It is typically provided as a CSV file, a common format for structured datasets. The file size is not stated, but the number of records makes it a substantial resource for analysis. Sample files are usually uploaded to the platform separately.
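The split of roughly 800 fraudulent postings out of 18,000 makes the label heavily imbalanced, which affects how models should be evaluated and trained. The short sketch below works through the arithmetic using the approximate counts from the listing. The class-weight formula is the inverse-frequency rule that scikit-learn uses for `class_weight="balanced"`; the function name `imbalance_summary` is hypothetical.

```python
def imbalance_summary(n_total: int, n_fraudulent: int) -> dict[str, float]:
    """Summarise the class balance of a binary fraud label.

    The class weights use the inverse-frequency formula
    n_total / (n_classes * class_count), the same rule scikit-learn applies
    for class_weight="balanced".
    """
    if not 0 < n_fraudulent < n_total:
        raise ValueError("need 0 < n_fraudulent < n_total")
    n_genuine = n_total - n_fraudulent
    n_classes = 2
    return {
        "fraud_rate": n_fraudulent / n_total,
        # Accuracy of a trivial model that labels every posting genuine.
        "majority_baseline_accuracy": n_genuine / n_total,
        "weight_genuine": n_total / (n_classes * n_genuine),
        "weight_fraudulent": n_total / (n_classes * n_fraudulent),
    }


# Approximate counts stated in the listing: 18,000 postings, ~800 fraudulent.
for name, value in imbalance_summary(18_000, 800).items():
    print(f"{name}: {value:.4f}")
# fraud_rate: 0.0444
# majority_baseline_accuracy: 0.9556
# weight_genuine: 0.5233
# weight_fraudulent: 11.2500
```

With these approximate counts, about 4.4% of postings are fraudulent, so a model that always answers "genuine" reaches roughly 95.6% accuracy while detecting no fraud at all. Precision, recall, or F1 on the fraudulent class, computed on a stratified train/test split, give a more informative picture. Weighting each fraudulent example about 21 times more heavily than a genuine one (11.25 vs about 0.52) is one common way to counter the imbalance during training.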

Usage

This dataset is ideally suited for various applications, including:
  • Fraud Detection Models: Developing classification models to accurately predict whether a job description is fraudulent or real.
  • Feature Identification: Pinpointing key characteristics, such as specific words, entities, or phrases, that are indicative of fraudulent job postings.
  • Semantic Analysis: Running contextual embedding models to identify job descriptions that are semantically similar.
  • Exploratory Data Analysis (EDA): Performing in-depth analysis to uncover insightful patterns and trends within the job market and fraud landscape.
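For the feature-identification use case, one simple starting point (chosen here for illustration; the listing does not prescribe a method) is to compare how often each word appears in fraudulent versus genuine postings. The sketch below scores every token by the log-ratio of its add-alpha smoothed relative frequency in each class, the same per-token quantity a multinomial Naive Bayes classifier relies on. It is a bag-of-words baseline, not a contextual embedding model, and the names `tokenize`, `token_log_odds`, and `top_tokens` are hypothetical.

```python
import math
import re
from collections import Counter

TOKEN_PATTERN = re.compile(r"[a-z0-9']+")


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple word tokens."""
    return TOKEN_PATTERN.findall(text.lower())


def token_log_odds(texts, labels, alpha: float = 1.0) -> dict[str, float]:
    """Smoothed log-ratio of each token's frequency in fraud vs genuine text.

    Positive scores mean the token is relatively more frequent in postings
    labelled 1 (fraudulent); negative scores lean towards label 0 (genuine).
    Add-alpha (Laplace) smoothing keeps tokens seen in only one class finite.
    """
    fraud_counts, real_counts = Counter(), Counter()
    for text, label in zip(texts, labels):
        (fraud_counts if label == 1 else real_counts).update(tokenize(text))
    vocab = set(fraud_counts) | set(real_counts)
    fraud_total = sum(fraud_counts.values()) + alpha * len(vocab)
    real_total = sum(real_counts.values()) + alpha * len(vocab)
    return {
        token: math.log((fraud_counts[token] + alpha) / fraud_total)
        - math.log((real_counts[token] + alpha) / real_total)
        for token in vocab
    }


def top_tokens(scores: dict[str, float], k: int = 10) -> list[tuple[str, float]]:
    """Return the k tokens most indicative of the fraudulent class."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


if __name__ == "__main__":
    # Invented toy examples, for illustration only.
    texts = [
        "Earn money fast from home",
        "Earn money now, no experience needed",
        "Analyse weekly sales data in SQL",
        "Prepare sales reports for the finance team",
    ]
    labels = [1, 1, 0, 0]
    for token, score in top_tokens(token_log_odds(texts, labels), k=3):
        print(f"{token}: {score:+.3f}")
```

On the full dataset, rare tokens can dominate the top of this ranking, so filtering by a minimum count or raising `alpha` is a reasonable adjustment before reading the list as "fraud indicators".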

Coverage

The dataset's geographical scope is global. The time range covered by the job postings is not specified; the dataset was listed on 05/06/2025. There are no specific notes on demographic scope beyond its relevance to employment data. The dataset includes 18,000 job descriptions, approximately 800 of which are labelled as fake, so both legitimate and fraudulent examples are represented.

License

CC0

Who Can Use It

This dataset is particularly useful for:
  • Data Scientists and Machine Learning Engineers: For building and testing fraud detection and text classification models.
  • Researchers: To study patterns in online recruitment fraud and develop new detection methodologies.
  • Job Board Platforms and HR Technology Companies: To implement automated systems for identifying and flagging suspicious job postings, enhancing platform integrity.
  • Analysts: For performing exploratory data analysis to gain insights into employment trends and fraudulent activities.

Dataset Name Suggestions

  • Fraudulent Job Posting Detection Dataset
  • Job Scam Identification Data
  • Employment Fraud Classification Dataset
  • Deceptive Job Description Dataset
  • Real and Fake Job Posting Data

Attributes

Listing Stats

VIEWS

1

DOWNLOADS

1

LISTED

05/06/2025

REGION

GLOBAL

UNIVERSAL DATA QUALITY SCORE (UDQS)

5 / 5

VERSION

1.0
