FREE DATASET LIBRARY

Verified Data Provider

£0

Professional Resume Text Dataset

NLP / Natural Language Processing

Tags and Keywords

Resume, Categorisation, Jobs, NLP, Text

Professional Resume Text Dataset on the Opendatabay data marketplace

About

This dataset is a collection of professional resumes available in both PDF and string formats. Its primary purpose is to support data extraction and the categorisation of resumes into predefined job-related labels. It can be used to develop and test models that classify resumes by their content, for applications in human resources and text analysis.

Columns

ID: A unique identifier that also serves as the filename for the corresponding PDF resume.
Resume_str: Contains the complete resume text extracted into a string format.
Resume_html: Presents the resume data in its original HTML format, as it appeared during the web scraping process.
Category: The job sector or category for which the resume was intended. The dataset includes 24 distinct categories: HR, Designer, Information-Technology, Teacher, Advocate, Business-Development, Healthcare, Fitness, Agriculture, BPO, Sales, Consultant, Digital-Media, Automobile, Chef, Finance, Apparel, Engineering, Accountant, Construction, Public-Relations, Banking, Arts, and Aviation.
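
As a rough illustration of working with these four columns, the sketch below reads rows with Python's standard csv module and checks that the expected column names are present. It assumes the string data ships as a CSV file with a header row, which the listing does not confirm; the function name load_resumes and the sample rows are invented for illustration and are not part of the dataset.

```python
import csv
import io
from collections import Counter

# Column names as described in the listing.
EXPECTED_COLUMNS = ["ID", "Resume_str", "Resume_html", "Category"]


def load_resumes(fileobj):
    """Read resume rows from an open CSV file and check the expected columns."""
    # Resume_html cells can be long, so raise the csv module's per-field limit.
    csv.field_size_limit(10_000_000)
    reader = csv.DictReader(fileobj)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Tiny in-memory stand-in for the real file (these rows are invented).
SAMPLE_CSV = io.StringIO(
    "ID,Resume_str,Resume_html,Category\n"
    '101,"HR manager, 5 years experience","<div>HR manager</div>",HR\n'
    '102,"Line cook and pastry chef","<div>Line cook</div>",CHEF\n'
    '103,"HR generalist","<div>HR generalist</div>",HR\n'
)

rows = load_resumes(SAMPLE_CSV)
category_counts = Counter(row["Category"] for row in rows)
print(category_counts.most_common())
```

To use it on the downloaded data, pass a file opened with open(path, newline="", encoding="utf-8") in place of the in-memory sample; the actual file name and encoding would need to be checked against the ZIP contents.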

Distribution

The dataset consists of over 2,400 resumes, provided in both string and PDF formats. The PDF files are organised into a data folder, with one subfolder per job category containing the relevant resumes. Each PDF file is named after its unique ID. For example, there are 118 files each for 'ACCOUNTANT' and 'ADVOCATE', 63 for 'AGRICULTURE', and 120 each for 'BUSINESS-DEVELOPMENT' and 'INFORMATION-TECHNOLOGY'.
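
The folder layout described above (one subfolder per category, one PDF per ID) can be checked with a short script. The following is a minimal sketch using only the standard library; the helper names count_pdfs_per_category and pdf_path_for are hypothetical, and the demonstration runs on a throwaway folder that mimics the layout rather than on the real download.

```python
from pathlib import Path
import tempfile


def count_pdfs_per_category(data_root):
    """Map each category subfolder name to the number of PDF files inside it."""
    counts = {}
    for folder in sorted(Path(data_root).iterdir()):
        if folder.is_dir():
            counts[folder.name] = sum(
                1 for f in folder.iterdir() if f.suffix.lower() == ".pdf"
            )
    return counts


def pdf_path_for(data_root, category, resume_id):
    """Build the expected path of a resume PDF from its category and ID."""
    return Path(data_root) / category / f"{resume_id}.pdf"


# Demonstration on a temporary folder that mimics the described layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "data"
    for category, ids in {"ACCOUNTANT": [1, 2, 3], "AGRICULTURE": [4]}.items():
        (root / category).mkdir(parents=True)
        for resume_id in ids:
            pdf_path_for(root, category, resume_id).touch()
    demo_counts = count_pdfs_per_category(root)

print(demo_counts)
```

Run against the extracted data folder, the resulting counts could be compared with the figures quoted in this section, for example 63 for 'AGRICULTURE'.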

Usage

This dataset suits a range of use cases, including:

Building models for resume categorisation.
Performing data extraction from resume documents.
Academic research in natural language processing (NLP) and text mining.
Practical applications in recruitment and HR technology.
Fine-tuning large language models (LLMs) for understanding job-related text.
Educational purposes in data science and machine learning.
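
To make the first use case concrete, here is a minimal baseline sketch for resume categorisation: a multinomial Naive Bayes classifier with add-one smoothing, written with the standard library only. It is one simple technique chosen for illustration, not a method the dataset provider prescribes. The class name, the tokenizer, and the four training snippets are invented; in practice the texts and labels would come from the Resume_str and Category columns.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesResumeClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.doc_counts = Counter(labels)        # label -> number of resumes
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no information, so skip them.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / self.total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training snippets standing in for Resume_str / Category pairs.
train_texts = [
    "Managed payroll, recruitment and employee onboarding",
    "Prepared ledgers, tax returns and balance sheets",
    "Handled employee relations and recruitment drives",
    "Audited accounts and reconciled balance sheets",
]
train_labels = ["HR", "ACCOUNTANT", "HR", "ACCOUNTANT"]

model = NaiveBayesResumeClassifier().fit(train_texts, train_labels)
prediction = model.predict("Recruitment and onboarding specialist")
print(prediction)
```

A baseline like this is mainly useful as a reference point: any heavier model trained on the dataset should be evaluated on a held-out split and compared against it.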

Coverage

The dataset includes resume examples from a wide array of job categories, reflecting a broad scope of professional roles. The data was sourced from public resume examples; specific geographic or demographic details are not provided. The dataset is expected to be updated annually.

License

CC0: Public Domain

Who Can Use It

This dataset is particularly useful for:

Data Scientists and Machine Learning Engineers: To train and evaluate resume classification algorithms.
Researchers: For studies in NLP, information retrieval, and human resources analytics.
Developers: To integrate resume processing capabilities into applications.
Educators and Students: As a practical resource for learning about text data processing and categorisation.

Dataset Name Suggestions

LiveCareer Resume Classification Dataset
Multi-Category Job Resume Collection
Professional Resume Text Dataset
HR Resume Categorisation Examples

Attributes

Original Data Source: Professional Resume Text Dataset

Listing Stats

Views: 110
Downloads: (not shown)
Listed: 14/07/2025
Region: Global
Quality: 5 / 5
Version: 1.0
