Opendatabay APP

Professional Resume Text Dataset

Tags and Keywords

Resume, Categorisation, Jobs, NLP, Text

Free

About

This dataset is a collection of professional resumes provided both as PDF files and as extracted text strings. It is intended for data extraction and for categorising resumes into predefined job-related labels, which makes it a resource for developing and testing resume classification models in human resources and text analysis applications.

Columns

  • ID: A unique identifier that also serves as the filename for the corresponding PDF resume.
  • Resume_str: Contains the complete resume text extracted into a string format.
  • Resume_html: Presents the resume data in its original HTML format, as it appeared during the web scraping process.
  • Category: Indicates the job sector or category for which the resume was intended. The dataset includes 24 distinct categories: HR, Designer, Information-Technology, Teacher, Advocate, Business-Development, Healthcare, Fitness, Agriculture, BPO, Sales, Consultant, Digital-Media, Automobile, Chef, Finance, Apparel, Engineering, Accountant, Construction, Public-Relations, Banking, Arts, Aviation.
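
As a minimal sketch of working with the four columns above, the snippet below reads the tabular data with Python's standard csv module, checks that the expected header is present, and counts resumes per Category. It assumes the string data ships as a CSV file with exactly these column names; the sample rows are invented for illustration and are not taken from the dataset.

```python
import csv
import io
from collections import Counter

# Column names as described in the listing.
EXPECTED_COLUMNS = ("ID", "Resume_str", "Resume_html", "Category")


def load_resumes(csv_file):
    """Read resume rows from an open CSV file and validate the header."""
    # Resume_html cells can be long, so raise the default field size limit.
    csv.field_size_limit(2**31 - 1)
    reader = csv.DictReader(csv_file)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def category_counts(rows):
    """Count how many resumes carry each Category label."""
    return Counter(row["Category"] for row in rows)


# Invented in-memory sample standing in for the real file.
sample = io.StringIO(
    "ID,Resume_str,Resume_html,Category\n"
    '101,"HR manager with 5 years of experience","<div>HR manager</div>",HR\n'
    '102,"Line cook, pastry","<div>Line cook</div>",CHEF\n'
    '103,"HR generalist","<div>HR generalist</div>",HR\n'
)
rows = load_resumes(sample)
counts = category_counts(rows)
print(counts.most_common())
```

For the real data, the same function can be called on a file opened with open(path, newline="", encoding="utf-8"), where path is wherever the CSV was saved locally.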

Distribution

The dataset consists of over 2,400 resumes, provided in both string and PDF formats. The PDF files are organised into a data folder in which each job category has its own subfolder containing the relevant resumes. Each PDF file is named after its unique ID. For example, there are 118 files each for 'ACCOUNTANT' and 'ADVOCATE', 63 for 'AGRICULTURE', and 120 each for 'BUSINESS-DEVELOPMENT' and 'INFORMATION-TECHNOLOGY'.
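
The folder layout described above can be indexed with a short script. The sketch below assumes a directory of the form data/<CATEGORY>/<ID>.pdf; the function names and the demo IDs are hypothetical, and the demo builds a throwaway directory rather than reading the real files.

```python
import tempfile
from collections import Counter
from pathlib import Path


def index_pdfs(data_dir):
    """Map each resume ID (the PDF file stem) to (category, path)."""
    index = {}
    for pdf in sorted(Path(data_dir).glob("*/*.pdf")):
        # The parent folder name is the job category.
        index[pdf.stem] = (pdf.parent.name, pdf)
    return index


def files_per_category(index):
    """Count PDF files in each category subfolder."""
    return Counter(category for category, _ in index.values())


# Demo on a temporary directory with made-up IDs.
with tempfile.TemporaryDirectory() as tmp:
    layout = [("ACCOUNTANT", "1001"), ("ACCOUNTANT", "1002"), ("AGRICULTURE", "2001")]
    for category, resume_id in layout:
        folder = Path(tmp) / category
        folder.mkdir(exist_ok=True)
        (folder / f"{resume_id}.pdf").touch()
    demo_index = index_pdfs(tmp)
    demo_counts = files_per_category(demo_index)

print(demo_counts)
```

Because the file stem is the ID, the resulting index can be joined against the ID column of the tabular data to pair each PDF with its extracted text.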

Usage

This dataset suits a range of applications and use cases, including:
  • Building models for resume categorisation.
  • Performing data extraction from resume documents.
  • Academic research in natural language processing (NLP) and text mining.
  • Practical applications in recruitment and HR technology.
  • Fine-tuning large language models (LLMs) for understanding job-related text.
  • Educational purposes in data science and machine learning.
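
To make the resume categorisation use case concrete, here is a minimal baseline sketch: a bag-of-words multinomial Naive Bayes classifier with Laplace smoothing, written with the standard library only. This is one simple technique chosen for illustration, not a method prescribed by the dataset; the class name and the training sentences are invented, and in practice the inputs would be Resume_str values with their Category labels.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesResumeClassifier:
    """Multinomial Naive Bayes over word counts with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # smoothing strength
        self.doc_counts = Counter()              # resumes seen per category
        self.word_counts = defaultdict(Counter)  # token counts per category
        self.vocabulary = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        # Tokens never seen in training carry no information and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokens:
                score += math.log((counts[token] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples standing in for (Resume_str, Category) pairs.
train_texts = [
    "Managed payroll, recruiting and employee onboarding",
    "Recruiting coordinator handling employee relations",
    "Prepared sauces and managed kitchen inventory",
    "Line cook experienced in pastry and kitchen prep",
]
train_labels = ["HR", "HR", "CHEF", "CHEF"]

model = NaiveBayesResumeClassifier().fit(train_texts, train_labels)
prediction = model.predict("Senior recruiting specialist for employee onboarding")
print(prediction)
```

A baseline like this is mainly useful as a reference point; with 24 categories of uneven size, a held-out split and per-category metrics would be needed before drawing conclusions about any model.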

Coverage

The dataset includes resume examples from a wide array of job categories, reflecting a broad scope of professional roles. The data was sourced from public resume examples, and no specific geographic or demographic details are provided. The dataset is expected to be updated annually.

License

CC0: Public Domain

Who Can Use It

This dataset is particularly useful for:
  • Data Scientists and Machine Learning Engineers: To train and evaluate resume classification algorithms.
  • Researchers: For studies in NLP, information retrieval, and human resources analytics.
  • Developers: To integrate resume processing capabilities into applications.
  • Educators and Students: As a practical resource for learning about text data processing and categorisation.

Dataset Name Suggestions

  • LiveCareer Resume Classification Dataset
  • Multi-Category Job Resume Collection
  • Professional Resume Text Dataset
  • HR Resume Categorisation Examples

Attributes

Original Data Source: Professional Resume Text Dataset

Listing Stats

Views: 0
Downloads: 0
Listed: 14/07/2025
Region: Global
Universal Data Quality Score (UDQS): 5 / 5
Version: 1.0