Professional Resume Text Dataset
About
This dataset is a collection of professional resumes provided both as PDF files and as extracted text strings. Its primary purpose is to support data extraction and the classification of resumes into predefined job categories. It is a resource for developing and testing models that classify resumes by their content, with applications in human resources and text analysis.
Columns
- ID: A unique identifier that also serves as the filename for the corresponding PDF resume.
- Resume_str: Contains the complete resume text extracted into a string format.
- Resume_html: Presents the resume data in its original HTML format, as it appeared during the web scraping process.
- Category: Indicates the job sector or category for which the resume was intended. The dataset includes 24 distinct categories: HR, Designer, Information-Technology, Teacher, Advocate, Business-Development, Healthcare, Fitness, Agriculture, BPO, Sales, Consultant, Digital-Media, Automobile, Chef, Finance, Apparel, Engineering, Accountant, Construction, Public-Relations, Banking, Arts, and Aviation.
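The column layout above can be checked with a short script. The following is a minimal sketch, assuming the tabular data is distributed as a UTF-8 CSV file with a header row; the function name count_by_category and the file name used in the note below are illustrative and do not come from the dataset itself.

```python
import csv
from collections import Counter

# Resume_html cells can be long, so raise the csv module's per-field limit.
csv.field_size_limit(10**9)

# Column names as listed in the Columns section.
EXPECTED_COLUMNS = {"ID", "Resume_str", "Resume_html", "Category"}


def count_by_category(csv_path):
    """Return a Counter mapping each Category value to its number of rows."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Fail early if the file does not have the documented columns.
        missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        return Counter(row["Category"] for row in reader)
```

For example, count_by_category("Resume.csv").most_common(5) would list the five largest categories, where "Resume.csv" is a hypothetical path to the downloaded file.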
Distribution
The dataset consists of more than 2,400 resumes, provided in both string and PDF formats. The PDF files are organised into a data folder, with one subfolder per job category containing the relevant resumes. Each PDF file's name corresponds to its unique ID. For example, there are 118 files each for 'ACCOUNTANT' and 'ADVOCATE', 63 for 'AGRICULTURE', and 120 each for 'BUSINESS-DEVELOPMENT' and 'INFORMATION-TECHNOLOGY'.
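The folder layout described above can be inspected with the standard library. This is a rough sketch, assuming the PDFs use a lowercase ".pdf" extension and sit directly inside their category subfolders; the function names count_pdfs and pdf_ids are hypothetical helpers, not part of the dataset.

```python
from pathlib import Path


def count_pdfs(data_dir):
    """Map each category subfolder name to the number of PDF files in it."""
    counts = {}
    for folder in sorted(Path(data_dir).iterdir()):
        if folder.is_dir():
            counts[folder.name] = sum(1 for _ in folder.glob("*.pdf"))
    return counts


def pdf_ids(data_dir):
    """Collect resume IDs from PDF file names (the name without '.pdf')."""
    return {pdf.stem for pdf in Path(data_dir).glob("*/*.pdf")}
```

Comparing the result of pdf_ids with the values in the ID column is one way to confirm that every row has a matching PDF file.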
Usage
This dataset is ideal for various applications and use cases, including:
- Building models for resume categorisation.
- Performing data extraction from resume documents.
- Academic research in natural language processing (NLP) and text mining.
- Practical applications in recruitment and HR technology.
- Fine-tuning large language models (LLMs) for understanding job-related text.
- Educational purposes in data science and machine learning.
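As an illustration of the resume categorisation use case, the sketch below implements a small multinomial naive Bayes classifier over word counts using only the standard library. It is a toy baseline under stated assumptions, not a recommended or original method for this dataset: the tokeniser, the function names (tokenize, train_naive_bayes, predict), and the add-one smoothing are all choices made for this example. In practice the texts would come from Resume_str and the labels from Category.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


def train_naive_bayes(texts, labels):
    """Collect per-category word counts, document counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, label in zip(texts, labels):
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}


def predict(model, text):
    """Return the category with the highest log-probability for the text."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    # Words never seen in training are ignored.
    tokens = [tok for tok in tokenize(text) if tok in model["vocab"]]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokens:
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A model trained with train_naive_bayes(texts, labels) can then be queried with predict(model, new_resume_text). For real experiments, a held-out test split and a stronger text representation would be the natural next steps.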
Coverage
The dataset includes resume examples from a wide range of job categories, reflecting a broad scope of professional roles. The data was sourced from public resume examples; no specific geographic or demographic details are provided. The dataset is expected to be updated annually.
License
CC0: Public Domain
Who Can Use It
This dataset is particularly useful for:
- Data Scientists and Machine Learning Engineers: To train and evaluate resume classification algorithms.
- Researchers: For studies in NLP, information retrieval, and human resources analytics.
- Developers: To integrate resume processing capabilities into applications.
- Educators and Students: As a practical resource for learning about text data processing and categorisation.
Dataset Name Suggestions
- LiveCareer Resume Classification Dataset
- Multi-Category Job Resume Collection
- Professional Resume Text Dataset
- HR Resume Categorisation Examples
Attributes
Original Data Source: Professional Resume Text Dataset