REsFil Machine Learning
ARTICLE INFO

Article History:
Accepted: 10 April 2024
Published: 19 April 2024

Publication Issue
Volume 10, Issue 2
March-April-2024

Page Number
602-606

ABSTRACT

This study explores the utilization of Machine Learning (ML) and Natural Language Processing (NLP) in automating the resume screening process. Traditional methods, often manual and subjective, fail to efficiently manage the volume and variety of resumes. By employing NLP techniques like named entity recognition and part-of-speech tagging, coupled with ML classifiers such as K-Nearest Neighbors and Support Vector Machines, we propose a system that enhances the precision of candidate selection while significantly reducing time and effort.

Keywords: Machine Learning, Natural Language Processing, Resume Screening, NLTK, K-Nearest Neighbors, Support Vector Machines.
Copyright © 2024 The Author(s): This is an open-access article distributed under the terms of the Creative
Commons Attribution 4.0 International License (CC BY-NC 4.0)
Dr. Sandeep Tayal et al Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol., March-April-2024, 10 (2) : 602-606
screening process. By automating the extraction and interpretation of information from resumes, this system aims to significantly reduce the time and effort involved in screening candidates while improving the accuracy and objectivity of the selection process.

ML algorithms can analyse vast amounts of data to identify patterns and make predictions. In the context of resume screening, ML models are trained to categorize candidates based on their suitability for a role, using historical hiring data and outcomes as a learning basis. This approach enables the system to evaluate candidates more accurately and consistently than human screeners, potentially uncovering strong candidates who might otherwise have been overlooked due to unconventional career paths or non-traditional skill sets. NLP techniques are utilized to interpret the textual content of resumes, extracting valuable information such as skills, work experience, education, and achievements. This technology allows the system to understand and process the natural language found in resumes, transforming unstructured text into structured data that can be easily analysed and compared across candidates.

qualifications. Data for model training comes from Kaggle.

B. Resume Screening Classification using Artificial Intelligence and Natural Language Processing

The paper "Resume Screening Classification using Artificial Intelligence and Natural Language Processing" introduces the Prospect model, a machine learning-based system for automating resume screening. This model achieves a remarkable accuracy of 93.5%, significantly outperforming traditional convolutional neural network models by 19.5%. It employs a two-phase approach, starting with the pre-processing and feature extraction from a unique dataset called the Prospect dataset, which includes around 5,000 resumes. This setup ensures an unbiased classification of resumes into "selected" or "rejected" categories based on a sophisticated matching score algorithm and custom logic. The integration of artificial intelligence and machine learning techniques in this model offers a promising direction for enhancing the efficiency and fairness of the resume screening process.
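
As a rough illustration of the point made above, that NLP turns unstructured resume text into structured, comparable data, the following minimal sketch builds a small record from free-form text. It is not the authors' implementation. The function `parse_resume`, the `KNOWN_SKILLS` vocabulary, and the sample text are hypothetical, and simple regular expressions with keyword lookup stand in for the named entity recognition and part-of-speech tagging that the paper describes.

```python
import re

# Hypothetical skill vocabulary (an assumption for illustration); a real
# system would use a much larger list or a trained named-entity recognizer.
KNOWN_SKILLS = {"python", "java", "sql", "machine learning", "nlp"}

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
YEARS_RE = re.compile(r"(\d+)\+?\s*years?", re.IGNORECASE)


def parse_resume(text: str) -> dict:
    """Turn free-form resume text into a small structured record."""
    lowered = text.lower()
    email = EMAIL_RE.search(text)
    years = [int(m) for m in YEARS_RE.findall(text)]
    # Whole-word match so that "java" does not match inside "javascript".
    skills = sorted(
        s for s in KNOWN_SKILLS
        if re.search(r"\b" + re.escape(s) + r"\b", lowered)
    )
    return {
        "email": email.group(0) if email else None,
        "skills": skills,
        "max_years_experience": max(years) if years else None,
    }


sample = (
    "Jane Doe, jane.doe@example.com. "
    "5 years of experience in Python and SQL; 2 years in NLP research."
)
record = parse_resume(sample)
print(record)
```

Records of this shape can be compared field by field across candidates, which is the property the text attributes to structured data.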
Processing (NLP). The literature review highlights significant contributions to the field, including works by Nandhini S, Gomathi S, Lavanya S, Kondapalli Sai Pranay, Shweta Agrawal, and Sumit Gupta, among others. These studies collectively explore various methodologies for extracting and ranking data from resumes using NLP techniques and matching them with job descriptions through ML algorithms. The emphasis across the research is on enhancing the efficiency and accuracy of the resume screening process, which is crucial for streamlining recruitment and ensuring optimal job-candidate matches.

III. PROPOSED SYSTEM

A. Problem Statement

The current manual process of resume screening is labor-intensive, time-consuming, and susceptible to bias, failing to efficiently handle the volume and diversity of job applications. This necessitates an innovative approach to automate and enhance the screening process. Leveraging Natural Language Processing (NLP) and Machine Learning (ML) technologies, this study proposes a system aimed at improving the accuracy, efficiency, and fairness of candidate selection, addressing the pressing need for a scalable and unbiased recruitment solution in the digital age.

B. Solution

To address the inefficiencies and limitations of traditional manual resume screening, this research introduces a cutting-edge automated system, leveraging the synergy of Natural Language Processing (NLP) and Machine Learning (ML) technologies. The solution encompasses a comprehensive strategy beginning with the collection and preprocessing of a diverse dataset of resumes, ensuring readiness for detailed analysis through techniques such as text normalization, tokenization, and cleaning. Through the application of advanced NLP techniques like named entity recognition and part-of-speech tagging, the system adeptly extracts critical data from resumes, such as skills, education, and work experiences, while accommodating various resume formats and languages. This foundational work facilitates the transition to feature extraction, where key data points are transformed into a numerical format suitable for ML model training, employing methodologies like TF-IDF and word embeddings. Subsequent stages involve the deployment of ML classifiers (K-Nearest Neighbours, Support Vector Machines, and One-vs-Rest among them) to categorize resumes effectively. This process is refined through rigorous training and testing phases, employing metrics such as accuracy, precision, recall, and F1 score for evaluation. The culmination of the system's development sees the integration of the NLP-based resume parser with ML classifiers, resulting in a comprehensive automated screening system. This system is meticulously tested and refined with industry feedback, ensuring it not only meets but exceeds the requirements of modern recruitment processes by delivering a solution that is both scalable and unbiased, significantly enhancing the efficiency, accuracy, and fairness of candidate selection in the recruitment landscape.

C. System Architecture

Image source:
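
The stages described in the Solution subsection (preprocessing, TF-IDF feature extraction, classification, and metric-based evaluation) can be sketched end to end as follows. This is a dependency-free illustration under stated assumptions, not the authors' system. The toy corpus, its labels, and all function names are hypothetical. Only K-Nearest Neighbours is shown, with cosine similarity as the distance measure. Support Vector Machines, One-vs-Rest, and word embeddings are omitted for brevity, and in practice a library such as scikit-learn would likely replace the hand-written helpers.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus (an assumption for illustration); labels are job
# categories, not hiring decisions.
TRAIN = [
    ("Python developer with machine learning and pandas experience", "data"),
    ("Built machine learning models in Python and scikit-learn", "data"),
    ("Java backend engineer, Spring services and SQL databases", "backend"),
    ("Designed REST services in Java with SQL storage", "backend"),
]
TEST = [
    ("Machine learning engineer using Python", "data"),
    ("Java developer maintaining SQL services", "backend"),
]


def preprocess(text):
    """Normalize case, replace non-letters with spaces, split on whitespace."""
    return re.sub(r"[^a-z\s]", " ", text.lower()).split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every training term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf):
    """L2-normalized TF-IDF vector as a sparse dict; unseen terms dropped."""
    tf = Counter(t for t in doc if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def cosine(a, b):
    """Dot product of two already-normalized sparse vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def knn_predict(vec, train_vecs, train_labels, k=3):
    """Majority vote among the k most similar training resumes."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(vec, pair[0]), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def scores(y_true, y_pred, positive):
    """Accuracy, plus precision, recall, and F1 for one chosen class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    acc = sum(t == p for t, p in pairs) / len(pairs)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}


# Fit on the training resumes only, then classify the held-out ones.
train_docs = [preprocess(text) for text, _ in TRAIN]
train_labels = [label for _, label in TRAIN]
idf = fit_idf(train_docs)
train_vecs = [tfidf(doc, idf) for doc in train_docs]

y_true = [label for _, label in TEST]
y_pred = [knn_predict(tfidf(preprocess(text), idf), train_vecs, train_labels)
          for text, _ in TEST]
metrics = scores(y_true, y_pred, positive="data")
print(y_pred, metrics)
```

The IDF weights are fitted on the training resumes alone, so the held-out resumes do not leak into the features. With a corpus this small the scores say nothing about real-world performance. The sketch only shows how the stages connect.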
IV. REFERENCES