
RESUME ANALYZER

Submitted by:

Mohd Amaan Khan (2022-301-093)

In partial fulfilment for the award of the degree of

BACHELOR OF COMPUTER APPLICATIONS (BCA)


Under the Supervision of
Dr. IMRAN HUSSAIN

Department of Computer Science & Engineering
School of Engineering Sciences & Technology

JAMIA HAMDARD
(Deemed to be University)
New Delhi - 110062
2025
DECLARATION

I hereby declare that the project entitled "AI-based Resume Analyzer and Job
Recommendation System", submitted in partial fulfillment of the requirements
for the Bachelor of Computer Applications (BCA) program, is my original
work and has not been submitted previously, either in part or in full, for the
award of any degree or diploma.
This project was carried out under the supervision of Dr. Imran Hussain,
Faculty, Jamia Hamdard, New Delhi.
The project work has been completed at Jamia Hamdard, New Delhi, to the
best of my abilities and efforts, and it embodies the results of my own work and
investigations, except where otherwise stated.

Name Enrollment No. Signature

Mohd Amaan Khan 2022-301-093 ______ ____

Place: Jamia Hamdard


Date: 14 May 2025

Supervisor:
Dr. Imran Hussain
Assistant Professor
Jamia Hamdard, New Delhi

(Signature) __________

ACKNOWLEDGEMENT

I would like to express my sincere gratitude to Dr. Imran Hussain, my project
supervisor, for his invaluable guidance, constant support, and encouragement
throughout the course of this project. His insights and expertise played a crucial
role in the successful completion of this work.

I am also thankful to the faculty members and staff of the School of Computer
Science and Engineering, Jamia Hamdard, for providing the necessary
resources, guidance, and a supportive environment to carry out this project
successfully.

My heartfelt thanks go to the Department of Computer Science and


Engineering for their unwavering support, academic guidance, and for
fostering an environment conducive to innovation and learning throughout this
journey.
Finally, I acknowledge the cooperation of everyone involved, whose collective
efforts and dedication made this project possible.

Mohd Amaan Khan (2022-301-093)

Bachelor of Computer Applications (BCA)


Department of Computer Science & Engineering
School of Engineering Sciences & Technology
Jamia Hamdard, New Delhi

Under the Supervision of: Dr. Imran Hussain

Table 1. LIST OF ABBREVIATIONS

Abbreviation    Full Form
AI              Artificial Intelligence
ML              Machine Learning
DL              Deep Learning
NLP             Natural Language Processing
CV              Computer Vision
BCA             Bachelor of Computer Applications
DB              Database
API             Application Programming Interface
UI              User Interface
SQL             Structured Query Language
DBMS            Database Management System
PDF             Portable Document Format
TF-IDF          Term Frequency-Inverse Document Frequency
HTML            HyperText Markup Language
CSS             Cascading Style Sheets
JS              JavaScript
SPA             Single Page Application
GCP             Google Cloud Platform
AWS             Amazon Web Services
IDE             Integrated Development Environment
CRUD            Create, Read, Update, Delete
CI/CD           Continuous Integration / Continuous Deployment
UI/UX           User Interface / User Experience
XGBoost         Extreme Gradient Boosting
ERD             Entity Relationship Diagram
SRS             Software Requirements Specification
Table 2. List of Figures

Figure No. Title
Fig 1.1 System Architecture of Resume Analyzer
Fig 1.2 Data Flow Diagram (DFD) of the Overall System
Fig 1.3 Use Case Diagram of the Application
Fig 1.4 Sequence Diagram for Resume Upload and Analysis
Fig 1.5 Flowchart of Resume Text Extraction Process
Fig 1.6 NLP Pipeline for Skill Extraction
Fig 1.7 Skill Matching & Recommendation Module Architecture
Fig 1.8 Resume Scoring Feature Vector Representation
Fig 1.9 Screenshot: Web Application Home Page
Fig 1.10 Screenshot: Resume Upload Page
Fig 1.11 Screenshot: Display of Extracted Skills/ Text
Fig 1.12 Screenshot: Job Role Prediction Output
Fig 1.13 Screenshot: Missing Skills Recommendation
Fig 1.14 Screenshot: Suggested Job Listings from API
Fig 1.15 Screenshot: User Signup & Login Interface
Fig 1.16 Screenshot: Display Score
Fig 1.17 Database Schema Design
Fig 1.18 Model Training Pipeline (TF-IDF + Random Forest + XGBoost)

Fig 1.19 ERD (Entity Relationship Diagram)


Fig 1.20 Radar Chart For Skills Match
Fig 1.21 Context Flow Diagram

Table 3. Table Of Contents


Chapter 1: Introduction ------------------------------------ Page 7 - 12

1.1 Abstract ------------------------------------------------------- page 7 - 8
1.2 Problem Statement --------------------------------------- page - 9
1.3 Context Flow Diagram ------------------------------------- page - 9
1.4 Objectives -------------------------------------------------- page - 10
1.5 Scope Of the Project --------------------------------------- page - 10
1.6 Methodology ---------------------------------------------- page 11 - 12

Chapter 2: Literature Review --------------------------- page 13 - 17


2.1 Overview of Existing Systems ------------------------- page 13 - 14
2.2 Key Research Papers and Technologies -------------- page 14 - 15
2.3 Limitations in Existing Systems ---------------------- page 16 - 17
2.4 Summary -------------------------------------------------- page 17

Chapter 3: System Analysis ------------------------------- page 18 - 29


3.1 Software Requirement Specifications (SRS) ---------- page 18 - 24
3.2 Requirement Analysis --------------------------------- page 25
3.3 Functional Requirements ------------------------------ page 26 - 27
3.4 Non-functional Requirements ------------------------ page 28
3.5 Feasibility Study ---------------------------------------- page 29
3.6 Risk Analysis --------------------------------------------------- page 29

Chapter 4: System Design ------------------------------- page 30 - 37


4.1 System Architecture ------------------------------- page 30
4.2 Data Flow Diagrams (DFD) ---------------------------- page 31 - 32
4.3 Entity Relationship Diagram ---------------------------- page 33
4.4 Use Case Diagram --------------------------------------- page 34
4.5 Sequence Diagram ---------------------------------------- page 35
4.6 Database Design ---------------------------------------- page 36
4.7 Deployment diagram -------------------------------------- page 37

Chapter 5: Implementation -------------------------------- Page 38 - 46
5.1 Tools and Technologies Used ------------------------page 38
5.2 Modules and Functionalities ---------------------------- page 39 - 41
5.3 Resume Parsing and Preprocessing ---------------------- page 42

5.4 Skill Extraction using NLP ------------------------------- page 42
5.5 Resume Scoring Model ---------------------------------- page 43
5.6 Job Recommendation Engine ---------------------------- page 44
5.7 User Interface and Web Integration -------------------- page 45
5.8 Radar Chart for Skills Match------------------------------ page 46

Chapter 6: Results and Discussion ----------------------- Page 47 - 49


6.1 Model Accuracy Comparison --------------------------- page 47
6.2 Resume Score Output Samples -------------------------- page 48
6.3 Job Match Samples -------------------------------------- page 49

Chapter 7: Conclusion and Future Work ---------------- Page 50 - 52
7.1 Conclusion ----------------------------------------------- page 50
7.2 Limitations ---------------------------------------------- page 50 -51
7.3 Future Enhancements ----------------------------------- page 51 - 52

Chapter 8: Appendices --------------------------------- Page 53 - 60


 A. Sample Resumes --------------------------------- page 53 - 56
 B. Dataset Details ------------------------------------- page 57
 C. Screenshots ------------------------------------------ page 58 - 60
 D. Source Code Repository Link ------------------ page 60

Chapter 1: Introduction
1.1 Abstract
In recent years, the global job market has become increasingly competitive and
dynamic, prompting the need for advanced technological solutions to optimize

recruitment processes. Traditional resume screening is largely manual, time-
consuming, and prone to human error or bias. Recruiters often struggle with
large volumes of resume submissions, leading to inefficiencies in shortlisting
candidates and missed opportunities for suitable talent. On the other side, job
seekers often submit resumes without a clear understanding of the expectations
or skill requirements for specific roles. These challenges underscore the
importance of a smart, scalable, and automated resume analysis system.
This project, titled “Resume Analyzer”, proposes a machine learning-based
web application that automates resume screening and job role prediction. The
system leverages Natural Language Processing (NLP) techniques to parse
resume content, extract relevant features, and classify resumes into predefined
job roles using a trained classification model. In addition, the system performs a
skills gap analysis by comparing extracted resume skills with a structured
database of job-specific requirements. This enables the system to suggest both
technical and soft skills that the candidate might need to acquire to better align
with the predicted role. The overall solution enhances the recruitment process
and empowers job seekers by providing personalized, data-driven feedback on
their resumes.
The Resume Analyzer is built as a Flask-based web application, incorporating
both front-end and back-end components. Users can securely sign up, log in,
and upload their resumes through a simple user interface. The backend includes
a resume parsing pipeline that cleans and preprocesses text using techniques
like tokenization, stopword removal, and TF-IDF vectorization. These
processed features are then fed into a trained machine learning model—selected
after comparative analysis of multiple classification algorithms—to predict the
most suitable job role. Skill extraction and comparison are handled through
integration with a MySQL database, which stores job role-specific skillsets. The
system identifies missing skills and provides role-specific recommendations for
improvement, including critical soft skills such as communication, adaptability,
and problem-solving.
One of the standout features of the Resume Analyzer is its dual benefit: it serves
both recruiters and job seekers. For recruiters, it significantly reduces the time
and effort required to screen candidates manually. The model acts as a first-
level filter, allowing HR professionals to focus only on the most suitable
resumes. For job seekers, it provides insight into how well their resume matches
current job market expectations. This feedback loop is valuable for self-
assessment and professional development, especially in a rapidly evolving
digital job market where new roles and technologies are constantly emerging.
From a technical standpoint, the development of Resume Analyzer involved
several stages:
1. Dataset Preparation: The system was trained using a labeled dataset of
resumes categorized by job roles. This dataset was augmented and
preprocessed to ensure diversity and balance across categories.

2. Text Preprocessing and Feature Engineering: Preprocessing was
critical in transforming raw resume data into a format suitable for
machine learning. Text data was cleaned to remove noise and
inconsistencies, followed by vectorization using TF-IDF, which measures
the importance of terms across the corpus.
3. Model Selection and Training: Multiple classification algorithms,
including Logistic Regression, Naive Bayes, and Random Forest, were
evaluated based on accuracy, precision, and recall. The most effective
model was chosen for deployment and was stored as a serialized .pkl file
for prediction tasks.
4. Web Interface and User Experience: The Flask framework was used to
build the web interface, incorporating user-friendly HTML templates for
login, signup, dashboard, and resume upload. JavaScript and CSS were
used for form validations and interface styling.
5. Database Design and Integration: A MySQL database was designed to
store user credentials (with encrypted passwords) and a structured
mapping of job roles to their required technical and soft skills. SQL
queries were optimized to retrieve and compare skills efficiently during
analysis.
6. Security and Data Privacy: User credentials are encrypted before
storage, and the application includes basic session management and input
validation to ensure secure data handling.
Despite its strengths, the current version of Resume Analyzer has certain
limitations. It does not support resumes in formats other than plain text or
extracted text from PDFs. It also does not integrate with third-party job
boards or Applicant Tracking Systems (ATS), nor does it schedule
interviews or offer real-time recruiter interactions. These features could
be incorporated in future versions to create a more comprehensive HR
solution.
In conclusion, the Resume Analyzer addresses a crucial gap in modern
recruitment by introducing automation, intelligence, and feedback into
resume screening. Its application of machine learning and natural
language processing transforms static resume documents into dynamic
career insights. By providing role predictions and skill recommendations,
it not only aids recruiters in shortlisting candidates efficiently but also
supports job seekers in improving their professional profiles. This project
serves as a practical and impactful solution to the real-world challenges
of job market alignment and recruitment optimization.

1.2 Problem Statement


Recruiters and human resource departments often face overwhelming volumes
of resumes when hiring for open positions. Manually filtering through these
documents is not only time-consuming but also highly subjective, which may
result in overlooked talent or unconscious bias. Furthermore, many job seekers
submit resumes without a clear understanding of the qualifications and skills
needed for a particular role, which can hinder their chances of being shortlisted.
The lack of automation and intelligent filtering leads to inefficiencies in
recruitment pipelines, missed opportunities for candidates, and extended hiring
cycles for employers. There is a pressing need for a solution that can:
 Analyze resume content objectively,
 Predict appropriate job roles,
 Identify missing or weak areas in skills, and
 Suggest actionable improvements.
Such a system would not only streamline the hiring process but also empower
job seekers to tailor their profiles toward roles they are best suited for.

Fig 1.21: Context Flow Diagram

1.3 Objectives
The key objectives of this project are as follows:

 Automated Job Role Prediction: To design and implement a machine
learning-based system that can classify resumes and accurately predict
job roles based on the textual content of the document.
 Skill Extraction and Matching: To extract both technical and soft skills
from resumes and match them against a curated database of role-specific
requirements.
 Skill Gap Analysis: To identify the disparity between a candidate’s
current skill set and the skills required for a predicted job role, and
provide personalized recommendations to bridge these gaps.
 Web-Based User Interface: To develop a user-friendly web application
using Flask that allows users to upload their resumes, view predictions,
and receive feedback.
 Secure User Authentication: To incorporate sign-up and login
functionality with password encryption to ensure user data confidentiality
and security.

1.4 Scope of the Project


The scope of the project is clearly defined to focus on core functionalities that
enhance resume screening and job matching:
 Resume Upload and Parsing: The system allows users to upload resumes
in text format. The content is parsed to extract information relevant for
prediction and analysis.
 Job Role Classification: Using a pre-trained machine learning model, the
application predicts a suitable job role based on resume content.
 Skill Matching and Recommendations: The system matches extracted
skills with a MySQL database containing predefined job role skills.
Based on this comparison, it suggests missing technical and soft skills.
 User Account Management: Basic user authentication is implemented,
allowing individuals to create accounts and securely access their analysis
results.
 Limitations: The system does not include advanced recruitment features
such as job application tracking, real-time employer feedback, or
integration with third-party job platforms. Interview scheduling and
behavioral assessment are also beyond the current scope.

1.5 Methodology
The development of this system follows a structured methodology combining
machine learning, web development, and database integration:
1. Data Collection
A labeled dataset of resumes categorized by job roles was used as the
foundation for training the classification model. Each sample included text
content representing a resume and a corresponding job role label.
2. Preprocessing
Text data in resumes was cleaned and transformed to ensure consistency and
effectiveness in model training. This involved:
 Tokenization: Splitting text into individual words or tokens.
 Stopword Removal: Eliminating common but non-informative words
(e.g., “the,” “and,” “is”).
 Vectorization: Converting text into numerical features using TF-IDF
(Term Frequency-Inverse Document Frequency) to reflect word
importance across the dataset.
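
For illustration, a minimal Python sketch of this preprocessing stage is given below. It assumes NLTK for the stopword list and scikit-learn for TF-IDF; the sample resume strings and the max_features value are illustrative only, not the project's actual dataset or settings.

import re
import nltk
from sklearn.feature_extraction.text import TfidfVectorizer

nltk.download("stopwords", quiet=True)
STOP_WORDS = set(nltk.corpus.stopwords.words("english"))

def clean_resume(text):
    # Lowercase, strip non-alphabetic noise, tokenize on whitespace, drop stopwords
    text = re.sub(r"[^a-zA-Z\s]", " ", text.lower())
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens)

raw_resumes = [
    "Experienced Python developer skilled in Flask, SQL and machine learning.",
    "HR executive with strong communication and recruitment experience.",
]
corpus = [clean_resume(doc) for doc in raw_resumes]

vectorizer = TfidfVectorizer(max_features=5000)
X = vectorizer.fit_transform(corpus)   # sparse TF-IDF feature matrix for the classifier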

3. Model Training
Various classification algorithms (e.g., Naive Bayes, Random Forest, Logistic
Regression) were tested. The model with the best accuracy and generalization
performance was selected and saved as a .pkl file for integration into the web
application.
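
A hedged sketch of this comparison and serialization step is shown below. It assumes scikit-learn estimators and a labelled CSV file; the file name resume_dataset.csv and the column names resume_text and job_role are assumptions for illustration, not the actual dataset fields.

import pickle
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Hypothetical dataset file with 'resume_text' and 'job_role' columns
df = pd.read_csv("resume_dataset.csv")
vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
X = vectorizer.fit_transform(df["resume_text"])
y = df["job_role"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": MultinomialNB(),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

best_model, best_acc = None, 0.0
for name, model in candidates.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {acc:.3f}")
    if acc > best_acc:
        best_model, best_acc = model, acc

# Persist the best model and the fitted vectorizer for use in the Flask app
with open("model.pkl", "wb") as f:
    pickle.dump(best_model, f)
with open("vectorizer.pkl", "wb") as f:
    pickle.dump(vectorizer, f)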

4. Skill Matching
Extracted resume skills were compared against a MySQL database containing
skill sets associated with different job roles. This comparison facilitated the
identification of missing skills and generation of suggestions, including soft
skills relevant to the predicted job role.
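
The comparison against the skills database could look roughly like the sketch below, using the mysql-connector-python driver. The table and column names (role_skills, role, skill) and the connection credentials are assumptions made for illustration.

import mysql.connector

def find_missing_skills(extracted_skills, predicted_role):
    # Compare resume skills with role requirements stored in MySQL
    conn = mysql.connector.connect(
        host="localhost", user="app_user", password="***", database="resume_analyzer"
    )
    cursor = conn.cursor()
    # Hypothetical table: role_skills(role VARCHAR, skill VARCHAR)
    cursor.execute("SELECT skill FROM role_skills WHERE role = %s", (predicted_role,))
    required = {row[0].lower() for row in cursor.fetchall()}
    cursor.close()
    conn.close()

    have = {s.lower() for s in extracted_skills}
    return sorted(required - have)   # skills to recommend to the candidate

print(find_missing_skills(["python", "sql"], "Data Analyst"))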

5. Web Development
The Flask framework was used to create a lightweight yet functional web
application. It includes:
 A front-end interface with HTML templates for uploading resumes and
displaying results.
 A back-end server to handle processing, model predictions, and database
queries.
 Encrypted storage and verification of user credentials using Python
libraries.
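
A minimal sketch of the upload-and-predict flow in Flask is shown below. The route names, template files, and the assumption that the uploaded resume is plain text are illustrative; in practice the same text cleaning used during training would be applied before prediction.

import pickle
from flask import Flask, request, render_template

app = Flask(__name__)
model = pickle.load(open("model.pkl", "rb"))            # trained classifier
vectorizer = pickle.load(open("vectorizer.pkl", "rb"))  # fitted TF-IDF vectorizer

@app.route("/", methods=["GET"])
def index():
    return render_template("upload.html")

@app.route("/analyze", methods=["POST"])
def analyze():
    resume_file = request.files["resume"]
    text = resume_file.read().decode("utf-8", errors="ignore")  # plain-text upload assumed
    features = vectorizer.transform([text.lower()])
    predicted_role = model.predict(features)[0]
    return render_template("result.html", role=predicted_role)

if __name__ == "__main__":
    app.run(debug=True)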

6. Security

To ensure privacy and data protection, user passwords are encrypted before
storage in the database. The application follows basic security practices for form
validation and secure session handling.
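
The password handling described above can be sketched with the bcrypt library as follows (the sample password is a placeholder):

import bcrypt

def hash_password(plain):
    # bcrypt produces a salted, one-way hash suitable for storing in MySQL
    return bcrypt.hashpw(plain.encode("utf-8"), bcrypt.gensalt())

def verify_password(plain, stored_hash):
    return bcrypt.checkpw(plain.encode("utf-8"), stored_hash)

stored = hash_password("S3cure!pass")
print(verify_password("S3cure!pass", stored))   # True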

Chapter 2: Literature Review
2.1 Overview of Existing Systems
Over the past decade, numerous resume analysis systems and Applicant
Tracking Systems (ATS) have been developed to support automated
recruitment. These systems aim to streamline the hiring process by enabling
recruiters to filter, rank, and categorize resumes based on predefined keywords,
qualifications, and experience.
One of the earliest forms of such systems includes keyword-based parsing tools
that rely on pattern matching to extract data from resumes. While helpful, these
systems often suffer from inaccuracies due to formatting issues and a lack of
contextual understanding. Modern solutions have shifted towards intelligent
automation using Natural Language Processing (NLP) and Machine Learning
(ML) techniques to analyze resume content more accurately.
Some well-known platforms include:
 LinkedIn Recruiter – Offers AI-powered tools for job matching and
candidate recommendations.
 Hiretual – Uses AI to build talent pools and predict candidate fit.
 Zety and ResumeWorded – Provide resume scoring and feedback
systems based on content optimization.
 ATS systems like Workday and Taleo – Used by enterprises for
filtering and organizing applications.
These systems often offer additional features like job matching, candidate
ranking, and behavioral insights.

Fig 1.11: Upload Section

Fig 1.13: Job Role Prediction


Fig 1.15: Suggested Job Listing Via API

Fig 1.17: Resume Score

2.2 Key Research Papers and Technologies


Several academic papers and open-source technologies provide the foundation
for intelligent resume analysis systems. Below are some notable contributions:
1. Resume Classification Using NLP and ML
A common approach in academic research involves using classification
algorithms to map resumes to predefined job categories. For instance, in the
paper "Resume Classification and Ranking Using Natural Language
Processing" (International Journal of Advanced Research in Computer Science,
2019), the authors use TF-IDF vectorization combined with Naive Bayes
classifiers to predict job domains with considerable accuracy.
2. Skill Extraction with Named Entity Recognition (NER)
Technologies like SpaCy and Stanford NLP have been employed to extract
entities such as skills, companies, and roles from resumes. A study published in
IEEE Access explored the use of NER models trained specifically for HR
datasets to improve skill identification accuracy by over 20%.

3. Resume Parsers and Tools


Several open-source libraries and tools have been developed for resume parsing,
such as:
 PyResparser – A Python library for extracting structured information
from unstructured resumes.
 docx2txt, pdfminer, and textract – Used for text extraction from
different document formats.
 Scikit-learn – Provides classification, vectorization, and evaluation
tools essential for building ML pipelines.

Fig 1.7: NLP Pipeline For Skill Extraction

These tools serve as foundational technologies for projects like the Resume
Analyzer, which integrate similar components to achieve role prediction and
skills analysis.

2.3 Limitations in Existing Systems
While many systems and models have made significant progress in automating
resume screening, they continue to face several limitations:

1. Format Sensitivity
Many commercial ATS platforms struggle with parsing non-standard resume
formats. Elements like tables, graphics, and multi-column layouts can disrupt
the parsing process, leading to incorrect data extraction.

2. Context Ignorance
Keyword-based systems often ignore the context in which a term appears. For
example, a resume that lists "Python" under hobbies may be incorrectly
matched to programming jobs.

3. Limited Personalization
Existing systems often provide generic feedback. They lack the capability to
tailor feedback based on predicted job roles, making their recommendations less
actionable for users looking to improve their resumes for specific roles.

4. High Cost and Proprietary Constraints


Many of the most effective tools and platforms are commercial and not open-
source, making them inaccessible for academic or small-scale applications.
Additionally, they may not be customizable or transparent in how predictions
are made.

5. Soft Skills Detection


Soft skills, such as leadership, adaptability, or communication, are critical in job
suitability but are often overlooked by resume analysis tools due to the
difficulty in detecting them through NLP methods.

Fig 1.14: Recommended Skills

These limitations highlight the need for more accessible, accurate, and
customizable resume analysis systems that can offer intelligent predictions and
feedback tailored to individual users.

2.4 Summary
This chapter explored the landscape of existing resume analysis systems,
reviewed key research contributions, and identified the technologies and models
that support intelligent resume screening. It also highlighted critical limitations
in current approaches, including context handling, resume format sensitivity,
and the absence of personalized feedback.
The Resume Analyzer project addresses these challenges by combining a
machine learning-based classification system with NLP-powered skill
extraction and skill gap analysis. Unlike traditional ATS tools, this system is
designed to be lightweight, open-source, and accessible through a web interface.
It not only predicts job roles based on resume content but also offers customized
skill suggestions, including soft skills, that enhance candidate readiness for
specific job markets.
The next chapter will detail the system design, including the architecture,
components, database structure, and user interface considerations.

Chapter 3: System Analysis
3.1 Software Requirements Specification (SRS)
Project Title: Resume Analyzer
Mohammad Amaan Khan
Department: Department of Computer Science, School of Engineering Sciences
and Technology, Jamia Hamdard, New Delhi
Project Guide: Dr. Imran Hussain

Table of Contents
1. Introduction
1.1 Purpose
1.2 Scope
1.3 Definitions, Acronyms, and Abbreviations
1.4 References
1.5 Overview
2. Overall Description
2.1 Product Perspective
2.2 Product Functions
2.3 User Characteristics
2.4 Constraints
2.5 Assumptions and Dependencies
3. Specific Requirements
3.1 Functional Requirements
3.2 External Interface Requirements
3.3 Non-functional Requirements
3.4 System Features
4. Appendices
4.1 Hardware and Software Requirements
4.2 System Architecture Diagram

1. Introduction

1.1 Purpose
The purpose of this SRS is to define the software requirements for the Resume
Analyzer system, which automates resume screening, skill extraction, and job
role recommendation using Natural Language Processing (NLP) and Machine
Learning.
1.2 Scope
Resume Analyzer is a web-based application that allows users to upload their
resumes and receive detailed insights, including extracted skills, education,
experience, job role suggestions, and suitability scores. It supports recruiters in
shortlisting candidates efficiently and helps job seekers align resumes with
desired job roles.
1.3 Definitions, Acronyms, and Abbreviations
 NLP: Natural Language Processing
 ML: Machine Learning
 PDF/DOC: File formats for resumes
 Flask: Python web framework
 HTML: Hypertext Markup Language
 MySQL: Relational database management system
1.4 References
 IEEE Std 830-1998 - IEEE Recommended Practice for Software
Requirements Specifications
 Flask Documentation
 scikit-learn Documentation
 spaCy NLP Documentation

1.5 Overview

The rest of the document details system functionality, user requirements,
interface descriptions, and constraints following IEEE SRS standards.

2. Overall Description
2.1 Product Perspective
This is a standalone web-based system that interacts with users, a backend
server, and a MySQL database. It employs a pre-trained NLP model for
extraction and machine learning for job prediction.
2.2 Product Functions
 Resume Upload
 Resume Parsing
 Skill and Entity Extraction
 Job Role Prediction
 Resume Scoring
 Results Display
2.3 User Characteristics
 Job Seekers: Basic computer literacy
 Recruiters: HR professionals with resume screening experience
2.4 Constraints
 Resumes must be in English
 Resume file size must be ≤ 5 MB
 Must run in a browser supporting modern HTML5
2.5 Assumptions and Dependencies
 Internet connectivity is required
 Relies on third-party libraries for NLP and ML

3. Specific Requirements

3.1 Functional Requirements
FR1: Resume Upload
 Users must be able to upload resumes (PDF/DOC/DOCX) via a web
form.
FR2: Resume Parsing
 System parses uploaded resumes and extracts key sections (name,
contact, skills, education, experience).
FR3: Skill Extraction
 The NLP engine detects and extracts skills using spaCy and custom
dictionaries.
FR4: Job Role Prediction
 A trained ML model predicts the most suitable job role based on parsed
content.
FR5: Resume Scoring
 Assigns a matching score (0–100) to show suitability for various roles.
FR6: Result Display
 Parsed data, predicted job roles, and scores are shown on the user
dashboard.
FR7: Data Storage
 Parsed and predicted data are stored in a MySQL database.

3.2 External Interface Requirements

3.2.1 User Interface
 Web interface using HTML, CSS, Bootstrap
 Resume upload form

3.2.2 Hardware Interfaces


 Standard browser and computer (no special hardware needed)
3.2.3 Software Interfaces
 Flask (backend)
 MySQL (database)
 spaCy/NLTK (NLP processing)
 scikit-learn (ML prediction)

3.3 Non-functional Requirements


NFR1: Performance
 Resume analysis must complete within 5 seconds.
NFR2: Scalability
 System should support 50+ concurrent users.
NFR3: Usability
 Simple and clean UI with clear navigation and instructions.
NFR4: Security
 Secure resume file upload, database access with authentication.
NFR5: Maintainability
 Code must follow modular and reusable structure with clear
documentation.
NFR6: Availability
 99% uptime during working hours.
3.4 System Features
Feature Description
Resume Upload Upload resume from local system
Skill Extraction Extract technical/non-technical skills
Job Prediction Suggest job roles using ML
Score Display View matching scores
Data Storage Save resume data for recruiters

4. Appendices
4.1 Hardware and Software Requirements
Hardware:
 Client Machine: Standard PC, 4GB+ RAM
 Server: 8GB+ RAM, Dual Core+, Cloud hosting (GCP/AWS
recommended)
Software:
 Python 3.10+
 Flask
 MySQL
 spaCy, scikit-learn, pandas, numpy
 Browser: Chrome, Firefox
4.2 System Architecture Diagram
 User Interface: HTML/CSS form for uploading resumes

 Flask App: Handles routing and logic
 Resume Processor: Extracts content
 NLP Engine: Processes text for skills and entities
 ML Module: Predicts job roles and scores
 Database: Stores resume data and predictions
 Results Display: Shows the analyzed insights to the user

Fig 1.1

3.2 Requirement Analysis


The Resume Analyzer project aims to assist job seekers and recruiters by
automating the analysis of resumes using Machine Learning and Natural

Language Processing (NLP). It predicts suitable job roles, evaluates skills, and
suggests improvements.
User Requirements:
 A web interface to upload resumes.
 Display of predicted job roles.
 Skill gap analysis and improvement suggestions.
 User registration and secure login.
System Requirements (Server and Development Environment):
Hardware Requirements:
 Processor: Intel i5 or higher
 RAM: 8 GB or more
 Storage: Minimum 500 MB for project files and dependencies.

Software Requirements:
 Python 3.x
 Flask Web Framework
 MySQL Database
 HTML/CSS for frontend
 Libraries: Scikit-learn, Pandas, PDFMiner/Docx2txt, SpaCy/NLTK,
Flask-Login, bcrypt.

3.3 Functional Requirements


Functional requirements describe the core system behavior and features.
1. Resume Upload Module:

o Users should be able to upload resumes in .pdf or .docx format.
o Text is extracted for further processing.

Fig 1.11: Upload Section


2. Job Role Prediction:
o ML model predicts suitable job role based on extracted content.
o Uses TF-IDF vectorization and classification algorithms.

Fig 1.13: Predicted Job Role

3. Skill Extraction and Matching:


o Extracts technical and soft skills using keyword matching and
NLP.
o Compares with required skills stored in MySQL database.

o Identifies missing skills and suggests improvements.

Fig 1.12: Skills Extraction


4. User Authentication:
o Allows sign-up and login using encrypted credentials.
o Ensures user data is protected using hashing (bcrypt).

Fig 1.16: User Signup and Login

3.4 Non-functional Requirements


These define quality attributes of the system:
1. Performance:
o Resume analysis should complete within a few seconds.

o Backend ML model is optimized for quick prediction.
2. Scalability:
o System can be extended to include more job roles, resumes, or
additional ML models.
3. Security:
o Passwords stored with bcrypt encryption.
o Sessions managed securely using Flask-Login.
4. Usability:
o Intuitive UI allowing easy navigation.
o Clear display of results with suggestions.
5. Maintainability:
o Code is modular with separation between frontend, backend, and
ML logic.

3.5 Feasibility Study


A feasibility study ensures the system is viable across different aspects:
Type          Description
Technical     The system uses open-source tools like Flask and MySQL, which are well supported.
Operational   End-users can upload and analyze resumes without needing technical expertise.
Economic      No major cost is involved; the system runs on a local or cloud server with minimal cost.
Legal         No sensitive or personally identifiable information is stored beyond login credentials.
Conclusion: The system is technically, operationally, and economically feasible.

3.6 Risk Analysis


Every system has inherent risks that must be identified and addressed:
Risk                            Impact   Mitigation Strategy
Resume format not supported     Medium   Restrict uploads to .pdf and .docx formats
Model accuracy issues           High     Train with larger, more diverse datasets
Data breach / user data leak    High     Encrypt credentials; use secure login sessions
Database connection failure     Medium   Implement connection retry logic and error handling
Browser compatibility issues    Low      Test the UI on multiple browsers and devices

Chapter 4: System Design


4.1 System Architecture Design
This diagram illustrates the system architecture of the Resume Analyzer
project:
1. User uploads a resume via the HTML-based web interface.
2. The Flask backend handles the request and sends the resume for Resume
Processing.

3. Processed resume data is stored in a Database and also passed to the
NLP Engine for:
o Skill Extraction
o Education & Experience Extraction
o Named Entity Recognition (NER)
4. The NLP output is used for Scoring & Job Role Prediction.
5. Final results are retrieved from the database and shown in the Results
Display section.

Fig 1.1 System Architecture Design

4.2 Data Flow Diagrams (DFD):

A Data Flow Diagram (DFD) for the Resume Analyzer visually depicts how
resume data flows through the system, illustrating processes (e.g., parsing,
analysis), data stores (e.g., resume database), data flows (e.g., extracted skills),
and external entities (e.g., users, job platforms). It models how input resumes
are processed to generate outputs like skill matches or job recommendations.
DFDs for the Resume Analyzer clarify system functionality, optimize data

processing, and are structured in levels (e.g., Level 0, Level 1) for detailed
design.

Fig 1.2.0: DFD LEVEL 0

Fig 1.2.1: DFD Level 1

Fig 1.2.2: DFD Level 2

4.3 Entity Relationship Diagram
An Entity-Relationship (ER) Diagram for a resume analyzer visually
represents the main data entities involved in analyzing resumes and
the relationships between them. This diagram is crucial for designing
the underlying database and structuring how resumes, candidates,
skills, job postings, and analysis results are interconnected.

Fig 1.23

4.4 Use Case Diagram
A Use Case Diagram for the Resume Analyzer visually represents the system's
functionality, showing interactions between actors (e.g., job seekers, recruiters)
and use cases (e.g., upload resume, analyze skills, generate job matches). It
outlines key features like resume parsing, skill extraction, and report generation,
illustrating how users engage with the system. The diagram helps define system
scope and requirements for the Resume Analyzer.

Fig 1.3: Use Case Diagram

4.5 Sequence Diagram

A Sequence Diagram for the Resume Analyzer illustrates the dynamic


interactions between actors and the system over time, detailing the sequence of
messages exchanged. Key components include:
 Actors:
o Job Seeker (initiates resume upload and receives analysis results).
o Recruiter (manages job listings and provides feedback/reports).
o System (processes data and generates outputs).

Fig 1.4

4.6 Database Schema Design
The database design for the Resume Analyzer includes tables to store user data,
resumes, job listings, and analysis results. Key tables are:
 Users: Stores candidate and recruiter details (e.g., ID, name, email).
 Resumes: Contains resume data (e.g., ID, user_ID, content, upload_date).
 Job_Listings: Holds job details (e.g., ID, title, description, recruiter_ID).
 Analysis_Results: Stores processed data (e.g., ID, resume_ID, skills,
job_matches).

Fig 1.18
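
A minimal sketch of how these tables could be created with mysql-connector-python is shown below; the exact column names and types beyond those listed above, as well as the connection credentials, are assumptions.

import mysql.connector

DDL = [
    """CREATE TABLE IF NOT EXISTS users (
           id INT AUTO_INCREMENT PRIMARY KEY,
           name VARCHAR(100),
           email VARCHAR(120) UNIQUE,
           password_hash VARCHAR(128),
           role ENUM('candidate', 'recruiter') DEFAULT 'candidate'
       )""",
    """CREATE TABLE IF NOT EXISTS resumes (
           id INT AUTO_INCREMENT PRIMARY KEY,
           user_id INT,
           content LONGTEXT,
           upload_date DATETIME,
           FOREIGN KEY (user_id) REFERENCES users(id)
       )""",
    """CREATE TABLE IF NOT EXISTS job_listings (
           id INT AUTO_INCREMENT PRIMARY KEY,
           title VARCHAR(150),
           description TEXT,
           recruiter_id INT,
           FOREIGN KEY (recruiter_id) REFERENCES users(id)
       )""",
    """CREATE TABLE IF NOT EXISTS analysis_results (
           id INT AUTO_INCREMENT PRIMARY KEY,
           resume_id INT,
           skills TEXT,
           job_matches TEXT,
           FOREIGN KEY (resume_id) REFERENCES resumes(id)
       )""",
]

conn = mysql.connector.connect(host="localhost", user="app_user",
                               password="***", database="resume_analyzer")
cur = conn.cursor()
for stmt in DDL:
    cur.execute(stmt)      # create each table if it does not already exist
conn.commit()
cur.close()
conn.close()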

4.7 Deployment Diagram
Chapter 5: Implementation
This chapter details the implementation of the Resume Analyzer system. It
explains the tools and technologies used, the structure of various modules, and
the techniques applied for resume parsing, skill extraction, scoring, and job
recommendation. Additionally, it covers the user interface and web integration
aspects of the project.

5.1 Tools and Technologies Used


The following tools, libraries, and platforms were used in the development of
the Resume Analyzer:
 Programming Language: Python
 Web Framework: Flask
 Frontend Technologies: HTML, CSS, JavaScript
 Machine Learning Libraries: scikit-learn, pandas, NumPy
 Natural Language Processing (NLP): spaCy, NLTK
 Database: MySQL
 PDF Generation: ReportLab / FPDF
 Model Serialization: pickle
 External API: JSearch API for real-time job listings
 IDE: VS Code / PyCharm
 Version Control: Git and GitHub

5.2 Modules and Functionalities
The project is modularized for maintainability and scalability. Key modules and
their functionalities include:
 Resume Upload Module: Allows users to upload resumes in PDF or DOCX format.

Fig 1.11

 Parsing Module: Extracts text from resumes using PDF and DOCX parsers.

Fig 1.6

 Preprocessing Module: Cleans and standardizes extracted text for further analysis.
 Skill Extraction Module: Uses NLP to identify both hard and soft skills.
 Role Prediction Module: Predicts the most suitable job role using a trained ML model.
 Resume Scoring Module: Evaluates resumes based on role-specific skill matching.
 Recommendation Module: Suggests relevant job openings using JSearch API / Rapid API.
 User Authentication Module: Handles sign-up and login with password encryption.
 Report Generation Module: Creates downloadable PDF reports of the analysis (a minimal sketch follows this list).
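
As referenced in the Report Generation Module above, a minimal sketch of PDF report creation with ReportLab is shown below; the layout, coordinates, and field values are illustrative placeholders rather than the exact report format used by the system.

from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas

def generate_report(path, name, role, score, missing_skills):
    # Write a one-page summary of the analysis to a PDF file
    pdf = canvas.Canvas(path, pagesize=A4)
    pdf.setFont("Helvetica-Bold", 16)
    pdf.drawString(72, 800, "Resume Analysis Report")
    pdf.setFont("Helvetica", 11)
    pdf.drawString(72, 770, f"Candidate: {name}")
    pdf.drawString(72, 752, f"Predicted role: {role}")
    pdf.drawString(72, 734, f"Resume score: {score}/10")
    pdf.drawString(72, 716, "Recommended skills: " + ", ".join(missing_skills))
    pdf.save()

generate_report("analysis_report.pdf", "Sample Candidate",
                "Machine Learning Engineer", 8.5, ["Tableau", "Statistics"])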

5.3 Resume Parsing and Preprocessing
Resumes are parsed using libraries like pdfplumber for PDF and python-docx
for DOCX files. The text extraction step is followed by:
 Removing special characters and extra whitespaces
 Tokenizing the text
 Lowercasing all text
 Removing stopwords
 Lemmatization to reduce words to their root form
This cleaned text becomes the foundation for downstream NLP tasks.
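
A hedged sketch of this extraction and cleanup pipeline is shown below, assuming pdfplumber, python-docx, and spaCy for stopword removal and lemmatization; the function names are illustrative.

import re
import pdfplumber
import docx          # python-docx
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_text(path):
    # Pull raw text out of a PDF or DOCX resume
    if path.lower().endswith(".pdf"):
        with pdfplumber.open(path) as pdf:
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    if path.lower().endswith(".docx"):
        return "\n".join(p.text for p in docx.Document(path).paragraphs)
    raise ValueError("Unsupported file type")

def preprocess(text):
    # Normalise whitespace, lowercase, drop stopwords, and lemmatize
    text = re.sub(r"\s+", " ", text).strip().lower()
    doc = nlp(text)
    lemmas = [tok.lemma_ for tok in doc if not tok.is_stop and tok.is_alpha]
    return " ".join(lemmas)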

5.4 Skill Extraction using NLP

Using pre-defined skill keyword lists and NLP models, relevant hard skills (e.g.,
Python, SQL) and soft skills (e.g., teamwork, communication) are extracted
from resumes. The spaCy library is utilized to recognize entities and context,
while rule-based matching helps identify known skills effectively.
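
A minimal sketch of this rule-based matching with spaCy's PhraseMatcher is given below; the skill dictionaries shown are small illustrative samples, not the project's actual keyword lists.

import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load("en_core_web_sm")

HARD_SKILLS = ["python", "sql", "machine learning", "flask", "pandas"]
SOFT_SKILLS = ["communication", "teamwork", "leadership", "adaptability"]

matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
matcher.add("HARD", [nlp.make_doc(s) for s in HARD_SKILLS])
matcher.add("SOFT", [nlp.make_doc(s) for s in SOFT_SKILLS])

def extract_skills(resume_text):
    # Return the hard and soft skills found in the resume text
    doc = nlp(resume_text)
    found = {"HARD": set(), "SOFT": set()}
    for match_id, start, end in matcher(doc):
        found[nlp.vocab.strings[match_id]].add(doc[start:end].text.lower())
    return found

print(extract_skills("Built Flask apps in Python; strong communication and teamwork."))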

5.5 Resume Scoring Model
A custom scoring logic evaluates how well a resume aligns with the predicted
job role. The process involves:
• Matching the extracted skills against role-specific required skills retrieved
from the database
• Assigning weights to each skill based on its importance for the job role
• Computing a total score as a percentage of matched versus required skills
• Categorizing resumes (e.g., Excellent, Good, Average) based on predefined
threshold values.

Fig 1.17
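
A simplified sketch of this scoring logic is shown below; the weights, thresholds, and the example skill lists are illustrative assumptions rather than the exact values used by the deployed model.

def score_resume(extracted_skills, required_skills, weights=None):
    # Return a 0-10 score as the weighted fraction of required skills present
    weights = weights or {s: 1.0 for s in required_skills}
    total = sum(weights[s] for s in required_skills)
    have = {e.lower() for e in extracted_skills}
    matched = sum(weights[s] for s in required_skills if s.lower() in have)
    score = 10 * matched / total if total else 0.0
    if score >= 8.5:
        category = "Excellent"
    elif score >= 6.5:
        category = "Good"
    else:
        category = "Average"
    return round(score, 2), category

print(score_resume(["Python", "Pandas", "SQL"],
                   ["python", "pandas", "sql", "tableau", "statistics"]))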

5.6 Job Recommendation Engine

To enhance usability, the system connects to the JSearch API, fetching live job
listings based on:
 Predicted job role
 User's location (optional)
 Extracted skills
The engine filters and displays relevant job opportunities with details like
company, position, and application links.
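
A hedged sketch of this lookup using the requests library is shown below; the JSearch endpoint path, headers, query parameters, and response field names are assumptions based on typical RapidAPI usage and should be checked against the current JSearch documentation.

import requests

def fetch_jobs(role, location=None, api_key="YOUR_RAPIDAPI_KEY"):
    query = role if not location else f"{role} in {location}"
    response = requests.get(
        "https://jsearch.p.rapidapi.com/search",          # assumed endpoint
        headers={
            "X-RapidAPI-Key": api_key,
            "X-RapidAPI-Host": "jsearch.p.rapidapi.com",
        },
        params={"query": query, "num_pages": 1},
        timeout=10,
    )
    response.raise_for_status()
    jobs = response.json().get("data", [])
    # Keep only the fields shown on the dashboard (field names assumed)
    return [{"title": j.get("job_title"),
             "company": j.get("employer_name"),
             "link": j.get("job_apply_link")} for j in jobs]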

5.7 User Interface and Web Integration

The user interface is built using HTML, CSS, and JavaScript, integrated with
Flask for backend interaction. Key UI features include:
 Upload form for resumes
 Real-time display of analysis results
 Interactive dashboard showing skills, scores, and recommendations
 Option to download a PDF report
 User login and signup functionality with encrypted password storage in
MySQL
The system ensures a smooth user experience with responsive design and
intuitive navigation.

5.8 Radar Chart for Skills Match

A Radar Chart is a graphical method used to display multivariate data in the


form of a two-dimensional chart with axes starting from the same point. Each
axis represents a different variable (e.g., skills), and the data points are
connected to form a polygon. In a Resume Analyzer, it visually compares a
candidate’s skills against the required skills for a job role, making skill gaps and
strengths easy to identify at a glance.

Fig 1.20
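
A matplotlib sketch of such a radar chart is shown below; the skill axes and the candidate/required values are illustrative numbers only.

import numpy as np
import matplotlib.pyplot as plt

skills = ["Python", "SQL", "ML", "Communication", "Teamwork"]
candidate = [0.9, 0.6, 0.8, 0.7, 0.5]      # candidate's skill levels
required = [0.8, 0.8, 0.9, 0.6, 0.6]       # levels expected for the role

angles = np.linspace(0, 2 * np.pi, len(skills), endpoint=False).tolist()
angles += angles[:1]                        # repeat the first angle to close the polygon
candidate += candidate[:1]
required += required[:1]

fig, ax = plt.subplots(subplot_kw={"polar": True})
ax.plot(angles, candidate, label="Candidate")
ax.fill(angles, candidate, alpha=0.25)
ax.plot(angles, required, label="Required")
ax.set_xticks(angles[:-1])
ax.set_xticklabels(skills)
ax.set_yticklabels([])
ax.legend(loc="upper right")
plt.savefig("skills_radar.png", bbox_inches="tight")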

Chapter 6: Results and Discussion
6.1 Model Accuracy Comparison

This section presents a comparison between different machine learning models


used for resume scoring and skill extraction. The models evaluated include
Logistic Regression, Random Forest, and BERT.
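
As an illustration of how such a comparison could be produced with scikit-learn, the sketch below reports cross-validated accuracy for two of the baselines. It assumes the hypothetical resume_dataset.csv and column names used earlier; the BERT comparison would require a separate transformer-based pipeline not shown here.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv("resume_dataset.csv")               # hypothetical dataset file
X = TfidfVectorizer(stop_words="english").fit_transform(df["resume_text"])
y = df["job_role"]

for name, model in {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores.mean():.3f} (+/- {scores.std():.3f})")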

6.2 Resume Score Output Samples
The scoring module assigns a score to each resume based on how closely the
candidate's skills match the role-specific requirements. Below are examples of
real output from the Resume Analyzer system:
Example 1: Machine Learning Engineer
 Predicted Role: Machine Learning Engineer
 Matched Skills: Python, Scikit-Learn, Deep Learning, Pandas, etc.
 Missing Skills/ Recommended Skills: Tableau, Statistics
 Score: 8.846/10
 Category: Moderate

These examples demonstrate that the scoring system provides a meaningful


evaluation of the candidate’s fit for a role, highlighting areas for improvement.

6.3 Job Match Samples
The system integrates with the JSearch API to provide real-time job
recommendations based on the predicted role and extracted skills. Below are
sample outputs:
Example: Machine Learning Engineer
 Predicted Role: Machine Learning Engineer
 Top Job Matches:

This shows that the system successfully fetches and filters relevant job listings,
making it useful for job seekers to explore appropriate opportunities.

Chapter 7: Conclusion and Future Work

This chapter summarizes the outcomes of the Resume Analyzer project,
highlights the limitations encountered during implementation, and outlines
possible future enhancements to improve the system’s effectiveness and scope.

7.1 Conclusion
The Resume Analyzer successfully achieves its objective of automating resume
evaluation by combining natural language processing (NLP), machine learning,
and web technologies. The system parses resumes, extracts relevant skills,
predicts suitable job roles, scores resumes based on role-specific skill matching,
and recommends real-time job listings using the JSearch API. The user
interface, built with Flask and modern web technologies, provides an intuitive
and responsive platform for users. By offering skill-based feedback and
personalized job suggestions, the tool serves as a practical solution for both job
seekers and recruiters. The modular design and use of open-source tools make it
adaptable and scalable for future needs.

7.2 Limitations
Despite its functionality, the system has several limitations:
1. Model Training
 The earlier models were trained on fewer than 3,000 records, so their predictions are not very reliable.
 The newer models were trained on over 11,000 records, but the quality and completeness of that data have not been verified.
 There is no documentation of how the models were tested or whether they handle unfairness or bias.
2. Resume Parsing
 Parsing relies on regular expressions together with pdfplumber and python-docx, which may fail on messy or unusual resume layouts.
 Not all file types are supported, and there is no fallback method if parsing fails.
3. Security
 Although a .env file is used to hide passwords and keys, the model files and credentials are not fully protected or encrypted.
 There is no token-based authentication and no limit on how often users can access the service.
4. Scalability
 The project uses Flask with server-side sessions, which may not handle many concurrent users in a large production setup.
 Features such as caching, background processing, and concurrent request handling are not addressed.
5. API Dependence
 The system depends heavily on the JSearch API; if that service goes down or changes its terms, the job listing features may stop working.
6. Limited API Exposure and Documentation
 The analysis features are available only through the web front-end; there is no separately documented API (for example, Swagger documentation or Postman collections) for programmatic testing and integration.
7. No Error Tracking
 There is no mechanism to track or log errors, so problems are hard to debug when something goes wrong.
8. Resume Scoring
 The resume score is produced by a pre-trained model, and it is not clear how the scoring works or whether it is fair across all types of resumes.

7.3 Future Enhancements


Several improvements can be made to enhance the functionality and accuracy of
the Resume Analyzer:
1. Resume Builder with Skill & Course Suggestions
Goal: Help users create or improve resumes by suggesting relevant skills
and recommending courses for missing or weak skills.
Benefit: Helps job seekers develop the skills needed to meet job market
demands.
2. Job Matching with Filters
Goal: Show job opportunities that match a user’s skills, location, salary
preferences, and job type. Users can filter jobs based on their needs.
Benefit: Makes job searching easier by showing only relevant jobs,
saving time.
3. Job Posting Platform for Employers
Goal: Allow employers to post job openings directly and specify the
skills and experience they need from candidates.
Benefit: Makes the platform useful for both job seekers and employers,
bridging the hiring gap.
4. Employer/HR Account
Goal: Let employers create accounts to post jobs, view resumes, and
manage the hiring process.
Benefit: Makes the hiring process faster and easier for employers to track
and shortlist candidates.
5. Applicant Tracking System (ATS)
Goal: Help employers track candidates through the hiring process, view
resumes, and manage applications.
Benefit: Helps HR teams stay organized and automate tasks, saving time.

6. Dashboard for Insights (Admin/User)
Goal: Create a dashboard to show resume scores, popular skills, and job
trends, helping users improve their resumes and employers understand
job market demands.
Benefit: Provides valuable insights to help both job seekers and
employers make better decisions.

Chapter 8: Appendices
A. Sample Resumes
1.

2.

3.

4.

B. Dataset Details

Datasets
1. Combined_Tech_and_NonTech_Category_Resume.csv: contains approximately 11,000 records, downloaded from Kaggle.com.
2. Enhanced_Resume_Dataset.csv: contains approximately 950 records, downloaded from Kaggle.com.

Kaggle.com dataset links:
https://www.kaggle.com/datasets/snehaanbhawal/resume-dataset
https://www.kaggle.com/datasets/gauravduttakiit/resume-dataset

C. Screenshots
1.

2.

3.

4.

5.
6.

7.

D. Source Code Repository Link
https://github.com/Amaan7040?tab=repositories
