GUIDING CAREERS THROUGH SKILL-BASED INSIGHTS
A PROJECT REPORT
ACKNOWLEDGEMENT
We are personally indebted to the many who helped us during the course of
this project work. Our deepest gratitude to God Almighty.
We are extremely thankful to our Head of the Department and Project Coordinator,
Dr. S. Padma Priya, for her valuable teachings and suggestions.
From the bottom of our hearts, with profound reverence and high regards, we
would like to thank our supervisor, Dr. S. Padma Priya, who has been the pillar of this
project and without whom we would not have been able to complete the project
successfully.
ABSTRACT
The system ingests resume data, harnessing the power of NLP to extract candidates'
competencies and preferences. The algorithm then employs this enriched data to turn the
job-seeking experience into a tailored and efficient process. This fusion of NLP and machine
learning not only enhances the accuracy of skill assessment but also ensures that the
recommendations offered align with each user's unique skill profile.
TABLE OF CONTENTS
ABSTRACT
1. INTRODUCTION
1.1 OVERVIEW
1.2 OBJECTIVE
1.3 LITERATURE SURVEY
2. SYSTEM ANALYSIS
2.1 EXISTING SYSTEM
2.1.1 DISADVANTAGES
2.2 PROPOSED SYSTEM
2.2.1 ADVANTAGES
3. SYSTEM REQUIREMENTS
3.1 HARDWARE REQUIREMENTS
3.2 HARDWARE DESCRIPTION
3.2.1 PROCESSOR
3.2.2 RANDOM ACCESS MEMORY
3.2.3 GRAPHICS PROCESSING UNIT
3.2.4 STORAGE
3.3 SOFTWARE REQUIREMENTS
3.4 SOFTWARE DESCRIPTION
3.4.1 HTML
3.4.2 CSS
3.4.3 PYTHON 3.X
3.4.4 SPACY
3.4.5 MACHINE LEARNING LIBRARIES
3.4.6 PYMUPDF
4. SYSTEM DESIGN
4.1 ARCHITECTURE DIAGRAM
4.2 UML DIAGRAMS
4.2.1 CLASS DIAGRAM
4.2.2 USE CASE DIAGRAM
4.2.3 ACTIVITY DIAGRAM
4.2.4 DATA FLOW DIAGRAM
5. SYSTEM IMPLEMENTATION
5.1 LIST OF MODULES
5.2 MODULE DESCRIPTION
5.2.1 DATA PRE-PROCESSING
5.2.2 CONVOLUTIONAL NEURAL NETWORK
5.2.3 JOB ROLE RECOMMENDATION
6. TESTING
6.1 UNIT TESTING
6.2 INTEGRATION TESTING
6.3 SYSTEM TESTING
6.4 TEST CASES
7. RESULTS & DISCUSSION
7.1 RESULTS
7.2 DISCUSSION
8. CONCLUSION AND FUTURE ENHANCEMENT
8.1 CONCLUSION
8.2 FUTURE ENHANCEMENT
ANNEXURE
APPENDIX I: SOURCE CODE
APPENDIX II: SAMPLE OUTPUT
REFERENCES
CHAPTER 1
INTRODUCTION
1.1 OVERVIEW
At its core, the project boasts a sophisticated algorithm meticulously crafted to decipher the
language of resumes and transform it into actionable insights. Through an iterative process of
learning and adaptation, the platform refines its understanding of industry-specific terms,
evolving to provide increasingly accurate and personalized job role recommendations. Unlike
conventional approaches, the system considers not only technical skills but also soft skills and
industry-specific language, ensuring a holistic understanding of the user's professional journey.
By offering users personalized job role recommendations based on their unique skill profiles, the
platform aims to streamline the job-seeking experience, alleviating the overwhelming burden of
sifting through vast job listings.
This fusion of NLP and machine learning not only addresses the immediate challenge of job
matching but also signifies a stride towards a more nuanced and empathetic approach to career
guidance. In an era where personalization and efficiency are paramount in the realm of career
development, this project signifies a paradigm shift, presenting a comprehensive and tailored
solution.
1.2 OBJECTIVE
The primary objective of the project "Guiding Careers Through Skill-Based Insights" is to
enhance the job application process by leveraging Convolutional Neural Network (CNN)
algorithms and Natural Language Processing (NLP) techniques.
This objective is driven by the recognition of the evolving landscape of job recruitment and the
potential to leverage advanced technologies for a more efficient and personalized process.
Traditionally, job matching relies on manual reviews of resumes, which can be time-consuming
and prone to oversight, especially with the increasing volume of job applications. The project
aims to streamline this process for both candidates and recruiters by providing accurate and
personalized job role recommendations.
By training the model on a dataset of job descriptions and their corresponding roles, the CNN
algorithm learns to recognize patterns and associations, enabling it to predict the most suitable
job roles for the given skills. This data-driven approach enhances the accuracy and efficiency of
job matching, addressing the inefficiencies associated with manual processes and keyword-based
matching.
Furthermore, the incorporation of NLP techniques ensures a nuanced analysis of textual content
within resumes, capturing not only technical skills but also soft skills and industry-specific
language. By offering users tailored recommendations based on their unique skill profiles, the
project aims to foster better matches between skills and job opportunities, ultimately improving
the job-seeking experience for both candidates and recruiters.
1.3 LITERATURE SURVEY
[1] R. Kwieciński, G. Melniczak, and T. Górecki, "Comparison of Real-Time and Batch Job Recommendations," IEEE Access, vol. 11, pp. 20553-20559, 2023.
Kwieciński et al. (2023) present a comparative study between real-time and batch job
recommendation systems. Utilizing the RP3Beta model as an example, the research evaluates the
performance of both approaches in a real-world scenario of a job recommendation task. The
study reports a significant increase in user engagement when employing a real-time
recommendation system, suggesting its potential for enhancing the job application process.
[2] M. Agung, Y. Watanabe, H. Weber, R. Egawa, and H. Takizawa, "Preemptive Parallel Job Scheduling for Heterogeneous Systems Supporting Urgent Computing," Journal of Parallel and Distributed Computing, vol. 45, no. 3, pp. 212-227, 2021.
Agung et al. (2021) propose a parallel job scheduling method for heterogeneous systems to
support urgent computations. The research introduces an in-memory process swapping
mechanism to preempt regular jobs running on coprocessor devices, enabling the execution of
urgent jobs without substantial delays. Simulations demonstrate the effectiveness of the proposed
method in reducing response times and slowdowns of regular jobs while prioritizing urgent
computations.
[3] I. Khaouja, I. Kassou, and M. Ghogho, "A Survey on Skill Identification From Online Job Ads," IEEE Access, vol. 9, pp. 118134-118153, 2021.
Khaouja et al. (2021) conduct a comprehensive survey on skill identification from online job ads.
The study systematically reviews existing research articles, categorizing the methods used for
skill identification, the types of skills extracted, and the sectors studied. The research also
discusses the applications, goals, challenges, and recent trends in skill identification from job
postings, offering valuable insights for future research directions.
[4] T. Ha, M. Lee, B. Yun, and B.-Y. Coh, "Job Forecasting Based on the Patent Information: A Word Embedding-Based Approach," IEEE Access, vol. 10, pp. 7223-7233, 2022.
Ha et al. (2022) propose a word embedding-based approach for job forecasting based on patent
information. The research matches jobs with patents to forecast future job trends, leveraging
changes in the number of patents over time. A word embedding model trained on patent
classification codes and job description data facilitates the identification of promising jobs with
high technical demands, providing insights into the evolving job market.
[5] G. Van Dongen and D. Van Den Poel, "Influencing Factors in the Scalability of Distributed Stream Processing Jobs," IEEE Access, vol. 9, pp. 109413-109431, 2021.
Van Dongen and Van Den Poel (2021) investigate the scalability of distributed stream
processing jobs in popular frameworks such as Flink, Kafka Streams, Spark Streaming, and
Structured Streaming. The research identifies factors influencing scalability, including cluster
layout, pipeline design, framework design, resource allocation, and data characteristics.
Recommendations are provided for practitioners to effectively scale their clusters and optimize
performance.
CHAPTER 2
SYSTEM ANALYSIS
2.1 EXISTING SYSTEM
In the existing job recruitment system, the process of resume screening and job matching
primarily relies on manual efforts, making it time-consuming and susceptible to human errors.
Recruiters often face challenges in handling the increasing influx of resumes, leading to potential
delays in the hiring process. The absence of a systematic and automated approach to skill
identification and job role prediction contributes to inefficiencies and suboptimal matches
between candidate profiles and job requirements. The traditional system lacks the sophistication
needed to adapt to the evolving demands of the job market. It relies on keyword-based matching,
which may not capture the nuances of candidate skills or the dynamic nature of job roles.
Furthermore, the absence of advanced technologies, such as machine learning and natural
language processing, hinders the system's ability to extract meaningful insights from the textual
content of resumes.
2.1.1 DISADVANTAGES
1. Manual Processes:
The existing job recruitment system heavily relies on manual efforts for resume screening and
job matching, making it time-consuming and prone to human errors. This manual approach not
only increases the workload for recruiters but also introduces the possibility of overlooking
qualified candidates or mismatches between candidate skills and job requirements.
2. Lack of Automation:
The system lacks automation in skill identification and job role prediction, leading to
inefficiencies in the recruitment process. Without automated mechanisms, recruiters may
struggle to handle the increasing influx of resumes, resulting in delays in the hiring process and
missed opportunities for both candidates and employers.
3. Keyword-Based Matching:
The existing system predominantly uses keyword-based matching for job recommendations,
which may not capture the nuances of candidate skills effectively. This simplistic approach
overlooks the context in which skills are presented in resumes, leading to suboptimal job
matches and potentially missing out on qualified candidates who possess relevant but differently
expressed skills.
4. Limited Adaptability:
Traditional methods of resume screening and job matching lack adaptability to the evolving
demands of the job market. The system may struggle to keep pace with changes in job
requirements or industry trends, resulting in outdated recommendations and mismatches between
candidate profiles and job roles.
2.2 PROPOSED SYSTEM
The proposed system aims to overcome the limitations of the existing job recruitment
process by introducing an advanced and automated solution. Leveraging state-of-the-art
technologies such as Convolutional Neural Network (CNN) algorithms and Natural Language
Processing (NLP) techniques, the system offers a more sophisticated approach to resume
screening and job role prediction. In the proposed system, the integration of CNN algorithms
allows for data-driven predictions of suitable job roles based on historical patterns and
associations within a vast dataset of job descriptions.
This approach enhances the accuracy and efficiency of job matching, addressing the
inefficiencies associated with manual processes and keyword-based matching. Furthermore, the
incorporation of NLP techniques enables a nuanced analysis of textual content within resumes.
Through methods such as tokenization, Named Entity Recognition (NER), and keyword
extraction, the system can accurately identify and extract relevant skills from resumes. This not
only improves the precision of skill recognition but also ensures a deeper understanding of the
context in which these skills are presented.
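A minimal sketch of these extraction steps, assuming the en_core_web_sm spaCy model is installed and using an illustrative skill vocabulary (the actual system loads a LinkedIn skill list, as shown in Appendix I):

import spacy

nlp = spacy.load("en_core_web_sm")
SKILLS = {"python", "sql", "machine learning"}  # illustrative vocabulary only

def extract_skills(resume_text):
    doc = nlp(resume_text)
    found = set()
    # Tokenization: single-word skills, with stop words skipped.
    for token in doc:
        if not token.is_stop and token.text.lower() in SKILLS:
            found.add(token.text.lower())
    # Noun chunks catch multi-word skills such as "machine learning".
    for chunk in doc.noun_chunks:
        if chunk.text.lower().strip() in SKILLS:
            found.add(chunk.text.lower().strip())
    return found

print(extract_skills("Experienced in Python, SQL and machine learning."))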
2.2.1 ADVANTAGES
1. Enhanced Efficiency:
The proposed system introduces advanced technologies such as Convolutional Neural Network
(CNN) algorithms and Natural Language Processing (NLP) techniques, which streamline the job
application process. By automating tasks such as resume parsing and skill identification, the
system significantly reduces the time and effort required for recruiters to screen resumes and
match candidates with suitable job roles.
2. Personalized Recommendations:
Leveraging CNN algorithms and NLP techniques, the system offers personalized job role
recommendations tailored to each candidate's unique skill profile. By analyzing the contextual
nuances of candidate resumes, the system can provide more accurate and relevant job
suggestions, increasing the likelihood of successful matches between candidates and job
opportunities.
3. Improved Accuracy:
The integration of CNN algorithms enables the system to make data-driven predictions of
suitable job roles based on historical patterns and associations within a vast dataset of job
descriptions. This approach enhances the accuracy of job matching, ensuring that candidates are
presented with job opportunities that closely align with their qualifications and preferences.
4. Reduction of Bias:
Automation reduces the potential for biases in the recruitment process, promoting fairness and
objectivity. By standardizing the evaluation criteria and removing human subjectivity from the
screening process, the system helps mitigate unconscious biases that may influence decision-
making in traditional recruiting methods.
CHAPTER 3
SYSTEM REQUIREMENTS
3.2 HARDWARE DESCRIPTION
3.2.1 PROCESSOR
The processor, or Central Processing Unit (CPU), serves as the brain of the computer,
responsible for executing instructions and computations. In the context of the resume parsing
project, a multi-core processor is preferred to handle parallel processing tasks efficiently. Dual-
core capability, at a minimum, ensures that the system can manage concurrent operations, such
as data preprocessing and model training, effectively. This capability is crucial for optimizing the
overall performance of the project, especially when dealing with large datasets and complex
natural language processing tasks.
3.2.2 RANDOM ACCESS MEMORY
Random Access Memory (RAM) plays a pivotal role in the system's ability to handle and process
data effectively. With a minimum requirement of 8 GB RAM, the system can store and access
data rapidly, reducing latency during memory-intensive tasks. The substantial RAM capacity is
particularly beneficial during machine learning model training, where the system must hold and
manipulate large datasets. Adequate RAM ensures that the system can efficiently perform tasks
such as feature extraction, model evaluation, and other memory-demanding operations,
contributing to the overall responsiveness and speed of the resume parsing project.
3.2.3 GRAPHICS PROCESSING UNIT
A Graphics Processing Unit (GPU) can significantly enhance the project's performance,
especially during machine learning model training. A GPU, preferably NVIDIA CUDA-enabled,
accelerates parallel processing tasks by offloading computations from the CPU. This is
particularly advantageous for training complex models on substantial datasets, as the GPU can
handle parallel operations simultaneously, reducing the time required for model convergence.
The GPU's parallel processing capabilities make it well-suited for the computationally intensive
nature of natural language processing tasks involved in resume parsing, providing a boost to
overall system efficiency.
3.2.4 STORAGE
Adequate storage space, preferably Solid State Drive (SSD), is essential for storing the various
components of the resume parsing project. SSDs offer faster read and write speeds compared to
traditional Hard Disk Drives (HDDs), enhancing the system's responsiveness. Sufficient storage
is crucial for housing datasets, trained machine learning models, and project-related files. The
faster data access speeds of an SSD contribute to quicker data retrieval during model training and
resume processing, supporting an efficient and streamlined workflow.
3.4 SOFTWARE DESCRIPTION
3.4.1 HTML
HTML (Hypertext Markup Language) is the backbone of web development, serving as the
primary language for creating the structure and content of web pages. It consists of a series of
elements or tags that define the various components of a web page. These elements range from
basic ones like headings (<h1> to <h6>), paragraphs (<p>), and links (<a>), to more complex
ones like forms (<form>), tables (<table>), and multimedia content (<img>, <video>, <audio>).
Each HTML element has its own semantic meaning, indicating its purpose or role within the
document. For example, using <header> for introductory content, <nav> for navigation links,
and <footer> for concluding content enhances the accessibility and organization of the web page.
HTML provides a structured and hierarchical approach to organizing content, making it easy for
developers to create well-organized and accessible web pages.
3.4.2 CSS
CSS (Cascading Style Sheets) complements HTML by providing the means to control the
presentation and layout of HTML elements on a web page. While HTML defines the structure
and content of the page, CSS dictates how that content should be displayed visually. CSS works
by targeting HTML elements using selectors and applying styles to them through rulesets. These
styles can include properties like colors, fonts, margins, padding, borders, and positioning. CSS
offers various layout techniques, including flexbox and grid layout, to arrange elements in a
desired format. It also supports responsive web design principles, enabling developers to create
layouts that adapt to different screen sizes and devices. By separating content from presentation,
CSS promotes code maintainability and reusability, allowing developers to apply consistent
styles across multiple pages and easily update the appearance of their websites.
3.4.3 PYTHON 3.X
Python is a core software requirement for the resume parsing project, serving as the primary
programming language for development. The project specifically requires Python 3.x, the current
major version of the language, to leverage its newest features and improvements. Python's popularity
in the field of data science, machine learning, and natural language processing makes it an ideal
choice for developing the system. Its extensive ecosystem of libraries and frameworks, including
spaCy, scikit-learn, and PyMuPDF, provides the necessary tools for implementing advanced
functionalities. Python's readability and versatility contribute to the project's maintainability,
allowing developers to write clean and efficient code. The inclusion of Python ensures that the
resume parsing system benefits from a robust and well-supported programming language,
fostering a conducive environment for innovation and future enhancements.
3.4.4 SPACY
spaCy is a pivotal software requirement for the resume parsing project, representing a state-of-
the-art natural language processing (NLP) library in Python. The latest version of spaCy, with its
high accuracy and speed, is essential for developing applications that process and
understand large amounts of text efficiently. The library provides pre-trained models for various
languages, making it suitable for diverse language processing tasks. In the context of the resume
parsing project, spaCy's capabilities are harnessed for tasks such as named entity recognition and
information extraction. The active open-source community surrounding spaCy ensures ongoing
support, updates, and a wealth of resources for developers working on language-related projects.
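As a brief illustration of the pre-trained pipeline's named entity recognition (a sketch, assuming the small English model en_core_web_sm is installed; the example text is hypothetical):

import spacy

nlp = spacy.load("en_core_web_sm")  # pre-trained English pipeline
doc = nlp("Worked at Infosys in Chennai from 2019 to 2022.")
for ent in doc.ents:
    print(ent.text, ent.label_)  # e.g. ORG, GPE, DATE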
3.4.6 PYMUPDF
PyMuPDF serves as an important software requirement for the resume parsing project, providing
capabilities for effective text extraction from PDF files. This Python package facilitates the
handling of resumes stored in the widely used PDF format, adding versatility to the system's data
source compatibility. By incorporating PyMuPDF, the project ensures comprehensive text
extraction from PDF documents, a common format for professional resumes. The seamless
integration of PyMuPDF enhances the system's ability to process diverse resume sources,
contributing to a more inclusive and thorough resume parsing solution.
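A minimal sketch of the extraction step (PyMuPDF is imported as fitz; "resume.pdf" is a placeholder path):

import fitz  # PyMuPDF

def pdf_to_text(path):
    # Concatenate the plain text of every page in the document.
    pages = []
    with fitz.open(path) as doc:
        for page in doc:
            pages.append(page.get_text())
    return "\n".join(pages)

print(pdf_to_text("resume.pdf"))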
CHAPTER 4
SYSTEM DESIGN
4.1 ARCHITECTURE DIAGRAM
The system's architecture commences with user authentication, ensuring secure access to the
platform's features. Once logged in, users can seamlessly upload their resumes, typically in PDF
or Word formats, providing the system with their professional information. Natural Language
Processing (NLP) techniques are then employed to comprehensively analyze the textual content
of the resumes, extracting valuable information like technical and soft skills through tasks such
as tokenization and Named Entity Recognition (NER).
Simultaneously, the system leverages a dataset containing job descriptions and corresponding
skill requirements for model training. This dataset, encompassing diverse job roles and their skill
profiles, serves as the foundational data source for the subsequent machine learning (ML)
algorithm. Prior to model training, preprocessing steps are undertaken to ensure data quality and
consistency, including handling missing information and standardizing formats.
With the preprocessed data and the dataset, the ML algorithm is trained to predict suitable job
roles based on the extracted features. Techniques like Convolutional Neural Networks (CNNs)
may be utilized for feature extraction and pattern recognition, enabling the algorithm to learn
from the dataset to accurately match skills extracted from user resumes with job requirements.
During the ML algorithm's operation, relevant features are extracted from the input data,
encompassing both explicit skills from resumes and implicit patterns identified during model
training. This feature extraction process is crucial for identifying the most relevant attributes for
job role matching. Following feature extraction, the ML algorithm builds a predictive model
based on the extracted features and the dataset. This model forms the foundation for generating
personalized job recommendations for users. Leveraging machine learning techniques, the
system offers tailored suggestions that align with users' skill profiles and career aspirations.
Finally, with the model in place, the system generates personalized job recommendations for
users based on their uploaded resumes. By analyzing the user's skills and qualifications and
matching them with job requirements from the dataset, the system provides curated job
opportunities that best suit the user's profile. This comprehensive approach streamlines the job-
seeking process, facilitating better matches between candidates and job opportunities.
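The end-to-end flow can be summarized in a short sketch; the function names below are illustrative stand-ins for the modules described above, not the project's actual API:

def parse_resume(pdf_path):
    # Stand-in for the PDF text-extraction step (PyMuPDF in the real system).
    return "Python, SQL, machine learning"

def extract_skills(text):
    # Stand-in for the spaCy tokenization / NER step.
    return [s.strip().lower() for s in text.split(",")]

def predict_roles(skills):
    # Stand-in for the trained CNN; here a toy skill-to-role lookup.
    catalog = {"python": "Data Engineer", "machine learning": "ML Engineer"}
    return [role for skill, role in catalog.items() if skill in skills]

def recommend(pdf_path):
    text = parse_resume(pdf_path)      # 1. ingest the uploaded resume
    skills = extract_skills(text)      # 2. NLP skill extraction
    return predict_roles(skills)       # 3. model-based recommendation

print(recommend("resume.pdf"))         # ['Data Engineer', 'ML Engineer']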
4.2 UML DIAGRAMS
4.2.1 CLASS DIAGRAM
A class diagram is a fundamental component of Unified Modeling Language (UML) used in
software engineering to visualize and represent the structure and relationships within a system. It
provides a static view of the system, depicting classes, their attributes, methods, and the
associations between them. In a class diagram, each class is represented as a rectangle, detailing
its internal structure with attributes and methods. Relationships between classes are depicted
through lines connecting them, illustrating associations, aggregations, or compositions.
Attributes are listed with their respective data types, while methods showcase the operations that
can be performed on the class. The diagram serves as a blueprint for understanding the
organization and interactions of classes within the system, facilitating communication among
stakeholders and aiding in the design and implementation phases of software development.
4.2.2 USE CASE DIAGRAM
The use case diagram offers a comprehensive visualization of the system's functionalities from
the user's perspective, encapsulating key interactions between users and the platform. At the core
of the diagram is the "User," initiating various actions represented as distinct use cases. These
include essential functions like "Upload Resume," enabling users to submit their resumes for
skill extraction, and "View Recommended Jobs," facilitating access to personalized job
suggestions. Additional use cases such as "User Registration/Login," "Update Profile," and
"Provide Feedback" enrich the user experience by offering account management, profile
modification, and feedback submission functionalities, respectively. Together, these use cases
provide a holistic view of the system's capabilities, empowering users to efficiently navigate the
platform and access tailored job recommendations aligned with their skill profiles and career
goals.
4.2.3 ACTIVITY DIAGRAM
The activity diagram illustrates the systematic flow of operations within the job recommendation
system, encapsulating user interactions and system processes. Beginning with user authentication
through "User Login," the diagram delineates the progression to "Upload a Resume," initiating
the extraction of skills from the provided resumes. The subsequent step involves the application
of Natural Language Processing (NLP) techniques for skill extraction, followed by "Model
Building" using machine learning algorithms to generate personalized job recommendations.
Upon completion of model training, the system transitions to "Job Recommendation," where it
analyzes user profiles and suggests relevant job roles. This sequential depiction offers a clear
understanding of how users engage with the system and how various components collaborate to
deliver tailored job suggestions, streamlining the user experience and facilitating informed career
decisions.
4.2.4 DATA FLOW DIAGRAM
The Data Flow Diagram (DFD) offers a detailed depiction of how data traverses through the job
recommendation system, outlining the journey from input to output. At its core, the DFD
encapsulates the flow of data between various system components, portraying entities like the
user, resume data, and the job database. It commences with the user uploading a resume, serving
as the primary input source. Subsequently, the resume data undergoes a series of processing
stages, including the application of Natural Language Processing (NLP) techniques for skill
extraction and the utilization of machine learning algorithms for model building. Throughout
these phases, data undergoes transformations and manipulations to extract meaningful insights.
Once the model is trained and the job recommendation process is initiated, the system generates
personalized job recommendations tailored to the user's skills and qualifications. These
recommendations constitute the output of the system, presented to the user for consideration. The
DFD illustrates this entire data flow, providing a comprehensive overview of how information
moves from its source to its destination within the system, elucidating the intricacies of data
processing and utilization in the recommendation process.
CHAPTER 5
SYSTEM IMPLEMENTATION
5.2 MODULE DESCRIPTION
The resume parsing module plays a pivotal role in extracting valuable insights from candidate resumes. It begins
with a user-friendly interface enabling easy resume uploads. The module utilizes sophisticated
Natural Language Processing (NLP) techniques, including tokenization, Named Entity
Recognition (NER), and keyword extraction, to thoroughly analyse the textual content of
resumes. This process ensures accurate identification and extraction of skills, providing a
structured representation of the candidate's qualifications. By seamlessly integrating these NLP
techniques into the parsing process, the module enhances the system's ability to understand the
context and nuances of the skills presented in the resumes.
5.2.1 DATA PRE-PROCESSING
The data pre-processing module serves as a foundational step to ensure the quality and relevance of the data
used in subsequent stages. It involves cleaning and transforming raw data into a format suitable
for analysis. In the context of our project, data pre-processing encompasses tasks such as
handling missing information, standardizing formats, and removing redundancies. This module
is crucial for maintaining data integrity, improving the efficiency of downstream processes, and
ultimately enhancing the accuracy of job role recommendations.
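A minimal pre-processing sketch with pandas, assuming hypothetical column names similar to those in the Appendix I dataset:

import pandas as pd

def preprocess(df):
    df = df.copy()
    # Handle missing information and standardize formats.
    df["Skills Known"] = df["Skills Known"].fillna("").str.strip().str.lower()
    df["department"] = df["department"].fillna("unknown").str.upper()
    # Remove redundancies once formats are consistent.
    return df.drop_duplicates()

raw = pd.DataFrame({"Skills Known": ["Python ", None, "python"],
                    "department": ["cse", "CSE", "cse"]})
print(preprocess(raw))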
5.2.2 CONVOLUTIONAL NEURAL NETWORK
The CNN model is designed and implemented. The architecture typically includes
convolutional layers to capture spatial patterns in the textual data, followed by pooling layers to
reduce dimensionality and highlight essential features. The output is then flattened and connected
to one or more fully connected layers, allowing the model to learn intricate relationships between
the input features. Training the CNN involves optimizing the model's parameters using the
dataset. This includes feeding batches of labeled job descriptions into the network, adjusting
weights and biases through backpropagation, and minimizing a defined loss function. Training
continues iteratively until the model converges and accurately captures patterns in the data.
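A condensed sketch of this architecture in Keras follows (the full training code appears in Appendix I; num_features and num_classes are illustrative placeholders):

from keras.models import Sequential
from keras.layers import (Conv1D, MaxPooling1D, BatchNormalization,
                          Dropout, Flatten, Dense)

num_features, num_classes = 3, 10
model = Sequential([
    # Convolutional layer: captures local patterns across the feature vector.
    Conv1D(64, kernel_size=5, activation="relu", padding="same",
           input_shape=(num_features, 1)),
    BatchNormalization(),
    MaxPooling1D(pool_size=1),      # pooling: reduce dimensionality
    Dropout(0.5),
    Flatten(),                      # flatten before the dense layers
    Dense(256, activation="relu"),  # fully connected: feature interactions
    Dropout(0.5),
    Dense(num_classes, activation="softmax"),  # one score per job role
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()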
5.2.3 JOB ROLE RECOMMENDATION
The job role recommendation module synthesizes outputs from the Resume Parsing, NLP, Data Preprocessing, and
CNN modules to provide tailored recommendations to users. By integrating the identified skills
from resumes and the predictions made by the CNN algorithm, the system generates a ranked list
of job roles that closely align with the candidate's qualifications. This module ensures that users
receive personalized and relevant job suggestions, optimizing the overall user experience. The
collaborative efforts of these modules contribute to a comprehensive and intelligent job
recommendation system, streamlining the recruitment process and fostering better matches
between candidates and job opportunities.
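The ranking step itself is compact; a minimal sketch of converting the model's softmax output into a top-3 shortlist (class names here are illustrative, not the trained model's labels):

import numpy as np

class_names = {0: "Data Analyst", 1: "ML Engineer", 2: "Web Developer",
               3: "QA Engineer"}                    # illustrative labels
probs = np.array([0.10, 0.55, 0.05, 0.30])          # softmax output for one resume
top = np.argsort(probs)[::-1][:3]                   # highest probabilities first
print([(class_names[i], float(probs[i])) for i in top])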
CHAPTER 6
TESTING
6.1 UNIT TESTING
Unit testing involves the design of test cases that validate that the internal program logic is
functioning properly and that program inputs produce valid outputs. All decision branches and
internal code flow should be validated. It is the testing of individual software units of the
application, and it is done after the completion of an individual unit, before integration. This is
structural testing that relies on knowledge of the unit's construction and is invasive. Unit tests
perform basic tests at the component level and test a specific business process, application, and/or
system configuration. Unit tests ensure that each unique path of a business process performs
accurately to the documented specifications and contains clearly defined inputs and expected results.
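A hedged unit-test sketch in pytest style for the skill-extraction unit of Appendix I (it assumes the listing lives in app.py and the loaded skill list contains "python"):

from app import extract_skill_1  # assumes the Appendix I code lives in app.py

def test_extract_skill_finds_known_skill():
    skills = extract_skill_1("Proficient in Python and SQL.")
    assert "Python" in skills

def test_extract_skill_empty_resume():
    assert extract_skill_1("") == []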
6.2 INTEGRATION TESTING
Integration tests are designed to test integrated software components to determine whether they
actually run as one program. Testing is event driven and is more concerned with the basic outcome
of screens or fields. Integration tests demonstrate that although the components were individually
satisfactory, as shown by successful unit testing, the combination of components is correct and
consistent. Integration testing is specifically aimed at exposing the problems that arise from the
combination of components.
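A minimal integration-test sketch using Flask's built-in test client against the Appendix I routes (credentials are placeholders):

from app import app  # assumes the Appendix I code lives in app.py

def test_index_then_login():
    client = app.test_client()
    assert client.get("/").status_code == 200            # index page renders
    resp = client.post("/login", data={"user_email": "user@example.com",
                                       "password": "secret"})
    assert resp.status_code == 200                       # routes work together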
6.3 SYSTEM TESTING
System testing ensures that the entire integrated software system meets requirements. It tests a
configuration to ensure known and predictable results. An example of system testing is the
configuration-oriented system integration test. System testing is based on process descriptions
and flows, emphasizing pre-driven process links and integration points. The following types of
testing were applied as part of system-level validation:
1. Functional testing: Functional tests provide systematic demonstrations that the functions tested
are available as specified by the business and technical requirements, system documentation, and
user manuals. Organization and preparation of functional tests focuses on requirements, key
functions, or special test cases. In addition, systematic coverage of business process flows, data
fields, predefined processes, and successive processes must be considered for testing. Before
functional testing is complete, additional tests are identified and the effective value of current
tests is determined.
2. White box testing: White box testing is testing in which the software tester has knowledge of
the inner workings, structure, and language of the software, or at least its purpose. It is used to
test areas that cannot be reached from a black-box level.
3. Black box testing: Black box testing is testing the software without any knowledge of the
inner workings, structure, or language of the module being tested. Black box tests, like most other
kinds of tests, must be written from a definitive source document, such as a specification or
requirements document. It is testing in which the software under test is treated as a black box:
you cannot "see" into it. The test provides inputs and responds to outputs without considering how
the software works.
4. Compatibility testing: Compatibility testing verifies that the system operates seamlessly
across different environments and configurations. This involves testing on various operating
systems, validating compatibility with different Python versions and dependencies, and ensuring
adaptability to changes in third-party libraries or frameworks.
5. Reliability testing: Reliability testing aims to confirm the consistent and accurate
performance of the system. It involves executing the system over an extended period to identify
memory leaks or performance degradation, simulating unexpected failures, and validating the
system's ability to consistently deliver reliable outputs.
6. Regression testing: Regression testing ensures that new changes or updates do not adversely
impact existing functionalities. By re-running previous tests after implementing modifications,
developers verify that changes do not introduce errors or compromise existing features,
maintaining the system's stability.
7. Scalability testing: Scalability testing, if applicable, evaluates the system's capacity to scale
with increased load or data volume. It involves testing performance with a growing number of
resumes in the dataset and assessing scalability under varying levels of computational resources,
such as CPU and memory. This testing ensures the system's resilience and effectiveness in
handling increased demands.
6.4 TEST CASES

TC003: PDF Text Extraction
Precondition: The system is operational with PyMuPDF for PDF text extraction, and a set of resumes in PDF format is available for testing.
Expected result: The system accurately extracts text from PDF resumes without loss or distortion.
Actual result: Text extraction from PDF resumes is successful, preserving the content.
Status: PASS

TC004: Job Recommendation
Precondition: The model is already built so that it can recommend jobs according to the skills the user has.
Expected result: Jobs based on the skills of the user are recommended.
Actual result: Jobs based on the skills of the user are recommended.
Status: PASS

TC005: End-to-End Processing
Precondition: The system is configured with the trained model and PDF text extraction functionality, and a diverse set of resumes in different formats is available for processing.
Expected result: The system processes resumes accurately, extracting relevant information and generating summaries.
Actual result: Resumes are processed successfully, and the system produces accurate summaries.
Status: PASS
CHAPTER 7
RESULTS & DISCUSSION
7.1 RESULTS
The results section provides a detailed analysis of the
performance and effectiveness of the job recommendation system across various dimensions.
This includes both quantitative measurements and qualitative assessments aimed at evaluating
different aspects of the system's functionality.
Quantitative analysis involves the measurement of specific metrics to quantify the system's
performance objectively. For instance, accuracy metrics assess the correctness of job
recommendations made by the system compared to ground truth data or user feedback. Precision
and recall metrics provide insights into the system's ability to generate relevant recommendations
while minimizing false positives and false negatives. The F1 score offers a balanced measure of
the system's precision and recall, providing a single metric to evaluate overall performance.
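These metrics can be computed directly with scikit-learn; a sketch with illustrative labels (recall that F1 = 2PR / (P + R)):

from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = [0, 1, 2, 1, 0]   # ground-truth job roles (illustrative)
y_pred = [0, 1, 1, 1, 0]   # system recommendations (illustrative)
print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred, average="macro", zero_division=0))
print("recall   :", recall_score(y_true, y_pred, average="macro", zero_division=0))
print("f1 score :", f1_score(y_true, y_pred, average="macro", zero_division=0))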
In addition to quantitative metrics, qualitative analysis delves into the subjective aspects of the
system's performance. This may involve gathering user feedback through surveys, interviews, or
usability testing sessions. Qualitative assessments aim to understand user satisfaction, perception
of recommendation relevance, ease of use, and overall utility of the system in the context of job
searching and career exploration.
Furthermore, the results section may present findings from specific use cases or scenarios to
illustrate the system's performance under different conditions. This could involve analyzing the
effectiveness of the system across various industries, job roles, or skill sets. By examining
performance in diverse contexts, the report provides a nuanced understanding of the system's
capabilities and limitations.
Overall, the results section serves as a comprehensive evaluation of the job recommendation
system, combining quantitative measurements with qualitative insights to validate its
effectiveness and inform future improvements. It offers a detailed assessment of how well the
system meets the needs of users and stakeholders, paving the way for informed conclusions and
recommendations in the report.
7.2 DISCUSSION
In the discussion section, the project report critically analyzes the results presented in the
previous section, providing insights, interpretations, and implications derived from the findings.
This section serves as a platform to reflect on the effectiveness of the job recommendation
system, address any limitations or challenges encountered during the project, and propose
recommendations for future improvements or research directions.
One key aspect of the discussion involves comparing the observed results with the initial
objectives and expectations outlined in the project's scope and objectives. This comparison helps
assess the extent to which the system has achieved its intended goals and whether any deviations
or discrepancies exist. Additionally, the discussion explores the reasons behind any observed
discrepancies and considers potential factors that may have influenced the outcomes.
Furthermore, the discussion section delves into the implications of the results for both theoretical
understanding and practical applications. It may explore how the findings contribute to existing
knowledge in the field of job recommendation systems, highlighting any novel insights or
contributions. Moreover, the discussion considers the practical implications of the results for
stakeholders, such as recruiters, job seekers, and system developers, outlining potential benefits,
challenges, and recommendations for implementation or adoption.
The discussion also provides a platform to address any limitations or constraints encountered
during the project. This may include limitations in the data used for training and evaluation,
constraints in computational resources or technology, as well as any methodological limitations
or assumptions made during the project. Acknowledging these limitations helps contextualize the
results and provides guidance for future research or development efforts.
Finally, the discussion section may conclude with recommendations for future research,
highlighting areas for further investigation or refinement of the job recommendation system.
These recommendations may include suggestions for improving system performance, addressing
identified limitations, exploring new avenues for research, or extending the application of the
system to different domains or contexts.
Overall, the discussion section synthesizes the project's findings, interprets their significance,
and offers insights and recommendations for advancing the field of job recommendation systems.
It serves as a critical reflection on the project's outcomes and provides guidance for future
endeavors in this area of research and development.
CHAPTER 8
CONCLUSION AND FUTURE ENHANCEMENT
8.1 CONCLUSION
In conclusion, this project represents a significant leap forward in the optimization of the job
application and recruitment process. By harnessing the power of advanced technologies like
Natural Language Processing (NLP) and Convolutional Neural Network (CNN) algorithms, the
system introduces a paradigm shift in how resumes are parsed and job roles are predicted. The
integration of the Resume Parsing and NLP modules marks a pivotal advancement, ensuring
meticulous extraction of skills from resumes, thereby enabling a nuanced understanding of
candidates' qualifications. This in-depth analysis is further enhanced by the CNN algorithm,
which, when applied to a meticulously curated CSV dataset, adopts a data-driven approach to
accurately forecast job roles based on historical data patterns and correlations. The culmination
of these efforts manifests in the creation of a refined Job Role Recommendation system,
characterized by its ability to furnish users with personalized and precise suggestions tailored to
their skill sets and career aspirations. Beyond merely enhancing technological capabilities within
the recruitment domain, this project aspires to redefine the dynamics of candidate-recruiter
interactions in the job market, promising not only increased efficiency but also a more seamless
and effective matching process between candidates and job opportunities.
8.2 FUTURE ENHANCEMENT
For future enhancements, several promising avenues can be explored to further elevate the
capabilities and impact of this project. Firstly, the integration of additional advanced machine
learning models, such as recurrent neural networks (RNNs) or transformer models, could
significantly enhance the system's understanding of context within job descriptions. This could
lead to more nuanced skill extraction and improved job role predictions, especially in scenarios
with complex language or evolving job requirements. Furthermore, incorporating feedback loops
from both recruiters and candidates could contribute to the creation of a more dynamic learning
system. By allowing users to provide feedback on the accuracy and relevance of job role
recommendations, the system could continuously refine its algorithms, resulting in a more
adaptive and user-centric platform. Additionally, developing mechanisms for proactive updates
based on evolving trends in the job market and industry demands would ensure that the system
remains responsive to changing dynamics. This could involve regularly updating the dataset with
new job descriptions and skill requirements to ensure that the recommendations provided remain
relevant and up-to-date. Finally, exploring opportunities for integration with emerging
technologies such as blockchain for enhanced security and transparency in job matching
processes could further enhance the overall efficacy and trustworthiness of the platform.
ANNEXURE
APPENDIX I
DATASET:
Name: Job Recommendation DataBase
Link: https://drive.google.com/drive/folders/1m53TTnpB3uEA_2ZLRXUFgNqnMUTVZHDX?usp=sharing
SOURCE CODE:
# Imports and indentation were reconstructed so that the listing is
# self-contained; the printed report omitted them.
import re
import sqlite3

import numpy as np
import pandas as pd
import spacy
from flask import Flask, flash, render_template, request
from nltk.corpus import stopwords
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from keras.models import Sequential
from keras.layers import (BatchNormalization, Conv1D, Dense, Dropout,
                          Flatten, MaxPooling1D)

app = Flask(__name__)
app.secret_key = 'jndjsahdjxasudhas-09vzx2223'
database = "new.db"

# Create the user table once at startup.
conn = sqlite3.connect(database)
cursor = conn.cursor()
cursor.execute('''
CREATE TABLE IF NOT EXISTS register (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_name TEXT, user_email TEXT, password TEXT
)
''')
conn.commit()


@app.route('/')
def index():
    return render_template('index.html')


# The decorator and header of the registration route were missing from the
# printed listing; they are reconstructed here from the surrounding context.
@app.route('/register', methods=["GET", "POST"])
def register():
    if request.method == "POST":
        user_name = request.form['user_name']
        user_email = request.form['user_email']
        password = request.form['password']
        conn = sqlite3.connect(database)
        cursor = conn.cursor()
        cursor.execute("INSERT INTO register (user_name, user_email, password) "
                       "VALUES (?, ?, ?)", (user_name, user_email, password))
        conn.commit()
        flash('Registration successful!', 'success')
        return render_template('index.html')
    return render_template('index.html')


# Module-level lists that remember the logged-in user between requests.
u = []
name = []
email = []


@app.route('/login', methods=["GET", "POST"])
def login():
    if request.method == "POST":
        conn = sqlite3.connect(database)
        cursor = conn.cursor()
        user_email = request.form['user_email']
        password = request.form['password']
        cursor.execute("SELECT * FROM register WHERE user_email=? AND password=?",
                       (user_email, password))
        user = cursor.fetchone()
        if user:
            u.append(user_email)
            name.append(user[1])
            email.append(user[2])
            return render_template('upload.html', name=user[1], email=user[2])
        else:
            return "password mismatch"
    return render_template('register.html')


nlp = spacy.load('en_core_web_sm')

# Skill vocabulary, one skill per line. The printed listing left `result`
# empty; populating it from the file is the behaviour extract_skill_1 relies on.
with open('linkedin skill', encoding='utf-8') as f:
    result = [line.strip().lower() for line in f]


def extract_skill_1(resume_text):
    nlp_text = nlp(resume_text)
    tokens = [token.text for token in nlp_text if not token.is_stop]
    skills = result
    skillset = []
    # Single-token skills.
    for i in tokens:
        if i.lower() in skills:
            skillset.append(i)
    # Multi-word skills appear as noun chunks.
    for i in nlp_text.noun_chunks:
        i = i.text.lower().strip()
        if i in skills:
            skillset.append(i)
    return [word.capitalize() for word in set([word.lower() for word in skillset])]


STOPWORDS = set(stopwords.words('english'))
# 'EEE.' in the printed listing carried a stray period that could never match
# a cleaned token; it is corrected to 'EEE'.
EDUCATION = ['CSE', 'EEE', 'ECE', 'IT', 'MCA']


def extract_education(resume_text):
    nlp_text = nlp(resume_text)
    nlp_text = [sent.text.strip() for sent in nlp_text.sents]
    edu = {}
    for index, text in enumerate(nlp_text):
        for tex in text.split():
            tex = re.sub(r'[?|$|.|!|,]', r'', tex)
            if tex.upper() in EDUCATION and tex not in STOPWORDS:
                edu[tex] = text + nlp_text[index + 1]
    education = []
    for key in edu.keys():
        year = re.search(re.compile(r'(((20|19)(\d{2})))'), edu[key])
        if year:
            education.append((key, ''.join(year[0])))
        else:
            education.append(key)
    return education


def predict(mark, skill):
    class_names = {
        0: 'Birlasoft', 1: 'Cognizant', 2: 'Hexaware Technologies',
        3: 'Infosys', 4: 'KPIT Technologies', 5: 'L&T Infotech',
        6: 'Tech Mahindra', 7: 'Wipro Technologies', 8: 'css corp', 9: 'TCS'
    }
    train_data = pd.read_csv("Book2.csv", encoding='latin-1')
    le_Skill = LabelEncoder()
    le_depart = LabelEncoder()
    le_Company = LabelEncoder()
    train_data['skill'] = le_Skill.fit_transform(train_data['Skills Known'])
    train_data['dept'] = le_depart.fit_transform(train_data['department'])
    train_data['target'] = le_Company.fit_transform(train_data['Company Placed'])
    x = train_data.drop(['Full Name', "12th Mark", "10th Mark", "dept",
                         'Company Placed', "Skills Known", "Projects Done",
                         'target', 'department',
                         "Certifications/Internships"], axis=1)
    y = train_data['target']
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2,
                                                        random_state=0)
    num_classes = 10
    y_train_processed = np.clip(y_train, 0, num_classes - 1)
    y_test_processed = np.clip(y_test, 0, num_classes - 1)
    scaler = MinMaxScaler()
    X_train_scaled = scaler.fit_transform(x_train)
    X_test_scaled = scaler.transform(x_test)
    # Conv1D expects a (samples, features, channels) tensor.
    X_train_reshaped = X_train_scaled.reshape((X_train_scaled.shape[0],
                                               X_train_scaled.shape[1], 1))
    X_test_reshaped = X_test_scaled.reshape((X_test_scaled.shape[0],
                                             X_test_scaled.shape[1], 1))
    cnn_model = Sequential()
    cnn_model.add(Conv1D(filters=64, kernel_size=5, activation='relu',
                         input_shape=(X_train_reshaped.shape[1],
                                      X_train_reshaped.shape[2]),
                         padding='same'))
    cnn_model.add(BatchNormalization())
    cnn_model.add(MaxPooling1D(pool_size=1))
    cnn_model.add(Dropout(0.5))
    cnn_model.add(Conv1D(filters=128, kernel_size=5, activation='relu',
                         padding='same'))
    cnn_model.add(BatchNormalization())
    cnn_model.add(MaxPooling1D(pool_size=2))
    cnn_model.add(Dropout(0.5))
    cnn_model.add(Flatten())
    cnn_model.add(Dense(256, activation='relu'))
    cnn_model.add(Dropout(0.5))
    cnn_model.add(Dense(num_classes, activation='softmax'))
    cnn_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
    cnn_model.fit(X_train_reshaped, y_train_processed, epochs=5, batch_size=64,
                  validation_data=(X_test_reshaped, y_test_processed), verbose=0)
    # The printed listing predicted on a raw [[mark, 0, skill]] list; the query
    # must be scaled and reshaped like the training data (this assumes the
    # feature matrix has exactly the three columns implied by that call).
    query = scaler.transform(np.array([[mark, 0, skill]], dtype=float))
    query = query.reshape((1, query.shape[1], 1))
    predicted_probs = cnn_model.predict(query)
    top_indices = np.argsort(predicted_probs[0])[::-1][:3]
    top_companies = [class_names[i] for i in top_indices]
    return top_companies
APPENDIX II
SAMPLE OUTPUT:
REFERENCES
[1] R. Kwieciński, G. Melniczak, and T. Górecki, "Comparison of Real-Time and Batch Job Recommendations," IEEE Access, vol. 11, pp. 20553-20559, 2023, doi: 10.1109/ACCESS.2023.3249356.
[2] M. Agung, Y. Watanabe, H. Weber, R. Egawa, and H. Takizawa, "Preemptive Parallel Job Scheduling for Heterogeneous Systems Supporting Urgent Computing," Journal of Parallel and Distributed Computing, vol. 45, no. 3, pp. 212-227, 2021.
[3] I. Khaouja, I. Kassou, and M. Ghogho, "A Survey on Skill Identification From Online Job Ads," IEEE Access, vol. 9, pp. 118134-118153, 2021, doi: 10.1109/ACCESS.2021.3106120.
[4] T. Ha, M. Lee, B. Yun, and B.-Y. Coh, "Job Forecasting Based on the Patent Information: A Word Embedding-Based Approach," IEEE Access, vol. 10, pp. 7223-7233, 2022, doi: 10.1109/ACCESS.2022.3141910.
[5] G. Van Dongen and D. Van Den Poel, "Influencing Factors in the Scalability of Distributed Stream Processing Jobs," IEEE Access, vol. 9, pp. 109413-109431, 2021, doi: 10.1109/ACCESS.2021.3102645.
[6] T. Danişan, E. Özcan, and T. Eren, "Personnel selection with multi-criteria decision making methods in the ready-to-wear sector," Tehnički vjesnik, vol. 29, no. 4, pp. 1339-1347, 2022.
[9] A. Malik, P. Thevisuthan, and T. De Sliva, "Artificial intelligence, employee engagement, experience, and HRM," in Strategic Human Resource Management and Employment Relations: An International Perspective. Cham, Switzerland: Springer, 2022, pp. 171-184.