A Comparative Study On Fake Job Post Prediction Using Different Data Mining Techniques

This paper proposes using different data mining techniques and machine learning classifiers like KNN, decision trees, support vector machines, naive Bayes, random forest, multilayer perceptron, and deep neural networks to predict if a job post is real or fraudulent. The researchers experimented on the Employment Scam Aegean Dataset containing 18,000 samples and found that a deep neural network achieved 98% accuracy. Existing methods are discussed that used classifiers like random forest and SVM on this dataset with accuracies up to 97.4%. The proposed system uses this same dataset but converts attributes to categorical values before classifying with machine learning to detect fake job posts.

Uploaded by

bits computers

A Comparative Study on Fake Job Post Prediction Using Different Data Mining Techniques

ABSTRACT

In recent years, advances in modern technology and social communication have made
advertising new job posts online very common. Predicting fake job posts is therefore
becoming a concern for everyone. Like many other classification tasks, fake job post
prediction poses many challenges. This paper proposes using different data mining
techniques and classification algorithms, namely KNN, decision tree, support vector
machine, naive Bayes classifier, random forest classifier, multilayer perceptron and
deep neural network (DNN), to predict whether a job post is real or fraudulent. We
experimented on the Employment Scam Aegean Dataset (EMSCAD), which contains 18000
samples. The deep neural network performs very well on this classification task. We
used three dense layers for the DNN classifier. The trained DNN classifier achieves
approximately 98% classification accuracy in predicting fraudulent job posts.
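
To make the "three dense layers" concrete, the following is a minimal sketch of a
forward pass through such a network. It is not the authors' implementation: the layer
widths (16 and 8), the ReLU and sigmoid activations, the random untrained weights and
the sample input are all assumptions chosen only for illustration.

```python
import math
import random

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """Apply one fully connected layer: activation(W x + b)."""
    outputs = []
    for row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(activation(z))
    return outputs

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, for illustration)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def predict_proba(features, layers):
    """Forward pass through three dense layers; returns P(fraudulent)."""
    (w1, b1), (w2, b2), (w3, b3) = layers
    h1 = dense(features, w1, b1, relu)
    h2 = dense(h1, w2, b2, relu)
    return dense(h2, w3, b3, sigmoid)[0]

rng = random.Random(0)
# Assumed sizes: 6 encoded input attributes, hidden widths 16 and 8, 1 output.
layers = [init_layer(6, 16, rng), init_layer(16, 8, rng), init_layer(8, 1, rng)]
sample = [0, 1, 0, 1, 2, 3]  # made-up encoded attribute values
probability = predict_proba(sample, layers)
```

A trained classifier would learn the weights from labelled job posts; here the output
is only a well-formed probability, not a meaningful prediction.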

EXISTING SYSTEM

Much research has been done on predicting whether a job post is real or fake, and a
good number of studies focus on detecting fraudulent online job advertisers. Vidros et
al. [1] identified job scammers as fake online job advertisers. They found statistics
about many real and renowned companies and enterprises that produced fake job
advertisements or vacancy posts with ill motive. They experimented on the EMSCAD
dataset using several classification algorithms such as the naive Bayes classifier,
random forest classifier, ZeroR and OneR. The random forest classifier showed the best
performance on the dataset, with 89.5% classification accuracy. They found that
logistic regression performed very poorly on the dataset. The OneR classifier
performed well when they balanced the dataset and experimented on the balanced
version. Their work tried to identify the problems in the online recruitment fraud
(ORF) model and to solve them using various dominant classifiers.
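
ZeroR and OneR are simple baseline classifiers, and a small sketch may help show what
they compute. The code below is an illustrative pure-Python version, not the setup
used by Vidros et al.; the toy rows, the two attributes and the labels are made up.

```python
from collections import Counter, defaultdict

def zero_r(labels):
    """ZeroR: always predict the most frequent class label."""
    return Counter(labels).most_common(1)[0][0]

def one_r(rows, labels):
    """OneR: choose the single attribute whose value -> majority-class rule
    makes the fewest errors on the training data."""
    best = None
    for attr in range(len(rows[0])):
        counts = defaultdict(Counter)
        for row, label in zip(rows, labels):
            counts[row[attr]][label] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[row[attr]] != label for row, label in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, attr, rule)
    return best[1], best[2]

# Made-up toy rows: (has_company_logo, has_questions); label 1 = fraudulent.
rows = [(1, 1), (1, 0), (0, 0), (1, 1), (1, 0), (0, 1)]
labels = [0, 0, 1, 0, 0, 1]

baseline = zero_r(labels)        # majority class of the toy labels
attr, rule = one_r(rows, labels)  # best single attribute and its rule
```

On an imbalanced dataset ZeroR can score a high accuracy while never flagging a
fraudulent post, which is one reason results on a balanced dataset are reported
separately.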

Alghamdi et al. [2] proposed a model to detect fraud exposure in an online recruitment
system. They experimented on the EMSCAD dataset using machine learning algorithms.
They worked on this dataset in three steps: data pre-processing, feature selection and
fraud detection using a classifier. In the pre-processing step, they removed noise and
HTML tags from the data so that the general text pattern was preserved. They applied a
feature selection technique to reduce the number of attributes effectively and
efficiently. A support vector machine was used for feature selection, and an ensemble
classifier based on random forest was used to detect fake job posts in the test data.
The random forest is a tree-structured classifier that works as an ensemble
classifier by means of majority voting. This classifier showed 97.4% classification
accuracy in detecting fake job posts.
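
Two of the steps described above, HTML tag removal and majority voting, can be
sketched in a few lines. This is a simplified illustration rather than the pipeline of
Alghamdi et al.: the regular expression is a naive tag stripper, and the example text
and the five votes are invented.

```python
import re
from collections import Counter

TAG_PATTERN = re.compile(r"<[^>]+>")

def strip_html(text):
    """Replace HTML tags with spaces and collapse repeated whitespace."""
    without_tags = TAG_PATTERN.sub(" ", text)
    return re.sub(r"\s+", " ", without_tags).strip()

def majority_vote(predictions):
    """Return the label predicted by most ensemble members."""
    return Counter(predictions).most_common(1)[0][0]

cleaned = strip_html("<p>Earn <b>$5000</b> weekly from home!</p>")
# Hypothetical votes from five trees: 1 = fraudulent, 0 = real.
decision = majority_vote([1, 0, 1, 1, 0])
```

In a random forest each vote would come from a decision tree trained on a different
bootstrap sample of the data; the voting step itself is as simple as shown.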

Huynh et al. [3] proposed using different deep neural network models, namely TextCNN,
Bi-GRU-LSTM CNN and Bi-GRU CNN, which are pre-trained on a text dataset. They worked
on classifying an IT job dataset. They trained the TextCNN model, which consists of a
convolution layer, a pooling layer and a fully connected layer, on this dataset. The
model processes the data through the convolution and pooling layers; the resulting
output is then flattened and passed to the fully connected layer. The model uses the
softmax function for classification. They also used an ensemble classifier (Bi-GRU
CNN, Bi-GRU-LSTM CNN) with majority voting to increase classification accuracy. They
obtained 66% classification accuracy using TextCNN and 70% accuracy using Bi-GRU-LSTM
CNN individually. The ensemble classifier performed best on this classification task,
with an accuracy of 72.4%.
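
The softmax function mentioned above turns the raw scores of the final layer into
class probabilities. The following is a minimal sketch; the three input scores are
made up and do not come from the cited work.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 0.5, -1.0])          # made-up scores for three classes
predicted_class = probs.index(max(probs))  # index of the most probable class
```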

Zhang et al. [4] proposed an automatic fake detector model that distinguishes between
true and fake news (including articles, creators and subjects) using text processing.
They used a custom dataset of news articles posted by the Twitter account of the
PolitiFact website. This dataset was used to train the proposed gated diffusive unit
(GDU) model. Because it receives input from multiple sources simultaneously, the
trained model performed well as an automatic fake detector.

Disadvantages

1) The existing systems are implemented with conventional machine learning.

2) The existing systems are not designed for analyzing large data sets.

PROPOSED SYSTEM

The system uses EMSCAD to detect fake job posts. This dataset contains 18000 samples,
and each row of the data has 18 attributes including the class label. The attributes
are job_id, title, location, department, salary_range, company_profile, description,
requirements, benefits, telecommuting, has_company_logo, has_questions,
employment_type, required_experience, required_education, industry, function and
fraudulent (class label). Among these 18 attributes, we have used only 7, which are
converted into categorical attributes: telecommuting, has_company_logo, has_questions,
employment_type, required_experience, required_education and fraudulent are changed
from text values into categorical values. For example, the "employment_type" values
are replaced as follows: 0 for "none", 1 for "full-time", 2 for "part-time", 3 for
"others", 4 for "contract" and 5 for "temporary". The main goal of converting these
attributes into categorical form is to classify fraudulent job advertisements without
doing any text processing or natural language processing. In this work, we have used
only those categorical attributes.
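
The conversion described above can be sketched as a simple lookup. The codes for
employment_type are taken from the text; the handling of missing values (mapped to
"none") and of unlisted values (mapped to "others") is an assumption, and the helper
names are hypothetical.

```python
# Attributes kept by the proposed system, as listed in the text.
SELECTED_ATTRIBUTES = [
    "telecommuting", "has_company_logo", "has_questions", "employment_type",
    "required_experience", "required_education", "fraudulent",
]

# Codes for employment_type as given in the text.
EMPLOYMENT_TYPE_CODES = {
    "none": 0, "full-time": 1, "part-time": 2,
    "others": 3, "contract": 4, "temporary": 5,
}

def encode_employment_type(value):
    """Map an employment_type string to its integer code.

    Assumption: missing or empty values count as "none", and any value
    not listed in the text falls back to "others".
    """
    key = (value or "none").strip().lower()
    return EMPLOYMENT_TYPE_CODES.get(key, EMPLOYMENT_TYPE_CODES["others"])

def select_attributes(row):
    """Keep only the seven attributes used by the proposed system."""
    return {name: row.get(name) for name in SELECTED_ATTRIBUTES}

# Made-up example row with a subset of the 18 EMSCAD attributes.
row = {"job_id": 1, "title": "Data Entry Clerk", "employment_type": "Full-time",
       "has_company_logo": 0, "fraudulent": 1}
selected = select_attributes(row)
selected["employment_type"] = encode_employment_type(selected["employment_type"])
```

The other selected attributes would need similar code tables, which the text does not
spell out.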

Advantages

1) The proposed system is implemented on the EMSCAD dataset and is accurate and fast.

2) The system is effective because it accurately detects fake job posts, which
otherwise make it harder for job seekers to find their preferred jobs and waste a
great deal of their time.

SYSTEM REQUIREMENTS

H/W SYSTEM CONFIGURATION:

➢ Processor - Pentium IV
➢ RAM - 4 GB (min)
➢ Hard Disk - 20 GB
➢ Keyboard - Standard Windows Keyboard
➢ Mouse - Two or Three Button Mouse
➢ Monitor - SVGA

SOFTWARE REQUIREMENTS:

➢ Operating System - Windows 7 Ultimate
➢ Coding Language - Python
➢ Front-End - Python
➢ Back-End - Django-ORM
➢ Designing - HTML, CSS, JavaScript
➢ Database - MySQL (WAMP Server)
