A Datamining Model For Detection of Fraudulent Behaviour in Water

This document describes a project that aims to develop a data mining model to detect fraudulent behavior in water usage. The project was completed by four students for their Bachelor of Technology degree in Computer Science and Engineering at Sai Spurthi Institute of Technology under the guidance of their professor. The project involved building models using support vector machine (SVM) and K-nearest neighbor (KNN) algorithms on historical customer billing data to identify suspicious water usage patterns and predict customers that should be inspected.

Uploaded by

Saikiran Mamidi

A

MAIN PROJECT ON

A DATAMINING MODEL FOR DETECTION OF FRAUDULENT

BEHAVIOUR IN WATER
SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE AWARD OF THE
DEGREE OF

BACHELOR OF TECHNOLOGY

COMPUTER SCIENCE AND ENGINEERING

JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY, HYDERABAD

SUBMITTED BY

S.SWETHA SRAVANTHI (16C51A0546)

K.MADHURI (16C51A0528)

A.SRUTHI (16C55A0504)

D.BHARATH (16C51A0514)

UNDER THE ESTEEMED GUIDANCE OF

Mr .V.V.SIVA PRASAD

Associate Professor

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

SAI SPURTHI INSTITUTE OF TECHNOLOGY

(Approved by AICTE, Affiliated to JNTU, Hyderabad, Certified by ISO 9001:2008)

(ACCREDITED BY NAAC-‘B’ Grade)

B.GANGARAM-507303, JNTU-HYDERABAD, TS, 2019-2020

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

CERTIFICATE
This is to certify that the main project entitled “A DATAMINING MODEL FOR
DETECTION OF FRAUDULENT BEHAVIOUR IN WATER” is a bona fide work done
by S.SWETHA SRAVANTHI (16C51A0546), K.MADHURI (16C51A0524), A.SRUTHI
(16C55A0504), D.BHARATH (16C51A0514) under the guidance and supervision of Mr
V.V.SIVA PRASAD, Assoc. Professor in the CSE Department at SAI SPURTHI INSTITUTE
OF TECHNOLOGY, in partial fulfillment of the Bachelor of Technology in Computer
Science and Engineering from JNTU-Hyderabad during the year 2019-2020.

Project supervisor Head of the Department

Mr V.V.SIVA PRASAD Mr N.VENKATESWARA RAO

Associate Professor Associate Professor

Dr.CH. VIJAYA KUMAR EXTERNAL EXAMINER

PRINCIPAL
ACKNOWLEDGEMENT

We express our sincere thanks to our supervisor Mr V.V.SIVA PRASAD,
Associate Professor in the CSE Department, for his moral support, kind attention
and valuable guidance throughout this project work.

It is our privilege to thank Mr N. VENKATESWARA RAO, Head of the CSE

Department for his encouragement during the progress of this project work.

It is our privilege to thank all Project Review Committee members for allowing
us to do this project and providing us all the facilities to do our project.

We derive great pleasure in expressing our sincere gratitude to our principal

Dr.CH. VIJAYA KUMAR for his timely suggestions, which helped us to complete
this work successfully.

We thank both the teaching and non-teaching staff members of the CSE department
for their kind cooperation and all their help in completing this project work successfully.

In all sincerity,

K.KRISHNASRI (16C51A0523)

M.SPOORTHI (16C51A0528)

S.VAMSI KRISHNA (17C55A0507)

D.MANOJ KUMAR (16C51A0515)

ABSTRACT

Data mining is a powerful tool widely used by organizations to enhance their
businesses and gain an advantage over their competitors. The data
mining process helps in extracting and analysing data patterns,
information and trends from large databases, and various data mining techniques are
available for this purpose. These techniques are used
in a variety of applications, one of which is the detection and prevention of
different types of fraud. Although there is existing research on
data mining techniques that can be used to detect and identify different
types of fraud, little research brings together the various facets of fraud
detection that use these techniques. This research explores the use of two
classification techniques (SVM and KNN) to detect water
customers suspected of fraud. The SVM-based approach uses customer load profile attributes to
expose abnormal behaviour that is known to be correlated with non-technical
loss activities. The data was collected from the historical records of the
company billing system. To deploy the model, a decision tool was built
from the generated model. The system will help the company to predict
suspicious water customers to be inspected on site.

CONTENTS

S.NO Topic Name

1. Introduction
1.1. Data science
1.2. Machine Learning
1.3. Project Introduction
2. Installations
2.1. Anaconda
2.2. Integrated Development Environment (IDE)
3. Python Libraries
3.1. NumPy
3.2. Pandas
3.3. Matplotlib
3.4. Scikit-learn
4. System Specifications
4.1. Hardware Requirements
4.2. Software Requirements
5. Tools & Technologies
5.1. Spyder
5.2. Python
5.3. Linear Regression
5.4. Support Vector Machine (SVM)
5.5. K-Nearest Neighbour (KNN)
6. Data Flow Diagram
7. Sample Code
7.1. Sample Code
7.2. Using SVM
7.3. Using KNN
8. Screenshots
8.1. Code
8.2. Datasets
8.3. Outputs
9. Conclusion
10. References

SCREENSHOTS

S.NO Figure Name

1. 8.1. Code
2. 8.2. Datasets
3. 8.3. Outputs

A DATAMINING MODEL FOR DETECTION OF
FRAUDULENT BEHAVIOUR IN WATER INTRODUCTION

1. INTRODUCTION
1.1. Data science
Data science is the process of deriving knowledge and
insights from a huge and diverse set of data through
organizing, processing and analyzing the data. It involves
many different disciplines, such as mathematical and
statistical modeling, extracting data from its source and
applying data visualization techniques. Often it also
involves handling big data technologies to gather both
structured and unstructured data.

Below are some example scenarios where data science is used.

• Recommendation systems: Create models that predict a shopper's needs and show the
products the shopper is most likely to buy.
• Financial risk management: The financial risk involved in loans and credit is better
analysed using the customer's past spending habits, past defaults and other financial
commitments. The outcome is to minimize loss for the financial organization by avoiding
bad debt.
• Improvement in health care services: The health care industry deals with a variety of
data, which can be classified into technical data, financial data, patient information,
drug information and legal rules. All this data needs to be analysed to produce insights
that will save costs for both the health care provider and the care receiver.
• Computer vision: Advances in image recognition by computers involve processing large
sets of image data from multiple objects of the same category, for example in face
recognition.

Python in Data Science:

The programming requirements of data science demand a versatile yet flexible language
that is simple to write but can handle highly complex mathematical processing. Python is
well suited to these requirements, as it has already established itself as a language for
both general computing and scientific computing. Moreover, it is continuously being
extended through new additions to its plethora of libraries aimed at different
programming requirements.

1.2. Machine learning

Machine learning is a discipline that deals with programming the systems so as to make them
automatically learn and improve with experience. Here, learning implies recognizing and
understanding the input data and taking informed decisions based on the supplied data. It is
very difficult to consider all the decisions based on all possible inputs.

To solve this problem, algorithms are developed that build knowledge from specific data
and past experience by applying the principles of statistics, probability, logic,
mathematical optimization, reinforcement learning, and control theory.

For example, machine learning programs can scan and process huge databases detecting
patterns that are beyond the scope of human perception.

Applications of Machine Learning

The developed machine learning algorithms are used in various applications such as

• Vision processing
• Language processing
• Forecasting things like stock market trends and weather
• Pattern recognition
• Games
• Data mining
• Expert systems
• Robotics

Types of machine learning algorithms

• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning

Supervised Learning:

Supervised learning involves building a machine learning model that is based on labeled
samples. Learning data comes with description, labels, targets or desired outputs and the
objective is to find a general rule that maps inputs to outputs. This kind of learning data is
called labeled data.

For example, if we build a system to estimate the price of a plot of land or a house based on
various features, such as size, location, and so on, we first need to create a database and label
it. We need to teach the algorithm what features correspond to what prices. Based on this
data, the algorithm will learn how to calculate the price of real estate using the values of the
input features.

Supervised learning can be further classified into two types: regression and classification.

Regression trains on and predicts a continuous-valued response, for example predicting real
estate prices.

Regression algorithms:

• Linear regression
• Polynomial regression
• Stepwise regression etc.

Classification attempts to find the appropriate class label, such as analyzing
positive/negative sentiment, male and female persons, benign and malignant tumors, secure
and unsecure loans etc.

Classification algorithms:

• Decision tree algorithms
• K Nearest Neighbor algorithms
• Support Vector Machine algorithms
• Naïve Bayes algorithms
• Logistic regression (despite its name, it predicts class labels) etc.


Unsupervised learning:
Unsupervised learning works without labelled data. When learning data contains only some
indications without any description or labels, it is up to the coder or to the algorithm to find
the structure of the underlying data, to discover hidden patterns, or to determine how to
describe the data. This kind of learning data is called unlabeled data.

Unsupervised learning algorithms are extremely powerful tools for analyzing data and for
identifying patterns and trends. They are most commonly used for clustering similar input
into logical groups. Unsupervised learning algorithms include

Clustering algorithms

• K-means
• Hierarchical clustering etc.

Dimensionality reduction algorithms

• PCA (Principal Component Analysis).
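
As an illustration of clustering, the following is a minimal sketch of the k-means idea on one-dimensional data. The function name `kmeans_1d`, the sample consumption values and the starting centres are hypothetical and chosen only for illustration; in practice a library implementation such as scikit-learn's `KMeans` would normally be used.

```python
def kmeans_1d(values, centres, iterations=10):
    """Cluster 1-D values around the given starting centres (Lloyd's algorithm)."""
    centres = list(centres)
    for _ in range(iterations):
        # Assignment step: attach every value to its nearest centre.
        groups = [[] for _ in centres]
        for v in values:
            nearest = min(range(len(centres)), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Update step: move each centre to the mean of its group.
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres, groups


# Hypothetical monthly water consumption values (no labels are supplied).
consumption = [10, 12, 11, 48, 50, 52]
centres, groups = kmeans_1d(consumption, centres=[0, 100])
print(centres)   # [11.0, 50.0]
print(groups)    # [[10, 12, 11], [48, 50, 52]]
```

No labels are given to the function; the low and high consumption groups emerge from the distances alone, which is what distinguishes this from the supervised methods above.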

Reinforcement Learning

Here the learning data gives feedback so that the system adjusts to dynamic conditions in order
to achieve a certain objective. The system evaluates its performance based on the feedback
responses and reacts accordingly. The best known instances include self-driving cars and
AlphaGo, the program that plays the board game Go at master level.

1.3. Project Introduction

Water is an essential resource for households, industry, and agriculture.
Fraudulent behavior in drinking water consumption is a significant problem facing water
supply companies and agencies. This behavior results in a massive loss of income and
forms the highest percentage of non-technical loss. Finding efficient measures for
detecting fraudulent activities has been an active research area in recent years.

Intelligent data mining techniques can help water supply
companies detect these fraudulent activities and reduce such losses. This research explores the
use of two classification techniques (SVM and KNN) to detect water
customers suspected of fraud. The SVM-based approach uses customer load profile attributes to
expose abnormal behavior that is known to be correlated with non-technical loss activities.
The data was collected from historical records. The system will help the company to predict
suspicious water customers to be inspected on site.

To carry out a data science project, we need to know some Python libraries such as:

• NumPy
• Pandas
• Scikit-learn
• Matplotlib

and IDEs such as:

• Jupyter
• Spyder

2. INSTALLATIONS
2.1 ANACONDA:
Anaconda is a package manager, an environment manager,
and a Python distribution that contains a collection of many open source packages. This is
advantageous because a data science project typically needs
many different packages (NumPy, Scikit-learn, SciPy and pandas, to name a few), which come
preinstalled with Anaconda.

Download and Install Anaconda:

1. Go to the Anaconda Website and choose a Python 3.x graphical installer (A) or a Python
2.x graphical installer (B). If you aren't sure which Python version you want to install, choose
Python 3. Do not choose both.

2. Locate your download and double-click it. The installer then starts. When the setup
screen appears, click Next.

3. Read the license agreement and click I Agree.

4. Click Next.

5. Note your installation location and then click Next.

6. This is an important part of the installation process. The recommended approach is to not
check the box that adds Anaconda to your PATH. This means you will have to use Anaconda
Navigator or the Anaconda Prompt whenever you wish to use Anaconda. If you want
to be able to use Anaconda from your regular command prompt, use the alternative approach and
check the box.

7. Click Next.

8. Click Next.

9. Click Finish.

Anaconda provides various IDEs such as Jupyter and Spyder, which you can launch and use.
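
Once the installation has finished, one way to confirm that the libraries used in this project are available is to run a short check from Spyder or Jupyter. The sketch below uses only the Python standard library; the helper name `check_packages` is hypothetical, and the list of import names is an assumption based on the libraries named in this report.

```python
import importlib.util
import sys


def check_packages(names):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Import names of the libraries this project relies on.
required = ["numpy", "pandas", "matplotlib", "sklearn"]
print("Python version:", sys.version.split()[0])
for name, found in check_packages(required).items():
    print(name, "found" if found else "MISSING")
```

Note that Scikit-learn is imported under the name `sklearn`, which is why that name appears in the list.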

2.2. Integrated Development Environment (IDE):

Jupyter:

• The Jupyter Notebook is a powerful tool for interactively developing and
presenting data science projects.
• A notebook integrates code and its output into a single document that combines
visualisations, narrative text, mathematical equations, and other rich media.
• Many different programming languages can be used within Jupyter Notebooks;
this project uses Python, which is the most common use case.

Spyder:

• Spyder is an open source, cross-platform IDE designed for scientific programming and
data science.
• Spyder integrates essential data science libraries such as
IPython, SciPy, Matplotlib and NumPy.
• Spyder has features such as code completion, an editor with syntax highlighting, and
a variable explorer whose values you can edit using a GUI.
• It includes an online help browser that lets users search and view Python and package
documentation inside the IDE.

3. PYTHON LIBRARIES

Libraries:

3.1 NumPy:

• NumPy is an open source extension module for Python.
• It makes it very easy to work with large multidimensional arrays and matrices.
• Another advantage of NumPy is that you can apply standard mathematical operations
to an entire data set without having to write loops.
• Even though NumPy does not provide powerful data analysis functionality by itself,
understanding NumPy arrays and array-oriented computing will help you use other
Python data analysis tools more effectively.

3.2 Pandas:

• Pandas is a Python module that contains high-level data structures and tools designed
for fast and easy data analysis operations.
• Pandas is built on NumPy, which makes it easy to use in NumPy-centric applications.
• It is also easy to handle missing data using Pandas, which makes it a strong tool for
data munging.

3.3 Matplotlib:

• Matplotlib is a Python module for visualization.
• Matplotlib allows you to quickly make line graphs, pie charts, histograms and other
professional grade figures.
• Using Matplotlib, you can customise every aspect of a figure.
• Matplotlib has interactive features like zooming and panning.

3.4 Scikit-Learn:

• Scikit-Learn is a Python package for machine learning.
• It provides a set of common machine learning algorithms to users through a consistent
interface.

4. System Specifications

4.1 Hardware Requirements:

• Processor : i5 or higher
• Processor Speed : minimum 1.1 GHz
• Hard Disk : maximum 100 GB
• Input Devices : Keyboard, Mouse
• RAM : 8 GB or higher

4.2 Software Requirements:

• Operating System : Windows 10
• Coding Language : Python
• Libraries : NumPy, Pandas, Matplotlib, Scikit-learn
• Tools : Jupyter, Spyder
• Dataset : water.csv

5. TOOLS AND TECHNOLOGIES

Tools : Spyder / Jupyter Notebook

Programming Language : Python
Algorithms : Linear Regression, Support Vector Machine (SVM),
K-Nearest Neighbors (KNN).
5.1 Spyder:
Spyder is an open source cross-platform integrated development environment (IDE) for
scientific programming in the Python language. Spyder integrates with a number of prominent
packages in the scientific Python stack
including NumPy, SciPy, Matplotlib, pandas, IPython, SymPy and Cython, as well as other open source
software. It is released under the MIT license. Initially created and developed by Pierre Raybaut in
2009, since 2012 Spyder has been maintained and continuously improved by a team of scientific Python
developers and the community.
Spyder is extensible with first- and third-party plugins, includes support for interactive
tools for data inspection and embeds Python-specific code quality assurance and introspection
instruments, such as Pyflakes, Pylint and Rope. It is available cross-platform through Anaconda, on
Windows, on macOS through MacPorts, and on major Linux distributions such as Arch
Linux, Debian, Fedora, Gentoo Linux, openSUSE and Ubuntu.
Spyder uses Qt for its GUI, and is designed to use either of the PyQt or PySide Python
bindings. QtPy, a thin abstraction layer developed by the Spyder project and later adopted by multiple
other packages, provides the flexibility to use either backend.

Features:

• An editor with syntax highlighting, introspection and code completion
• Support for multiple IPython consoles
• The ability to explore and edit variables from a GUI

5.2 Python:
Python is an interpreted, high-level, general-purpose programming language. It was created by
Guido van Rossum and first released in 1991. Its design philosophy emphasizes code readability
through its notable use of significant whitespace. Its language constructs and object-oriented
approach aim to help programmers write clear, logical code for small and large-scale projects.
Python is dynamically typed and supports multiple programming paradigms,
including procedural, object-oriented, and functional programming. Python is often described as a
"batteries included" language due to its comprehensive standard library.
Python was conceived in the late 1980s as a successor to the ABC language. Python 2.0,
released in 2000, introduced features like list comprehensions and a garbage collection system capable
of collecting reference cycles. Python 3.0, released in 2008, was a major revision of the language that is
not completely backward-compatible, and much Python 2 code does not run unmodified on Python 3.

The Python 2 language, i.e. Python 2.7.x, was officially discontinued on 1 January 2020 (first
planned for 2015) after which security patches and other improvements will not be released for it. With
Python 2's end-of-life, only Python 3.5.x and later are supported.
Python interpreters are available for many operating systems. A global community of
programmers develops and maintains CPython, an open source reference implementation. A non-profit
organization, the Python Software Foundation, manages and directs resources for Python and CPython
development.

Features:
• Python is a multi-paradigm programming language.
• Object-oriented programming and structured programming are fully supported.
• Many of its features support functional programming and aspect-oriented programming.

Algorithms:
• Linear Regression
• Support Vector Machine (SVM)
• K-Nearest Neighbors (KNN)

5.3 Linear Regression:

Linear regression is a linear approach to modelling the relationship
between a scalar dependent variable y and one or more independent variables denoted X.
The case of a single independent variable is called simple linear regression. In linear
regression, the relationships are modeled using linear predictor functions whose unknown
model parameters are estimated from the data. Such models are called linear models.
Linear regression was the first type of regression analysis to be studied
rigorously, and to be used extensively in practical applications. This is because models which depend
linearly on their unknown parameters are easier to fit than models which are non-linearly related to their
parameters and because the statistical properties of the resulting estimators are easier to determine.
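
To make the parameter estimation concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares in plain Python. The function name `fit_simple_linear_regression` and the data points are hypothetical; the project code itself relies on scikit-learn's `LinearRegression` rather than on this hand-written version.

```python
def fit_simple_linear_regression(xs, ys):
    """Estimate slope and intercept of y = slope * x + intercept by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on the line y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_simple_linear_regression(xs, ys)
print(slope, intercept)        # 2.0 1.0
print(slope * 5 + intercept)   # prediction for x = 5: 11.0
```

Because the model is linear in its two unknown parameters, both can be computed directly from sums over the data, with no iterative search.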

Fig: Linear Regression

Fig: Non Linear Regression

5.4 Support Vector Machine(SVM):
In machine learning, support-vector machines (SVMs, also support-vector
networks) are supervised learning models with associated learning algorithms that analyze
data used for classification and regression analysis. Given a set of training examples, each
marked as belonging to one or the other of two categories, an SVM training algorithm builds a
model that assigns new examples to one category or the other, making it a non-probabilistic
binary linear classifier (although methods such as Platt scaling exist to use SVM in a
probabilistic classification setting).
An SVM model is a representation of the examples as points in space,
mapped so that the examples of the separate categories are divided by a clear gap that is as
wide as possible. New examples are then mapped into that same space and predicted to
belong to a category based on the side of the gap on which they fall.
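
The idea of classifying by the side of the gap can be sketched for the linear case as follows. The weight vector `w` and offset `b` below are hypothetical values picked by hand, not learned from data, and the margin formula 2 / ||w|| assumes the usual scaling in which the closest training points satisfy |w.x + b| = 1.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Assign class +1 or -1 according to the side of the separating hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Width of the gap between the two classes: 2 / ||w||."""
    return 2 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical hyperplane 3*x1 + 4*x2 - 12 = 0 (not learned from any data).
w, b = [3.0, 4.0], -12.0
print(classify(w, b, [4.0, 4.0]))   # 3*4 + 4*4 - 12 = 16, so class 1
print(classify(w, b, [1.0, 1.0]))   # 3 + 4 - 12 = -5, so class -1
print(margin_width(w))              # 2 / 5 = 0.4
```

Training an SVM amounts to choosing `w` and `b` so that this margin is as wide as possible while the training examples stay on their correct sides.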

Advantages:

Support vector machines are among the most widely used classification algorithms. Their
applications include the following:

• SVMs are helpful in text and hypertext categorization, as their application can significantly
reduce the need for labeled training instances in both the standard inductive and transductive
settings.
• Classification of images can also be performed using SVMs. Experimental results show that
SVMs achieve significantly higher search accuracy than traditional query refinement schemes
after just three to four rounds of relevance feedback.
• This is also true of image segmentation systems, including those using a modified version of
SVM.

5.5 K-Nearest Neighbors (KNN):

K Nearest Neighbor (KNN) is a simple, easy-to-understand and versatile
algorithm, and one of the most widely used in machine learning. KNN is used in a variety of
applications such as finance, healthcare, political science, handwriting detection, image
recognition and video recognition. In credit rating, for example, financial institutions use it
to predict the credit rating of customers.
KNN is a non-parametric, lazy learning algorithm. Non-parametric means
that it makes no assumption about the underlying data distribution. Lazy means that it does not
build a model from the training data in advance: all the training data is kept and used at
prediction time. This makes training fast but makes the testing phase slower and costlier in
both time and memory. In the worst case, KNN must scan every training point for each
prediction, and all the training data must be held in memory.

KNN makes predictions using the training dataset directly. In KNN, K is the
number of nearest neighbors. The number of neighbors is the core deciding factor. Predictions are made
for a new instance (x) by searching through the entire training set for the K most similar instances (the
neighbors) and summarizing the output variable for those K instances. For regression this might be the
mean output variable, in classification this might be the mode (or most common) class value.

To determine which of the K instances in the training dataset are most similar to a new
input a distance measure is used. For real-valued input variables, the most popular distance measure is
Euclidean distance. This is calculated as the square root of the sum of the squared differences between a
new point (x) and an existing point (xi) across all input attributes j.

Euclidean Distance(x, xi) = sqrt( sum over j of (xj - xij)^2 )
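
Putting the pieces together, a KNN classifier can be sketched in a few lines of plain Python. The function name `knn_predict` and the sample customers are hypothetical; the attribute names (speed, time, users) follow the columns used in the sample code of this report. The project itself uses scikit-learn's `KNeighborsClassifier`, so this is only an illustration of the idea.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, new_point, k=3):
    """Classify new_point by majority vote among its k nearest training points."""
    # Euclidean distance from the new point to every training point.
    distances = [
        (math.dist(point, new_point), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest points and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical customers described by (speed, time, users); 1 marks fraud.
points = [(1, 1, 1), (2, 1, 2), (1, 2, 1), (8, 9, 8), (9, 8, 9), (8, 8, 9)]
labels = [0, 0, 0, 1, 1, 1]
print(knn_predict(points, labels, (2, 2, 2)))   # 0
print(knn_predict(points, labels, (9, 9, 9)))   # 1
```

There is no separate training step: every call scans the whole training set, which is the cost of lazy learning described above.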

Other popular distance measures include:

• Hamming Distance: Calculate the distance between binary vectors.
• Manhattan Distance: Calculate the distance between real vectors using the sum of
their absolute differences. Also called City Block Distance.
• Minkowski Distance: Generalization of Euclidean and Manhattan distance.
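
The distance measures listed above can be written as short Python functions. The function names are illustrative only. The sketch makes the generalization explicit by defining Manhattan and Euclidean distance as the Minkowski distance of order 1 and 2 respectively.

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def minkowski(a, b, p):
    """Minkowski distance of order p between two real vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def manhattan(a, b):
    """Sum of absolute differences (Minkowski distance with p = 1)."""
    return minkowski(a, b, 1)


def euclidean(a, b):
    """Straight-line distance (Minkowski distance with p = 2)."""
    return minkowski(a, b, 2)


print(hamming([1, 0, 1, 1], [1, 1, 0, 1]))   # 2
print(manhattan([0, 0], [3, 4]))             # 7.0
print(euclidean([0, 0], [3, 4]))             # 5.0
```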

6. DATA FLOW DIAGRAM

INPUT

SELECT PROCESS
(LR, SVM, KNN)

TRAIN

PREDICT

OUTPUT

7.SAMPLE CODE
7.1 Sample code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LinearRegression

def predict(file, impacts, outcome, inps):
    # Fit a linear regression on the chosen columns and score one new record.
    data = pd.read_csv(file)
    X = data[impacts]
    Y = data[outcome]
    linear_regressor = LinearRegression()
    linear_regressor.fit(X, Y)
    # predict() expects a 2-D input: one row per record.
    nx = [inps]
    pred = linear_regressor.predict(nx)
    return pred

# Read the three input attributes of the customer to be checked.
speed = int(input("Enter speed:"))
time = int(input("Enter time:"))
users = int(input("Enter users:"))
print("The fraud has happened: ", predict('water.csv', ["speed", "time", "users"], "fraud",
                                          [speed, time, users]))
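
Note that a linear regression returns a continuous score rather than a class label. If a yes/no decision is needed, one option is to apply a cut-off to the score, as sketched below. The helper name `to_fraud_flag`, the 0.5 threshold and the assumption that the fraud column is coded as 0 and 1 are all illustrative; none of them is stated in the project code.

```python
def to_fraud_flag(score, threshold=0.5):
    """Convert a continuous regression score into a 0/1 fraud flag."""
    return 1 if score >= threshold else 0


# Hypothetical regression outputs for three customers.
scores = [0.12, 0.5, 0.93]
print([to_fraud_flag(s) for s in scores])   # [0, 1, 1]
```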

7.2 Using SVM Technique:

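# Reuses the imports and the speed, time and users inputs from section 7.1.
data = pd.read_csv('water.csv')
X = data[["speed", "time", "users"]]
Y = data["fraud"]
# Linear-kernel SVM; gamma is omitted because the linear kernel does not use it.
clf = svm.SVC(kernel='linear')
clf.fit(X, Y)
prediction = clf.predict([[speed, time, users]])
print("SVC: ", prediction[0])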

7.3 Using KNN Technique:

data = pd.read_csv('water.csv')
X = data[["speed", "time", "users"]]
Y = data["fraud"]
# Default KNeighborsClassifier settings: the 5 nearest neighbours vote on the class.
knn = KNeighborsClassifier()
knn.fit(X, Y)
X_test = [[speed, time, users]]
prediction = knn.predict(X_test)
print("KNN: ", prediction[0])

Visual Representation:
# Scatter plots of each input attribute against the fraud label.
plt.scatter(X["speed"], Y, color='r')
plt.xlabel('Speed')
plt.ylabel('Fraud')
plt.show()
plt.scatter(X["time"], Y, color='g')
plt.xlabel('Time')
plt.ylabel('Fraud')
plt.show()
plt.scatter(X["users"], Y, color='b')
plt.xlabel('Users')
plt.ylabel('Fraud')
plt.show()

8. SCREENSHOTS

8.1 Code:

8.2 Datasets:

8.3 Outputs:

9. CONCLUSION

In this research, we applied data mining classification techniques to
detect fraudulent behaviour in water consumption. We used SVM and KNN classifiers to build
classification models for detecting suspected fraud. The models were built using the
customers’ historical metered consumption data.

Considerable effort and time went into pre-processing and formatting the data to fit
the SVM and KNN classifiers. The conducted experiments showed that
Support Vector Machines (SVM) and K-Nearest Neighbours (KNN) both performed well,
with overall accuracy of around 70% for both. The model hit rate is 60%-70% which is
apparently better
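
For reference, the two figures quoted above can be computed as in the following sketch. The report does not define hit rate, so the code assumes it means the share of customers flagged as fraudulent that are truly fraudulent; that definition, the function names and the sample labels are all assumptions made for illustration and are not results of this project.

```python
def accuracy(actual, predicted):
    """Share of all customers whose label was predicted correctly."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def hit_rate(actual, predicted):
    """Share of customers flagged as fraud (1) that are truly fraudulent."""
    flagged = [a for a, p in zip(actual, predicted) if p == 1]
    return sum(flagged) / len(flagged) if flagged else 0.0


# Hypothetical labels for ten inspected customers (1 = fraud, 0 = normal).
actual    = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1, 0, 1, 0, 1, 0]
print(accuracy(actual, predicted))   # 0.6
print(hit_rate(actual, predicted))   # 0.6
```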

10. REFERENCES

1. Approach to Detection of Tampering in Water Meters”, In Procedia Computer
Science, 2015, 60: pp 413-421.
2. Juan Ignacio, Carlos Leon, “Real Application on Nontechnical losses detection”, The
2011 World Congress in Computer Science, Computer Engineering, and Applied
Computing (WORLDCOMP 11), Volume: The 2011 International Conference on
Data Mining.
3. N/A, “Jordan Water Sector Facts & Figures, Ministry of Water and Irrigation of
Jordan”. Technical Report. 2015.
4. N/A, “Water Reallocation Policy, Ministry of Water and Irrigation of Jordan”.
Technical Report. 2016.
5. B. Coma-Puig, J. Carmona, R. Gavaldà, S. Alcoverro, and V. Martin, “Fraud
detection in energy consumption: a supervised approach”. In Proc. IEEE Intl. Conf.
on DSAA, 2016, pp. 120-129.

