Final Yr4 Report

credit fraud

Uploaded by

Divyansh Sharma
Credit Card Fraud Detection

A Report Submitted in partial fulfilment for the award of the Degree of Bachelor of Technology in the Department of Computer Science & Engineering (with specialization in Computer Science & Engineering)

Supervisor: Ms. Mayuree Katara, Assistant Professor (Dept. of CSE), Vivekananda Institute of Technology

Submitted By:
Rahul Meena (20EVJCS081)
Siddharth Joshi (20EVJCS102)
Utkarsh Sharma (20EVJCS105)
Vikas Rajoria (20EVJCS108)

Department of Computer Science & Engineering
Vivekananda Institute of Technology, Jaipur
Rajasthan Technical University, Kota
May 2024

CANDIDATE'S DECLARATION

I hereby declare that the work presented in the Project entitled "Credit Card Fraud Detection", in partial fulfilment for the award of the Degree of "Bachelor of Technology" in the Department of Computer Science and Engineering with Specialization in Computer Science and Engineering, and submitted to the Department of Computer Science and Engineering, Vivekananda Institute of Technology, Rajasthan Technical University, is a record of my own investigations carried out under the guidance of Ms. Mayuree Katara, Vivekananda Institute of Technology. I have not submitted the matter of this Project report anywhere for the award of any other Degree.

Vikas Rajoria, Enrolment No: 20EVJCS108
Rahul Meena, Enrolment No: 20EVJCS081
Siddharth Joshi, Enrolment No: 20EVJCS102
Utkarsh Sharma, Enrolment No: 20EVJCS105
Computer Science and Engineering, Vivekananda Institute of Technology

Countersigned by Ms. Mayuree Katara, Assistant Professor, Department of Computer Science and Engineering, Vivekananda Institute of Technology

ACKNOWLEDGEMENT

I would like to convey my profound sense of reverence and admiration to my mentor and supervisor Ms. Mayuree Katara, Department of Computer Science and Engineering, Vivekananda Institute of Technology, for her intense concern, attention, priceless direction, guidance and encouragement throughout this research work. I wish to express my sincere gratitude to Er. Onkar Bagaria, Director, Vivekananda Institute of Technology; Dr. Dhiraj Singh, Principal, Vivekananda Institute of Technology; and Dr. Asif Iqbal, Professor, Department of Computer Science and Engineering, Vivekananda Institute of Technology, for their incessant motivation and support during my work. My special heartfelt gratitude goes to Ms. Mayuree Katara, Project Coordinator, Vivekananda Institute of Technology, for her unvarying support, guidance and motivation during the course of this research. I would like to take the opportunity of expressing my thanks to all other talented faculty members of the Department of Computer Science and Engineering, former or current staff of Vivekananda Institute of Technology, for their excellent knowledge, understanding and inspiration throughout the course. I am deeply thankful to my parents and all other family members for their blessings and inspiration.
Date: 14-05-2024
Place: Jaipur

Vikas Rajoria, Enrolment No: 20EVJCS108
Rahul Meena, Enrolment No: 20EVJCS081
Siddharth Joshi, Enrolment No: 20EVJCS102
Utkarsh Sharma, Enrolment No: 20EVJCS105
B.Tech VIII Semester, Computer Science and Engineering Branch, Vivekananda Institute of Technology, Jaipur

CONTENTS

CANDIDATE'S DECLARATION
ACKNOWLEDGEMENT
LIST OF FIGURES
LIST OF TABLES
LIST OF ACRONYMS
ABSTRACT
CHAPTER 1 - INTRODUCTION
1.1 Overview
1.2 Problem Statement
1.3 Significance and Relevance of Work
1.4 Objectives
1.5 Methodology
1.6 Organization of the Report
CHAPTER 2 - LITERATURE SURVEY
CHAPTER 3 - SYSTEM REQUIREMENTS AND SPECIFICATION
3.1 System Requirement Specification
3.2 Hardware Specification
3.3 Software Specification
3.4 Functional Requirements
3.5 Non-Functional Requirements
3.6 Performance Requirement
CHAPTER 4 - SYSTEM ANALYSIS
4.1 Existing System
4.1.1 Limitations
4.2 Proposed System
4.2.1 Advantages
CHAPTER 5 - SYSTEM DESIGN
5.1 Project Modules
5.2 Activity Diagram
5.3 Use Case Diagram
5.4 Sequence Diagram
5.5 Data Flow Diagram
CHAPTER 6 - IMPLEMENTATION
6.1 Algorithm/Pseudo code module wise
CHAPTER 7 - TESTING
7.1 Unit Testing
7.2 Validation Testing
7.3 Functional Testing
7.4 Integration Testing
7.5 User Acceptance Testing
CHAPTER 8 - PERFORMANCE ANALYSIS
CHAPTER 9 - CONCLUSION & FUTURE ENHANCEMENT
CHAPTER 10 - BIBLIOGRAPHY
CHAPTER 11 - APPENDIX
Appendix A: Screen Shots
Appendix B: Abbreviations

LIST OF FIGURES

1. Fraud and non-fraud
2. SVM Representation
3. Simplified Random Forest
4. Decision Tree Algorithm
5. Figure 5.1 System Architecture
6. Figure 5.2 Activity Diagram
7. Figure 5.3 Use Case Diagram
8. Figure 5.4 Sequence Diagram
9. Figure 5.5 Data Flow Diagram
10. Figure 8.1 Dataset Analysis
11. Figure 1 Correlation Matrix
12. Figure 2 Dataset
13. Figure 3 Dataset Reading Code
14. Figure 4 Confusion Matrix

CHAPTER 1 - INTRODUCTION

1.1 Overview

The credit card is the most popular mode of payment. As the number of credit card users rises worldwide, identity theft and fraud are also increasing. For a virtual (card-not-present) purchase, only the card information is required: card number, expiration date, secure code, and so on. Such purchases are normally made over the Internet or by telephone, so to commit fraud in these types of purchases, a person simply needs to know the card details. Since online purchases are mostly paid for by credit card, the card details must be kept private and must not be leaked. Credit card details can be stolen through phishing websites, stolen or lost cards, counterfeit cards, theft of card details, intercepted cards, etc., and for security these must all be guarded against. In online fraud, the transaction is made remotely and only the card's details are needed. A simple way to detect this type of fraud is to analyze the spending patterns on every card and to flag any variation from the "usual" pattern; fraud detection by analyzing the existing purchase data of the cardholder is the best way to reduce the rate of successful credit card frauds. Because real transaction data sets are rarely available and results are seldom disclosed to the public, fraud cases must be detected from the available data sets, namely the logged data and user behavior.
At present, fraud detection is implemented by a number of methods, such as data mining, statistics, and artificial intelligence.

1.2 Problem Statement

The cardholder faces a lot of trouble before an investigation finishes. Because every transaction is maintained in a log, huge amounts of data must be stored; and since many purchases are now made online, we do not know who is using the card, only the IP address captured for verification purposes. Help from cyber-crime investigators is therefore needed to investigate a fraud.

1.3 Significance and Relevance of Work

Relevance of work includes consideration of all the possible ways to provide a solution to the given problem. The proposed solution should satisfy all the user requirements and should be flexible enough that future changes can easily be made based on upcoming requirements, such as new machine learning techniques. There are two important categories of machine learning techniques for identifying frauds in credit card transactions: supervised and unsupervised learning models. In the supervised approach, earlier credit card transactions are labelled as genuine or fraudulent; the scheme then identifies fraudulent transactions from the credit card data.

1.4 Objectives

The objective of the project is to predict fraudulent and non-fraudulent transactions with respect to the time and amount of the transaction, using classification machine learning algorithms such as SVM, Random Forest and Decision Tree, together with the confusion matrix, in building the machine learning models.

1.5 Methodology

First the dataset is read. Exploratory Data Analysis is performed on the dataset to clearly understand the statistics of the data. Feature selection is applied. A machine learning model is developed.
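The methodology steps just listed (reading the data, exploratory analysis, feature selection) can be sketched in code. This is a minimal illustration, not the report's actual implementation: a small synthetic data frame stands in for the real credit card file, and the column names V1, V2, Amount and the Class fraud label are assumptions modelled on common anonymised credit-card datasets.

```python
# Sketch of the methodology's first steps: build/read the data,
# explore it, and select features by correlation with the label.
# The synthetic frame below stands in for the real dataset.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "V1": rng.normal(size=n),
    "V2": rng.normal(size=n),
    "Amount": rng.exponential(scale=50.0, size=n),
    "Class": (rng.random(n) < 0.02).astype(int),  # ~2% fraud labels
})

# Exploratory data analysis: shape, summary statistics, class balance.
print(df.shape)
print(df["Class"].value_counts())

# Simple correlation-based feature selection: rank features by
# absolute correlation with the fraud label and keep the top two.
corr = df.corr()["Class"].drop("Class").abs().sort_values(ascending=False)
selected = corr.index[:2].tolist()
print("Selected features:", selected)
```

On real, highly imbalanced data a correlation ranking is only a first pass; the model-based evaluation described later in the report remains the deciding step.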
The model is then trained and tested, and its performance is analyzed using evaluation techniques such as accuracy, the confusion matrix, precision, etc.

1.6 Organization of the Report

Chapter 1
1. Overview: provides the basic layout and insight into the proposed work, and briefly states the need for it.
2. Problem Statement: a concise description of the issue to be addressed or the condition to be improved upon; we identify the gap to be closed.
3. Significance and Relevance of Work: the contribution of our work to society.
4. Objectives: a project objective describes the desired results of the work; this section states what we are trying to accomplish.
5. Methodology: a methodology is a collection of methods, practices, processes and techniques; this section briefly explains the working of the project.

Chapter 2
1. Literature Survey: the purpose of a literature review is to gain an understanding of the existing resources on a particular topic or area of study. We referred to many research papers relevant to our work.

Chapter 3
1. System Requirements and Specifications: a document that describes the nature of a project, software or application. This section briefly covers the functional and non-functional requirements needed to implement the project.

Chapter 4
1. System Analysis: describes the existing system and the proposed system, along with their advantages and limitations.

Chapter 5
1. System Design: describes the project modules and the Activity, Use Case, Data Flow, and Sequence Diagrams in detail.

Chapter 6
1.
Implementation: describes the detailed concepts of the project, the algorithms with their detailed steps, and the code implementing them.

Chapter 7
1. Testing: describes
a. Methods of testing: information about Unit testing, Validation testing, Functional testing, Integration testing, and User Acceptance testing.
b. Test Cases: a detailed description of the program test cases.

Chapter 8
1. Performance Analysis: describes the study of the system in detail.

Chapter 9
1. Conclusion and Future Enhancement: a brief summary of the project and of the work that remains for the future.

CHAPTER 2 - LITERATURE SURVEY

2.1 Credit Card Fraud Detection Techniques: Data and Technique Oriented Perspective
Authors: Samaneh Sorournejad, Zahra Zojaji, Amir Hassan Monadjemi
In this paper, after investigating the difficulties of credit card fraud detection, the authors review the state of the art in credit card fraud detection techniques, datasets and evaluation criteria.
Disadvantages: lack of standard metrics.

2.2 Detection of Credit Card Fraud: State of Art
Authors: Imane Sadgali, Nawal Sael, Faouzia Benabbou
This paper proposes a state of the art of various techniques for credit card fraud detection. The purpose of the study is to review implemented techniques, analyze their strengths and limits, and synthesize the findings in order to identify the techniques and methods that give the best results so far.
Disadvantages: lack of adaptability.

2.3 Credit Card Fraud Detection Using Machine Learning Algorithms
Authors: Vaishnavi Nath Dornadula, Geetha S.
The main aim of the paper is to design and develop a novel fraud detection method for streaming transaction data, with the objective of analyzing customers' past transaction details and extracting their behavioral patterns.
Disadvantages: imbalanced data.

2.4 Fraudulent Transaction Detection in Credit Card by Applying Ensemble Machine Learning Techniques
Authors: Debachudamani Prusti, Santanu Kumar Rath
In this study, the application of various classification models is proposed by implementing machine learning techniques to find the accuracy and other performance parameters for identifying fraudulent transactions.
Disadvantages: overlapping data.

2.5 Detection of Credit Card Fraud Transactions Using Machine Learning Algorithms and Neural Networks
Authors: Deepti Dighe, Sneha Patil, Shrikant Kokate
Credit card fraud resulting from misuse of the system is defined as theft or misuse of one's credit card information for personal gain without the permission of the card holder. To detect such frauds, it is important to check a user's usage patterns over past transactions; by comparing the usage pattern with the current transaction, we can classify it as either fraudulent or legitimate.
Disadvantages: different misclassifications.

2.6 Credit Card Fraud Detection Using Machine Learning Algorithms and Cybersecurity
Authors: Jiatong Shen
As the candidate algorithms had the same accuracy, the time factor was considered to choose the best one; on that basis the authors concluded that the AdaBoost algorithm works well to detect credit card fraud.
Disadvantages: accuracy is not perfect.

CHAPTER 3 - SYSTEM REQUIREMENTS AND SPECIFICATION

3.1 System Requirement Specification

System Requirement Specification (SRS) is a fundamental document, which forms the foundation of the software development process.
The SRS document describes all data, functional and behavioral requirements of the software under production or development. An SRS is basically an organization's understanding (in writing) of a customer's or potential client's system requirements and dependencies at a particular point in time, usually prior to any actual design or development work. It is a two-way insurance policy assuring that both the client and the organization understand the other's requirements from that perspective at a given point in time. The SRS also functions as a blueprint for completing a project with as little cost growth as possible. It is often referred to as the "parent" document because all subsequent project management documents, such as design specifications, statements of work, software architecture specifications, testing and validation plans, and documentation plans, are related to it. It is important to note that an SRS contains functional and non-functional requirements only; it does not offer design suggestions, possible solutions to technology or business issues, or any information other than the development team's understanding of the customer's system requirements.

3.2 Hardware Specification
> RAM: 4 GB or higher
> Processor: Intel i3 or above
> Hard Disk: 500 GB minimum

3.3 Software Specification
> OS: Windows or Linux
> Python IDE: Python 2.7.x and above
> Jupyter Notebook
> Language: Python

3.4 Functional Requirements

A functional requirement defines a function of a software system and how the system must behave when presented with specific inputs or conditions. These may include calculations, data manipulation and processing, and other specific functionality. In this system the functional requirements are:
+ Collect the datasets.
+ Train the model.
+ Predict the results.

3.5 Non-Functional Requirements
+ The system should be easy to maintain.
+ The system should be compatible with different platforms.
+ The system should be fast, as customers always need speed.
+ The system should be accessible to online users.
+ The system should be easy to learn by both sophisticated and novice users.
+ The system should provide easy, navigable and user-friendly interfaces.
+ The system should produce reports in different forms, such as tables and graphs, for easy visualization by management.
+ The system should have a standard graphical user interface that allows for online use.

3.6 Performance Requirement

Performance is measured in terms of the output provided by the application. Requirement specification plays an important part in the analysis of a system: only when the requirement specifications are properly given is it possible to design a system that fits the required environment. It rests largely with the users of the existing system to give the requirement specifications, because they are the people who will finally use the system. The requirements have to be known during the initial stages so that the system can be designed accordingly; it is very difficult to change a system once it has been designed, while a system that does not cater to the users' requirements is of no use.

CHAPTER 4 - SYSTEM ANALYSIS

Systems analysis is the process by which an individual studies a system such that an information system can be analyzed, modeled, and a logical alternative chosen. Systems analysis projects are initiated for three reasons: problems, opportunities, and directives.

4.1 Existing System

Since credit card fraud detection is a highly researched field, there are many different algorithms and techniques for performing it. One of the earliest systems is a CCFD system using a Markov model. Other existing algorithms used in credit card fraud detection include the cost-sensitive decision tree (CSD).
Credit card fraud detection using neural networks has also been proposed. The existing neural-network system follows the whale swarm optimization algorithm to obtain an incentive value, then uses a BP (back-propagation) network to rectify the values found to be in error.

[Figure 4.1.1: Fraud and non-fraud representation]

4.1.1 Limitations

If the time interval is too short, Markov models are inappropriate, because the individual displacements are not random but deterministically related in time. This suggests that Markov models are generally inappropriate over sufficiently short time intervals.

4.2 Proposed System

Support Vector Machine: SVM works by mapping data to a high-dimensional feature space so that data points can be categorized even when the data are not otherwise linearly separable. A separator between the categories is found, and the data are then transformed in such a way that the separator can be drawn as a hyperplane.

[Fig 4.2.1: SVM representation, showing the max-margin hyperplane separating fraud from legitimate points]

Support Vector Machine Terminology

Hyperplane: the decision boundary used to separate the data points of different classes in a feature space. In the case of linear classification it is a linear equation, i.e. wx + b = 0.
Support Vectors: the data points closest to the hyperplane, which play a critical role in deciding the hyperplane and margin.
Margin: the distance between the support vectors and the hyperplane. The main objective of the support vector machine algorithm is to maximize the margin; a wider margin indicates better classification performance.
Kernel: the mathematical function used in SVM to map the original input data points into a high-dimensional feature space, so that the hyperplane can be found even if the data points are not linearly separable in the original input space. Common kernel functions are linear, polynomial, radial basis function (RBF), and sigmoid.
Hard Margin: the maximum-margin hyperplane, a hyperplane that properly separates the data points of different categories without any misclassifications.
Soft Margin: when the data are not perfectly separable or contain outliers, SVM permits a soft-margin technique. Each data point has a slack variable introduced by the soft-margin SVM formulation, which softens the strict margin requirement and permits certain misclassifications or violations; the method finds a compromise between increasing the margin and reducing violations.
C: the regularisation parameter that balances margin maximisation against misclassification fines; it decides the penalty for going over the margin or misclassifying data items. A greater value of C imposes a stricter penalty, which results in a smaller margin and perhaps fewer misclassifications.
Hinge Loss: a typical loss function in SVMs, punishing incorrect classifications or margin violations. The SVM objective function is frequently formed by combining it with the regularisation term.
Dual Problem: SVM can be solved via the dual of the optimisation problem, which requires locating the Lagrange multipliers related to the support vectors. The dual formulation enables the use of kernel tricks and more effective computation.

Random Forest Classifier

A random forest is a collection of decision trees, each tree in the ensemble being built from a data sample drawn from the training set with replacement (the bootstrap sample).
[Fig 4.2.2: Simplified Random Forest algorithm]

Decision Tree

A decision tree is one of the most powerful tools of supervised learning, used for both classification and regression tasks. It builds a flowchart-like tree structure where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf (terminal) node holds a class label. It is constructed by recursively splitting the training data into subsets based on attribute values until a stopping criterion is met, such as the maximum depth of the tree or the minimum number of samples required to split a node. During training, the decision tree algorithm selects the best attribute to split on using a metric such as entropy or Gini impurity, which measures the level of impurity or randomness in the subsets; the goal is to find the attribute that maximizes the information gain, i.e. the reduction in impurity after the split.

Some of the common terminology used in decision trees:
+ Root Node: the topmost node in the tree, representing the complete dataset; the starting point of the decision-making process.
+ Decision/Internal Node: a node that symbolizes a choice regarding an input feature; internal nodes branch off to leaf nodes or other internal nodes.
+ Leaf/Terminal Node: a node without child nodes, indicating a class label or a numerical value.
+ Splitting: the process of dividing a node into two or more sub-nodes using a split criterion and a selected feature.
+ Branch/Sub-Tree: a subsection of the decision tree that starts at an internal node and ends at leaf nodes.
+ Parent Node: a node that divides into one or more child nodes.
+ Child Node: a node that emerges when a parent node is split.
+ Impurity: a measurement of the target variable's homogeneity in a subset of data.
It refers to the degree of randomness or uncertainty in a set of examples. The Gini index and entropy are two commonly used impurity measurements in decision trees for classification tasks.
+ Variance: measures how much the predicted and the target variables vary in different samples of a dataset; it is used for regression problems in decision trees. Mean squared error, mean absolute error, friedman_mse, or half Poisson deviance are used to measure the variance for regression tasks.
+ Information Gain: a measure of the reduction in impurity achieved by splitting a dataset on a particular feature. The splitting criterion is the feature offering the greatest information gain; it is used to determine the most informative feature to split on at each node, with the goal of creating pure subsets.
+ Pruning: the process of removing branches from the tree that do not provide additional information or that lead to overfitting.

4.2.1 Advantages
+ Support vector machines work comparably well when there is an understandable margin of dissociation between classes.
+ SVM is effective in instances where the number of dimensions is larger than the number of specimens.
+ Decision trees are simple to understand and interpret, and require little data preparation.
+ The cost of using the tree (i.e., predicting data) is logarithmic in the number of data points used to train the tree.
+ Decision trees are able to handle both numerical and categorical data.
+ The random forest classifier can be used to solve regression or classification problems.
+ The random forest algorithm is made up of a collection of decision trees, each tree in the ensemble being built from a data sample drawn from the training set with replacement, called the bootstrap sample.
+ It can be very useful for solving decision-related problems.
+ It helps to think about all the possible outcomes for a problem.
+ There is less requirement for data cleaning compared to other algorithms.

CHAPTER 5 - SYSTEM DESIGN

5.1 Project Modules

The entire project is divided into 3 modules:
1. Data gathering and pre-processing
2. Training the model using the following machine learning algorithms: i. SVM, ii. Random Forest Classifier, iii. Decision Tree
3. Final prediction model integrated with the front end

Module 1: Data Gathering and Data Pre-processing
a. A proper dataset is searched for among the various available ones and finalized.
b. The dataset must be preprocessed to train the model.
c. In the preprocessing phase, the dataset is cleaned: redundant values, noisy data and null values are removed.
d. The preprocessed data is provided as input to the next module.

Module 2: Training the Model
a. The preprocessed data is split into training and testing datasets in an 80:20 ratio to avoid the problems of over-fitting and under-fitting.
b. A model is trained on the training dataset with each of the following algorithms: SVM, Random Forest Classifier and Decision Tree.
c. The trained models are evaluated on the testing data and the results are visualized using bar graphs and scatter plots.
d. The accuracy of each algorithm is calculated using different parameters such as F1 score, precision and recall; the results are then displayed using various data visualization tools for analysis.
e. The algorithm providing the best accuracy rate compared to the remaining algorithms is taken as the final prediction model.

Module 3: Final Prediction Model Integrated with the Front End
a. The algorithm that provided the best accuracy rate is considered the final prediction model.
b. The model thus made is integrated with the front end.
c. A database is connected to the front end to store the information of the users using the system.

SYSTEM ARCHITECTURE

The main purpose of our project is to make people aware of credit card online frauds; a credit card fraud detection system is necessary to secure our transactions.
With this system, fraudsters do not have the chance to make multiple transactions on a stolen or counterfeit card before the cardholder becomes aware of the fraudulent activity. The model is used to identify whether a new transaction is fraudulent or not; our aim is to detect 100% of the fraudulent transactions while minimizing incorrect fraud classifications.

[Fig 5.1: System Architecture]

5.2 Activity Diagram

The activity diagram is an important diagram in UML for describing the dynamic aspects of a system. It is basically a flowchart representing the flow from one activity to another, where an activity can be described as an operation of the system. The control flow is drawn from one operation to another and can be sequential, branched, or concurrent; activity diagrams deal with all types of flow control by using elements such as fork and join. The basic purpose of an activity diagram is to capture the dynamic behavior of the system and to show the message flow from one activity to another. Activity diagrams are not only used for visualizing the dynamic nature of a system; they are also used to construct the executable system using forward and reverse engineering techniques. The only thing missing in an activity diagram is the message part.

[Fig 5.2: Activity Diagram]

5.3 Use Case Diagram

In UML, use-case diagrams model the behavior of a system and help to capture its requirements. They describe the high-level functions and scope of a system and identify the interactions between the system and its actors. The use cases and actors describe what the system does and how the actors use it, but not how the system operates internally.
Use-case diagrams illustrate and define the context and requirements of either an entire system or the important parts of the system. You can model a complex system with a single use-case diagram, or create many use-case diagrams to model the components of the system. You would typically develop use- case diagrams in the early phases of a proje: and refer to them throughout the development process. _ View statement of account ‘change credit gard Fig 5.3 Use case Diagram 5.4 Sequence Diagram ‘The sequence diagram represents the flow of messages in the system and is also termed as an event diagram. It helps in envisioning several dynamic scenarios. It portrays the communication between any two lifelines as a time-ordered sequence of events, such that these lifelines took part at the run time. In UML, the lifeline is represented by a vertical bar, whereas the message flow is represented by a vertical dotted line that extends across the bottom of the page. It incorporates the iterations as well as branching. | nese [escort yt] oasinny | ree cnr ‘we om pas ccennees veal + ¥ ‘tls sem cescet | 1 sans cu | ‘Seana ion oes semen [mee | | fecreccy en ee | errancn fof) es escent Radome Fig 5.4 Sequence diagram 20 5.5 Data Flow Diagram A Data Flow Diagram (DFD) is a traditional visual representation of the information flows within a system. A neat and clear DED can depict the ri 1 amount of the system requirement graphically. It can be manual, automated, or a combination of both. It shows how data enters and leaves the system, what changes the information, and where data is stored, The objective of DED is to show the scope and boundaries ofa system. asa whole. It may be used as a communication tool between a system analyst and any person who plays a part in the order that acts as a starting point for redesigning a system, The DED is also called as a data flow graph or bubble chart. Preprocessing Apply Collecting missing values. 
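The pre-processing flow just described includes an oversampling step to balance the heavily skewed classes. Below is a minimal, illustrative sketch of random oversampling in plain NumPy (the report's algorithm uses the ROSE package for this step; the tiny arrays here are placeholders, not the real dataset):

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until both classes are the same size."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]          # label of the rarer class
    need = int(counts.max() - counts.min())        # how many extra rows to draw
    pool = np.flatnonzero(y == minority)           # indices of minority rows
    extra = rng.choice(pool, size=need, replace=True)
    return np.vstack([X, X[extra]]), np.concatenate([y, y[extra]])

# Tiny placeholder data: 8 "normal" rows (class 0) and 2 "fraud" rows (class 1).
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
X_bal, y_bal = random_oversample(X, y)
```

After this step both classes have the same number of rows, which prevents the classifiers from simply predicting the majority class.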
Fig 5.5 Data Flow diagram

CHAPTER-6

6.1 Algorithm
Step 1: Import the dataset.
Step 2: Convert the data into data-frame format.
Step 3: Do random oversampling using the ROSE package.
Step 4: Decide the amount of data for training and testing.
Step 5: Give 80% of the data for training and the remaining data for testing.
Step 6: Assign the training dataset to the models.
Step 7: Choose an algorithm among the 3 different algorithms and create the model.
Step 8: Make predictions on the test dataset with each algorithm.
Step 9: Calculate the accuracy of each algorithm.
Step 10: Compute the confusion matrix for each model.
Step 11: Compare the algorithms on all the metrics and find the best algorithm.

CODE:

Importing libraries

!pip install tensorflow

# numerical operations
import numpy as np
# to store and analyse data in dataframes
import pandas as pd
# data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# data normalization and splitting
from sklearn.preprocessing import RobustScaler
from sklearn.model_selection import train_test_split
# creating, training and testing ML algorithms
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
# creating, training and testing neural networks
import tensorflow as tf
from tensorflow.keras.models import load_model, Sequential
from tensorflow.keras.layers import Dropout, Dense
# evaluation
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             classification_report, precision_score,
                             recall_score, f1_score, roc_auc_score)

Data Acquisition

data = pd.read_csv('creditcard.csv')

Data Analysis

data.shape
data.info()
data.describe()
sns.countplot(x='Class', data=data)
print("Fraud:", data.Class.sum() / data.Class.count())
Fraud_class = pd.DataFrame({'Fraud': data['Class']})
Fraud_class.apply(pd.value_counts).plot(kind='pie', subplots=True)
fraud = data[data['Class'] == 1]
valid = data[data['Class'] == 0]
fraud.Amount.describe()
plt.figure(figsize=(20, 20))
plt.title('Correlation Matrix', y=1.05, size=15)
sns.heatmap(data.astype(float).corr(), linewidths=0.1, vmax=1.0,
            square=True, linecolor='white', annot=True)

Data Normalization

rs = RobustScaler()
data['Amount'] = rs.fit_transform(data['Amount'].values.reshape(-1, 1))
data['Time'] = rs.fit_transform(data['Time'].values.reshape(-1, 1))

Considering input columns and output column

X = data.drop(['Class'], axis=1)
Y = data['Class']

Data splitting

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2,
                                                    random_state=1)

def evaluate(Y_test, Y_pred):
    print("Accuracy: ", accuracy_score(Y_test, Y_pred))
    print("Precision: ", precision_score(Y_test, Y_pred))
    print("Recall: ", recall_score(Y_test, Y_pred))
    print("F1-Score: ", f1_score(Y_test, Y_pred))
    print("AUC score: ", roc_auc_score(Y_test, Y_pred))
    print(classification_report(Y_test, Y_pred,
                                target_names=['Normal', 'Fraud']))
    conf_matrix = confusion_matrix(Y_test, Y_pred)
    plt.figure(figsize=(6, 6))
    sns.heatmap(conf_matrix, xticklabels=['Normal', 'Fraud'],
                yticklabels=['Normal', 'Fraud'], annot=True, fmt='d')
    plt.title("Confusion matrix")
    plt.ylabel('True class')
    plt.xlabel('Predicted class')
    plt.show()

Creating the algorithms, training, testing and evaluating

# Support Vector Classifier
svm = SVC()
svm.fit(X_train, Y_train)
Y_pred_svm = svm.predict(X_test)
evaluate(Y_test, Y_pred_svm)

# Random Forest
rfc = RandomForestClassifier()
rfc.fit(X_train, Y_train)
Y_pred_rf = rfc.predict(X_test)
evaluate(Y_test, Y_pred_rf)

# Decision Tree
dtc = DecisionTreeClassifier()
dtc.fit(X_train, Y_train)
Y_pred_dt = dtc.predict(X_test)
evaluate(Y_test, Y_pred_dt)

# Random Forest with balanced class weights
rfb = RandomForestClassifier(class_weight='balanced')
rfb.fit(X_train, Y_train)
Y_pred_rf_b = rfb.predict(X_test)
evaluate(Y_test, Y_pred_rf_b)

CHAPTER-7 TESTING

Testing is the process of executing a program with the intent of finding errors. Testing presents an interesting anomaly for software engineering: the goal of software testing is to convince the system developer and customers that the software is good enough for operational use. Testing is a process intended to build confidence in the software. It is a set of activities that can be planned in advance and conducted systematically. Software testing is often referred to as verification and validation.

7.1 Unit Testing
In unit testing we test each module individually and then integrate it with the overall system. Unit testing focuses verification efforts on the smallest unit of software design: the module. This is also known as module testing. Each module of the system is tested separately.
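For example, the normalization module can be unit-tested on its own with Python's standard unittest framework. The scale_amount helper below is a hypothetical stand-in, mirroring the RobustScaler step from Chapter 6 (centre on the median, divide by the inter-quartile range); it is an illustration, not the report's actual code:

```python
import unittest
import numpy as np

def scale_amount(values):
    """Hypothetical helper mirroring RobustScaler: centre on the median,
    divide by the inter-quartile range."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    iqr = np.percentile(values, 75) - np.percentile(values, 25)
    return (values - med) / iqr if iqr else values - med

class TestScaleAmount(unittest.TestCase):
    def test_median_maps_to_zero(self):
        out = scale_amount([1, 2, 3, 4, 5])
        self.assertEqual(out[2], 0.0)

    def test_iqr_is_unit(self):
        # 25th percentile = 2, 75th = 4, median = 3, so 4 -> (4 - 3) / 2 = 0.5
        out = scale_amount([1, 2, 3, 4, 5])
        self.assertAlmostEqual(out[3], 0.5)

if __name__ == "__main__":
    unittest.main(argv=["ignored"], exit=False)
```

Each such test exercises one module in isolation, which is exactly the verification step described above.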
This testing is carried out during the programming stage itself. In this step, each module was found to be working satisfactorily with regard to the expected output from the module. There are also validation checks for the input fields, which make it easy to find and debug errors in the system.

7.2 Validation Testing
At the culmination of black-box testing, the software is completely assembled as a package, interfacing errors have been uncovered and corrected, and a final series of software tests is performed. The output displayed or generated by the system under consideration is tested by asking the user about the required format; here the output format considered is the screen display. The output format on the screen was found to be correct, as it was designed in the system-design phase according to the user's needs. For hard copy too, the output comes out as specified by the user. Hence the output testing did not result in any corrections to the system.

7.3 Functional Testing
Functional tests provide systematic demonstrations that the functions tested are available as specified by the business and technical requirements, system documentation, and user manuals. Functional testing is centered on the following items:
Valid Input: identified classes of valid input must be accepted.
Invalid Input: identified classes of invalid input must be rejected.
Functions: identified functions must be exercised.
Output: identified classes of application outputs must be exercised.
Systems/Procedures: interfacing systems or procedures must be invoked.
Organization and preparation of functional tests is focused on requirements, key functions, and special test cases. Before functional testing is complete, additional tests are identified and the effective value of the current tests is determined.

7.4 Integration Testing
Data can be lost across an interface; one module can have an adverse effect on another; and sub-functions, when combined, may not produce the desired major function. Integration testing is the systematic testing of the constructed system to uncover errors within the interfaces. The testing was done with sample data, and the developed system ran successfully for this sample data. The need for integration testing is to assess the overall system performance.

7.5 User Acceptance Testing
User acceptance testing is a critical phase of any project and requires significant participation by the end user. It also ensures that the system meets the functional requirements. Some of my friends who tested this module suggested that it was a really user-friendly application with good processing speed.

CHAPTER-8 PERFORMANCE ANALYSIS

8.1 Performance metrics
The basic performance measures are derived from the confusion matrix. The confusion matrix is a 2-by-2 table containing the four outcomes produced by a binary classifier. Measures such as sensitivity, specificity, accuracy and error rate are derived from it.

Accuracy: Accuracy is calculated as the total number of correct predictions (A + B) divided by the total size of the dataset (C + D). It can also be calculated as (1 - error rate).
Accuracy = (A + B) / (C + D)
where
A = True Positives
B = True Negatives
C = Actual Positives
D = Actual Negatives

Error rate: The error rate is calculated as the total number of incorrect predictions (E + F) divided by the total size of the dataset (C + D).
Error rate = (E + F) / (C + D)
where
E = False Positives
F = False Negatives

Sensitivity: Sensitivity is calculated as the number of correct positive predictions (A) divided by the total number of positives (C).
Sensitivity = A / C

Specificity: Specificity is calculated as the number of correct negative predictions (B) divided by the total number of negatives (D).
Specificity = B / D
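The four measures above can be computed directly from the confusion-matrix counts. A minimal sketch (using the conventional TP/TN/FP/FN names in place of the A/B/E/F symbols, with illustrative counts rather than the report's actual results):

```python
def confusion_metrics(tp, tn, fp, fn):
    """Accuracy, error rate, sensitivity and specificity from the four
    confusion-matrix counts (tp = A, tn = B, fp = E, fn = F)."""
    pos, neg = tp + fn, tn + fp      # actual positives (C) and negatives (D)
    total = pos + neg
    return {
        "accuracy":    (tp + tn) / total,
        "error_rate":  (fp + fn) / total,
        "sensitivity": tp / pos,     # recall on the fraud class
        "specificity": tn / neg,
    }

# Illustrative counts: 90 frauds caught, 10 missed, 5 false alarms, 895 correct normals.
m = confusion_metrics(tp=90, tn=895, fp=5, fn=10)
```

Note that accuracy and error rate always sum to 1, which is the "(1 - error rate)" identity stated above.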
DATA ANALYSIS

Fig 8.1 Dataset analysis

SUPPORT VECTOR MACHINE
Accuracy:  0.9994557775359011
Precision: 0.6781609195402298
Recall:    0.9516129032258065
F1-Score:  0.7919463087248322
AUC score: 0.975560405918703

              precision  recall  f1-score  support
Normal             1.00    1.00      1.00    56900
Fraud              0.68    0.95      0.79       62
accuracy                             1.00    56962
macro avg          0.84    0.98      0.90    56962
weighted avg       1.00    1.00      1.00    56962

RANDOM FOREST
Accuracy:  0.9995611109160493
Precision: 0.7701149425287356
Recall:    0.9305555555555556
F1-Score:  0.8427672955974842
AUC score: 0.9651019999609383

              precision  recall  f1-score  support
Normal             1.00    1.00      1.00    56890
Fraud              0.77    0.93      0.84       72
accuracy                             1.00    56962
macro avg          0.89    0.97      0.92    56962
weighted avg       1.00    1.00      1.00    56962

DECISION TREE
Accuracy:  0.9992802219023208
Precision: 0.7241379310344828
Recall:    0.7875
F1-Score:  0.7544910179640718
AUC score: 0.8935390369536936

              precision  recall  f1-score  support
Normal             1.00    1.00      1.00    56882
Fraud              0.72    0.79      0.75       80
accuracy                             1.00    56962
macro avg          0.86    0.89      0.88    56962
weighted avg       1.00    1.00      1.00    56962

CHAPTER-9 CONCLUSION & FUTURE ENHANCEMENT

Nowadays, in the global computing environment, online payments are important, because an online payment uses only the credential information from the credit card to fulfil an application and then deduct money. For this reason, it is important to find the best solution to detect the maximum number of frauds in online systems. Accuracy, error rate, sensitivity and specificity are used to report the performance of the system in detecting credit-card fraud. In this work, three machine learning algorithms were developed to detect fraud in the credit-card system. To evaluate the algorithms, 80% of the dataset is used for training and 20% for testing and validation. Accuracy, error rate, sensitivity and specificity are evaluated for the three algorithms. The accuracies obtained for the SVM, Decision Tree and Random Forest classifiers are 99.94%, 99.92% and 99.95% respectively.
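The selection step described in Module 3 (keep the algorithm with the best score as the final prediction model) reduces to a one-line comparison over the figures above. Because accuracy is nearly saturated for all three models on this skewed dataset, the sketch below ranks by F1-score, which balances precision and recall on the rare fraud class (the rounded numbers are copied from the results above):

```python
# Scores reported above (rounded) for each trained model.
results = {
    "SVM":           {"accuracy": 0.99946, "f1": 0.792},
    "Random Forest": {"accuracy": 0.99956, "f1": 0.843},
    "Decision Tree": {"accuracy": 0.99928, "f1": 0.754},
}

# Pick the model with the highest F1-score as the final prediction model.
best = max(results, key=lambda name: results[name]["f1"])
print(best)  # -> Random Forest
```

This is consistent with both rankings here: Random Forest leads on accuracy as well as on F1-score.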
The comparative results show that the Random Forest performs better than the SVM and Decision Tree techniques.

Future Enhancement
Although we did not fully reach the goal of detecting 100% of fraudulent transactions, we did end up creating a system that can, with enough time and data, get very close to that goal. As with any such project, there is some room for improvement here. The very nature of this project allows multiple algorithms to be integrated together as modules, and their results can be combined to increase the accuracy of the final result. The model can be further improved by adding more algorithms to it; however, the output of these algorithms needs to be in the same format as the others. Once that condition is satisfied, the modules are easy to add, as done in the code. This provides a great degree of modularity and versatility to the project. More room for improvement can be found in the dataset. As demonstrated before, the precision of the algorithms increases when the size of the dataset is increased. Hence, more data will surely make the model more accurate in detecting frauds and reduce the number of false positives. However, this requires official support from the banks themselves.

BIBLIOGRAPHY

[1] B. Meena, I. S. L. Sarwani, S. V. S. S. Lakshmi, "Web Service Mining and its Techniques in Web Mining", IJAEGT, vol. 2, issue 1, pp. 385-389.
[2] F. N. Ogwueleka, "Data Mining Application in Credit Card Fraud Detection System", Journal of Engineering Science and Technology, vol. 6, no. 3, pp. 311-322, 2019.
[3] G. Singh, R. Gupta, A. Rastogi, M. D. S. Chandel, A. Riyaz, "A Machine Learning Approach for Detection of Fraud based on SVM", International Journal of Scientific Engineering and Technology, vol. 1, no. 3, pp. 194-198, 2019, ISSN: 2277-1581.
[4] K. Chaudhary, B. Mallick, "Credit Card Fraud: The study of its impact and detection techniques", International Journal of Computer Science and Network (IJCSN), vol. 1, no. 4, pp. 31-35, 2019, ISSN: 2277-5420.
[5] M. J. Islam, Q. M. J. Wu, M. Ahmadi, M. A. Sid-Ahmed, "Investigating the Performance of Naive-Bayes Classifiers and K-Nearest-Neighbor Classifiers", IEEE International Conference on Convergence Information Technology, pp. 1541-1546, 2017.
[6] R. Wheeler, S. Aitken, "Multiple algorithms for fraud detection", Knowledge-Based Systems, Elsevier, vol. 13, no. 2, pp. 93-99, 2018.
[7] S. Patil, H. Somavanshi, J. Gaikwad, A. Deshmane, R. Badgujar, "Credit Card Fraud Detection Using Decision Tree Induction Algorithm", International Journal of Computer Science and Mobile Computing (IJCSMC), vol. 4, no. 4, pp. 92-95, 2020, ISSN: 2320-088X.
[8] S. Maes, K. Tuyls, B. Vanschoenwinkel, B. Manderick, "Credit card fraud detection using Bayesian and neural networks", Proceedings of the 1st International NAISO Congress on Neuro Fuzzy Technologies, pp. 261-270, 2017.
[9] S. Bhattacharyya, S. Jha, K. Tharakunnel, J. C. Westland, "Data mining for credit card fraud: A comparative study", Decision Support Systems, vol. 50, no. 3, pp. 602-613, 2019.
[10] Y. Sahin, E. Duman, "Detecting credit card fraud by ANN and logistic regression", Innovations in Intelligent Systems and Applications (INISTA) 2018 International Symposium, pp. 315-319, 2018.

APPENDIX

Appendix A: Screen Shots

Fig 1 Correlation Matrix

Fig 2 Dataset

fraud = data[data['Class'] == 1]
valid = data[data['Class'] == 0]
outlierFraction = len(fraud) / float(len(valid))
print(outlierFraction)
fraud.Amount.describe()

0.0017304750013189597
count     492.000000
mean      122.211321
std       256.683288
min         0.000000
25%         1.000000
50%         9.250000
75%       105.890000
max      2125.870000
Name: Amount, dtype: float64

Fig 3 Dataset reading code

Fig 4 Confusion Matrix

Appendix B: Abbreviations

CCFD - Credit Card Fraud Detection
CSDT - Cost-Sensitive Decision Tree
ML - Machine Learning
SVM - Support Vector Machine
URL - Uniform Resource Locator
