IJCSE-01768
I INTRODUCTION
In this information era, a huge amount of data is collected daily. Analyzing this data and extracting meaningful information from it is necessary to achieve our goals. We now live in a world where a lot of data related to different fields (scientific, medical, banking, marketing and financial data, etc.) is available, but nobody has time to retrieve meaningful information from it manually. To retrieve this information easily, we need shortcut methods that automatically classify the data, automatically summarize it, and automatically discover and characterize trends in it [1]. Data mining explores large datasets to dig out previously unknown patterns, relationships and knowledge that are not easy to detect with conventional algorithms and traditional statistical methods. Data mining has been used effectively in many fields such as marketing, banking, medicine, business, fraud detection and weather forecasting [2]. Data mining, or KDD (knowledge discovery in databases), is the process of finding useful knowledge in a collection of data. This widely used process includes data selection and preparation, data cleansing, incorporating prior knowledge about the data sets, and interpreting accurate solutions from the observed results.
II LITERATURE REVIEW
The main phases of the KDD process are:
Data Selection: In this stage, the task objectives and requirements are understood, and then only the data that helps achieve the project goal is selected.
Data Preprocessing: In this phase, irrelevant data that is not useful for the project is removed.
Data Transformation or Consolidation: In this phase, the selected data is transformed into forms appropriate for the mining process.
Data Mining: This is the main and most crucial phase, in which intelligent techniques are applied to the data to extract useful patterns.
Interpretation & Pattern Evaluation: In this step, the patterns found in the data sets are evaluated and interpreted as information.
Knowledge Representation: In this phase, the discovered knowledge is represented visually to the user, which helps the user understand the data mining results easily.
III Data Mining Techniques
A number of core techniques are used in data mining; they describe the type of mining and data recovery operation, such as Association, Classification, Clustering, Prediction, Sequential Patterns and Decision Trees.
Association or Relation: Association is one of the best-known data mining techniques. Association is the process of finding relationships between different items that are present in the same database. This technique is used to find related items in the database, for example to find out how the purchase of one item affects the purchase of another item. Association rules are created by analyzing the data for frequent if/then patterns and using the support and confidence criteria to identify the relationships.
Classification: Classification is a machine learning technique used in data mining that assigns the objects in a collection to target categories or classes. Classification methods can use mathematical techniques such as decision trees, linear programming, neural networks and statistics. In classification, we build a model and use it to learn how to classify data items into groups. Basically, classification is used to assign each item in a set of data to one of a predefined set of classes or groups [3].
Clustering: Clustering is the process of partitioning a set of data into meaningful or useful clusters of objects that have similar characteristics. The clustering technique defines the classes and puts objects in each class [4]. For example, when clustering is used in the prediction of blood pressure, each resulting cluster is a list of patients who share the same risk factor, so the technique produces a separate list of patients for each related risk factor.
Prediction: Prediction is a data mining technique that discovers the relationship between independent and dependent variables [4].
IV Data Mining Applications
Data mining is used in various applications, some of which are given below:
Sales & Marketing: Data mining helps us understand the hidden patterns in the data, which helps in planning marketing strategy.
Health Care Industry: The health industry is growing day by day. Data mining helps to store the data of all patients who are suffering from the same type of disease.
Education & Sports: In this field, a vast amount of statistical data is collected for each student, teacher, subject and session. Educational organizations can use data mining for statistical analysis and pattern discovery as well as for prediction.
Telecommunication & fraud detection: In today's world a large amount of data is saved in the cloud daily, which is one reason for the increase in fraud and crime cases. To control these types of fraud, industries and companies now use data mining.
V Tools for Data Mining
Data plays a very important role in today's world. Data exists in both structured and unstructured forms, and a lot of it is unstructured, so a procedure and a system are needed to extract useful information from the data and transform it into an understandable and usable form. A number of tools are available for data mining tasks; they use artificial intelligence, machine learning and other techniques to extract information from data. Here are some of the powerful open source data mining tools:
Orange: Orange is an open source data mining tool written in the Python language. It is a component-based machine learning and data visualization tool. In this tool, data mining can be done through visual programming and Python scripting.
Rapid Miner: Rapid Miner is written in the Java programming language. This tool offers advanced analytics through template-based frameworks. Rapid Miner also provides functionality such as data pre-processing, visualization, predictive analytics, statistical modelling, evaluation and deployment. Rapid Miner is used in business, industrial applications, research, education, etc.
Weka: The original non-Java version of WEKA was primarily developed for analyzing data from the agricultural domain. The Java-based version is very sophisticated and is used in many different applications, including visualization and algorithms for data analysis and predictive modelling. It is free under the GNU General Public License, which is a big plus compared to Rapid Miner, because users can customize it however they please. WEKA supports several standard data mining tasks, including data pre-processing, clustering, classification, regression, visualization and feature selection. WEKA would be more powerful with the addition of sequence modelling, which is currently not included.
VI Literature survey on Data Mining
In paper [5], the author surveys different papers in which one or more algorithms are used to predict heart disease. Among the algorithms applied, the best results were obtained by neural networks, which gave 100% accuracy, followed by decision trees, which gave 99.62% accuracy in predicting heart disease.
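The support and confidence criteria named under Association in Section III can be made concrete with a small calculation. The sketch below is a brute-force illustration, not the Apriori algorithm and not the author's method: it assumes a hypothetical market-basket data set and hypothetical thresholds, and only looks for rules between single items (X -> Y). Support is the fraction of transactions containing both items; confidence is support(X and Y) divided by support(X).

```python
from itertools import combinations

# Hypothetical market-basket data for illustration only; not taken from the paper.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(X and Y) / support(X): how often the rule X -> Y holds when X occurs."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


def single_item_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Brute-force search for rules X -> Y between single items (no Apriori pruning).

    The threshold defaults are hypothetical values chosen for this example.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b}, transactions)
        if pair_support < min_support:
            continue  # the pair is not frequent enough to form a rule
        for x, y in ((a, b), (b, a)):
            conf = confidence({x}, {y}, transactions)
            if conf >= min_confidence:
                rules.append((x, y, pair_support, conf))
    return rules


for x, y, s, c in single_item_rules(transactions):
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

With these invented baskets, raising `min_confidence` to 0.7 keeps only the rules bread -> milk and milk -> bread, which shows how the confidence threshold filters weaker if/then patterns.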
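The accuracy figures quoted from [5] are the fraction of test records that a classifier assigns to the correct class. As a minimal sketch of the Classification technique in Section III, the code below fits a one-level decision tree (a "decision stump", far simpler than the neural networks and full decision trees surveyed in [5]) and measures its accuracy. The patient records, feature choice (age, systolic blood pressure) and class labels are hypothetical values invented for illustration.

```python
from collections import Counter

# Hypothetical records (age, systolic blood pressure) with a risk label.
# Values are invented for illustration; they are not data from the paper or from [5].
train_rows = [(25, 118), (32, 121), (47, 139), (51, 150), (58, 162), (63, 171)]
train_labels = ["low", "low", "low", "high", "high", "high"]
test_rows = [(29, 115), (55, 158), (44, 135), (50, 128)]
test_labels = ["low", "high", "low", "low"]


def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels):
    """Fit a one-level decision tree: pick the (feature, threshold) split with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= threshold]
            right = [y for r, y in zip(rows, labels) if r[f] > threshold]
            if not left or not right:
                continue  # a split must put records on both sides
            left_label, right_label = majority(left), majority(right)
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]  # (feature index, threshold, left label, right label)


def predict(stump, row):
    """Assign a record to a class by comparing one feature with the learned threshold."""
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def accuracy(stump, rows, labels):
    """Fraction of records whose predicted class matches the true class."""
    correct = sum(predict(stump, r) == y for r, y in zip(rows, labels))
    return correct / len(rows)


stump = fit_stump(train_rows, train_labels)
print("split:", stump)
print(f"test accuracy: {accuracy(stump, test_rows, test_labels):.2%}")
```

On this toy data the stump splits on age at 47 and misclassifies one of the four test records, giving 75% accuracy; the point is only how an accuracy percentage is computed, not a comparison with the results reported in [5].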
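The Clustering technique in Section III groups objects with similar characteristics, as in the blood pressure example where patients with related risk factors end up in the same list. The paper does not name a specific clustering algorithm; the sketch below uses k-means, one common choice, purely as an illustration. The patient measurements are hypothetical, and the fixed random seed is an assumption made so the run is reproducible.

```python
import math
import random

# Hypothetical patient measurements (age, systolic blood pressure); invented for illustration.
points = [(25, 118), (30, 122), (28, 115), (60, 165), (65, 170), (58, 160)]


def kmeans(points, k, max_iterations=100, seed=0):
    """Plain k-means: assign each point to its nearest centroid, then move each centroid to its cluster mean."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = []
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of its points; an empty cluster keeps its old centroid.
        new_centroids = [
            tuple(sum(values) / len(c) for values in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments are stable, so the clustering has converged
        centroids = new_centroids
    return centroids, clusters


centroids, clusters = kmeans(points, k=2)
for centroid, members in zip(centroids, clusters):
    print([round(v, 1) for v in centroid], members)
```

Here the two clusters separate the younger, lower-pressure patients from the older, higher-pressure ones. The number of clusters `k` must be chosen in advance, which is one design limitation of k-means compared with other clustering methods.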
@DATA
3529,9191,6,0,0,205000
3247,10061,5,1,1,224900
4032,10150,5,0,1,197900
2397,14156,4,1,0,189900
2200,9600,4,0,1,195000
3536,19994,6,1,1,325000
2983,9365,5,0,1,230000
B. Loading the data into WEKA
When you start WEKA, the GUI Chooser window opens and lets you choose among four ways to work with WEKA. For our data set we choose only the Explorer option. This option is more than enough, because the Explorer provides a variety of algorithms to work on the data set.
Fig. 3: Weka startup screen
Now the ARFF data file that has been created is loaded into the WEKA tool with the following procedure. Start WEKA, then choose the Explorer. You'll be taken to the Explorer screen, with the Preprocess tab selected. Select the Open File button and select the ARFF file you created in the section above. After the file is selected, the WEKA Explorer displays the loaded data set.
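The Prediction technique in Section III (relating an independent variable to a dependent one) can be tried on the @DATA rows listed above outside WEKA as well. The sketch below is a minimal illustration under stated assumptions: it parses only numeric @DATA rows (no nominal values, quoted strings, missing "?" values or sparse rows), and because the @RELATION and @ATTRIBUTE declarations are not reproduced in this section, treating the first column as the independent variable and the last column as the dependent variable is a hypothetical choice. Simple least-squares regression is one example of a prediction method, not necessarily what the paper runs in WEKA.

```python
# The @DATA rows listed above; the ARFF header is not part of this section,
# so the column meanings are assumptions (first column = predictor, last column = target).
ARFF_TEXT = """\
% Only the @DATA section is reproduced here.
@DATA
3529,9191,6,0,0,205000
3247,10061,5,1,1,224900
4032,10150,5,0,1,197900
2397,14156,4,1,0,189900
2200,9600,4,0,1,195000
3536,19994,6,1,1,325000
2983,9365,5,0,1,230000
"""


def parse_arff_data(text):
    """Read numeric rows from the @DATA section of an ARFF document.

    Minimal sketch: skips blank lines and '%' comments and handles
    comma-separated numeric values only.
    """
    rows, in_data = [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue
        if line.lower() == "@data":
            in_data = True  # everything after this marker is a data row
        elif in_data:
            rows.append([float(value) for value in line.split(",")])
    return rows


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (one independent variable)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rows = parse_arff_data(ARFF_TEXT)
xs = [row[0] for row in rows]   # assumed independent variable
ys = [row[-1] for row in rows]  # assumed dependent variable
intercept, slope = fit_line(xs, ys)
print(f"{len(rows)} rows; y = {intercept:.1f} + {slope:.2f} * x")
```

With only seven rows the fitted line is a toy example; in WEKA itself the same kind of model would be built from the Classify tab after the file is loaded, using all attributes rather than a single assumed column.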
Conclusion