
International Journal of Computer Sciences and Engineering Open Access

Review Paper Volume-4, Issue-8 E-ISSN: 2347-2693

Review of Data Mining with Weka Tool


Kulwinder Kaur1*, Shivani Dhiman2
1 Department of Computer Science Engineering, Indus International University, Una, India
2 Department of Computer Applications, Indus International University, Una, India

Available online at: www.ijcseonline.org


Received: 22/Jun/2016 Revised: 10/Jul/2016 Accepted: 16/Aug/2016 Published: 31/Aug/2016
Abstract- Data mining is the process of extracting unseen and hidden information from a large amount of data. It is a powerful technology that helps researchers find meaningful information by providing different tools and technologies. In this paper we focus on the different tools, technologies and application areas of data mining. We also discuss the Weka tool: how a data set is built for Weka and how that data set is loaded into Weka.
Keywords- Data Mining, Machine Learning, Clustering, Classification, Weka Tool.

I INTRODUCTION
In this information era, a huge amount of data is collected daily. Analyzing that data and extracting meaningful information from it is necessary to achieve goals. We now live in a world where a lot of data (scientific, medical, banking, marketing, financial, etc.) related to different fields is available, but nobody has time to retrieve meaningful information from it manually. To retrieve this information easily, we look for methods to automatically classify it, summarize it, and discover and characterize trends in it [1]. Data mining explores large data sets to dig out previously unknown patterns, relationships and knowledge that are not easy to detect with traditional algorithms and statistical methods. Data mining has been used effectively in many fields such as marketing, banking, medicine, business, fraud detection and weather forecasting [2]. Data mining, or KDD (knowledge discovery in databases), is the process of finding useful knowledge in a collection of data. This widely used process includes data preparation and selection, data cleansing, incorporating prior knowledge on data sets, and interpreting accurate solutions from the observed results.

II LITERATURE REVIEW

Fig. 1: An outline of the steps of the KDD process

Data Selection: In this stage the task objectives and requirements are understood, and then only the data that helps achieve the project goal is selected.
Data Preprocessing: In this phase irrelevant data that is not useful for the project is removed.
Data Transformation or Consolidation: In this phase the selected data is transformed into forms appropriate for the mining process.
Data Mining: This is the main and crucial phase, in which intelligent techniques are applied to the data to extract useful patterns.
Interpretation & Pattern Evaluation: In this step the extracted patterns are evaluated and turned into information.
Knowledge Representation: In this phase the discovered knowledge is represented visually, which helps the user understand the data mining result.

III Data Mining Techniques

Fig. 2: Data mining different techniques
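
Before turning to the individual techniques, the KDD stages listed above (selection, preprocessing, transformation, mining, evaluation) can be made concrete with a small sketch. Everything in it is an assumption for illustration: the records, the field names and the function names are hypothetical, and the "mining" step is a deliberately trivial split around the mean rather than a real algorithm.

```python
from statistics import mean

# Hypothetical raw records (not from the paper); None marks a missing value.
RAW = [
    {"id": 1, "age": 34, "income": 42000},
    {"id": 2, "age": None, "income": 58000},
    {"id": 3, "age": 51, "income": 61000},
    {"id": 4, "age": 29, "income": 39000},
]


def select(records, fields):
    """Data selection: keep only the fields relevant to the task."""
    return [{f: r[f] for f in fields} for r in records]


def preprocess(records):
    """Data preprocessing: drop records that contain missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def transform(records, field):
    """Data transformation: min-max scale one numeric field to [0, 1]."""
    values = [r[field] for r in records]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero for constant columns
    return [{**r, field: (r[field] - lo) / span} for r in records]


def mine(records, field):
    """Data mining (toy stand-in): split records around the mean of a field."""
    threshold = mean(r[field] for r in records)
    return {
        "low": [r for r in records if r[field] <= threshold],
        "high": [r for r in records if r[field] > threshold],
    }


def evaluate(groups):
    """Pattern evaluation: summarise each group by its size."""
    return {name: len(members) for name, members in groups.items()}


def kdd(records):
    """Run the stages in order and return the evaluated result."""
    selected = select(records, ["age", "income"])
    cleaned = preprocess(selected)
    scaled = transform(cleaned, "income")
    groups = mine(scaled, "income")
    return evaluate(groups)


if __name__ == "__main__":
    print(kdd(RAW))
```

Each stage takes the output of the previous one, which mirrors the way the KDD outline chains its phases; in practice every function would be replaced by a far richer step.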

© 2016, IJCSE All Rights Reserved 41


International Journal of Computer Sciences and Engineering Vol.-4(8), PP(41-44) Aug 2016, E-ISSN: 2347-2693

There are a number of core techniques used in data mining that describe the type of mining and data recovery operation, such as association, classification, clustering, prediction, sequential patterns and decision trees.

Association or Relation: Association is the best-known data mining technique. It is the process of finding relationships between different items that are present in the same database, for example how the purchase of one item affects the purchase of another. Association rules are created by analyzing the data for frequent if/then patterns and using the criteria of support and confidence to identify the relationships. Support is how often the items appear together in the data, and confidence is how often the if/then statement turns out to be true.

Classification: Classification is mainly a machine learning technique in data mining that assigns objects in a collection to target categories or classes. Classification can use mathematical techniques such as decision trees, linear programming, neural networks and statistics. We build a model from data whose classes are known and use that model to classify new data items into groups. In short, classification assigns each item in a set of data to one of a predefined set of classes or groups [3].

Clustering: Clustering partitions a set of data into meaningful clusters of objects that have similar characteristics. The clustering technique defines the classes and puts objects in each class [4]. For example, when clustering is used in the prediction of blood pressure, each cluster is a list of patients who share the same risk factor, so patients with related risk factors end up in separate lists.

Prediction: Prediction is a data mining technique that discovers the relationship between independent and dependent variables [4].

IV Data Mining Applications

Data mining is used in various applications, some of which are given below:
Sales & Marketing: Data mining enables us to understand the hidden patterns inside the data, which helps in planning marketing strategy.
Health Care Industry: The health care industry is growing day by day. Data mining helps to store all the data of patients who are suffering from the same type of disease.
Education & Sports: In this field a vast amount of statistical data is collected for each student, teacher, subject and session. Education organizations can use data mining for statistical analysis, pattern discovery and prediction.
Telecommunication & Fraud Detection: In today's world a large amount of data is saved to the cloud daily, which is one reason for the increase in fraud and crime cases. To control these types of fraud, industries and companies now use data mining.

V Tools for Data Mining

Data plays a very important role in today's world. It exists in both structured and unstructured form. A lot of the data is unstructured, and it takes a procedure and a system to extract useful information from it and transform it into an understandable and usable form. A number of tools are available for data mining tasks that use artificial intelligence, machine learning and other techniques to extract information from data. Here are some of the powerful open source data mining tools:

Orange: Orange is an open source data mining tool written in the Python language. It is a component-based machine learning tool that is also used for data visualization. In this tool data mining can be done through visual programming and Python scripting.

Rapid Miner: Rapid Miner is written in the Java programming language and offers advanced analytics through template-based frameworks. It also provides functionality such as data pre-processing, visualization, predictive analytics, statistical modelling, evaluation and deployment. Rapid Miner is used in business, industrial applications, research, education, etc.

Weka: The original non-Java version of WEKA was developed primarily for analyzing data from the agricultural domain. The Java-based version is very sophisticated and is used in many different applications, including visualization and algorithms for data analysis and predictive modelling. It is free under the GNU General Public License, which is a big plus compared to Rapid Miner, because users can customize it however they please. WEKA supports several standard data mining tasks, including data pre-processing, clustering, classification, regression, visualization and feature selection. WEKA would be more powerful with the addition of sequence modelling, which currently is not included.

VI Literature survey on Data Mining

In paper [5] the authors review different survey papers in which one or more algorithms are used for the prediction of heart disease. Among the algorithms applied, the best results are reported for neural networks, with 100% accuracy, and decision trees, with 99.62% accuracy, in the prediction of heart disease.


In paper [7] the authors investigate the performance of different classification and clustering methods on a large breast cancer data set. Of the algorithms applied, the Bayes Network classifier gives the best results, with 89.71% accuracy and a model build time of 0.19 seconds.
In paper [6] the objective is to predict heart disease more accurately. The authors apply three algorithms to a heart disease data set: naive Bayes, the J48 decision tree and bagging. Bagging turns out to be one of the successful data mining techniques for the diagnosis of heart disease patients. The results show that the bagging algorithm reaches an accuracy of 80.03% with a model build time of 0.05 seconds.
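
The accuracy figures quoted in this survey are, in the usual definition, the share of test instances whose predicted class equals the true class. The sketch below shows that computation together with the confusion counts it is derived from. The labels are made up for illustration and are not data from any of the cited papers; the function names are likewise hypothetical.

```python
def accuracy(actual, predicted):
    """Fraction of instances whose predicted class equals the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("label lists must have the same length")
    if not actual:
        raise ValueError("need at least one instance")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def confusion_counts(actual, predicted, positive):
    """Count true/false positives and negatives for one 'positive' class."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


# Illustrative labels only (assumed, not taken from the surveyed studies).
ACTUAL = ["sick", "sick", "healthy", "healthy", "sick",
          "healthy", "healthy", "sick", "healthy", "healthy"]
PREDICTED = ["sick", "healthy", "healthy", "healthy", "sick",
             "sick", "healthy", "sick", "healthy", "healthy"]

if __name__ == "__main__":
    print("accuracy:", accuracy(ACTUAL, PREDICTED))
    print("confusion:", confusion_counts(ACTUAL, PREDICTED, positive="sick"))
```

Accuracy alone hides which kind of error a classifier makes, which is why the confusion counts are reported alongside it here; for a medical diagnosis task a missed patient (false negative) and a false alarm (false positive) usually carry different costs.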
VII Weka
Data mining isn't only the domain of large companies and costly software. In fact, there is a piece of software that does almost all the same things as these expensive packages: WEKA. WEKA is a product of the University of Waikato (New Zealand) and was first implemented in its modern form in 1997. The software is written in the Java™ language and contains a GUI for interacting with data files and producing visual results [3]. Weka includes many machine learning algorithms for data mining tasks.

WEKA start-up screen
Fig. 3: Weka startup screen

Fig. 4: Weka Explorer

A. Building the data set for WEKA
WEKA accepts a data set as input in Attribute-Relation File Format (ARFF). In the ARFF data file, you define each column and what each column contains. The ARFF file we'll be using with WEKA appears below [3].

WEKA FILE FORMAT IN ARFF FORM
@RELATION house

@ATTRIBUTE houseSize NUMERIC
@ATTRIBUTE lotSize NUMERIC
@ATTRIBUTE bedrooms NUMERIC
@ATTRIBUTE granite NUMERIC
@ATTRIBUTE bathroom NUMERIC
@ATTRIBUTE sellingPrice NUMERIC

@DATA
3529,9191,6,0,0,205000
3247,10061,5,1,1,224900
4032,10150,5,0,1,197900
2397,14156,4,1,0,189900
2200,9600,4,0,1,195000
3536,19994,6,1,1,325000
2983,9365,5,0,1,230000
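
To make the structure of the ARFF file concrete, here is a minimal sketch that reads the house data and summarizes one numeric column. It is a hand-rolled reader written for this example, not Weka's own loader: it assumes only `@RELATION`, `@ATTRIBUTE <name> NUMERIC`, `@DATA` and comma-separated rows, and the function names `parse_arff` and `summarize` are hypothetical. The standard deviation uses Python's `statistics.stdev` (the sample form, dividing by n-1).

```python
from statistics import mean, stdev

# The house data set from this section, embedded as text.
ARFF_TEXT = """\
@RELATION house

@ATTRIBUTE houseSize NUMERIC
@ATTRIBUTE lotSize NUMERIC
@ATTRIBUTE bedrooms NUMERIC
@ATTRIBUTE granite NUMERIC
@ATTRIBUTE bathroom NUMERIC
@ATTRIBUTE sellingPrice NUMERIC

@DATA
3529,9191,6,0,0,205000
3247,10061,5,1,1,224900
4032,10150,5,0,1,197900
2397,14156,4,1,0,189900
2200,9600,4,0,1,195000
3536,19994,6,1,1,325000
2983,9365,5,0,1,230000
"""


def parse_arff(text):
    """Parse a minimal ARFF document that has only NUMERIC attributes."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and comments
            continue
        upper = line.upper()
        if in_data:
            rows.append([float(v) for v in line.split(",")])
        elif upper.startswith("@RELATION"):
            relation = line.split(None, 1)[1]
        elif upper.startswith("@ATTRIBUTE"):
            _, name, kind = line.split(None, 2)
            if kind.upper() != "NUMERIC":
                raise ValueError(f"unsupported attribute type: {kind}")
            attributes.append(name)
        elif upper.startswith("@DATA"):
            in_data = True
    return relation, attributes, rows


def summarize(attributes, rows, name):
    """Return min, max, mean and sample standard deviation of one column."""
    index = attributes.index(name)
    column = [row[index] for row in rows]
    return {
        "min": min(column),
        "max": max(column),
        "mean": mean(column),
        "stdev": stdev(column),
    }


if __name__ == "__main__":
    relation, attributes, rows = parse_arff(ARFF_TEXT)
    print(relation, attributes, len(rows))
    print(summarize(attributes, rows, "houseSize"))
```

The header lines declare one name and type per column, and every row under `@DATA` supplies the values in that same order, which is why the column index can be looked up from the attribute list.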
B. Loading the data into WEKA
When you start WEKA, the GUI Chooser window opens and lets you choose among four ways to work with WEKA. For our data set we choose only the Explorer option, which is more than enough because it provides a variety of algorithms to work on the data set.
The ARFF data file that has been created is now loaded into Weka as follows. Start WEKA, then choose the Explorer. You'll be taken to the Explorer screen, with the Preprocess tab selected. Select the Open File button and select the ARFF file you created in the section above. After selecting the file, your WEKA Explorer will look like this:


Fig. 5: WEKA with house data loaded

In this way, WEKA allows you to review the data you're working with. The left section of the Explorer window outlines all of the columns in your data (Attributes) and the number of rows of data supplied. By selecting a column, information about the data in that column of your data set is shown. For example, when you select the houseSize column in the left section, the right section changes to show additional statistical information about the column. It shows that the maximum value in the data set for this column is 4,032 square feet and the minimum is 2,200 square feet. The average size is 3,132 square feet, with a standard deviation of 655 square feet. Finally, there's a visual way of examining the data, which you can see by clicking the Visualize All button.

Conclusion

This paper has attempted to review the extremely dynamic and substantial area of data mining. We discussed the basic process of data mining, the importance of data mining, the different strategies used for data mining, such as classification, prediction, clustering and association rules, and the different phases involved in finding useful patterns and knowledge. We also discussed the tools now available for various data mining operations, such as WEKA, Orange and Rapid Miner. Finally, we briefly discussed the Weka tool: what type of file Weka accepts and how a file is loaded into Weka.

REFERENCES
[1] Jiawei Han, Micheline Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers, Second Edition, 2006, ISBN: 978-1-5090-0669-4.
[2] Ravneet Jyot Singh, Williamjeet Singh, “Data Mining in Healthcare for Diabetes Mellitus”, International Journal of Science and Research, Volume-03, Issue-07, Page No (1993-1998), July 2014.
[3] Mansi Gera, Shivani Goel, “Data Mining – Techniques, Methods and Algorithms: A Review on Tools and their Validity”, International Journal of Computer Applications, Volume-113, Issue-19, Page No (22-29), March 2015.
[4] A. Michael, “IBM developerWorks: IBM's resource for developers and IT,” 27 April 2010. [Online]. Available: http://www.ibm.com/developerworks/library/ba-data-mining-techniques/.
[5] Beant Kaur, Williamjeet Singh, “Review on heart disease prediction using data mining techniques,” International Journal on Recent and Innovation Trends in Computer and Communication, Volume-2, Issue-10, Page No (3003-3008), October 2014.
[6] Vikas Chaurasia, Saurabh Pal, “Data Mining Approach to Detect Heart Diseases,” International Journal of Advanced Computer Science and Information Technology, Volume-02, Issue-04, Page No (56-66), 2013.
[7] Mohd Fauzi bin Othman, Thomas Moh Shan Yau, “Comparison of different classification techniques using WEKA for Breast Cancer,” Springer, Volume-15, Issue-04, Page No (520-523), 2007.

AUTHORS PROFILE
Kulwinder Kaur is currently pursuing an M.Tech in the Department of Computer Science and Engineering, Indus International University, Una, India. Her research interests include applications of data mining in the medical sector.

Shivani Dhiman, B.Tech, M.Tech (Computer Science and Engineering), is presently working as Asst. Prof. in the Department of Computer Applications at Indus International University, Una, India. Her area of interests include .
