Experimental Evaluation of Open Source Data Mining Tools (WEKA and Orange)

Article in International Journal of Engineering Trends and Technology · August 2020

DOI: 10.14445/22315381/IJETT-V68I8P206S


All content following this page was uploaded by Preeti Gulia on 12 November 2020.


International Journal of Engineering Trends and Technology (IJETT) – Volume 68 Issue 8 - Aug 2020

Experimental Evaluation of Open Source Data Mining Tools (WEKA and Orange)
Ritu Ratra (1), Preeti Gulia (2)
(1) Research Scholar, Department of Computer Science and Applications, Maharshi Dayanand University, Rohtak, Haryana, India
(2) Assistant Professor, Department of Computer Science and Applications, Maharshi Dayanand University, Rohtak, Haryana, India

Abstract: Nowadays, every organisation can manage large datasets at minimum cost. However, to obtain useful information, the large volume of stored data must actually be put to use. Data mining is an ongoing process of searching for patterns and collecting useful information from large datasets for future use. Data mining is important in many areas, such as education, the military, e-business and healthcare. The main objective of the data mining process is to handle data from various sources in different ways and then assemble it to obtain useful information. This is done with the help of various tools and techniques. A number of data mining tools are available that help researchers evaluate data. These tools act as an interface that receives the data and extracts meaningful patterns from large datasets. Selecting the best tool for a given requirement is not an easy task. To find the best data mining tool for a classification problem, the tools must be compared on the basis of different parameters. In this paper, the data mining tools WEKA and Orange are analysed on the basis of such parameters. The main objective of this comparison is to help researchers select the more suitable of these two tools.

Keywords: Classification, Naïve Bayes, Random Forest tree, WEKA, Orange, Precision, Recall.

I. INTRODUCTION
At present, the amount of data is growing day by day. It is very difficult for a person to analyse such a large volume of data and make sound decisions. Hence, data mining is needed to extract valuable and useful information from the available data. Data mining is the process of finding the most useful knowledge in the large volume of data available in databases or data repositories. Classification is one of the most important problems in data mining; it consists of finding rules that divide the given data into different, predefined classes. An enormous amount of data of different types is available in the digital world. Processing that data manually is a very time-consuming task, so there is a need for automated tools that help researchers convert messy data into useful information. In recent years, many data mining software tools have been developed to address this problem. Some of them are freely available as open-source tools. The commitment of open-source tools to sharing implementations of different machine learning algorithms can be highly beneficial for the whole field [10].

In this paper, a comparative study of the classification algorithms Random Forest, K-Nearest Neighbour and Naïve Bayes is conducted using the WEKA and Orange tools. The evaluation metrics precision and recall are used to analyse the performance of both tools with these classification algorithms. The following classification algorithms have been used for the experimentation:
• Naïve Bayes: Naïve Bayes classifiers are a group of simple probabilistic algorithms based on Bayes' theorem. The algorithm is applied with strong (naive) independence assumptions between the features.
• K-Nearest Neighbour: A simple classifier that stores all available cases and classifies new cases based on a similarity measure, e.g. a distance function.
• Random forest: It builds on the decision tree classifier but adds randomness to the model when each tree is grown. It can produce good results even without hyperparameter tuning. It builds several decision trees and then combines them to produce a more stable prediction.

To handle huge volumes of data, several tools are available to the user. Moreover, it is not easy to include all features in a single tool, which is why many different kinds of tools have been introduced [8][10]. In this paper, two data mining tools, WEKA and Orange, are compared. These tools have different characteristics, functionality and capabilities.
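As a rough illustration of the K-Nearest Neighbour idea listed in the introduction (store all cases, then label a new case by its closest stored cases), the following is a minimal sketch in plain Python. It is not the implementation used by WEKA or Orange; the function names, the Euclidean distance, the value k = 3 and the toy data are all assumptions made for the example.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest stored cases."""
    # Rank every stored case by its distance to the query point.
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    nearest_labels = [label for _, label in ranked[:k]]
    # The most frequent label among the k neighbours wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-feature toy data (not the heart disease dataset).
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(train_points, train_labels, (1.1, 1.0)))  # low
print(knn_predict(train_points, train_labels, (5.1, 5.0)))  # high
```

The choice of distance function and of k strongly affects the result, which may be one reason why the same algorithm can score differently in two tools with different default settings.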

ISSN: 2231-5381 http://www.ijettjournal.org Page 30


Researchers can use these tools according to the requirements of their research activities. The tools are continuously upgraded with new features as the needs of users change. Dealing with the complexity of huge data remains difficult.

The rest of the paper is organised as follows: Section II describes open source software, Section III presents the comparative study of the WEKA and Orange tools on the basis of a parametric comparison and an experimental analysis, and Section IV discusses the conclusions and future scope.

II. OPEN SOURCE SOFTWARE

Open source software is computer software whose source code is publicly available to users under a license. Under this license, the copyright holder permits users to use the software, inspect and modify it, and distribute it to anyone. Open source software is cheap and flexible because it is developed by a community rather than by a single programmer. Common open-source licenses are the GNU General Public License (GPL) (GNU.org, 2015a; GNU.org, 2015b), the Mozilla Public License (MPL), the Berkeley Software Distribution (BSD) license, the Netscape Public License (NPL) and the Lesser General Public License (LGPL) [10].

Many open-source data mining tools are available, such as KNIME, RapidMiner, Orange, WEKA and R. These tools bundle a set of techniques and algorithms that support better data analytics. Researchers can use them for classification, clustering and visualization of data, as well as for regression analysis and predictive analytics. Each tool comes with its own functionality to help users with their work. In this paper, the WEKA and Orange tools are described.

A. WEKA: WEKA is a popular toolkit for learning and applying machine learning algorithms. It was originally developed at the University of Waikato in New Zealand. WEKA supports data pre-processing. Attribute selection is a particularly interesting feature of WEKA, as it can improve the effectiveness and accuracy of models built on the selected data. WEKA offers the following interfaces: a command-line interface (CLI), the Explorer, the Experimenter, the Knowledge Flow and the Workbench. The Explorer is used to define the data source and for data preparation, algorithm selection and visualization. The Experimenter helps compare different algorithms on the same dataset.

In WEKA, secondary data can be used for analysis. A researcher can apply an algorithm to a dataset and analyse the results to make decisions about the data; the trained model can also generate predictions for new instances. Although the tool supports many model evaluation metrics, it lacks many data survey and visualization methods [6]. WEKA is oriented more towards classification and regression and less towards descriptive statistics and clustering methods. Support for big data and semi-supervised learning in WEKA is limited [11]. WEKA is freely available for download. Popular features of WEKA are shown in Figure 1.

As the figure shows, the best-known features of WEKA are as follows. It is an open source data mining tool written in Java. It is easy for beginners to understand and use, and it can run and compare several algorithms. It supports different data mining activities, including data preprocessing, clustering, classification, association rules and knowledge discovery. WEKA has a number of built-in features that make it easy to use. A researcher can use it for analysis without knowledge of programming.

• Open source data mining tool
• Java-based tool
• Platform independent
• Data preprocessing, classification rules, regression, clustering, association rules, visualization, feature selection and improved knowledge discovery
• No programming or coding required
• Provides access to SQL databases
• Provides various machine learning algorithms for data mining tasks

Figure 1: Features of WEKA


B. Orange

Orange is also a freely available open source data mining package. It is useful for exploratory data analysis and visualization, and it provides a platform for selecting among different experiments. Orange is very effective where innovation, reliability or quality is involved [12],[13]. Orange Canvas provides a visual programming interface that gives a well-structured view of the different features. These features are depicted in Figure 2.

Figure 2: Features of the Orange tool (visualizations, classification, unsupervised learning, regression, evaluation, prototype implementations, visualization using Qt, association)
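Orange's visual programming style can be pictured as a chain of widgets in which the output of one widget becomes the input of the next. The sketch below imitates that idea in plain Python. The `Widget` and `Workflow` classes, the three example widgets and the toy rows are hypothetical names and data chosen for illustration; they are not part of Orange's API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class Widget:
    """Hypothetical stand-in for a canvas widget."""
    name: str
    description: str            # short self-description shown to the user
    run: Callable[[Any], Any]   # maps the widget's input to its output


@dataclass
class Workflow:
    """Hypothetical canvas: widgets are executed in the order they were connected."""
    widgets: List[Widget] = field(default_factory=list)

    def connect(self, widget: Widget) -> "Workflow":
        self.widgets.append(widget)
        return self

    def execute(self, initial_input: Any = None) -> Any:
        data = initial_input
        for widget in self.widgets:
            data = widget.run(data)  # output of one widget feeds the next
        return data


# Toy widgets: load rows, classify with a fixed threshold, evaluate accuracy.
load_rows = Widget(
    "File", "Provides (feature, label) rows",
    lambda _: [(0.2, "no"), (0.9, "yes"), (0.7, "yes"), (0.1, "no")],
)
threshold_model = Widget(
    "Threshold classifier", "Predicts 'yes' when the feature exceeds 0.5",
    lambda rows: [(label, "yes" if x > 0.5 else "no") for x, label in rows],
)
accuracy = Widget(
    "Evaluate", "Share of predictions that match the true label",
    lambda pairs: sum(t == p for t, p in pairs) / len(pairs),
)

workflow = Workflow().connect(load_rows).connect(threshold_model).connect(accuracy)
result = workflow.execute()
print(result)  # 1.0
```

A linear chain is the simplest case; a real canvas also allows one widget's output to feed several downstream widgets.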

This figure depicts the main features of Orange Data Mining: data visualization, classification, evaluation, unsupervised learning, association, visualization using Qt and prototype implementations. Orange's cross-platform interface is built with Qt, a UI framework for applications that can be programmed in C++ or in a CSS- and JavaScript-like language. Orange's workflow is represented visually with widgets, for example for reading a file or training an SVM classifier. Every widget is self-explanatory, i.e. a short description of the widget is shown within the interface. To build a program, widgets are first placed on the canvas and then their inputs and outputs are connected. Orange offers fewer widgets than some other tools.

III. COMPARATIVE ANALYSIS

The activity of a tool is measured by the frequency of its updates and the time of its latest update. Whenever two tools are compared, it is necessary to compare them both parametrically and experimentally; only then can reliable results be achieved. Accordingly, the parametric comparison is presented first, followed by an analysis of the experimental results.

A. Parametric Comparison: In the parametric comparison, the characteristics of the tools are taken from previously available sources. These characteristics are listed in Table I. Some characteristics are common to both tools; for example, both provide a graphical user interface (GUI) and a command line [18], [19].

Table I: General Characteristics of the Open-Source DM Tools WEKA and Orange

Parameters | WEKA | Orange
Developer | University of Waikato, New Zealand | University of Ljubljana, Slovenia
Source | http://www.cs.waikato.ac.nz/ml/weka/ | http://orange.biolab.si
Programming language | Java | C++, Python
Release date | 1993 | 1996
License | GNU General Public License | Open source, GNU GPLv3
Availability | Open source | Open source
Current version | 3.8 | 3.24.1
Areas | Machine learning, data visualization, time series analysis, text mining, fraud detection | Marketing, direct mail, financial services, manufacturing, health care, military
Portability | Cross-platform | Cross-platform
GUI/Command line | Both | Both

B. Technical comparison of WEKA and Orange

To compare the tools technically, both free data mining and knowledge discovery tools were first downloaded. The dataset to be used was then specified and several classification algorithms were selected to test the performance of the tools. Precision and recall are among the most popular evaluation metrics for a model, and they are used for the comparison in this paper.

1) Precision: Precision is the positive predictive value, i.e. the fraction of instances predicted as positive that are actually positive. It is defined as the average probability of relevant retrieval.

Precision = True positives / (True positives + False positives)

2) Recall: Recall is the average probability of complete retrieval, i.e. the fraction of actual positives that are correctly identified.

Recall = True positives / (True positives + False negatives)

Figure 3: Precision and Recall [15]

Data set: The Heart Disease dataset from the UCI Machine Learning Repository is used for this work; the Cleveland heart disease dataset is selected for the study. It has 303 instances and 76 attributes.

The comparison between the tools is shown in Table II and Table III.

Table II: Comparative study of the WEKA and Orange tools (precision metric)

Classifier | WEKA (%) | Orange (%)
Naïve Bayes | 83.7 | 82.4
Random Forest | 81.8 | 77.9
k-nearest neighbour | 75.3 | 58.0
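The precision and recall formulas given above can be sketched in a few lines of Python. This is only an illustration of the definitions: the label names and the ten example predictions are hypothetical and are not taken from the heart disease experiments, and the tools themselves may average these metrics over classes in ways not shown here.

```python
def confusion_counts(actual, predicted, positive_label):
    """Count true positives, false positives and false negatives for one class."""
    tp = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive_label and a == positive_label:
            tp += 1
        elif p == positive_label:
            fp += 1
        elif a == positive_label:
            fn += 1
    return tp, fp, fn


def precision(true_positives, false_positives):
    """Precision = TP / (TP + FP); returns 0.0 when nothing was predicted positive."""
    predicted_positive = true_positives + false_positives
    return true_positives / predicted_positive if predicted_positive else 0.0


def recall(true_positives, false_negatives):
    """Recall = TP / (TP + FN); returns 0.0 when there are no actual positives."""
    actual_positive = true_positives + false_negatives
    return true_positives / actual_positive if actual_positive else 0.0


# Hypothetical labels for ten patients (illustration only).
actual = ["sick", "sick", "sick", "sick", "healthy",
          "healthy", "healthy", "healthy", "healthy", "healthy"]
predicted = ["sick", "sick", "sick", "healthy", "sick",
             "healthy", "healthy", "healthy", "healthy", "healthy"]

tp, fp, fn = confusion_counts(actual, predicted, "sick")
print(precision(tp, fp))  # 0.75
print(recall(tp, fn))     # 0.75
```

Here one sick patient is missed (a false negative) and one healthy patient is flagged (a false positive), so both metrics come out at 3/4.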


Table III: Comparative study of the WEKA and Orange tools (recall metric)

Classifier | WEKA (%) | Orange (%)
Naïve Bayes | 83.7 | 80.6
Random Forest | 81.9 | 73.4
k-nearest neighbour | 75.2 | 54.7

When the dimensionality of the input data is high, the Naïve Bayes classifier is well suited. Naïve Bayes is particularly applicable in artificial intelligence. For the heart disease dataset, Naïve Bayes in Orange achieves a precision of 82.4% and a recall of 80.6%, while in WEKA it achieves a precision of 83.7% and a recall of 83.7%. WEKA therefore gives better precision and recall than Orange for the Naïve Bayes classifier. The same holds for the Random Forest and k-nearest neighbour classifiers. For Random Forest, Orange achieves a precision of 77.9% and a recall of 73.4%, while WEKA achieves a precision of 81.8% and a recall of 81.9%. For k-nearest neighbour, Orange achieves a precision of 58.0% and a recall of 54.7%, while WEKA achieves a precision of 75.3% and a recall of 75.2%.

IV. CONCLUSION AND FUTURE STUDY
This paper presents a study of two open source data mining tools, WEKA and Orange, along with their features. Both tools have their own merits and demerits. The paper compares the tools through experimental analysis and through their parameters. The comparative study is based on specific datasets and algorithms, so the results may vary with different datasets or algorithms. The comparative analysis is helpful for learning about and selecting data mining tools for a given area. From the experimental study, it is concluded that WEKA performs better than Orange. WEKA has the most desired features of a fully functional and user-friendly platform for classification problems, so WEKA can be recommended for classification problems in data mining. In future work, different datasets and different problems, such as clustering and association rule mining, will be addressed using these tools.

ACKNOWLEDGEMENTS
The authors are thankful to http://archive.ics.uci.edu/ml/datasets/heart+Disease for providing the dataset.

REFERENCES

[1] Jović, A., Brkić, K., & Bogunović, N. (2014). "An overview of free software tools for general data mining". Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2014 37th International Convention, (May), 26–30. Retrieved from http://www.zemris.fer.hr/~ajovic/articles/MIPRO 2014_final.pdf
[2] Alcalá-Fdez, J., Sánchez, L., & García, S. (2009). "KEEL: a software tool to assess evolutionary algorithms for data mining problems". Soft Computing. Retrieved from http://link.springer.com/article/10.1007/s00500-008-0323-y
[3] Collier, K., Ph, D., Carey, B., & Marjaniemi, C. (1999). "A Methodology for Evaluating and Selecting Data Mining Software". Keywords: Data Mining, Tool Evaluation, Knowledge Discovery, 00(c), 1–11.
[4] Sonnenburg, S., Braun, M., & Ong, C. (2007). "The need for open source software in machine learning", 8, 2443–2466. Retrieved from http://researchcommons.waikato.ac.nz/handle/10289/3928
[5] Chen, X., Ye, Y., Williams, G., & Xu, X. (2007). "A survey of open source data mining systems". Emerging Technologies in Knowledge Discovery and Data Mining, (60603066), 3–14. Retrieved from http://link.springer.com/chapter/10.1007/978-3-540-770183_2
[6] Jović, A., Brkić, K., & Bogunović, N. (2014). "An overview of free software tools for general data mining". Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2014 37th International Convention, (May), 26–30. Retrieved from http://www.zemris.fer.hr/~ajovic/articles/MIPRO 2014_final.pdf
[7] Kalpana Rangra, K. L. Bansal, "Comparative Study of Data Mining Tools", International Journal of Advanced Research in Computer Science and Software Engineering, Volume 4, Issue 6, 2014.
[8] Anil Sharma, Balrajpreet Kaur, "A Research Review on Comparative Analysis of Data Mining Tools, Techniques and Parameters", ISSN No. 0976-5697, International Journal of Advanced Research in Computer Science, Volume 8, No. 7, July–August 2017.
[9] I. H. Witten, E. Frank, M. A. Hall, "Data Mining: Practical Machine Learning Tools and Techniques", 3rd ed., Morgan Kaufmann, Elsevier: USA, 2011.
[10] Predictive Analytics [Online]. Available from: http://www.predictiveanalyticstoday.com/top-softwarefor-text-analysis-text-mining-text-analytics/
[11] Jović, A., Brkić, K., & Bogunović, N. "An overview of free software tools for general data mining". Information and Communication Technology, Electronics and Microelectronics (MIPRO), 2014 37th International Convention, (May), 26–30. Retrieved from http://www.zemris.fer.hr/~ajovic/articles/MIPRO 2014_final.pdf
[12] http://www.kdnuggets.com/2015/12/top-7-newfeatures-orange-3.html/2
[13] Orange Data Mining, "Orange Data Mining Library Documentation Release 3".
[14] http://orange.biolab.si/
[15] http://Precision%20and%20recall%20-%20Wikipedia.PDF
[16] M. Hall, E. Frank, G. Holmes, B. Reutemann, I. H. Witten, "The WEKA Data Mining Software: An Update," SIGKDD Explorations, 2009.
[17] A. Wahbeh, "A Comparison Study between Data Mining Tools over some Classification Methods," International Journal of Artificial Intelligence, 2012.
[18] Swasti Singhal, Monika Jena, "A Study on WEKA Tool for Data Preprocessing, Classification and Clustering", International Journal of Innovative Technology and Exploring Engineering (IJITEE), Volume 2, Issue 6, 2013.
[19] http://www.ionos.com>digitalguide
[20] http://www.google.com
[21] Venkateswarlu Pynam, R Roje Spanadna, Kolli Srikanth, "An Extensive Study of Data Analysis Tools (Rapid Miner, Weka, R Tool, Knime, Orange)", SSRG International Journal of Computer Science and Engineering (SSRG-IJCSE), Volume 5, Issue 9, September 2018, ISSN: 2348-8387, pp. 4-11.
[22] http://opensourceforu.com/2017/03/top-10-open-source-datamining-tools/
[23] Nurdatillah Hasim, Norhaidah Abu Haris, "A Study of Open-Source Data Mining Tools for Forecasting", IMCOM '15, January 08-10 2015, Bali, Indonesia.
[24] Witten, I. H., & Eibe, F. (2005), "Data Mining: Practical Machine Learning Tools and Techniques", (2nd ed., p. 525).
[25] Sonnenburg, S., Braun, M., & Ong, C., "The need for open source software in machine learning", 8, 2443–2466. 2007. Retrieved from http://researchcommons.waikato.ac.nz/handle/10289/3928.
[26] 12 data mining tools and techniques [Online]. Available: https://www.invensis.net/blog/data-processing/12-datamining-tools-techniques.
[27] A. Kumar, et al., "Data mining: various issues and challenges for future," IJETA, 2014.
[28] H. Nasereddin, "New Technique to Deal with Dynamic Data Mining in the Database," IJRRAS, December 2012.
[29] J. Demšar and B. Zupan, "Orange: Data Mining Fruitful and Fun - A Historical Perspective", 2012.
[30] C. Shah, A. Jivani, "Comparison of data mining classification algorithms for breast cancer prediction", 4th ICCCNT, IEEE, 2013.
[31] P. Kakkar, A. Parashar, "Comparison of different clustering Algorithm using WEKA tool", International Journal of Advanced Research in Technology, Engineering and Science, 2014.
[32] N. Chauhan and N. Gautam, "Parametric comparison of data mining tools," IJATES, 2015.
[33] A. Gupta, N. Chetty, S. Shukla, "A classification method to classify High Dimensional data", IEEE, 2015.

No ratings yet
Emtech M6 Module
6 pages
Lab - Configure Windows Firewall: Part 1: Create and Share A Folder On PC-1
No ratings yet
Lab - Configure Windows Firewall: Part 1: Create and Share A Folder On PC-1
3 pages
Number System Aptitude Test Questions - Concepts Formulas and Tricks
No ratings yet
Number System Aptitude Test Questions - Concepts Formulas and Tricks
7 pages

ISSN: 2231-5381 http://www.ijettjournal.org Page 30

II. OPEN SOURCE SOFTWARE

In WEKA software, secondary data can be used to

Open source data mining tool

Java Based Tool

Data preprocessing, Classification rules, regression, Clustering, association rules,

No programming and coding language required

Provide access to SQL databases.

It provides various machine learning algorithms for data mining tasks.

Figure 1: Features of WEKA


B. Orange

or quality is involved [12], [13]. Basically Orange

Figure 2: Features of Orange tool

III. COMPARATIVE ANALYSIS

Parameters | WEKA | Orange
Company Name | University of Waikato | University of Ljubljana
Source | http://www.cs.waikato.ac.nz/ml/weka/ | http://orange.biolab.si
Programming language | Java | C++, Python
Release date | 1993 | 1996
License | GNU General Public License | Open-source, GNU GPLv3
Availability | Open Source | Open Source
Current Version | 3.8 | 3.24.1
Areas | Machine learning, Data visualization, Marketing, Direct Mail Financial
Portability | Cross Platform | Cross Platform
GUI/Command line | Both | Both

B. Technical comparison of WEKA and Orange

To make a technical comparison between these tools,

Classifier | WEKA (%) | Orange (%)
Naïve Bayes | 83.7 | 82.4
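
The percentages above can be read as the share of test instances a classifier labels correctly, assuming they report classification accuracy (the surviving text does not name the metric). A minimal Python sketch of that calculation follows; the function name `accuracy_percent` and the sample labels are hypothetical and are not taken from the paper, WEKA, or Orange.

```python
def accuracy_percent(actual, predicted):
    """Return the percentage of predictions that match the true labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one labelled instance is required")
    # Count positions where the predicted label equals the true label.
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return 100.0 * correct / len(actual)


# Hypothetical labels, invented purely for illustration (not the paper's data).
actual = ["yes", "no", "yes", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "no"]
print(round(accuracy_percent(actual, predicted), 1))
```

Under this reading, a figure such as 83.7 would mean that roughly 837 of every 1,000 evaluated instances were labelled correctly by the tool's classifier.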
