Classification of Multivariate Data Sets Without Missing Values Using Memory Based Classifiers - An Effectiveness Evaluation
CLASSIFICATION OF MULTIVARIATE DATA SETS WITHOUT MISSING VALUES USING MEMORY BASED CLASSIFIERS - AN EFFECTIVENESS EVALUATION
C. Lakshmi Devasena
Department of Computer Science and Engineering, Sphoorthy Engineering College, Hyderabad, India
[email protected]
ABSTRACT
Classification is the practice of allocating a given piece of input to one of a set of known categories. It is a crucial machine learning technique, and many classification problems arise in different application areas and need to be solved. Different types of classification algorithms, such as memory-based, tree-based and rule-based ones, are widely used. This work evaluates the performance of different memory based classifiers for the classification of multivariate data sets without missing values, taken from the UCI machine learning repository, using an open source machine learning tool. A comparison of the different memory based classifiers used, and a practical guideline for selecting the most suited algorithm for a classification task, are presented. Apart from that, some pragmatic criteria for describing and evaluating the classifiers are discussed.
KEYWORDS
Classification, IB1 Classifier, IBk Classifier, K Star Classifier, LWL Classifier
1. INTRODUCTION
In machine learning, classification refers to an algorithmic process for designating a given input to one among a set of given categories. An example would be assigning a given program to the "private" or "public" class. An algorithm that implements classification is known as a classifier. The input data can be termed an instance, and the categories are known as classes. The characteristics of an instance are described by a vector of features, which can be nominal, ordinal, integer-valued or real-valued. Many data mining algorithms work only in terms of nominal data and require that real or integer-valued data be converted into groups. Classification is a supervised procedure that learns to classify new instances based on the knowledge learnt from a previously classified training set of instances. The equivalent unsupervised procedure is known as clustering; it entails grouping data into classes based on an inherent similarity measure. Classification and clustering are instances of the more general problem of pattern recognition. In machine learning, classification systems induced from empirical data (examples) are first of all rated by their predictive accuracy. In practice, however, the interpretability or transparency of a classifier is often important as well. This work evaluates the effectiveness of memory-based classifiers in classifying multivariate data sets that contain no missing values.
DOI : 10.5121/ijaia.2013.4110
International Journal of Artificial Intelligence & Applications (IJAIA), Vol.4, No.1, January 2013
2. LITERATURE REVIEW
In [1], the performance of the Fuzzy C-Means (FCM) clustering algorithm is compared with that of the Hard C-Means (HCM) algorithm on the Iris flower data set; the authors conclude that fuzzy clustering is well suited to handling issues related to understanding pattern types, incomplete/noisy data, mixed information and human interaction, and can afford fairly accurate solutions faster. In [6], the issues of determining an appropriate number of clusters and of visualizing the strength of the clusters are addressed using the Iris data set.
3. DATA SET
The IRIS flower data set is one of the classic multivariate data sets, created by Sir Ronald Aylmer Fisher [3] in 1936. The IRIS data set consists of 150 instances of three different types of Iris plant, namely Iris setosa, Iris virginica and Iris versicolor, each with 50 instances. The length and width of the sepal and petal were measured from each sample of the three selected species of Iris flower. The four features measured and used to classify the type of plant are Sepal Length, Sepal Width, Petal Length and Petal Width [4]; the classification of the plant is made based on the combination of these four features. The other multivariate data sets selected for the performance evaluation of memory-based classifiers are the Car Evaluation, Glass Identification and Balance Scale data sets from the UCI Machine Learning Repository [8]. The Car Evaluation data set has six attributes (Buying Price, Maintenance Price, Number of Doors, Capacity, Size of Luggage Boot and Estimated Safety of the car) and consists of 1728 instances of four different classes. The Glass Identification data set has nine attributes (Refractive Index, Sodium, Potassium, Magnesium, Aluminium, Calcium, Silicon, Barium and Iron content) and consists of 214 instances of seven different classes, namely Building Windows Float Processed Glass, Vehicle Windows Float Processed Glass, Building Windows Non-Float Processed Glass, Vehicle Windows Non-Float Processed Glass, Containers Non-Window Glass, Tableware Non-Window Glass and Headlamps Non-Window Glass. The Balance Scale data set contains four attributes (Left Weight, Left Distance, Right Weight and Right Distance) and 625 instances.
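For reference, the four data sets described above can be captured in a small lookup structure (a sketch; the field names are my own, the attribute lists and counts are taken from the text):

```python
# Summary of the four UCI datasets used in the evaluation.
datasets = {
    "Iris": {
        "attributes": ["Sepal Length", "Sepal Width", "Petal Length", "Petal Width"],
        "instances": 150, "classes": 3},
    "Car Evaluation": {
        "attributes": ["Buying Price", "Maintenance Price", "Number of Doors",
                       "Capacity", "Size of Luggage Boot", "Estimated Safety"],
        "instances": 1728, "classes": 4},
    "Glass Identification": {
        "attributes": ["Refractive Index", "Sodium", "Potassium", "Magnesium",
                       "Aluminium", "Calcium", "Silicon", "Barium", "Iron"],
        "instances": 214, "classes": 7},
    "Balance Scale": {
        "attributes": ["Left Weight", "Left Distance", "Right Weight", "Right Distance"],
        "instances": 625, "classes": 3},
}

for name, info in datasets.items():
    print(f"{name}: {len(info['attributes'])} attributes, {info['instances']} instances")
```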
4. CLASSIFIERS USED
Different memory based classifiers are evaluated to find their effectiveness in classifying the chosen data sets. The classifiers evaluated here are the IB1, IBk, K Star and LWL classifiers.
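The simplest of these, IB1, predicts the class of the single closest training instance. A minimal sketch of that nearest-neighbour idea under plain Euclidean distance (WEKA's actual IB1 also normalizes attribute ranges, which this sketch omits):

```python
import math

def ib1_predict(train, query):
    """train: list of (feature_tuple, label) pairs.
    Returns the label of the training instance nearest to query."""
    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    # pick the (distance, label) pair with the smallest distance
    _, label = min((euclidean(x, query), y) for x, y in train)
    return label

# Tiny illustrative training set in the Iris feature space.
train = [((5.1, 3.5, 1.4, 0.2), "Iris-setosa"),
         ((7.0, 3.2, 4.7, 1.4), "Iris-versicolor"),
         ((6.3, 3.3, 6.0, 2.5), "Iris-virginica")]
print(ib1_predict(train, (5.0, 3.4, 1.5, 0.2)))  # → Iris-setosa
```

IBk generalizes this to the k closest instances with majority voting; K Star replaces the distance with an entropic measure, and LWL fits a locally weighted model around the query.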
The probability function P* is defined as the probability of all paths from instance a to instance b:

    P*(b|a) = sum of p(t) over all transformations t that map a to b        (2)

The K* function is then defined as

    K*(b|a) = -log2 P*(b|a)        (3)
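A toy numeric sketch of Equations (2) and (3): P* sums the probabilities of the individual transformation paths, and K* is the negative base-2 logarithm of that sum (the path probabilities below are made-up illustrative values):

```python
import math

def p_star(path_probs):
    """P*(b|a): sum of the probabilities of all paths from a to b."""
    return sum(path_probs)

def k_star(path_probs):
    """K*(b|a) = -log2 P*(b|a), the entropic 'distance'."""
    return -math.log2(p_star(path_probs))

# Two hypothetical paths from a to b, with probabilities 1/4 and 1/8:
print(k_star([0.25, 0.125]))  # -log2(0.375) ≈ 1.415
```

Note that because P*(a|a) < 1 in general (the identity path does not carry all the probability), K*(a|a) is non-zero, as the text below observes.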
K* is not strictly a distance function. For example, K*(a|a) is in general non-zero, and the function (as emphasized by the | notation) is not symmetric. Although possibly counter-intuitive, the lack of these properties does not interfere with the development of the K* algorithm below. The following properties are provable:

    K*(b|a) >= 0        (5)

    K*(c|b) + K*(b|a) >= K*(c|a)        (6)
    C = sum over i of (x_i^T β - y_i)^2        (7)

This process has a physical interpretation. The strengths of the springs are equal in the unweighted case, and the position of the hyperplane minimizes the sum of the energy stored in the springs (Equation 8). We will ignore a factor of 1/2 in all our energy calculations to simplify notation. The stored energy in the springs in this case is C of Equation 7, which is minimized by the physical process.
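The spring-energy view can be sketched directly: the criterion C of Equation 7 is just the summed squared residuals of the linear fit, i.e. the stored energy with the 1/2 factor dropped, as in the text:

```python
def criterion(X, beta, y):
    """C = sum_i (x_i . beta - y_i)^2: the unweighted criterion,
    read as the total energy stored in the residual 'springs'."""
    total = 0.0
    for xi, yi in zip(X, y):
        residual = sum(b * x for b, x in zip(beta, xi)) - yi
        total += residual ** 2
    return total

# Three points on the line y = x, with the constant 1 appended to each input:
X = [(1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]
y = [1.0, 2.0, 3.0]
print(criterion(X, (1.0, 0.0), y))  # exact fit: zero stored energy → 0.0
print(criterion(X, (1.0, 1.0), y))  # shifting the plane stretches every spring → 3.0
```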
    x_i^T β = y_i        (9)
In what follows we will assume that the constant 1 has been appended to all the input vectors x_i to include a constant term in the regression. The training data points can be collected in a matrix equation:

    Xβ = y        (10)
where X is a matrix whose ith row is x_i^T and y is a vector whose ith element is y_i. Thus, the dimensionality of X is n x d, where n is the number of training data points and d is the dimensionality of x. Estimating the parameters β using an unweighted regression minimizes the criterion C of Equation 7 [7], by solving the normal equations

    (X^T X)β = X^T y        (11)

for β:

    β = (X^T X)^-1 X^T y        (12)
Inverting the matrix X^T X is not the numerically best way to solve the normal equations, from the point of view of either efficiency or accuracy, and usually other matrix techniques are used to solve Equation 11.
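As a sketch (assuming NumPy), the explicit inverse of Equation 12 and a numerically preferable least-squares solver give the same coefficients on well-conditioned data:

```python
import numpy as np

# Training inputs with the constant 1 appended (Equation 10: X beta = y).
X = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0]])
y = np.array([2.1, 3.9, 6.0])

# Equation 12: explicit inverse of X^T X (works, but numerically fragile).
beta_inv = np.linalg.inv(X.T @ X) @ X.T @ y

# Preferred: let a least-squares routine solve X beta ≈ y directly.
beta_lsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(beta_inv, beta_lsq))  # → True
```

On ill-conditioned X the two can diverge badly, which is the text's point about avoiding the explicit inverse.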
The root mean-squared error is

    RMSE = sqrt( (sum over i of (a_i - c_i)^2) / n )        (15)

where a is the actual output and c is the expected output. The root mean-squared error is the commonly used measure for numeric prediction.
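The two error measures used throughout the evaluation can be sketched as follows (a = actual, c = expected, matching the text's notation):

```python
import math

def mae(actual, expected):
    """Mean absolute error."""
    return sum(abs(a - c) for a, c in zip(actual, expected)) / len(actual)

def rmse(actual, expected):
    """Root mean-squared error."""
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(actual, expected)) / len(actual))

# A perfect predictor (such as IB1 on these datasets) scores zero on both:
print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),
      rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # → 0.0 0.0
```

RMSE penalizes large individual errors more heavily than MAE, which is why both are reported side by side in the tables below.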
Data Set 1: Iris Dataset

Figure 1. Comparison based on Correctly Classified Instances Iris Dataset
Figure 2. Comparison based on MAE and RMSE values Iris Dataset

Table 4. Confusion Matrix for K* Classifier Iris Dataset

        A     B     C
  A    50     0     0
  B     0    50     0
  C     0     0    50
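Reading the classification accuracy off a confusion matrix like Table 4 is just the diagonal count over the total (a sketch; rows are actual classes, columns are predicted classes):

```python
def accuracy_pct(matrix):
    """Percentage of instances on the diagonal of a confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total

# Table 4: K* on the Iris dataset classifies all 150 instances correctly.
iris_kstar = [[50, 0, 0],
              [0, 50, 0],
              [0, 0, 50]]
print(accuracy_pct(iris_kstar))  # → 100.0
```

The same computation reproduces the accuracy figures reported for the other confusion matrices in this paper.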
Data Set 2: Car Evaluation Dataset

The performance of the memory based algorithms for the Car Evaluation Dataset in terms of Classification Accuracy, Time taken to test the Model, RMSE and MAE values is shown in Table 6. Comparison among the classifiers based on the correctly classified instances is shown in Fig. 3. Comparison among these classifiers based on MAE and RMSE values is shown in Fig. 4. The confusion matrices obtained for these classifiers are shown in Tables 7 to 10. The overall ranking is done based on the classification accuracy, MAE and RMSE values, and is given in Table 6. Based on the results obtained, the IB1 Classifier, with 100% accuracy and zero MAE and RMSE, got the first position in the ranking, followed by IBk, K Star and LWL, as shown in Table 6.

Table 6. Overall Results of Memory Based Classifiers CAR Dataset

Classifier   Correctly Classified   Classification   Time taken to       MAE   RMSE   Rank
             (Out of 1728)          Accuracy (%)     Test Model (sec)
IB1          1728                   100              0.62                0     0      1
IBk          1728                   100              0.62                -     -      2
K Star       1728                   100              3.49                -     -      3
LWL          1210                   70.02            2.72                -     -      4
Table 8. Confusion Matrix for IBk Classifier CAR Dataset

        A      B     C     D
  A   1210     0     0     0
  B      0   384     0     0
  C      0     0    69     0
  D      0     0     0    65
Table 9. Confusion Matrix for K Star Classifier CAR Dataset

        A      B     C     D
  A   1210     0     0     0
  B      0   384     0     0
  C      0     0    69     0
  D      0     0     0    65
Figure 3. Comparison based on Correctly Classified Instances CAR Dataset
Figure 4. Comparison based on MAE and RMSE values CAR Dataset

Table 10. Confusion Matrix for LWL Classifier CAR Dataset

        A      B     C     D
  A   1210     0     0     0
  B    384     0     0     0
  C     69     0     0     0
  D     65     0     0     0
A = Unacceptable, B = Acceptable, C = Good, D = Very Good

Data Set 3: Glass Identification Dataset
The performance of the memory based algorithms for the Glass Identification Dataset in terms of Classification Accuracy, Time taken to test the Model, RMSE and MAE values is shown in Table 11. Comparison among the classifiers based on the correctly classified instances is shown in Fig. 5. Comparison among these classifiers based on MAE and RMSE values is shown in Fig. 6. The confusion matrices obtained for these classifiers are shown in Tables 12 to 15. The overall ranking is done based on the classification accuracy, Time taken to test the Model, MAE and RMSE values. Based on the results obtained, the IB1 Classifier, with 100% accuracy and zero MAE and RMSE, got the first position in the ranking, followed by IBk, K Star and LWL, as shown in Table 11.

Table 11. Overall Results of Memory Based Classifiers Glass Dataset

Classifier   Correctly Classified   Classification   Time taken to       MAE   RMSE   Rank
             (Out of 214)           Accuracy (%)     Test Model (sec)
IB1          214                    100              0.08                0     0      1
IBk          214                    100              0.08                -     -      2
K Star       214                    100              0.70                -     -      3
LWL          97                     45.33            0.47                -     -      4
Figure 5. Comparison based on Correctly Classified Instances Glass Dataset

Figure 6. Comparison based on MAE and RMSE values Glass Dataset
A = Building Windows Float Processed, B = Building Windows Non-Float Processed, C = Vehicle Windows Float Processed, D = Vehicle Windows Non-Float Processed, E = Containers, F = Tableware, G = Headlamps
Data Set 4: Balance Scale Dataset
The performance of the memory based algorithms for the Balance Scale Dataset in terms of Classification Accuracy, Time taken to test the Model, RMSE and MAE values is shown in Table 16. Comparison among the classifiers based on the correctly classified instances is shown in Fig. 7. Comparison among these classifiers based on MAE and RMSE values is shown in Fig. 8. The confusion matrices obtained for these classifiers are shown in Tables 17 to 20.

Table 16. Overall Results of Memory Based Classifiers Balance Scale Dataset

Classifier   Correctly Classified   Classification   Time taken to       MAE   RMSE   Rank
             (Out of 625)           Accuracy (%)     Test Model (sec)
IB1          625                    100              0.3                 0     0      1
IBk          625                    100              0.3                 -     -      2
K Star       589                    94.24            0.62                -     -      3
LWL          352                    56.32            0.78                -     -      4
The overall ranking is done based on the classification accuracy, Time taken to test the Model, MAE and RMSE values. Based on the results obtained, the IB1 Classifier, with 100% accuracy and zero MAE and RMSE, got the first position in the ranking, followed by IBk, K Star and LWL, as shown in Table 16.
Figure 7. Comparison based on Number of Instances Correctly Classified Balance Scale Dataset
Figure 8. Comparison based on MAE and RMSE values Balance Scale Dataset
Table 17. Confusion Matrix for IB1 Classifier Balance Scale Dataset

        A     B     C
  A   288     0     0
  B     0    49     0
  C     0     0   288
Table 18. Confusion Matrix for IBk Classifier Balance Scale Dataset

        A     B     C
  A   288     0     0
  B     0    49     0
  C     0     0   288
Table 19. Confusion Matrix for K Star Classifier Balance Scale Dataset

        A     B     C
  A   288     0     0
  B    12    13    24
  C     0     0   288
Table 20. Confusion Matrix for LWL Classifier Balance Scale Dataset

        A     B     C
  A   176     0   112
  B    23     0    26
  C   112     0   176
7. CONCLUSIONS
In this performance evaluation work, memory based classifiers were tested to estimate their classification accuracy on multivariate data sets without missing values, using the Iris, Glass Identification, Balance Scale, Car Evaluation and Congressional Voting Records data sets. The experiments were done using an open source machine learning tool. The performance of the classifiers was measured and the results were compared. Among the four classifiers (IB1, IBk, K Star and LWL), the IB1 Classifier performs best on this classification problem. The IBk, K Star and LWL classifiers get the successive ranks based on classification accuracy and the other evaluation measures.
ACKNOWLEDGEMENTS
The author thanks the Management of Sphoorthy Engineering College and the faculty of the CSE Department for their cooperation.
REFERENCES

[1] Pawan Kumar and Deepika Sirohi, "Comparative Analysis of FCM and HCM Algorithm on Iris Data Set", International Journal of Computer Applications, Vol. 5, No. 2, pp. 33-37, August 2010.
[2] David Benson-Putnins, Margaret Monfardin, Meagan E. Magnoni and Daniel Martin, "Spectral Clustering and Visualization: A Novel Clustering of Fisher's Iris Data Set".
[3] R. A. Fisher, "The Use of Multiple Measurements in Taxonomic Problems", Annals of Eugenics, 7, pp. 179-188, 1936.
[4] Patrick S. Hoey, "Statistical Analysis of the Iris Flower Dataset".
[5] M. Kuramochi and G. Karypis, "Gene Classification Using Expression Profiles: A Feasibility Study", International Journal on Artificial Intelligence Tools, 14(4), pp. 641-660, 2005.
[6] John G. Cleary and Leonard E. Trigg, "K*: An Instance-based Learner Using an Entropic Distance Measure".
[7] Christopher G. Atkeson, Andrew W. Moore and Stefan Schaal, "Locally Weighted Learning", October 1996.
[8] UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/datasets.
Authors
C. Lakshmi Devasena has completed an MCA and an M.Phil. and is pursuing a Ph.D. She has nine years of teaching experience and two years of industrial experience. Her areas of research interest are image processing, medical image analysis, cryptography and data mining. She has published 16 papers in international journals and 12 papers in proceedings of international and national conferences, and has presented 30 papers at national and international conferences. At present, she is working as an Associate Professor at Sphoorthy Engineering College, Hyderabad, AP.