DWDM Record With Alignment

1. Demonstration of preprocessing on dataset student.arff

Aim: This experiment illustrates some of the basic data preprocessing operations that can
be performed using WEKA-Explorer. The sample dataset used for this example is the
student data available in arff format.

Procedure ( Using Explorer)

Step 1: Loading the data. We can load the dataset into WEKA by clicking the Open file button in the Preprocess tab and selecting the appropriate file.

Step 2: Once the data is loaded, WEKA will recognize the attributes, and during the scan of the data it will compute some basic statistics on each attribute. The left panel in the above figure shows the list of recognized attributes, while the top panel indicates the names of the base relation (or table) and the current working relation (which are the same initially).

Step 3: Clicking on an attribute in the left panel will show the basic statistics for that attribute. For categorical attributes the frequency of each attribute value is shown, while for continuous attributes we can obtain the minimum, maximum, mean, standard deviation, etc.

Step 4: The visualization panel at the bottom right shows the selected attribute in the form of a cross-tabulation across two attributes.

Note: we can select another attribute using the dropdown list.


Step 5: Selecting or filtering attributes.
Removing an attribute: when we need to remove an attribute, we can do this by using the attribute filters in WEKA. In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters. Scroll down the list and select the “weka.filters.unsupervised.attribute.Remove” filter.

Step 6:

a) Next click the textbox immediately to the right of the Choose button. In the resulting dialog box enter the index of the attribute to be filtered out.

b) Make sure that the invert selection option is set to false, then click OK. In the filter box you will see “Remove -R 7”.


c) Click the Apply button to apply the filter to this data. This will remove the attribute and create a new working relation.

d) Save the new working relation as an ARFF file by clicking the Save button on the top button panel (student.arff).

Discretization

1) Sometimes association rule mining can only be performed on categorical data. This requires performing discretization on numeric or continuous attributes. In the following example let us discretize the age attribute.

➢ Let us divide the values of the age attribute into three bins (intervals).
➢ First load the dataset into WEKA (student.arff) and select the age attribute.
➢ Activate the filter dialog box and select “weka.filters.unsupervised.attribute.Discretize” from the list.
➢ To change the defaults for the filter, click on the box immediately to the right of the Choose button.
➢ We enter the index of the attribute to be discretized. In this case the attribute is age, so we must enter ‘1’, corresponding to the age attribute.
➢ Enter ‘3’ as the number of bins. Leave the remaining field values as they are and click the OK button.
➢ Click Apply in the filter panel. This will result in a new working relation with the selected attribute partitioned into 3 bins.
➢ Save the new working relation in a file called student-data-discretized.arff (a scripted equivalent of these steps is sketched below).
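The same preprocessing can also be scripted against WEKA's Java API instead of being clicked through the Explorer. The following is a minimal sketch, not part of the recorded procedure: it assumes weka.jar is on the classpath, that student.arff is in the working directory, and that the attribute indices (7 for Remove, 1 for Discretize, taken from the steps above) match the file being processed.

import java.io.File;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;
import weka.filters.unsupervised.attribute.Remove;

public class PreprocessStudent {
    public static void main(String[] args) throws Exception {
        // Load the dataset (the equivalent of Open file in the Preprocess tab)
        Instances data = new DataSource("student.arff").getDataSet();

        // Remove the attribute at index 7, equivalent to "Remove -R 7";
        // adjust the index to the attribute you actually want removed
        Remove remove = new Remove();
        remove.setAttributeIndices("7");
        remove.setInvertSelection(false);
        remove.setInputFormat(data);
        Instances reduced = Filter.useFilter(data, remove);

        // Discretize attribute 1 (age) into 3 bins, as in the list above
        Discretize discretize = new Discretize();
        discretize.setAttributeIndices("1");
        discretize.setBins(3);
        discretize.setInputFormat(reduced);
        Instances discretized = Filter.useFilter(reduced, discretize);

        // Save the new working relation (the Save button in the Explorer)
        ArffSaver saver = new ArffSaver();
        saver.setInstances(discretized);
        saver.setFile(new File("student-data-discretized.arff"));
        saver.writeBatch();
    }
}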

DATASET STUDENT.ARFF


@relation student
@attribute age {<30,30-40,>40}
@attribute income {low, medium, high}
@attribute student {yes, no}
@attribute credit-rating {fair, excellent}
@attribute buyspc {yes, no}
@data
%

<30, high, no, fair, no


<30, high, no, excellent, no
30-40, high, no, fair, yes
>40, medium, no, fair, yes
>40, low, yes, fair, yes
>40, low, yes, excellent, no
30-40, low, yes, excellent, yes
<30, medium, no, fair, no
<30, low, yes, fair, no
>40, medium, yes, fair, yes
<30, medium, yes, excellent, yes
30-40, medium, no, excellent, yes
30-40, high, yes, fair, yes
>40, medium, no, excellent, no

The following screenshot shows the effect of discretization.


OUTPUT

Result:
This program has been successfully executed.
2. Demonstration of preprocessing on dataset labor.arff

Aim: This experiment illustrates some of the basic data preprocessing operations that can
be performed using WEKA-Explorer. The sample dataset used for this example is the
labor data available in arff format.

Procedure ( Using Explorer)

Step 1: Loading the data. We can load the dataset into WEKA by clicking the Open file button in the Preprocess tab and selecting the appropriate file.

Step 2: Once the data is loaded, WEKA will recognize the attributes, and during the scan of the data it will compute some basic statistics on each attribute. The left panel in the above figure shows the list of recognized attributes, while the top panel indicates the names of the base relation (or table) and the current working relation (which are the same initially).

Step 3: Clicking on an attribute in the left panel will show the basic statistics for that attribute. For categorical attributes the frequency of each attribute value is shown, while for continuous attributes we can obtain the minimum, maximum, mean, standard deviation, etc.

Step 4: The visualization panel at the bottom right shows the selected attribute in the form of a cross-tabulation across two attributes.
Note: we can select another attribute using the dropdown list.

Step 5: Selecting or filtering attributes.

Removing an attribute: when we need to remove an attribute, we can do this by using the attribute filters in WEKA. In the Filter panel, click on the Choose button. This will show a popup window with a list of available filters.

Scroll down the list and select the “weka.filters.unsupervised.attribute.Remove” filter.


Step 6: a) Next click the textbox immediately to the right of the Choose button. In the resulting dialog box enter the index of the attribute to be filtered out.

b) Make sure that the invert selection option is set to false, then click OK. In the filter box you will see “Remove -R 7”.
c) Click the Apply button to apply the filter to this data. This will remove the attribute and create a new working relation.

d) Save the new working relation as an ARFF file by clicking the Save button on the top button panel (labor.arff).
Discretization

1) Sometimes association rule mining can only be performed on categorical data. This requires performing discretization on numeric or continuous attributes. In the following example let us discretize the duration attribute.

➢ Let us divide the values of the duration attribute into bins (intervals).
➢ First load the dataset into WEKA (labor.arff) and select the duration attribute.
➢ Activate the filter dialog box and select “weka.filters.unsupervised.attribute.Discretize” from the list.
➢ To change the defaults for the filter, click on the box immediately to the right of the Choose button.
➢ We enter the index of the attribute to be discretized. In this case the attribute is duration, so we must enter ‘1’, corresponding to the duration attribute.
➢ Enter ‘1’ as the number of bins. Leave the remaining field values as they are.
➢ Click the OK button.
➢ Click Apply in the filter panel. This will result in a new working relation with the selected attribute partitioned into 1 bin.
➢ Save the new working relation in a file called labor-data-discretized.arff (a scripted equivalent is sketched below).

Dataset labor.arff

The following screenshot shows the effect of discretization.
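As with the previous experiment, these steps can be scripted. A minimal sketch using the filter's option string, assuming weka.jar is on the classpath; "-R 1 -B 1" mirrors the attribute index and bin count entered in the dialog above.

import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Discretize;

public class DiscretizeLabor {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("labor.arff").getDataSet();

        // -R 1 selects the duration attribute, -B 1 requests one bin,
        // matching the values entered in the filter dialog above
        Discretize discretize = new Discretize();
        discretize.setOptions(Utils.splitOptions("-R 1 -B 1"));
        discretize.setInputFormat(data);
        Instances discretized = Filter.useFilter(data, discretize);

        // Print a summary of the new working relation
        System.out.println(discretized.toSummaryString());
    }
}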

Result:

This program has been successfully executed.


3. Demonstration of Association rule process on dataset contactlenses.arff using Apriori algorithm

Aim:
This experiment illustrates some of the basic elements of association rule mining using WEKA. The sample dataset used for this example is contactlenses.arff.

Procedure ( Using Explorer)

Step 1: Open the data file in WEKA Explorer. It is presumed that the required data fields have been discretized. In this example it is the age attribute.
Step 2: Clicking on the Associate tab will bring up the interface for the association rule algorithm.

Step 3: We will use the Apriori algorithm. This is the default algorithm.

Step 4: In order to change the parameters for the run (e.g. support, confidence, etc.) we click on the text box immediately to the right of the Choose button.
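The same run can be reproduced programmatically. A minimal sketch, assuming weka.jar is on the classpath and contactlenses.arff is in the working directory; the parameter values mirror the defaults used in the run below (10 rules, minimum confidence 0.9).

import weka.associations.Apriori;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class AprioriContactLenses {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("contactlenses.arff").getDataSet();

        // Ask for 10 rules with minimum confidence 0.9, as in the run below
        Apriori apriori = new Apriori();
        apriori.setNumRules(10);
        apriori.setMinMetric(0.9);
        apriori.buildAssociations(data);

        // Prints the same report that the Explorer shows
        System.out.println(apriori);
    }
}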
Dataset contactlenses.arff
The following output shows the association rules that were generated when the Apriori algorithm is applied on the given dataset.

OUTPUT
=== Run information ===

Scheme: weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1


Relation: contact-lenses
Instances: 24
Attributes: 5
age
spectacle-prescrip
astigmatism
tear-prod-rate
contact-lenses
=== Associator model (full training set) ===

Apriori
=======
Minimum support: 0.2 (5 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 16
Generated sets of large itemsets:
Size of set of large itemsets L(1): 11
Size of set of large itemsets L(2): 21
Size of set of large itemsets L(3): 6

Best rules found:


1. tear-prod-rate=reduced 12 ==> contact-lenses=none 12 <conf:(1)> lift:(1.6) lev:(0.19) [4]
conv: (4.5)
2. spectacle-prescrip=myope tear-prod-rate=reduced 6 ==> contact-lenses=none 6 <conf:(1)>
lift:(1.6) lev:(0.09) [2] conv:(2.25)
3. spectacle-prescrip=hypermetrope tear-prod-rate=reduced 6 ==> contact-lenses=none 6
<conf:(1)> lift:(1.6) lev:(0.09) [2] conv:(2.25)
4. astigmatism=no tear-prod-rate=reduced 6 ==> contact-lenses=none 6 <conf:(1)> lift:(1.6)
lev:(0.09) [2] conv:(2.25)
5. astigmatism=yes tear-prod-rate=reduced 6 ==> contact-lenses=none 6 <conf:(1)> lift:(1.6)
lev:(0.09) [2] conv:(2.25)
6. contact-lenses=soft 5 ==> astigmatism=no 5 <conf:(1)> lift:(2) lev:(0.1) [2] conv:(2.5)
7. contact-lenses=soft 5 ==> tear-prod-rate=normal 5 <conf:(1)> lift:(2) lev:(0.1) [2]
conv:(2.5)
8. tear-prod-rate=normal contact-lenses=soft 5 ==> astigmatism=no 5 <conf:(1)> lift:(2)
lev:(0.1) [2] conv:(2.5)
9. astigmatism=no contact-lenses=soft 5 ==> tear-prod-rate=normal 5 <conf:(1)> lift:(2)
lev:(0.1) [2] conv:(2.5)
10. contact-lenses=soft 5 ==> astigmatism=no tear-prod-rate=normal 5 <conf:(1)> lift:(4)
lev:(0.16) [3] conv:(3.75)

Procedure ( Using Knowledge Flow)

Step1: Open Knowledge flow. Load sample training data set using arff loader.
Step2: Select Apriori Icon from Association tab, connect it with arff loader using dataset.
Step3: Select text viewer from visualization and connect it with apriori icon using text.
Step4: Run the flow using run icon.
Step5: Right Click the text viewer icon and select show results to view the results.
Knowledge Flow Diagram

OUTPUT

=== Associator model ===


Scheme: Apriori
Relation: contact-lenses
Apriori
=======
Minimum support: 0.2 (5 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 16
Generated sets of large itemsets:
Size of set of large itemsets L(1): 11
Size of set of large itemsets L(2): 21
Size of set of large itemsets L(3): 6
Best rules found:

1. tear-prod-rate=reduced 12 ==> contact-lenses=none 12 <conf:(1)> lift:(1.6) lev:(0.19) [4]


conv:(4.5)
2. spectacle-prescrip=myope tear-prod-rate=reduced 6 ==> contact-lenses=none 6 <conf:(1)>
lift:(1.6) lev:(0.09) [2] conv:(2.25)
3. spectacle-prescrip=hypermetrope tear-prod-rate=reduced 6 ==> contact-lenses=none 6
<conf:(1)> lift:(1.6) lev:(0.09) [2] conv:(2.25)
4. astigmatism=no tear-prod-rate=reduced 6 ==> contact-lenses=none 6 <conf:(1)> lift:(1.6)
lev:(0.09) [2] conv:(2.25)
5. astigmatism=yes tear-prod-rate=reduced 6 ==> contact-lenses=none 6 <conf:(1)> lift:(1.6)
lev:(0.09) [2] conv:(2.25)
6. contact-lenses=soft 5 ==> astigmatism=no 5 <conf:(1)> lift:(2) lev:(0.1) [2] conv:(2.5)
7. contact-lenses=soft 5 ==> tear-prod-rate=normal 5 <conf:(1)> lift:(2) lev:(0.1) [2] conv:(2.5)
8. tear-prod-rate=normal contact-lenses=soft 5 ==> astigmatism=no 5 <conf:(1)> lift:(2)
lev:(0.1) [2] conv:(2.5)
9. astigmatism=no contact-lenses=soft 5 ==> tear-prod-rate=normal 5 <conf:(1)> lift:(2)
lev:(0.1) [2] conv:(2.5)
10. contact-lenses=soft 5 ==> astigmatism=no tear-prod-rate=normal 5 <conf:(1)> lift:(4)
lev:(0.16) [3] conv:(3.75)

Result:
Thus the Apriori algorithm is implemented for itemsets and transactions, and the rules are generated successfully.
4. Demonstration of Association rule process on dataset weather.arff using Apriori algorithm

Aim:

This experiment illustrates some of the basic elements of association rule mining using WEKA. The sample dataset used for this example is weather.arff.

Procedure ( Using Explorer)

Step 1: Open the data file in WEKA Explorer. It is presumed that the required data fields have been discretized. In this example it is the temperature attribute.

Step 2: Clicking on the Associate tab will bring up the interface for the association rule algorithm.

Step 3: We will use the Apriori algorithm. This is the default algorithm.

Step 4: In order to change the parameters for the run (e.g. support, confidence, etc.) we click on the text box immediately to the right of the Choose button.
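Alternatively, the full option string from the Scheme line of the run information can be passed directly. A minimal sketch, assuming weka.jar is on the classpath and weather.arff is in the working directory:

import weka.associations.Apriori;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class AprioriWeather {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("weather.arff").getDataSet();

        // The same options as the Scheme line in the output below
        Apriori apriori = new Apriori();
        apriori.setOptions(Utils.splitOptions(
                "-N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1"));
        apriori.buildAssociations(data);
        System.out.println(apriori);
    }
}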

Dataset weather.arff
@relation weather.symbolic

@attribute outlook {sunny, overcast, rainy}

@attribute temperature {hot, mild, cool}

@attribute humidity {high, normal}

@attribute windy {TRUE, FALSE}

@attribute play {yes, no}

@data

sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no

overcast,hot,high,FALSE,yes

rainy,mild,high,FALSE,yes

rainy,cool,normal,FALSE,yes

rainy,cool,normal,TRUE,no

overcast,cool,normal,TRUE,yes

sunny,mild,high,FALSE,no

sunny,cool,normal,FALSE,yes

rainy,mild,normal,FALSE,yes

sunny,mild,normal,TRUE,yes

overcast,mild,high,TRUE,yes

overcast,hot,normal,FALSE,yes

rainy,mild,high,TRUE,no

The following output shows the association rules that were generated when the Apriori algorithm is applied on the given dataset.

OUTPUT

=== Run information ===


Scheme: weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0 -c -1
Relation: weather
Instances: 14
Attributes: 5
outlook
temperature
humidity
windy
play
=== Associator model (full training set) ===
Apriori
=======
Minimum support: 0.15 (2 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 17

Generated sets of large itemsets:


Size of set of large itemsets L(1): 12
Size of set of large itemsets L(2): 47
Size of set of large itemsets L(3): 39
Size of set of large itemsets L(4): 6
Best rules found:
1. outlook=overcast 4 ==> play =yes 4 <conf:(1)> lift:(1.56) lev:(0.1) [1] conv:(1.43)
2. temperature=cool 4 ==> humidity=normal 4 <conf:(1)> lift:(2) lev:(0.14) [2] conv:(2)
3. humidity=normal windy=FALSE 4 ==> play =yes 4 <conf:(1)> lift:(1.56) lev:(0.1) [1]
conv:(1.43)
4. outlook=sunny play =no 3 ==> humidity=high 3 <conf:(1)> lift:(2) lev:(0.11) [1] conv:(1.5)
5. outlook=sunny humidity=high 3 ==> play =no 3 <conf:(1)> lift:(2.8) lev:(0.14) [1]
conv:(1.93)
6. outlook=rainy play =yes 3 ==> windy=FALSE 3 <conf:(1)> lift:(1.75) lev:(0.09) [1]
conv:(1.29)
7. outlook=rainy windy=FALSE 3 ==> play =yes 3 <conf:(1)> lift:(1.56) lev:(0.08) [1]
conv:(1.07)
8. temperature=cool play =yes 3 ==> humidity=normal 3 <conf:(1)> lift:(2) lev:(0.11) [1]
conv:(1.5)
9. outlook=sunny temperature=hot 2 ==> humidity=high 2 <conf:(1)> lift:(2) lev:(0.07) [1]
conv:(1)
10. temperature=hot play =no 2 ==> outlook=sunny 2 <conf:(1)> lift:(2.8) lev:(0.09) [1]
conv:(1.29)

Procedure ( Using Knowledge Flow)


Step1: Open Knowledge flow. Load sample training data set using arff loader.
Step2: Select Apriori Icon from Association tab, connect it with arff loader using dataset.
Step3: Select text viewer from visualization and connect it with apriori icon using text.
Step4: Run the flow using run icon.
Step5: Right Click the text viewer icon and select show results to view the results.
Knowledge Flow Diagram

OUTPUT

=== Associator model ===

Scheme: Apriori

Relation: weather

Apriori

=======

Minimum support: 0.15 (2 instances)

Minimum metric <confidence>: 0.9

Number of cycles performed: 17

Generated sets of large itemsets:


Size of set of large itemsets L(1): 12

Size of set of large itemsets L(2): 47

Size of set of large itemsets L(3): 39

Size of set of large itemsets L(4): 6

Best rules found:

1. outlook=overcast 4 ==> play =yes 4 <conf:(1)> lift:(1.56) lev:(0.1) [1] conv:(1.43)

2. temperature=cool 4 ==> humidity=normal 4 <conf:(1)> lift:(2) lev:(0.14) [2] conv:(2)

3. humidity=normal windy=FALSE 4 ==> play =yes 4 <conf:(1)> lift:(1.56) lev:(0.1) [1]


conv:(1.43)

4. outlook=sunny play =no 3 ==> humidity=high 3 <conf:(1)> lift:(2) lev:(0.11) [1] conv:(1.5)

5. outlook=sunny humidity=high 3 ==> play =no 3 <conf:(1)> lift:(2.8) lev:(0.14) [1]


conv:(1.93)

6. outlook=rainy play =yes 3 ==> windy=FALSE 3 <conf:(1)> lift:(1.75) lev:(0.09) [1]


conv:(1.29)

7. outlook=rainy windy=FALSE 3 ==> play =yes 3 <conf:(1)> lift:(1.56) lev:(0.08) [1]


conv:(1.07)

8. temperature=cool play =yes 3 ==> humidity=normal 3 <conf:(1)> lift:(2) lev:(0.11) [1]


conv:(1.5)

9. outlook=sunny temperature=hot 2 ==> humidity=high 2 <conf:(1)> lift:(2) lev:(0.07) [1]


conv:(1)

10. temperature=hot play =no 2 ==> outlook=sunny 2 <conf:(1)> lift:(2.8) lev:(0.09) [1]
conv:(1.29)

Result:
Thus the Apriori algorithm is implemented for itemsets and transactions, and the rules are generated successfully.
5. Demonstration of classification rule process on dataset student.arff using ID3/J48 algorithm

Aim:

This experiment illustrates the use of the J48 classifier in WEKA. The sample data set used in this experiment is the “student” data available in ARFF format. This document assumes that appropriate data preprocessing has been performed.

Procedure ( Using Explorer)

Steps involved in this experiment:

Step 1: We begin the experiment by loading the data (student.arff) into WEKA.

Step 2: Next we select the “classify” tab and click the “choose” button to select the “J48” classifier.

Step 3: Now we specify the various parameters. These can be specified by clicking in the text box to the right of the Choose button. In this example, we accept the default values. The default version does perform some pruning but does not perform error pruning.

Step 4: Under the “Test options” in the main panel we select 10-fold cross-validation as our evaluation approach. Since we don’t have a separate evaluation data set, this is necessary to get a reasonable idea of the accuracy of the generated model.

Step 5: We now click “Start” to generate the model. The ASCII version of the tree as well as the evaluation statistics will appear in the right panel when model construction is complete.

Step 6: Note the classification accuracy of the model (about 87% in the output below); if it is low, this indicates that more work may be needed (either in preprocessing or in selecting the parameters for classification).

Step 7: WEKA also lets us view a graphical version of the classification tree. This can be done by right-clicking the last result set and selecting “visualize tree” from the pop-up menu.

Step-8: We will use our model to classify the new instances.

Step 9: In the main panel, under “Test options”, click the “Supplied test set” radio button and then click the “Set” button. This will pop up a window which allows you to open the file containing the test instances.
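The model building and 10-fold cross-validation of Steps 3 to 5 can also be done through WEKA's Java API. A minimal sketch, assuming weka.jar is on the classpath and that the last attribute (BUYSPC) is the class:

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48Student {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("student.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);   // BUYSPC is the class

        // Default J48 settings, as in Step 3 (-C 0.25 -M 2)
        J48 tree = new J48();
        tree.buildClassifier(data);
        System.out.println(tree);   // the ASCII version of the pruned tree

        // 10-fold cross-validation, as in Step 4
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
    }
}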
Student Data set

AGE INCOME STUDENT CREDIT-RATING BUYSPC

<30 HIGH NO FAIR NO

<30 HIGH NO EXCELLENT NO

30-40 HIGH NO EXCELLENT YES

>40 MEDIUM NO FAIR YES

>40 LOW YES FAIR YES

>40 LOW YES EXCELLENT NO

>40 LOW YES EXCELLENT NO

30-40 LOW YES EXCELLENT YES

<30 MEDIUM NO FAIR NO

<30 LOW YES FAIR NO

>40 MEDIUM YES FAIR YES

<30 MEDIUM YES FAIR YES

30-40 MEDIUM NO FAIR YES

30-40 HIGH YES EXCELLENT YES

>40 MEDIUM NO EXCELLENT NO

Dataset student.arff

@relation STUDENT

@attribute AGE {' <30',<30,30-40,>40}


@attribute INCOME {HIGH,MEDIUM,LOW}
@attribute STUDENT {NO,'NO ',YES}
@attribute CREDUT-RATING {FAIR,EXCELLENT}
@attribute BUYSPC {NO,YES}
@data
' <30',HIGH,NO,FAIR,NO
<30,HIGH,'NO ',EXCELLENT,NO
30-40,HIGH,NO,EXCELLENT,YES
>40,MEDIUM,NO,FAIR,YES
>40,LOW,YES,FAIR,YES
>40,LOW,YES,EXCELLENT,NO
>40,LOW,YES,EXCELLENT,NO
30-40,LOW,YES,EXCELLENT,YES
<30,MEDIUM,NO,FAIR,NO
<30,LOW,YES,FAIR,NO
>40,MEDIUM,YES,FAIR,YES
<30,MEDIUM,YES,FAIR,YES
30-40,MEDIUM,NO,FAIR,YES
30-40,HIGH,YES,EXCELLENT,YES
>40,MEDIUM,NO,EXCELLENT,NO

The following output shows the classification rules that were generated when the J48 algorithm is applied on the given dataset.

OUTPUT
=== Run information ===
Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2
Relation: student
Instances: 15
Attributes: 5
AGE
INCOME
STUDENT
CREDUT-RATING
BUYSPC
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
J48 pruned tree
------------------
AGE = <30: NO (1.0)
AGE = <30: NO (4.0/1.0)
AGE = 30-40: YES (4.0)
AGE = >40
| CREDUT-RATING = FAIR: YES (3.0)
| CREDUT-RATING = EXCELLENT: NO (3.0)
Number of Leaves : 5
Size of the tree : 7
Time taken to build model: 0 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 13 86.6667 %
Incorrectly Classified Instances 2 13.3333 %
Kappa statistic 0.7321
Mean absolute error 0.1692
Root mean squared error 0.329
Relative absolute error 33.2913 %
Root relative squared error 64.5658 %
Total Number of Instances 15
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.857 0.125 0.857 0.857 0.857 0.732 0.902 0.814 NO
0.875 0.143 0.875 0.875 0.875 0.732 0.902 0.942 YES
Weighted Avg. 0.867 0.135 0.867 0.867 0.867 0.732 0.902 0.882
=== Confusion Matrix ===
a b <-- classified as
6 1 | a = NO
1 7 | b = YES
❖ Under the Result list, right-click the item to get the options as shown in the figure and select the “visualize tree” option.
❖ After selecting the “visualize tree” option, the output is represented as a tree in a separate window, as shown in the figure given below.

OUTPUT VISUALIZE TREE

Procedure ( Using Knowledge Flow)

Step1: Open Knowledge flow. Load sample training data set using arff loader.
Step2: Select class assigner from Evaluation, connect it with arff loader using dataset.
Step 3: Select cross validation fold maker Icon from Evaluation tab, connect it with Class
assigner using dataset.
Step4: Select j48 from Classifier tab and connect it with cross validation Fold maker using
test set and training set.
Step 5: Select Classifier performance evaluator from Evaluation tab and connect it with j48 using batch classifier.
Step 6: Select graph viewer from visualization and connect it with J48 icon using graph.
Step7: Run the flow using run icon.
Step8: Right Click the Graph viewer icon and select show results to view the results.

KNOWLEDGE FLOW DIAGRAM

OUTPUT

Result:
Thus the ID3/J48 classification algorithm is implemented for the Student dataset and the results are generated successfully.
6. Demonstration of classification rule process on dataset employee.arff using ID3/J48 algorithm
Aim:
This experiment illustrates the use of the ID3 classifier in WEKA. The sample data set used in this experiment is the “employee” data available in ARFF format. This document assumes that appropriate data preprocessing has been performed.

Procedure ( Using Explorer)

Steps involved in this experiment:

Step 1: We begin the experiment by loading the data (employee.arff) into WEKA.

Step 2: Next we select the “classify” tab and click the “choose” button to select the “ID3” classifier.

Step 3: Now we specify the various parameters. These can be specified by clicking in the text box to the right of the Choose button. In this example, we accept the default values. The default version does perform some pruning but does not perform error pruning.

Step 4: Under the “Test options” in the main panel we select 10-fold cross-validation as our evaluation approach. Since we don’t have a separate evaluation data set, this is necessary to get a reasonable idea of the accuracy of the generated model.

Step 5: We now click “Start” to generate the model. The ASCII version of the tree as well as the evaluation statistics will appear in the right panel when model construction is complete.

Step 6: Note the classification accuracy of the model (about 42% in the output below); this indicates that more work may be needed (either in preprocessing or in selecting the parameters for classification).

Step 7: WEKA also lets us view a graphical version of the classification tree. This can be done by right-clicking the last result set and selecting “visualize tree” from the pop-up menu.
Step 8: We will use our model to classify the new instances.

Step 9: In the main panel, under “Test options”, click the “Supplied test set” radio button and then click the “Set” button. This will show a pop-up window which allows you to open the file containing the test instances.
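Step 9's classification of supplied test instances can be scripted as well. A minimal sketch, assuming weka.jar is on the classpath and a recent WEKA release; the test file name employee-test.arff is hypothetical and stands in for whatever file holds the new instances.

import weka.classifiers.trees.J48;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class ClassifyEmployees {
    public static void main(String[] args) throws Exception {
        Instances train = new DataSource("employee.arff").getDataSet();
        train.setClassIndex(train.numAttributes() - 1);   // performance

        J48 tree = new J48();
        tree.buildClassifier(train);

        // Classify instances from a supplied test file (hypothetical name)
        Instances test = new DataSource("employee-test.arff").getDataSet();
        test.setClassIndex(test.numAttributes() - 1);
        for (Instance inst : test) {
            double label = tree.classifyInstance(inst);
            System.out.println(inst + " -> "
                    + test.classAttribute().value((int) label));
        }
    }
}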

DATASET EMPLOYEE.ARFF:

@relation employee

@attribute age {25, 27, 28, 29, 30, 35, 48}
@attribute salary {10k,15k,17k,20k,25k,30k,35k,32k,34k}

@attribute performance {good, avg, poor}

@data

25, 10k, poor

27, 15k, poor

27, 17k, poor

28, 17k, poor

29, 20k, avg

30, 25k, avg

29, 25k, avg

30, 20k, avg

35, 32k, good

48, 34k, good

48, 32k, good

The following output shows the classification rules that were generated when the J48 algorithm is applied on the given dataset.
OUTPUT

=== Run information ===

Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2


Relation: employee
Instances: 12
Attributes: 3
age
salary
performance
Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree


------------------

age <= 29: POOR (6.0/1.0)


age > 29: AVG (6.0/2.0)

Number of Leaves : 2
Size of the tree : 3
Time taken to build model: 0.01 seconds

=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 5 41.6667 %


Incorrectly Classified Instances 7 58.3333 %
Kappa statistic 0.0345
Mean absolute error 0.354
Root mean squared error 0.5424
Relative absolute error 79.0079 %
Root relative squared error 111.4352 %
Total Number of Instances 12

=== Detailed Accuracy By Class ===


TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.400 0.286 0.500 0.400 0.444 0.120 0.714 0.583 POOR
0.600 0.571 0.429 0.600 0.500 0.029 0.543 0.527 AVG
0.000 0.100 0.000 0.000 0.000 -0.135 0.525 0.208 GOOD
Weighted Avg. 0.417 0.374 0.387 0.417 0.394 0.039 0.611 0.497

=== Confusion Matrix ===

a b c <-- classified as
2 3 0 | a = POOR
1 3 1 | b = AVG
1 1 0 | c = GOOD

Procedure ( Using Knowledge Flow)

Step1: Open Knowledge flow. Load employee.arff training data set using arff loader.
Step2: Select class assigner from Evaluation, connect it with arff loader using dataset.
Step 3: Select cross validation fold maker Icon from Evaluation tab, connect it with Class
assigner using dataset.
Step4: Select j48 from Classifier tab and connect it with cross validation Fold maker using
test set and training set.
Step 5: Select Classifier performance evaluator from Evaluation tab and connect it with j48 using batch classifier.
Step 6: Select text viewer from visualization and connect it with Classifier performance
evaluator icon using text.
Step7: Run the flow using run icon.
Step8: Right Click the text viewer icon and select show results to view the results.
KNOWLEDGE FLOW DIAGRAM

OUTPUT
The following screenshot shows the classification rules that were generated when id3
algorithm is applied on the given dataset.

Result:
Thus the ID3/J48 classification algorithm is implemented for the employee dataset and the results are generated successfully.
7. Demonstration of classification rule process on dataset labour.arff using ID3/J48 algorithm

Aim:
This experiment illustrates the use of the ID3 classifier in WEKA. The sample data set used in this experiment is the “labour” data available in ARFF format. This document assumes that appropriate data preprocessing has been performed.

Procedure ( Using Explorer)


Steps involved in this experiment:
Step 1: We begin the experiment by loading the data (labour.arff) into WEKA.

Step 2: Next we select the “classify” tab and click the “choose” button to select the “ID3” classifier.

Step 3: Now we specify the various parameters. These can be specified by clicking in the text box to the right of the Choose button. In this example, we accept the default values. The default version does perform some pruning but does not perform error pruning.

Step 4: Under the “Test options” in the main panel we select 10-fold cross-validation as our evaluation approach. Since we don’t have a separate evaluation data set, this is necessary to get a reasonable idea of the accuracy of the generated model.

Step 5: We now click “Start” to generate the model. The ASCII version of the tree as well as the evaluation statistics will appear in the right panel when model construction is complete.

Step 6: Note the classification accuracy of the model (about 74% in the output below); this indicates that more work may be needed (either in preprocessing or in selecting the parameters for classification).
Step 7: WEKA also lets us view a graphical version of the classification tree. This can be done by right-clicking the last result set and selecting “visualize tree” from the pop-up menu.

Step 8: We will use our model to classify the new instances.

Step 9: In the main panel, under “Test options”, click the “Supplied test set” radio button and then click the “Set” button. This will show a pop-up window which allows you to open the file containing the test instances.
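For this dataset, the tree behind Step 7's “visualize tree” can also be exported programmatically: J48's graph() method returns the tree in GraphViz dot format, and J48 handles the '?' missing values in labour.arff natively. A minimal sketch, assuming weka.jar is on the classpath:

import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LabourTreeGraph {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("labour.arff").getDataSet();
        data.setClassIndex(data.numAttributes() - 1);   // class {bad, good}

        J48 tree = new J48();
        tree.buildClassifier(data);

        // The same tree structure the Explorer renders under "visualize tree",
        // printed in GraphViz dot format
        System.out.println(tree.graph());
    }
}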

DATASET LABOUR.ARFF

% Date: Tue, 15 Nov 88 15:44:08 EST


% From: stan <[email protected]>
% To: [email protected]
%
% 1. Title: Final settlements in labor negotitions in Canadian industry
%
% 2. Source Information
% -- Creators: Collective Barganing Review, montly publication,
% Labour Canada, Industrial Relations Information Service,
% Ottawa, Ontario, K1A 0J2, Canada, (819) 997-3117
% The data includes all collective agreements reached
% in the business and personal services sector for locals
% with at least 500 members (teachers, nurses, university
% staff, police, etc) in Canada in 87 and first quarter of 88.
% -- Donor: Stan Matwin, Computer Science Dept, University of Ottawa,
% 34 Somerset East, K1N 9B4, ([email protected])
% -- Date: November 1988
%
% 3. Past Usage:
% -- testing concept learning software, in particular
% an experimental method to learn two-tiered concept descriptions.
% The data was used to learn the description of an acceptable
% and unacceptable contract.
% The unacceptable contracts were either obtained by interviewing
% experts, or by inventing near misses.
% Examples of use are described in:
% Bergadano, F., Matwin, S., Michalski, R.,
% Zhang, J., Measuring Quality of Concept Descriptions,
% Procs. of the 3rd European Working Sessions on Learning,
% Glasgow, October 1988.
% Bergadano, F., Matwin, S., Michalski, R., Zhang, J.,
% Representing and Acquiring Imprecise and Context-dependent
% Concepts in Knowledge-based Systems, Procs. of ISMIS'88,
% North Holland, 1988.
% 4. Relevant Information:
% -- data was used to test 2tier approach with learning
% from positive and negative examples
%
% 5. Number of Instances: 57
%
% 6. Number of Attributes: 16
%
% 7. Attribute Information:
% 1. dur: duration of agreement
% [1..7]
% 2 wage1.wage : wage increase in first year of contract
% [2.0 .. 7.0]
% 3 wage2.wage : wage increase in second year of contract
% [2.0 .. 7.0]
% 4 wage3.wage : wage increase in third year of contract
% [2.0 .. 7.0]
% 5 cola : cost of living allowance
% [none, tcf, tc]
% 6 hours.hrs : number of working hours during week
% [35 .. 40]
% 7 pension : employer contributions to pension plan
% [none, ret_allw, empl_contr]
% 8 stby_pay : standby pay
% [2 .. 25]
% 9 shift_diff : shift differencial : supplement for work on II and III shift
% [1 .. 25]
% 10 educ_allw.boolean : education allowance
% [true false]
% 11 holidays : number of statutory holidays
% [9 .. 15]
% 12 vacation : number of paid vacation days
% [ba, avg, gnr]
% 13 lngtrm_disabil.boolean :
% employer's help during employee longterm disabil
% ity [true , false]
% 14 dntl_ins : employers contribution towards the dental plan
% [none, half, full]
% 15 bereavement.boolean : employer's financial contribution towards the
% covering the costs of bereavement
% [true , false]
% 16 empl_hplan : employer's contribution towards the health plan
% [none, half, full]
%
% 8. Missing Attribute Values: None
%
% 9. Class Distribution:
%
% 10. Exceptions from format instructions: no commas between attribute values.
%
%
@relation 'labor-neg-data'
@attribute 'duration' numeric
@attribute 'wage-increase-first-year' numeric
@attribute 'wage-increase-second-year' numeric
@attribute 'wage-increase-third-year' numeric
@attribute 'cost-of-living-adjustment' {'none','tcf','tc'}
@attribute 'working-hours' numeric
@attribute 'pension' {'none','ret_allw','empl_contr'}
@attribute 'standby-pay' numeric
@attribute 'shift-differential' numeric
@attribute 'education-allowance' {'yes','no'}
@attribute 'statutory-holidays' numeric
@attribute 'vacation' {'below_average','average','generous'}
@attribute 'longterm-disability-assistance' {'yes','no'}
@attribute 'contribution-to-dental-plan' {'none','half','full'}
@attribute 'bereavement-assistance' {'yes','no'}
@attribute 'contribution-to-health-plan' {'none','half','full'}
@attribute 'class' {'bad','good'}
@data
1,5,?,?,?,40,?,?,2,?,11,'average',?,?,'yes',?,'good'
2,4.5,5.8,?,?,35,'ret_allw',?,?,'yes',11,'below_average',?,'full',?,'full','good'
?,?,?,?,?,38,'empl_contr',?,5,?,11,'generous','yes','half','yes','half','good'
3,3.7,4,5,'tc',?,?,?,?,'yes',?,?,?,?,'yes',?,'good'
3,4.5,4.5,5,?,40,?,?,?,?,12,'average',?,'half','yes','half','good'
2,2,2.5,?,?,35,?,?,6,'yes',12,'average',?,?,?,?,'good'
3,4,5,5,'tc',?,'empl_contr',?,?,?,12,'generous','yes','none','yes','half','good'
3,6.9,4.8,2.3,?,40,?,?,3,?,12,'below_average',?,?,?,?,'good'
2,3,7,?,?,38,?,12,25,'yes',11,'below_average','yes','half','yes',?,'good'
1,5.7,?,?,'none',40,'empl_contr',?,4,?,11,'generous','yes','full',?,?,'good'
3,3.5,4,4.6,'none',36,?,?,3,?,13,'generous',?,?,'yes','full','good'
2,6.4,6.4,?,?,38,?,?,4,?,15,?,?,'full',?,?,'good'
2,3.5,4,?,'none',40,?,?,2,'no',10,'below_average','no','half',?,'half','bad'
3,3.5,4,5.1,'tcf',37,?,?,4,?,13,'generous',?,'full','yes','full','good'
1,3,?,?,'none',36,?,?,10,'no',11,'generous',?,?,?,?,'good'
2,4.5,4,?,'none',37,'empl_contr',?,?,?,11,'average',?,'full','yes',?,'good'
1,2.8,?,?,?,35,?,?,2,?,12,'below_average',?,?,?,?,'good'
1,2.1,?,?,'tc',40,'ret_allw',2,3,'no',9,'below_average','yes','half',?,'none','bad'
1,2,?,?,'none',38,'none',?,?,'yes',11,'average','no','none','no','none','bad'
2,4,5,?,'tcf',35,?,13,5,?,15,'generous',?,?,?,?,'good'
2,4.3,4.4,?,?,38,?,?,4,?,12,'generous',?,'full',?,'full','good'
2,2.5,3,?,?,40,'none',?,?,?,11,'below_average',?,?,?,?,'bad'
3,3.5,4,4.6,'tcf',27,?,?,?,?,?,?,?,?,?,?,'good'
2,4.5,4,?,?,40,?,?,4,?,10,'generous',?,'half',?,'full','good'
1,6,?,?,?,38,?,8,3,?,9,'generous',?,?,?,?,'good'
3,2,2,2,'none',40,'none',?,?,?,10,'below_average',?,'half','yes','full','bad'
2,4.5,4.5,?,'tcf',?,?,?,?,'yes',10,'below_average','yes','none',?,'half','good'
2,3,3,?,'none',33,?,?,?,'yes',12,'generous',?,?,'yes','full','good'
2,5,4,?,'none',37,?,?,5,'no',11,'below_average','yes','full','yes','full','good'
3,2,2.5,?,?,35,'none',?,?,?,10,'average',?,?,'yes','full','bad'
3,4.5,4.5,5,'none',40,?,?,?,'no',11,'average',?,'half',?,?,'good'
3,3,2,2.5,'tc',40,'none',?,5,'no',10,'below_average','yes','half','yes','full','bad'
2,2.5,2.5,?,?,38,'empl_contr',?,?,?,10,'average',?,?,?,?,'bad'
2,4,5,?,'none',40,'none',?,3,'no',10,'below_average','no','none',?,'none','bad'
3,2,2.5,2.1,'tc',40,'none',2,1,'no',10,'below_average','no','half','yes','full','bad'
2,2,2,?,'none',40,'none',?,?,'no',11,'average','yes','none','yes','full','bad'
1,2,?,?,'tc',40,'ret_allw',4,0,'no',11,'generous','no','none','no','none','bad'
1,2.8,?,?,'none',38,'empl_contr',2,3,'no',9,'below_average','yes','half',?,'none','bad'
3,2,2.5,2,?,37,'empl_contr',?,?,?,10,'average',?,?,'yes','none','bad'
2,4.5,4,?,'none',40,?,?,4,?,12,'average','yes','full','yes','half','good'
1,4,?,?,'none',?,'none',?,?,'yes',11,'average','no','none','no','none','bad'
2,2,3,?,'none',38,'empl_contr',?,?,'yes',12,'generous','yes','none','yes','full','bad'
2,2.5,2.5,?,'tc',39,'empl_contr',?,?,?,12,'average',?,?,'yes',?,'bad'
2,2.5,3,?,'tcf',40,'none',?,?,?,11,'below_average',?,?,'yes',?,'bad'
2,4,4,?,'none',40,'none',?,3,?,10,'below_average','no','none',?,'none','bad'
2,4.5,4,?,?,40,?,?,2,'no',10,'below_average','no','half',?,'half','bad'
2,4.5,4,?,'none',40,?,?,5,?,11,'average',?,'full','yes','full','good'
2,4.6,4.6,?,'tcf',38,?,?,?,?,?,?,'yes','half',?,'half','good'
2,5,4.5,?,'none',38,?,14,5,?,11,'below_average','yes',?,?,'full','good'
2,5.7,4.5,?,'none',40,'ret_allw',?,?,?,11,'average','yes','full','yes','full','good'
2,7,5.3,?,?,?,?,?,?,?,11,?,'yes','full',?,?,'good'
3,2,3,?,'tcf',?,'empl_contr',?,?,'yes',?,?,'yes','half','yes',?,'good'
3,3.5,4,4.5,'tcf',35,?,?,?,?,13,'generous',?,?,'yes','full','good'
3,4,3.5,?,'none',40,'empl_contr',?,6,?,11,'average','yes','full',?,'full','good'
3,5,4.4,?,'none',38,'empl_contr',10,6,?,11,'generous','yes',?,?,'full','good'
3,5,5,5,?,40,?,?,?,?,12,'average',?,'half','yes','half','good'
3,6,6,4,?,35,?,?,14,?,9,'generous','yes','full','yes','full','good'
%
%
%

The following output shows the classification rules that were generated when the J48 algorithm is applied on the given dataset.
OUTPUT

=== Run information ===

Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2

Relation: labor-neg-data

Instances: 57

Attributes: 17

duration

wage-increase-first-year

wage-increase-second-year

wage-increase-third-year

cost-of-living-adjustment

working-hours

pension

standby-pay

shift-differential

education-allowance

statutory-holidays

vacation

longterm-disability-assistance

contribution-to-dental-plan

bereavement-assistance

contribution-to-health-plan

class

Test mode: 10-fold cross-validation


=== Classifier model (full training set) ===

J48 pruned tree

------------------

wage-increase-first-year <= 2.5: bad (15.27/2.27)

wage-increase-first-year > 2.5

| statutory-holidays <= 10: bad (10.77/4.77)

| statutory-holidays > 10: good (30.96/1.0)

Number of Leaves : 3

Size of the tree :

Time taken to build model: 0.01 seconds

=== Stratified cross-validation ===

=== Summary ===

Correctly Classified Instances 42 73.6842 %

Incorrectly Classified Instances 15 26.3158 %

Kappa statistic 0.4415

Mean absolute error 0.3192

Root mean squared error 0.4669

Relative absolute error 69.7715 %

Root relative squared error 97.7888 %

Total Number of Instances 57

=== Detailed Accuracy By Class ===

TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC Area Class
0.700 0.243 0.609 0.700 0.651 0.444 0.695 0.559 bad

0.757 0.300 0.824 0.757 0.789 0.444 0.695 0.738 good

Weighted Avg. 0.737 0.280 0.748 0.737 0.740 0.444 0.695 0.675

=== Confusion Matrix ===

a b <-- classified as

14 6 | a = bad

9 28 | b = good

VISUALIZE TREE

Procedure ( Using Knowledge Flow)

Step1: Open Knowledge flow. Load labour.arff training data set using arff loader.
Step2: Select class assigner from Evaluation, connect it with arff loader using dataset.
Step 3: Select cross validation fold maker Icon from Evaluation tab, connect it with Class
assigner using dataset.
Step4: Select j48 from Classifier tab and connect it with cross validation Fold maker using
test set and training set.
Step 5: Select Classifier performance evaluator from Evaluation tab and connect it with j48 using batch classifier.
Step 6: Select graph viewer from visualization and connect it with J48 icon using graph.
Step7: Run the flow using run icon.
Step8: Right Click the Graph viewer icon and select show results to view the results.
KNOWLEDGE FLOW DIAGRAM

Output
The following screenshot shows the classification rules that were generated when id3
algorithm is applied on the given dataset.

Result:
Thus the ID3/J48 classification algorithm is implemented for the labour dataset and the results are generated successfully.
8. Demonstration of clustering rule process on dataset iris.arff using
simple k-means

Aim:
This experiment illustrates the use of simple k-means clustering with the WEKA Explorer. The sample data set used for this example is based on the iris data available in ARFF format. This document assumes that appropriate preprocessing has been performed. This iris dataset includes 150 instances.

PROCEDURE ( USING EXPLORER)

Steps involved in this Experiment

Step 1: Run the Weka explorer and load the data file iris.arff in preprocessing interface.

Step 2: In order to perform clustering, select the ‘cluster’ tab in the Explorer and click on the Choose button. This step results in a dropdown list of available clustering algorithms.

Step 3 : In this case we select ‘simple k-means’.

Step 4: Next click the text button to the right of the Choose button to get the popup window shown in the screenshots. In this window we enter the desired number of clusters (2 in this run) and we leave the value of the seed as it is. The seed value is used in generating a random number which is used for making the internal assignments of instances to clusters.

Step 5: Once the options have been specified, we run the clustering algorithm. In the ‘Cluster mode’ panel we make sure that the ‘Use training set’ option is selected, and then we click the ‘Start’ button. This process and the resulting window are shown in the following screenshots.

Step 6: The result window shows the centroid of each cluster as well as statistics on the number and the percentage of instances assigned to the different clusters. Here the cluster centroids are mean vectors for each cluster, and they can be used to characterize the clusters. For example, the centroid of cluster 1 shows that, for the class Iris-versicolor, the mean value of the sepal length is 5.4706, sepal width 2.4765, petal width 1.1294, and petal length 3.7941.

Step 7: Another way of understanding the characteristics of each cluster is through visualization. To do this, try right-clicking the result set in the Result list panel and selecting ‘Visualize cluster assignments’.
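The clustering run can be reproduced through WEKA's Java API. A minimal sketch, assuming weka.jar is on the classpath; the number of clusters and seed mirror the -N 2 and -S 10 settings in the run below, and no class index is set, so all five attributes take part in the clustering.

import weka.clusterers.ClusterEvaluation;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class KMeansIris {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("iris.arff").getDataSet();

        // -N 2 and -S 10, as in the run information below
        SimpleKMeans km = new SimpleKMeans();
        km.setNumClusters(2);
        km.setSeed(10);
        km.buildClusterer(data);
        System.out.println(km);   // centroids and within-cluster squared error

        // Evaluate on the training data, as in the Explorer's cluster mode
        ClusterEvaluation eval = new ClusterEvaluation();
        eval.setClusterer(km);
        eval.evaluateClusterer(data);
        System.out.println(eval.clusterResultsToString());
    }
}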

DATA SET

@RELATION iris

@ATTRIBUTE sepallength REAL

@ATTRIBUTE sepalwidth REAL

@ATTRIBUTE petallength REAL

@ATTRIBUTE petalwidth REAL

@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

@DATA

5.1,3.5,1.4,0.2,Iris-setosa

4.9,3.0,1.4,0.2,Iris-setosa

4.7,3.2,1.3,0.2,Iris-setosa

4.6,3.1,1.5,0.2,Iris-setosa

5.0,3.6,1.4,0.2,Iris-setosa

5.4,3.9,1.7,0.4,Iris-setosa

4.6,3.4,1.4,0.3,Iris-setosa

5.0,3.4,1.5,0.2,Iris-setosa

4.4,2.9,1.4,0.2,Iris-setosa

4.9,3.1,1.5,0.1,Iris-setosa

5.4,3.7,1.5,0.2,Iris-setosa

4.8,3.4,1.6,0.2,Iris-setosa
4.8,3.0,1.4,0.1,Iris-setosa

4.3,3.0,1.1,0.1,Iris-setosa

5.8,4.0,1.2,0.2,Iris-setosa

5.7,4.4,1.5,0.4,Iris-setosa

5.4,3.9,1.3,0.4,Iris-setosa

5.1,3.5,1.4,0.3,Iris-setosa

5.7,3.8,1.7,0.3,Iris-setosa

5.1,3.8,1.5,0.3,Iris-setosa

5.4,3.4,1.7,0.2,Iris-setosa

5.1,3.7,1.5,0.4,Iris-setosa

4.6,3.6,1.0,0.2,Iris-setosa

5.1,3.3,1.7,0.5,Iris-setosa

4.8,3.4,1.9,0.2,Iris-setosa

5.0,3.0,1.6,0.2,Iris-setosa

5.0,3.4,1.6,0.4,Iris-setosa

5.2,3.5,1.5,0.2,Iris-setosa

5.2,3.4,1.4,0.2,Iris-setosa

4.7,3.2,1.6,0.2,Iris-setosa

4.8,3.1,1.6,0.2,Iris-setosa

5.4,3.4,1.5,0.4,Iris-setosa

5.2,4.1,1.5,0.1,Iris-setosa

5.5,4.2,1.4,0.2,Iris-setosa

4.9,3.1,1.5,0.1,Iris-setosa

5.0,3.2,1.2,0.2,Iris-setosa

5.5,3.5,1.3,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa

4.4,3.0,1.3,0.2,Iris-setosa

5.1,3.4,1.5,0.2,Iris-setosa

5.0,3.5,1.3,0.3,Iris-setosa

4.5,2.3,1.3,0.3,Iris-setosa

4.4,3.2,1.3,0.2,Iris-setosa

5.0,3.5,1.6,0.6,Iris-setosa

5.1,3.8,1.9,0.4,Iris-setosa

4.8,3.0,1.4,0.3,Iris-setosa

5.1,3.8,1.6,0.2,Iris-setosa

4.6,3.2,1.4,0.2,Iris-setosa

5.3,3.7,1.5,0.2,Iris-setosa

5.0,3.3,1.4,0.2,Iris-setosa

7.0,3.2,4.7,1.4,Iris-versicolor

6.4,3.2,4.5,1.5,Iris-versicolor

6.9,3.1,4.9,1.5,Iris-versicolor

5.5,2.3,4.0,1.3,Iris-versicolor

6.5,2.8,4.6,1.5,Iris-versicolor

5.7,2.8,4.5,1.3,Iris-versicolor

6.3,3.3,4.7,1.6,Iris-versicolor

4.9,2.4,3.3,1.0,Iris-versicolor

6.6,2.9,4.6,1.3,Iris-versicolor

5.2,2.7,3.9,1.4,Iris-versicolor

5.0,2.0,3.5,1.0,Iris-versicolor

5.9,3.0,4.2,1.5,Iris-versicolor
6.0,2.2,4.0,1.0,Iris-versicolor

6.1,2.9,4.7,1.4,Iris-versicolor

5.6,2.9,3.6,1.3,Iris-versicolor

6.7,3.1,4.4,1.4,Iris-versicolor

5.6,3.0,4.5,1.5,Iris-versicolor

5.8,2.7,4.1,1.0,Iris-versicolor

6.2,2.2,4.5,1.5,Iris-versicolor

5.6,2.5,3.9,1.1,Iris-versicolor

5.9,3.2,4.8,1.8,Iris-versicolor

6.1,2.8,4.0,1.3,Iris-versicolor

6.3,2.5,4.9,1.5,Iris-versicolor

6.1,2.8,4.7,1.2,Iris-versicolor

6.4,2.9,4.3,1.3,Iris-versicolor

6.6,3.0,4.4,1.4,Iris-versicolor

6.8,2.8,4.8,1.4,Iris-versicolor

6.7,3.0,5.0,1.7,Iris-versicolor

6.0,2.9,4.5,1.5,Iris-versicolor

5.7,2.6,3.5,1.0,Iris-versicolor

5.5,2.4,3.8,1.1,Iris-versicolor

5.5,2.4,3.7,1.0,Iris-versicolor

5.8,2.7,3.9,1.2,Iris-versicolor

6.0,2.7,5.1,1.6,Iris-versicolor

5.4,3.0,4.5,1.5,Iris-versicolor

6.0,3.4,4.5,1.6,Iris-versicolor

6.7,3.1,4.7,1.5,Iris-versicolor
6.3,2.3,4.4,1.3,Iris-versicolor

5.6,3.0,4.1,1.3,Iris-versicolor

5.5,2.5,4.0,1.3,Iris-versicolor

5.5,2.6,4.4,1.2,Iris-versicolor

6.1,3.0,4.6,1.4,Iris-versicolor

5.8,2.6,4.0,1.2,Iris-versicolor

5.0,2.3,3.3,1.0,Iris-versicolor

5.6,2.7,4.2,1.3,Iris-versicolor

5.7,3.0,4.2,1.2,Iris-versicolor

5.7,2.9,4.2,1.3,Iris-versicolor

6.2,2.9,4.3,1.3,Iris-versicolor

5.1,2.5,3.0,1.1,Iris-versicolor

5.7,2.8,4.1,1.3,Iris-versicolor

6.3,3.3,6.0,2.5,Iris-virginica

5.8,2.7,5.1,1.9,Iris-virginica

7.1,3.0,5.9,2.1,Iris-virginica

6.3,2.9,5.6,1.8,Iris-virginica

6.5,3.0,5.8,2.2,Iris-virginica

7.6,3.0,6.6,2.1,Iris-virginica

4.9,2.5,4.5,1.7,Iris-virginica

7.3,2.9,6.3,1.8,Iris-virginica

6.7,2.5,5.8,1.8,Iris-virginica

7.2,3.6,6.1,2.5,Iris-virginica

6.5,3.2,5.1,2.0,Iris-virginica

6.4,2.7,5.3,1.9,Iris-virginica
6.8,3.0,5.5,2.1,Iris-virginica

5.7,2.5,5.0,2.0,Iris-virginica

5.8,2.8,5.1,2.4,Iris-virginica

6.4,3.2,5.3,2.3,Iris-virginica

6.5,3.0,5.5,1.8,Iris-virginica

7.7,3.8,6.7,2.2,Iris-virginica

7.7,2.6,6.9,2.3,Iris-virginica

6.0,2.2,5.0,1.5,Iris-virginica

6.9,3.2,5.7,2.3,Iris-virginica

5.6,2.8,4.9,2.0,Iris-virginica

7.7,2.8,6.7,2.0,Iris-virginica

6.3,2.7,4.9,1.8,Iris-virginica

6.7,3.3,5.7,2.1,Iris-virginica

7.2,3.2,6.0,1.8,Iris-virginica

6.2,2.8,4.8,1.8,Iris-virginica

6.1,3.0,4.9,1.8,Iris-virginica

6.4,2.8,5.6,2.1,Iris-virginica

7.2,3.0,5.8,1.6,Iris-virginica

7.4,2.8,6.1,1.9,Iris-virginica

7.9,3.8,6.4,2.0,Iris-virginica

6.4,2.8,5.6,2.2,Iris-virginica

6.3,2.8,5.1,1.5,Iris-virginica

6.1,2.6,5.6,1.4,Iris-virginica

7.7,3.0,6.1,2.3,Iris-virginica

6.3,3.4,5.6,2.4,Iris-virginica
6.4,3.1,5.5,1.8,Iris-virginica

6.0,3.0,4.8,1.8,Iris-virginica

6.9,3.1,5.4,2.1,Iris-virginica

6.7,3.1,5.6,2.4,Iris-virginica

6.9,3.1,5.1,2.3,Iris-virginica

5.8,2.7,5.1,1.9,Iris-virginica

6.8,3.2,5.9,2.3,Iris-virginica

6.7,3.3,5.7,2.5,Iris-virginica

6.7,3.0,5.2,2.3,Iris-virginica

6.3,2.5,5.0,1.9,Iris-virginica

6.5,3.0,5.2,2.0,Iris-virginica

6.2,3.4,5.4,2.3,Iris-virginica

5.9,3.0,5.1,1.8,Iris-virginica

%
The following output shows the clustering results that were generated when the simple k-means algorithm is applied on the given dataset.

OUTPUT
=== Run information ===

Scheme: weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 2 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10

Relation: iris

Instances: 150

Attributes: 5
sepallength

sepalwidth

petallength

petalwidth

class

Test mode: evaluate on training data

=== Clustering model (full training set) ===

kMeans

======

Number of iterations: 7

Within cluster sum of squared errors: 62.1436882815797

Initial starting points (random):

Cluster 0: 6.1,2.9,4.7,1.4,Iris-versicolor

Cluster 1: 6.2,2.9,4.3,1.3,Iris-versicolor

Missing values globally replaced with mean/mode

Final cluster centroids:

Cluster#

Attribute Full Data 0 1

(150.0) (100.0) (50.0)

===========================================

sepallength 5.8433 6.262 5.006

sepalwidth 3.054 2.872 3.418

petallength 3.7587 4.906 1.464


petalwidth 1.1987 1.676 0.244

class Iris-setosa Iris-versicolor Iris-setosa

Time taken to build model (full training data) : 0.01 seconds

=== Model and evaluation on training set ===

Clustered Instances

0 100 ( 67%)

1 50 ( 33%)

Interpretation of the above visualization

From the above visualization, we can understand the distribution of sepal length and petal length in each cluster. For instance, each cluster is dominated by petal length. By changing the color dimension to other attributes we can see their distribution within each of the clusters.

Step 8: We can save the resulting dataset, which includes each instance along with its assigned cluster. To do so we click the Save button in the visualization window and save the result as iris k-mean. The top portion of this file is shown in the following figure.
Procedure ( Using Knowledge Flow)

Step1: Open Knowledge flow. Load sample training data set using arff loader.
Step2: Select cross validation fold maker Icon from Evaluation tab, connect it with arff loader using dataset.
Step3: Select Simple K Means from cluster tab and connect it with cross validation Fold maker using test set and training set.
Step 4: Select Cluster performance evaluator from Evaluation tab and connect it with Simple K Means using batch clusterer.
Step 5: Select text viewer from visualization and connect it with Cluster performance evaluator icon using text.
Step 6: Run the flow using run icon.
Step 7: Right Click the text viewer icon and select show results to view the results.
Knowledge Flow Diagram

OUTPUT
=== Evaluation result for training instances ===

Scheme: SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 2 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10

Relation: iris

kMeans

======

Number of iterations: 2

Within cluster sum of squared errors: 62.11536812340375

Initial starting points (random):

Cluster 0: 5.5,2.4,3.8,1.1,Iris-versicolor

Cluster 1: 7.7,3.8,6.7,2.2,Iris-virginica

Missing values globally replaced with mean/mode


Final cluster centroids:

Cluster#

Attribute Full Data 0 1

(135.0) (88.0) (47.0)

================================================================

sepallength 5.8748 5.4909 6.5936

sepalwidth 3.0444 3.083 2.9723

petallength 3.8459 2.933 5.5553

petalwidth 1.2326 0.8045 2.034

class Iris-virginica Iris-versicolor Iris-virginica

Clustered Instances

0 88 ( 65%)

1 47 ( 35%)

Result:
Thus the Simple K Means clustering algorithm is implemented for the iris dataset and the clusters are generated successfully.
9. Demonstration of clustering rule process on dataset student.arff using simple k-means

Aim:
This experiment illustrates the use of simple k-means clustering with the WEKA Explorer. The sample data set used for this example is based on the student data available in ARFF format. This document assumes that appropriate preprocessing has been performed. This student dataset includes 14 instances.

PROCEDURE ( USING EXPLORER)

Steps involved in this Experiment


Step 1: Run the Weka explorer and load the data file student.arff in the preprocessing interface.
Step 2: In order to perform clustering, select the ‘cluster’ tab in the Explorer and click on the Choose button. This step results in a dropdown list of available clustering algorithms.

Step 3 : In this case we select ‘simple k-means’.


Step 4: Next click the text button to the right of the Choose button to get the popup window shown in the screenshots. In this window we enter the desired number of clusters (2 in this run) and we leave the value of the seed as it is. The seed value is used in generating a random number which is used for making the internal assignments of instances to clusters.

Step 5: Once the options have been specified, we run the clustering algorithm. In the ‘Cluster mode’ panel we make sure that the ‘Use training set’ option is selected, and then we click the ‘Start’ button. This process and the resulting window are shown in the following screenshots.

Step 6: The result window shows the centroid of each cluster as well as statistics on the number and the percentage of instances assigned to the different clusters. Here the cluster centroids are mean vectors for each cluster, and they can be used to characterize the clusters.

Step 7: Another way of understanding the characteristics of each cluster is through visualization. To do this, try right-clicking the result set in the Result list panel and selecting ‘Visualize cluster assignments’.
Dataset student.arff
@relation student
@attribute age {<30,30-40,>40}
@attribute income {low,medium,high}
@attribute student {yes,no}
@attribute credit-rating {fair,excellent}
@attribute buyspc {yes,no}
@data
%
<30, high, no, fair, no
<30, high, no, excellent, no
30-40, high, no, fair, yes
>40, medium, no, fair, yes
>40, low, yes, fair, yes
>40, low, yes, excellent, no
30-40, low, yes, excellent, yes
<30, medium, no, fair, no
<30, low, yes, fair, no
>40, medium, yes, fair, yes
<30, medium, yes, excellent, yes
30-40, medium, no, excellent, yes
30-40, high, yes, fair, yes
>40, medium, no, excellent, no
%
The following output shows the clustering results that were generated when the simple k-means algorithm is applied on the given dataset.

OUTPUT

=== Run information ===

Scheme: weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 2 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10

Relation: STUDENT1

Instances: 14

Attributes: 5

age

income

student

credit-rating

buyspc

Test mode: evaluate on training data

=== Clustering model (full training set) ===

kMeans

======

Number of iterations: 3

Within cluster sum of squared errors: 28.0

Initial starting points (random):

Cluster 0: >40,medium,yes,fair,yes
Cluster 1: 30-40,low,yes,excellent,yes

Missing values globally replaced with mean/mode

Final cluster centroids:

Cluster#

Attribute Full Data 0 1

(14.0) (10.0) (4.0)

=============================================

age <30 <30 >40

income medium medium low

student no no no

credit-rating fair fair excellent

buyspc yes yes no

Time taken to build model (full training data) : 0.01 seconds

=== Model and evaluation on training set ===

Clustered Instances

0 10 ( 71%)

1 4 ( 29%)

VISUALIZE CLUSTER ASSIGNMENT


Interpretation of the above visualization
From the above visualization, we can understand the distribution of age and instance number in each cluster. For instance, each cluster is dominated by age. By changing the color dimension to other attributes we can see their distribution within each of the clusters.

Step 8: We can save the resulting dataset, which includes each instance along with its assigned cluster. To do so we click the Save button in the visualization window and save the result as student k-mean. The top portion of this file is shown in the following figure.
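The instance-to-cluster pairing that Step 8 saves can also be produced directly from the API. A minimal sketch, assuming weka.jar is on the classpath and a recent WEKA release:

import weka.clusterers.SimpleKMeans;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class KMeansStudentAssignments {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("student.arff").getDataSet();

        SimpleKMeans km = new SimpleKMeans();
        km.setNumClusters(2);
        km.buildClusterer(data);

        // Print each instance with its assigned cluster, the same pairing
        // that "Visualize cluster assignments" saves to file in Step 8
        for (Instance inst : data) {
            System.out.println(inst + " -> cluster " + km.clusterInstance(inst));
        }
    }
}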

Procedure ( Using Knowledge Flow)

Step1: Open Knowledge flow. Load sample training data set using arff loader.
Step2: Select cross validation fold maker Icon from Evaluation tab, connect it with arff loader using dataset.
Step3: Select Simple K Means from cluster tab and connect it with cross validation Fold maker using test set and training set.
Step 4: Select Cluster performance evaluator from Evaluation tab and connect it with Simple K Means using batch clusterer.
Step 5: Select text viewer from visualization and connect it with Cluster performance evaluator icon using text.
Step 6: Run the flow using run icon.
Step 7: Right Click the text viewer icon and select show results to view the results.
STUDENT.ARFF DATA SET

age income student credit-rating buyspc
<30 high no fair no
<30 high no excellent no
30-40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
30-40 low yes excellent yes
<30 medium no fair no
<30 low yes fair no
>40 medium yes fair yes
<30 medium yes excellent yes
30-40 medium no excellent yes
30-40 high yes fair yes
>40 medium no excellent no

Knowledge Flow Diagram


OUTPUT

=== Evaluation result for training instances ===

Scheme: SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 2 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10

Relation: STUDENT1

kMeans

======

Number of iterations: 3

Within cluster sum of squared errors: 18.0

Initial starting points (random):

Cluster 0: <30,high,no,excellent,no

Cluster 1: 30-40,medium,no,excellent,yes

Missing values globally replaced with mean/mode

Final cluster centroids:

Cluster#

Attribute Full Data 0 1

(12.0) (5.0) (7.0)

================================================

age <30 <30 30-40

income medium high medium

student no no yes

credit-rating fair fair fair

buyspc yes no yes


Clustered Instances

0 5 ( 42%)

1 7 ( 58%)

Result:

Thus the Simple K Means clustering algorithm is implemented for the student dataset and the clusters are generated successfully.
10. CREATION OF A DATA WAREHOUSE USING POSTGRESQL

Aim:
To create a data warehouse using PostgreSQL and perform operations on it.

Procedure:

Step 1: Create a database.
Step 2: Create a schema.
Step 3: Design and create a list of tables.
Step 4: Load the data.
Step 5: Perform the operations.

Step1: Create a database.


create database databasename;

Step 2: Create a schema.


create schema schemaname;

Step 3: Design and create a list of tables.


Creation of tables.

#1

create table dw.timedim(


ID_time INT NOT NULL,
DayOfWeek CHAR(15) NOT NULL,
DateMonth INT NOT NULL,
DateYear INT NOT NULL,
PRIMARY KEY(ID_time)
);

#2
create table dw.phonerate(
ID_phoneRate INTEGER NOT NULL,
phoneRateType VARCHAR(20) NOT NULL,
PRIMARY KEY(ID_phoneRate)
);
#3
create table dw.location(
ID_location INTEGER NOT NULL,
City VARCHAR(20) NOT NULL,
Province CHAR(20) NOT NULL,
Region CHAR(20) NOT NULL,
PRIMARY KEY(ID_location)
);

#4
create table dw.facts(
ID_time INTEGER NOT NULL,
ID_phoneRate INTEGER NOT NULL,
ID_location_Caller INTEGER NOT NULL,
ID_location_Receiver INTEGER NOT NULL,
Price FLOAT NOT NULL,
NumberOfCalls INTEGER NOT NULL,
PRIMARY KEY(ID_time,ID_phoneRate,ID_location_Caller,ID_location_Receiver),
FOREIGN KEY(ID_time) REFERENCES dw.timedim(ID_time),
FOREIGN KEY(ID_phoneRate) REFERENCES dw.phonerate(ID_phoneRate),
FOREIGN KEY(ID_location_Caller) REFERENCES dw.location(ID_location),
FOREIGN KEY(ID_location_Receiver) REFERENCES dw.location(ID_location)
);
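Step 4 (loading the data) can be done with plain INSERT statements from psql or from a client program. A minimal JDBC sketch for one dimension table, assuming the PostgreSQL JDBC driver is on the classpath; the connection URL, credentials, and the sample row are placeholders, not part of the recorded procedure.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class LoadTimeDim {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection settings; replace with your own
        Connection con = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/databasename",
                "postgres", "password");

        // Insert one illustrative row into the time dimension
        PreparedStatement ps = con.prepareStatement(
                "insert into dw.timedim (ID_time, DayOfWeek, DateMonth, DateYear) "
                + "values (?, ?, ?, ?)");
        ps.setInt(1, 1);
        ps.setString(2, "Monday");
        ps.setInt(3, 1);
        ps.setInt(4, 2003);
        ps.executeUpdate();

        ps.close();
        con.close();
    }
}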
Operations:

#1 Select the yearly income for each phone rate, the total income for each phone rate, and the total yearly income.
------
select sum(Price), dateYear, phoneRateType
from dw.facts F, dw.timedim T, dw.phonerate P
where F.Id_time = T.Id_time and F.Id_phoneRate = P.Id_phoneRate
group by cube(phoneRateType, dateYear);
#2 Select the monthly number of calls and the monthly income. Associate the RANK() to
each month according to its income (1 for the month with the highest income, 2 for the
second, etc., the last month is the one with the least income).
------
SELECT DateMonth,DateYear, SUM(NumberOfCalls) as TotNumOfCalls,
SUM(price) as totalIncome,
RANK() over (ORDER BY SUM(price) DESC) as RankIncome
FROM dw.facts F, dw.timedim Te
WHERE F.id_time=Te.id_time
GROUP BY DateMonth,DateYear;
#3 For each month in 2003, select the total number of calls. Associate the RANK() to each
month according to its total number of calls (1 for the month with the highest number of calls,
2 for the second, etc., the last month is the one with the least number of calls).
------
SELECT DateMonth, SUM(NumberOfCalls) as TotNumOfCalls,
RANK() over (ORDER BY SUM(NumberOfCalls) DESC) as RankNumOfCalls
FROM dw.facts F, dw.timedim Te
WHERE F.id_time=Te.id_time
AND DateYear=2003
GROUP BY DateMonth;

Result:

Thus the data warehouse has been created and some operations are performed using PostgreSQL.
