DWDM Record With Alignment
1. Demonstration of preprocessing on dataset student.arff
Aim: This experiment illustrates some of the basic data preprocessing operations that can
be performed using WEKA-Explorer. The sample dataset used for this example is the
student data available in arff format.
Step 1: Loading the data. We can load the dataset into Weka by clicking the Open file button in the preprocessing interface and selecting the appropriate file.
Step 2: Once the data is loaded, Weka will recognize the attributes and, during the scan of the data, compute some basic statistics on each attribute. The left panel in the above figure shows the list of recognized attributes, while the top panel indicates the names of the base relation (or table) and the current working relation (which are the same initially).
Step 3: Clicking on an attribute in the left panel shows the basic statistics of that attribute. For categorical attributes, the frequency of each attribute value is shown, while for continuous attributes we can obtain the min, max, mean, and standard deviation.
Step 4: The visualization panel at the right shows the data in the form of a cross-tabulation across two attributes.
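The per-attribute figures described in Steps 3 and 4 can be sketched in a few lines. The following is an illustrative Python sketch, not Weka's code; the sample values are made up, and note that Weka's attribute panel reports the sample standard deviation.

```python
from collections import Counter

def numeric_stats(values):
    """Min, max, mean and sample standard deviation: the figures shown
    for a continuous attribute (illustrative sketch)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    return min(values), max(values), mean, var ** 0.5

def nominal_counts(values):
    """Frequency of each value of a categorical attribute."""
    return Counter(values)

print(numeric_stats([1, 2, 3, 4, 5]))        # (1, 5, 3.0, 1.58...)
print(nominal_counts(["yes", "no", "yes"]))  # Counter({'yes': 2, 'no': 1})
```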
Step 6:
a) Next, click the text box immediately to the right of the Choose button. In the resulting dialog box, enter the index of the attribute to be removed.
b) Make sure that the invertSelection option is set to false, then click OK in the filter box.
c) Click the Apply button to apply the filter to the data. This removes the attribute and creates a new working relation.
d) Save the new working relation as an ARFF file by clicking the Save button on the top (button) panel (student.arff).
Discretization
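In the Explorer, discretization is typically done with an unsupervised Discretize filter. The sketch below shows the equal-width binning idea behind that filter; it is an illustrative Python sketch, not Weka's implementation, and the sample ages are made up.

```python
def equal_width_bins(values, k=3):
    """Assign each numeric value to one of k equal-width bins,
    mimicking an unsupervised equal-width discretization filter."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = []
    for v in values:
        # Clamp the top edge so max(values) falls in the last bin.
        idx = min(int((v - lo) / width), k - 1) if width else 0
        bins.append(idx)
    return bins

ages = [21, 25, 30, 34, 42, 55, 61]
print(equal_width_bins(ages, k=3))  # [0, 0, 0, 0, 1, 2, 2]
```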
Result:
This program has been executed successfully.
2. Demonstration of preprocessing on dataset labor.arff
Aim: This experiment illustrates some of the basic data preprocessing operations that can
be performed using WEKA-Explorer. The sample dataset used for this example is the
labor data available in arff format.
Step 1: Loading the data. We can load the dataset into Weka by clicking the Open file button in the preprocessing interface and selecting the appropriate file.
Step 2: Once the data is loaded, Weka will recognize the attributes and, during the scan of the data, compute some basic statistics on each attribute. The left panel in the above figure shows the list of recognized attributes, while the top panel indicates the names of the base relation (or table) and the current working relation (which are the same initially).
Step 3: Clicking on an attribute in the left panel shows the basic statistics of that attribute. For categorical attributes, the frequency of each attribute value is shown, while for continuous attributes we can obtain the min, max, mean, and standard deviation.
Step 4: The visualization panel at the right shows the data in the form of a cross-tabulation across two attributes.
Note: we can select another attribute using the drop-down list.
a) Next, click the text box immediately to the right of the Choose button. In the resulting dialog box, enter the index of the attribute to be removed.
b) Make sure that the invertSelection option is set to false, then click OK in the filter box; you will see "Remove-R-7".
c) Click the Apply button to apply the filter to the data. This removes the attribute and creates a new working relation.
d) Save the new working relation as an ARFF file by clicking the Save button on the top (button) panel (labor.arff).
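The "Remove-R-7" filter above simply drops the 7th attribute from every instance. A minimal sketch of that effect on raw rows (illustrative Python, not Weka's code; like Weka's -R option, the index here is 1-based, and the sample row values are placeholders):

```python
def remove_attribute(rows, index_1based):
    """Drop the attribute at the given 1-based index from every row,
    mirroring what an attribute-removal filter does."""
    i = index_1based - 1
    return [row[:i] + row[i + 1:] for row in rows]

row = ["dur", "wage1", "wage2", "wage3", "cola", "hours", "pension", "class"]
print(remove_attribute([row], 7))  # the 7th value ("pension") is gone
```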
Discretization
Result:
This program has been executed successfully.
3. Demonstration of Association rule process on dataset contactlenses.arff using Apriori algorithm
Aim:
This experiment illustrates some of the basic elements of association rule mining using WEKA. The sample dataset used for this example is contactlenses.arff.
Step 1: Open the data file in Weka Explorer. It is presumed that the required data fields have been discretized. In this example it is the age attribute.
Step 2: Clicking on the Associate tab will bring up the interface for the association rule algorithms.
Step 4: In order to change the parameters for the run (e.g., support, confidence), we click on the text box immediately to the right of the Choose button.
Dataset contactlenses.arff
The following output shows the association rules that were generated when the Apriori algorithm was applied on the given dataset.
OUTPUT
=== Run information ===
Apriori
=======
Minimum support: 0.2 (5 instances)
Minimum metric <confidence>: 0.9
Number of cycles performed: 16
Generated sets of large itemsets:
Size of set of large itemsets L(1): 11
Size of set of large itemsets L(2): 21
Size of set of large itemsets L(3): 6
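The run above found 11 frequent 1-itemsets, 21 frequent 2-itemsets, and 6 frequent 3-itemsets at minimum support 0.2. The support-counting idea behind those numbers can be sketched as follows; this is a brute-force Python illustration with made-up transactions, not Weka's level-wise Apriori implementation.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Enumerate all itemsets whose support (fraction of transactions
    containing them) meets min_support. Brute-force counting sketch;
    real Apriori generates and prunes candidates level by level."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        found = False
        for cand in combinations(items, k):
            count = sum(1 for t in transactions if set(cand) <= t)
            if count / n >= min_support:
                result[cand] = count
                found = True
        if not found:  # Apriori property: no superset can be frequent
            break
    return result

tx = [{"milk", "bread"}, {"milk", "butter"}, {"bread", "butter"},
      {"milk", "bread", "butter"}, {"bread"}]
print(len(frequent_itemsets(tx, 0.4)))  # 6 frequent itemsets
```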
Step1: Open Knowledge flow. Load sample training data set using arff loader.
Step2: Select Apriori Icon from Association tab, connect it with arff loader using dataset.
Step3: Select text viewer from visualization and connect it with apriori icon using text.
Step4: Run the flow using run icon.
Step5: Right Click the text viewer icon and select show results to view the results.
Knowledge Flow Diagram
OUTPUT
Result:
Thus the Apriori algorithm is implemented for itemsets and transactions, and the rules are generated successfully.
4. Demonstration of Association rule process on dataset weather.arff using Apriori Algorithm
Aim:
This experiment illustrates some of the basic elements of association rule mining using WEKA. The sample dataset used for this example is weather.arff.
Step 1: Open the data file in Weka Explorer. It is presumed that the required data fields have been discretized. In this example it is the temperature attribute.
Step 2: Clicking on the Associate tab will bring up the interface for the association rule algorithms.
Step 4: In order to change the parameters for the run (e.g., support, confidence), we click on the text box immediately to the right of the Choose button.
Dataset weather.arff
@relation weather.symbolic
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
The following output shows the association rules that were generated when the Apriori algorithm was applied on the given dataset.
OUTPUT
Scheme: Apriori
Relation: weather
Apriori
=======
4. outlook=sunny play=no 3 ==> humidity=high 3 <conf:(1)> lift:(2) lev:(0.11) [1] conv:(1.5)
10. temperature=hot play=no 2 ==> outlook=sunny 2 <conf:(1)> lift:(2.8) lev:(0.09) [1] conv:(1.29)
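Rule 4 above can be checked by hand against the 14 instances: "outlook=sunny and play=no" covers 3 instances, all of which have humidity=high, so confidence = 3/3 = 1; humidity=high holds in 7 of 14 instances, so lift = 1 / (7/14) = 2. A quick Python sketch of that computation over the listed data:

```python
# Each row: (outlook, temperature, humidity, windy, play) from weather.arff.
rows = [
    ("sunny","hot","high","FALSE","no"),    ("sunny","hot","high","TRUE","no"),
    ("overcast","hot","high","FALSE","yes"),("rainy","mild","high","FALSE","yes"),
    ("rainy","cool","normal","FALSE","yes"),("rainy","cool","normal","TRUE","no"),
    ("overcast","cool","normal","TRUE","yes"),("sunny","mild","high","FALSE","no"),
    ("sunny","cool","normal","FALSE","yes"),("rainy","mild","normal","FALSE","yes"),
    ("sunny","mild","normal","TRUE","yes"), ("overcast","mild","high","TRUE","yes"),
    ("overcast","hot","normal","FALSE","yes"),("rainy","mild","high","TRUE","no"),
]

# Rule 4: outlook=sunny, play=no ==> humidity=high
antecedent = [r for r in rows if r[0] == "sunny" and r[4] == "no"]
both = [r for r in antecedent if r[2] == "high"]
confidence = len(both) / len(antecedent)
lift = confidence / (sum(1 for r in rows if r[2] == "high") / len(rows))
print(confidence, lift)  # 1.0 2.0
```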
Result:
Thus the Apriori algorithm is implemented for itemsets and transactions, and the rules are generated successfully.
5. Demonstration of classification rule process on dataset student.arff using ID3/J48 algorithm
Aim:
This experiment illustrates the use of the J48 classifier in Weka. The sample dataset used in this experiment is the "student" data available in ARFF format. This document assumes that appropriate data preprocessing has been performed.
Step 2: Next we select the "Classify" tab and click the "Choose" button to select the J48 classifier.
Step 3: Now we specify the various parameters. These can be specified by clicking in the text box to the right of the Choose button. In this example, we accept the default values. The default version does perform some pruning, but does not perform reduced-error pruning.
Step 4: Under the "Test options" in the main panel, we select 10-fold cross-validation as our evaluation approach. Since we don't have a separate evaluation dataset, this is necessary to get a reasonable idea of the accuracy of the generated model.
Step 5: We now click "Start" to generate the model. The ASCII version of the tree, as well as the evaluation statistics, will appear in the right panel when the model construction is complete.
Step 6: Note the classification accuracy of the model (about 87% here). A low accuracy would indicate that more work is needed, either in preprocessing or in selecting different parameters for the classifier.
Step 7: Weka also lets us view a graphical version of the classification tree. This can be done by right-clicking the last result set and selecting "Visualize tree" from the pop-up menu.
Step 8: We will use our model to classify the new instances.
Step 9: In the main panel under "Test options", click the "Supplied test set" radio button and then click the "Set" button. This will pop up a window which allows you to open the file containing test instances.
Student Data set
AGE  INCOME  STUDENT  CREDUT-RATING  BUYSPC
The following output shows the classification rules that were generated when the J48 algorithm was applied on the given dataset.
OUTPUT
=== Run information ===
Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2
Relation: student
Instances: 15
Attributes: 5
AGE
INCOME
STUDENT
CREDUT-RATING
BUYSPC
Test mode: 10-fold cross-validation
=== Classifier model (full training set) ===
J48 pruned tree
------------------
AGE = <30: NO (1.0)
AGE = <30: NO (4.0/1.0)
AGE = 30-40: YES (4.0)
AGE = >40
| CREDUT-RATING = FAIR: YES (3.0)
| CREDUT-RATING = EXCELLENT: NO (3.0)
Number of Leaves : 5
Size of the tree : 7
Time taken to build model: 0 seconds
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances 13 86.6667 %
Incorrectly Classified Instances 2 13.3333 %
Kappa statistic 0.7321
Mean absolute error 0.1692
Root mean squared error 0.329
Relative absolute error 33.2913 %
Root relative squared error 64.5658 %
Total Number of Instances 15
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure MCC ROC Area PRC
Area Class
0.857 0.125 0.857 0.857 0.857 0.732 0.902 0.814 NO
0.875 0.143 0.875 0.875 0.875 0.732 0.902 0.942 YES
Weighted Avg. 0.867 0.135 0.867 0.867 0.867 0.732 0.902 0.882
=== Confusion Matrix ===
a b <-- classified as
6 1 | a = NO
1 7 | b = YES
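The summary figures follow directly from the confusion matrix above: 6 + 7 = 13 of 15 instances lie on the diagonal, i.e. 86.67% correctly classified. As a short Python sketch:

```python
def accuracy(conf_matrix):
    """Fraction of instances on the diagonal of a confusion matrix
    (rows = actual class, columns = predicted class)."""
    correct = sum(conf_matrix[i][i] for i in range(len(conf_matrix)))
    total = sum(sum(row) for row in conf_matrix)
    return correct / total

# Confusion matrix from the J48 run on student.arff.
m = [[6, 1],
     [1, 7]]
print(round(accuracy(m) * 100, 4))  # 86.6667
```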
❖ Under the Result list, right-click the item to get the options shown in the figure, and select the "Visualize tree" option.
❖ After selecting the "Visualize tree" option, the output is represented as a tree in a separate window, as shown in the figure given below.
Step1: Open Knowledge flow. Load sample training data set using arff loader.
Step2: Select Class Assigner from the Evaluation tab and connect it with the ARFF loader using dataset.
Step 3: Select cross validation fold maker Icon from Evaluation tab, connect it with Class
assigner using dataset.
Step4: Select j48 from Classifier tab and connect it with cross validation Fold maker using
test set and training set.
Step 5: Select Classifier Performance Evaluator from the Evaluation tab and connect it with J48 using batch classifier.
Step 6: Select Graph Viewer from Visualization and connect it with the J48 icon using graph.
Step7: Run the flow using run icon.
Step8: Right Click the Graph viewer icon and select show results to view the results.
OUTPUT
Result:
Thus the ID3/J48 classification algorithm is implemented for the student dataset and the results are generated successfully.
6. Demonstration of classification rule process on dataset employee.arff using ID3/J48 algorithm
Aim:
This experiment illustrates the use of the ID3 classifier in Weka. The sample dataset used in this experiment is the "employee" data available in ARFF format. This document assumes that appropriate data preprocessing has been performed.
Step 2: Next we select the "Classify" tab and click the "Choose" button to select the ID3 classifier.
Step 3: Now we specify the various parameters. These can be specified by clicking in the text box to the right of the Choose button. In this example, we accept the default values. The default version does perform some pruning, but does not perform reduced-error pruning.
Step 4: Under the "Test options" in the main panel, we select 10-fold cross-validation as our evaluation approach. Since we don't have a separate evaluation dataset, this is necessary to get a reasonable idea of the accuracy of the generated model.
Step 5: We now click "Start" to generate the model. The ASCII version of the tree, as well as the evaluation statistics, will appear in the right panel when the model construction is complete.
Step 6: Note the classification accuracy of the model. A low accuracy indicates that more work may be needed, either in preprocessing or in selecting different parameters for the classifier.
Step 7: Weka also lets us view a graphical version of the classification tree. This can be done by right-clicking the last result set and selecting "Visualize tree" from the pop-up menu.
Step 8: We will use our model to classify the new instances.
Step 9: In the main panel under "Test options", click the "Supplied test set" radio button and then click the "Set" button. This will pop up a window which allows you to open the file containing test instances.
DATASET EMPLOYEE.ARFF:
@relation employee
@attribute age {25, 27, 28, 29, 30, 35, 48}
@attribute salary {10k,15k,17k,20k,25k,30k,35k,32k}
@data
The following output shows the classification rules that were generated when the ID3 algorithm was applied on the given dataset.
OUTPUT
Number of Leaves : 2
Size of the tree : 3
Time taken to build model: 0.01 seconds
a b c <-- classified as
2 3 0 | a = POOR
1 3 1 | b = AVG
1 1 0 | c = GOOD
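ID3 grows its tree by choosing, at each node, the attribute with the highest information gain. A minimal sketch of the entropy and gain computation ID3 relies on; this is illustrative Python, not Weka's implementation, and the mini-sample (a salary band attribute with POOR/GOOD labels) is made up.

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr_index, labels):
    """Gain = entropy(parent) minus the weighted entropy of the
    children after splitting on the attribute at attr_index."""
    n = len(rows)
    gain = entropy(labels)
    for value in set(r[attr_index] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attr_index] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# Hypothetical mini-sample: one attribute (salary band) and a class label.
rows = [("10k",), ("10k",), ("30k",), ("30k",)]
labels = ["POOR", "POOR", "GOOD", "GOOD"]
print(information_gain(rows, 0, labels))  # 1.0: the split separates the classes perfectly
```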
Step1: Open Knowledge flow. Load employee.arff training data set using arff loader.
Step2: Select Class Assigner from the Evaluation tab and connect it with the ARFF loader using dataset.
Step 3: Select cross validation fold maker Icon from Evaluation tab, connect it with Class
assigner using dataset.
Step4: Select j48 from Classifier tab and connect it with cross validation Fold maker using
test set and training set.
Step 5: Select Classifier Performance Evaluator from the Evaluation tab and connect it with J48 using batch classifier.
Step 6: Select text viewer from visualization and connect it with Classifier performance
evaluator icon using text.
Step7: Run the flow using run icon.
Step8: Right Click the text viewer icon and select show results to view the results.
KNOWLEDGE FLOW DIAGRAM
OUTPUT
The following screenshot shows the classification rules that were generated when the ID3 algorithm was applied on the given dataset.
Result:
Thus the ID3/J48 classification algorithm is implemented for the employee dataset and the results are generated successfully.
7. Demonstration of classification rule process on dataset labour.arff using ID3/J48 algorithm
Aim:
This experiment illustrates the use of the ID3 classifier in Weka. The sample dataset used in this experiment is the "labour" data available in ARFF format. This document assumes that appropriate data preprocessing has been performed.
Step 2: Next we select the "Classify" tab and click the "Choose" button to select the ID3 classifier.
Step 3: Now we specify the various parameters. These can be specified by clicking in the text box to the right of the Choose button. In this example, we accept the default values. The default version does perform some pruning, but does not perform reduced-error pruning.
Step 4: Under the "Test options" in the main panel, we select 10-fold cross-validation as our evaluation approach. Since we don't have a separate evaluation dataset, this is necessary to get a reasonable idea of the accuracy of the generated model.
Step 5: We now click "Start" to generate the model. The ASCII version of the tree, as well as the evaluation statistics, will appear in the right panel when the model construction is complete.
Step 6: Note the classification accuracy of the model (about 74% here). A low accuracy would indicate that more work is needed, either in preprocessing or in selecting different parameters for the classifier.
Step 7: Weka also lets us view a graphical version of the classification tree. This can be done by right-clicking the last result set and selecting "Visualize tree" from the pop-up menu.
Step 9: In the main panel under "Test options", click the "Supplied test set" radio button and then click the "Set" button. This will pop up a window which allows you to open the file containing test instances.
DATASET LABOUR.ARFF
The following output shows the classification rules that were generated when the ID3 algorithm was applied on the given dataset.
OUTPUT
Relation: labor-neg-data
Instances: 57
Attributes: 17
duration
wage-increase-first-year
wage-increase-second-year
wage-increase-third-year
cost-of-living-adjustment
working-hours
pension
standby-pay
shift-differential
education-allowance
statutory-holidays
vacation
longterm-disability-assistance
contribution-to-dental-plan
bereavement-assistance
contribution-to-health-plan
class
------------------
Number of Leaves : 3
Weighted Avg. 0.737 0.280 0.748 0.737 0.740 0.444 0.695 0.675
a b <-- classified as
14 6 | a = bad
9 28 | b = good
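The weighted averages reported above can be recovered from this confusion matrix: recall for "bad" is 14/20 = 0.7, recall for "good" is 28/37 ≈ 0.757, and accuracy is (14 + 28)/57 ≈ 0.737, matching the Weighted Avg. line. A quick Python sketch:

```python
def per_class_recall(m):
    """Recall for each class: diagonal / row sum (rows = actual class)."""
    return [m[i][i] / sum(m[i]) for i in range(len(m))]

m = [[14, 6],   # actual bad:  14 predicted bad, 6 predicted good
     [9, 28]]   # actual good:  9 predicted bad, 28 predicted good
print([round(r, 3) for r in per_class_recall(m)])  # [0.7, 0.757]
accuracy = (m[0][0] + m[1][1]) / sum(sum(row) for row in m)
print(round(accuracy, 3))  # 0.737
```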
VISUALIZE TREE
Step1: Open Knowledge flow. Load labour.arff training data set using arff loader.
Step2: Select Class Assigner from the Evaluation tab and connect it with the ARFF loader using dataset.
Step 3: Select cross validation fold maker Icon from Evaluation tab, connect it with Class
assigner using dataset.
Step4: Select j48 from Classifier tab and connect it with cross validation Fold maker using
test set and training set.
Step 5: Select Classifier Performance Evaluator from the Evaluation tab and connect it with J48 using batch classifier.
Step 6: Select Graph Viewer from Visualization and connect it with the J48 icon using graph.
Step7: Run the flow using run icon.
Step8: Right Click the Graph viewer icon and select show results to view the results.
KNOWLEDGE FLOW DIAGRAM
Output
The following screenshot shows the classification rules that were generated when the ID3 algorithm was applied on the given dataset.
Result:
Thus the ID3/J48 classification algorithm is implemented for the labour dataset and the results are generated successfully.
8. Demonstration of clustering rule process on dataset iris.arff using
simple k-means
Aim:
This experiment illustrates the use of simple k-means clustering with the Weka Explorer. The sample dataset used for this example is based on the iris data available in ARFF format. This document assumes that appropriate preprocessing has been performed. The iris dataset includes 150 instances.
Step 1: Run the Weka explorer and load the data file iris.arff in preprocessing interface.
Step 2: In order to perform clustering, select the "Cluster" tab in the Explorer and click on the Choose button to select a clustering algorithm.
Step 4: Next, click the text box to the right of the Choose button to get the popup window shown in the screenshots. In this window we enter six as the number of clusters and leave the seed value as it is. The seed value is used in generating a random number, which is used for the initial assignment of instances to clusters.
Step 5: Once the options have been specified, we run the clustering algorithm. We must make sure that, in the "Cluster mode" panel, the "Use training set" option is selected, and then we click the "Start" button. This process and the resulting window are shown in the following screenshots.
Step 6: The result window shows the centroid of each cluster, as well as statistics on the number and percentage of instances assigned to each cluster. The cluster centroids are the mean vectors of each cluster and can be used to characterize the clusters. For example, the centroid of cluster 1 shows that for the class Iris-versicolor the mean sepal length is 5.4706, sepal width 2.4765, petal length 3.7941, and petal width 1.1294.
DATA SET
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.4,3.7,1.5,0.2,Iris-setosa
4.8,3.4,1.6,0.2,Iris-setosa
4.8,3.0,1.4,0.1,Iris-setosa
4.3,3.0,1.1,0.1,Iris-setosa
5.8,4.0,1.2,0.2,Iris-setosa
5.7,4.4,1.5,0.4,Iris-setosa
5.4,3.9,1.3,0.4,Iris-setosa
5.1,3.5,1.4,0.3,Iris-setosa
5.7,3.8,1.7,0.3,Iris-setosa
5.1,3.8,1.5,0.3,Iris-setosa
5.4,3.4,1.7,0.2,Iris-setosa
5.1,3.7,1.5,0.4,Iris-setosa
4.6,3.6,1.0,0.2,Iris-setosa
5.1,3.3,1.7,0.5,Iris-setosa
4.8,3.4,1.9,0.2,Iris-setosa
5.0,3.0,1.6,0.2,Iris-setosa
5.0,3.4,1.6,0.4,Iris-setosa
5.2,3.5,1.5,0.2,Iris-setosa
5.2,3.4,1.4,0.2,Iris-setosa
4.7,3.2,1.6,0.2,Iris-setosa
4.8,3.1,1.6,0.2,Iris-setosa
5.4,3.4,1.5,0.4,Iris-setosa
5.2,4.1,1.5,0.1,Iris-setosa
5.5,4.2,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.0,3.2,1.2,0.2,Iris-setosa
5.5,3.5,1.3,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
4.4,3.0,1.3,0.2,Iris-setosa
5.1,3.4,1.5,0.2,Iris-setosa
5.0,3.5,1.3,0.3,Iris-setosa
4.5,2.3,1.3,0.3,Iris-setosa
4.4,3.2,1.3,0.2,Iris-setosa
5.0,3.5,1.6,0.6,Iris-setosa
5.1,3.8,1.9,0.4,Iris-setosa
4.8,3.0,1.4,0.3,Iris-setosa
5.1,3.8,1.6,0.2,Iris-setosa
4.6,3.2,1.4,0.2,Iris-setosa
5.3,3.7,1.5,0.2,Iris-setosa
5.0,3.3,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
5.5,2.3,4.0,1.3,Iris-versicolor
6.5,2.8,4.6,1.5,Iris-versicolor
5.7,2.8,4.5,1.3,Iris-versicolor
6.3,3.3,4.7,1.6,Iris-versicolor
4.9,2.4,3.3,1.0,Iris-versicolor
6.6,2.9,4.6,1.3,Iris-versicolor
5.2,2.7,3.9,1.4,Iris-versicolor
5.0,2.0,3.5,1.0,Iris-versicolor
5.9,3.0,4.2,1.5,Iris-versicolor
6.0,2.2,4.0,1.0,Iris-versicolor
6.1,2.9,4.7,1.4,Iris-versicolor
5.6,2.9,3.6,1.3,Iris-versicolor
6.7,3.1,4.4,1.4,Iris-versicolor
5.6,3.0,4.5,1.5,Iris-versicolor
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
5.6,2.5,3.9,1.1,Iris-versicolor
5.9,3.2,4.8,1.8,Iris-versicolor
6.1,2.8,4.0,1.3,Iris-versicolor
6.3,2.5,4.9,1.5,Iris-versicolor
6.1,2.8,4.7,1.2,Iris-versicolor
6.4,2.9,4.3,1.3,Iris-versicolor
6.6,3.0,4.4,1.4,Iris-versicolor
6.8,2.8,4.8,1.4,Iris-versicolor
6.7,3.0,5.0,1.7,Iris-versicolor
6.0,2.9,4.5,1.5,Iris-versicolor
5.7,2.6,3.5,1.0,Iris-versicolor
5.5,2.4,3.8,1.1,Iris-versicolor
5.5,2.4,3.7,1.0,Iris-versicolor
5.8,2.7,3.9,1.2,Iris-versicolor
6.0,2.7,5.1,1.6,Iris-versicolor
5.4,3.0,4.5,1.5,Iris-versicolor
6.0,3.4,4.5,1.6,Iris-versicolor
6.7,3.1,4.7,1.5,Iris-versicolor
6.3,2.3,4.4,1.3,Iris-versicolor
5.6,3.0,4.1,1.3,Iris-versicolor
5.5,2.5,4.0,1.3,Iris-versicolor
5.5,2.6,4.4,1.2,Iris-versicolor
6.1,3.0,4.6,1.4,Iris-versicolor
5.8,2.6,4.0,1.2,Iris-versicolor
5.0,2.3,3.3,1.0,Iris-versicolor
5.6,2.7,4.2,1.3,Iris-versicolor
5.7,3.0,4.2,1.2,Iris-versicolor
5.7,2.9,4.2,1.3,Iris-versicolor
6.2,2.9,4.3,1.3,Iris-versicolor
5.1,2.5,3.0,1.1,Iris-versicolor
5.7,2.8,4.1,1.3,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
6.3,2.9,5.6,1.8,Iris-virginica
6.5,3.0,5.8,2.2,Iris-virginica
7.6,3.0,6.6,2.1,Iris-virginica
4.9,2.5,4.5,1.7,Iris-virginica
7.3,2.9,6.3,1.8,Iris-virginica
6.7,2.5,5.8,1.8,Iris-virginica
7.2,3.6,6.1,2.5,Iris-virginica
6.5,3.2,5.1,2.0,Iris-virginica
6.4,2.7,5.3,1.9,Iris-virginica
6.8,3.0,5.5,2.1,Iris-virginica
5.7,2.5,5.0,2.0,Iris-virginica
5.8,2.8,5.1,2.4,Iris-virginica
6.4,3.2,5.3,2.3,Iris-virginica
6.5,3.0,5.5,1.8,Iris-virginica
7.7,3.8,6.7,2.2,Iris-virginica
7.7,2.6,6.9,2.3,Iris-virginica
6.0,2.2,5.0,1.5,Iris-virginica
6.9,3.2,5.7,2.3,Iris-virginica
5.6,2.8,4.9,2.0,Iris-virginica
7.7,2.8,6.7,2.0,Iris-virginica
6.3,2.7,4.9,1.8,Iris-virginica
6.7,3.3,5.7,2.1,Iris-virginica
7.2,3.2,6.0,1.8,Iris-virginica
6.2,2.8,4.8,1.8,Iris-virginica
6.1,3.0,4.9,1.8,Iris-virginica
6.4,2.8,5.6,2.1,Iris-virginica
7.2,3.0,5.8,1.6,Iris-virginica
7.4,2.8,6.1,1.9,Iris-virginica
7.9,3.8,6.4,2.0,Iris-virginica
6.4,2.8,5.6,2.2,Iris-virginica
6.3,2.8,5.1,1.5,Iris-virginica
6.1,2.6,5.6,1.4,Iris-virginica
7.7,3.0,6.1,2.3,Iris-virginica
6.3,3.4,5.6,2.4,Iris-virginica
6.4,3.1,5.5,1.8,Iris-virginica
6.0,3.0,4.8,1.8,Iris-virginica
6.9,3.1,5.4,2.1,Iris-virginica
6.7,3.1,5.6,2.4,Iris-virginica
6.9,3.1,5.1,2.3,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
6.8,3.2,5.9,2.3,Iris-virginica
6.7,3.3,5.7,2.5,Iris-virginica
6.7,3.0,5.2,2.3,Iris-virginica
6.3,2.5,5.0,1.9,Iris-virginica
6.5,3.0,5.2,2.0,Iris-virginica
6.2,3.4,5.4,2.3,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica
%
The following output shows the clusters that were generated when the simple k-means algorithm was applied on the given dataset.
OUTPUT
=== Run information ===
Relation: iris
Instances: 150
Attributes: 5
sepallength
sepalwidth
petallength
petalwidth
class
kMeans
======
Number of iterations: 7
Cluster 0: 6.1,2.9,4.7,1.4,Iris-versicolor
Cluster 1: 6.2,2.9,4.3,1.3,Iris-versicolor
Cluster#
===========================================
Clustered Instances
0 100 ( 67%)
1 50 ( 33%)
From the above visualization, we can understand the distribution of sepal length and petal length in each cluster. For instance, each cluster covers a distinct range of petal length. By changing the color dimension to other attributes, we can see their distribution within each cluster.
Step 8: We can save the resulting dataset, which includes each instance along with its assigned cluster. To do so, we click the Save button in the visualization window and save the result as the iris k-means file. The top portion of this file is shown in the following figure.
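SimpleKMeans alternates between assigning each instance to its nearest centroid and recomputing each centroid as the mean of its cluster. The sketch below shows those two steps on a tiny 2-D sample; it is an illustrative Python sketch, not Weka's implementation, and the points are made up (loosely modelled on petal length/width). Weka seeds a random initial centroid selection; here we deterministically start from the first k points to keep the sketch reproducible.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points: assign each point to the nearest
    centroid, then move each centroid to the mean of its cluster."""
    centroids = list(points[:k])  # deterministic init for reproducibility
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assignment step: nearest centroid by squared Euclidean distance.
            j = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[j].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [(sum(p[0] for p in cl) / len(cl),
                      sum(p[1] for p in cl) / len(cl)) if cl else centroids[i]
                     for i, cl in enumerate(clusters)]
    return centroids, clusters

# Two well-separated blobs of four points each.
pts = [(1.4, 0.2), (1.5, 0.2), (1.3, 0.3), (1.6, 0.4),
       (4.7, 1.4), (4.5, 1.5), (4.9, 1.8), (5.1, 1.9)]
cents, cls = kmeans(pts, k=2)
print(sorted(len(c) for c in cls))  # [4, 4]
```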
Procedure ( Using Knowledge Flow)
Step1: Open Knowledge flow. Load sample training data set using arff loader.
Step2: Select cross validation fold maker Icon from Evaluation tab, connect it with arff
loader using dataset.
Step3: Select SimpleKMeans from the Cluster tab and connect it with the Cross Validation Fold Maker using test set and training set.
Step 4: Select Cluster Performance Evaluator from the Evaluation tab and connect it with SimpleKMeans using batch clusterer.
Step 5: Select text viewer from visualization and connect it with Cluster performance
evaluator icon using text.
Step6: Run the flow using run icon.
Step7: Right Click the text viewer icon and select show results to view the results.
Knowledge Flow Diagram
OUTPUT
=== Evaluation result for training instances ===
Relation: iris
kMeans
======
Number of iterations: 2
Cluster 0: 5.5,2.4,3.8,1.1,Iris-versicolor
Cluster 1: 7.7,3.8,6.7,2.2,Iris-virginica
Cluster#
================================================================
Clustered Instances
0 88 ( 65%)
1 47 ( 35%)
Result:
Thus the Simple K Means clustering algorithm is implemented for the iris dataset and the clusters are generated successfully.
9. Demonstration of clustering rule process on dataset student.arff using simple k-means
Aim:
This experiment illustrates the use of simple k-means clustering with the Weka Explorer. The sample dataset used for this example is based on the student data available in ARFF format. This document assumes that appropriate preprocessing has been performed. The student dataset includes 14 instances.
Step 1: Run the Weka Explorer and load the data file student.arff in the preprocessing interface.
Step 2: In order to perform clustering, select the "Cluster" tab in the Explorer and click on the Choose button to select a clustering algorithm.
Step 4: Next, click the text box to the right of the Choose button to get the popup window shown in the screenshots. In this window we enter the number of clusters and leave the seed value as it is. The seed value is used in generating a random number, which is used for the initial assignment of instances to clusters.
Step 5: Once the options have been specified, we run the clustering algorithm. We must make sure that, in the "Cluster mode" panel, the "Use training set" option is selected, and then we click the "Start" button. This process and the resulting window are shown in the following screenshots.
Step 6: The result window shows the centroid of each cluster, as well as statistics on the number and percentage of instances assigned to each cluster. The cluster centroids can be used to characterize the clusters. To visualize the cluster assignments, right-click the result set in the Result list panel and select "Visualize cluster assignments".
Dataset student.arff
@relation student
@attribute age {<30,30-40,>40}
@attribute income {low,medium,high}
@attribute student {yes,no}
@attribute credit-rating {fair,excellent}
@attribute buyspc {yes,no}
@data
%
<30, high, no, fair, no
<30, high, no, excellent, no
30-40, high, no, fair, yes
>40, medium, no, fair, yes
>40, low, yes, fair, yes
>40, low, yes, excellent, no
30-40, low, yes, excellent, yes
<30, medium, no, fair, no
<30, low, yes, fair, no
>40, medium, yes, fair, yes
<30, medium, yes, excellent, yes
30-40, medium, no, excellent, yes
30-40, high, yes, fair, yes
>40, medium, no, excellent, no
%
The following output shows the clusters that were generated when the simple k-means algorithm was applied on the given dataset.
OUTPUT
Relation: STUDENT1
Instances: 14
Attributes: 5
age
income
student
credit-rating
buyspc
kMeans
======
Number of iterations: 3
Cluster 0: >40,medium,yes,fair,yes
Cluster 1: 30-40,low,yes,excellent,yes
Cluster#
=============================================
Attribute     Full Data     0          1
student       no            no         no
Clustered Instances
0 10 ( 71%)
1 4 ( 29%)
Step 8: We can save the resulting dataset, which includes each instance along with its assigned cluster. To do so, we click the Save button in the visualization window and save the result as the student k-means file. The top portion of this file is shown in the following figure.
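Because every attribute in this dataset is nominal, a mean vector makes no sense; for nominal attributes a k-means-style clusterer uses the mode (most frequent value) of each attribute as the cluster centroid, which is why the centroid rows above show values like "no" rather than numbers. A sketch of that centroid computation (illustrative Python only; the sample cluster is made up from rows of the dataset):

```python
from collections import Counter

def nominal_centroid(rows):
    """Centroid of nominal rows: the most frequent value per attribute,
    as used for nominal attributes in k-means-style clustering."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*rows)]

cluster = [
    ("<30", "high", "no", "fair", "no"),
    ("<30", "high", "no", "excellent", "no"),
    ("<30", "medium", "no", "fair", "no"),
]
print(nominal_centroid(cluster))  # ['<30', 'high', 'no', 'fair', 'no']
```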
Step1: Open Knowledge flow. Load sample training data set using arff loader.
Step2: Select cross validation fold maker Icon from Evaluation tab, connect it with arff loader
using dataset.
Step3: Select SimpleKMeans from the Cluster tab and connect it with the Cross Validation Fold Maker using test set and training set.
Step 4: Select Cluster Performance Evaluator from the Evaluation tab and connect it with SimpleKMeans using batch clusterer.
Step 5: Select text viewer from visualization and connect it with Cluster performance
evaluator icon using text.
Step6: Run the flow using run icon.
Step7: Right Click the text viewer icon and select show results to view the results.
STUDENT.ARFF DATA SET
age     income     student     credit-rating     buyspc
<30 high no fair no
<30 high no excellent no
30-40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
30-40 low yes excellent yes
<30 medium no fair no
<30 low yes fair no
>40 medium yes fair yes
<30 medium yes excellent yes
30-40 medium no excellent yes
30-40 high yes fair yes
>40 medium no excellent no
Relation: STUDENT1
kMeans
======
Number of iterations: 3
Cluster 0: <30,high,no,excellent,no
Cluster 1: 30-40,medium,no,excellent,yes
Cluster#
================================================
Attribute     Full Data     0          1
student       no            no         yes
0 5 ( 42%)
1 7 ( 58%)
Result:
Thus the Simple K Means clustering algorithm is implemented for the student dataset and the clusters are generated successfully.
10. CREATION OF A DATA WAREHOUSE USING POSTGRESQL
Aim:
To create a data warehouse using PostgreSQL and perform operations on it.
Procedure:
#1
create table dw.timedim(
ID_time INTEGER NOT NULL,
DateMonth INTEGER NOT NULL,
DateYear INTEGER NOT NULL,
PRIMARY KEY(ID_time)
);
#2
create table dw.phonerate(
ID_phoneRate INTEGER NOT NULL,
phoneRateType VARCHAR(20) NOT NULL,
PRIMARY KEY(ID_phoneRate)
);
#3
create table dw.location(
ID_location INTEGER NOT NULL,
City VARCHAR(20) NOT NULL,
Province CHAR(20) NOT NULL,
Region CHAR(20) NOT NULL,
PRIMARY KEY(ID_location)
);
#4
create table dw.facts(
ID_time INTEGER NOT NULL,
ID_phoneRate INTEGER NOT NULL,
ID_location_Caller INTEGER NOT NULL,
ID_location_Receiver INTEGER NOT NULL,
Price FLOAT NOT NULL,
NumberOfCalls INTEGER NOT NULL,
PRIMARY KEY(ID_time,ID_phoneRate,ID_location_Caller,ID_location_Receiver),
FOREIGN KEY(ID_time) REFERENCES dw.timedim(ID_time),
FOREIGN KEY(ID_phoneRate) REFERENCES dw.phonerate(ID_phoneRate),
FOREIGN KEY(ID_location_Caller) REFERENCES dw.location(ID_location),
FOREIGN KEY(ID_location_Receiver) REFERENCES dw.location(ID_location)
);
Operations:
#1 Select the yearly income for each phone rate, the total income for each phone rate, and the
total yearly income.
------
select sum(Price), dateYear, phoneRateType
from dw.facts F, dw.timedim T, dw.phonerate P
where F.Id_time = T.Id_time and F.Id_phoneRate = P.Id_phoneRate
group by cube(phoneRateType, dateYear);
#2 Select the monthly number of calls and the monthly income. Associate the RANK() to
each month according to its income (1 for the month with the highest income, 2 for the
second, etc., the last month is the one with the least income).
------
SELECT DateMonth,DateYear, SUM(NumberOfCalls) as TotNumOfCalls,
SUM(price) as totalIncome,
RANK() over (ORDER BY SUM(price) DESC) as RankIncome
FROM dw.facts F, dw.timedim Te
WHERE F.id_time=Te.id_time
GROUP BY DateMonth,DateYear;
#3 For each month in 2003, select the total number of calls. Associate the RANK() to each
month according to its total number of calls (1 for the month with the highest number of calls,
2 for the second, etc., the last month is the one with the least number of calls).
------
SELECT DateMonth, SUM(NumberOfCalls) as TotNumOfCalls,
RANK() over (ORDER BY SUM(NumberOfCalls) DESC) as RankNumOfCalls
FROM dw.facts F, dw.timedim Te
WHERE F.id_time=Te.id_time
AND DateYear=2003
GROUP BY DateMonth;
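The ranking in operations #2 and #3 relies on the standard RANK() window function, which is evaluated after grouping. The same shape of query can be sketched outside PostgreSQL with Python's built-in sqlite3 module (SQLite 3.25+ supports window functions; the CUBE grouping of operation #1 is PostgreSQL-specific and not reproduced here). The tiny facts table below is made up for illustration:

```python
import sqlite3

# A tiny stand-in for dw.facts joined with dw.timedim: monthly totals only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facts(DateMonth INTEGER, DateYear INTEGER,
                   NumberOfCalls INTEGER, Price REAL);
INSERT INTO facts VALUES (1, 2003, 120, 300.0), (1, 2003, 80, 150.0),
                         (2, 2003, 200, 700.0), (3, 2003, 50, 100.0);
""")

# Same shape as operation #2: rank months by total income, highest first.
rows = conn.execute("""
SELECT DateMonth, DateYear, SUM(NumberOfCalls) AS TotNumOfCalls,
       SUM(Price) AS TotalIncome,
       RANK() OVER (ORDER BY SUM(Price) DESC) AS RankIncome
FROM facts
GROUP BY DateMonth, DateYear
""").fetchall()
for r in rows:
    print(r)
```

Month 2 (income 700) receives rank 1, month 1 (300 + 150 = 450) rank 2, and month 3 (100) rank 3, mirroring how RANK() orders the months in the PostgreSQL query.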
Result:
Thus the data warehouse has been created and the operations were performed successfully using PostgreSQL.