Data Mining Lab
1. Create an Employee Table with the help of Data Mining Tool WEKA.
2. Create a Weather Table with the help of Data Mining Tool WEKA.
EXPERIMENT NO: 1
Aim:
Create an Employee Table with the help of Data Mining Tool WEKA.
Description:
We need to create an Employee table as a training data set with the attributes name, id, salary, experience,
gender and phone number.
Procedure:
Steps:
@relation employee
@attribute name {x,y,z,a,b}
@attribute id numeric
@attribute salary {low,medium,high}
@attribute exp numeric
@attribute gender {male,female}
@attribute phone numeric
@data
x,101,low,2,male,250311
y,102,high,3,female,251665
z,103,medium,1,male,240238
a,104,low,5,female,200200
b,105,high,2,male,240240
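The ARFF text above can also be generated instead of typed by hand. The following is a minimal Python sketch, not part of WEKA: the helper `to_arff` and the output file name `employee.arff` are assumptions made for this lab. It writes the same header and rows and rejects any nominal value that is not declared in the header, which is a common cause of WEKA refusing to open a hand-written file.

```python
# Sketch: generate the employee ARFF file from Python lists.
# to_arff is a hypothetical helper written for this lab, not a WEKA API.

def to_arff(relation, attributes, rows):
    """Return ARFF text for the given relation.

    attributes is a list of (name, kind) pairs, where kind is either the
    string "numeric" or a list of allowed nominal values.
    """
    lines = [f"@relation {relation}"]
    for name, kind in attributes:
        if kind == "numeric":
            lines.append(f"@attribute {name} numeric")
        else:
            lines.append(f"@attribute {name} {{{','.join(kind)}}}")
    lines.append("@data")
    for row in rows:
        if len(row) != len(attributes):
            raise ValueError(f"row {row} does not have {len(attributes)} values")
        # Reject nominal values that were not declared in the header.
        for (name, kind), value in zip(attributes, row):
            if kind != "numeric" and value not in kind:
                raise ValueError(f"{value!r} is not a declared value of {name}")
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


EMPLOYEE_ATTRIBUTES = [
    ("name", ["x", "y", "z", "a", "b"]),
    ("id", "numeric"),
    ("salary", ["low", "medium", "high"]),
    ("exp", "numeric"),
    ("gender", ["male", "female"]),
    ("phone", "numeric"),
]
EMPLOYEE_ROWS = [
    ["x", 101, "low", 2, "male", 250311],
    ["y", 102, "high", 3, "female", 251665],
    ["z", 103, "medium", 1, "male", 240238],
    ["a", 104, "low", 5, "female", 200200],
    ["b", 105, "high", 2, "male", 240240],
]

arff_text = to_arff("employee", EMPLOYEE_ATTRIBUTES, EMPLOYEE_ROWS)
with open("employee.arff", "w", encoding="utf-8") as handle:
    handle.write(arff_text)
print(arff_text)
```

The resulting employee.arff can then be opened in the Explorer in the same way as a file saved from a text editor.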
Training Data Set Employee Table
Result:
EXPERIMENT NO: 2
Aim:
Create a Weather Table with the help of Data Mining Tool WEKA.
Description:
We need to create a Weather table as a training data set with the attributes outlook, temperature, humidity,
windy and play.
Procedure:
Steps:
@relation weather
@attribute outlook {sunny,rainy,overcast}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {true,false}
@attribute play {yes,no}
@data
sunny,85.0,85.0,false,no
overcast,80.0,90.0,true,no
sunny,83.0,86.0,false,yes
rainy,70.0,86.0,false,yes
rainy,68.0,80.0,false,yes
rainy,65.0,70.0,true,no
overcast,64.0,65.0,false,yes
sunny,72.0,95.0,true,no
sunny,69.0,70.0,false,yes
rainy,75.0,80.0,false,yes
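When this file is opened, WEKA reports per-attribute statistics for the loaded table. The sketch below reproduces that kind of summary in plain Python for the weather data: counts for nominal attributes and minimum, maximum and mean for numeric ones. It is an assumption-laden illustration, not WEKA code: `parse_arff` is a hypothetical, deliberately tiny reader that only understands `@attribute` and comma-separated `@data` lines (no quoted values or sparse data).

```python
from collections import Counter
from statistics import mean

WEATHER_ARFF = """@relation weather
@attribute outlook {sunny,rainy,overcast}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {true,false}
@attribute play {yes,no}
@data
sunny,85.0,85.0,false,no
overcast,80.0,90.0,true,no
sunny,83.0,86.0,false,yes
rainy,70.0,86.0,false,yes
rainy,68.0,80.0,false,yes
rainy,65.0,70.0,true,no
overcast,64.0,65.0,false,yes
sunny,72.0,95.0,true,no
sunny,69.0,70.0,false,yes
rainy,75.0,80.0,false,yes
"""


def parse_arff(text):
    """Hypothetical minimal ARFF reader: skips blank and % comment lines,
    reads @attribute declarations and comma-separated @data rows only."""
    attributes, rows, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue
        lower = line.lower()
        if lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            if kind.startswith("{"):
                kind = [value.strip() for value in kind.strip("{}").split(",")]
            else:
                kind = kind.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
        elif in_data:
            rows.append([value.strip() for value in line.split(",")])
    return attributes, rows


def summarize(attributes, rows):
    """Counts per declared value for nominal attributes; min/max/mean for numeric."""
    summary = {}
    for index, (name, kind) in enumerate(attributes):
        column = [row[index] for row in rows]
        if kind == "numeric":
            values = [float(value) for value in column]
            summary[name] = {"min": min(values), "max": max(values), "mean": mean(values)}
        else:
            counts = Counter(column)
            summary[name] = {value: counts[value] for value in kind}
    return summary


attributes, rows = parse_arff(WEATHER_ARFF)
summary = summarize(attributes, rows)
for name, stats in summary.items():
    print(name, stats)
```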
Training Data Set Weather Table
Result:
EXPERIMENT NO: 3
Aim:
Description:
Real-world databases are highly susceptible to noise, missing values and inconsistency because of their huge
size, so the data is pre-processed to improve the quality of the data and of the mining results; pre-processing
also improves the efficiency of mining. The pre-processing techniques used here are:
1) Add
2) Remove
3) Normalization
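The Add and Remove techniques listed above can be pictured on an in-memory table. This Python sketch is only an analogy to WEKA's filters, not their implementation: rows are assumed to be dictionaries, the new attribute name `remarks` is made up, and every value of the added attribute is set to the ARFF missing-value marker `?`. The Remove call drops WINDY and PLAY, matching the table shown later in this experiment.

```python
MISSING = "?"  # ARFF marks a missing value with "?"

WEATHER_ROWS = [
    {"outlook": "sunny", "temperature": 85.0, "humidity": 85.0, "windy": "false", "play": "no"},
    {"outlook": "overcast", "temperature": 80.0, "humidity": 90.0, "windy": "true", "play": "no"},
    {"outlook": "sunny", "temperature": 83.0, "humidity": 86.0, "windy": "false", "play": "yes"},
]


def add_attribute(rows, name, position=None):
    """Return copies of rows with a new attribute whose values are all missing.

    position is a 0-based column index; None appends it as the last column.
    """
    result = []
    for row in rows:
        items = list(row.items())
        index = len(items) if position is None else position
        items.insert(index, (name, MISSING))
        result.append(dict(items))
    return result


def remove_attributes(rows, names):
    """Return copies of rows without the named attributes."""
    drop = set(names)
    return [{key: value for key, value in row.items() if key not in drop} for row in rows]


with_new = add_attribute(WEATHER_ROWS, "remarks")  # "remarks" is a hypothetical name
without = remove_attributes(WEATHER_ROWS, ["windy", "play"])
print(list(with_new[0]))
print(without[0])
```

Both functions return new rows, so the original table is left unchanged, much as a WEKA filter produces a new data set rather than editing the file on disk.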
Procedure:
@relation weather
@attribute outlook {sunny,rainy,overcast}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {true,false}
@attribute play {yes,no}
@data
sunny,85.0,85.0,false,no
overcast,80.0,90.0,true,no
sunny,83.0,86.0,false,yes
rainy,70.0,86.0,false,yes
rainy,68.0,80.0,false,yes
rainy,65.0,70.0,true,no
overcast,64.0,65.0,false,yes
sunny,72.0,95.0,true,no
sunny,69.0,70.0,false,yes
rainy,75.0,80.0,false,yes
3) Save the file with the .arff file extension.
4) Minimize the window containing the ARFF file, then open Start > Programs > weka-3-4.
5) Click on weka-3-4; the Weka dialog box is displayed on the screen.
6) The dialog box offers four modes; click on Explorer.
7) The Explorer shows many options. Click on ‘Open file’ and select the ARFF file.
8) Click on the Edit button, which shows the weather table in WEKA.
Add Pre-Processing Technique:
Procedure:
Remove Pre-Processing Technique:
Procedure:
Weather Table after removing attributes WINDY, PLAY:
Procedure:
Result:
EXPERIMENT NO: 4
Aim:
Description:
Real-world databases are highly susceptible to noise, missing values and inconsistency because of their huge
size, so the data is pre-processed to improve the quality of the data and of the mining results; pre-processing
also improves the efficiency of mining. The pre-processing techniques used here are:
1) Add
2) Remove
3) Normalization
Procedure:
@relation employee
@attribute name {x,y,z,a,b}
@attribute id numeric
@attribute salary {low,medium,high}
@attribute exp numeric
@attribute gender {male,female}
@attribute phone numeric
@data
x,101,low,2,male,250311
y,102,high,3,female,251665
z,103,medium,1,male,240238
a,104,low,5,female,200200
b,105,high,2,male,240240
Training Data Set Employee Table
Procedure:
Employee Table after adding new attribute ADDRESS:
Procedure:
Employee Table after removing attributes SALARY, GENDER:
Procedure:
Employee Table after Normalizing ID, EXP, PHONE:
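By default, WEKA's unsupervised Normalize filter rescales numeric attributes into the range [0, 1] using min-max scaling, v' = (v - min) / (max - min). The Python sketch below applies that arithmetic to the ID, EXP and PHONE columns of the employee data, so the values shown in WEKA can be checked by hand. The function name `min_max_normalize` is hypothetical, and mapping a constant column to 0.0 is a choice made for this sketch, not necessarily what WEKA does.

```python
def min_max_normalize(values):
    """Scale values linearly so the smallest becomes 0.0 and the largest 1.0."""
    low, high = min(values), max(values)
    if high == low:
        # Constant column: map every value to 0.0 (a choice made for this
        # sketch to avoid dividing by zero).
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


EMPLOYEE = {
    "id": [101, 102, 103, 104, 105],
    "exp": [2, 3, 1, 5, 2],
    "phone": [250311, 251665, 240238, 200200, 240240],
}

normalized = {name: min_max_normalize(column) for name, column in EMPLOYEE.items()}
for name, column in normalized.items():
    print(name, [round(value, 4) for value in column])
```

For example, EXP ranges from 1 to 5, so an experience of 2 becomes (2 - 1) / (5 - 1) = 0.25.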
Result:
EXPERIMENT NO: 5
Aim:
Description:
The Knowledge Flow provides an alternative to the Explorer as a graphical front end to WEKA’s algorithms.
The Knowledge Flow is a work in progress, so some of the functionality of the Explorer is not yet available in it;
on the other hand, there are things that can be done in the Knowledge Flow but not in the Explorer. The Knowledge
Flow presents a dataflow interface to WEKA. The user can select WEKA components from a toolbar, place them on a
layout canvas and connect them together to form a knowledge flow for processing and analyzing the data.
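The dataflow idea described above (components placed on a canvas and connected so that data moves from one to the next) can be illustrated with a few lines of Python. This is an analogy only: the `Component` class and the loader, filter and viewer functions are hypothetical and are not WEKA's Knowledge Flow classes. Each component transforms what it receives and passes the result to every component connected downstream.

```python
class Component:
    """A node in the flow: applies func to incoming data and forwards the
    result to every connected downstream component."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.targets = []

    def connect(self, other):
        self.targets.append(other)
        return other  # returning the target allows a.connect(b).connect(c)

    def receive(self, data):
        output = self.func(data)
        for target in self.targets:
            target.receive(output)
        return output


WEATHER_ROWS = [
    {"outlook": "sunny", "temperature": 85.0, "windy": "false", "play": "no"},
    {"outlook": "overcast", "temperature": 80.0, "windy": "true", "play": "no"},
]
collected = []


def load(_ignored):
    # Stands in for a data-source component: emits a copy of the table.
    return [dict(row) for row in WEATHER_ROWS]


def remove_play(rows):
    # Stands in for a filter component: drops the "play" attribute.
    return [{key: value for key, value in row.items() if key != "play"} for row in rows]


def show(rows):
    # Stands in for a viewer component: records what reaches the end of the flow.
    collected.append(rows)
    return rows


loader = Component("Loader", load)
loader.connect(Component("Remove play", remove_play)).connect(Component("Viewer", show))
loader.receive(None)  # start the flow
print(collected[0])
```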
Procedure:
@relation weather
@attribute outlook {sunny,rainy,overcast}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {true,false}
@attribute play {yes,no}
@data
sunny,85.0,85.0,false,no
overcast,80.0,90.0,true,no
sunny,83.0,86.0,false,yes
rainy,70.0,86.0,false,yes
rainy,68.0,80.0,false,yes
rainy,65.0,70.0,true,no
overcast,64.0,65.0,false,yes
sunny,72.0,95.0,true,no
sunny,69.0,70.0,false,yes
rainy,75.0,80.0,false,yes
3) Save the file with the .arff file extension.
4) Minimize the window containing the ARFF file, then open Start > Programs > weka-3-4.
5) Click on weka-3-4; the Weka dialog box is displayed on the screen.
6) The dialog box offers four modes; click on Explorer.
7) The Explorer shows many options. Click on ‘Open file’ and select the ARFF file.
8) Click on the Edit button, which shows the weather table in WEKA.
Output:
EXPERIMENT NO: 6
Aim:
Description:
The Knowledge Flow provides an alternative to the Explorer as a graphical front end to WEKA’s algorithms.
The Knowledge Flow is a work in progress, so some of the functionality of the Explorer is not yet available in it;
on the other hand, there are things that can be done in the Knowledge Flow but not in the Explorer. The Knowledge
Flow presents a dataflow interface to WEKA. The user can select WEKA components from a toolbar, place them on a
layout canvas and connect them together to form a knowledge flow for processing and analyzing the data.
Procedure:
Output:
13) Check whether the output file has been created by browsing to the chosen path.
14) Rename the data file as a.arff.
15) Double-click on a.arff; the output then opens automatically in MS-Excel.
Result:
EXPERIMENT NO: 7
Description:
In data mining, association rule learning is a popular and well-researched method for discovering interesting
relations between variables in large databases. It can be described as analyzing and presenting strong rules discovered
in databases using different measures of interestingness. Association rules are used in market basket analysis, and they
are also employed in many other application areas, including Web usage mining, intrusion detection and bioinformatics.
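Two of the most common measures of interestingness are support and confidence, and lift is often reported alongside them. The Python sketch below computes all three for a rule A => C over a small set of market baskets; the baskets and the function names are made up for illustration and are not WEKA output. Support is the fraction of transactions containing an itemset, confidence is support(A ∪ C) / support(A), and lift is confidence(A => C) / support(C).

```python
def count(transactions, itemset):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for transaction in transactions if itemset <= transaction)


def support(transactions, itemset):
    """Fraction of transactions that contain itemset."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(A u C) / support(A), computed from counts.
    Assumes the antecedent occurs in at least one transaction."""
    return count(transactions, antecedent | consequent) / count(transactions, antecedent)


def lift(transactions, antecedent, consequent):
    """confidence(A => C) / support(C); above 1 means A and C occur together
    more often than they would if they were independent."""
    n = len(transactions)
    return (count(transactions, antecedent | consequent) * n
            / (count(transactions, antecedent) * count(transactions, consequent)))


# Hypothetical market baskets, one set of items per transaction.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

rule_a, rule_c = {"bread"}, {"milk"}
print("support   ", support(BASKETS, rule_a | rule_c))
print("confidence", confidence(BASKETS, rule_a, rule_c))
print("lift      ", lift(BASKETS, rule_a, rule_c))
```

Here bread => milk holds in 3 of the 4 baskets that contain bread (confidence 0.75), but milk is so common overall that the lift is slightly below 1.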
Procedure:
Output:
Procedure for Association Rules:
Result:
EXPERIMENT NO: 8
Description:
In data mining, association rule learning is a popular and well-researched method for discovering interesting
relations between variables in large databases. It can be described as analyzing and presenting strong rules discovered
in databases using different measures of interestingness. Association rules are used in market basket analysis, and they
are also employed in many other application areas, including Web usage mining, intrusion detection and bioinformatics.
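Association rules are usually built from frequent itemsets, and the classic way to find them is the level-wise Apriori method: count single items, keep those that reach the minimum support, join frequent itemsets into larger candidates, and prune any candidate that has an infrequent subset. The Python sketch below is a textbook version of that idea under simple assumptions (transactions are sets, support is a fraction); it is not WEKA's Apriori implementation and does not use its parameters.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    (fraction of transactions containing it) is at least min_support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that occurs anywhere.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        keys = list(level)
        candidates = {a | b for a in keys for b in keys if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(subset) in level for subset in combinations(c, k - 1))
        }
    return frequent


# Hypothetical market baskets, one set of items per transaction.
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

result = apriori(BASKETS, min_support=0.4)
for itemset, sup in sorted(result.items(), key=lambda item: (len(item[0]), sorted(item[0]))):
    print(sorted(itemset), sup)
```

With a minimum support of 0.4, {milk, butter} occurs in only one basket, so the three-item candidate {bread, milk, butter} is pruned without being counted.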
Procedure:
Procedure for Association Rules:
Output:
Result:
EXPERIMENT NO: 9
Description:
In data mining, association rule learning is a popular and well-researched method for discovering interesting
relations between variables in large databases. It can be described as analyzing and presenting strong rules discovered
in databases using different measures of interestingness. Association rules are used in market basket analysis, and they
are also employed in many other application areas, including Web usage mining, intrusion detection and bioinformatics.
Procedure:
@data
youth,high,A
youth,medium,B
youth,low,C
middle,low,C
middle,medium,C
middle,high,A
senior,low,C
senior,medium,B
senior,high,B
middle,high,B
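The ten rows above can be checked by hand for simple rules of the form "attribute = value => class". The attribute names are not given in this data section, so the Python sketch below refers to columns by position (0 and 1 are the first two values, 2 is the class), which is an assumption of the sketch. The function name `single_item_rules` is hypothetical; it lists every one-condition rule whose confidence reaches a threshold, together with its support.

```python
DATA = [
    ("youth", "high", "A"),
    ("youth", "medium", "B"),
    ("youth", "low", "C"),
    ("middle", "low", "C"),
    ("middle", "medium", "C"),
    ("middle", "high", "A"),
    ("senior", "low", "C"),
    ("senior", "medium", "B"),
    ("senior", "high", "B"),
    ("middle", "high", "B"),
]


def single_item_rules(rows, min_confidence, class_index=2):
    """Return (column, value, class, support, confidence) for every rule
    'column = value => class' whose confidence is at least min_confidence."""
    n = len(rows)
    rules = []
    for col in range(len(rows[0])):
        if col == class_index:
            continue
        for value in sorted({row[col] for row in rows}):
            matching = [row for row in rows if row[col] == value]
            for cls in sorted({row[class_index] for row in matching}):
                both = sum(1 for row in matching if row[class_index] == cls)
                conf = both / len(matching)
                if conf >= min_confidence:
                    rules.append((col, value, cls, both / n, conf))
    return rules


for col, value, cls, sup, conf in single_item_rules(DATA, 0.6):
    print(f"column {col} = {value} => class {cls}  support={sup:.2f}  confidence={conf:.2f}")
```

For example, all three rows with "low" in the second column have class C, so that rule has confidence 1.00 and support 0.30.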
Training Data Set Employee Table
Output:
Result:
EXPERIMENT NO: 10
Aim:
Description:
Classification is the process of finding a model that describes data classes or concepts, so that the model
can be used for prediction.
Decision Tree:
A decision tree is a classification scheme that generates a tree consisting of a root node, internal nodes
and external (leaf) nodes.
The root node represents an attribute, and the internal nodes also represent attributes. The external nodes are the
classes, and each branch represents a value (or range of values) of the attribute at the node it leaves.
A decision tree can also be expressed as a set of rules for a given data set. The data is divided into two subsets:
a training data set and a testing data set. The training data set is previously classified data that is used to
build the tree. The testing data set is new data that is used to check the tree.
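The structure described above (attributes at the root and internal nodes, classes at the leaves, attribute values on the branches) and the equivalence between a tree and a set of rules can be shown with a small hand-built tree. The tree in this Python sketch is a hypothetical example for the weather data used in this experiment, written by hand for illustration and not presented as WEKA's output; `classify` and `to_rules` are likewise hypothetical helper names.

```python
# Internal node: (attribute, [(branch_label, test, subtree), ...])
# Leaf node: a class label string.
TREE = ("outlook", [
    ("= sunny", lambda r: r["outlook"] == "sunny",
        ("humidity", [
            ("<= 75", lambda r: r["humidity"] <= 75, "yes"),
            ("> 75", lambda r: r["humidity"] > 75, "no"),
        ])),
    ("= overcast", lambda r: r["outlook"] == "overcast", "yes"),
    ("= rainy", lambda r: r["outlook"] == "rainy",
        ("windy", [
            ("= TRUE", lambda r: r["windy"] == "TRUE", "no"),
            ("= FALSE", lambda r: r["windy"] == "FALSE", "yes"),
        ])),
])


def classify(node, row):
    """Follow the branch whose test matches until a leaf (class) is reached."""
    while not isinstance(node, str):
        _, branches = node
        for _, test, child in branches:
            if test(row):
                node = child
                break
        else:
            raise ValueError("no branch matches this row")
    return node


def to_rules(node, conditions=()):
    """Turn every root-to-leaf path into an IF ... THEN rule."""
    if isinstance(node, str):
        return ["IF " + " AND ".join(conditions) + " THEN play = " + node]
    attribute, branches = node
    rules = []
    for label, _, child in branches:
        rules.extend(to_rules(child, conditions + (f"{attribute} {label}",)))
    return rules


for rule in to_rules(TREE):
    print(rule)
print(classify(TREE, {"outlook": "rainy", "temperature": 70, "humidity": 96, "windy": "FALSE"}))
```

Each leaf yields exactly one rule, so this five-leaf tree corresponds to five rules.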
Procedure:
@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
rainy,65,70,TRUE,no
overcast,64,65,TRUE,yes
sunny,72,95,FALSE,no
sunny,69,70,FALSE,yes
rainy,75,80,FALSE,yes
sunny,75,70,TRUE,yes
overcast,72,90,TRUE,yes
overcast,81,75,FALSE,yes
rainy,71,91,TRUE,no
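To choose the attribute at the root, tree learners measure how well each attribute separates the classes. The Python sketch below computes entropy and information gain (the ID3 criterion) on the 14 rows above for the nominal attributes outlook and windy. Note the assumptions: WEKA's J48 is based on C4.5, which uses the gain ratio and also handles the numeric attributes temperature and humidity with thresholds, so this sketch only illustrates the basic idea and is not J48's procedure.

```python
from collections import Counter
from math import log2

# (outlook, temperature, humidity, windy, play) for the 14 rows above.
ROWS = [
    ("sunny", 85, 85, "FALSE", "no"),
    ("sunny", 80, 90, "TRUE", "no"),
    ("overcast", 83, 86, "FALSE", "yes"),
    ("rainy", 70, 96, "FALSE", "yes"),
    ("rainy", 68, 80, "FALSE", "yes"),
    ("rainy", 65, 70, "TRUE", "no"),
    ("overcast", 64, 65, "TRUE", "yes"),
    ("sunny", 72, 95, "FALSE", "no"),
    ("sunny", 69, 70, "FALSE", "yes"),
    ("rainy", 75, 80, "FALSE", "yes"),
    ("sunny", 75, 70, "TRUE", "yes"),
    ("overcast", 72, 90, "TRUE", "yes"),
    ("overcast", 81, 75, "FALSE", "yes"),
    ("rainy", 71, 91, "TRUE", "no"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, attr_index, class_index=-1):
    """Entropy of the class minus the weighted entropy after splitting on a
    nominal attribute (one branch per distinct value)."""
    labels = [row[class_index] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[attr_index], []).append(row[class_index])
    remainder = sum(len(group) / len(rows) * entropy(group) for group in groups.values())
    return entropy(labels) - remainder


print("entropy(play)   =", round(entropy([row[-1] for row in ROWS]), 3))
print("gain(outlook)   =", round(information_gain(ROWS, 0), 3))
print("gain(windy)     =", round(information_gain(ROWS, 3), 3))
```

Outlook separates the classes much better than windy (its overcast branch is pure), which is why it is a natural choice for the root of the tree.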
5) Click on weka-3-4; the Weka dialog box is displayed on the screen.
6) The dialog box offers four modes; click on Explorer.
7) The Explorer shows many options. Click on ‘Open file’ and select the ARFF file.
8) Click on the Edit button, which shows the weather table in WEKA.
Output:
Decision Tree: