DMDW LAB NEW - Merged

This document provides steps to summarize an experiment demonstrating the Weka data mining tool. The experiment involves downloading and installing Weka, analyzing attributes of iris and weather datasets, performing data preprocessing tasks like discretization and resampling on iris data, performing classification and clustering on iris data using algorithms like J48, Naive Bayes, and k-means clustering, and comparing classification results of different algorithms. It also demonstrates creating a knowledge flow application on Weka and analyzing results.


Experiment - 1

1)Downloading and Installing WEKA.

Follow the below steps to install Weka on Windows:


Step 1: Visit the Weka website in any web browser and click on Free Download.

Step 2: It will redirect to a new webpage; click on Start Download.

The executable file will begin downloading shortly. It is about 118 MB, so the
download may take a few minutes.

Step 3: Now find the executable file in the Downloads folder on your system and
run it.
Step 4: It will ask for confirmation to make changes to your system. Click on
Yes.

Step 5: The setup screen will appear; click on Next.


Step 6: The next screen shows the License Agreement; click on I Agree.

Step 7: The next screen is for choosing components. All components are already
selected, so leave them unchanged and click on Next.
Step 8: The next screen asks for the installation location, so choose a drive
with sufficient free space. The installation needs about 301 MB.

Step 9: The next screen is for choosing the Start menu folder; leave the default
and click on Install.
Step 10: The installation will now start and should take only about a minute
to complete.

Step 11: Click on the Next button after the installation process is complete.
Step 12: Click on Finish to finish the installation process.

Step 13: Weka is successfully installed on the system and an icon is created
on the desktop.
Step 14: Run the software and see the interface.

Congratulations! At this point, you have successfully installed Weka on your
Windows system.
Experiment – 2
2) Write down the number of attributes and, for each selected attribute, draw the
histogram for the iris and weather data sets.
Step-1 Open WEKA, go to Workbench, and click on OPEN FILE. In the file chooser
open the DATA folder, select IRIS, and click Open.
Step-2 Note down the details of SEPAL LENGTH, SEPAL WIDTH, PETAL LENGTH, PETAL
WIDTH AND CLASS.
 SEPAL LENGTH
 SEPAL WIDTH
 PETAL LENGTH
 PETAL WIDTH
 CLASS

Step-3 Do the same for the WEATHER data. Note down the details of OUTLOOK,
TEMPERATURE, HUMIDITY, WINDY AND PLAY.
 OUTLOOK
 TEMPERATURE
 HUMIDITY
 WINDY
 PLAY
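The histogram Weka draws for a nominal attribute is just a count of each value. A minimal sketch of that counting in plain Java, using a handful of hypothetical class labels (the real iris file has 50 instances of each species):

```java
import java.util.*;

public class HistogramSketch {
    // Count occurrences of each value, as Weka does for a nominal attribute's histogram.
    static Map<String, Integer> counts(String[] labels) {
        Map<String, Integer> c = new TreeMap<>();
        for (String l : labels) c.merge(l, 1, Integer::sum);
        return c;
    }

    public static void main(String[] args) {
        // Hypothetical class labels, not the full 150-instance iris set.
        String[] labels = {"Iris-setosa", "Iris-setosa", "Iris-versicolor",
                           "Iris-virginica", "Iris-versicolor"};
        // Print a crude text histogram: one '#' per occurrence.
        counts(labels).forEach((k, v) -> System.out.println(k + ": " + "#".repeat(v)));
    }
}
```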
EXPERIMENT – 3
3) Perform data preprocessing tasks and demonstrate the DISCRETIZE and RESAMPLE
filters on the iris data set.

Step-1 Open WEKA, go to Workbench, and click on OPEN FILE. In the file chooser
open the DATA folder, select IRIS, and click Open.

Step-2 Now click on the filter Choose button, select UNSUPERVISED, then
ATTRIBUTE, and in it select DISCRETIZE.

Step-3 Note down the details of SEPAL LENGTH, SEPAL WIDTH, PETAL LENGTH, PETAL
WIDTH AND CLASS.
 SEPAL LENGTH
 SEPAL WIDTH
 PETAL LENGTH
 PETAL WIDTH
 CLASS
Step-4 Now under INSTANCE select RESAMPLE and apply it.
Step-5 Note down the details of SEPAL LENGTH, SEPAL WIDTH, PETAL LENGTH, PETAL
WIDTH AND CLASS.
 SEPAL LENGTH
 SEPAL WIDTH
 PETAL LENGTH
 PETAL WIDTH
 CLASS
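The DISCRETIZE filter above replaces each numeric attribute with a binned version; by default Weka's unsupervised Discretize uses equal-width bins. A rough sketch of equal-width binning, with hypothetical sepal-length values (real runs use all 150 iris instances and, by default, 10 bins):

```java
import java.util.Arrays;

public class DiscretizeSketch {
    // Map each value to one of `bins` equal-width intervals over [min, max].
    static int[] discretize(double[] values, int bins) {
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double width = (max - min) / bins;
        int[] labels = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            int b = (int) ((values[i] - min) / width);
            labels[i] = Math.min(b, bins - 1); // the maximum value falls in the last bin
        }
        return labels;
    }

    public static void main(String[] args) {
        // Hypothetical sepal-length values, three bins for readability.
        double[] sepalLength = {4.3, 5.0, 5.8, 6.7, 7.9};
        System.out.println(Arrays.toString(discretize(sepalLength, 3))); // [0, 0, 1, 2, 2]
    }
}
```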
Experiment – 4 & 5
Demonstrate performing association rule mining on data sets using the Apriori algorithm
Step-1
Open WEKA, go to Workbench, and click on OPEN FILE. In the file chooser open the
DATA folder, select IRIS, and click Open.

Step-2
Now under FILTERS choose UNSUPERVISED, then ATTRIBUTE, and select DISCRETIZE.
Click on Apply.

Step-3
Now go to ASSOCIATE, choose APRIORI, and click on Start. In the Apriori settings,
try different values of SUPPORT and CONFIDENCE.

 Support: 0.1 and Confidence: 0.9
 Support: 0.2 and Confidence: 0.8
 Support: 0.3 and Confidence: 0.7
Continue changing the values until the output reports "No large itemsets and rules found!".
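The SUPPORT and CONFIDENCE thresholds being varied above are simple frequency ratios. A minimal sketch of both measures on hypothetical market-basket transactions (toy item names, not the iris data):

```java
import java.util.*;

public class SupportConfidence {
    // support(X) = fraction of transactions containing every item in X.
    static double support(List<Set<String>> transactions, Set<String> itemset) {
        long hits = transactions.stream().filter(t -> t.containsAll(itemset)).count();
        return (double) hits / transactions.size();
    }

    // confidence(X -> Y) = support(X union Y) / support(X).
    static double confidence(List<Set<String>> transactions, Set<String> x, Set<String> y) {
        Set<String> xy = new HashSet<>(x);
        xy.addAll(y);
        return support(transactions, xy) / support(transactions, x);
    }

    public static void main(String[] args) {
        List<Set<String>> db = List.of(
            Set.of("milk", "bread"),
            Set.of("milk", "bread", "butter"),
            Set.of("bread"),
            Set.of("milk", "butter"));
        System.out.println(support(db, Set.of("milk", "bread")));            // 0.5
        System.out.println(confidence(db, Set.of("milk"), Set.of("bread"))); // 2/3
    }
}
```

A rule survives Apriori's filtering only if both numbers meet the thresholds set in the GUI.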
Experiment -6
Demonstrate performing classification on data sets: load the data set into
Weka and run the J48 classification algorithm. Compute the kappa statistic,
observe the confusion matrix, and draw the visualization tree.

Step-1
Open WEKA, go to Workbench, and click on OPEN FILE. In the file chooser open the
DATA folder, select IRIS, and click Open.

Step-2
Now under FILTERS choose UNSUPERVISED, then ATTRIBUTE, and select DISCRETIZE.
Click on Apply.

Step-3
Now select CLASSIFY, click on Choose, select TREES, then J48, and click Start.

Instances: 150
Attributes: 5
sepallength
sepalwidth
petallength
petalwidth
class
Test mode: 10-fold cross-validation

Number of Leaves : 10

Size of the tree : 11


Step-4
Right-click on the trees.J48 entry in the result list and select "VISUALIZE TREE" from the menu that appears.
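The kappa statistic in Weka's output compares the observed accuracy with the agreement expected by chance, both read off the confusion matrix. A small sketch of the computation with a hypothetical 3-class matrix of the shape Weka prints for iris:

```java
public class KappaSketch {
    // Cohen's kappa = (observed accuracy - chance accuracy) / (1 - chance accuracy).
    static double kappa(int[][] cm) {
        int k = cm.length, n = 0, correct = 0;
        int[] rowSum = new int[k], colSum = new int[k];
        for (int i = 0; i < k; i++)
            for (int j = 0; j < k; j++) {
                n += cm[i][j];
                rowSum[i] += cm[i][j];
                colSum[j] += cm[i][j];
                if (i == j) correct += cm[i][j];
            }
        double po = (double) correct / n; // observed agreement
        double pe = 0;                    // agreement expected by chance, from the margins
        for (int i = 0; i < k; i++)
            pe += ((double) rowSum[i] / n) * ((double) colSum[i] / n);
        return (po - pe) / (1 - pe);
    }

    public static void main(String[] args) {
        // Hypothetical confusion matrix: rows = actual class, columns = predicted class.
        int[][] cm = {{49, 1, 0}, {0, 47, 3}, {0, 2, 48}};
        System.out.println(kappa(cm)); // ~0.94
    }
}
```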
Experiment – 7
7) Draw the comparison graph for the values of the kappa statistic, Mean
Absolute Error (MAE), Root Mean Squared Error (RMSE), Relative Absolute
Error (RAE), and Root Relative Squared Error (RRSE) with 10, 15, 20, 25, and 30 folds.
Step-1 Open WEKA, go to Workbench, and click on OPEN FILE. In the file chooser
open the DATA folder, select IRIS, and click Open.

Step-2 Now under FILTERS choose UNSUPERVISED, then ATTRIBUTE, and select
DISCRETIZE. Click on Apply.

Step-3 Now click on CLASSIFY, set the number of FOLDS to each value in turn, and
draw the graphs for the following metrics.

KAPPA STATISTIC (KS)
MEAN ABSOLUTE ERROR (MAE)
ROOT MEAN SQUARED ERROR (RMSE)
RELATIVE ABSOLUTE ERROR (RAE)
ROOT RELATIVE SQUARED ERROR (RRSE)

FOLDS   KS     MAE      RMSE     RAE (%)   RRSE (%)
10      0.94   0.0489   0.1637   10.99     34.72
15      0.94   0.0482   0.1616   10.83     34.24
20      0.93   0.0485   0.1619   10.89     34.31
25      0.94   0.0491   0.1628   11.04     34.52
30      0.94   0.0489   0.1638   10.98     34.70

=== Graph for KAPPA STATISTIC ===


=== Graph for MEAN ABSOLUTE ERROR ===

=== Graph for ROOT MEAN SQUARED ERROR ===

=== Graph for RELATIVE ABSOLUTE ERROR ===


=== Graph for ROOT RELATIVE SQUARED ERROR ===
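The four error metrics tabulated above are all derived from the per-instance prediction errors; RAE and RRSE normalize by the errors of always predicting the mean, which is why Weka reports them as percentages. A minimal sketch with hypothetical predictions (Weka actually computes these on class-probability estimates):

```java
public class ErrorMetrics {
    // MAE = mean |pred - actual|; RMSE = sqrt(mean (pred - actual)^2);
    // RAE and RRSE divide by the errors of a baseline that always predicts the mean actual.
    static double[] metrics(double[] actual, double[] pred) {
        int n = actual.length;
        double mean = 0;
        for (double a : actual) mean += a / n;
        double ae = 0, se = 0, aeBase = 0, seBase = 0;
        for (int i = 0; i < n; i++) {
            ae += Math.abs(pred[i] - actual[i]);
            se += (pred[i] - actual[i]) * (pred[i] - actual[i]);
            aeBase += Math.abs(mean - actual[i]);
            seBase += (mean - actual[i]) * (mean - actual[i]);
        }
        double mae = ae / n;
        double rmse = Math.sqrt(se / n);
        double rae = 100 * ae / aeBase;              // percent, as Weka reports it
        double rrse = 100 * Math.sqrt(se / seBase);  // percent, as Weka reports it
        return new double[]{mae, rmse, rae, rrse};
    }

    public static void main(String[] args) {
        // Hypothetical 0/1 actuals and predicted probabilities.
        double[] actual = {1, 0, 1, 1, 0};
        double[] pred = {0.9, 0.2, 0.8, 1.0, 0.1};
        System.out.println(java.util.Arrays.toString(metrics(actual, pred)));
    }
}
```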
Experiment – 8
8) Demonstrate performing classification on the IRIS data set: compare the
classification results of the J48, Naive Bayes, and IBk classifiers and draw the
comparison graph for the classification statistics.

Step-1 Open WEKA, go to Workbench, and click on OPEN FILE. In the file chooser
open the DATA folder, select IRIS, and click Open.

Step-2 Now under FILTERS choose UNSUPERVISED, then ATTRIBUTE, and select
DISCRETIZE. Click on Apply.

Step-3 Now click on CLASSIFY, click on Choose, select BAYES, then NAIVE BAYES,
click Start, and note down the values.
Step-4 Now click on Choose, select the LAZY classifiers, then IBK, click Start,
and note down the values.

Step-5 Select the TREES classifiers, then J48, click Start, and note down the values.
Classifier    KS     MAE      RMSE     RAE (%)   RRSE (%)
Naive Bayes   0.92   0.0532   0.1744   11.9604   37.0028
IBk           0.88   0.0611   0.2077   13.7373   44.0545
J48           0.94   0.0489   0.1637   10.9981   34.7274

=== Graph for KAPPA STATISTIC ===

=== Graph for MEAN ABSOLUTE ERROR ===


=== Graph for ROOT MEAN SQUARED ERROR ===

=== Graph for RELATIVE ABSOLUTE ERROR ===

=== Graph for ROOT RELATIVE SQUARED ERROR ===
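Naive Bayes, one of the classifiers compared above, scores a class by multiplying its prior probability with per-attribute likelihoods estimated from counts. A toy sketch on hypothetical weather-style rows (the real classifier additionally smooths counts and handles numeric attributes):

```java
public class NaiveBayesSketch {
    // P(class | features) is proportional to P(class) * product of P(feature | class).
    public static void main(String[] args) {
        // Hypothetical nominal training rows: {outlook, windy, play}.
        String[][] data = {
            {"sunny", "false", "no"}, {"sunny", "true", "no"},
            {"overcast", "false", "yes"}, {"rainy", "false", "yes"},
            {"rainy", "true", "no"}, {"overcast", "true", "yes"},
            {"sunny", "false", "yes"}, {"rainy", "false", "yes"}};
        String[] query = {"sunny", "false"};
        for (String cls : new String[]{"yes", "no"}) {
            double p = prior(data, cls);
            for (int f = 0; f < query.length; f++) p *= likelihood(data, f, query[f], cls);
            System.out.printf("score(%s | sunny,false) = %.4f%n", cls, p);
        }
    }

    // Fraction of rows with the given class label.
    static double prior(String[][] data, String cls) {
        double c = 0;
        for (String[] row : data) if (row[2].equals(cls)) c++;
        return c / data.length;
    }

    // Fraction of rows of class `cls` whose feature `f` equals `val`.
    static double likelihood(String[][] data, int f, String val, String cls) {
        double match = 0, total = 0;
        for (String[] row : data)
            if (row[2].equals(cls)) { total++; if (row[f].equals(val)) match++; }
        return match / total;
    }
}
```

The class with the larger unnormalized score is predicted; here "yes" wins with 0.1000 against 0.0833.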


Experiment – 9
9) Demonstrate performing clustering of the iris data set: load the data set into
Weka and run the simple k-means clustering algorithm with different values of k.
 Study the clusters formed.
 Observe the sum of squared errors and the centroids.

Step-1 Open WEKA, go to Workbench, and click on OPEN FILE. In the file chooser
open the DATA folder, select IRIS, and click Open.
Step-2 Now under FILTERS choose UNSUPERVISED, then ATTRIBUTE, and select
DISCRETIZE. Click on Apply.
Step-3 Now click on CLUSTER, click on Choose, and from the clusterers select
SIMPLE K MEANS. Click on Start.
Step-4 Note down the SIMPLE K MEANS values for 2, 3, and 4 clusters.
FOR CLUSTER-2
FOR CLUSTER-3
FOR CLUSTER-4
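The quantities noted above (centroids and the within-cluster sum of squared errors) come from the k-means loop: assign each point to its nearest centroid, recompute each centroid as its cluster's mean, and repeat until nothing changes. A one-dimensional sketch with hypothetical petal-length-like values (Weka clusters all four iris attributes at once):

```java
import java.util.Arrays;

public class KMeansSketch {
    // Run 1-D k-means from the given seed centroids until assignments stabilize.
    static double[] cluster(double[] points, double[] centroids) {
        int[] assign = new int[points.length];
        boolean changed = true;
        while (changed) {
            changed = false;
            // Assignment step: nearest centroid for each point.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int c = 1; c < centroids.length; c++)
                    if (Math.abs(points[i] - centroids[c]) < Math.abs(points[i] - centroids[best]))
                        best = c;
                if (assign[i] != best) { assign[i] = best; changed = true; }
            }
            // Update step: each centroid becomes the mean of its cluster.
            for (int c = 0; c < centroids.length; c++) {
                double sum = 0; int n = 0;
                for (int i = 0; i < points.length; i++)
                    if (assign[i] == c) { sum += points[i]; n++; }
                if (n > 0) centroids[c] = sum / n;
            }
        }
        return centroids;
    }

    // Within-cluster sum of squared errors: squared distance to the nearest centroid.
    static double sse(double[] points, double[] centroids) {
        double total = 0;
        for (double p : points) {
            double best = Double.MAX_VALUE;
            for (double c : centroids) best = Math.min(best, (p - c) * (p - c));
            total += best;
        }
        return total;
    }

    public static void main(String[] args) {
        double[] pts = {1.4, 1.5, 1.3, 4.5, 4.7, 4.4};
        double[] cents = cluster(pts, new double[]{1.0, 5.0}); // k = 2 seeds
        System.out.println("Centroids: " + Arrays.toString(cents));
        System.out.println("SSE: " + sse(pts, cents));
    }
}
```

Increasing k drives the SSE down, which is why comparing the SSE across 2, 3, and 4 clusters is informative.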
DEPARTMENT OF CSE RISE KRISHNA SAI PRAKASAM INSTITUTION :: ONGOLE

Experiment – 10
10) Demonstrate knowledge flow application on data sets.

Step-1

Open WEKA and then click on KNOWLEDGE FLOW.


Step-2
In the Knowledge Flow the DESIGN panel lists component groups; first select DATA
SOURCES. In data sources click on the ARFFLOADER and place it on the screen.
Step-3
Now click on EVALUATION. In Evaluation click on CLASS ASSIGNER and place it on the
screen.
Step-4
Give the connection between ARFFLOADER and CLASS ASSIGNER: right-click on the
ArffLoader, select DATASET from the menu that appears, and connect the two.
Step-5
Now click on EVALUATION. In Evaluation click on CROSSVALIDATION FOLDMAKER and
place it on the screen. Now give the connection between Class Assigner and Crossvalidation
Foldmaker as DATA SET.
Step-6
Now click on CLASSIFIERS, then TREES, then J48, and place it on the screen. Now
give the connection between CrossValidation FoldMaker and J48 twice: once as
TRAINING SET and once as TEST SET.
Step-7
Now click on EVALUATION. In Evaluation click on CLASSIFIER PERFORMANCE
EVALUATOR and place it on the screen. Now give the connection between J48 and
the Classifier Performance Evaluator as BATCH CLASSIFIER.
Step-8
Now click on VISUALIZATION. In Visualization click on TEXT VIEWER and place it
on the screen. Now give the connection between the Classifier Performance
Evaluator and the Text Viewer as TEXT.

DATA WAREHOUSING AND DATA MINING LAB PAGE NO:



Step-9
Now click on the PLAY BUTTON to run the Knowledge Flow Data.


Step-10
Right-click on the TEXT VIEWER and select SHOW RESULTS.


12) Write a Java program to perform the Apriori algorithm.

Source code:
import java.io.*;
import java.util.*;

public class Apriori extends Observable {

    public static void main(String[] args) throws Exception {
        Apriori ap = new Apriori(args);
    }

    private List<int[]> itemsets;        // the candidate itemsets of the current pass
    private String transaFile;           // transaction file, one space-separated transaction per line
    private int numItems;                // number of distinct items
    private int numTransactions;         // number of transactions
    private double minSup;               // minimum support threshold (0..1)
    private boolean usedAsLibrary = false;

    public Apriori(String[] args, Observer ob) throws Exception {
        usedAsLibrary = true;
        configure(args);
        this.addObserver(ob);
        go();
    }

    public Apriori(String[] args) throws Exception {
        configure(args);
        go();
    }

    // Main Apriori loop: grow itemsets one item at a time, keeping only frequent ones.
    private void go() throws Exception {
        long start = System.currentTimeMillis();
        createItemsetsOfSize1();
        int itemsetNumber = 1;
        int nbFrequentSets = 0;

        while (itemsets.size() > 0) {
            calculateFrequentItemsets();
            if (itemsets.size() != 0) {
                nbFrequentSets += itemsets.size();
                log("Found " + itemsets.size() + " frequent itemsets of size "
                        + itemsetNumber + " (with support " + (minSup * 100) + "%)");
                createNewItemsetsFromPreviousOnes();
            }
            itemsetNumber++;
        }

        long end = System.currentTimeMillis();
        log("Execution time is: " + ((double) (end - start) / 1000) + " seconds.");
        log("Found " + nbFrequentSets + " frequent sets for support "
                + (minSup * 100) + "% (absolute " + Math.round(numTransactions * minSup) + ")");
        log("Done");
    }

    private void foundFrequentItemSet(int[] itemset, int support) {
        if (usedAsLibrary) {
            this.setChanged();
            notifyObservers(itemset);
        } else {
            System.out.println(Arrays.toString(itemset) + " ("
                    + (support / (double) numTransactions) + " " + support + ")");
        }
    }

    private void log(String message) {
        if (!usedAsLibrary) {
            System.err.println(message);
        }
    }

    // Read the command-line arguments and scan the file once to count items and transactions.
    private void configure(String[] args) throws Exception {
        if (args.length != 0) transaFile = args[0];
        else transaFile = "chess.dat"; // default

        if (args.length >= 2) minSup = Double.valueOf(args[1]).doubleValue();
        else minSup = .8; // by default

        if (minSup > 1 || minSup < 0) throw new Exception("minSup: bad value");

        numItems = 0;
        numTransactions = 0;
        BufferedReader data_in = new BufferedReader(new FileReader(transaFile));
        while (data_in.ready()) {
            String line = data_in.readLine();
            if (line.matches("\\s*")) continue; // skip blank lines
            numTransactions++;
            StringTokenizer t = new StringTokenizer(line, " ");
            while (t.hasMoreTokens()) {
                int x = Integer.parseInt(t.nextToken());
                if (x + 1 > numItems) numItems = x + 1;
            }
        }
        data_in.close();
        outputConfig();
    }

    private void outputConfig() {
        // output config info to the user
        log("Input configuration: " + numItems + " items, " + numTransactions + " transactions, ");
        log("minsup = " + minSup * 100 + "%");
    }

    // Initial candidates: every single item is an itemset of size 1.
    private void createItemsetsOfSize1() {
        itemsets = new ArrayList<int[]>();
        for (int i = 0; i < numItems; i++) {
            int[] cand = {i};
            itemsets.add(cand);
        }
    }

    // Join step: combine pairs of frequent k-itemsets that differ in exactly one item
    // into candidate (k+1)-itemsets.
    private void createNewItemsetsFromPreviousOnes() {
        int currentSizeOfItemsets = itemsets.get(0).length;
        log("Creating itemsets of size " + (currentSizeOfItemsets + 1)
                + " based on " + itemsets.size() + " itemsets of size " + currentSizeOfItemsets);

        HashMap<String, int[]> tempCandidates = new HashMap<String, int[]>(); // temporary candidates

        for (int i = 0; i < itemsets.size(); i++) {
            for (int j = i + 1; j < itemsets.size(); j++) {
                int[] X = itemsets.get(i);
                int[] Y = itemsets.get(j);
                assert (X.length == Y.length);

                // Start the candidate with all items of X.
                int[] newCand = new int[currentSizeOfItemsets + 1];
                for (int s = 0; s < newCand.length - 1; s++) {
                    newCand[s] = X[s];
                }

                // Count how many items of Y are missing from X.
                int ndifferent = 0;
                for (int s1 = 0; s1 < Y.length; s1++) {
                    boolean found = false;
                    for (int s2 = 0; s2 < X.length; s2++) {
                        if (X[s2] == Y[s1]) {
                            found = true;
                            break;
                        }
                    }
                    if (!found) { // Y[s1] is not in X
                        ndifferent++;
                        newCand[newCand.length - 1] = Y[s1];
                    }
                }

                assert (ndifferent > 0);

                // Keep the candidate only if X and Y differ by exactly one item;
                // sorting plus the map key removes duplicates.
                if (ndifferent == 1) {
                    Arrays.sort(newCand);
                    tempCandidates.put(Arrays.toString(newCand), newCand);
                }
            }
        }

        itemsets = new ArrayList<int[]>(tempCandidates.values());
        log("Created " + itemsets.size() + " unique itemsets of size " + (currentSizeOfItemsets + 1));
    }

    // Mark trans[i] = true for every item i present in the given line.
    private void line2booleanArray(String line, boolean[] trans) {
        Arrays.fill(trans, false);
        StringTokenizer stFile = new StringTokenizer(line, " "); // read a line from the file to the tokenizer
        while (stFile.hasMoreTokens()) {
            int parsedVal = Integer.parseInt(stFile.nextToken());
            trans[parsedVal] = true;
        }
    }

    // Prune step: pass through the data, count each candidate, and keep the frequent ones.
    private void calculateFrequentItemsets() throws Exception {
        log("Passing through the data to compute the frequency of "
                + itemsets.size() + " itemsets of size " + itemsets.get(0).length);

        List<int[]> frequentCandidates = new ArrayList<int[]>();
        boolean match;
        int count[] = new int[itemsets.size()];

        BufferedReader data_in = new BufferedReader(
                new InputStreamReader(new FileInputStream(transaFile)));
        boolean[] trans = new boolean[numItems];

        for (int i = 0; i < numTransactions; i++) {
            String line = data_in.readLine();
            line2booleanArray(line, trans);

            for (int c = 0; c < itemsets.size(); c++) {
                match = true;
                int[] cand = itemsets.get(c);
                for (int xx : cand) {
                    if (trans[xx] == false) {
                        match = false;
                        break;
                    }
                }
                if (match) {
                    count[c]++;
                }
            }
        }
        data_in.close();

        for (int i = 0; i < itemsets.size(); i++) {
            if ((count[i] / (double) numTransactions) >= minSup) {
                foundFrequentItemSet(itemsets.get(i), count[i]);
                frequentCandidates.add(itemsets.get(i));
            }
        }
        itemsets = frequentCandidates;
    }
}
OUTPUT:

Enter the minimum support (as a floating point value, 0<x<1):


0.5
Transaction Number: 1:

Transaction Number: 2:
Item number 1 = 2
Item number 2 = 3
Item number 3 = 5
Transaction Number: 3:

Transaction Number: 4:


14)
