WEKA - Experimenter Tutorial
© 2002-2007 David Scuse and University of Waikato
Contents

1 Introduction
2 Standard Experiments
   Simple
      New experiment
      Results destination
      Experiment type
      Datasets
      Iteration control
      Algorithms
      Saving the setup
      Running an Experiment
   Advanced
      Defining an Experiment
      Running an Experiment
      Changing the Experiment Parameters
      Other Result Producers
3 Remote Experiments
   Preparation
   Database Server Setup
   Remote Engine Setup
   Configuring the Experimenter
   Troubleshooting
4 Analysing Results
   Setup
   Saving the Results
   Changing the Baseline Scheme
   Statistical Significance
   Summary Test
   Ranking Test
1 Introduction
The Weka Experiment Environment enables the user to create, run, modify,
and analyse experiments in a more convenient manner than is possible when
processing the schemes individually. For example, the user can create an exper-
iment that runs several schemes against a series of datasets and then analyse
the results to determine if one of the schemes is (statistically) better than the
other schemes.
The Experiment Environment can be run from the command line using the
Simple CLI. For example, the following commands could be typed into the CLI
to run the OneR scheme on the Iris dataset using a basic train and test process.
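A command along these lines would do (a sketch; the exact options may vary
between Weka versions):

java weka.experiment.Experiment -r -T data/iris.arff
 -D weka.experiment.InstancesResultListener
 -P weka.experiment.RandomSplitResultProducer
 -- -W weka.experiment.ClassifierSplitEvaluator
 -- -W weka.classifiers.rules.OneR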
(Note that the commands would be typed on one line into the CLI.)
While commands can be typed directly into the CLI, this technique is not
particularly convenient and the experiments are not easy to modify.
The Experimenter comes in two flavours: a simple interface that provides
most of the functionality one needs for experiments, and an advanced interface
with full access to the Experimenter’s capabilities. You can choose between
the two with the Experiment Configuration Mode radio buttons:
• Simple
• Advanced
Both modes allow you to set up standard experiments, which run locally on
a single machine, or remote experiments, which are distributed between several
hosts. Distributing an experiment reduces the time until completion, but the
setup takes more effort.
The next section covers standard experiments (both simple and advanced),
followed by remote experiments and finally the analysis of the results.
This manual is also available online on the WekaDoc Wiki [7].
2 Standard Experiments
Simple
New experiment
After clicking New, default parameters for an Experiment are defined.
Results destination
By default, an ARFF file is the destination for the results output, but you
can choose between
• ARFF file
• CSV file
• JDBC database
ARFF file and JDBC database are discussed in detail in the following sec-
tions. CSV is similar to ARFF, but can additionally be loaded into an external
spreadsheet application.
ARFF file
If the file name is left empty a temporary file will be created in the TEMP
directory of the system. If one wants to specify an explicit results file, click on
Browse and choose a filename, e.g., Experiment1.arff.
Click on Save and the name will appear in the edit field next to ARFF file.
The advantage of ARFF or CSV files is that they can be created without
any additional classes besides the ones from Weka. The drawback is that an
interrupted experiment, e.g., one stopped by an error or by the addition of
datasets or algorithms, cannot be resumed. Especially with time-consuming
experiments, this behavior can be annoying.
JDBC database
With JDBC it is easy to store the results in a database. The necessary jar
archives have to be in the CLASSPATH to make the JDBC functionality of a
particular database available.
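For example, to make a MySQL JDBC driver available, one might start the
Experimenter like this (a sketch; jar names vary by driver version, and on
Windows ; separates classpath entries instead of :):

java -cp mysql.jar:weka.jar weka.gui.experiment.Experimenter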
After changing ARFF file to JDBC database, click on User... to specify the
JDBC URL and user credentials for accessing the database.
After supplying the necessary data and clicking on OK, the URL in the main
window will be updated.
Note: at this point, the database connection is not tested; this is done when
the experiment is started.
Experiment type
The user can choose between the following three different types
• Cross-validation (default)
performs stratified cross-validation with the given number of folds
5
• Train/Test Percentage Split (order preserved)
because it is impossible to specify an explicit train/test files pair, one can
abuse this type to un-merge previously merged train and test file into the
two original files (one only needs to find out the correct percentage)
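Internally, these types correspond to different result producers. The following
is a minimal sketch, assuming the standard weka.experiment API; the
randomizeData property used for the order-preserved variant is an assumption:

import weka.experiment.ClassifierSplitEvaluator;
import weka.experiment.CrossValidationResultProducer;
import weka.experiment.RandomSplitResultProducer;

public class ExperimentTypes {
    public static void main(String[] args) {
        // stratified cross-validation with the given number of folds
        CrossValidationResultProducer cvrp = new CrossValidationResultProducer();
        cvrp.setNumFolds(10);
        cvrp.setSplitEvaluator(new ClassifierSplitEvaluator());

        // train/test percentage split with 66% of the data for training
        RandomSplitResultProducer rsrp = new RandomSplitResultProducer();
        rsrp.setTrainPercent(66.0);
        rsrp.setRandomizeData(false); // false preserves the order of the data
        rsrp.setSplitEvaluator(new ClassifierSplitEvaluator());
    }
}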
Datasets
One can add dataset files either with an absolute path or with a relative one.
The latter often makes it easier to run experiments on different machines; hence
one should check Use relative paths before clicking on Add new....
In this example, open the data directory and choose the iris.arff dataset.
After clicking Open the file will be displayed in the datasets list. If one
selects a directory and hits Open, then all ARFF files will be added recursively.
Files can be deleted from the list by selecting them and then clicking on Delete
selected.
Iteration control
• Number of repetitions
In order to obtain statistically meaningful results, the default number of
iterations is 10. In the case of 10-fold cross-validation, this means 100 calls
of one classifier, each trained on training data and evaluated against test
data.
Algorithms
New algorithms can be added via the Add new... button. When this dialog is
opened for the first time, ZeroR is presented; otherwise, the classifier that was
selected last.
With the Choose button one can open the GenericObjectEditor and choose
another classifier.
Additional algorithms can be added again with the Add new... button, e.g.,
the J48 decision tree.
After setting the classifier parameters, one clicks on OK to add it to the list
of algorithms.
With the Load options... and Save options... buttons one can load and save
the setup of a selected classifier from and to XML. This is especially useful
for highly configured classifiers (e.g., nested meta-classifiers), where the manual
setup takes quite some time, and which are used often.
Saving the setup

By default, the format of the experiment files is the binary format that Java
serialization offers. The drawback of this format is the possible incompatibility
between different versions of Weka. A more robust alternative to the binary
format is the XML format.
Previously saved experiments can be loaded again via the Open... button.
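From code, experiments can be written and read in either format; a minimal
sketch, assuming the static read/write helpers of weka.experiment.Experiment
(the file extension selects the format):

import weka.experiment.Experiment;

public class SaveLoadExperiment {
    public static void main(String[] args) throws Exception {
        // load a previously saved experiment (binary format)
        Experiment exp = Experiment.read("Experiment1.exp");
        // save it again, this time in the more robust XML format
        Experiment.write("Experiment1.xml", exp);
    }
}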
Running an Experiment
To run the current experiment, click the Run tab at the top of the Experiment
Environment window. The current experiment performs 10 runs of 10-fold strat-
ified cross-validation on the Iris dataset using the ZeroR and J48 schemes.
If the experiment was defined correctly, the 3 messages shown above will
be displayed in the Log panel. The results of the experiment are saved to the
dataset Experiment1.arff.
Advanced
Defining an Experiment
When the Experimenter is started in Advanced mode, the Setup tab is displayed.
Click New to initialize an experiment. This causes default parameters to be
defined for the experiment.
Double click on the data folder to view the available datasets or navigate to
an alternate location. Select iris.arff and click Open to select the Iris dataset.
The dataset name is now displayed in the Datasets panel of the Setup tab.
Type the name of the output file, click Select, and then click close (x). The
file name is displayed in the outputFile panel. Click on OK to close the window.
The file name is displayed in the Destination panel of the Setup tab.
The experiment can be restored by selecting Open in the Setup tab and then
selecting Experiment1.exp in the dialog window.
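The same setup can also be expressed through the API; the following is a
minimal sketch, assuming the standard weka.experiment classes (the wiring of
the classifier list via the property iterator is omitted for brevity):

import java.io.File;
import javax.swing.DefaultListModel;
import weka.experiment.ClassifierSplitEvaluator;
import weka.experiment.Experiment;
import weka.experiment.InstancesResultListener;
import weka.experiment.RandomSplitResultProducer;

public class DefineExperiment {
    public static void main(String[] args) throws Exception {
        Experiment exp = new Experiment();

        // 10 randomized train/test runs with 66% of the data for training
        exp.setRunLower(1);
        exp.setRunUpper(10);
        RandomSplitResultProducer rsrp = new RandomSplitResultProducer();
        rsrp.setTrainPercent(66.0);
        rsrp.setSplitEvaluator(new ClassifierSplitEvaluator());
        exp.setResultProducer(rsrp);

        // the dataset to process
        DefaultListModel model = new DefaultListModel();
        model.addElement(new File("data/iris.arff"));
        exp.setDatasets(model);

        // send the results to an ARFF file
        InstancesResultListener irl = new InstancesResultListener();
        irl.setOutputFile(new File("Experiment1.arff"));
        exp.setResultListener(irl);

        // persist the definition for later use
        Experiment.write("Experiment1.exp", exp);
    }
}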
Running an Experiment
To run the current experiment, click the Run tab at the top of the Experiment
Environment window. The current experiment performs 10 randomized train
and test runs on the Iris dataset, using 66% of the patterns for training and
34% for testing, and using the ZeroR scheme.
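Programmatically, a saved experiment can be executed as follows (a sketch,
assuming the lifecycle methods of weka.experiment.Experiment):

import weka.experiment.Experiment;

public class RunExperiment {
    public static void main(String[] args) throws Exception {
        Experiment exp = Experiment.read("Experiment1.exp");
        exp.initialize();    // prepare the first run
        exp.runExperiment(); // iterate over all runs, datasets and schemes
        exp.postProcess();   // flush and close the result listener
    }
}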
If the experiment was defined correctly, the 3 messages shown above will
be displayed in the Log panel. The results of the experiment are saved to the
dataset Experiment1.arff. The first few lines in this dataset are shown below.
@relation InstanceResultListener
@attribute KB_mean_information numeric
@attribute KB_relative_information numeric
@attribute True_positive_rate numeric
@attribute Num_true_positives numeric
@attribute False_positive_rate numeric
@attribute Num_false_positives numeric
@attribute True_negative_rate numeric
@attribute Num_true_negatives numeric
@attribute False_negative_rate numeric
@attribute Num_false_negatives numeric
@attribute IR_precision numeric
@attribute IR_recall numeric
@attribute F_measure numeric
@attribute Time_training numeric
@attribute Time_testing numeric
@attribute Summary string
@data
iris,1,weka.classifiers.rules.ZeroR,'',48055541465867954,20051221.0055,99,51,
17,34,0,33.333333,66.666667,0,0,0.444444,0.471405,100,100,80.833088,80.833088,
0,1.584963,1.584963,0,0,0,0,1,17,1,34,0,0,0,0,0.333333,1,0.5,0.01,0,?
Changing the Experiment Parameters
Click on the splitEvaluator entry to display the SplitEvaluator properties.
This scheme has no modifiable properties (besides debug mode on/off) but
most other schemes do have properties that can be modified by the user. Click
on the Choose button to select a different scheme. The window below shows
the parameters available for the J48 decision-tree scheme. If desired, modify
the parameters and then click OK to close the window.
The name of the new scheme is displayed in the Result generator panel.
Click Select property and expand splitEvaluator so that the classifier entry
is visible in the property list; click Select.
To add another scheme, click on the Choose button to display the Generic-
ObjectEditor window.
The new scheme is added to the Generator properties panel. Click Add to
add the new scheme.
Now when the experiment is run, results are generated for both schemes.
To add additional schemes, repeat this process. To remove a scheme, select
the scheme by clicking on it and then click Delete.
Raw Output
The raw output generated by a scheme during an experiment can be saved to
a file and then examined at a later time. Open the ResultProducer window by
clicking on the Result generator panel in the Setup tab.
Click on rawOutput and select the True entry from the drop-down list. By
default, the output is sent to the zip file splitEvaluatorOut.zip. The output file
can be changed by clicking on the outputFile panel in the window. Now when
the experiment is run, the result of each processing run is archived, as shown
below.
Number of Leaves : 3
Correctly Classified Instances 47 92.1569 %
Incorrectly Classified Instances 4 7.8431 %
Kappa statistic 0.8824
Mean absolute error 0.0723
Root mean squared error 0.2191
Relative absolute error 16.2754 %
Root relative squared error 46.4676 %
Total Number of Instances 51
measureTreeSize : 5.0
measureNumLeaves : 3.0
measureNumRules : 3.0
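These settings can also be made through the API; a minimal sketch, assuming
the rawOutput and outputFile properties shown in the dialog map to the usual
setter names:

import java.io.File;
import weka.experiment.RandomSplitResultProducer;

public class RawOutputSetup {
    public static void main(String[] args) {
        RandomSplitResultProducer rsrp = new RandomSplitResultProducer();
        rsrp.setRawOutput(true); // archive the raw output of each run
        rsrp.setOutputFile(new File("splitEvaluatorOut.zip"));
    }
}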
Other Result Producers

Cross-Validation Result Producer

The Result generator panel now indicates that cross-validation will be per-
formed. Click on More to generate a brief description of the CrossValidation-
ResultProducer.
As with the RandomSplitResultProducer, multiple schemes can be run during
cross-validation by adding them to the Generator properties panel.
The number of runs is set to 1 in the Setup tab in this example, so that only
one run of cross-validation for each scheme and dataset is executed.
When this experiment is analysed, the following results are generated. Note
that there are 30 (1 run times 10 folds times 3 schemes) result lines processed.
The associated help file is shown below.
Averaging Result Producer

In this experiment, the ZeroR, OneR, and J48 schemes are run 10 times with
10-fold cross-validation. Each set of 10 cross-validation folds is then averaged,
producing one result line for each run (instead of one result line for each fold as
in the previous example using the CrossValidationResultProducer) for a total of
30 result lines. If the raw output is saved, all 300 results are sent to the archive.
3 Remote Experiments
Remote experiments enable you to distribute the computing load across multiple
computers. In the following we will discuss the setup and operation for HSQLDB
[5] and MySQL [6].
Preparation
To run a remote experiment you will need:
• A database server.
• To edit the remote engine policy file included in the Weka distribution to
allow Java class and dataset loading from your home directory.
For the following examples, we assume a user called johndoe with this setup:
• Home directory: /home/johndoe
• Weka is installed in /home/johndoe/weka, i.e., the jar archive is
/home/johndoe/weka/weka.jar
• Additional jar archives, such as JDBC drivers, are placed in
/home/johndoe/jars
• The remote engine is deployed in /home/johndoe/remote_engine
Database Server Setup
• MySQL
We won’t go into the details of setting up a MySQL server, but this is
rather straightforward and includes the following steps:
– Download a suitable version of MySQL for your server machine.
– Install and start the MySQL server.
– Create a database; for our example we will use experiment as the
database name.
– Download the appropriate JDBC driver, extract the JDBC jar and
place it as mysql.jar in /home/johndoe/jars.
Configuring the Experimenter
Now we will run the Experimenter:
• HSQLDB
– Copy the DatabaseUtils.props.hsql file from weka/experiment in the
weka.jar archive to the /home/johndoe/remote_engine directory
and rename it to DatabaseUtils.props.
– Edit this file and change the "jdbcURL=jdbc:hsqldb:hsql://server_name/database_name"
entry to include the name of the machine that is running your database
server (e.g., jdbcURL=jdbc:hsqldb:hsql://dodo.company.com/experiment).
– Now start the Experimenter (inside this directory):
java \
-cp /home/johndoe/jars/hsqldb.jar:remoteEngine.jar:/home/johndoe/weka/weka.jar \
-Djava.rmi.server.codebase=file:/home/johndoe/weka/weka.jar \
weka.gui.experiment.Experimenter
• MySQL
– Copy the DatabaseUtils.props.mysql file from weka/experiment in
the weka.jar archive to the /home/johndoe/remote_engine direc-
tory and rename it to DatabaseUtils.props.
– Edit this file and change the "jdbcURL=jdbc:mysql://server_name:3306/database_name"
entry to include the name of the machine that is running your database
server and the name of the database the results will be stored in (e.g.,
jdbcURL=jdbc:mysql://dodo.company.com:3306/experiment).
– Now start the Experimenter (inside this directory):
java \
-cp /home/johndoe/jars/mysql.jar:remoteEngine.jar:/home/johndoe/weka/weka.jar \
-Djava.rmi.server.codebase=file:/home/johndoe/weka/weka.jar \
weka.gui.experiment.Experimenter
Note: the database name experiment can still be modified in the Exper-
imenter; this is just the default setup.
• Now enable the Distribute Experiment panel by checking the tick box.
• Click on the Hosts button and enter the names of the machines that you
started remote engines on (<Enter> adds the host to the list). A sketch
of starting a remote engine is shown below.
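Starting a remote engine on a host might look like the following (a sketch based
on the johndoe setup above; remote.policy refers to the edited remote engine
policy file from the preparation step):

java \
-classpath remoteEngine.jar:/home/johndoe/jars/hsqldb.jar:/home/johndoe/weka/weka.jar \
-Djava.security.policy=remote.policy \
-Djava.rmi.server.codebase=file:/home/johndoe/weka/weka.jar \
weka.experiment.RemoteEngine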
Troubleshooting
• If you get an error at the start of an experiment that looks a bit like this:
01:13:19: RemoteExperiment (//blabla.company.com/RemoteEngine)
(sub)experiment (datataset vineyard.arff) failed :
java.sql.SQLException: Table already exists: EXPERIMENT_INDEX
in statement [CREATE TABLE Experiment_index ( Experiment_type
LONGVARCHAR, Experiment_setup LONGVARCHAR, Result_table INT )]
01:13:19: dataset :vineyard.arff RemoteExperiment
(//blabla.company.com/RemoteEngine) (sub)experiment (datataset
vineyard.arff) failed : java.sql.SQLException: Table already
exists: EXPERIMENT_INDEX in statement [CREATE TABLE
Experiment_index ( Experiment_type LONGVARCHAR, Experiment_setup
LONGVARCHAR, Result_table INT )]. Scheduling for execution on
another host.
then do not panic. This happens because multiple remote machines are trying
to create the same table and are temporarily locked out; it will resolve itself,
so just leave your experiment running. In fact, it is a sign that the experiment
is working!
4 Analysing Results
Setup
Weka includes an experiment analyser that can be used to analyse the results
of experiments (in this example, the results were sent to an InstancesResult-
Listener). The experiment shown below uses 3 schemes, ZeroR, OneR, and J48,
to classify the Iris data in an experiment using 10 train and test runs, with 66%
of the data used for training and 34% used for testing.
The number of result lines available (Got 30 results) is shown in the Source
panel. This experiment consisted of 10 runs, for 3 schemes, for 1 dataset, for a
total of 30 result lines. Results can also be loaded from an earlier experiment file
by clicking File and loading the appropriate .arff results file. Similarly, results
sent to a database (using the DatabaseResultListener) can be loaded from the
database.
Select the Percent correct attribute from the Comparison field and click
Perform test to generate a comparison of the 3 schemes.
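The same test can be reproduced from code; a minimal sketch, assuming the
weka.experiment tester classes and the standard attribute names (Key_Run,
Key_Dataset, Key_Scheme, ..., Percent_correct) written by the
InstancesResultListener:

import java.io.BufferedReader;
import java.io.FileReader;
import weka.core.Instances;
import weka.core.Range;
import weka.experiment.PairedCorrectedTTester;
import weka.experiment.PairedTTester;

public class AnalyseResults {
    public static void main(String[] args) throws Exception {
        Instances result = new Instances(
            new BufferedReader(new FileReader("Experiment1.arff")));
        PairedTTester tester = new PairedCorrectedTTester();
        tester.setInstances(result);
        tester.setRunColumn(result.attribute("Key_Run").index());
        tester.setDatasetKeyColumns(new Range(
            "" + (result.attribute("Key_Dataset").index() + 1)));
        tester.setResultsetKeyColumns(new Range(
            "" + (result.attribute("Key_Scheme").index() + 1) + ","
               + (result.attribute("Key_Scheme_options").index() + 1) + ","
               + (result.attribute("Key_Scheme_version_ID").index() + 1)));
        tester.setSignificanceLevel(0.05);
        // compare the schemes on percent correct; resultset 0 is the baseline
        System.out.println(tester.multiResultsetFull(0,
            result.attribute("Percent_correct").index()));
    }
}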
The schemes used in the experiment are shown in the columns and the
datasets used are shown in the rows.
The percentage correct for each of the 3 schemes is shown in each dataset
row: 33.33% for ZeroR, 94.31% for OneR, and 94.90% for J48. The annotation
v or * indicates that a specific result is statistically better (v) or worse (*)
than the baseline scheme (in this case, ZeroR) at the significance level specified
(currently 0.05). The results of both OneR and J48 are statistically better than
the baseline established by ZeroR. At the bottom of each column after the first
column is a count (xx/yy/zz) of the number of times that the scheme was
better than (xx), the same as (yy), or worse than (zz) the baseline scheme on
the datasets used in the experiment. In this example, there was only one dataset
and OneR was better than ZeroR once and never equivalent to or worse than
ZeroR (1/0/0); J48 was also better than ZeroR on the dataset.
The standard deviation of the attribute being evaluated can be generated
by selecting the Show std. deviations check box and hitting Perform test again.
The value (10) at the beginning of the iris row represents the number of esti-
mates that are used to calculate the standard deviation (the number of runs in
this case).
Selecting Number correct as the comparison field and clicking Perform test
generates the average number correct (out of 51 test patterns, i.e., 34% of the
150 patterns in the Iris dataset).
Clicking on the button for the Output format leads to a dialog that lets you
choose the precision for the mean and the std. deviations, as well as the format
of the output.
The following formats are supported:
• LaTeX
• CSV
Saving the Results
The information displayed in the Test output panel is controlled by the currently-
selected entry in the Result list panel. Clicking on an entry causes the results
corresponding to that entry to be displayed.
The results shown in the Test output panel can be saved to a file by clicking
Save output. Only one set of results can be saved at a time but Weka permits
the user to save all results to the same file by saving them one at a time and
using the Append option instead of the Overwrite option for the second and
subsequent saves.
Changing the Baseline Scheme
If the test is performed on the Percent correct field with OneR as the base
scheme, the system indicates that there is no statistical difference between the
results for OneR and J48. There is however a statistically significant difference
between OneR and ZeroR.
Statistical Significance
The term statistical significance used in the previous section refers to the re-
sult of a pair-wise comparison of schemes using either a standard T-Test or the
corrected resampled T-Test [2]. The latter test is the default, because the stan-
dard T-Test can generate too many significant differences due to dependencies
in the estimates (in particular when anything other than one run of an x-fold
cross-validation is used). For more information on the T-Test, consult the Weka
book [1] or an introductory statistics text. As the significance level is decreased,
the confidence in the conclusion increases.
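For reference, the corrected resampled T-Test replaces the variance term of the
standard paired T-Test; a sketch of the statistic, following [2], with k the number
of results (runs times folds), n1 and n2 the number of training and test instances,
\bar{d} the mean difference between the two schemes, and \hat{\sigma}^2_d
its variance:

t = \frac{\bar{d}}{\sqrt{\left(\frac{1}{k} + \frac{n_2}{n_1}\right)\hat{\sigma}^2_d}}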
In the current experiment, there is not a statistically significant difference
between the OneR and J48 schemes.
Summary Test
Selecting Summary from Test base and performing a test causes the following
information to be generated.
In this experiment, the first row (- 1 1) indicates that column b (OneR) is
better than row a (ZeroR) and that column c (J48) is also better than row a.
The number in brackets represents the number of significant wins for the column
with regard to the row. A 0 means that the scheme in the corresponding column
did not score a single (significant) win with regard to the scheme in the row.
Ranking Test
Selecting Ranking from Test base causes the following information to be gener-
ated.
The ranking test ranks the schemes according to the total number of sig-
nificant wins (>) and losses (<) against the other schemes. The first column
(>-<) is the difference between the number of wins and the number of losses.
This difference is used to generate the ranking.
References

[1] Witten, I.H. and Frank, E. (2005) Data Mining: Practical machine learn-
ing tools and techniques. 2nd edition. Morgan Kaufmann, San Francisco.

[2] Bengio, Y. and Nadeau, C. (1999) Inference for the Generalization Error.

[3] Ross Quinlan (1993). C4.5: Programs for Machine Learning. Morgan
Kaufmann Publishers, San Mateo, CA.

[5] HSQLDB. http://hsqldb.sourceforge.net/

[6] MySQL. http://www.mysql.com/

[7] WekaDoc. http://weka.sourceforge.net/wekadoc/