Introduction To Machine Learning and Data Mining: Arturo J. Patungan, Jr. University of Sto. Tomas Strandasia
Outline
• Day 2
Testing the Model
Validating the Model
Finding the right model
• Day 3
Optimization of Model Parameters
Automated model selection and optimization
Case Study
RapidMiner Studio Interface and Basic Data Processing
Introduction to RapidMiner Studio
[Figure: Studio interface layout. Labeled areas: Repository/Source tabs, Parameter tabs, Canvas, Operators/Analysis tabs, Description tabs]
How to Import Data?
1. Cont.
Select the data and click “Next”.
Configure the data, then click “Next”.
Save the data to your repository and click “Finish”.
2. Using the “Read Data” operator
Locate the “Read Data” operator by typing “Read” in
the Operators area.
Drag the “Read Data” operator that you will use
onto the canvas.
How to Import Data?
2. Using the “Read Data” operator (Cont.)
In the Parameter tabs, you can set the data
you need for analysis by browsing for the file or by
using the “Import Configuration Wizard”.
Data Viewing and Exploratory Analysis
• To view the data set and its descriptive and
diagnostic information, connect the output of the
data set (or “Read Data”) node to the result
port (“res”).
• Click “Run” to view.
• Click the “Results” tab to view the data that
were loaded.
• To find the basic statistics of each attribute,
click the “Statistics” tab.
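For readers who want to see what such a summary involves, here is a minimal Python sketch that computes a few basic statistics for one numeric attribute. The attribute values and the `describe` helper are hypothetical and only illustrate the idea; the exact set of statistics shown in the “Statistics” tab may differ.

```python
import statistics

# Hypothetical attribute values; not taken from the course data.
ages = [23, 35, 31, 46, 52, 35]

def describe(values):
    """Return basic descriptive statistics for one numeric attribute."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

summary = describe(ages)
print(summary)
```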
Quick Visualization
• For quick visualization of the data, look at the
“Results” tab.
• There are two ways to open a visualization:
Click the row of the attribute that you want to view in
the “Statistics” tab, then click
“Open Visualization”.
Click the “Visualization” tab and specify the graph
and variables that you want to see.
Data Preparation
• Split Validation
– The data analyst determines how the data will be split
into a “training data” set and a “testing data” set.
– The model learns from the training data, while the testing
data (hidden from the model during training) is used to
check the “knowledge” acquired from the training.
– Question: How much of the data should be used for training
and how much for testing?
• Cross Validation
– The cases are split at random into k groups of
approximately equal size.
– A model is built k times, each time leaving one group out of
training, and is tested on the “omitted” group.
– Problem: the estimated error can be affected by the
arbitrary assignment of cases to groups.
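The two splitting schemes above can be sketched in plain Python. This is only an illustration under stated assumptions: the 70/30 ratio, the value k = 5, the fixed seed, and the function names `split_validation` and `k_fold` are all choices made for the example, not values given in the course.

```python
import random

def split_validation(rows, train_ratio=0.7, seed=42):
    """Shuffle the rows and split them into training and testing sets."""
    shuffled = rows[:]                      # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def k_fold(rows, k=5, seed=42):
    """Yield (training, testing) pairs; each group is the test set once."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]   # k groups of near-equal size
    for i in range(k):
        testing = folds[i]
        training = [row for j, fold in enumerate(folds) if j != i
                    for row in fold]
        yield training, testing

rows = list(range(20))                      # stand-in for 20 cases
train, test = split_validation(rows)
print(len(train), len(test))
for training, testing in k_fold(rows, k=5):
    print(len(training), len(testing))
```

Changing `seed` changes which cases land in which group, which is one way to see how much the estimated error depends on the arbitrary assignment.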
How to Perform Model Testing?
• Using the First Method
Starting with the “Applying the Model” process,
we can manually compare the predicted values
with the actual values.
Use the “Performance” operator to compute
the performance of the model automatically.
• The appropriate “Performance” operator depends on the
model that we build and the goal of the analytics.
Use the performance of the model to compare
and improve models.
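Comparing predicted and actual values comes down to a performance measure. As a rough sketch, the Python below computes two common ones: accuracy for class labels and root mean squared error (RMSE) for numeric predictions. The choice of these two measures, the function names, and the sample values are assumptions made for illustration. Which measure is appropriate depends on the model and the goal.

```python
import math

def accuracy(actual, predicted):
    """Share of cases where the predicted label equals the actual label."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)

def rmse(actual, predicted):
    """Root mean squared error for numeric predictions."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical labels and values, for illustration only.
actual_labels = ["yes", "no", "yes", "yes", "no"]
predicted_labels = ["yes", "no", "no", "yes", "no"]
label_accuracy = accuracy(actual_labels, predicted_labels)
print(label_accuracy)

actual_values = [3.0, 5.0, 2.0]
predicted_values = [2.0, 5.0, 4.0]
value_rmse = rmse(actual_values, predicted_values)
print(value_rmse)
```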
How to Perform Model Testing?
• Split Validation
To the model we built in the “Applying the
Model” process, we add the “Split
Validation” operator.
Set the split ratio that you will use in the
Parameter tabs.
Double-click the operator to open its
subprocess.
In the training area, drag and drop the algorithm
that you will use.
How to Perform Model Testing?
• Cross Validation
To the model we built in the “Applying the
Model” process, we add the “Cross
Validation” operator.
Set the number of “folds” and the “sampling type”
that you will use in the Parameter tabs.
Double-click the operator to open its
subprocess.
In the training area, drag and drop the algorithm
that you will use.
How to Perform Model Testing?