Introduction To Machine Learning and Data Mining: Arturo J. Patungan, Jr. University of Sto. Tomas Strandasia

This document outlines a 3-day training course on machine learning and data mining using Rapidminer Studio. Day 1 covers introducing the Rapidminer interface, data preparation such as handling missing data and data visualization. It also covers building classification, regression, clustering, association and anomaly detection models. Day 2 focuses on applying, testing and validating models. Day 3 covers optimizing model parameters and performing automated model selection and optimization with a case study. The document provides instructions on tasks like importing data, exploring data visually, preparing data by filtering and imputing missing values, and building specific models like logistic regression, decision trees and k-means clustering.

Introduction to

Machine Learning and

Data Mining
Arturo J. Patungan, Jr.
University of Sto. Tomas
StrandAsia
Outline
Day 1
• Introduction to Rapidminer Interface
• Data Preparation, Basic Descriptive Statistics, Cleaning
• Data Visualization and Exploratory Analysis
• Building a Model
 Classification Model
 Regression Model
 Clustering Model
 Association and Correlation Models
 Anomaly Detection
Outline
• Day 1
• Applying the Model
 Classification Model
 Regression Model
 Clustering Model
 Association and Correlation Models
 Anomaly Detection Samples

• Day 2
 Testing the Model
 Validating the Model
 Finding the right model
Outline
• Day 3
 Optimization of Model Parameter
 Automated model selection and optimization
 Case Study
Rapidminer Studio Interface
and Basic Data Processing
Introduction to RapidMiner
Studio
[Figure: RapidMiner Studio interface, with the following areas labeled: Repository/Source tabs, Parameter tabs, Canvas, Operators/Analysis tabs, Description tabs]
How to Import Data?

1. Create a Data set within Rapidminer.

 Click “File” then “Import Data”
 Choose which will be the source of your data set.
How to Import Data?

1. Cont.
 Select the data and click “next”
 Configure the data; then, click “next”
 Save the data to your repository and click
“Finish”.
2. Using “READ DATA” operator
 Locate “Read Data” operator by typing “Read” in
the Operator Area
 Drag the “Read Data” operator that you will use
in the canvas
How to Import Data?
2. Using “READ DATA” operator (Cont)
 In the Parameter Tabs, you could set the data
you need for analysis by browsing the data or
using the “Import Configuration Wizard”.
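For readers who want to see the same import step outside the RapidMiner interface, here is a minimal Python sketch. It is only an illustration: the column names and values are made up, and an in-memory string stands in for a real file.

```python
import csv
import io

# Hypothetical stand-in for a file on disk; with a real file you would
# pass open("your_file.csv", newline="") instead of io.StringIO(RAW).
RAW = """id,age,income
1,34,52000
2,,61000
3,45,
"""

def read_rows(handle):
    """Return the data set as a list of dicts, one dict per example (row)."""
    return list(csv.DictReader(handle))

rows = read_rows(io.StringIO(RAW))
print(len(rows), "examples; attributes:", list(rows[0].keys()))
```

Empty cells arrive as empty strings, which is why the missing-data steps in the next slides are needed.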
Data Viewing and Exploratory
Analysis
• To view the data set and find its descriptive
statistics and diagnostics, just connect
the data set (or “read data”) operator to the result
knob (“res”)
• Click “RUN” to view.
• Click the “Results” tab to view the data that
were loaded to the machine.
• To find the basic statistics of each attribute,
click the “Statistics” tab.
Quick Visualization
• For quick visualization of the data, look at the
“Results” tab.
• There are two ways to open a visualization:
 In the “Statistics” tab, click on the row of the
attribute that you want to view and click
“Open Visualization”.
 Click the “Visualization” tab and specify the graph
and variables that you want to see.
Data Preparation

1. Removing Cases with missing data

 Get “Filter Examples” from the operator tabs; then,
drag and drop it on the line connecting the “read
data” operator and the “res” knob in the canvas.
 In the parameter tab, click “Add Filters”.
 Select the “Attributes”; then, select “no missing
attributes” in the condition class.
 You can add more criteria by clicking “Add Entry”
 When you’re finished including all attributes you
want to filter, click “ok” to close.
 Click “run”.
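The filtering step above can be sketched in plain Python. This is an illustration of the idea only, not what the “Filter Examples” operator runs internally; the rows and attribute names are hypothetical, and `None` marks a missing value.

```python
# Hypothetical examples; None marks a missing value.
rows = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 61000},
    {"id": 3, "age": 45, "income": None},
]

def drop_missing(examples, attributes=None):
    """Keep only the examples with no missing value in the given attributes.

    With attributes=None every attribute is checked, which mirrors a
    "no missing attributes" condition.
    """
    kept = []
    for example in examples:
        names = attributes if attributes is not None else list(example)
        if all(example[name] is not None for name in names):
            kept.append(example)
    return kept

complete = drop_missing(rows)                      # checks every attribute
age_known = drop_missing(rows, attributes=["age"])  # checks only "age"
print([r["id"] for r in complete], [r["id"] for r in age_known])
```

Checking a subset of attributes keeps more cases, at the cost of leaving missing values in the unchecked columns.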
Data Preparation

2. Imputing Missing Data

 Get “Replace Missing Values” from the operator tabs.
 Drag and drop to the line connecting the “read
data” and the “res” knob.
 In the parameter tab, select how many
“Attribute/s” you want to impute.
 You can select the attribute/s that you want to
impute by clicking “Select Attribute”.
 Highlight the attribute and transfer to the
“Selected Attribute” bin. Click “Apply” to proceed.
 Select the method of replacement you want to
perform in “Default”. Click “Run”.
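As a rough illustration of one replacement method, the sketch below fills missing values with the attribute's average. The data and the helper name `impute_mean` are hypothetical; other methods (minimum, maximum, a fixed value) would follow the same pattern.

```python
from statistics import mean

# Hypothetical examples; None marks a missing value.
rows = [
    {"id": 1, "age": 30},
    {"id": 2, "age": None},
    {"id": 3, "age": 50},
]

def impute_mean(examples, attribute):
    """Return copies of the examples with missing values of one attribute
    replaced by the mean of the observed values."""
    observed = [ex[attribute] for ex in examples if ex[attribute] is not None]
    fill = mean(observed)
    return [
        dict(ex, **{attribute: fill if ex[attribute] is None else ex[attribute]})
        for ex in examples
    ]

imputed = impute_mean(rows, "age")
print(imputed)
```

The original rows are left untouched, so the imputed and unimputed versions can be compared.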
Data Preparation

3. Addressing data with wrong encoding and
duplicates
 To remove “white spaces” in the encoding, use
the operator “TRIM”
 To remove “duplicates” in the encoding, use the
operator “Remove Duplicates”.
 To recode mistyped values, use the “Replace”
operator.
 Click “run” to verify your process.
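The three cleaning steps (trim, recode, remove duplicates) can be sketched together in Python. This is an assumed equivalent for illustration; the attribute name `status` and the misspelling are invented.

```python
# Hypothetical examples with stray spaces, a duplicate, and a mistyped value.
rows = [
    {"id": 1, "status": " single "},
    {"id": 1, "status": "single"},
    {"id": 2, "status": "maried"},
]

def clean(examples, attribute, recode):
    """Trim whitespace, recode mistyped values, then drop duplicate rows."""
    seen, result = set(), []
    for ex in examples:
        value = ex[attribute].strip()          # like "Trim"
        value = recode.get(value, value)       # like "Replace"
        fixed = dict(ex, **{attribute: value})
        key = tuple(sorted(fixed.items()))
        if key not in seen:                    # like "Remove Duplicates"
            seen.add(key)
            result.append(fixed)
    return result

cleaned = clean(rows, "status", {"maried": "married"})
print(cleaned)
```

Trimming first matters: the two rows for id 1 only become recognizable as duplicates once the extra spaces are gone.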
Other Pre – Processing Steps
• Using the same data set for different processes.
 Use the “multiply” operator
• Joining Two Data Sets
*** Use this when two data sets need to be merged
for an analysis.
 Find “Join” from the operator.
 Connect the first data set in the left nodes of the
operator “join” and the other data set at the right
nodes of the operator.
 Edit the “key attributes” in the parameter tabs.
 The key attribute is what connects the two data
sets, for example the “Customer ID” shared by the
“Order Detail” and the “Customer Detail” data sets.
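A join on a key attribute can be sketched as follows. It assumes an inner join (only keys present in both data sets are kept) and uses made-up order and customer rows.

```python
# Hypothetical "Order Detail" and "Customer Detail" data sets.
orders = [
    {"customer_id": 1, "item": "lamp"},
    {"customer_id": 2, "item": "desk"},
    {"customer_id": 9, "item": "chair"},   # no matching customer
]
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]

def inner_join(left, right, key):
    """Merge rows that share the same value of the key attribute."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]

joined = inner_join(orders, customers, "customer_id")
print(joined)
```

The order for customer 9 is dropped because there is no matching customer record; a left join would keep it with the customer attributes missing.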
Activity 1.
• Using the “bankdata” and “bankdata status”
perform the following:
1. Load the data and create a rapidminer data file.
2. Load the data using the “read excel” operator.
3. Join the two data using “id” as the attribute key.
4. Use the “multiply” operator to perform the
following:
 Remove the cases with missing data
 Impute the data using correct method of
imputation
 Trim, remove the duplicate data, and correct the
incorrect encoding of data
Building a Model in Rapidminer
Types of Models
• Classification Models (Is this A or B?)
 Logistic Regression
 Decision Tree
 Random Forest
 Naïve Bayes
 ANN
 SVM
• Regression Model (How much or How many?)
 Linear Regression
 Non – Linear Regression
 ANN
 SVM
Types of Models
• Clustering Models (How is this organized?
What belongs to each other?)
 k-Means
 X-Means
 DBSCAN
 GMM (Gaussian Mixture)
 Hierarchical
 ANN
 SVM
Types of Models
• Association and Correlation Models (What
Happens Together? What Changes Together?)
 Correlation
 Clustering models
• Anomaly Detection Models (Is this Weird?)
 Outlier detections
 Classification models
 Regression models
**** Little Help?
https://mod.rapidminer.com/
How to Build a Classification
Model
• Logistic Regression
 Load the data set
 Check your data set for possible problems
 Apply necessary data preparation
• Issue of missing data
• Issue of duplication
 Select attributes needed for the model
 Set role to the special attribute (Label)
 Find the “Logistic Regression” model in the operator
tabs and drag and drop it to the canvas
 Connect the “mod”, “exa”, and “wei” of the
Logistic Regression operator to the “res” knob.
 Run the model
How to Build a Classification
Model
• Decision Tree
 Load the data set
 Check your data set for possible problems
 Apply necessary data preparation
• Issue of missing data
• Issue of duplication
 Select attributes needed for the model
 Set role to the special attribute (Label)
 Find “Decision Tree” model in operator tabs and
drag and drop to the canvas
 Connect the “mod”, “exa”, and “wei” of the
Decision Tree operator to the “res” knob.
 Run the model
How to Build a Classification
Model
• Follow the same procedure for other
classification models; however, just change the
model operator part.
Activity 2.
• Build a Classification Model for “bankdata” and
“bankdata status” using:
 Logistic Regression
 Decision Tree
 Random Forest
 Naïve Bayes
How to Build a Regression Model
• Linear Regression
 Load the data set
 Check your data set for possible problems
 Apply necessary data preparation
• Issue of missing data
• Issue of duplication
• Issue of miscoding
• Removing attributes not needed in the analysis
 Set role to the special attribute (Label)
How to Build a Regression Model
(cont)
 Check for possible problem of Multicollinearity
and autocorrelation
• How to check multicollinearity and
autocorrelation?
• Attach the “Correlation Matrix” operator
• Click “Run” to check high correlation between
variables
 Set role to the special attribute (Label)
 Find “Linear Regression” model in operator tabs
and drag and drop to the canvas
 Connect the “mod”, “exa”, and “wei” of the Linear
Regression operator to the “res” knob.
 Run the model
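The multicollinearity check with a correlation matrix can be illustrated with a short Python sketch. The attribute names echo the car sales case but the numbers are invented, and Pearson correlation is assumed as the measure.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def correlation_matrix(columns):
    """Correlation of every attribute with every other attribute."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical predictor attributes.
data = {
    "horsepower": [100, 150, 200, 250],
    "engine": [1.6, 2.0, 2.4, 2.8],
    "fuel_capacity": [40, 38, 45, 41],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Here `horsepower` and `engine` move together almost perfectly, so including both as predictors would be a multicollinearity warning sign.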
Activity 3: “Car sales data”

Case: You are a car dealer and you want to build
a model for the resale value of a car based on
its manufacturer, distance covered, type of car,
brand-new price, engine, horsepower, fuel
capacity and fuel consumption.
How to Build a Clustering
Model
• k-Means
 Load the data set
 Check your data set for possible problems
 Apply necessary data preparation
• Issue of missing data
• Issue of duplication
• Issue of miscoding
• Removing attributes not needed in the analysis
 Set role to the special attribute (Label)
 Find the “k-Means” model in the operator tabs and drag and
drop it to the canvas
 Connect the “mod”, “exa”, and “wei” of the k-Means
operator to the “res” knob.
 Run the model
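To show what a k-means clustering model does with the data, here is a small sketch of the standard assign-and-update loop. It is a simplification: the first k points are used as starting centroids (an assumption made to keep the result reproducible), and the points are invented.

```python
def nearest(point, centroids):
    """Index of the centroid closest to the point (squared Euclidean distance)."""
    distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
    return distances.index(min(distances))

def kmeans(points, k, iterations=20):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, and repeat."""
    centroids = list(points[:k])  # assumption: first k points as starting centroids
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

# Hypothetical two-attribute examples forming two obvious groups.
points = [(1, 1), (9, 9), (1, 2), (8, 9), (2, 1), (9, 8)]
centroids, labels = kmeans(points, k=2)
print(labels, centroids)
```

Note that no label attribute is used anywhere: the groups come only from the distances between examples.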
How to Build a Clustering
Model
• Hierarchical
 Load the data set
 Check your data set for possible problems
 Apply necessary data preparation
• Issue of missing data
• Issue of duplication
• Issue of miscoding
• Removing attributes not needed in the analysis
 Set role to the special attribute (Label)
 Find “Hierarchical” model in operator tabs and drag
and drop to the canvas
 Connect the “mod”, “exa”, and “wei” of the Hierarchical
operator to the “res” knob.
 Run the model
Activity 4: “Benefit data”
• Using the “benefit data”, cluster the membership
of customers in a store using the 23 questions
asked in a survey (ben1 to ben23)
• Using the four characteristics (convenience,
service, comfort and goods), create a clustering
model for the customers.
• Compare the two results.
Applying the Model
Apply???!!! Huh???!!!!
• Applying the model is the process of predicting
using new data.
• It is how we find out the accuracy and precision of
the model.
• This is where Machine Learning and Data
Mining start to differ from the usual statistical
process.
Applying Classification Model
• Logistic Regression
 Starting from the Model we made from “building
the model”, look for “Apply Model” operator
 Drag and drop on the canvas.
 Connect the “mod” from the “Logistic
Regression” operator to the “mod” socket of the
“Apply Model” operator.
• For applying the model in the same data set:
connect “exa” from the “Logistic Regression”
operator to the “exa” socket of the “Apply Model”
operator
 Run the model
Applying Classification Model
• Logistic Regression
 Connect the “mod” from the “Logistic
Regression” operator to the “mod” socket of the
“Apply Model” operator.
• For applying the model in new data set: connect
“exa” from the source of the new data set to the
“exa” socket of the “Apply Model” operator
 Run the model
Applying Classification Model
• Decision Tree
 Starting from the Model we made from “building
the model”, look for “Apply Model” operator
 Drag and drop on the canvas.
 Connect the “mod” from the “Decision Tree”
operator to the “mod” socket of the “Apply
Model” operator.
• For applying the model in the same data set:
connect “exa” from the “Decision Tree” operator to
the “exa” socket of the “Apply Model” operator
 Run the model
Applying Classification Model
• Decision Tree
 Connect the “mod” from the “Decision Tree”
operator to the “mod” socket of the “Apply
Model” operator.
• For applying the model in new data set: connect
“exa” from the source of the new data set to the
“exa” socket of the “Apply Model” operator
 Run the model

*** The process is the same for the other
classification models.
Activity 5: “delisting data”
• Using the “delisting” data set, build a model and
apply the model in: (create a LR, DT, and ANN
model)
 The data used in building the model
 Use the delisting_test data and predict if the
company will delist from the PSE or not.
Applying a Model
• Applying the model to the same data set
 Starting from the Model we made from “building
the model”, look for “Apply Model” operator
 Drag and drop on the canvas.
 Connect the “mod” from the “MODEL” operator
to the “mod” socket of the “Apply Model”
operator.
 Connect “exa” from the “MODEL” operator to the
“exa” socket of the “Apply Model” operator
 Run the model
Applying a Model
• Applying the model to a different data set
 Starting from the Model we made from “building
the model”, look for “Apply Model” operator
 Drag and drop on the canvas.
 Connect the “mod” from the “MODEL” operator
to the “mod” socket of the “Apply Model”
operator.
 Connect “exa” from the “source” operator to the
“exa” socket of the “Apply Model” operator
 Run the model
Activity 6:
• Apply each of the models that we
built.
Model Testing and
Performance Evaluation
Model Testing
• It is the process of finding out how the model
performs on a given data set.
• Could be done using a “labeled” data set
• Will give us the idea on how we could improve
the model
Ways of doing Model Testing
• Using the result in the “Apply Model” operator,
compare the predicted result with the actual
result. Comparison could be done “manually” or
using the “Performance” operator/s
• Use the “Validation” operator/s
 The two most commonly used “Validation” operators are
(a) split – validation, and
(b) cross – validation
Split – Validation Vs Cross - Validation

• Split – Validation
– the data analyst will determine how the data will be split
into “training data” set and “testing data” set.
– The training data is where the model will learn and build the
model; while, the testing data (hidden) is where we will
check the “knowledge” we had acquired from the training.
– Question? How much is to be used in training and testing?

• Cross – Validation
– the cases will be randomly split into k groups so that the
groups are approximately equal in size.
– Each group is omitted in turn: a model is built from the
remaining groups and tested on the omitted group.
– Problem: the error estimate can be affected by the arbitrary
assignment of cases to groups.
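The difference between the two schemes is easiest to see in how the case indices are divided. The sketch below is an assumed, simplified version: a 70/30 split and 5 folds are example settings, not recommendations from the slides.

```python
import random

def split_validation(n, train_ratio=0.7, seed=0):
    """One random split of case indices into a training and a testing part."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = round(n * train_ratio)
    return indices[:cut], indices[cut:]

def cross_validation(n, k=5, seed=0):
    """Yield k (train, test) index pairs; each fold is the test set once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test

train, test = split_validation(10)
folds = list(cross_validation(10, k=5))
print(len(train), len(test), [len(te) for _, te in folds])
```

With split validation each case is used for either training or testing; with cross validation every case is tested exactly once and trained on k - 1 times.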
How to Perform Model Testing?
• Using the First Method
 Starting with the “Applying the Model” processes,
we could manually compare the predicted value
with that of the actual value.
 Use the “Performance” operator to automatically
find the performance of the model
• The “Performance” operator is dependent on the
model that we build and the goal of the analytics
 Use the performance of the model to compare
and improve the model
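The manual comparison of predicted and actual values boils down to counting agreements. A minimal sketch for a two-class problem, with invented labels and "yes" assumed to be the positive class:

```python
def performance(actual, predicted, positive):
    """Accuracy, precision and recall from actual vs. predicted labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }

# Hypothetical labels from a labeled data set and a model's predictions.
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no"]
result = performance(actual, predicted, positive="yes")
print(result)
```

This sketch does not guard against division by zero, which would occur if the model never predicts the positive class or the data contain no positive cases.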
How to Perform Model Testing?
• Split – Validation
 With the model we build from “Applying the
Model” processes, we will introduce “Split –
Validation” operator.
 Set the splitting ratio that you will use in the
parameter tabs.
 Double click the operator to go to its sub –
process.
 In the training area, drag and drop the algorithm
that you will use.
How to Perform Model Testing?

• Split – Validation (cont.)

 Search “apply model” operator in the operator
tabs. Drag and drop it in the “testing” area of the
sub – process.
 Look for the “Performance” operator and also
drag and drop it in the “testing” area.
• The “Performance” operator depends on the
algorithm you used. For example, if you use a
“Regression” model, then the correct
“Performance” operator is “Performance
(Regression)”.
 Connect the ports and go out of the sub –
process.
 Connect the ports and knobs and click “run”.
How to Perform Model Testing?

• Cross - Validation
 With the model we build from “Applying the
Model” processes, we will introduce “Cross –
Validation” operator.
 Set the number of “folds” and the “sampling type”
that you will use in the parameter tabs.
 Double click the operator to go to its sub –
process.
 In the training area, drag and drop the algorithm
that you will use.
How to Perform Model Testing?

• Cross – Validation (cont.)

 Search “apply model” operator in the operator
tabs. Drag and drop it in the “testing” area of the
sub – process.
 Look for the “Performance” operator and also
drag and drop it in the “testing” area.
• The “Performance” operator depends on the
algorithm you used. For example, if you use a
“Regression” model, then the correct “Performance”
operator is “Performance (Regression)”.
 Connect the ports and go out of the sub –
process.
 Connect the ports and knobs and click “run”.
Activity 7:
• Using the models we made in the previous
exercises, perform model testing.
Then what?

• The use of split – validation and/or cross –
validation depends on the goal and
objective of your study.
• Improve your model based on the
performance of the model you are using by
adjusting some parameter/s.
• Be careful of overfitting the model in the given
data set.
• How to verify if there is an overfitting?
How to verify if there is an
overfitting?
• What is “Validation”?
 It is one of the processes to find if there is an
overfitting in the model.
 It uses a new data set that is not used in the
model building.
 The process comes after the testing of the model
How to perform “Validation”?
• Using the process that we did in the “testing”,
we will load a new data set.
• Perform the cleaning and pre – analysis steps.
 Cleaning
 Selecting Attributes
 Setting the Role for the “Label”
• Search for the “Apply Model” from the operator
tabs. Drag and Drop to the canvas.
• Connect the “Mod” of the “Validation” operators
to the “Mod” of the “Apply Model” operator.
How to perform “Validation”?
• Connect the new data set to the “Apply Model”
operator.
• Search for the “Performance” operator. Drag
and drop to the canvas.
• Connect the necessary ports and knobs.
• Click “Run”.
• Evaluate the result.
Activity 8:
• Using the models we made in the previous
exercises, perform model validations.
Finding the Right Model?
• The process allows the researcher to find which
model performed well.
• The goal is to compare the models.
• Learn what are the best models for a given data
set.
Review
• Classification Models (Is this A or B?)
 Logistic Regression
 Decision Tree
 Random Forest
 Naïve Bayes
 ANN
 SVM
• Regression Model (How much or How many?)
 Linear Regression
 Non – Linear Regression
 ANN
 SVM
Review
• Clustering Models (How is this organized? What
belongs to each other?)
 k-Means
 X-Means
 DBSCAN
 GMM (Gaussian Mixture)
 Hierarchical
 ANN
 SVM
Review
• Association and Correlation Models (What
Happens Together? What Changes Together?)
 Correlation
 Clustering models
• Anomaly Detection Models (Is this Weird?)
 Outlier detections
 Classification models
 Regression models
**** Little Help?
https://mod.rapidminer.com/
Finding the Right Model
• Methods that one could possibly do:
 Individually create the model and compare the
performance of each model.
 Simultaneously perform the process in one
canvas
 Use the “Compare ROC” operator.

We will perform the second and the third options.

Finding the Right Model
• Perform pre – analysis processes in the data.
• Select the target variable by the “Set Role”
operator
• Search for the “Multiply” operator and introduce
to the canvas.
• Introduce a number (depends on the number of
model to be used) of “Testing” operators using a
“Split – Validation” or “Cross – Validation”
• In the sub-process of each “Validation”
operator, introduce your models.
Finding the Right Model (Cont)
• Put the necessary “Apply Model” and
“Performance” operators.
• Rename each “Validation” operator to
distinguish one from the other.
• Connect the ports and knobs
• Click Run.
• Evaluate the models
Finding the Right Model Using
ROC
• Perform pre – analysis processes in the data.
• Select the target variable by the “Set Role”
operator.
• Search for “Compare ROC” operator.
• Drag and Drop to the canvas and connect the
ports and knobs
• Get inside the sub – process of the “Compare
ROC” operator
• Drag and Drop the “Models” that you want to
compare.
Finding the Right Model Using
ROC
• Connect the ports and knobs and get outside
the sub – process.
• Connect the “Compare ROC” operator to the
result knob.
• Click run
ROC Result
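One common way to summarize and compare ROC curves numerically is the area under the curve (AUC). The sketch below computes it from its pairwise definition; the scores for the two models are invented, and this is not claimed to be how the “Compare ROC” operator works internally.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs in which the positive
    case gets the higher score; ties count as half."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Hypothetical true labels and confidence scores from two models.
labels = [1, 1, 0, 0]
model_a = [0.9, 0.8, 0.3, 0.1]
model_b = [0.9, 0.2, 0.6, 0.1]
auc_a, auc_b = roc_auc(labels, model_a), roc_auc(labels, model_b)
print(auc_a, auc_b)
```

An AUC of 1.0 means the model ranks every positive case above every negative one, while 0.5 is what random scoring would achieve.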
Activity 9:

1. Find the best model to be used in the customer
churn data using the “Compare ROC” method.
2. Using the multiple validation method, find the
best model for the flight data.
Optimization of Model Parameter
• The purpose of the process is to find the
parameter that will optimize the performance of
the model.
• It will allow the researcher not to guess the
parameter. It will give the performance of the
model; thus, allowing the researcher to find the
correct parameter to apply in the data set.
Optimizing the Model Parameter
• Load and apply pre – analysis processes.
• Select the target attributes using the “Set Role”
operator.
• Search for “Optimize Parameters (Grid)”
• Drag and drop it to the canvas
• Get inside the “Optimize Parameters” sub –
process, and paste the “Cross Validation”
process.
 The cross – validation process should contain
the model that we want to optimize.
Optimizing the Model Parameter
• The inside of the “Cross Validation” operator is similar to
the process we built in the “Testing” section.
• Go to the sub – process of the Validation
operator and set the model we will be using in
the testing; then, apply model and performance
operators in the testing area.
• Click “Edit Parameter Settings” in the parameter
tabs.
• Highlight the operator we want to optimize.
• Select the parameters we want to optimize.
Optimizing the Model Parameter
• Supply the thresholds that the machine will be
using in the optimization.
• Click ok
• Click “Run”
• Evaluate the result
• Apply the parameters based on the optimized
results.
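The idea behind grid optimization is to try every candidate parameter value, score each one with cross validation, and keep the best. The sketch below assumes a k-nearest-neighbour classifier on one attribute and leave-one-out cross validation (each case is its own fold); the data, the grid, and the choice of model are all invented for illustration.

```python
def knn_predict(train, x, k):
    """Majority label among the k training cases closest to x."""
    neighbours = sorted(train, key=lambda ex: abs(ex[0] - x))[:k]
    votes = [label for _, label in neighbours]
    return max(sorted(set(votes)), key=votes.count)

def loo_accuracy(data, k):
    """Leave-one-out cross validation: test each case on a model
    built from all the other cases."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += knn_predict(train, x, k) == y
    return hits / len(data)

def grid_search(data, grid):
    """Score every parameter value in the grid and return the best one."""
    scores = {k: loo_accuracy(data, k) for k in grid}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical (attribute value, label) cases with a little noise.
data = [(1.0, 0), (1.5, 0), (2.0, 0), (2.2, 1), (4.8, 0), (5.0, 1), (5.5, 1), (6.0, 1)]
best_k, scores = grid_search(data, grid=[1, 3, 5])
print(best_k, scores)
```

Because every grid value is scored on held-out cases, the chosen parameter is less likely to be one that merely fits the training data well.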
Activity 10
• Perform Parameter Optimization for the models
that we had done.
• Using the “Multiple Validation” method of model
selection, apply the parameters and perform
model selection activity.
How to Automate Model
Selection and Optimization?
• Goal
 To find the best model especially when all the
other models are performing as well.
 Lessen the probability of guessing and
snowballing the process
 To make the process more elegant
How to Automate Model
Selection and Optimization?
• The Process (using multiply operator):
 Starting with the “parameter optimization”, we
will create copies of it and change the model for
each of the copies in the validation operators.
 Change the parameter setting that you want to
optimize.
 Use the “Multiply” operator to create multiple
sources, one for each of the “Optimize Parameters”
operators
 Connect the operators and click run.
 Evaluate the results.
Fully Automated Model Selection
and Optimization
• The Process:
 Starting with the process we made from the
Automate Model Selection and Optimization
using “multiply” operator, we search for “Select
Subprocess” operators.
 Double click the “Select Subprocess” to get
inside of the operator.
 Cut the first “Optimize Parameters” operator and paste it in
“selection 1”. Cut the 2nd “Optimize Parameters” operator
and paste it in “selection 2”. Cut the 3rd
“Optimize Parameters” operator and paste it in “selection
3”. And so on.
 Connect the processes; then, go to the main
process.
Fully Automated Model Selection
and Optimization
 Search for the “Optimize Parameters (Grid)”
operator and bring the “Select Subprocess”
operator inside of it by cut and paste.
 In the “Optimize Parameters” operator, click on
“Edit Parameter Settings”.
 Highlight “Select Subprocess” in the operator to
populate the “Parameters” view.
 Transfer “Select_Which” to the Selected
Parameter using the arrow.
 In the “Grid”, set the min to 1 and max to the
number of models we want to optimize.
 Click ok and connect the ports and knobs
 Click run
Activity 11:
• Perform Automated Model Selection and
Optimization on the Churn Data
Capstone Activity
• Using the data in the capstone folder, CREATE
a Machine Learning Project
