
CCST9017
Hidden Order in Daily Life: A Mathematical Perspective

Lecture 11: AI and Machine Learning

Dr. Zhiwen Zhang
Department of Mathematics, HKU
Contents
• Types of machine learning
  - Supervised learning, unsupervised learning, reinforcement learning
• Deep learning
  - Artificial neural networks, deep neural networks, stochastic gradient descent
• Applications and artificial intelligence (AI)
  - Automatic speech recognition, image recognition, drug discovery, medical image analysis, mobile advertising, financial transactions, etc.
What is Learning?
• "Learning denotes changes in a system that ... enable a system to do the same task ... more efficiently the next time." - Herbert Simon
• "Learning is making useful changes in our minds." - Marvin Minsky
• "Machine learning refers to a system capable of the autonomous acquisition and integration of knowledge."
Machine learning
• Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.
• Machine learning algorithms build a mathematical model based on sample data, known as "training data", to make predictions or decisions without being explicitly programmed to do so.
• Machine learning is closely related to computational statistics, which focuses on making predictions using computers.
• The study of mathematical optimization delivers methods, theory, and application domains to the field of machine learning.
Why Machine Learning?
• No human experts
  - industrial/manufacturing control
  - mass spectrometer analysis, drug design, astronomical discovery
• Black-box human expertise
  - face/handwriting/speech recognition
  - driving a car, flying a plane
• Rapidly changing phenomena
  - credit scoring, financial modeling
  - diagnosis, fraud detection
• Need for customization/personalization
  - personalized news readers
  - movie/book recommendation
Example: Spam Filter
Example: Digit Recognition
Related Fields
[Diagram: machine learning at the center, connected to decision theory, game theory, control theory, AI, information theory, biological evolution, probability & statistics, philosophy, optimization, data mining, statistical mechanics, psychology, computational complexity theory, and neurophysiology.]
Machine learning is primarily concerned with the accuracy and effectiveness of the computer system.
Machine learning and our focus
• Machine learning is like human learning from past experiences; a computer, however, does not have "experiences".
• A computer system learns from data, which represent some "past experiences" of an application domain.
• Our focus: learn a target function that can be used to predict the values of a discrete class attribute, e.g., approved or not approved, high risk or low risk.
• For example, a credit card company receives thousands of applications for new cards. Each application contains information about an applicant, including age, marital status, annual salary, etc. Problem: should an application be approved?
Machine Learning Problems
• Supervised learning: data and corresponding labels are given.
• Unsupervised learning: only data is given; no labels are provided.
• Semi-supervised learning: some (but not all) labels are present.
• Reinforcement learning: an agent interacting with the world makes observations, takes actions, and is rewarded or punished; it should learn to choose actions in such a way as to obtain a lot of reward.
Supervised vs. unsupervised learning
• Supervised learning: classification is seen as supervised learning from examples.
  - Supervision: the data (observations, measurements, etc.) are labeled with pre-defined classes, as if a "teacher" gave the classes (supervision).
  - Test data are classified into these classes too.
• Unsupervised learning (clustering)
  - Class labels of the data are unknown.
  - Given a set of data, the task is to establish the existence of classes or clusters in the data.
Algorithms
• Supervised learning
  - Classification (discrete labels): linear classifiers (e.g., support vector machines), decision tree algorithms.
  - Regression (real values)
• Unsupervised learning
  - Clustering: k-means
  - Probability distribution estimation: naive Bayes, hidden Markov models (HMM)
• Reinforcement learning
  - Decision making (robots, chess machines)
The data and the goal
• Data: a set of data records (also called examples, instances, or cases) described by
  - k attributes: A1, A2, ..., Ak
  - a class: each example is labelled with a pre-defined class.
• Goal: to learn a classification model from the data that can be used to predict the classes of new (future, or test) cases/instances.
Example: data (loan application)
[Table: loan application records, with the class label "approved or not".]
An example: the learning task
• Learn a classification model from the data.
• Use the model to classify future loan applications into
  - Yes (approved) and
  - No (not approved)
• What is the class for the following case/instance?
Decision tree
• Decision tree learning is one of the most widely used techniques for classification.
  - Its classification accuracy is competitive with other methods, and it is very efficient.
• The classification model is a tree, called a decision tree.
• C4.5, an algorithm for generating decision trees developed by Ross Quinlan, is ranked #1 in the Top 10 Algorithms in Data Mining.
A decision tree from the loan data
[Figure: a decision tree with decision nodes and leaf nodes (classes).]
• Is the decision tree unique? No; a simpler tree is possible.
• We want a tree that is both small and accurate: easy to understand, and it performs better.
• Finding the best tree is NP-hard; all current tree-building algorithms are heuristic.
Choose an attribute to partition data
• The key to building a decision tree is which attribute to choose in order to branch.
• The objective is to reduce impurity or uncertainty in the data as much as possible.
  - A subset of data is pure if all instances belong to the same class.
• The heuristic in C4.5 is to choose the attribute with the maximum information gain or gain ratio, based on information theory.
Another example for decision trees
Decide whether to wait for a table at a restaurant, based on the following attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)
Attribute (feature)-based representations
• Examples are described by feature (attribute) values (Boolean, discrete, continuous).
• E.g., situations where I will/won't wait for a table:
[Table: 12 example situations with their attribute values; each example is classified positive (T) or negative (F).]
Decision trees
• One possible representation for hypotheses.
• E.g., here is the "true" tree for deciding whether to wait:
[Figure: the "true" decision tree for the restaurant problem.]
Choosing an attribute
• Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative".
[Figure: candidate splits on Patrons and on Type.]
• Patrons? is a better choice.
Attribute Selection Measure: Information Gain (C4.5)
• Select the attribute with the highest information gain.
• Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_{i,D}| / |D|.
• Expected information (entropy) needed to classify a tuple in D:
  Info(D) = -Σ_{i=1}^{m} p_i log₂(p_i)
• Information needed (after using A to split D into v partitions) to classify D:
  Info_A(D) = Σ_{j=1}^{v} (|D_j| / |D|) × Info(D_j)
• Information gained by branching on attribute A:
  Gain(A) = Info(D) - Info_A(D)
Information Gain
For the training set, p = n = 6, so I(6/12, 6/12) = 1 bit.
Consider the attributes Patrons and Type (and others too):
  IG(Patrons) = 1 - [ (2/12) I(0,1) + (4/12) I(1,0) + (6/12) I(2/6, 4/6) ] ≈ 0.541 bits
  IG(Type) = 1 - [ (2/12) I(1/2, 1/2) + (2/12) I(1/2, 1/2) + (4/12) I(2/4, 2/4) + (4/12) I(2/4, 2/4) ] = 0 bits
Patrons has the highest IG of all attributes and so is chosen by the decision tree algorithm as the root.
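These numbers are easy to verify. Below is a short Python check, assuming only the entropy formula from the previous slide; the split counts for Patrons and Type are taken directly from the expressions above.

```python
# Verify the information-gain arithmetic on this slide.
from math import log2

def I(*ps):
    """Entropy of a class distribution given as probabilities."""
    return -sum(p * log2(p) for p in ps if p > 0)

# Whole training set: p = n = 6 out of 12 examples
print(I(6/12, 6/12))                        # 1.0 bit

# Patrons splits the 12 examples into None(0+,2-), Some(4+,0-), Full(2+,4-)
ig_patrons = 1 - (2/12*I(0/2, 2/2) + 4/12*I(4/4, 0/4) + 6/12*I(2/6, 4/6))
print(round(ig_patrons, 3))                 # 0.541 bits

# Type splits into French(1+,1-), Italian(1+,1-), Thai(2+,2-), Burger(2+,2-)
ig_type = 1 - (2/12*I(1/2, 1/2) + 2/12*I(1/2, 1/2)
               + 4/12*I(2/4, 2/4) + 4/12*I(2/4, 2/4))
print(ig_type)                              # 0.0 bits
```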
One typical decision tree
• Decision tree learned from the 12 examples:
[Figure: the learned tree, with Patrons at the root.]
• It is substantially simpler than the "true" tree: a more complex hypothesis isn't justified by the small amount of data.
Decision Tree Based Classification
• Advantages:
  - Easy to construct/implement
  - Extremely fast at classifying unknown records
  - Models are easy to interpret for small-sized trees
  - Accuracy is comparable to other classification techniques for many simple data sets
• Disadvantages:
  - Computationally expensive to train
  - Some decision trees can be overly complex and fail to generalise the data well
  - Overfitting: a decision tree may overfit the training data and give wrong results on test data
Support vector machine
• The support vector machine (SVM) was invented by V. Vapnik and his co-workers in the 1970s in Russia. SVM is one of the most popular supervised learning algorithms, used for classification as well as regression problems.
• SVMs are linear classifiers that find a hyperplane to separate two classes of data, positive and negative.
• Kernel functions are used for nonlinear separation.
• SVM not only has a rigorous theoretical foundation, but also performs classification more accurately than most other methods in applications, especially for high-dimensional data.
• It is perhaps the best classifier for text classification. SVMs can also be applied to the classification of images, satellite data, etc.
Basic concepts
• Let the set of training examples D be
  {(x_1, y_1), (x_2, y_2), ..., (x_r, y_r)},
  where x_i = (x_{i1}, x_{i2}, ..., x_{in}) is an input vector in a real-valued space X ⊆ Rⁿ and y_i is its class label (output value), y_i ∈ {1, -1}.
  1: positive class; -1: negative class.
• SVM finds a linear function of the form (w: weight vector)
  f(x) = ⟨w · x⟩ + b,
  which is called a support vector machine, and classifies by
  y_i = +1 if ⟨w · x_i⟩ + b ≥ 0, and y_i = -1 if ⟨w · x_i⟩ + b < 0.
The hyperplane
• The hyperplane that separates positive and negative training data is
  ⟨w · x⟩ + b = 0.
• It is also called the decision boundary (surface).
• There are many possible hyperplanes. Which one should we choose?
An example: two-class problem
[Figure: points of Class 1 and Class 2 in the plane.]
• Many decision boundaries can separate these two classes.
• Which one should we choose?
Bad Decision Boundaries
[Figure: two decision boundaries that pass very close to the training points of Class 1 and Class 2.]
SVM looks for the separating hyperplane with the largest margin.
Optimal decision boundary: margin should be maximized
• The decision boundary should be as far away from the data of both classes as possible.
• We should maximize the margin
  m = 2 / √(w · w) = 2 / ||w||.
• Support vectors: the data points that the margin pushes up against.
[Figure: Class 1 and Class 2 separated by a hyperplane with margin m.]
• The maximum-margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM (called a linear SVM).
The Optimization Problem
• Let {x_1, ..., x_r} be our data set and let y_i ∈ {1, -1} be the class label of x_i.
• The decision boundary should classify all points correctly:
  ⟨w · x_i⟩ + b ≥ 1 for y_i = 1,
  ⟨w · x_i⟩ + b ≤ -1 for y_i = -1,
  which a constrained optimization problem summarizes as
  y_i (⟨w · x_i⟩ + b) ≥ 1, i = 1, 2, ..., r.
Lagrangian of the Original Problem
• The Lagrangian is (α_i: Lagrangian multipliers)
  L(w, b, α) = (1/2) ||w||² - Σ_{i=1}^{r} α_i [ y_i (⟨w · x_i⟩ + b) - 1 ],  α_i ≥ 0.
• Note that ||w||² = wᵀw.
• Setting the gradient of L with respect to w and b to zero, we have
  w = Σ_{i=1}^{r} α_i y_i x_i and Σ_{i=1}^{r} α_i y_i = 0.
The Dual Optimization Problem
• We can transform the problem to its dual. The data appear only through the dot products x_i · x_j, and the α's are the new variables (Lagrangian multipliers):
  max_α  Σ_{i=1}^{r} α_i - (1/2) Σ_{i=1}^{r} Σ_{j=1}^{r} α_i α_j y_i y_j (x_i · x_j)
  subject to α_i ≥ 0 and Σ_{i=1}^{r} α_i y_i = 0.
• This is a convex quadratic programming (QP) problem, so the global maximum over the α_i can always be found.
• There are well-established tools for solving this optimization problem (e.g., CPLEX).
• Note: the weight vector is recovered as w = Σ_i α_i y_i x_i.
A Geometrical Interpretation
[Figure: Class 1 and Class 2 separated by the maximum-margin plane. The α's with values different from zero (e.g., α₁ = 0.8, α₆ = 1.4, α₈ = 0.6) belong to the support vectors; they "hold up" the separating plane. All other points have α_i = 0.]
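For illustration, here is a hedged scikit-learn sketch of the same picture: after fitting a linear SVM on toy data (made up here), only the support vectors carry nonzero multipliers. scikit-learn exposes the products α_i · y_i as `dual_coef_`:

```python
# Toy two-class data, invented for illustration only.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=10.0)
clf.fit(X, y)

print(clf.support_vectors_)       # the points that "hold up" the plane
print(clf.dual_coef_)             # alpha_i * y_i, support vectors only
print(clf.coef_, clf.intercept_)  # w = sum_i alpha_i y_i x_i, and b
```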
Non-Linear SVM
• How could we generalize this procedure to non-linear data?
• Vapnik showed in 1992 that transforming the input data x_i into a higher-dimensional space makes the problem easier.
• We know that the data appear only as dot products (x_i · x_j).
• Suppose we transform the data to some (possibly infinite-dimensional) space H via a mapping function Φ, such that the data appear in the form Φ(x_i) · Φ(x_j).
• Why? A linear operation in H is equivalent to a non-linear operation in the input space.
Non-linear SVMs: Feature Space
General idea: the original input space (x) can be mapped to some higher-dimensional feature space (φ(x)) where the training set is separable:
  Φ: x → φ(x),  x = (x₁, x₂),  φ(x) = (x₁², x₂², √2 x₁x₂)
If data are mapped into a space of sufficiently high dimension, then they will in general be linearly separable: N data points are in general separable in a space of N-1 dimensions or more!
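This feature map can be checked numerically. The following short snippet (a standard identity, not taken from the slides) verifies that the dot product in the feature space equals the squared dot product in the input space, i.e., φ(x) · φ(z) = (x · z)², so the mapping never needs to be computed explicitly:

```python
# Verify phi(x).phi(z) = (x.z)^2 for the quadratic feature map above.
import numpy as np

def phi(x):
    return np.array([x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(np.dot(phi(x), phi(z)))  # 1.0, via the explicit feature map
print(np.dot(x, z) ** 2)       # 1.0, via the kernel: same number
```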
Choosing the Kernel Function
• Probably the trickiest part of using an SVM.
• The kernel function is important because it creates the kernel matrix, which summarizes all the data.
• Many principles have been proposed (diffusion kernel, Fisher kernel, string kernel, ...).
• There is even research on estimating the kernel matrix from available information.
• In practice, a low-degree polynomial kernel or an RBF kernel with a reasonable width is a good initial try; see the sketch after this list.
• Note that an SVM with an RBF kernel is closely related to RBF neural networks, with the centers of the radial basis functions automatically chosen by the SVM.
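A minimal sketch of that "initial try" advice, using scikit-learn on a toy nonlinear data set (two concentric rings, generated here purely for illustration):

```python
# Compare a low-degree polynomial kernel and an RBF kernel on ring data.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.4, noise=0.05, random_state=0)

for kernel, params in [("poly", {"degree": 2}), ("rbf", {"gamma": "scale"})]:
    clf = SVC(kernel=kernel, **params).fit(X, y)
    # Training accuracy; both kernels should separate the rings well here
    print(kernel, clf.score(X, y))
```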
Applications of SVMs
• Bioinformatics
• Machine vision
• Text categorization
• Handwritten character recognition
• Time series analysis
Lots of very successful applications!
Unsupervised Learning
• Supervised learning: discover patterns in the data that relate data attributes to a target (class) attribute.
  - These patterns are then utilized to predict the values of the target attribute in future data instances.
• Unsupervised learning: the data have no target attribute.
  - We want to explore the data to find some intrinsic structures (hidden knowledge) in them.
Clustering
• Clustering is a technique for finding similarity groups in data, called clusters. That is, it groups data instances that are similar to (near) each other into one cluster, and data instances that are very different from (far away from) each other into different clusters.
• Clustering is often called an unsupervised learning task, as no class values denoting an a priori grouping of the data instances are given (which is the case in supervised learning).
• Clustering is one of the most utilized data mining techniques. It has a long history and is used in almost every field, e.g., medicine, psychology, botany, sociology, biology, archeology, marketing, insurance, libraries, etc.
What is clustering for?
Let us see some real-life examples:
• Example 1: group people of similar sizes together to make "small", "medium", and "large" T-shirts.
• Example 2: in marketing, segment customers according to their similarities, to do targeted marketing and help marketers discover distinct groups in their customer bases.
• Example 3: given a collection of text documents, organize them according to their content similarities to produce a topic hierarchy.
  - In recent years, due to the rapid increase in online documents, text clustering has become important.
What Is a Good Clustering?
A good clustering method will produce clusters with:
• High intra-class similarity
• Low inter-class similarity
• Minimal domain knowledge required to determine input parameters
• Discovery of clusters with arbitrary shape
• Ability to deal with noise and outliers
• Interpretability and usability
Similarity and Dissimilarity Between Objects: Distance Metrics
Let x_i = (x_{i1}, x_{i2}, ..., x_{ip}) and x_j = (x_{j1}, x_{j2}, ..., x_{jp}) be two objects.
• Minkowski distance:
  d(i, j) = ( |x_{i1} - x_{j1}|^q + |x_{i2} - x_{j2}|^q + ... + |x_{ip} - x_{jp}|^q )^{1/q}
• Euclidean distance (q = 2):
  d(i, j) = √( |x_{i1} - x_{j1}|² + |x_{i2} - x_{j2}|² + ... + |x_{ip} - x_{jp}|² )
• Manhattan distance (q = 1):
  d(i, j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + ... + |x_{ip} - x_{jp}|
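These formulas translate directly into code. A small sketch, assuming only the definitions above:

```python
# Minkowski family of distances; q=1 gives Manhattan, q=2 gives Euclidean.
def minkowski(x, y, q):
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1 / q)

xi = [1.0, 2.0, 3.0]
xj = [4.0, 6.0, 3.0]

print(minkowski(xi, xj, 1))  # Manhattan: |1-4| + |2-6| + |3-3| = 7.0
print(minkowski(xi, xj, 2))  # Euclidean: sqrt(9 + 16 + 0) = 5.0
```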
When to use what distance
• The choice of distance measure should be based on the particular application: what sort of similarities would you like to detect?
• Euclidean distance takes into account the magnitude of the differences of the expression levels.
• In many cases it is necessary to normalize and/or standardize genes or arrays in order to compare the amount of variation of two different genes or arrays from their respective central locations.
Notion of a Cluster can be Ambiguous
[Figure: the same set of points grouped as two clusters, four clusters, or six clusters.] How many clusters? The answer can be ambiguous.
K-means clustering algorithm
• Partitioning method: construct a partition of a database D of n objects into a set of k clusters.
  - Global optimum: exhaustively enumerate all partitions.
  - Heuristic method: the k-means algorithm (MacQueen, 1967), where each cluster is represented by the center of the cluster.
• Given k, the k-means algorithm consists of four steps:
  1. Select initial centroids at random.
  2. Assign each object to the cluster with the nearest centroid.
  3. Compute each centroid as the mean of the objects assigned to it.
  4. Repeat the previous two steps until no change.
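A minimal NumPy sketch of these four steps. The toy data and the simple stopping rule are assumptions for illustration; a production implementation would also handle empty clusters and multiple restarts.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: select initial centroids at random (without replacement)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign each object to the nearest centroid (Euclidean)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its objects
        new_centroids = np.array([X[labels == j].mean(axis=0)
                                  for j in range(k)])
        # Step 4: repeat until the centroids no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 0.5],
              [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]])
centroids, labels = kmeans(X, k=2)
print(labels)     # two groups: the three small points, the three large ones
print(centroids)  # roughly (1.17, 1.17) and (8.5, 8.5)
```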
K-means clustering algorithm: example
[Figure: four panels showing k-means iterations on a 2-D data set with both axes running from 0 to 10: initial centroids, first assignment, centroid update, and the final clustering after convergence.]
Weaknesses of k-means
• The algorithm is only applicable if the mean is defined.
  - For categorical data, use k-modes: the centroid is represented by the most frequent values.
• The user needs to specify k.
• The algorithm is sensitive to outliers.
  - Outliers are data points that are very far away from other data points.
  - Outliers could be errors in the data recording or some special data points with very different values.
Weaknesses of k-means: problems with outliers
[Figure: clustering results distorted by outliers.]
Weaknesses of k-means
• The k-means algorithm is not suitable for discovering clusters that are not hyper-ellipsoids (or hyper-spheres).
Some Comments
• Despite its weaknesses, k-means is still the most popular algorithm, due to its simplicity and efficiency; other clustering algorithms have their own lists of weaknesses.
• There is no clear evidence that any other clustering algorithm performs better in general, although some may be more suitable for specific types of data or applications.
• Clustering methods are descriptive techniques, not interpretative, let alone predictive.
  "It is a long way from clustering genes to finding their functional roles and, moreover, to understanding the underlying biological process."
