
Week 9

KQC7015 Test
Date : 22 December 2024
Time : 1.00 – 2.00 pm (1 hour)
Venue : Block Y, Department of Electrical Engineering
What is C?

Like most statistical models, regression seeks to minimize a cost function.

So let's first start by thinking about what a cost function is.

A cost function tries to measure how wrong you are. So if my prediction was right, then there should be no cost; if I am just a tiny bit wrong, there should be a small cost.
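As a concrete illustration (not part of the original slides), here is a minimal Python sketch of one common cost function, the mean squared error: a perfect prediction incurs zero cost, and the cost grows as the predictions move further from the targets.

```python
import numpy as np

def mse_cost(y_true, y_pred):
    """Mean squared error: zero when predictions are exactly right,
    small for small errors, large for large errors."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# The cost grows as the predictions move away from the targets.
print(mse_cost([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0   (no cost when right)
print(mse_cost([1.0, 2.0, 3.0], [1.1, 2.1, 3.1]))  # 0.01  (tiny cost for a tiny error)
print(mse_cost([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # much larger cost for larger errors
```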
Tuning parameters for regression

Lambda (λ) controls the trade-off between allowing the model to increase its complexity as much as it wants and trying to keep it simple.

For example, if λ is very low or 0, the model will have enough power to increase its complexity (overfit) by assigning big values to the weights for each parameter. If we increase the value of λ, the model will tend to underfit, as the model will become too simple.

Parameter C = 1/λ works the other way around. For small values of C, we increase the regularization strength, which will create simple models which underfit the data. For big values of C, we reduce the power of regularization, which implies that the model is allowed to increase its complexity and therefore overfit the data.
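As a hedged illustration of the C parameter described above, the following sketch fits scikit-learn's LogisticRegression (which exposes C = 1/λ) with a small, a medium and a large C; the synthetic dataset and the particular values of C are assumptions for demonstration, not taken from the slides.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (purely illustrative).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for C in (0.01, 1.0, 100.0):
    # Small C -> strong regularization (simpler model, risk of underfitting).
    # Large C -> weak regularization (more complex model, risk of overfitting).
    clf = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
    print(f"C={C}: train={clf.score(X_train, y_train):.3f}, "
          f"test={clf.score(X_test, y_test):.3f}")
```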
Regularization helps us tune and control our model complexity, ensuring that our models are better at making (correct) classifications — or more simply, the ability to generalize.

If we don't apply regularization, our classifiers can easily become too complex and overfit to our training data, in which case we lose the ability to generalize to our testing data (and data points outside the testing set as well).
 Similarly, if we apply too much regularization we run the risk of underfitting.

 In this case, our model performs poorly on the training data — our classifier is not able to model the relationship between the input data and the output class labels.
Types of Logistic Regression

1. Binary Logistic Regression
• The categorical response has only two possible outcomes.
• Example: Spam or Not

2. Multinomial Logistic Regression
• Three or more categories without ordering.
• Example: Predicting which food is preferred more (very healthy, non-healthy, moderately healthy)

3. Ordinal Logistic Regression
• Three or more categories with ordering.
• Example: Movie rating from 1 to 5
EXAMPLES :

Spam Detection : Predicting if an email is Spam or not
Credit Card Fraud : Predicting if a given credit card transaction is fraud or not
Health : Predicting if a given mass of tissue is benign or malignant
Marketing : Predicting if a given user will buy an insurance product or not
Banking : Predicting if a customer will default on a loan.
Principal Component Analysis (PCA)

 A dimensionality-reduction method that transforms a large set of variables into a smaller one that still contains most of the information in the large set.
 Reducing the number of variables comes at the expense of accuracy; the idea is to trade a little accuracy for simplicity.
 Smaller data sets are easier to explore and visualize.
 Data are easier to analyze and faster for machine learning algorithms to process.

HOW DO YOU DO A PCA?

1. Standardize the range of continuous initial variables
2. Compute the covariance matrix to identify correlations
3. Compute the eigenvectors and eigenvalues of the covariance matrix
to identify the principal components
4. Create a feature vector to decide which principal components to keep
5. Recast the data along the principal components axes
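Before walking through each step, here is a minimal end-to-end sketch that performs the same pipeline with scikit-learn (the random data and the choice of two components are illustrative assumptions, not from the slides): standardize, then fit PCA and keep the first two components.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Illustrative data: 100 samples, 5 variables (random numbers for demo only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

X_std = StandardScaler().fit_transform(X)   # step 1: standardization
pca = PCA(n_components=2).fit(X_std)        # steps 2-4: covariance, eigendecomposition, component selection
X_reduced = pca.transform(X_std)            # step 5: recast the data along the PC axes

print(X_reduced.shape)                 # (100, 2)
print(pca.explained_variance_ratio_)   # share of variance carried by each kept PC
```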

1. Standardization
❖ Standardize the range of the continuous initial variables so that each contributes equally to the analysis.
❖ It is critical to perform standardization prior to PCA because PCA is quite sensitive to the variances of the initial variables.
❖ If there are large differences between the ranges of initial variables, those variables with
larger ranges will dominate over those with small ranges
❖ (For example, a variable that ranges between 0 and 100 will dominate over a variable that
ranges between 0 and 1), which will lead to biased results.
❖ Mathematically, this can be done by subtracting the mean and dividing by the standard
deviation for each value of each variable.

❖ Z = (value − mean) / standard deviation

❖ Once the standardization is done, all the variables will be transformed to the same scale.
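A minimal sketch of this step (the data here are hypothetical, nothing from the slides): each column is standardized by subtracting its mean and dividing by its standard deviation.

```python
import numpy as np

# Hypothetical data: 4 samples, 2 variables with very different ranges.
X = np.array([[10.0, 0.1],
              [20.0, 0.4],
              [30.0, 0.2],
              [40.0, 0.3]])

# Z = (value - mean) / standard deviation, applied column by column.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_std.mean(axis=0))  # ~0 for each variable
print(X_std.std(axis=0))   # 1 for each variable
```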
2. COVARIANCE MATRIX COMPUTATION

➢ To understand how the variables of the input data set are varying from the mean with respect to each other.
➢ Sometimes, variables are highly correlated (contain redundant information).
➢ To identify these correlations → compute the covariance matrix.

➢ The covariance matrix is a p × p symmetric matrix (where p is the number of dimensions).

➢ For example, for a 3-dimensional data set with 3 variables x, y, and z, the covariance matrix is a 3×3 matrix of this form:

   Cov(x,x)  Cov(x,y)  Cov(x,z)
   Cov(y,x)  Cov(y,y)  Cov(y,z)
   Cov(z,x)  Cov(z,y)  Cov(z,z)
Since the covariance of a variable with itself is its variance (Cov(a,a)=Var(a)), in the
main diagonal (Top left to bottom right) we actually have the variances of each initial
variable.
And since the covariance is commutative (Cov(a,b)=Cov(b,a)), the entries of the
covariance matrix are symmetric with respect to the main diagonal, which means that
the upper and the lower triangular portions are equal.

What do the covariances that we have as entries of the matrix tell us about the correlations between the variables?

It's actually the sign of the covariance that matters:

• if positive: the two variables increase or decrease together (correlated)
• if negative: one increases when the other decreases (inversely correlated)
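As an illustration of this step (the data are hypothetical), NumPy can compute the covariance matrix directly; note the variances on the main diagonal and the symmetry about it.

```python
import numpy as np

# Hypothetical standardized data: rows = samples, columns = variables x, y, z.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))

# np.cov treats rows as variables by default, so pass the transpose.
cov = np.cov(X.T)

print(cov.shape)                # (3, 3): a p x p matrix
print(np.allclose(cov, cov.T))  # True: symmetric about the main diagonal
print(np.diag(cov))             # the variances of x, y and z
```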

3. COMPUTE THE EIGENVECTORS AND EIGENVALUES OF THE COVARIANCE MATRIX TO IDENTIFY THE PRINCIPAL COMPONENTS

 Eigenvectors and eigenvalues → computed from the covariance matrix → principal components of the data.

 Principal components
▪ New variables: linear combinations or mixtures of the initial variables. They are uncorrelated, and most of the information within the initial variables is squeezed or compressed into the first components.
▪ So, the idea is that 10-dimensional data gives you 10 principal components, but PCA tries to put the maximum possible information in the first component, then the maximum remaining information in the second, and so on.
Reduce dimensionality without losing much information, by discarding the components with low information and considering the remaining components as your new variables.

Principal components are less interpretable and don't have any real meaning, since they are constructed as linear combinations of the initial variables.

PCs represent the directions of the data that explain a maximal amount of variance: the lines that capture most of the information in the data.

The larger the variance carried by a line, the larger the dispersion of the data points along it; and the larger the dispersion along a line, the more information it has.
 The first principal component accounts for the largest possible variance in the data set.

 For example, let's assume that the scatter plot of our data set is as shown; can we guess the first principal component?

 Yes, it's approximately the line that matches the purple marks, because it's the line in which the projection of the points (red dots) is the most spread out.

 Or, mathematically speaking, it's the line that maximizes the variance (the average of the squared distances from the projected points (red dots) to the origin).

 2nd PC → with the condition that it is uncorrelated with (i.e., perpendicular to) the 1st PC and that it accounts for the next highest variance.

 This continues until a total of p principal components have been calculated, equal to the original number of variables.

 Every eigenvector has an eigenvalue, and their number is equal to the number of dimensions of the data.

 For example, for a 3-dimensional data set, there are 3 variables, therefore there are 3 eigenvectors with 3 corresponding eigenvalues.

 The eigenvectors of the covariance matrix are actually the directions of the axes where there is the most variance (most information), and these are what we call the PCs.

 Eigenvalues are simply the coefficients attached to the eigenvectors, which give the amount of variance carried in each PC.

 By ranking your eigenvectors in order of their eigenvalues, highest to lowest, you get the PCs in order of significance.
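A small sketch of this step (using the same kind of hypothetical data as before, not from the slides): compute the eigendecomposition of the covariance matrix and sort the eigenvectors by decreasing eigenvalue.

```python
import numpy as np

# Illustrative data and its covariance matrix.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
cov = np.cov(X.T)

# eigh is appropriate for symmetric matrices such as a covariance matrix.
eigenvalues, eigenvectors = np.linalg.eigh(cov)

# Sort from highest to lowest eigenvalue: PCs in order of significance.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]    # column i is the i-th principal component

print(eigenvalues)                       # variance carried by each PC
print(eigenvalues / eigenvalues.sum())   # proportion of total variance per PC
```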

4: FEATURE VECTOR
i. Choose whether to keep all these components or discard those of lesser significance (those with low eigenvalues).

ii. Form, with the remaining ones, a matrix of vectors that we call the feature vector.

iii. The feature vector is simply a matrix that has as columns the eigenvectors of the components we decide to keep.

iv. This makes it the first step towards dimensionality reduction, because if we choose to keep only p eigenvectors (components) out of n, the final data set will have only p dimensions.
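A brief sketch of this step (recomputing the illustrative eigendecomposition so the snippet stands alone): slicing out the top-k eigenvector columns gives the feature vector.

```python
import numpy as np

# Illustrative data, covariance matrix and sorted eigenvectors (as in the previous sketch).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X.T))
eigenvectors = eigenvectors[:, np.argsort(eigenvalues)[::-1]]

# Keep only the top k components (k = 2 is an illustrative choice).
k = 2
feature_vector = eigenvectors[:, :k]   # columns = the k most significant eigenvectors
print(feature_vector.shape)            # (3, 2): original dimensions x kept components
```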
5. RECAST THE DATA ALONG THE PRINCIPAL COMPONENTS AXES

❖ In the previous steps, you just select the PCs and form the feature vector, but the input data set always remains in terms of the original axes.

❖ In this step, which is the last one, the aim is to use the feature vector formed using the eigenvectors of the covariance matrix to reorient the data from the original axes to the ones represented by the PCs.

❖ This can be done by multiplying the transpose of the original data set by the transpose of the feature vector.

 FinalDataSet = FeatureVector^T * StandardizedOriginalDataSet^T
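A self-contained sketch of this final step (all data and the choice of two components are illustrative assumptions): the standardized data are projected onto the kept principal component axes using the formula above.

```python
import numpy as np

# Illustrative data: 200 samples, 3 variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))

# Steps 1-4: standardize, covariance matrix, eigendecomposition, feature vector.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_std.T))
order = np.argsort(eigenvalues)[::-1]
feature_vector = eigenvectors[:, order[:2]]     # keep the 2 strongest PCs

# Step 5: FinalDataSet = FeatureVector^T * StandardizedOriginalDataSet^T
final_data = (feature_vector.T @ X_std.T).T     # transpose back to samples x components

print(final_data.shape)                         # (200, 2)
```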


2-D convolution
Convolution in higher dimensions
Pooling and invariance
Max pooling provides invariance to small shifts of the input tensor.
(Figure annotations: one value falls off after the shift; a new value is added after the shift.)
Stride and tiling
Max Pooling Layer

The max pooling layer helps reduce the spatial size of the convolved features and also helps reduce over-fitting by providing an abstracted representation of them.

It is a sample-based discretization process. It is similar to the convolution layer, but instead of taking a dot product between the input and the kernel, we take the max of the region of the input overlapped by the kernel.

Below is an example which shows a maxpool layer's operation with a kernel of size 2 and stride of 1.
(Figure: max pooling operation, step 2.)
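Since the original figure is not reproduced here, the following sketch (with made-up numbers) performs the same operation: a 2×2 max pool with stride 1 over a small input.

```python
import numpy as np

def max_pool2d(x, kernel_size=2, stride=1):
    """Slide a kernel_size x kernel_size window over x and keep the max of each region."""
    h, w = x.shape
    out_h = (h - kernel_size) // stride + 1
    out_w = (w - kernel_size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            region = x[i * stride:i * stride + kernel_size,
                       j * stride:j * stride + kernel_size]
            out[i, j] = region.max()
    return out

# Made-up 3x3 input; with kernel size 2 and stride 1 the output is 2x2.
x = np.array([[1, 3, 2],
              [4, 6, 5],
              [7, 8, 9]])
print(max_pool2d(x, kernel_size=2, stride=1))
# [[6. 6.]
#  [8. 9.]]
```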
Random forest method

 The random forest is a classification algorithm consisting of many decision trees.

 It uses feature randomness when building each individual tree to try to create an uncorrelated forest of trees whose prediction by committee is more accurate than that of any individual tree.

 A random forest is created by randomly splitting the data.

 Each decision tree is formed using feature selection indicators such as the information gain or gain ratio of each feature.

 Each tree is built on an independent sample of the data.

 For a classification problem, each tree casts a vote and the class with the highest number of votes is chosen.

 For regression, the average of all the trees' outputs is declared as the result.
Assumptions for Random Forest

 There should be some actual values in the feature variables of the dataset, so that the classifier can predict accurate results rather than guessed results.

 The predictions from each tree must have very low correlations.
Random Forest works in two phases:
1) first, create the random forest by combining N decision trees;
2) second, make predictions for each tree created in the first phase.

The working process can be explained in the steps below:

 Step-1: Select K random data points from the training set.
 Step-2: Build the decision trees associated with the selected data points (subsets).
 Step-3: Choose the number N of decision trees that you want to build.
 Step-4: Repeat Steps 1 & 2.
 Step-5: For new data points, find the predictions of each decision tree, and assign the new data points to the category that wins the majority vote.
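As a hedged illustration of this workflow (the synthetic dataset and parameter choices are assumptions for demonstration), scikit-learn's RandomForestClassifier bundles these steps: N trees (n_estimators), an independent bootstrap sample per tree, feature randomness at each split, and majority voting at prediction time.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data purely for illustration.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators = N trees; bootstrap=True draws an independent sample per tree;
# max_features controls the feature randomness used when splitting nodes.
forest = RandomForestClassifier(n_estimators=100, bootstrap=True,
                                max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)

# Prediction by committee: each tree votes and the majority class wins.
print(forest.predict(X_test[:5]))
print("test accuracy:", forest.score(X_test, y_test))
```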
Four sectors where Random Forest is mostly used:

Banking: the banking sector mostly uses this algorithm for the identification of loan risk.
Medicine: with the help of this algorithm, disease trends and the risks of the disease can be identified.
Land Use: we can identify areas of similar land use with this algorithm.
Marketing: marketing trends can be identified using this algorithm.
