Cheat Sheet

Binning and scaling are techniques used for data discretization and normalization. Binning partitions data into bins of equal width or frequency, while scaling transforms data values into a specific range like 0-1. Common scaling methods include z-score normalization and decimal scaling. Dissimilarity matrices contain measures of distance or dissimilarity between data points, like Euclidean distance, Manhattan distance, and Minkowski distance. Entropy is a measure of randomness in data, with higher entropy indicating more random data. Information gain is used to find the optimal split point in entropy-based discretization by maximizing the difference in entropy.


Binning:
1- Sort the data in ascending order.
2- Partition the data into bins:
• Equal-width (interval/distance) binning: Range = Max − Min; interval length L = Range / No. of bins; Bins: Bin1 = [min, min+L), …, BinMax = [max−L, max]. Bins may hold different numbers of samples.
• Equal-frequency (equal-depth) binning: each bin contains the same number (L) of samples.
3- Smooth each bin: by bin mean, bin median, or bin boundaries (based on the stated assumption).

Scaling: map u ∈ [u0, um] to v ∈ [v0, vm]
• 0-1 scaling: v = (u − u0) / (um − u0)
• Z-score normalization: v = (u − µ)/σ (equivalently z = (X − µ)/σ)
• Decimal scaling: v = u / 10^k, with k the smallest integer such that max(|v|) ≤ 1, i.e. v ∈ [−1, 1]
(A short Python sketch of binning and scaling follows the norms list below.)

Basic statistics: Median = mid-point of the sorted data; Range = Max(X) − Min(X).

Norms and distances (the entries of the dissimilarity matrix):
• L1 norm (p = 1), Manhattan (city-block) distance: d(x, y) = Σ_i |xi − yi|
• L2 norm (p = 2), Euclidean distance: d(x, y) = sqrt(Σ_i (xi − yi)²)
• Minkowski distance (Lp norm): d(x, y) = (Σ_i |xi − yi|^p)^(1/p)
• Dissimilarity matrix: the n × n symmetric matrix of pairwise distances d(xi, xj), with zeros on the diagonal.
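A minimal NumPy sketch (not from the sheet) of equal-width/equal-frequency binning and the three scaling methods; the data values and bin count are made up for illustration:

```python
# Illustrative sketch: equal-width / equal-frequency binning and the three scaling methods.
import numpy as np

x = np.array([4.0, 8.0, 9.0, 15.0, 21.0, 21.0, 24.0, 25.0, 26.0, 28.0, 29.0, 34.0])

# --- Equal-width binning ---
n_bins = 3
L = (x.max() - x.min()) / n_bins                 # interval length = Range / No. of bins
edges = x.min() + L * np.arange(n_bins + 1)
width_bin = np.digitize(x, edges[1:-1])          # bin index (0..n_bins-1) per value

# --- Equal-frequency (equal-depth) binning ---
depth = len(x) // n_bins                         # samples per bin (assumes len(x) divisible)
freq_bin = np.repeat(np.arange(n_bins), depth)   # data already sorted ascending

# --- Smoothing by bin means (equal-width bins) ---
smoothed = np.array([x[width_bin == b].mean() for b in width_bin])

# --- Scaling ---
v01  = (x - x.min()) / (x.max() - x.min())       # 0-1 scaling
vz   = (x - x.mean()) / x.std(ddof=1)            # z-score normalization (sample sigma)
k    = int(np.ceil(np.log10(np.abs(x).max())))   # smallest k with max(|v|) <= 1
vdec = x / 10**k                                 # decimal scaling
```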
Vector norms:
• Euclidean norm (length): ‖x‖ = sqrt(Σ_i xi²)
• Normalization to unit length (‖x‖ = 1): x~ = x / ‖x‖

Quartiles:
• Q2 = median
• Q1, Q3 = median of the left/right half relative to Q2:
o If the total count is odd, take the middle value as Q2, then split into left and right halves (excluding the Q2 value).
o If the total count is even, take the average of the middle two values as Q2, then split into left and right halves (including those two values).
• Python way: position of Qi = i(n+1)/4, then read off the actual value at that position.
o If the position is a fraction (e.g. i.5), take the average of the values at positions i and i+1.

Sample variance: s² = Σ_i (xi − x̄)² / (n − 1); standard deviation: s = sqrt(s²) (use (n − 1) for samples, (n) for populations).
Empirical cumulative distribution function (ECDF): F̂(x) = (1/n) Σ_i I(xi ≤ x), where I(·) is the binary indicator function.

Entropy: the more random the data, the higher the information, the higher the entropy, and the lower the probability.
H(S) = –p log2 p – q log2 q, where p, q are the probabilities of each class.

Entropy-based discretization: find the best split τ that maximizes the information gain (see the sketch below).
• Select candidate τ values (mid-points between consecutive values), then split: S1: values ≤ τ, S2: values > τ.
• Info of τ: H(S1, S2) = (|S1|/|S|) H(S1) + (|S2|/|S|) H(S2)
• Info gain: G(τ, S) = H(S) − H(S1, S2); choose the τ with the maximum info gain.
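A minimal sketch (not from the sheet) of entropy-based discretization: it scans candidate mid-points τ and returns the one with maximum information gain. The toy values and labels are invented for illustration:

```python
# Illustrative sketch: entropy-based discretization for one numeric attribute, binary classes.
import numpy as np

def entropy(labels):
    """H(S) = -p log2 p - q log2 q for the class proportions in `labels`."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def best_split(values, labels):
    """Return the mid-point tau with maximum info gain G(tau, S) = H(S) - H(S1, S2)."""
    order = np.argsort(values)
    values, labels = values[order], labels[order]
    H_S = entropy(labels)
    best_tau, best_gain = None, -np.inf
    for tau in (values[:-1] + values[1:]) / 2:          # candidate mid-points
        S1, S2 = labels[values <= tau], labels[values > tau]
        H_split = (len(S1) * entropy(S1) + len(S2) * entropy(S2)) / len(labels)
        gain = H_S - H_split
        if gain > best_gain:
            best_tau, best_gain = tau, gain
    return best_tau, best_gain

# Toy example: the attribute separates the classes around 10.
vals = np.array([1.0, 2.0, 3.0, 9.0, 11.0, 12.0, 15.0, 20.0])
labs = np.array([0,   0,   0,   0,   1,    1,    1,    1  ])
print(best_split(vals, labs))   # tau = 10.0, gain = 1.0
```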
Vectors:
• If xᵀy = 0 → x and y are orthogonal (normal) vectors.
• Kronecker delta: δij = 1 if i = j, 0 otherwise.
• Projection of y on x: p = (xᵀy / xᵀx) x = (uxᵀy) ux, where ux is the unit vector of x, ux = x/‖x‖.
• Linear independence: x and y are linearly dependent if y = αx for some α ≠ 0, otherwise they are linearly independent. Orthogonal vectors are linearly independent (but not vice versa).

Probability basics:
• Empirical probability: Pr(X < 5) = (number of samples satisfying the condition) / (total number of samples).
• Bivariate joint distribution, empirical joint probability mass function: f̂(x, y) = (number of samples with X = x and Y = y) / n.
• Statistical independence condition: P(X, Y) = P(X) P(Y); hence the joint cdf factorizes, F(x, y) = F(x) F(y), and the joint pdf factorizes, f(x, y) = f(x) f(y).
• Mean of a multivariate vector: µ = E[X] = (E[X1], …, E[Xd])ᵀ; sample mean x̄ = (1/n) Σ_i xi.
• Pearson correlation coefficient: ρ(X1, X2) = σ12 / (σ1 σ2) = Cov(X1, X2) / (σ1 σ2) (see the sketch below).
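A short sketch (not from the sheet) computing the sample mean, sample covariance and Pearson correlation coefficient for a made-up bivariate sample:

```python
# Illustrative sketch: sample mean, covariance and Pearson correlation for two attributes.
import numpy as np

x1 = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
x2 = np.array([1.0, 3.0, 2.0, 5.0,  4.0])

mean = np.array([x1.mean(), x2.mean()])                  # mean of the multivariate vector
cov12 = ((x1 - x1.mean()) * (x2 - x2.mean())).sum() / (len(x1) - 1)   # sample covariance
rho = cov12 / (x1.std(ddof=1) * x2.std(ddof=1))          # Pearson correlation coefficient

print(mean, cov12, rho)
print(np.corrcoef(x1, x2)[0, 1])                         # same value via NumPy
```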
Probability distribution functions (discrete):
o Probability mass function (pmf): the set of probabilities P(X = xi); since it is a set of probabilities, the sum = 1.
o Cumulative distribution function (cdf): F(x) = P(X ≤ x) = Σ_{xi ≤ x} P(X = xi).

Probability distribution functions (continuous):
o Cumulative distribution function (cdf): F(x) = P(X ≤ x).
o Probability density function (pdf): f(x) = dF(x)/dx. To obtain probabilities, integrate: P(a ≤ X ≤ b) = ∫ from a to b of f(x) dx.
o Ratio: if p(x1) > p(x2), then the probability that X falls close to x1 is higher than the probability that it falls close to x2.

Covariance: σ12 = Cov(X1, X2) = E[(X1 − µ1)(X2 − µ2)] = E[X1 X2] − µ1 µ2.
Covariance matrix: Σ = E[(X − µ)(X − µ)ᵀ], with the variances σi² on the diagonal and the covariances σij off the diagonal.
Total variance: the sum of the variances of X1, X2, … = trace(Σ).

Linear transformation: given X with mean µX and covariance ΣX, the transformed variable Y = A X has µY = A µX and ΣY = A ΣX Aᵀ.

Calculating eigenvalues and eigenvectors (2 × 2 matrix A = [[a, b], [c, d]]):
• Eigenvalues: with trace T = a + d and determinant D = ad − bc, λ1,2 = T/2 ± sqrt(T²/4 − D).
• Eigenvectors: if c ≠ 0 → [λ − d, c]ᵀ; else if b ≠ 0 → [b, λ − a]ᵀ; if b = 0 and c = 0 → [1, 0]ᵀ and [0, 1]ᵀ.
• Then normalize using the Euclidean norm: divide each eigenvector [a, b]ᵀ by sqrt(a² + b²).
• Verify by A v = λ v.
(See the worked sketch below.)

Matrix inverse (2 × 2): A⁻¹ = 1/(ad − bc) · [[d, −b], [−c, a]].
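A worked sketch (not from the sheet) of the 2 × 2 eigenvalue/eigenvector shortcut above, using an arbitrary symmetric matrix A:

```python
# Illustrative sketch: 2x2 eigenvalues via trace T and determinant D, verified by A v = lambda v.
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
a, b = A[0]
c, d = A[1]

T = a + d                      # trace
D = a * d - b * c              # determinant
lams = [T / 2 + np.sqrt(T**2 / 4 - D), T / 2 - np.sqrt(T**2 / 4 - D)]

for lam in lams:
    if c != 0:
        v = np.array([lam - d, c])
    elif b != 0:
        v = np.array([b, lam - a])
    else:                      # b = 0 and c = 0: standard basis vectors
        v = np.array([1.0, 0.0])
    v = v / np.linalg.norm(v)  # normalize by the Euclidean norm
    print(lam, v, np.allclose(A @ v, lam * v))   # verify A v = lambda v
```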
Normal distributions (Gaussians):
• X is normally distributed if its pdf is Gaussian: f(x) = (1/(σ sqrt(2π))) exp(−(x − µ)²/(2σ²)), i.e. X ~ N(µ, σ²) means X has a normal distribution.
• Standard normal distribution: Z = (X − µ)/σ ~ N(0, 1); a z-score of 0 is the mean.
• Joint pdf of a multivariate normal RV: f(x) = (1/((2π)^(d/2) |Σ|^(1/2))) exp(−½ (x − µ)ᵀ Σ⁻¹ (x − µ)).
• A linear transformation of a Gaussian is again Gaussian.
• Sums: if X = X1 + X2 + …, then µX = µX1 + µX2 + …
• Mahalanobis distance: the distance of x from the mean, normalized by the covariance: (x − µ)ᵀ Σ⁻¹ (x − µ) (squared form). If the covariance matrix is the identity matrix, it reduces to the squared Euclidean distance. (See the sketch below.)

Univariate categorical attribute (Bernoulli variable):
• X ∈ {0, 1}; pmf: P(X = 1) = p, P(X = 0) = 1 − p, with p: probability of success.
• Mean: E[X] = p; Variance: var(X) = p(1 − p).
• Sample mean: p̂ = n1/n; sample variance: p̂(1 − p̂), where n1: count of xi = 1, n0: count of xi = 0, n: total.
• Number of ways (possible combinations) of K successes in n trials: C(n, K) = n! / (K! (n − K)!), with K: success count.

Multivariate Bernoulli (categorical attribute with several symbols, one-hot encoded):
• Sample mean: p̂i = ni/n for each symbol.
• Covariance between Xi and Xj (i ≠ j): σij = E[Xi Xj] − pi pj = −pi pj, where E[Xi Xj] = 0 (the symbols never overlap).
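A short sketch (not from the sheet) of the squared Mahalanobis distance of a point from a sample mean; the data points are invented:

```python
# Illustrative sketch: squared Mahalanobis distance vs. squared Euclidean distance.
import numpy as np

X = np.array([[2.0, 1.0], [3.0, 2.0], [4.0, 4.0], [5.0, 3.0], [6.0, 5.0]])
mu = X.mean(axis=0)
Sigma = np.cov(X, rowvar=False)          # sample covariance matrix ((n-1) denominator)

x = np.array([5.0, 1.0])
d_maha2 = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)   # (x - mu)^T Sigma^-1 (x - mu)
d_eucl2 = (x - mu) @ (x - mu)                          # what Sigma = I would give

print(d_maha2, d_eucl2)
```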
PCA (data X with n points and m attributes):
• Mean: µ = (1/n) Σ_i xi. Center the data: zi = xi − µ.
• Covariance matrix: Σ = (1/n) Σ_i (xi − µ)(xi − µ)ᵀ.
• Eigenvectors: Σ ui = λi ui. The eigenvector of the largest eigenvalue → 1st PC, the 2nd largest → 2nd PC; the eigenvector of the smallest eigenvalue → 1st minor component.
• Total variance along u: uᵀ Σ u.
• Projection of x onto u1 (transforming x onto u gives y): y = u1ᵀ x.
• Projection of x onto the subspace spanned by ui (i = 1, …, r), i.e. approximating x → x~: x~ = Σ_{i=1}^{r} (uiᵀ x) ui = P x, where P = Σ_{i=1}^{r} ui uiᵀ is the orthogonal projection matrix (m × m).
• Error vector: ε = x − x~.
• Xr → the projection expressed in the same feature space (the reconstruction x~); Yr → the projection expressed in a different (reduced, r-dimensional) feature space (the PC coordinates).
• Mean square error = smallest eigenvalue / m.
(See the sketch below.)

Similarity between binary vectors (m symbols):
• Squared norm of each point: ‖x‖² = number of 1's in x.
• Number of matching symbols of 1's: s = xᵀy.
• Euclidean distance: d(x, y) = sqrt(‖x‖² + ‖y‖² − 2 xᵀy).
• Hamming distance: m − s.
• Cosine similarity: cos θ = xᵀy / (‖x‖ ‖y‖).
• Jaccard similarity coefficient: J(x, y) = xᵀy / (‖x‖² + ‖y‖² − xᵀy).

Kernel functions:
• A kernel function K(x, y) = φ(x)ᵀφ(y) is a similarity measure in feature space; the kernel matrix is symmetric.
• The sum of any 2 kernel functions is a kernel function: K = K1 + K2.
• Polynomial kernel: K(x, y) = (c + xᵀy)^p.
• Quadratic kernel (p = 2), for m = 2 and c = 1: K(x, y) = (1 + xᵀy)², with feature map φ(x) = (1, √2 x1, √2 x2, x1², x2², √2 x1x2)ᵀ — in this case a six-dimensional vector.
• Gaussian (RBF) kernel: K(x, y) = exp(−‖x − y‖² / (2σ²)), where ‖x − y‖² is the squared Euclidean distance.
• Mean in feature space: µφ = (1/n) Σ_i φ(xi); norm of the mean: ‖µφ‖² = (1/n²) Σ_i Σ_j K(xi, xj).
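A minimal PCA sketch (not from the sheet) following the steps above (center, covariance, eigenvectors, project, reconstruct); the 1/n covariance convention and the toy data are assumptions for illustration:

```python
# Illustrative sketch: PCA via the covariance matrix, keeping the top r principal components.
import numpy as np

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
              [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1]])

mu = X.mean(axis=0)
Z = X - mu                                  # center the data
Sigma = Z.T @ Z / len(X)                    # covariance matrix (1/n convention)

lam, U = np.linalg.eigh(Sigma)              # eigenvalues in ascending order
order = np.argsort(lam)[::-1]
lam, U = lam[order], U[:, order]            # 1st column = 1st principal component

r = 1
Ur = U[:, :r]
Y = Z @ Ur                                  # coordinates in the reduced space (Yr)
X_hat = Y @ Ur.T + mu                       # reconstruction in the original space (Xr)

frac = lam[:r].sum() / lam.sum()            # fraction of total variance captured
print(frac, np.mean(np.sum((X - X_hat) ** 2, axis=1)))   # captured variance, MSE
```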


Kernel PCA:
• Centering in feature space: φ̄(xi) = φ(xi) − µφ; the centered kernel matrix is K̄ = (I − (1/n) 1 1ᵀ) K (I − (1/n) 1 1ᵀ), where K is the (uncentered) kernel matrix.
• Norm of a point in feature space: ‖φ(x)‖² = K(x, x).
• Distance in feature space: ‖φ(xi) − φ(xj)‖² = K(xi, xi) + K(xj, xj) − 2 K(xi, xj).
• When calculating the quadratic kernel with c = 1: K = (1 1ᵀ + X Xᵀ) ◦ (1 1ᵀ + X Xᵀ), where ◦ denotes the Hadamard (entry-wise) product.
• The ith PC in feature space is a linear combination of the mapped points, ui = Σ_j cij φ(xj), where ci is the ith eigenvector of K̄; u1 is the principal eigenvector of Σφ, the covariance matrix in feature space.
• Projection onto the ith PC (Kj is the jth column of K̄): aij = ciᵀ Kj.

Kernel PCA algorithm:
1- Compute the kernel matrix: K = [Kij] → Kij = K(xi, xj).
2- Center the matrix: K̄ as above.
3- Eigen-decompose K̄: K̄ ci = ηi ci; scale each ci so that ‖ci‖² = 1/ηi (the variance along the ith kernel PC is λi = ηi/n).
4- Fraction of total variance: choose r such that (Σ_{i=1}^{r} λi) / (Σ_i λi) ≥ the desired threshold.
5- Reduce dimensionality (r < n): the reduced representation of xj is (c1ᵀKj, …, crᵀKj).
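A compact kernel PCA sketch (not from the sheet) using the quadratic kernel with c = 1; the random data and the choice r = 2 are assumptions for illustration:

```python
# Illustrative sketch: kernel PCA with a quadratic kernel (c = 1, p = 2).
import numpy as np

X = np.random.default_rng(0).normal(size=(10, 2))
n = len(X)

K = (1.0 + X @ X.T) ** 2                    # quadratic kernel matrix, c = 1
J = np.eye(n) - np.ones((n, n)) / n
Kc = J @ K @ J                              # centered kernel matrix

eta, C = np.linalg.eigh(Kc)                 # eigenvalues in ascending order
order = np.argsort(eta)[::-1]
eta, C = eta[order], C[:, order]

r = 2
C_r = C[:, :r] / np.sqrt(eta[:r])           # scale ci so that ||ci||^2 = 1/eta_i
A = Kc @ C_r                                # kernel PC coordinates of each point

lam = eta / n                               # variance along each kernel PC
print(lam[:r].sum() / lam[lam > 1e-12].sum())   # fraction of total variance kept
```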


SVD:
• Factorize the matrix X: X = U Δ Vᵀ, with dimensions m × n = (m × m)(m × n)(n × n).
• U: left singular vectors; V: right singular vectors; Δ: diag(singular values); the number of non-zero singular values = rank of the matrix X.
• Linear expansion of X using SVD: X = Σ_i δi ui viᵀ (a sum of rank-one matrices).
• X can be approximated by keeping the best r singular values (δi) in descending order: X ≈ Σ_{i=1}^{r} δi ui viᵀ.
• Ur can be used to project X onto the subspace: Xr = Urᵀ X.

LDA (Fisher's LDA): finding the w vector
• Find the vector w to project x on (yi = wᵀxi) that maximizes the separation between classes ω1 and ω2.
• Between-class scatter: B = (µ1 − µ2)(µ1 − µ2)ᵀ; m1, m2 = projected means of classes ω1, ω2. The distance between the means m1, m2 defines the class separability.
• Within-class scatter: the variances within each class. For 2 classes only: S = S1 + S2, where s1², s2² are the sums of squared deviations from the projected means (the total squared deviation from the means).
• The optimum LD vector is the dominant eigenvector of S⁻¹B → the eigenvector with the largest |λi|. For 2 classes the LDA vector is w = S⁻¹(µ1 − µ2), then normalize: w ← w/‖w‖.
• Then project the data on the selected eigenvector.
(See the sketch below.)

LDA Algorithm (worked example with two 2 × 4 classes):
1- Join X1 and X2 (2 × 4 becomes 2 × 8).
2- Apply the kernel function → K = 8 × 8.
3- Means: m1 = mean of the X1 portion of each K row, m2 = similar for X2, m = mean(m1, m2).
4- Between-class scatter B: apply the between-class scatter formula → 8 × 8.
5- Within-class scatter S (8 × 8): apply the within-class scatter formula.
6- Compute the dominant eigenvector of S⁻¹B.
7- Project the data on the selected (dominant) eigenvector.

Multi-class LDA:
• Given the global mean x̄: between-class scatter B = Σ_k nk (µk − x̄)(µk − x̄)ᵀ.
• S: within-class scatter matrix (m × m, symmetric): S = Σ_k Σ_{xi ∈ ωk} (xi − µk)(xi − µk)ᵀ.

LDA Classifier: project x on the LDA vector, then find the minimum distance to the projected class means.

Discriminant functions g(x):
• Multiple classes: assign x to the class with the maximum gi(x).
• Binary: g(x) = g1(x) − g2(x); if + → x belongs to ω1, if − → x belongs to ω2.
• Minimum Distance Classifier / Nearest Centroid Classifier: gi(x) = −‖x − µi‖² (squared Euclidean distance); we take the max g(x), i.e. the minimum distance.
• Minimum Mahalanobis Distance Classifier: find the squared Mahalanobis distance between x and µi of class ωi, (x − µi)ᵀ Σ⁻¹ (x − µi), and take the minimum.
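A minimal sketch (not from the sheet) of Fisher's LDA for two classes, w = S⁻¹(µ1 − µ2), with a nearest-projected-mean decision; the class data are invented:

```python
# Illustrative sketch: two-class Fisher LDA and a nearest-projected-mean classifier.
import numpy as np

X1 = np.array([[4.0, 2.0], [2.0, 4.0], [2.0, 3.0], [3.0, 6.0], [4.0, 4.0]])
X2 = np.array([[9.0, 10.0], [6.0, 8.0], [9.0, 5.0], [8.0, 7.0], [10.0, 8.0]])

mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
S1 = (X1 - mu1).T @ (X1 - mu1)              # within-class scatter of class 1
S2 = (X2 - mu2).T @ (X2 - mu2)              # within-class scatter of class 2
S = S1 + S2                                 # total within-class scatter

w = np.linalg.inv(S) @ (mu1 - mu2)          # LDA vector for 2 classes
w = w / np.linalg.norm(w)                   # normalize

def classify(x):
    """Project on w and pick the class whose projected mean is closer."""
    y, m1, m2 = w @ x, w @ mu1, w @ mu2
    return 1 if abs(y - m1) < abs(y - m2) else 2

print(w, classify(np.array([5.0, 5.0])))
```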
K-Nearest Neighbor (KNN) Classifier: x belongs to the majority class of its closest K neighbors.

Distance-Weighted KNN (see the sketch after this section):
• Rank the K neighbors and use the rank as the weight: furthest = 1, nearest = K.
• gi(x) = Sum(weights of neighbors in class ωi) / Sum(all weights); take the class with the maximum gi(x).

Bayes Rule: P(ωi | x) = p(x | ωi) P(ωi) / p(x).
• Prior probabilities: πi = P(ωi).
• Log likelihood: ln p(x | ωi).
• Posteriori rule (multiclass): assign x to the class with the maximum posterior, i.e. maximize gi(x) = ln p(x | ωi) + ln P(ωi).
• For binary: decide ω1 if g1(x) > g2(x), or equivalently if the likelihood ratio p(x | ω1)/p(x | ω2) > P(ω2)/P(ω1).

Quadratic Discriminant Analysis (QDA):
• gi(x) = −½ (x − µi)ᵀ Σi⁻¹ (x − µi) − ½ ln|Σi| + ln P(ωi).
• The LDA classifier is a special case: Σ1 = Σ2 = S/n (the quadratic term Q = 0).
• If Σ1 = Σ2 = σ²I, the minimum Mahalanobis distance classifier reduces to the minimum (nearest) centroid classifier.

Optimal classifiers for normal patterns:
• Maximum likelihood estimates: µ̂i = (1/ni) Σ_{x ∈ ωi} x, Σ̂i = (1/ni) Σ_{x ∈ ωi} (x − µ̂i)(x − µ̂i)ᵀ, P̂(ωi) = ni/n.
• If Σi = Σ for all i (i = 1, …, k) → linear discriminant function: gi(x) = µiᵀ Σ⁻¹ x − ½ µiᵀ Σ⁻¹ µi + ln P(ωi).
• If Σi = σ²I and P(ωi) = π for all i → minimum Euclidean distance classifier.
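A short sketch (not from the sheet) of distance-weighted KNN with the rank weights described above (furthest = 1, nearest = K); the toy data are invented:

```python
# Illustrative sketch: distance-weighted KNN with rank weights.
import numpy as np

def weighted_knn(X, y, x, K=3):
    """Return the class with the largest normalized sum of rank weights among the K nearest points."""
    d = np.linalg.norm(X - x, axis=1)            # Euclidean distances to x
    nearest = np.argsort(d)[:K]                  # indices of the K nearest neighbors
    weights = np.arange(K, 0, -1)                # nearest gets K, ..., furthest gets 1
    scores = {}
    for idx, w in zip(nearest, weights):
        scores[y[idx]] = scores.get(y[idx], 0) + w
    total = sum(scores.values())
    g = {c: s / total for c, s in scores.items()}    # gi(x) = class weights / all weights
    return max(g, key=g.get)

X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.0], [6.0, 6.0], [7.0, 7.0], [6.5, 5.5]])
y = np.array([0, 0, 0, 1, 1, 1])
print(weighted_knn(X, y, np.array([2.5, 2.0]), K=3))   # -> 0
```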
Binary classification of Gaussian patterns:
• If Σ1 = Σ2 (Gaussian) → the decision boundary is linear (linear classifier): g(x) = wᵀx + w0.
• If, in addition, P(ω1) = P(ω2) → minimum Mahalanobis distance classifier.
• If Σ1 = Σ2 = σ²I and P(ω1) = P(ω2) → minimum distance classifier.

Logistic Regression (maximum entropy / maxent):
• Logistic sigmoid function: σ(a) = 1 / (1 + e^(−a)).
• Given a training set {(xi, yi)}, yi ∈ {0, 1}; sigmoid output: ŷi = σ(wᵀxi).
• Weight vector w: found by minimizing the cross-entropy error E(w) = −Σ_i [yi ln ŷi + (1 − yi) ln(1 − ŷi)].
• Gradient of E(w): ∇E(w) = Σ_i (ŷi − yi) xi.
• More robust to outliers than least squares.
(See the sketch below.)

Softmax (multiclass logistic regression):
• The posterior is modeled with the softmax function: P(ωk | x) = exp(wkᵀx) / Σ_j exp(wjᵀx).
• Cross-entropy error function for target output yi = [yi1, yi2, …, yik]: E(W) = −Σ_i Σ_k yik ln ŷik.
• Gradient with respect to wi: ∇wi E = Σ_n (ŷni − yni) xn.

Naïve Bayes Classifier: assume the attributes are conditionally independent, p(x | ωi) = Π_j p(xj | ωi); for numeric attributes use the normal pdf p(xj | ωi) = N(xj; µij, σij²).

Decision trees:
1) Extract the class-specific data subsets.
2)–3) Find the thresholds that maximize the information (gain) for the multiple classes (entropy-based splits).
4) Classify x by following the thresholds down the tree.
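A minimal sketch (not from the sheet) of binary logistic regression trained by gradient descent on the cross-entropy error; the synthetic data, learning rate and iteration count are arbitrary choices:

```python
# Illustrative sketch: binary logistic regression via gradient descent.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(3, 1, (20, 2))])
y = np.array([0] * 20 + [1] * 20)
Xb = np.hstack([X, np.ones((len(X), 1))])        # append a bias term

w = np.zeros(Xb.shape[1])
lr = 0.1
for _ in range(500):
    y_hat = sigmoid(Xb @ w)                      # sigmoid output
    grad = Xb.T @ (y_hat - y)                    # gradient of the cross-entropy error
    w -= lr * grad / len(y)                      # gradient descent step

pred = (sigmoid(Xb @ w) >= 0.5).astype(int)
print(w, (pred == y).mean())                     # learned weights, training accuracy
```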
Distances (n: number of points, d: number of dimensions):
• Minkowski (Lp norm): d(x, y) = (Σ_i |xi − yi|^p)^(1/p)
• City block (Manhattan, sum of absolute differences): d(x, y) = Σ_i |xi − yi|
• Euclidean distance (L2 norm): d(x, y) = sqrt(Σ_i (xi − yi)²)
• Chebychev distance (L∞ norm): d(x, y) = max_i |xi − yi|
• Canberra distance: d(x, y) = Σ_i |xi − yi| / (|xi| + |yi|)
• Quadratic distance: d²(x, y) = (x − y)ᵀ Q (x − y), for a weight matrix Q
• Cosine similarity: cos θ = xᵀy / (‖x‖ ‖y‖); cosine distance = 1 − cos θ
• Pearson correlation coefficient distance: d(x, y) = 1 − ρ(x, y)
(See the sketch below.)

Hierarchical clustering linkage methods (distance between the merged cluster {p, q} and a third cluster k):
• Single linkage: nearest neighbor method — the minimum pairwise distance.
• Complete linkage: furthest neighbor method — the maximum pairwise distance.
• Group average: unweighted (average over all pairs) and weighted.
• Weighted: d(k, {p, q}) = (d(k, p) + d(k, q)) / 2.
• Centroid: the distance between the cluster centroids.
• Median: like centroid, but the distances to the two merged clusters are given equal weights.
• Ward's method: merge the pair that minimizes the increase in the sum of squared errors (the scatter of the error).

Hierarchical clustering strategies:
• Agglomerative: bottom-up approach (the most used).
• Divisive: top-down (computationally intensive).
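A short sketch (not from the sheet) implementing the distance functions listed above for a pair of vectors:

```python
# Illustrative sketch: the distance functions above, for a pair of vectors.
import numpy as np

def minkowski(x, y, p):   return np.sum(np.abs(x - y) ** p) ** (1.0 / p)
def city_block(x, y):     return np.sum(np.abs(x - y))
def euclidean(x, y):      return np.sqrt(np.sum((x - y) ** 2))
def chebychev(x, y):      return np.max(np.abs(x - y))
def canberra(x, y):       return np.sum(np.abs(x - y) / (np.abs(x) + np.abs(y)))
def cosine_dist(x, y):    return 1 - (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
def pearson_dist(x, y):   return 1 - np.corrcoef(x, y)[0, 1]

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 5.0])
for f in (city_block, euclidean, chebychev, canberra, cosine_dist, pearson_dist):
    print(f.__name__, round(f(x, y), 4))
print("minkowski p=3", round(minkowski(x, y, 3), 4))
```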

Optimization algorithms (partitional clustering):

K-Means:
1- Assign two (in general K) initial means.
2- Form the clusters by assigning each point to its nearest mean.
3- Calculate the means of the new clusters → these become the new means.
4- Repeat from step 2 until the new means equal the old means.
(See the sketch below.)

Buckshot Algorithm:
1- Randomly select a subsample with N1 objects.
2- Apply group-average hierarchical clustering.
3- Use the result as the seeds for K-means.

Scatter matrices (µi is the mean of cluster Ci, µ is the global mean):
• Within-cluster scatter: SW = Σ_i Σ_{x ∈ Ci} (x − µi)(x − µi)ᵀ.
• Between-cluster scatter: SB = Σ_i ni (µi − µ)(µi − µ)ᵀ.
• Mixture scatter matrix: T = SW + SB.

Optimization criteria: minimize the within-cluster scatter and maximize the between-cluster scatter; the 1st–4th criteria (generalized formula) combine SW, SB and T in different ways.

Hierarchical clustering reminder: when merging points/clusters, recalculate all distances to the merged cluster using the specified linkage method.
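A minimal K-means sketch (not from the sheet) following the loop above; it has no empty-cluster handling and the toy data are invented:

```python
# Illustrative sketch: the K-means loop (assign, recompute means, stop when means are unchanged).
import numpy as np

def kmeans(X, K=2, seed=0):
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), K, replace=False)]       # initial means = K random points
    while True:
        d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = d.argmin(axis=1)                         # assign each point to its nearest mean
        # Recompute the means (note: no empty-cluster handling in this sketch).
        new_means = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        if np.allclose(new_means, means):                 # stop when the means stop moving
            return labels, new_means
        means = new_means

X = np.array([[1.0, 1.0], [1.5, 2.0], [1.0, 2.5],
              [8.0, 8.0], [8.5, 9.0], [9.0, 8.5]])
labels, means = kmeans(X, K=2)
print(labels, means)
```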
Associations: A → B
• Support: P(A, B), i.e. count(A && B) / total number of transactions.
• Confidence: P(B | A), i.e. count(A && B) / count(A).
• Minsup: minimum support; Minconf: minimum confidence.
• One operator returns the items (Xs) that are common to all of the given transactions; the other returns the transactions that contain at least one X.
• Support in this context is the count, e.g. {A:4, B:5}, {A,B}:4.

Apriori:
• F1 = {all 1-itemsets with support ≥ minsup}.
• C2 = all possible 2-itemset combinations from F1.
• F2 = {itemsets in C2 with support ≥ minsup}.
• C3 = join F2 (itemsets with the same 1st element and a different last element).
• F3 = prune C3 (downward closure): every 2-itemset combination of each C3 item must exist in F2; then add the surviving 3-itemsets to F3.
• Then, from F2 and F3 (i.e. all F of size ≥ 2), find all possible associations (2^m − 2 rules per m-itemset), calculate the confidence, and keep those with confidence > minconf.

Class Association Rule (CAR): X → y (X: itemset, y: class label); the same rules apply.
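A compact level-wise Apriori sketch (not from the sheet); the toy transactions, minsup (given as a count, as in the sheet) and minconf are invented for illustration:

```python
# Illustrative sketch: level-wise Apriori with support counts and rule confidence.
from itertools import combinations

transactions = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"A", "B", "C", "D"}, {"B", "C"}]
minsup, minconf = 2, 0.6                     # support given as a count

def support(itemset):
    return sum(itemset <= t for t in transactions)

# Level-wise frequent itemset generation (F1, F2, F3, ...).
items = sorted({i for t in transactions for i in t})
frequent = {1: {frozenset([i]) for i in items if support({i}) >= minsup}}
k = 2
while frequent[k - 1]:
    # Candidate generation: join frequent (k-1)-itemsets, then downward-closure pruning.
    cands = {a | b for a in frequent[k - 1] for b in frequent[k - 1] if len(a | b) == k}
    cands = {c for c in cands if all(frozenset(s) in frequent[k - 1] for s in combinations(c, k - 1))}
    frequent[k] = {c for c in cands if support(c) >= minsup}
    k += 1

# Rule generation from itemsets of size >= 2: confidence = support(A u B) / support(A).
for level in range(2, k):
    for itemset in frequent[level]:
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                conf = support(itemset) / support(lhs)
                if conf >= minconf:
                    print(set(lhs), "->", set(itemset - lhs), "conf", round(conf, 2))
```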
