PPB ML Notes
Phenil Buch
[email protected]
17 July, 2019
Contents I
1 Contents
2 Python
5 My Projects
Python Introduction I
Python Introduction II
Pandas I
Pandas II
Pandas III
Scikit-learn I
• Data Pre-processing
• Standardization - StandardScaler - Subtract the mean and divide by the standard deviation (a.k.a. z-score). Always standardize after generating polynomial features
• Normalization - Normalizer - Rescales each sample (row) to unit norm, so every component lies between -1 and 1
• Discretization and Binarization - KBinsDiscretizer and Binarizer - Turn continuous values into categorical ones
• Encoding Categorical Variables - LabelEncoder (intended for target labels; OneHotEncoder or OrdinalEncoder for input features)
• Imputing Missing Values - SimpleImputer with strategy="mean" (the older Imputer class was deprecated in scikit-learn 0.20)
• Generating Polynomial Features and Word Vectors - PolynomialFeatures, CountVectorizer, TfidfVectorizer
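The ordering advice above (generate polynomial features first, then standardize) can be sketched with a scikit-learn pipeline. This is a minimal illustration, not a recommended recipe: the toy array `X` and the choice of `degree=2` are assumptions made up for the example.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Toy single-feature data (assumed for illustration only).
X = np.array([[1.0], [2.0], [3.0], [4.0]])

# Step 1 generates the columns x and x^2; step 2 standardizes each column.
pipeline = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
)
X_scaled = pipeline.fit_transform(X)

# After scaling, every generated column has zero mean and unit standard deviation.
column_means = X_scaled.mean(axis=0)
column_stds = X_scaled.std(axis=0)
print(X_scaled.shape)
```

Scaling last matters because x^2 lives on a very different scale from x; standardizing before the expansion would leave the squared column unscaled.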
Scikit-learn II
• Supervised Learning Estimators - Naive Bayes Classifier, K-Nearest Neighbors, Support Vector Machines, Linear Regression
• Unsupervised Learning Estimators - Principal Component Analysis (PCA) and K-means, for unsupervised learning tasks like clustering, dimension reduction, embedding, and representing the data with a distribution (density estimation).
• Classification Metrics - Accuracy Score, Classification
Report (Precision, Recall, F1, Support), Confusion Matrix
• Regression Metrics - Mean Absolute Error (MAE), Mean Squared Error (MSE), R-squared (coefficient of determination)
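As a rough sketch of how the metrics listed above are computed with `sklearn.metrics`, the snippet below uses small hand-made label and prediction lists. All values are assumptions chosen so the results are easy to check by hand.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    mean_absolute_error,
    mean_squared_error,
    r2_score,
)

# Classification: made-up true labels and predictions.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

accuracy = accuracy_score(y_true, y_pred)   # fraction of correct predictions
cm = confusion_matrix(y_true, y_pred)       # rows: true class, columns: predicted class
report = classification_report(y_true, y_pred)  # precision, recall, F1, support
print(report)

# Regression: made-up targets and predictions.
target = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 8.0, 9.5]

mae = mean_absolute_error(target, predicted)
mse = mean_squared_error(target, predicted)
r2 = r2_score(target, predicted)
print(accuracy, mae, mse, r2)
```

MAE and MSE are errors (lower is better), while R-squared is a score (closer to 1 is better).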
Keras I
Keras II
Keras III
Keras IV
Keras V
TensorFlow I
TensorFlow II
• Step 1. Create a computational graph using Variables and Placeholders. Creating a computational graph means defining its nodes. TensorFlow provides different types of nodes for a variety of tasks. Each node takes zero or more tensors as inputs and produces a tensor as an output.
• TensorFlow has Variable nodes, which can hold variable data. They are mainly used to hold and update the parameters of a training model. A graph can be parameterized to accept external inputs, known as placeholders. A placeholder is a promise to provide a value later.
• Step 2. To run the computational graph, we need to create a session. We can invoke the run method of the session object to perform computations on any node.
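The two steps above (define the graph, then run it in a session) can be illustrated with a toy, pure-Python sketch. The classes `Node`, `Placeholder`, `Variable`, `Multiply`, `Add`, and `Session` below are hypothetical stand-ins written for this example; they are not TensorFlow's API. In TensorFlow 1.x the corresponding pieces are `tf.placeholder`, `tf.Variable`, and `tf.Session().run(..., feed_dict=...)`.

```python
class Node:
    """A graph node: takes zero or more input nodes and produces one value."""

    def __init__(self, *inputs):
        self.inputs = inputs


class Placeholder(Node):
    """A promise of a value that is supplied only when the graph is run."""


class Variable(Node):
    """Holds mutable state, such as a model parameter."""

    def __init__(self, value):
        super().__init__()
        self.value = value


class Multiply(Node):
    def compute(self, a, b):
        return a * b


class Add(Node):
    def compute(self, a, b):
        return a + b


class Session:
    """Evaluates a node by recursively evaluating the nodes it depends on."""

    def run(self, node, feed_dict=None):
        feed_dict = feed_dict or {}
        if isinstance(node, Placeholder):
            return feed_dict[node]
        if isinstance(node, Variable):
            return node.value
        args = [self.run(inp, feed_dict) for inp in node.inputs]
        return node.compute(*args)


# Step 1: define the graph y = w * x + b. Nothing is computed yet.
x = Placeholder()
w = Variable(2.0)
b = Variable(1.0)
y = Add(Multiply(w, x), b)

# Step 2: create a session and run the node, feeding the placeholder.
session = Session()
result = session.run(y, feed_dict={x: 3.0})
print(result)
```

The point of the sketch is the separation of concerns: building `y` only records structure, and values flow through the graph when `run` is called.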
Phenil Buch [email protected] Machine Learning Notes
Contents Python Mathematics for Deep Learning Deep Learning Concepts My Projects
TensorFlow III
PyTorch I
PyTorch II
OpenCV
• Face Detection and Recognition Algorithms
• Haar Cascades (detection)
• Eigenfaces (recognition)
• Fisherfaces (recognition)
• ML algorithms available - Normal Bayes Classifier, K-Nearest Neighbors, Support Vector Machines, Decision Trees, Boosting, Gradient Boosted Trees, Random Trees, Extremely Randomized Trees
• Image Filters - Averaging, Gaussian Filtering, Median Filtering, Bilateral Filtering
• Video Filters - Color Conversion, Thresholding, Smoothing, Morphology, Gradients, Canny Edge Detection, Contours, Histograms
Linear Algebra I
Linear Algebra II
• We usually measure the size of vectors using a function
called a norm.
• The most widely used kind of matrix decomposition is called eigen-decomposition, in which we decompose a matrix into a set of eigenvectors and eigenvalues. Eigen-decomposition is only defined for square matrices. Eigenvectors are vectors and eigenvalues are scalars.
• The singular value decomposition (SVD) provides another way to factorize a matrix, into singular vectors and singular values. Unlike eigen-decomposition, it is defined for every real matrix, square or not.
• The Moore-Penrose Pseudoinverse - Normally, matrix inversion is not defined for matrices that are not square. The pseudoinverse generalizes inversion to such matrices.
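The four ideas above (norms, eigen-decomposition, SVD, and the pseudoinverse) can be tried out with NumPy's `numpy.linalg` module. This is a minimal sketch; the vector and matrices are arbitrary values assumed for illustration.

```python
import numpy as np

# Norms measure the size of a vector.
v = np.array([3.0, 4.0])
l2_norm = np.linalg.norm(v)          # Euclidean (L2) norm
l1_norm = np.linalg.norm(v, ord=1)   # L1 norm: sum of absolute values

# Eigen-decomposition of a square, symmetric matrix.
# eigh returns eigenvalues in ascending order and eigenvectors as columns.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, eigenvectors = np.linalg.eigh(A)

# SVD works for non-square matrices too: B = U @ diag(s) @ Vt.
B = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
U, s, Vt = np.linalg.svd(B, full_matrices=False)
B_rebuilt = U @ np.diag(s) @ Vt

# Moore-Penrose pseudoinverse of the non-square matrix B.
B_pinv = np.linalg.pinv(B)

print(l2_norm, l1_norm, eigenvalues, B_pinv.shape)
```

Because `B` has linearly independent columns, `B_pinv @ B` recovers the 2x2 identity even though `B` itself has no ordinary inverse.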
Probability I
Probability II
Probability III
Probability IV
• A distribution over multiple random variables is called a Joint Probability Distribution. We can write a collection of random variables as a vector x. A joint distribution over x specifies the probability of any particular setting of all the random variables contained in x.
• A Conditional Probability Distribution gives the probability of an event given that another event has already been observed.
• The Chain Rule and Bayes' Rule give the relationship between conditional probability and joint probability: P(x, y) = P(x | y) P(y) and P(x | y) = P(y | x) P(x) / P(y)
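To make the chain rule and Bayes' rule concrete, the sketch below checks both identities on a small joint distribution. The weather events and their probabilities are invented for the example; exact fractions are used so the equalities hold without rounding.

```python
from fractions import Fraction

# Made-up joint distribution P(x, y) over x in {rain, dry}, y in {cloudy, clear}.
joint = {
    ("rain", "cloudy"): Fraction(3, 10),
    ("rain", "clear"): Fraction(1, 10),
    ("dry", "cloudy"): Fraction(2, 10),
    ("dry", "clear"): Fraction(4, 10),
}


def marginal_x(x):
    """P(x): sum the joint over all values of y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def marginal_y(y):
    """P(y): sum the joint over all values of x."""
    return sum(p for (_, yi), p in joint.items() if yi == y)


def x_given_y(x, y):
    """P(x given y) = P(x, y) / P(y)."""
    return joint[(x, y)] / marginal_y(y)


def y_given_x(y, x):
    """P(y given x) = P(x, y) / P(x)."""
    return joint[(x, y)] / marginal_x(x)


# Chain rule: P(x, y) = P(x given y) * P(y)
chain_rule = x_given_y("rain", "cloudy") * marginal_y("cloudy")

# Bayes' rule: P(x given y) = P(y given x) * P(x) / P(y)
bayes_rule = y_given_x("cloudy", "rain") * marginal_x("rain") / marginal_y("cloudy")

print(chain_rule, bayes_rule)
```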
Probability V
Probability VI
Probability VII
Probability VIII
Calculus
CNN I
CNN II
CNN III
• R-CNN works on regions of the image that are likely to contain objects. YOLO looks at the entire image at once and divides it into a grid.
• Object Detection Algorithms - Bounding Box Detection or Landmark Detection. Anchor Boxes, Intersection over Union (IoU), and Non-Max Suppression (removal of duplicate bounding boxes using IoU)
• Face Verification and Face Recognition - One-Shot Learning (learn a similarity function for verification with a limited dataset), Siamese Network (learns how to encode images so that the difference between two images can be quantified, much like word embeddings but for images), Triplet Loss (loss
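Intersection over Union and Non-Max Suppression, mentioned in the object detection bullet above, can be sketched in a few lines of plain Python. This is a simplified greedy version under assumptions: boxes are `(x1, y1, x2, y2)` corner tuples, and the box coordinates, scores, and the 0.5 threshold are made up for the example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first.

    A box is dropped when it overlaps an already kept box by more than
    iou_threshold, which removes duplicate detections of the same object.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two near-duplicate boxes and one separate box (illustrative values).
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
print(kept)
```

Here the second box overlaps the first with an IoU of 81/119 (about 0.68), so it is suppressed, while the third box does not overlap and is kept.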
CNN IV
RNN I
RNN II
RNN III
TimeSeries I
• A time series can be taken on any variable that changes
over time. In investing, it is common to use a time series
to track the price of a security over time.
• Time series analysis can be used to examine how the changes
associated with the chosen data point compare to shifts in
other variables over the same time period.
• Time intervals between data points can be either regular or irregular
• The data typically arrives in time order but may need to
be re-ordered properly as part of data cleaning
• Dependence: Dependence refers to the association between two observations of the same variable at different (prior) time points.
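Two of the points above, re-ordering data by time during cleaning and checking whether the intervals are regular, can be sketched with the standard library alone. The timestamps and prices below are invented for illustration.

```python
from datetime import datetime, timedelta

# (timestamp, price) observations that arrived out of order (made-up values).
observations = [
    (datetime(2019, 7, 3), 101.5),
    (datetime(2019, 7, 1), 100.0),
    (datetime(2019, 7, 2), 100.8),
    (datetime(2019, 7, 5), 102.1),
]

# Data cleaning step: put the observations back into time order.
series = sorted(observations, key=lambda obs: obs[0])
timestamps = [t for t, _ in series]

# Gaps between consecutive observations.
intervals = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]

# The series is regular only if every gap is the same length.
is_regular = len(set(intervals)) == 1
print(intervals, is_regular)
```

In this sample the gap between 3 July and 5 July is two days while the others are one day, so the series is irregular.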
TimeSeries II
TimeSeries III
• ARIMA stands for autoregressive integrated moving average. This method is also known as the Box-Jenkins method.
• Tools for investigating time-series data include:
• Consideration of the autocorrelation function and the spectral density function
• Cross-correlation functions and cross-spectral density functions
• Performing a Fourier transform to investigate the series in
the frequency domain
• Principal component analysis (or empirical orthogonal function analysis)
• Artificial neural networks
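Two of the tools listed above, the autocorrelation function and a Fourier transform into the frequency domain, can be sketched with NumPy. The helper `autocorrelation` is written for this example rather than taken from a library, and the test signal (a pure sine wave with a period of 8 samples) is an assumption chosen so the result is easy to interpret.

```python
import numpy as np


def autocorrelation(x, lag):
    """Sample autocorrelation of x at a non-negative integer lag."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    if lag == 0:
        return 1.0
    # Correlate the series with a copy of itself shifted by `lag` steps.
    return float(np.dot(centered[:-lag], centered[lag:]) / np.dot(centered, centered))


# Synthetic series: a sine wave that repeats every 8 samples.
n = 64
period = 8
t = np.arange(n)
signal = np.sin(2 * np.pi * t / period)

# Strong positive correlation one full period apart,
# strong negative correlation half a period apart.
acf_one_period = autocorrelation(signal, period)
acf_half_period = autocorrelation(signal, period // 2)

# Frequency domain: locate the strongest non-zero frequency component.
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(n, d=1.0)
peak_index = np.argmax(spectrum[1:]) + 1   # skip the zero-frequency bin
dominant_period = 1.0 / freqs[peak_index]

print(acf_one_period, acf_half_period, dominant_period)
```

Both views tell the same story about the series: the autocorrelation peaks at a lag equal to the period, and the spectrum peaks at the matching frequency.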
TimeSeries IV
Thank You