BMI 704 - Machine Learning Lab

This document provides an overview of machine learning topics including supervised learning, unsupervised learning, algorithms, and packages. Supervised learning involves predicting outcomes using labeled training data and evaluating models. Unsupervised learning explores patterns in unlabeled data through techniques like principal component analysis and clustering methods including k-means and hierarchical clustering. Popular algorithms and packages for implementing these methods in R are also discussed.


BMI 704 – Machine Learning Lab
030719
Topics
• Introduction to Supervised Learning
• Introduction to Unsupervised Learning

• Algorithms and Packages


Supervised Learning
• Outcome
• The labeled variable (Y) your model predicts
• Continuous or binary

• Features
• i.e. variables (Xs)
• The inputs you use to predict the outcome

• Model
• 1) Pick a person
• 2) Substitute their features into the model
• 3) The model returns their predicted outcome

Example models:
Diabetes = 0.5*age + 0.2*sex + 2.1*BMI + …
Height = 0.2*age + 0.8*sex + 1.3*weight + …
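Substituting a person's features into a fitted model can be sketched in one line. A minimal illustration in Python (the course itself works in R); the coefficients are the illustrative values from the slide, not a real fitted model, and the feature values are hypothetical:

```python
# "Predict" by plugging a person's features into a linear model.
# Coefficients are the slide's illustrative values, not fitted ones.

def predict_diabetes_score(age, sex, bmi):
    """Linear predictor: Diabetes = 0.5*age + 0.2*sex + 2.1*BMI."""
    return 0.5 * age + 0.2 * sex + 2.1 * bmi

# Hypothetical person: 50 years old, sex coded 1, BMI 30
score = predict_diabetes_score(age=50, sex=1, bmi=30)
```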
Where does the prediction model come from?
• 1) Pick an algorithm
• Linear model
• Y = X1 + X2 + X3

• 2) Split your data set into training and test sets (e.g. 80/20, 70/30)

• 3) Build your model using the training data set


• Use cross-validation to find the best model parameters

• 4) Run your optimized model using the test data set

• 5) Report model performance and your results
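The five steps above can be sketched end-to-end in plain Python (the course itself uses R, e.g. caret). The data are simulated, and the "algorithm" is simple linear regression fit by ordinary least squares; everything here is a toy illustration of the workflow, not a recommended pipeline:

```python
import random

random.seed(0)

# 1) Pick an algorithm: a linear model y = a + b*x.
#    Simulate labeled data where the truth is y = 2x + 1 + noise.
data = [(x, 2 * x + 1 + random.gauss(0, 0.5))
        for x in [i / 10 for i in range(100)]]

# 2) Split into training (80%) and test (20%) sets
random.shuffle(data)
train, test = data[:80], data[80:]

# 3) Build the model on the training set (closed-form OLS, one feature)
n = len(train)
mean_x = sum(x for x, _ in train) / n
mean_y = sum(y for _, y in train) / n
b = (sum((x - mean_x) * (y - mean_y) for x, y in train)
     / sum((x - mean_x) ** 2 for x, _ in train))
a = mean_y - b * mean_x

# 4) Run the fitted model on the held-out test set
predictions = [(a + b * x, y) for x, y in test]

# 5) Report performance: mean squared error on the test set
mse = sum((yhat - y) ** 2 for yhat, y in predictions) / len(predictions)
```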


How well did your algorithm do?
Loss function
• An objective metric to maximize or minimize

Simple regression
• R² – the proportion of variance explained

Multiple regression with varying model size
• Adjusted R²
• AIC/BIC/Cp
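The two regression metrics above can be written out directly. A minimal sketch with toy numbers: R² as the proportion of variance explained, and adjusted R², which penalizes adding predictors so that a larger model does not win by default:

```python
def r_squared(y, yhat):
    """R^2 = 1 - SS_res / SS_tot: proportion of variance explained."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - yhi) ** 2 for yi, yhi in zip(y, yhat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(y, yhat, p):
    """p = number of predictors; penalizes model size."""
    n = len(y)
    r2 = r_squared(y, yhat)
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Toy outcome and predictions
y    = [1.0, 2.0, 3.0, 4.0, 5.0]
yhat = [1.1, 1.9, 3.2, 3.8, 5.1]
```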
How well did your algorithm do?
Classification (Y = binary)
• Receiver operating
characteristic (ROC) curve
and area under the curve
(AUC)

• If Y = 1 or 0:
• High sensitivity:
• when Y = 1, Ŷ = 1
• High specificity:
• when Y = 0, Ŷ = 0
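The two definitions above translate directly into code: sensitivity is the fraction of true positives the classifier catches, specificity the fraction of true negatives. A small sketch with made-up labels and predictions:

```python
# Sensitivity = P(Yhat = 1 | Y = 1); specificity = P(Yhat = 0 | Y = 0).

def sensitivity(y_true, y_pred):
    positives = [(t, p) for t, p in zip(y_true, y_pred) if t == 1]
    return sum(p == 1 for _, p in positives) / len(positives)

def specificity(y_true, y_pred):
    negatives = [(t, p) for t, p in zip(y_true, y_pred) if t == 0]
    return sum(p == 0 for _, p in negatives) / len(negatives)

# Toy labels: 4 true positives, 4 true negatives; classifier misses
# one of each.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
```

Sweeping the classification threshold and plotting sensitivity against (1 − specificity) at each cut-off is exactly what produces the ROC curve; the AUC summarizes it in one number.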
Which model (algorithm) should you use?
Unsupervised Learning
• Not interested in predicting Y; instead, exploratory analysis of the Xs
• Discovering patterns
• Finding subgroups you didn’t know existed
• Visualize the results

• Hard to validate results

• Principal component analysis (PCA)


• X1, X2, X3, X4 … Xn
• ➙ create latent variables (PCs)

• A few latent variables capture most of the information in the data
• i.e. the variance explained

• Variance explained: PC1 > PC2 > PC3 …


[Figures: score plot and loading plot, each axis labeled with the % variance explained by that PC]
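The PCA idea above can be made concrete for two variables: center the data, form the covariance matrix, and see how much variance the first latent direction (PC1) captures. A hedged sketch in plain Python using the closed-form eigenvalues of a symmetric 2×2 matrix; the data are hypothetical:

```python
import math

# Two correlated variables (hypothetical measurements)
xs = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
ys = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Sample covariance matrix [[a, b], [b, c]]
a = sum((x - mx) ** 2 for x in xs) / (n - 1)
c = sum((y - my) ** 2 for y in ys) / (n - 1)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

# Eigenvalues of the covariance matrix = variances along PC1 and PC2
disc = math.sqrt((a - c) ** 2 + 4 * b ** 2)
lam1 = (a + c + disc) / 2   # PC1 variance (the larger eigenvalue)
lam2 = (a + c - disc) / 2   # PC2 variance

# Variance explained: PC1 > PC2, and together they sum to the total
var_explained_pc1 = lam1 / (lam1 + lam2)
```

For this toy data PC1 alone explains well over 90% of the variance, which is exactly the situation where a few latent variables can stand in for the original features.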
Unsupervised Learning
• Clustering
• PCA looks to find a low-dimensional representation of the observations that
explains a good fraction of the variance;
• Clustering looks to find homogeneous subgroups among the observations.

• K-means clustering
• hierarchical clustering
K-means clustering
• Partitions a data set into K distinct, non-overlapping clusters
• You specify how many clusters you want
• The algorithm finds a local optimum
• Run it a few times: different starting points can give different results
Hierarchical clustering
• A tree-based representation of the
observations, called a dendrogram
• Bottom-up (agglomerative) clustering
Algorithms and Packages
• ML Algorithms (many, many, many!)
• Basics: linear-based
• Shrinkage Methods
• Lasso and Ridge regression
• ElasticNet
• Non-linear methods
• Spline
• Support Vector Machines
• Tree based methods
• Decision trees
• Random Forests
• Packages in R
• Individual packages for each algorithm - glmnet
• Meta packages – caret
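The shrinkage methods listed above (in R, via glmnet) share one idea: a penalty that pulls coefficients toward zero. A hedged sketch of the ridge case for a single centered predictor, where the penalized slope has the closed form Sxy / (Sxx + λ); the data are toys, and a real analysis would use the packages named above:

```python
def ridge_slope(xs, ys, lam):
    """Ridge coefficient for one predictor: Sxy / (Sxx + lambda).
    lam = 0 recovers ordinary least squares; larger lam shrinks
    the slope toward zero."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / (sxx + lam)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 2.1, 3.9, 6.2, 7.9]      # roughly y = 2x

b_ols   = ridge_slope(xs, ys, lam=0.0)  # unpenalized estimate
b_ridge = ridge_slope(xs, ys, lam=5.0)  # shrunk toward zero
```

Lasso uses an absolute-value penalty instead, which can shrink coefficients exactly to zero (variable selection); elastic net mixes the two.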
Unsupervised Learning (con’t)
• Clustering
• Partitional methods
• K-means: partition {x1, …, xn} into K clusters, where K is predefined
• Build a new partition by associating each point with the nearest centroid
• Compute the centroid (mean point) of each cluster; repeat until convergence
• “kmeans” function in R.
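The assign-then-recompute loop above can be sketched in a few lines. A minimal illustration for 1-D points in plain Python (in practice you would call R's `kmeans`); the initial centroids are fixed here for reproducibility, whereas real runs use multiple random starts precisely because the algorithm only finds a local optimum:

```python
def kmeans_1d(points, centroids, max_iter=100):
    clusters = [[] for _ in centroids]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: abs(p - centroids[k]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        new_centroids = [sum(c) / len(c) if c else centroids[k]
                         for k, c in enumerate(clusters)]
        if new_centroids == centroids:   # converged (a local optimum)
            break
        centroids = new_centroids
    return centroids, clusters

# Two obvious groups near 1 and near 10; K = 2, fixed starting centroids
points = [1.0, 1.2, 0.8, 9.8, 10.0, 10.2]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
```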
Unsupervised Learning
• Not interested in predicting, but in discovering patterns
• Find subgroups you didn’t know existed
• Visualize the results
• Principal components
• Clustering
• Hierarchical clustering– Build a hierarchy of clusters
• Agglomerative: A “bottom up” approach. You start with each element in a separate
cluster, then merge them according to a given property.
• Divisive: A “top down” approach. All elements start in one all-inclusive cluster, then you
split recursively.
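The agglomerative ("bottom up") approach above can be sketched directly: start with every element in its own cluster, then repeatedly merge the two closest clusters. A toy 1-D illustration in plain Python using single linkage (closest pair of points between two clusters) as the merging property; stopping at a chosen number of clusters stands in for cutting the dendrogram at some height:

```python
def agglomerative(points, n_clusters):
    clusters = [[p] for p in points]   # each element starts alone
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest single-linkage
        # distance, i.e. the closest pair of points between them
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b)
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]   # merge the pair
        del clusters[j]
    return clusters

# 0.0/0.3 and 5.0/5.2 merge first; 9.0 stays on its own
clusters = agglomerative([0.0, 0.3, 5.0, 5.2, 9.0], n_clusters=3)
```

In R, `hclust` performs this (with a choice of linkage methods) and `cutree` cuts the resulting dendrogram into a given number of clusters.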