Week 4 - Intro to ML
• Analyzing images to detect anomalies or faults in production line
• Detecting brain tumors in brain scans
• Classifying topics using NLP
• Forecasting sales
• Chatbots for customized and quick interactions with customers
• Recommending products based on buyer behaviors
Types of machine learning
• Supervised learning: the objective is to learn
a function to predict an output variable Y
based on observed input variables (also called
features) x1, . . . , xp. We develop methods that
learn this function based on labelled data,
which we call the training data.
• Unsupervised learning: we are given only
inputs and the goal is to find “interesting”
patterns in this data. It is used for clustering.
Supervised learning
In supervised learning, the output or response variable can be of any
type. However, most methods address two main classes of supervised
learning problems:
• In regression, the response is a quantitative scalar (such as the
income of a worker).
• Regression:
• Predicting income of an individual
• Number of Covid-19 patients in the next 2
months
• House prices
• Sales forecasting – units / value
• Classification:
• Cancer – No cancer
• Fraud – Secure
• Churn – No churn
• Good customer – bad customer
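To make the classification setting concrete, here is a minimal sketch of learning from labelled examples. It uses a 1-nearest-neighbour rule, which is a technique chosen here purely for illustration and is not taken from the slides. The features (`monthly_usage_hours`, `support_tickets`) and all data values are made up.

```python
# Hypothetical labelled training data (values invented for illustration).
# Each example is ((monthly_usage_hours, support_tickets), class_label).
training_data = [
    ((2.0, 5.0), "churn"),
    ((3.0, 4.0), "churn"),
    ((20.0, 0.0), "no churn"),
    ((25.0, 1.0), "no churn"),
]


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def predict_label(features, labelled_examples):
    """Return the label of the training example closest to `features`."""
    _, nearest_label = min(
        labelled_examples,
        key=lambda example: squared_distance(features, example[0]),
    )
    return nearest_label


# Predict the class of two new, unlabelled customers.
print(predict_label((4.0, 3.0), training_data))
print(predict_label((22.0, 1.0), training_data))
```

The response here is a category (churn or no churn). In a regression problem the same inputs would instead be used to predict a number.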
Supervised Learning
Predict house prices
• Y = house price (the response)
• Y = f(X1, X2, X3, ..., Xn), where X1, ..., Xn are the input features
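As a rough illustration of learning f from labelled data, the sketch below fits a straight line by ordinary least squares with a single feature. The choice of a linear f, the feature (floor area in square metres), and the data values are all assumptions made for this example. A real house price model would use many features.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical training data: floor area (m^2) and price (thousands).
sizes_m2 = [50.0, 80.0, 100.0, 120.0]
prices_k = [150.0, 210.0, 250.0, 290.0]

intercept, slope = fit_line(sizes_m2, prices_k)


def predict_price(size_m2):
    """Learned function f: predicted price for a given floor area."""
    return intercept + slope * size_m2


print(predict_price(90.0))
```

Once `intercept` and `slope` are estimated from the training data, `predict_price` plays the role of f for houses that were not in the training set.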
Unsupervised Learning
• The training data is unlabelled
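With unlabelled data, one common task is clustering. The sketch below is a minimal k-means loop, assuming two clusters and fixed starting centroids so that the result is reproducible. k-means is one possible clustering method, used here as an illustration. The points and starting centroids are made up.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans(points, centroids, n_iter=10):
    """Alternate between assigning points and updating centroids."""
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return labels, centroids


# Hypothetical unlabelled inputs: no response variable is provided.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
labels, centroids = kmeans(points, centroids=[(1.0, 1.0), (9.0, 9.0)])
print(labels)
```

The cluster indices are arbitrary names for the groups found. Nothing in the data says what each group means.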
1. Problem formulation.
5. Model evaluation.
6. Communicate results.
Evaluating model performance
• Training set: for exploratory data analysis, model building, model
estimation, model selection, etc.
• A higher proportion of training data leads to more accurate model estimation, but
leaves a smaller test set and therefore higher variance in the estimate of the expected loss.
• The split of the data into the training and test sets is often random, but sometimes
there are reasons to consider alternative schemes.
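A random split can be sketched in a few lines. The function name `train_test_split`, the default `test_fraction` of 0.2, and the seed are assumptions for this example. The right proportion depends on how much data is available.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                      # leave the input untouched
    random.Random(seed).shuffle(shuffled)      # seeded for reproducibility
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))  # stand-in for 10 observations
train_rows, test_rows = train_test_split(rows, test_fraction=0.2, seed=42)
print(len(train_rows), len(test_rows))
```

A purely random split is not always appropriate. For time-ordered data, for instance, it is common to hold out the most recent observations instead.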
Data mismatch
• The validation or test set should be as
representative as possible of the data that the model
will “see” in production.
• Don’t test it on apples and productionize to
predict oranges.
Key concepts
• Overfitting.
• Accuracy vs interpretability.
Simple vs complex model
• Having a very simple model can lead to bad predictions on both the training
set and the test set.
• An overfit model has small training errors, but may predict poorly. In
essence, it has memorized the training set.
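The contrast between a too-simple model and an overfit one can be sketched on a small synthetic data set. Everything here is an assumption for illustration: the data follow y = 2x with alternating noise of +1 and -1 in the training set, the "too simple" model always predicts the training mean, and the "memorizing" model returns the response of the nearest training point.

```python
def mse(predict, xs, ys):
    """Mean squared error of `predict` on the pairs (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data: true relationship y = 2x, noisy training responses.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_y = [3.0, 3.0, 7.0, 7.0, 11.0, 11.0]
test_x = [1.25, 2.25, 3.25, 4.25, 5.25]
test_y = [2.0 * x for x in test_x]

# Too simple: ignore x and always predict the training mean.
mean_y = sum(train_y) / len(train_y)
def constant_model(x):
    return mean_y

# Reasonable: straight line fitted by least squares.
mean_x = sum(train_x) / len(train_x)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))
intercept = mean_y - slope * mean_x
def linear_model(x):
    return intercept + slope * x

# Overfit: memorize the training set and return the nearest point's response.
def memorizing_model(x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

for name, model in [("constant", constant_model),
                    ("linear", linear_model),
                    ("memorizing", memorizing_model)]:
    print(name, mse(model, train_x, train_y), mse(model, test_x, test_y))
```

On this data the constant model has a large error on both sets, while the memorizing model has zero training error but a larger test error than the straight line.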
• Here we consider the unadjusted highway MPG for 2010 cars as the
response variable, and a single predictor, engine displacement.
Example