Framing A Machine Learning Problem

Uploaded by

Edward Bwogi

Framing a Machine Learning Problem

Facilitators:
Rahman, Brian, Eva, Andrew, George,
Mark, Peter, Confred
Today's Agenda

 Defining an ML problem and proposing a solution
 Identifying good ML problems
 Deciding on ML
 Formulating a problem as an ML problem

ML Bootcamp Sept 16 - Oct 7, 2023


Defining an ML problem and proposing a solution


Defining an ML problem

ML: the process of training a piece of software (a model) to make predictions by learning from data

Branches of ML
 Supervised learning
 Unsupervised / self-supervised learning
 Reinforcement learning



Kinds of ML problems

Supervised and unsupervised ML problems fall under
multiple categories

| ML Problem Type | Description | Example |
|---|---|---|
| Classification | Predict label for previously unseen example | Identify image of dog from that of cat, bicycle from motor bike |
| Regression | Predict numerical values | Predicting price of houses |
| Clustering | Group similar examples | Most relevant documents (unsupervised) |
| Association rule learning | Infer likely association patterns in data | If you buy a bed, you are likely to buy a mattress too (unsupervised) |
| Structured output | Create complex output | Image recognition bounding boxes |
| Ranking | Identify position on a scale or status | Search result ranking in a search engine |

Check Your Understanding


https://developers.google.com/machine-learning/problem-framing/cases#check-your-understanding


The ML Mindset


"Machine Learning changes the way
you think about a problem. The focus
shifts from a mathematical science to a
natural science, running experiments
and using statistics, not logic, to
analyse its results." - Peter Norvig -
Google Research Director



Experimental Design


Scientific method
 It is helpful to think of the ML process as an
experiment where we run test after test after test
to converge on a workable model
 Like an experiment, the process can be exciting,
challenging, and ultimately worthwhile



| Step | Example |
|---|---|
| 1. Set the research goal | I want to predict how heavy traffic will be on a given day. |
| 2. Make a hypothesis | I think the weather forecast is an informative signal for traffic prediction! |
| 3. Collect the required data | Collect historical traffic data and weather data on each day |
| 4. Test your hypothesis | Train a model using this data to predict traffic. |
| 5. Analyze the results you get | Is this model better than existing systems for traffic prediction? |
| 6. Draw a conclusion | I should (not) use this model to make traffic predictions, because of X, Y, and Z. |
| 7. Refine your hypothesis and repeat | Could time of year be a helpful signal for traffic prediction? |

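The experiment steps above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the (rainfall, traffic index) pairs are invented, the helper names `fit_line` and `mean_absolute_error` are hypothetical, and the "existing system" is assumed to be a predictor that always outputs the average traffic.

```python
from statistics import mean

# Hypothetical data: (rain_mm, traffic_index) per day. All values are invented.
train = [(0, 40), (2, 44), (5, 52), (8, 57), (10, 60), (1, 43)]
test = [(3, 47), (7, 55), (9, 59)]


def fit_line(points):
    """Least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in points) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


def mean_absolute_error(predict, points):
    """Average absolute difference between predictions and observed values."""
    return mean(abs(predict(x) - y) for x, y in points)


# Assumed existing system (step 5): always predict the average traffic seen so far.
baseline_value = mean(y for _, y in train)
baseline_mae = mean_absolute_error(lambda x: baseline_value, test)

# Hypothesis under test (steps 2-4): rainfall is an informative signal.
slope, intercept = fit_line(train)
model_mae = mean_absolute_error(lambda x: slope * x + intercept, test)

# Step 6: keep the model only if it beats the baseline on unseen days.
use_model = model_mae < baseline_mae
print(f"baseline MAE={baseline_mae:.2f}, model MAE={model_mae:.2f}, use model: {use_model}")
```

If the model did not beat the baseline, step 7 would apply: pick another candidate signal (such as time of year) and repeat.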
Identifying good problems for ML

Characteristics of a good ML problem
 Clear use case
 * Start with the problem, not the solution. Make sure you aren't treating ML as a hammer for every problem
 Focus on problems that would be difficult to solve with traditional programming, e.g.,

Smart Reply – automated email replies that save the user time

Google Photos – find a specific photo by keyword search without manual tagging

 * ML solves problems by examining patterns in data and adapting to them
 Ask yourself the following questions:

What is the problem being faced?

Would it be a good problem for ML?

Identifying good problems for ML

Characteristics of a good ML problem...
 Know the problem before focusing on the data
 * Be prepared to have your assumptions challenged
 Once you have a clear understanding of the problem, list potential solutions to test in order to arrive at the best model

Understand that you'll have to try out a few solutions before you land on a good working model
 Exploratory data analysis (EDA) helps you understand your data, but you can't claim that the patterns you find generalize until you check them against previously unseen data

Failure to check could lead you in the wrong direction or reinforce stereotypes or bias

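A minimal sketch of checking an EDA pattern against unseen data follows. The records, the threshold rule, and the names `rule` and `accuracy` are all hypothetical; the point is only that a pattern that fits the explored data perfectly can do much worse on data that was set aside.

```python
# Hypothetical records: (feature_value, label). All values are invented.
explored = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]
held_out = [(2, 0), (4, 1), (5, 0), (6, 1), (9, 1), (3, 1)]


def rule(x, threshold=5):
    """Pattern spotted during EDA: values at or above the threshold are positive."""
    return 1 if x >= threshold else 0


def accuracy(records):
    """Fraction of records for which the rule matches the label."""
    correct = sum(1 for x, label in records if rule(x) == label)
    return correct / len(records)


explored_acc = accuracy(explored)  # fit on the data we looked at
held_out_acc = accuracy(held_out)  # fit on data we did not look at
print(f"explored: {explored_acc:.2f}, held out: {held_out_acc:.2f}")
```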
Identifying good problems for ML

Characteristics of a good ML problem...
 Data, data, more data
 * ML requires a lot of relevant data
 Data collected specifically for your task is most useful

In practice, secondary data is used in the majority of applications
 How much is a lot? It depends on the ML problem

More data will generally improve your model (e.g., its robustness) and its predictive power. A good rule of thumb is to have at least thousands of examples for basic linear models, and hundreds of thousands for neural networks. If you have less data, consider a non-ML solution first and/or transfer learning methods

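The rule of thumb above can be written as a small helper. This is a sketch, not a hard rule: the function name `suggest_starting_point` is hypothetical, and the cut-offs of 1,000 and 100,000 are one possible reading of "thousands" and "hundreds of thousands".

```python
def suggest_starting_point(n_examples: int) -> str:
    """Map a data set size to a rough starting point.

    The cut-offs (1,000 and 100,000) are assumptions, not hard limits.
    """
    if n_examples < 1_000:
        return "non-ML heuristic or transfer learning"
    if n_examples < 100_000:
        return "basic linear model"
    return "neural network is worth considering"


for size in (300, 20_000, 500_000):
    print(size, "->", suggest_starting_point(size))
```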
Identifying good problems for ML

Characteristics of a good ML problem...
 Predictive Power
 * Your features should contain predictive power
 Ensure your data set contains relevant features that correlate with the phenomenon being investigated

e.g., is bedroom count a good predictor for house prices?

Don't try out features arbitrarily without a hypothesis
 Your goal is to build a model that generalizes well to previously unseen samples, and this is possible only if you use the right features

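One quick way to test the bedroom-count hypothesis is to compute the Pearson correlation between the feature and the target. The listings below are invented and `pearson` is a hypothetical helper; a high correlation on a small sample is only a first signal, not proof that the feature will generalize.

```python
from math import sqrt

# Hypothetical listings: bedroom count and sale price (in thousands).
bedrooms = [1, 2, 2, 3, 3, 4, 5]
prices = [110, 150, 165, 210, 200, 260, 320]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var_x = sum((x - x_bar) ** 2 for x in xs)
    var_y = sum((y - y_bar) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(bedrooms, prices)
print(f"correlation between bedrooms and price: {r:.2f}")
```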
Identifying good problems for ML

Characteristics of a good ML problem...
 Predictions vs. Decisions
 * Aim to make decisions, not just predictions
 Your product should take action on the output of the ML model

ML is better at making decisions than at deriving insight from data (for the latter, use statistical approaches)

Ensure predictions allow you to take a useful action, e.g., a model that predicts the likelihood of clicking certain videos could allow a system to pre-fetch the videos most likely to be clicked


Examples of prediction / decision pairs

| Prediction | Decision |
|---|---|
| What video the learner wants to watch next | Show those videos in the recommendation bar |
| Probability someone will click on a search result | If P(click) > 0.12, prefetch the web page |
| What fraction of a video ad the user will watch | If a small fraction, don't show the user the ad |
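The second pair (click probability and prefetching) could be wired up roughly as follows. The 0.12 threshold comes from the slide; the function name `decide_prefetch`, the URLs, and the probabilities are hypothetical stand-ins for a real model's output.

```python
PREFETCH_THRESHOLD = 0.12  # threshold from the prediction/decision pairs


def decide_prefetch(click_probabilities: dict[str, float]) -> list[str]:
    """Return the URLs whose predicted click probability exceeds the threshold."""
    return [url for url, p in click_probabilities.items() if p > PREFETCH_THRESHOLD]


# Hypothetical model output: predicted P(click) for each search result.
predictions = {"example.com/a": 0.30, "example.com/b": 0.12, "example.com/c": 0.05}
print(decide_prefetch(predictions))
```

Note that the comparison is strict, so a result with exactly P(click) = 0.12 is not prefetched.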


Hard ML problems

Clustering
 What does each cluster mean in an unsupervised learning problem? E.g., if your model indicates that the user is in the blue cluster, you'll have to determine what the blue cluster represents
 Semi-supervised learning may help


Hard ML problems...

Anomaly detection
 How do you decide what constitutes an anomaly in order to get labeled data?
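To see why this question is hard, consider z-score thresholding, which is a technique chosen here purely for illustration and not one named in the slides. With the invented login counts below, the same value is or is not an anomaly depending on the threshold picked; `flag_anomalies` is a hypothetical helper.

```python
from statistics import mean, pstdev

# Hypothetical daily login counts for one account; the last value looks unusual.
logins = [10, 11, 9, 10, 12, 10, 11, 30]


def flag_anomalies(values, z_threshold):
    """Flag values whose z-score magnitude exceeds the chosen threshold."""
    mu, sigma = mean(values), pstdev(values)
    return [v for v in values if abs(v - mu) / sigma > z_threshold]


print(flag_anomalies(logins, 2.0))  # the value 30 is flagged
print(flag_anomalies(logins, 3.0))  # nothing is flagged
```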


Hard ML problems...

Causation
 ML can identify correlations – mutual
relationships or connections between two or
more things. Determining causation (one event
or factor causing another) is harder. It is easy to
see that something happened, but much harder
to understand why it happened
 You can't determine causation from only
observational data – you need to run
experiments



Hard ML problems...

No data
 If you have no data to train a model, then ML cannot help you. Without data, use a simple, heuristic, rule-based system
 Some new products with no training data start with a heuristic rule system and obtain training data only after users interact with it
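A sketch of that bootstrap pattern is shown below: a heuristic makes the decision, and every interaction is recorded so that labeled data accumulates for a later model. The tag-overlap heuristic, the item names, and the helpers `recommend` and `record_interaction` are all hypothetical.

```python
interaction_log = []  # in a real product this would be persistent storage


def recommend(user_tags: set[str], items: dict[str, set[str]]) -> str:
    """Heuristic: recommend the item sharing the most tags with the user."""
    return max(items, key=lambda name: len(items[name] & user_tags))


def record_interaction(user_tags, shown_item, clicked):
    """Store what was shown and how the user reacted, for later model training."""
    interaction_log.append(
        {"user_tags": sorted(user_tags), "shown": shown_item, "clicked": clicked}
    )


items = {"hiking tour": {"outdoors", "nature"}, "museum pass": {"art", "history"}}
user = {"nature", "outdoors", "food"}
choice = recommend(user, items)
record_interaction(user, choice, clicked=True)
print(choice, interaction_log)
```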


Deciding to use ML

Set yourself up for success by thinking about these things before trying to frame a problem for ML
 Start clearly / simply – what would you like the ML model to do for you?

e.g., I want the ML model to predict the price of a house
 What is your ideal outcome?

e.g., tourism recommendations – my ideal outcome is to suggest tourism destinations that tourists find attractive and worth their time and money
 Success and failure metrics

Make them quantifiable and measurable. What output would you like the ML model to produce (based on the type of ML problem)?

Formulate problem as an ML problem
Suggested approach for framing an ML problem
1) Articulate your problem
2) Start simple
3) Identify your data sources
4) Design your data for the model
5) Determine where data comes from
6) Determine easily obtained inputs
7) Ability to learn
8) Think about potential bias

Articulate your problem

Is it a classification, regression, clustering, or anomaly detection problem?


Articulate your problem

Write down a succinct problem statement
 e.g., Our problem is best framed as 3-class, single-label classification, which predicts whether a video will be in one of three classes {very popular, somewhat popular, not popular} 28 days after being uploaded


Start simple

Simplify the problem further if possible, e.g.,
 We will predict whether an uploaded video is likely to become popular or not (binary classification)
 We will predict an uploaded video's popularity in terms of the number of views it will receive within a 28-day window (regression)

Start by using the simplest model (baseline) possible for your ML problem
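For the binary framing, one of the simplest possible baselines is to always predict the most common label. The sketch below assumes invented label lists; any later model should beat this accuracy to justify its added complexity.

```python
from collections import Counter

# Hypothetical labels for past uploads (binary framing: popular or not).
train_labels = ["not popular"] * 8 + ["popular"] * 2
test_labels = ["not popular", "popular", "not popular", "not popular", "popular"]

# Baseline: always predict the most common training label.
majority_label = Counter(train_labels).most_common(1)[0][0]
baseline_accuracy = sum(label == majority_label for label in test_labels) / len(test_labels)
print(majority_label, baseline_accuracy)
```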


Identify your data sources

Provide answers to the following questions about your
labels:
 How much labeled data do you have?
 What is the source of your label?
 Is your label closely connected to the decision you will be
making?

Example
 Our data set consists of 100,000 examples about past
uploaded videos with popularity data and video descriptions.



Design your Data for the Model

Identify the data that your ML system should use to make predictions (input -> output):

| Title | Channel | Upload time | Uploader's recent videos | Output (label) |
|---|---|---|---|---|
| My silly cat | Alice | 2018-03-21 08:00 | Another cat video, yet another cat | Very popular |
| A snake video | Bob | 2018-04-03 12:00 | None | Not popular |
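The example rows can be expressed as a typed record, which makes the input columns and the output label explicit. The class name `VideoExample` and its field names are hypothetical; the values are the two rows from the slide.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class VideoExample:
    """One training row: four input columns plus the output label."""

    title: str
    channel: str
    upload_time: datetime
    recent_videos: list[str] = field(default_factory=list)
    label: Optional[str] = None  # unknown at upload time; filled in later


rows = [
    VideoExample(
        "My silly cat",
        "Alice",
        datetime(2018, 3, 21, 8, 0),
        ["Another cat video", "yet another cat"],
        "Very popular",
    ),
    VideoExample("A snake video", "Bob", datetime(2018, 4, 3, 12, 0), [], "Not popular"),
]
print(rows[0].title, "->", rows[0].label)
```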


Determine Where Data Comes From

Assess how much work it will take to develop a data
pipeline to construct each column for a row. When does
the example output become available for training
purposes?

Example
 We applied the labels {very popular, somewhat popular, not
popular} to each video that fell within a determined range of
views and "thumbs ups" and determined keyword descriptions
for each video. Hand-generating descriptions is not sustainable,
so we are considering adding a keyword description to the
upload form.

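The labeling step in this example can be sketched as a function from view counts to classes, together with a check for when the label becomes available. The source says only that a range of views and "thumbs ups" was used, so the cut-offs below are hypothetical, this sketch uses views alone, and the names `popularity_label` and `label_is_available` are invented.

```python
from datetime import datetime, timedelta

LABEL_DELAY = timedelta(days=28)  # the label is measured 28 days after upload


def popularity_label(views, very_popular_min=100_000, somewhat_popular_min=10_000):
    """Map a 28-day view count to one of the three classes.

    Both cut-offs are hypothetical; the source does not state the ranges used.
    """
    if views >= very_popular_min:
        return "very popular"
    if views >= somewhat_popular_min:
        return "somewhat popular"
    return "not popular"


def label_is_available(upload_time, now):
    """A video can be used for training only once its 28-day window has closed."""
    return now - upload_time >= LABEL_DELAY


uploaded = datetime(2018, 3, 21, 8, 0)
print(popularity_label(250_000), label_is_available(uploaded, datetime(2018, 4, 3, 12, 0)))
```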


Determine Easily Obtained Inputs

Pick 1-3 inputs that are easy to obtain and that
you believe would produce a reasonable, initial
outcome
 Consider the engineering cost to develop a data
pipeline to prepare the inputs, and the expected
benefit of having each input in the model



Ability to Learn

Will the ML model be able to learn? List aspects
of your problem that might cause difficulty
learning. For example:
 The data set doesn't contain enough positive labels.
 The training data doesn't contain enough examples.
 The labels are too noisy.
 The system memorizes the training data, but has
difficulty generalizing to new cases.

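Some of these difficulties can be screened for before and after a first training run. The sketch below is one possible set of checks; the function name `learnability_warnings` and every cut-off (1,000 examples, a 5% positive rate, a 0.10 accuracy gap) are assumptions to be tuned per problem. Noisy labels are not covered because detecting them usually needs manual review.

```python
def learnability_warnings(
    labels,
    train_accuracy,
    test_accuracy,
    min_examples=1_000,
    min_positive_rate=0.05,
    max_gap=0.10,
):
    """Return a list of warnings for binary labels (1 = positive, 0 = negative).

    All thresholds are hypothetical defaults.
    """
    warnings = []
    if len(labels) < min_examples:
        warnings.append("too few training examples")
    if sum(labels) / len(labels) < min_positive_rate:
        warnings.append("too few positive labels")
    if train_accuracy - test_accuracy > max_gap:
        warnings.append("possible memorization of the training data")
    return warnings


small_skewed = [1] * 2 + [0] * 98
print(learnability_warnings(small_skewed, train_accuracy=0.99, test_accuracy=0.70))
```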


Think About Potential Bias

Many datasets are biased in some way. These biases may adversely affect training and the predictions made, e.g.,
 A biased data source may not translate across
multiple contexts
 The training sets may not be representative of the
ultimate users of the models and may therefore
provide them with a negative experience

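One simple representativeness check is to compare how often each group appears in the training set with how often it appears among the intended users. The group names, counts, the 0.10 tolerance, and the helper names below are all hypothetical; this is a rough screen, not a full bias audit.

```python
from collections import Counter


def group_shares(groups):
    """Fraction of the population that falls in each group."""
    counts = Counter(groups)
    return {group: count / len(groups) for group, count in counts.items()}


def underrepresented(train_groups, user_groups, tolerance=0.10):
    """Groups whose share among users exceeds their training share by more than the tolerance."""
    train = group_shares(train_groups)
    users = group_shares(user_groups)
    return sorted(g for g, share in users.items() if share - train.get(g, 0.0) > tolerance)


# Hypothetical populations: training data is mostly urban, users are evenly split.
train_groups = ["urban"] * 90 + ["rural"] * 10
user_groups = ["urban"] * 50 + ["rural"] * 50
print(underrepresented(train_groups, user_groups))
```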


Conclusion

It is important to frame your problem properly
for ML


Not all problems require or need to be solved
using ML



Quiz

Complete the quiz at this link:
https://elearning.umu.ac.ug/mod/quiz/attempt.php?attempt=15240&cmid=17874