Fast.ai class notes
lecture 3
https://forums.fast.ai/t/lesson-3-official-topic/96254/9
1.0-Tips
learning tips by Tanishq
learning tips by me
2.0-Resources
Important resources
https://forums.fast.ai/t/lesson-3-notebooks/104821
https://forums.fast.ai/t/lesson-3-official-topic/96254
https://forums.fast.ai/t/best-practices-for-setting-up-fastai-on-mackbook-pro-with-m1-max-chip/98961/4
https://forums.fast.ai/t/best-practices-for-setting-up-fastai-on-mackbook-pro-with-m1-max-chip/98961/3
math resources
3.0 - Missions
missions list
4.0- Book
chapter 4 - first iteration
so the idea - let's just look at a specific section of the image.
so we look at a section starting 4 pixels from the top and left - we view it as a NumPy array, which is a numeric representation of the image
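A minimal sketch along the lines of the book's chapter 4 code (assumes fastai is installed; which "3" gets opened is arbitrary):

    from fastai.vision.all import *

    path = untar_data(URLs.MNIST_SAMPLE)           # downloads the sample MNIST data
    im3 = Image.open((path/'train'/'3').ls()[0])   # open one image of a "3"
    im3_t = tensor(im3)                            # the image as a grid of numbers
    print(im3_t[4:10, 4:10])                       # the section starting 4 pixels from the top and left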
gradient
pixels
Tensor
dimensions
Metric
Overfit
Stochastic gradient descent
testing the effectiveness of any current weight assignment in terms of performance, and providing a mechanism for altering the weight assignment so as to improve performance - and doing that automatically, so a machine would learn from its experience
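A toy sketch of that mechanism in PyTorch (the numbers are made up for illustration):

    import torch

    w = torch.tensor([1.0, 2.0], requires_grad=True)  # current weight assignment
    x, y = torch.tensor([3.0, 4.0]), torch.tensor(20.0)

    pred = (w * x).sum()            # test the current weights: make a prediction
    loss = (pred - y) ** 2          # measure performance against the target
    loss.backward()                 # the mechanism for altering the weights: the gradient
    with torch.no_grad():
        w -= 0.01 * w.grad          # automatic improvement: step downhill
        w.grad.zero_()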
weights and pixels
Derivative
Let's assume I have a loss function, which depends upon a parameter.
Because it's our loss function, we try some arbitrary input to it, and see what the result is - what the loss value is.
Now, we would like to adjust the parameter by a little bit, and see what happens.
So it's the same as the slope - think of it: you change the parameter slightly and see how the loss changes.
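A tiny numeric sketch of "nudge the parameter and see what happens", using a made-up loss function with its minimum at w = 3:

    def loss(w):
        return (w - 3.0) ** 2

    w, eps = 1.0, 1e-6
    slope = (loss(w + eps) - loss(w)) / eps   # finite-difference approximation
    print(slope)   # about -4.0: increasing w from 1 lowers the loss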
book - basic idea
no matter how complex the neural networks become, the basic idea is still simple
This basic idea goes all the way back to Isaac Newton, who pointed out that we can optimize arbitrary functions in this way. Regardless of how complicated our functions become, this basic approach of gradient descent will not significantly change. The only minor changes we will see later in this book are some handy ways we can make it faster, by finding better steps.
Using calculus
just allows us to calculate quickly how a change in the parameters affects the loss value
This is the derivative:
how much a change in its parameter will change the result of the original function
this is calculus - if we know how the function, the loss function, will change, we know what we need to do to the parameters to make the loss value smaller
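A sketch of the same derivative computed exactly with PyTorch's autograd, no nudging needed:

    import torch

    w = torch.tensor(1.0, requires_grad=True)
    loss = (w - 3.0) ** 2
    loss.backward()
    print(w.grad)   # tensor(-4.): the loss falls as w rises, so increase w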
so just calculate the derivative for one weight (parameter), and treat all the other parameters as constants, as not changing
calculating gradients
calculating the gradients for the parameter values at a single point in time
Learning rate
the gradients we get are the slope of the function; if the slope is very large, we make a bigger adjustment
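A sketch of that in PyTorch: the gradient holds one partial derivative per parameter (each computed as if the others were constants), and the learning rate scales the step (toy loss, made-up numbers):

    import torch

    params = torch.tensor([2.0, -1.0, 0.5], requires_grad=True)
    loss = (params ** 2).sum()      # a toy loss depending on three parameters
    loss.backward()
    print(params.grad)              # tensor([ 4., -2.,  1.]): one partial derivative each

    lr = 0.1                        # the learning rate scales how big a step we take
    with torch.no_grad():
        params -= lr * params.grad  # a larger slope means a larger adjustment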
Distinguish
inputs - the inputs of the function
outputs - the relationships
important
Loss function
The process
1. start with a function that is your model, with given parameters
2. calculate the predictions of your function for your inputs with the current parameters
a. then you will have the predictions for your inputs (the validation set) with the current parameters
3. then see how far your predictions are from the actuals using the loss function (a full sketch of this loop follows below)
we input the data points to the function and plot the output
then we create a function - in real life this will be the neural network, but for now let's assume it's a quadratic
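A sketch of the whole process on a toy quadratic (made-up data; the model in real life would be a neural network):

    import torch

    x = torch.linspace(-2, 2, steps=20)
    y = 3 * x**2 + 2 * x + 1                 # pretend these are the measured actuals

    params = torch.tensor([1.0, 1.0, 1.0], requires_grad=True)  # arbitrary start

    for step in range(100):
        a, b, c = params
        preds = a * x**2 + b * x + c         # 1-2. predictions with the current parameters
        loss = ((preds - y) ** 2).mean()     # 3. how far the predictions are from the actuals
        loss.backward()                      # gradients: which way to move each parameter
        with torch.no_grad():
            params -= 0.1 * params.grad      # adjust the parameters a little
            params.grad.zero_()

    print(params)                            # approaches tensor([3., 2., 1.])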
check if the shape of the predictions looks close to our actuals (the patterns)
and then, for the model, each time you can plot the predictions and see if the predictions' pattern / shape is similar to the actual shape
and you will improve it each time, so at the end the predictions' function shape will be similar to the actuals'
summary
at first, the output from our input won't have anything to do with what we want
using the labelled data
the model gives us outputs
we compare them with the targets (we have the labels for those inputs)
then we calculate the loss value - how wrong our predictions were compared to the actuals
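For example, with mean squared error as the loss (made-up numbers):

    import torch

    preds   = torch.tensor([0.9, 0.2, 0.7])  # what the model output
    targets = torch.tensor([1.0, 0.0, 1.0])  # the labels for those inputs
    loss = ((preds - targets) ** 2).mean()   # one number: how wrong we were
    print(loss)                              # tensor(0.0467)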
5.0- Lecture
Run notebooks and experiment:
run every code cell from the book - experimenting with different inputs/outputs
main concepts
training piece → then you get the model
feed the model inputs and it spits out outputs based on the model
Questions
what does "inference" mean, as opposed to training a model?
Model.pkl
fit function to data
Derivative
if you increase the input, does the output increase or decrease - basically, the slope of the function
Tensor
works with a list - with anything, basically.
Optimization
gradient descent - calculate the gradients (of the parameters), and decrease the loss
then optimize
values added together
choosing the model is the last thing, once you have gathered, cleaned, and augmented the data
so the idea is to find a function that fits our data? so a neural network is just a function
so the derivative basically tells us, for each value, whether the slope at that value is high or not
Our goal is to make the loss small - we want to make the loss smaller
if we know how our function will change -
we have a parameter, and the derivative tells us how rapidly the function changes with respect to this parameter
the magnitude tells us that at this point the slope is fairly steep, so the loss changes significantly when we adjust w - each time we adjust w, the output changes significantly
so let's subtract some value * the slope, and see what happens
why use the slope?
Parameters
where do they come from, and how do they help us
start with a flexible function, and get it to recognize the patterns in the data we give it
then the goal is to find the most appropriate function for our data - so we will test different functions, to find the best one
so here - we have the ability to create any quadratic function, and we will look for the best one for our data
So the idea:
we try to map the data to the shape of a function - basically trying to make the function describe the data as well as possible, with as little noise as possible
the goal - find the best function that matches the data we have!
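Roughly as in the lecture's notebook, one general quadratic whose three parameters define the whole family of candidate functions:

    # every choice of a, b, c is one candidate function to fit the data
    def quad(a, b, c, x):
        return a * x**2 + b * x + c

    print(quad(3, 2, 1, 1.5))   # one member of the family evaluated at x = 1.5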
the steps
1. plot your data
2. try to change the parameters, and see if the function describes (or fits) your data better
a. changing the parameters: you can increase or decrease them and see what improves our function
then reiterate over all the parameters again, until you find the best parameter values
The question
if I move the parameter, does the model improve or get worse?
this is what loss functions are for - the value they output is the loss value - how good the model is.
so the goal - try to find the parameters that reduce the loss value - that improve the model
The derivative:
if I increase the parameter - does the loss get better or worse?
we take the parameters, insert them into the loss function, and get the loss value for these parameters - the measurement for how good
our model is FOR THESE PARAMETERS
but the question is, how is the parameter connected to the slope of the function?
loss function - depends upon the parameters
Clarifying questions
parameters vs values
loss function - measures the errors in the outputs!
What is the function
what is the function you find parameters for?
each time we have a parameter, we put it into the function (into our model)
Linear algebra
matrix multiplication
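A tiny example: each output element is a sum of inputs times weights (shapes made up):

    import torch

    inputs  = torch.tensor([[1., 2., 3.]])        # one row of 3 input values
    weights = torch.tensor([[1., 0.],
                            [0., 1.],
                            [1., 1.]])            # 3 inputs -> 2 outputs
    print(inputs @ weights)                       # tensor([[4., 5.]])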
Dependent variable
the thing we try to predict - is it a bird? did he survive? etc.
gradient vs derivative
it depends on how steep the slope is - if the slope is very steep, meaning, if we change this parameter, the loss value increases/decreases a lot - we need to adjust the parameter a lot
when it's steep, to get to the minimum point, you will need to change x a lot
but if you are right next to x=0, where the minimum value is, the slope next to it is very small, so you need to change x very little when you are very close to x=0
but for x = 0.7, the slope will be very small, so you change x a little
gradient descent
partial in python
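A sketch of partial, roughly as the lecture uses it: fix some arguments of a function and get back a new function:

    from functools import partial

    def quad(a, b, c, x):
        return a * x**2 + b * x + c

    f = partial(quad, 3, 2, 1)   # a=3, b=2, c=1 are now baked in
    print(f(1.5))                # same as quad(3, 2, 1, 1.5) -> 10.75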
lecture 3 - second iteration
rectified linear function
infinitely flexible function
the idea - if we add more than one rectified linear function together, we can get to any function we please.
adjusting parameters
what you did when plotting, changing the “m”
rectified linear
we create the flexible function just by adding rectified linear functions
then we can add as many as we want - you could match any function (sketch below)
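A sketch roughly following the lecture's notebook: each rectified linear piece bends the line in one place, and summing them builds the flexible function:

    import torch

    # the building block: mx + b, clipped at zero
    def rectified_linear(m, b, x):
        return torch.clip(m * x + b, 0.)

    # adding two of them (each with its own m and b) bends the line twice
    def double_relu(m1, b1, m2, b2, x):
        return rectified_linear(m1, b1, x) + rectified_linear(m2, b2, x)

    x = torch.linspace(-2, 2, steps=5)
    print(double_relu(1., 1., -1., 1., x))   # tensor([3., 2., 2., 2., 3.])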
Slope
will decrease as you approach the minimum value
Matrix multiplication
why - we need to do a lot of mx + b, and add them up
we will do a lot of them; the m's are the parameters, right? we might have a million parameters
and we will have a lot of variables - and we multiply all of the variables, for example, each pixel of an image, times a coefficient, and add them together
so the coefficients are the parameters, and then we have the actual inputs
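A sketch with made-up shapes: all those multiply-and-add operations collapse into one matrix multiplication:

    import torch

    pixels = torch.rand(64, 784)     # a batch of 64 flattened 28x28 images (the inputs)
    coeffs = torch.rand(784, 1)      # one coefficient (parameter) per pixel
    bias   = torch.rand(1)

    out = pixels @ coeffs + bias     # every pixel * coefficient, summed, per image
    print(out.shape)                 # torch.Size([64, 1]): one prediction per image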
6.0-Blog
what we did
Part 1
understand how to use different architectures for your models, to get the best architecture
Part 2
Learn about how a neural net actually works
Main ideas