Fast ai class notes


lecture 3

important - Tanishq, on Kaggle datasets

https://forums.fast.ai/t/lesson-3-official-topic/96254/9

1.0-Tips
learning tips by Tanishq

learning tips by me

asking questions about the lecture and what you learned:

how can I use it?

what did I actually learn?

what can I do with that information?

this will help you learn how to think!

2.0-Resources
Important resources
https://forums.fast.ai/t/lesson-3-notebooks/104821
https://forums.fast.ai/t/lesson-3-official-topic/96254

setting up fastai on a Mac

https://forums.fast.ai/t/best-practices-for-setting-up-fastai-on-mackbook-pro-with-m1-max-chip/98961/4

https://forums.fast.ai/t/best-practices-for-setting-up-fastai-on-mackbook-pro-with-m1-max-chip/98961/3

math resources

derivative - really important to understand the concept:


https://www.khanacademy.org/math/differential-calculus/dc-diff-intro

3.0 - Missions
missions list

4.0- Book
chapter 4 - first iteration

showing what's inside the path, the actual folder

Everything in a computer is a number

so the idea - let's just look at a specific section of the image.

everything is a number, so we view the number representation of the section of the image

so we look at a section starting 4 pixels from the top and left - so we view it as a NumPy array, which is a numbered representation of the image
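a minimal sketch of what that looks like, assuming the MNIST_SAMPLE data as in the book's chapter 4 (the specific image and slice indices are illustrative):

```python
from fastai.vision.all import *
from PIL import Image
import numpy as np

path = untar_data(URLs.MNIST_SAMPLE)
im3 = Image.open((path/'train'/'3').ls()[0])  # one image of a "3"

# the numeric view: a section starting 4 pixels from the top and left
print(np.array(im3)[4:10, 4:10])  # a 6x6 slice of pixel values (0-255)
```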

gradient

pixels

so the image has a total of 784 pixels (28 × 28)

Tensor

dimensions

a stack - think of each matrix as "flat", and they stack on top of each other
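a quick sketch of that stacking idea with PyTorch (the shapes here are illustrative):

```python
import torch

# three "flat" 28x28 matrices stacked into one 3x28x28 tensor
a = torch.zeros(28, 28)
b = torch.ones(28, 28)
c = torch.full((28, 28), 2.0)

stacked = torch.stack([a, b, c])
print(stacked.shape)  # torch.Size([3, 28, 28]) - a new stacking dimension
```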

Metric

Overfitting

Stochastic gradient descent
testing the effectiveness of any current weight assignment in terms of performance, and providing a mechanism for altering the weight assignment to improve performance - doing that automatically, so a machine would learn from its experience

weights and pixels

Derivative
Let's assume I have a loss function, which depends upon a parameter

so our loss function is the quadratic function, yes?

then, because it's our loss function, we try some arbitrary input to it, and see what the result is - what the loss value is

now, we would like to adjust the parameter by a little bit, and see what happens

so it's the same as the slope - think of it: you change the parameter slightly and watch how the loss changes
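a tiny sketch of that "nudge the parameter and see what happens" idea, assuming a made-up quadratic loss:

```python
def loss(w):
    # hypothetical quadratic loss depending on one parameter
    return (w - 3) ** 2

eps = 1e-4                                # "a little bit"
w = 1.5
slope = (loss(w + eps) - loss(w)) / eps   # ≈ -3.0: loss falls as w grows
print(slope)
```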

book - basic idea

no matter how complex the neural functions become, the basic idea is still simple

This basic idea goes all the way back to Isaac Newton, who pointed out that we can optimize arbitrary functions in this way. Regardless of how complicated our functions become, this basic approach of gradient descent will not significantly change. The only minor changes we will see later in this book are some handy ways we can make it faster, by finding better steps.

Using calculus
just allows us to calculate quickly how a change in the parameters affects the loss value

This is the derivative
how much a change in its parameter will change the result of the original function

the derivative calculates the change, rather than the value

For instance, the derivative of the quadratic function at the value 3 tells us how rapidly the function changes at the value 3. You may remember from your high school calculus class that the derivative of a function tells you how much a change in its parameters will change its result.

this is calculus - if we know how the function (the loss function) will change, we know what we need to do to the parameters to make the loss value smaller

we calculate it based on every weight

so just calculate the derivative for one weight (parameter), and treat all other parameters as constants, as if unchanged

then do the same for all of them

calculating gradients
calculating the gradients for the parameter values at a single point in time

basically just calculating the gradient of every current parameter

Learning rate
we get the gradients - they are the slope of the function; if the slope is very large, we make a bigger adjustment
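a minimal sketch of both ideas with PyTorch autograd (the loss and learning rate here are illustrative):

```python
import torch

w = torch.tensor([1.0, -2.0, 0.5], requires_grad=True)  # three parameters
loss = (w ** 2).sum()   # hypothetical loss depending on every weight
loss.backward()         # compute gradients: one slope per parameter
print(w.grad)           # tensor([ 2., -4.,  1.])

lr = 0.01               # learning rate
with torch.no_grad():
    w -= lr * w.grad    # bigger slope -> bigger adjustment
    w.grad.zero_()      # clear the gradients before the next step
```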

Distinguish
inputs - the inputs of the function

output - relationships

parameters - the parameters of your model.

they basically define your function

the parameters are your function's signature

the input is always changing

important

Loss function

The process
1. start with a function that is your model, with given parameters

2. initialize the parameters

3. calculate the predictions of your function for your inputs with the current parameters

a. then you will have the predictions for your inputs (the validation set) with the current parameters

b. of course, you could see how far off you are

4. then, see how far your predictions are from your actuals using the loss function

a. calculate the loss

5. calculate the gradients to improve the loss

6. step the parameters (change them based on the learning rate) - see the sketch after this list
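a minimal sketch of those steps end to end, assuming a quadratic model and synthetic data (all names and values here are illustrative, not from the book):

```python
import torch

def model(params, x):
    a, b, c = params
    return a * x**2 + b * x + c               # 1. our model function

x = torch.linspace(-2, 2, 20)
targets = 3 * x**2 + 2 * x + 1                # synthetic "actuals"

params = torch.randn(3, requires_grad=True)   # 2. initialize
lr = 0.1
for _ in range(500):
    preds = model(params, x)                  # 3. predict
    loss = ((preds - targets) ** 2).mean()    # 4. loss (MSE)
    loss.backward()                           # 5. gradients
    with torch.no_grad():
        params -= lr * params.grad            # 6. step
        params.grad.zero_()
print(params)  # approaches tensor([3., 2., 1.])
```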

The main idea

we started with a synthetic function, and plotted the data points

these are our actuals - so we have the "original" function that we used to create the data points

we input the data points to the function and plot the output

then we create a function - in real life this will be the neural network, but for now, let's assume it's a quadratic

then we calculate the predictions based on the function we created (our model), plot the predictions, and see if they are close to our actuals -

if the shape of the predictions is close to the shape of our actuals (patterns)

then we improved the loss

the goal is to find the best possible quadratic

so that when we insert the input to it

the output will actually resemble the output of the actuals

you do that on the validation set, so you actually have the output of the actuals and you can plot it

and then, for the model, you can plot the predictions each time, and see if the predictions' pattern / shape is similar to the actual shape

and you improve it each time, so at the end you will have a similar function shape for the predictions as for the actuals

summary
at first the output from our input won't have anything to do with what we want

using the label data
the model gives us outputs

we compare them with the targets (we have the labels for those inputs)

we do that using the loss function (MSE) - this is how we compare, using predictions vs actuals

then we calculate the loss value - how wrong our predictions were compared to the actuals

then we improve the model by changing the weights
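a small sketch of that comparison, assuming MSE as the loss (the values here are made up):

```python
import torch

def mse(preds, targets):
    # mean squared error: average of the squared prediction errors
    return ((preds - targets) ** 2).mean()

preds = torch.tensor([2.0, 0.5, 1.0])    # model outputs
targets = torch.tensor([1.0, 0.0, 1.0])  # labels for those inputs
print(mse(preds, targets))               # tensor(0.4167)
```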

5.0- Lecture

tips on how to learn

Run notebooks and experiment:
run every code cell from the book - experiment with different inputs/outputs

reproduce results - use the "clean" versions of the notebooks

what is it for - what is it going to output, if anything?

repeat with a different dataset

lecture 3 - first iteration

main concepts
training piece → then you get the model

feed the model inputs and it spits out output based on the model

error rate - accuracy

Questions

what does "inference" mean, as opposed to training a model?

Model.pkl
fit a function to data

Derivative
if you increase the input, does the output increase or decrease - basically, the slope of the function

Tensor
works with lists, with anything basically.

Optimization
gradient descent - calculate the gradient of the parameters, and decrease the loss

This is deep learning - deep learning is a metaphor for life → do one iteration, and improve over time.

just do this

then optimize

values adding together

gradient descent to optimize the parameters

and samples of the inputs and outputs you want

the computer draws the owl

using gradient descent to set some parameters, to make a wiggly function, which is an addition of vectors?

model choosing is the last thing
once you've gathered, cleaned, and augmented the data

you can reason about a model yourself - it depends on the task

do you need it to be the most accurate? the fastest? etc.

train the model first!

fit a function to data

we start with a flexible function (a neural network), and we get it to do a particular thing - recognize the patterns in our data

so the idea is to find a function that fits our data? so a neural network is just a function

loss function - a measure of how good the function is

for each parameter - if we move it up, does it make the function better or not?

Derivative - if you increase the input

at the start, say the slope is very high, so the derivative will be large

as the slope gets smaller, the derivative approaches zero

so the derivative basically tells us, for each value, whether at that value itself the slope is high or not

slope === gradient

how rapidly the function changes at value 3 - that is, at this value of the parameter, does the function change quickly?

Our goal is to make the loss small - we want to make the loss smaller

if we know how our function will change -

we have a parameter, and the derivative, which tells us how rapidly the function changes with this parameter

our goal is to make the loss smaller

so, we will change the parameter a bit

and see if it makes our loss function's output better

the magnitude tells us that at this point the slope is fairly steep, so the loss changes significantly when we adjust w - each time we adjust w, the output changes significantly

so let's subtract some value * the slope, and see what happens

why use the slope?

because the slope will allow us to understand how big a step we can take:

with a big slope, the function changes very rapidly, so we might take a smaller step

on a gentler slope, we should take a bigger step, because the change from each adjustment is so small, so we can take a bigger step

lecture 3 - first iteration

What we did so far

the goal - we just built a detector / classifier

training piece and model.pkl

model → you feed it input and it spits out output

Parameters
where do they come from, and how do they help us?

machine learning models → fit functions to data

start with a flexible function, and get it to recognize the patterns in the data we give it

start with a function, yes?

then the goal is to find the function most appropriate to our data - so we will test different functions, to find the best one

so here - we have the ability to create any quadratic function, and we will look for the best one for our data

So the idea:
we try to map the data to the shape of a function - and basically try to make the function describe the data as much as possible, with as little noise as possible

the goal - find the best function that matches the data we have!!!!!

the steps

1. plot your data

2. plot the starting function

3. try to change the parameters, and see if the function describes (or fits) your data better

a. change the parameters: you can increase or decrease them and see what improves our function

increase the parameter

decrease the parameter

then reiterate over all the parameters again, until you find the best parameter values

The question
if I move the parameter, does the model improve or get worse?

then we need a measurement of how good the model is - to know the effect of the parameters on the model

this is called a loss function - the value it outputs is the loss value - how good the model is.

so the goal - try to find the parameters that reduce the loss value - that improve the model

The derivative:
if I increase the parameter - does the loss get better or worse?

the loss - we want the smallest loss possible.

the derivative - if you increase the input, does the output increase or decrease, and by how much (the slope/gradient)

so this is the idea:

we want the smallest loss

we take the parameters, insert them into the loss function, and get the loss value for these parameters - the measurement of how good

our model is FOR THESE PARAMETERS

then we want to adjust the parameters, so we will check:

if I increase parameter "a" → the loss is a function, so if I increase parameter "a", does the loss value improve or not?

when we adjust the parameters, does the loss value go down or up?

but the question is, how is the parameter connected to the slope of the function?

so the biggest question:

how do the parameters work with the loss function?

like, we have a million parameters

how is the loss function connected to them?

ok, I understand that if the slope for this single parameter is negative, then we will need to increase it to go toward the minimum point

but if f'(x) is the derivative of f(x), which is the loss function

why do we put the parameter into the derivative of the loss function?

(because the loss is a function of the parameters, f'(w) evaluated at the current parameter value w gives the slope of the loss at exactly that point - it tells us how the loss would change if we nudged w from where it is now)
loss function - depends upon the parameters

Clarifying questions

parameters vs values

conclusion - most important

loss function - measures errors in the outputs!

What is the function
what is the function you find parameters for?

this is the main idea - values getting added together, and gradient descent optimizes the parameters, given samples of the inputs and outputs you want: 49:00

Why a small increase

for each increase in the respective parameter by one unit, what will happen to the loss?

everything is about fitting functions to data

each time we have a parameter, we put it in the function (in our model)

the parameter is the x value

then we measure its slope

then we change the x value

so the x value will be left or right of the blue circle

so the parameters are just the inputs, and we change them, but the function is the same, and all the time we measure the slope at the parameter

Linear algebra
matrix multiplication

Dependent variable

the thing we try to predict - is it a bird? did he survive? etc.

gradient vs derivative

each time we increase the parameter

it depends on how steep the slope is - if the slope is very steep, meaning, if we change this parameter the loss value increases/decreases a lot - we need to change the parameter a lot

think about the function f(x) = x^2

where it's steep, to get to the minimum point, you will need to change x a lot

but if you are right next to x = 0, where the minimum value is, the slope next to it is very small, so you need to change x very little if you are very close to x = 0

so at x = -50, you will increase a lot

but at x = 0.7, the slope will be very small, so you increase a little
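a tiny worked example of that, using f(x) = x^2, whose derivative is 2x:

```python
def f(x):
    return x ** 2

def slope(x):
    return 2 * x      # derivative of x^2

print(slope(-50))     # -100: very steep, take a big step toward 0
print(slope(0.7))     # 1.4: nearly flat, take a small step
```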

gradient descent

calculate the gradient (derivative)

and do a descent - decrease the loss

partial in python

the idea - we take a general function, and create specialized versions by fixing some parameters
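a sketch of that with functools.partial, using a quadratic like the lecture's (the exact function and values are illustrative):

```python
from functools import partial

def quad(a, b, c, x):
    return a * x**2 + b * x + c

# fix a, b, c to get a specialized function of x alone
f = partial(quad, 3, 2, 1)    # f(x) = 3x^2 + 2x + 1
print(f(1.5))                 # 10.75
```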

the Hebrew term for gradient descent

lecture 3 - second iteration

Matrix multiplication and rectified linear

what is the function that we find parameters for?

the relationship between a parameter and whether a pixel is part of a basset hound being quadratic is very unlikely

we try to improve parameters for a function, right?

so we have a function we try to improve parameters for.

the function is a neural net

the parameters are the function

so it's unlikely that the parameters will result in a quadratic function

rectified linear function
an infinitely flexible function

the idea - if we add more than one rectified linear function together, we can get to any function we please.
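a sketch of a single rectified linear function and the sum of two, close to how the lecture's notebook defines it (the parameter values here are illustrative):

```python
import torch

def rectified_linear(m, b, x):
    y = m * x + b
    return torch.clip(y, 0.)   # negative outputs become 0

def double_relu(m1, b1, m2, b2, x):
    # adding two rectified linear functions gives a piecewise-linear
    # shape with kinks; add enough of them and you can match any wiggle
    return rectified_linear(m1, b1, x) + rectified_linear(m2, b2, x)

x = torch.linspace(-2, 2, 9)
print(double_relu(1.0, 1.0, -1.0, 1.0, x))
```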

adjusting parameters
what you did when plotting, changing the "m"

is changing the actual parameters

and that infinitely flexible function can create fucking everything!

rectified linear
we create the flexible function just using addition of rectified linear functions

adding - creates the bump, the downward and inward parts

then we can add as many as we want - you can match any function

and use gradient descent for the parameters

Slope
will decrease as you approach the minimal value

that's why we need the learning rate

Matrix multiplication
why - we need to do a lot of mx + b, and add them up

we will do a lot of them; the m's are the parameters, right? we might have a million parameters

and we will have a lot of variables - and we multiply all of the variables, for example, each pixel of an image, times a coefficient, and add them together

then we do it for each parameter

so the coefficients will be the parameters, and then we will have the actual inputs
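a sketch of why that is exactly a matrix multiplication, with illustrative shapes (say, 64 images of 784 pixels each, and 10 outputs):

```python
import torch

x = torch.randn(64, 784)   # inputs: 64 images, 784 pixel values each
w = torch.randn(784, 10)   # coefficients: one per pixel per output
b = torch.randn(10)        # the "+ b" part

out = x @ w + b            # all the coefficient*pixel sums at once
print(out.shape)           # torch.Size([64, 10])
```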

6.0-Blog
what we did

Part 1
understand how to use different architectures for your models, to get the best architecture

Part 2
learn about how a neural net actually works

Main ideas

