Fast ai class notes


lecture 3

important - Tanishq, on Kaggle datasets

https://forums.fast.ai/t/lesson-3-official-topic/96254/9

1.0-Tips
learning tips by Tanishq

learning tips by me

asking questions about the lecture and what you learned:

how can I use it?

what did I actually learn?

what can I do with that information?

this will help you learn how to think!

2.0-Resources
Important resources
https://forums.fast.ai/t/lesson-3-notebooks/104821
https://forums.fast.ai/t/lesson-3-official-topic/96254

setting up fastai on a Mac

https://forums.fast.ai/t/best-practices-for-setting-up-fastai-on-mackbook-pro-with-m1-max-chip/98961/4

https://forums.fast.ai/t/best-practices-for-setting-up-fastai-on-mackbook-pro-with-m1-max-chip/98961/3

math resources

derivative - really important to understand the concept:


https://www.khanacademy.org/math/differential-calculus/dc-diff-intro

3.0 - Missions
missions list

4.0- Book
chapter 4 - first iteration

showing what's inside the path, the actual folder

Everything in a computer is a number

so the idea - let's just look at a specific section of the image.

everything is a number, so we view the number representation of the section of the image

so we look at a section starting 4 pixels from the top and left - so we view it as a NumPy array, which is a numbered representation of the image
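a minimal sketch of what that looks like, assuming the MNIST_SAMPLE data as in the book's chapter 4 (the specific image and slice indices are illustrative):

```python
from fastai.vision.all import *
from PIL import Image
import numpy as np

path = untar_data(URLs.MNIST_SAMPLE)
im3 = Image.open((path/'train'/'3').ls()[0])  # one image of a "3"

# the numeric view: a section starting 4 pixels from the top and left
print(np.array(im3)[4:10, 4:10])  # a 6x6 slice of pixel values (0-255)
```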

gradient

pixels

so the image has a total of 784 pixels (28 × 28)

Tensor

dimensions

a stack - think of each matrix as "flat", and they stack on top of each other
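a quick sketch of that stacking idea with PyTorch (the shapes here are illustrative):

```python
import torch

# three "flat" 28x28 matrices stacked into one 3x28x28 tensor
a = torch.zeros(28, 28)
b = torch.ones(28, 28)
c = torch.full((28, 28), 2.0)

stacked = torch.stack([a, b, c])
print(stacked.shape)  # torch.Size([3, 28, 28]) - a new stacking dimension
```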

Metric

Overfitting

Stochastic gradient descent
testing the effectiveness of any current weight assignment in terms of performance, and providing a mechanism for altering the weight assignment to improve performance - doing that automatically, so a machine would learn from its experience

weights and pixels

Derivative
Let's assume I have a loss function, which depends upon a parameter

so our loss function is the quadratic function, yes?

then, because it's our loss function, we try some arbitrary input to it, and see what the result is - what the loss value is

now, we would like to adjust the parameter by a little bit, and see what happens

so it's the same as the slope - think of it: you change the parameter slightly and watch how the loss changes
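a tiny sketch of that "nudge the parameter and see what happens" idea, assuming a made-up quadratic loss:

```python
def loss(w):
    # hypothetical quadratic loss depending on one parameter
    return (w - 3) ** 2

eps = 1e-4                                # "a little bit"
w = 1.5
slope = (loss(w + eps) - loss(w)) / eps   # ≈ -3.0: loss falls as w grows
print(slope)
```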

book - basic idea

no matter how complex the neural functions become, the basic idea is still simple

This basic idea goes all the way back to Isaac Newton, who pointed out that we can optimize arbitrary functions in this way. Regardless of how complicated our functions become, this basic approach of gradient descent will not significantly change. The only minor changes we will see later in this book are some handy ways we can make it faster, by finding better steps.

Using calculus
just allows us to calculate quickly how a change in the parameters affects the loss value

This is the derivative
how much a change in its parameter will change the result of the original function

the derivative calculates the change, rather than the value

For instance, the derivative of the quadratic function at the value 3 tells us how rapidly the function changes at the value 3. You may remember from your high school calculus class that the derivative of a function tells you how much a change in its parameters will change its result.

this is calculus - if we know how the function (the loss function) will change, we know what we need to do to the parameters to make the loss value smaller

we calculate it based on every weight

so just calculate the derivative for one weight (parameter), and treat all other parameters as constants, as if unchanged

then do the same for all of them

calculating gradients
calculating the gradients for the parameter values at a single point in time

basically just calculating the gradient of every current parameter

Learning rate
we get the gradients - they are the slope of the function; if the slope is very large, we make a bigger adjustment
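a minimal sketch of both ideas with PyTorch autograd (the loss and learning rate here are illustrative):

```python
import torch

w = torch.tensor([1.0, -2.0, 0.5], requires_grad=True)  # three parameters
loss = (w ** 2).sum()   # hypothetical loss depending on every weight
loss.backward()         # compute gradients: one slope per parameter
print(w.grad)           # tensor([ 2., -4.,  1.])

lr = 0.01               # learning rate
with torch.no_grad():
    w -= lr * w.grad    # bigger slope -> bigger adjustment
    w.grad.zero_()      # clear the gradients before the next step
```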

Distinguish
inputs - the inputs of the function

output - relationships

parameters - the parameters of your model.

they basically define your function

the parameters are your function's signature

the input is always changing

important

Loss function

The process
1. start with a function that is your model, with given parameters

2. initialize the parameters

3. calculate the predictions of your function for your inputs with the current parameters

a. then you will have the predictions for your inputs (the validation set) with the current parameters

b. of course, you could see how far off you are

4. then, see how far your predictions are from your actuals using the loss function

a. calculate the loss

5. calculate the gradients to improve the loss

6. step the parameters (change them based on the learning rate) - see the sketch after this list
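a minimal sketch of those steps end to end, assuming a quadratic model and synthetic data (all names and values here are illustrative, not from the book):

```python
import torch

def model(params, x):
    a, b, c = params
    return a * x**2 + b * x + c               # 1. our model function

x = torch.linspace(-2, 2, 20)
targets = 3 * x**2 + 2 * x + 1                # synthetic "actuals"

params = torch.randn(3, requires_grad=True)   # 2. initialize
lr = 0.1
for _ in range(500):
    preds = model(params, x)                  # 3. predict
    loss = ((preds - targets) ** 2).mean()    # 4. loss (MSE)
    loss.backward()                           # 5. gradients
    with torch.no_grad():
        params -= lr * params.grad            # 6. step
        params.grad.zero_()
print(params)  # approaches tensor([3., 2., 1.])
```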

The main idea

we started with a synthetic function, and plotted the data points

these are our actuals - so we have the "original" function that we used to create the data points

we input the data points to the function and plot the output

then we create a function - in real life this will be the neural network, but for now, let's assume it's a quadratic

then we calculate the predictions based on the function we created (our model), plot the predictions, and see if they are close to our actuals -

if the shape of the predictions is close to the shape of our actuals (patterns)

then we improved the loss

the goal is to find the best possible quadratic

so that when we insert the input to it

the output will actually resemble the output of the actuals

you do that on the validation set, so you actually have the output of the actuals and you can plot it

and then, for the model, you can plot the predictions each time, and see if the predictions' pattern / shape is similar to the actual shape

and you improve it each time, so at the end you will have a similar function shape for the predictions as for the actuals

summary
at first the output from our input won't have anything to do with what we want

using the label data
the model gives us outputs

we compare them with the targets (we have the labels for those inputs)

we do that using the loss function (MSE) - this is how we compare, using predictions vs actuals

then we calculate the loss value - how wrong our predictions were compared to the actuals

then we improve the model by changing the weights
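a small sketch of that comparison, assuming MSE as the loss (the values here are made up):

```python
import torch

def mse(preds, targets):
    # mean squared error: average of the squared prediction errors
    return ((preds - targets) ** 2).mean()

preds = torch.tensor([2.0, 0.5, 1.0])    # model outputs
targets = torch.tensor([1.0, 0.0, 1.0])  # labels for those inputs
print(mse(preds, targets))               # tensor(0.4167)
```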

5.0- Lecture

tips on how to learn

Run notebooks and experiment:
run every code cell from the book - experiment with different inputs/outputs

reproduce results - use the "clean" versions of the notebooks

what is it for - what is it going to output, if anything?

repeat with a different dataset

lecture 3 - first iteration

main concepts
training piece → then you get the model

feed the model inputs and it spits out output based on the model

error rate - accuracy

Questions

what does "inference" mean, as opposed to training a model?

Model.pkl
fit a function to data

Derivative
if you increase the input, does the output increase or decrease - basically, the slope of the function

Tensor
works with lists, with anything basically.

Optimization
gradient descent - calculate the gradient of the parameters, and decrease the loss

This is deep learning - deep learning is a metaphor for life → do one iteration, and improve over time.

just do this

then optimize

values adding together

gradient descent to optimize the parameters

and samples of the inputs and outputs you want

the computer draws the owl

using gradient descent to set some parameters, to make a wiggly function, which is an addition of vectors?

model choosing is the last thing
once you've gathered, cleaned, and augmented the data

you can reason about a model yourself - it depends on the task

do you need it to be the most accurate? the fastest? etc.

train the model first!

fit a function to data

we start with a flexible function (a neural network), and we get it to do a particular thing - recognize the patterns in our data

so the idea is to find a function that fits our data? so a neural network is just a function

loss function - a measure of how good the function is

for each parameter - if we move it up, does it make the function better or not?

Derivative - if you increase the input

at the start, say the slope is very high, so the derivative will be large

as the slope gets smaller, the derivative approaches zero

so the derivative basically tells us, for each value, whether at that value itself the slope is high or not

slope === gradient

how rapidly the function changes at value 3 - that is, at this value of the parameter, does the function change quickly?

Our goal is to make the loss small - we want to make the loss smaller

if we know how our function will change -

we have a parameter, and the derivative, which tells us how rapidly the function changes with this parameter

our goal is to make the loss smaller

so, we will change the parameter a bit

and see if it makes our loss function's output better

the magnitude tells us that at this point the slope is fairly steep, so the loss changes significantly when we adjust w - each time we adjust w, the output changes significantly

so let's subtract some value * the slope, and see what happens

why use the slope?

because the slope will allow us to understand how big a step we can take:

with a big slope, the function changes very rapidly, so we might take a smaller step

on a gentler slope, we should take a bigger step, because the change from each adjustment is so small, so we can take a bigger step

lecture 3 - first iteration

What we did so far

the goal - we just built a detector / classifier

training piece and model.pkl

model → you feed it input and it spits out output

Parameters
where do they come from, and how do they help us?

machine learning models → fit functions to data

start with a flexible function, and get it to recognize the patterns in the data we give it

start with a function, yes?

then the goal is to find the function most appropriate to our data - so we will test different functions, to find the best one

so here - we have the ability to create any quadratic function, and we will look for the best one for our data

So the idea:
we try to map the data to the shape of a function - and basically try to make the function describe the data as much as possible, with as little noise as possible

the goal - find the best function that matches the data we have!!!!!

the steps

1. plot your data

2. plot the starting function

3. try to change the parameters, and see if the function describes (or fits) your data better

a. change the parameters: you can increase or decrease them and see what improves our function

increase the parameter

decrease the parameter

then reiterate over all the parameters again, until you find the best parameter values

The question
if I move the parameter, does the model improve or get worse?

then we need a measurement of how good the model is - to know the effect of the parameters on the model

this is called a loss function - the value it outputs is the loss value - how good the model is.

so the goal - try to find the parameters that reduce the loss value - that improve the model

The derivative:
if I increase the parameter - does the loss get better or worse?

the loss - we want the smallest loss possible.

the derivative - if you increase the input, does the output increase or decrease, and by how much (the slope/gradient)

so this is the idea:

we want the smallest loss

we take the parameters, insert them into the loss function, and get the loss value for these parameters - the measurement of how good

our model is FOR THESE PARAMETERS

then we want to adjust the parameters, so we will check:

if I increase parameter "a" → the loss is a function, so if I increase parameter "a", does the loss value improve or not?

when we adjust the parameters, does the loss value go down or up?

but the question is, how is the parameter connected to the slope of the function?

so the biggest question:

how do the parameters work with the loss function?

like, we have a million parameters

how is the loss function connected to them?

ok, I understand that if the slope for this single parameter is negative, then we will need to increase it to go toward the minimum point

but if f'(x) is the derivative of f(x), which is the loss function

why do we put the parameter into the derivative of the loss function?

(because the loss is a function of the parameters, f'(w) evaluated at the current parameter value w gives the slope of the loss at exactly that point - it tells us how the loss would change if we nudged w from where it is now)
loss function - depends upon the parameters

Clarifying questions

parameters vs values

conclusion - most important

loss function - measures errors in the outputs!

What is the function
what is the function you find parameters for?

this is the main idea - values getting added together, and gradient descent optimizes the parameters, given samples of the inputs and outputs you want: 49:00

Why a small increase

for each increase in the respective parameter by one unit, what will happen to the loss?

everything is about fitting functions to data

each time we have a parameter, we put it in the function (in our model)

the parameter is the x value

then we measure its slope

then we change the x value

so the x value will be left or right of the blue circle

so the parameters are just the inputs, and we change them, but the function is the same, and all the time we measure the slope at the parameter

Linear algebra
matrix multiplication

Dependent variable

the thing we try to predict - is it a bird? did he survive? etc.

gradient vs derivative

each time we increase the parameter

it depends on how steep the slope is - if the slope is very steep, meaning, if we change this parameter the loss value increases/decreases a lot - we need to change the parameter a lot

think about the function f(x) = x^2

where it's steep, to get to the minimum point, you will need to change x a lot

but if you are right next to x = 0, where the minimum value is, the slope next to it is very small, so you need to change x very little if you are very close to x = 0

so at x = -50, you will increase a lot

but at x = 0.7, the slope will be very small, so you increase a little
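a tiny worked example of that, using f(x) = x^2, whose derivative is 2x:

```python
def f(x):
    return x ** 2

def slope(x):
    return 2 * x      # derivative of x^2

print(slope(-50))     # -100: very steep, take a big step toward 0
print(slope(0.7))     # 1.4: nearly flat, take a small step
```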

gradient descent

calculate the gradient (derivative)

and do a descent - decrease the loss

partial in python

the idea - we take a general function, and create specialized versions by fixing some parameters
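a sketch of that with functools.partial, using a quadratic like the lecture's (the exact function and values are illustrative):

```python
from functools import partial

def quad(a, b, c, x):
    return a * x**2 + b * x + c

# fix a, b, c to get a specialized function of x alone
f = partial(quad, 3, 2, 1)    # f(x) = 3x^2 + 2x + 1
print(f(1.5))                 # 10.75
```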

the Hebrew term for gradient descent

lecture 3 - second iteration

Matrix multiplication and rectified linear

what is the function that we find parameters for?

the relationship between a parameter and whether a pixel is part of a basset hound being quadratic is very unlikely

we try to improve parameters for a function, right?

so we have a function we try to improve parameters for.

the function is a neural net

the parameters are the function

so it's unlikely that the parameters will result in a quadratic function

rectified linear function
an infinitely flexible function

the idea - if we add more than one rectified linear function together, we can get to any function we please.
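a sketch of a single rectified linear function and the sum of two, close to how the lecture's notebook defines it (the parameter values here are illustrative):

```python
import torch

def rectified_linear(m, b, x):
    y = m * x + b
    return torch.clip(y, 0.)   # negative outputs become 0

def double_relu(m1, b1, m2, b2, x):
    # adding two rectified linear functions gives a piecewise-linear
    # shape with kinks; add enough of them and you can match any wiggle
    return rectified_linear(m1, b1, x) + rectified_linear(m2, b2, x)

x = torch.linspace(-2, 2, 9)
print(double_relu(1.0, 1.0, -1.0, 1.0, x))
```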

adjusting parameters
what you did when plotting, changing the "m"

is changing the actual parameters

and that infinitely flexible function can create fucking everything!

rectified linear
we create the flexible function just using addition of rectified linear functions

adding - creates the bump, the downward and inward parts

then we can add as many as we want - you can match any function

and use gradient descent for the parameters

Slope
will decrease as you approach the minimal value

that's why we need the learning rate

Matrix multiplication
why - we need to do a lot of mx + b, and add them up

we will do a lot of them; the m's are the parameters, right? we might have a million parameters

and we will have a lot of variables - and we multiply all of the variables, for example, each pixel of an image, times a coefficient, and add them together

then we do it for each parameter

so the coefficients will be the parameters, and then we will have the actual inputs
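a sketch of why that is exactly a matrix multiplication, with illustrative shapes (say, 64 images of 784 pixels each, and 10 outputs):

```python
import torch

x = torch.randn(64, 784)   # inputs: 64 images, 784 pixel values each
w = torch.randn(784, 10)   # coefficients: one per pixel per output
b = torch.randn(10)        # the "+ b" part

out = x @ w + b            # all the coefficient*pixel sums at once
print(out.shape)           # torch.Size([64, 10])
```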

6.0-Blog
what we did

Part 1
understand how to use different architectures for your models, to get the best architecture

Part 2
learn about how a neural net actually works

Main ideas

