ML Notes

This document discusses various machine learning concepts including learning systems, management of information and material flows, CRUD operations in systems, qualitative and quantitative data, the goals of machine learning, defining and processing training data, function approximation algorithms, approximation in different contexts, regression, classification vs clustering, gradient descent, decision trees, and k-fold cross validation. It provides examples to illustrate key machine learning techniques and challenges.

Uploaded by

Pooja Gangapure
11/1- Introduction

18/1-

Learning System-
It is a system that compiles data about resources for learning. Every system has its own
Input-Processing-Output methods. To create and store learning resources, you also need some
way to access them. In our system, the resources are software, hardware, internet connections,
switches and hubs (any I/O devices).
Management handles the information flow, the material flow and the money flow.
An LMS (Learning Management System) enables you to create, manage and deliver. Ex. MS Word.
When a system allows CRUD (Create, Read, Update, Delete), it allows you to go for simulation.
Here, simulation basically means performing some sort of CRUD operations.
Ex- Moodle

Tom Mitchell – A computer program is said to learn from experience “E” with respect to some
class of tasks “T” and performance measure “P”, if its performance at tasks in T, as measured by
P, improves with experience E. Ex- Email Spam Detection

Dealing with numbers – Quantitative (Statistics)

Dealing with numbers, letters and strings – Qualitative (Fuzzy logic)
//Fuzzy logic allows multiple logical values: the truth value of a variable or problem can be
any number between 0 and 1, not only 0 (false) or 1 (true)

GOAL OF ML (ML is a subset of AI)

Achieve a thorough understanding of the nature of the learning process, both human learning and
other forms of learning.

25/1-
Target- How to define a target? What will we need to decide a target? Which parameters
should be taken into consideration?

4 functions of the target-
1. Regression – talks about how many variables there are, how many are independent and how many
are dependent, and how to examine the dependence between them
2. Precision and recall (used to measure how accurate the network's predictions are)
3. Supervised/ Unsupervised
4. ?
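
Precision and recall from point 2 can be sketched in a few lines. This is only an illustration: the function name `precision_recall` and the spam labels are made up for the example and are not from the lecture.

```python
def precision_recall(actual, predicted, positive=1):
    """Return (precision, recall) for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)  # true positives
    fp = sum(1 for a, p in pairs if a != positive and p == positive)  # false positives
    fn = sum(1 for a, p in pairs if a == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of real positives, how many were found
    return precision, recall


# Made-up labels: 1 = spam, 0 = not spam
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]
precision, recall = precision_recall(actual, predicted)
print(precision, recall)
```

Precision answers "how many of the mails flagged as spam were really spam", while recall answers "how many of the real spam mails were caught".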

Training data – a dataset, usually large, that is used to teach an ML model

Processed data – information

Function approximation algorithm- It is a technique for estimating an unknown underlying
function using the historical or available observations from the domain.
Current data- OLTP (Online Transaction Processing)
Historical data- OLAP (Online Analytical Processing) - neural network, as we are looking into
approximation

27/1-
Approximation – Approximation is used whenever a numerical value, model, structure or
function is either unknown or difficult to compute.

Approximation when the form of the function is known (e.g., a for loop, or when you have to go
to your native place)

Approximation when the form of the function is not known, or when the exact value is numerically
difficult to compute, like the value of pi (e.g., a while loop, where you don't know how many
times the loop is going to run). Here we produce an output which is close to the actual
function. E.g. travel to Gujarat – Baroda, Surat, Gandhi Nagar, Ahmedabad

TAYLOR SERIES of a function is an infinite sum of terms which are computed using the function's
derivatives at a single point. Computing every term is numerically expensive, so in practice
only the first few terms are summed.
Examples-
S = 1/1! + 2/2! + 3/3! + …. n/n!
S = 1+ x/1! - x/2! + x/3! - ….
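
As a rough sketch of how a truncated Taylor series approximates a function, the snippet below sums the first few terms of the well-known series e^x = 1 + x/1! + x^2/2! + x^3/3! + … The choice of e^x and the name `exp_taylor` are assumptions made for illustration; this is not necessarily the series intended in the examples above.

```python
import math


def exp_taylor(x, n_terms=10):
    """Approximate e**x with the first n_terms of its Taylor series around 0."""
    total = 0.0
    term = 1.0  # first term: x**0 / 0!
    for k in range(n_terms):
        total += term
        term *= x / (k + 1)  # turn x**k / k! into x**(k+1) / (k+1)!
    return total


approx = exp_taylor(1.0, n_terms=10)
error = abs(approx - math.e)  # shrinks as n_terms grows
print(approx, error)
```

More terms give a closer value but cost more computation, which is the trade-off the note on expense refers to.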

NEWTON’S METHOD can be used to approximate the roots of a polynomial, making it a useful
technique for approximating quantities such as the square root or the reciprocal of different
numbers.
Examples-
sqrt(3) = “approximated value” but sqrt(9) = “known”
X ~ P (X is directly proportional to P)
X ~ 1/P (X is inversely proportional to P)
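
A minimal sketch of Newton's method for square roots, assuming we look for the root of f(x) = x^2 - a. The function name `newton_sqrt`, the starting guess and the tolerance are illustrative choices, not part of the lecture.

```python
def newton_sqrt(a, tol=1e-12, max_iter=50):
    """Approximate sqrt(a) by applying Newton's method to f(x) = x**2 - a."""
    if a < 0:
        raise ValueError("a must be non-negative")
    if a == 0:
        return 0.0
    x = a if a >= 1 else 1.0  # starting guess (an arbitrary but safe choice)
    for _ in range(max_iter):
        next_x = 0.5 * (x + a / x)  # x - f(x) / f'(x), simplified
        if abs(next_x - x) < tol:
            return next_x
        x = next_x
    return x


print(newton_sqrt(3))  # approximated value
print(newton_sqrt(9))  # close to the known value 3
```

Each iteration replaces the current guess with a better one, so the loop stops once two successive guesses are practically the same.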
APPROXIMATION IN REGRESSION-
• Prediction of an output variable when given a set of inputs. The function that truly maps
the input variables to the outputs is not known. It is assumed that a linear or nonlinear
regression model can approximate the mapping of inputs to outputs.
• Predicts future values, or predicts values based on the current scenario.
• Input is known and output is unknown. Prediction depends solely upon the inputs.
• OLTP (current data) or OLAP (historical data, last 5 years/10 years)

Example 1-
Input = Calorie intake per day
Output = the equivalent blood sugar level (100-140)
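
The calorie example can be sketched as a simple linear regression. The numbers below are invented purely for illustration (they are not measurements and not from the lecture), and `fit_line` is a hypothetical helper that does an ordinary least-squares fit of a straight line.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented illustration data, NOT real measurements
calories = [1500, 1800, 2000, 2200, 2500]    # input: calorie intake per day
blood_sugar = [105, 112, 118, 124, 133]      # output: blood sugar reading

slope, intercept = fit_line(calories, blood_sugar)
predicted = slope * 2100 + intercept  # prediction for an unseen input
print(slope, intercept, predicted)
```

The fitted line stands in for the unknown true mapping from input to output, which is exactly the approximation described above.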

Example 2-
Classification:
No. of students per class = 146 / 3 ≈ 49
Division wise = 3
Specialization = 7
Analysis = Value of classification is fixed

Can we fix the value of a cluster? (The value of a cluster depends upon a logical condition)
ML Cluster 1 = 0 or 1-146
CAD Cluster 2 = 0 or 1-146
Total no. of students = 146

Willing to join ML? Y or N = 0, 1-146

This shows that the value of a classification can be estimated, but the value of a cluster is
relatively difficult to estimate.

CLASSIFICATION VS CLUSTERING

LEARNING ALGO 1- Gradient Descent


The network generates its output in the form of a class label, let us say O (or O1, O2, O3 …
if there are multiple outputs).

If you want the output to approximate 1 and you are getting .18, .199, .96 etc., you need to
introduce weights ‘W’. The number of weights should be equal to the number of inputs. The loop
executes until the output is as close to the expected value as required; otherwise it
recomputes again and again. This is how the neural network works.
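
A minimal sketch of this loop, assuming a single linear neuron trained by gradient descent on the squared error. The names `predict` and `train_neuron`, the learning rate and the input values are all assumptions made for the example.

```python
def predict(weights, inputs):
    """Weighted sum of the inputs (a single linear neuron)."""
    return sum(w * x for w, x in zip(weights, inputs))


def train_neuron(inputs, target, lr=0.1, max_epochs=1000, tol=1e-6):
    """Adjust one weight per input until the output is close to the target."""
    weights = [0.0] * len(inputs)  # number of weights == number of inputs
    for _ in range(max_epochs):
        error = target - predict(weights, inputs)
        if abs(error) < tol:  # close enough to the expected output: stop
            break
        # Gradient of 0.5 * error**2 with respect to w_i is -error * x_i,
        # so stepping against the gradient adds lr * error * x_i.
        weights = [w + lr * error * x for w, x in zip(weights, inputs)]
    return weights


inputs = [0.5, 0.8, 0.2]  # made-up input values
weights = train_neuron(inputs, target=1.0)
print(weights, predict(weights, inputs))
```

The loop keeps recomputing the output and nudging the weights until the error is below the tolerance or the epoch limit is reached.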

K-means Clustering
1/2-
INDUCTIVE CLASSIFICATION LOGIC –
A learning system that learns first-order logic. Helpful for classifying things which belong to
2 or more classes. Used to classify unseen examples or interpretations.

Can be –
• Deductive (more to less or less to more) ; proven through observation ; difficult to
find the accuracy.
• Inductive ; extracts a likely premise from specific and limited observations

HYPOTHESIS SPACE-
It has a general-to-specific ordering of hypotheses (assumptions). Goal- find the best
fitting hypothesis for the training data.

The more constraints a hypothesis has, the more complex it is.

CANDIDATE ELIMINATION ALGO-


The function is known but it is difficult / numerically expensive to compute its exact value.
In this case approximation methods are used to find values which are close to the function’s
actual values.
3/2-
DECISION TREE- Uses a tree representation to solve a problem, in which each leaf node
corresponds to a class label and attributes are represented on the internal nodes of the tree.
It is also a visualization technique which shows how the outcome is reached.
1. Decision node – denotes a choice
2. Chance node – denotes a probability
3. End node – denotes the outcome

Consider the whole training set as the root.

Recursive induction of decision tree-

Does not use back propagation.
For example, if A is the input and B is the output, then B should not be connected back to A.
If it is connected back to A, we call it back propagation.
Once the output is generated we should not reiterate.
The decision points of the tree are built in a top-down recursive way, so it is sometimes
referred to as a divide and conquer approach, which resembles the traditional if-else: if yes
then do A, if not then do B.
In ascending order
8/2-
Picking best splitting attribute-
Attributes – splitting attributes – 7-8 attributes? How to split? How to decide the root, right
and left nodes?
Computation – entropy & information gain (the Gini index is a different, alternative measure)

Information gain = (entropy before split) – (entropy after split)
Or, = (overall entropy at parent node) – (weighted sum of entropy at each child node)

The attribute with maximum information gain is the best split attribute →
Maximum Information Gain ; Minimum Entropy

//Entropy is the average number of bits required to transmit a randomly selected event from a
probability distribution. It is used to decide where to split.

The entropy formula is used to identify the split whose child nodes have the minimum entropy.
It tries to find out the path and, based on the path, it tries to find the split.
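
The information gain formula can be sketched as follows, using Shannon entropy in bits. The function names and the yes/no labels are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy at the parent minus the weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Made-up labels: 4 "yes" and 4 "no" at the parent node
parent = ["yes"] * 4 + ["no"] * 4
perfect_split = [["yes"] * 4, ["no"] * 4]                      # children are pure
useless_split = [["yes", "yes", "no", "no"], ["yes", "yes", "no", "no"]]

print(information_gain(parent, perfect_split))  # high gain: good attribute
print(information_gain(parent, useless_split))  # no gain: poor attribute
```

Computing this gain for every candidate attribute and picking the largest one is how the best splitting attribute is chosen.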

Computational complexity depends on the height of the tree. If the tree is tall, the
cost is higher.
Time complexity is denoted by O(H), where O is the Big-O notation and H, the height of the
tree, is the parameter.
Example-

AVL tree is a binary search tree with the additional property that the difference between the
heights of the left subtree and right subtree of every node is at most 1.

Noisy data – Data which has variable return types/ invalid attributes
Overfitting – occurs when the tree is designed so as to perfectly fit all samples in the
training data set, so that it performs poorly on unseen data.

15/2-
Iterative Dichotomiser 3 (ID3) ALGORITHM –
A classification algorithm. Used for building a decision tree.

Example-
Example2-
25/3-
0 Aspect – Part of records.
1. Training the network
2. Testing the network
Cross validation –
1. Training {80 records}
2. Testing {20 records}
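
A sketch of the 80/20 split, assuming 100 records in total. The helper name `train_test_split`, the shuffling and the seed are illustrative choices, not part of the notes.

```python
import random


def train_test_split(records, train_fraction=0.8, seed=0):
    """Shuffle the records and cut them into a training part and a testing part."""
    shuffled = list(records)               # copy, so the original order is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


records = list(range(100))  # stand-in for 100 records
train, test = train_test_split(records)
print(len(train), len(test))
```

The network is trained only on the first part and then checked on the second part, which it has never seen.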

1 Aspect – No of records
1. Training
2. Testing
3. Blind spot
K fold cross validation-

30/3-
Eg. Company X produces 1000 units/day
If successful, Null Hypothesis
If not, Alternate Hypothesis

95% confidence level (5% level of significance)

T - A = E
Target – actual = error
1000 – 950 = 50

Error-
Type 1- Alpha ; actual production of company X based on the null hypothesis ; actual error
Type 2- Beta ; calculate the expected error {assuming} ; expected error

Questions-
[Solved] QuestionNull Hypothesis Are teens better at math than adults? Age... | Course Hero
1. Are teens better at math than adults?
Ans. Age has no effect on mathematical ability.

2. Does taking aspirin every day reduce the chance of having a heart attack?
3. Do teens use cellphones to access the internet more than adults?
4. Do cats care about the color of their food?
5. Does chewing willow bark relieve pain?

K fold cross validation-


The total number of subsets S1, S2, … must be equal to the value of k.
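
A minimal sketch of k-fold cross validation under the usual reading: the records are divided into k subsets S1 … Sk, and each subset is used once for testing while the remaining k-1 are used for training. The generator name `k_fold_splits` and the way records are dealt into folds are assumptions for illustration.

```python
def k_fold_splits(records, k):
    """Yield (train, test) pairs, using each of the k subsets once as the test set."""
    if not 2 <= k <= len(records):
        raise ValueError("k must be between 2 and the number of records")
    folds = [records[i::k] for i in range(k)]  # S1, S2, ..., Sk
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, test


records = list(range(10))  # stand-in for 10 records
splits = list(k_fold_splits(records, k=5))
for train, test in splits:
    print(len(train), len(test))
```

The model is trained and tested k times, and the k test scores are typically averaged to get the final estimate.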
