2023-24_ML_NOTES_1
NOTES
Introduction
Arthur Samuel, an early American leader in the field of computer gaming and artificial
intelligence, coined the term “Machine Learning” in 1959 while at IBM. He defined
machine learning as “the field of study that gives computers the ability to learn without
being explicitly programmed.” However, there is no universally accepted definition for
machine learning. Different authors define the term differently. We give below two more
definitions.
In the above definitions we have used the term “model” and we will be using this term
in several contexts later. There appears to be no universally accepted one-sentence
definition of this term. Loosely, it may be understood as some mathematical expression
or equation, or some mathematical structures such as graphs and trees, or a division of
sets into disjoint subsets, or a set of logical “if . . . then . . . else . . .” rules, or some
such thing. It may be noted that this is not an exhaustive list.
In other words, machine learning lets computers “create” programs (often, the
intent for developing these programs is making predictions) themselves. We
can say that machine learning is the process of turning data into programs
(Figure 1).
The term “machine learning” was coined by Arthur Lee Samuel, a pioneer in the AI
field, in 1959 [1].
One quotation that almost every introductory machine learning resource cites is
the following, which summarizes the concept behind machine learning nicely
and concisely:
“Machine learning is the field of study that gives computers the ability to
learn without being explicitly programmed.” [2]
(Arthur L. Samuel, AI pioneer, 1959)
Now, before we introduce machine learning more formally, here is what some
other people said about the field:
[1] Arthur L. Samuel. “Some studies in machine learning using the game of
checkers”. In: IBM Journal of Research and Development 3.3 (1959), pp. 210–229.
[2] This is not a direct quote but a paraphrased version of Samuel’s sentence
“Programming computers to learn from experience should eventually eliminate
the need for much of this detailed programming effort.”
A bit more concrete is Tom Mitchell’s description from his Machine Learning book [3]:
Definition of learning
A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks T, as measured by P, improves with experience
E.
Examples:
A computer program which learns from experience is called a machine learning program or
simply a learning program. Such a program is sometimes also referred to as a learner.
The learning process, whether by a human or a machine, can be divided into four components:
data storage, abstraction, generalization and evaluation.
Data storage: Facilities for storing and retrieving huge amounts of data are an important
component of the learning process. Humans and computers alike utilize data storage as a
foundation for advanced reasoning.
• In a human being, the data is stored in the brain and data is retrieved using electrochemical
signals.
• Computers use hard disk drives, flash memory, random access memory and similar devices to
store data and use cables and other technology to retrieve data.
Abstraction
The second component of the learning process is known as abstraction. Abstraction is the process
of extracting knowledge about stored data. This involves creating general concepts about the data
as a whole. The creation of knowledge involves application of known models and creation of
new models.
The process of fitting a model to a dataset is known as training. When the model has been
trained, the data is transformed into an abstract form that summarizes the original information.
Generalization
The third component of the learning process is known as generalization. The term generalization describes
the process of turning the knowledge about stored data into a form that can be utilized for future action.
These actions are to be carried out on tasks that are similar, but not identical, to those that have been seen
before. In generalization, the goal is to discover those properties of the data that will be most relevant to
future tasks.
Evaluation
The fourth component of the learning process is evaluation. It is the process of giving feedback to
the user to measure the utility of the learned knowledge. This feedback is then utilised to effect
improvements in the whole learning process.
Application of machine learning methods to large databases is called data mining. In data
mining, a large volume of data is processed to construct a simple model with valuable use, for
example, having high predictive accuracy.
2. In finance, banks analyze their past data to build models to use in credit applications, fraud
detection, and the stock market.
3. In manufacturing, learning models are used for optimization, control, and troubleshooting.
5. In telecommunications, call patterns are analyzed for network optimization and maximizing
the quality of service.
6. In science, large amounts of data in physics, astronomy, and biology can only be analyzed fast
enough by computers. The World Wide Web is huge; it is constantly growing and searching for
relevant information cannot be done manually.
7. In artificial intelligence, it is used to teach a system to learn and adapt to changes so that the
system designer need not foresee and provide solutions for all possible situations.
8. It is used to find solutions to many problems in vision, speech recognition, and robotics.
10. Machine learning methods have been used to develop programmes for playing games
such as chess, backgammon and Go.
• Sports predictions
• Stock predictions
• Sentiment analysis
• Drug design
• Medical diagnoses
• ...
Supervised Learning
Supervised learning is the subcategory of machine learning that focuses on learning a
classification (Figure 4), or regression model (Figure 5), that is, learning from labeled
training data (i.e., inputs that also contain the desired outputs or targets; basically,
“examples” of what we want to predict).
Figure 4: Illustration of a binary classification problem (plus and minus signs denote
class labels) with two feature variables (x1 and x2). (Source: Raschka & Mirjalili:
Python Machine Learning, 3rd Ed.).
Figure 5: Illustration of a linear regression model with one feature variable (x1) and the target
variable y. The dashed line indicates the functional form of the linear regression model. (Source:
Raschka & Mirjalili: Python Machine Learning, 3rd Ed.).
Supervised learning is the machine learning task of learning a function that maps an input to an
output based on example input-output pairs.
In supervised learning, each example in the training set is a pair consisting of an input object
(typically a vector) and an output value. A supervised learning algorithm analyzes the training
data and produces a function, which can be used for mapping new examples. In the optimal case,
the function will correctly determine the class labels for unseen instances. Both classification and
regression problems are supervised learning problems.
A wide range of supervised learning algorithms are available, each with its strengths and
weaknesses. There is no single learning algorithm that works best on all supervised learning
problems.
Supervised learning is so called because the process of an algorithm learning from the training
dataset can be thought of as a teacher supervising the learning process. We know the correct
answers (that is, the correct outputs); the algorithm iteratively makes predictions on the training
data and is corrected by the teacher. Learning stops when the algorithm achieves an acceptable
level of performance.
Example :
Consider the following data regarding patients entering a clinic. The data consists of the gender
and age of the patients and each patient is labelled as “healthy” or “sick”.
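The patient table itself is not reproduced in these notes, so the sketch below uses invented records to show what "learning a function from labelled examples" can look like. It assumes a 1-nearest-neighbour rule, which is only one of many supervised algorithms and is chosen here for brevity; the records, the `distance` helper and the gender penalty of 10 are all hypothetical.

```python
# Hypothetical labelled training data: (gender, age) -> label.
# These records are invented for illustration; they are not the clinic data from the notes.
training_data = [
    (("M", 48), "sick"),
    (("M", 67), "sick"),
    (("F", 53), "healthy"),
    (("M", 49), "healthy"),
    (("F", 34), "sick"),
    (("M", 21), "healthy"),
]


def distance(a, b):
    """Distance between two patients: age gap, plus an assumed penalty of 10 if genders differ."""
    gender_penalty = 0 if a[0] == b[0] else 10
    return abs(a[1] - b[1]) + gender_penalty


def predict(patient):
    """Return the label of the closest training example (1-nearest neighbour)."""
    nearest_input, nearest_label = min(
        training_data, key=lambda pair: distance(pair[0], patient)
    )
    return nearest_label


# The learned function can be applied to a patient that was not in the training set.
new_patient = ("M", 65)
print(predict(new_patient))
```

The training pairs play the role of the "teacher": every prediction is derived from examples whose correct output is already known.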
Unsupervised learning
In contrast to supervised learning, unsupervised learning is a branch of machine learning
that is concerned with unlabeled data. Common tasks in unsupervised learning are clustering
analysis (assigning group memberships; Figure 6) and dimensionality reduction
(compressing data onto a lower-dimensional subspace or manifold).
Figure 6: Illustration of clustering, where the dashed lines indicate potential group
membership assignments of unlabeled data points. (Source: Raschka & Mirjalili:
Python Machine Learning, 3rd Ed.).
Unsupervised learning is a machine learning technique in which models are not supervised
using a labeled training dataset. Instead, the models themselves find the hidden patterns and
insights in the given data. It can be compared to the learning that takes place in the human brain
while learning new things.
The most common unsupervised learning method is cluster analysis, which is used for
exploratory data analysis to find hidden patterns or grouping in data.
Example :
Consider the following data regarding patients entering a clinic. The data consists of the
gender and age of the patients.
Based on this data, can we infer anything regarding the patients entering the clinic?
Example: Suppose the unsupervised learning algorithm is given an input dataset containing
images of different types of cats and dogs. The algorithm is given no labels for this dataset,
which means it has no prior idea which images show cats and which show dogs. The task of the
unsupervised learning algorithm is to identify the image features on its own. The unsupervised
learning algorithm performs this task by clustering the image dataset into groups
according to similarities between images.
o Unsupervised learning is helpful for finding useful insights from the data.
o Unsupervised learning is similar to how a human learns to think from their own
experiences, which brings it closer to real AI.
o Unsupervised learning works on unlabeled and uncategorized data, which is what makes
it important.
o In the real world, we do not always have input data with the corresponding output, so to
solve such cases, we need unsupervised learning.
Here, we have taken unlabeled input data, which means it is not categorized and the
corresponding outputs are not given. This unlabeled input data is fed to the machine
learning model in order to train it. First, the model interprets the raw data to find the hidden patterns
in the data, and then a suitable algorithm is applied, such as k-means clustering or hierarchical
clustering.
Once the suitable algorithm is applied, it divides the data objects into groups
according to the similarities and differences between the objects.
o Clustering: Clustering is a method of grouping objects into clusters such that objects
with the most similarities remain in one group and have few or no similarities with the objects
of another group. Cluster analysis finds the commonalities between the data objects and
categorizes them according to the presence or absence of those commonalities.
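As a rough illustration of grouping by similarity, here is a minimal k-means sketch in plain Python. The points are invented, and taking the first k points as the initial centroids is a simplifying assumption made for reproducibility; practical implementations usually initialise randomly and repeat the run several times.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; the first k points serve as initial centroids."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, clusters


# Hypothetical unlabeled data: two visually separate groups of 2-D points.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 1.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, 2)
print(centroids)
print(clusters)
```

No labels are supplied anywhere: the two groups emerge only from the distances between the points.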
Reinforcement learning
Reinforcement learning is the process of learning from rewards while performing a series of
actions.
In reinforcement learning, we do not tell the learner or agent (for example, a
robot), which action to take but merely assign a reward to each action and/or the
overall outcome. Instead of having “correct/false” labels for each step, the learner
must discover or learn a behavior that maximizes the reward for a series of actions.
In that sense, it is not a supervised setting.
RL is somewhat related to unsupervised learning; however, reinforcement learning
really is its own category of machine learning.
Reinforcement learning is the problem of getting an agent to act in the world so as to
maximize its rewards. A learner (the program) is not told what actions to take as in most
forms of machine learning, but instead must discover which actions yield the most
reward by trying them. In the most interesting and challenging cases, actions may affect
not only the immediate reward but also the next situations and, through that, all
subsequent rewards.
Figure 7: Illustration of reinforcement learning (Source: Raschka & Mirjalili: Python Machine
Learning, 3rd Ed.).
For example, consider teaching a dog a new trick: we cannot tell it what to do, but we can
reward/punish it if it does the right/wrong thing. It has to find out what it did that made it get
the reward/punishment. We can use a similar method to train computers to do many tasks,
such as playing backgammon or chess, scheduling jobs, and controlling robot limbs.
Reinforcement learning is different from supervised learning. Supervised learning is learning
from examples provided by a knowledgeable expert.
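The idea of discovering good actions purely from rewards can be sketched with a very small example. The code below assumes the simplest possible setting, a single state with a fixed reward per action (often called a bandit problem), and an epsilon-greedy agent; the reward values, `run_bandit` and its parameters are hypothetical. It does not cover the harder case described above, where actions also change the next situation.

```python
import random


def run_bandit(rewards, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy environment with one state and fixed rewards."""
    rng = random.Random(seed)
    n_actions = len(rewards)
    value = [0.0] * n_actions   # the agent's running estimate of each action's reward
    count = [0] * n_actions     # how many times each action has been tried

    for step in range(steps):
        if step < n_actions:
            action = step                      # try every action once at the start
        elif rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: pick a random action
        else:
            action = max(range(n_actions), key=lambda a: value[a])  # exploit
        reward = rewards[action]               # the environment only returns a reward
        count[action] += 1
        # Incremental average of the rewards seen so far for this action.
        value[action] += (reward - value[action]) / count[action]

    return value


# Hypothetical rewards for three actions; the agent is never told which is best.
estimates = run_bandit(rewards=[0.2, 1.0, 0.5])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print(best_action)
```

The agent never sees a "correct" action; it only observes the reward that follows each choice and gradually prefers the action with the highest estimate.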
Semi-supervised learning
It can be described as a mix between supervised and unsupervised learning.
In semi-supervised learning tasks, some training examples contain outputs, but
some do not.
We then use the labeled training subset to label the unlabeled portion of the
training set, which we then also utilize for model training.
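The two steps just described (label the unlabeled portion, then train on everything) can be sketched as follows. This is a minimal illustration under stated assumptions: the one-dimensional data are invented, and a nearest-neighbour rule stands in for whatever model would actually be used.

```python
# Hypothetical 1-D training set: a few labelled examples and several unlabelled ones.
labeled = [(1.0, "A"), (1.5, "A"), (9.0, "B")]
unlabeled = [2.0, 8.0, 8.5, 2.5]


def nearest_label(x, examples):
    """Label of the labelled example whose feature value is closest to x."""
    return min(examples, key=lambda ex: abs(ex[0] - x))[1]


# Step 1: use the labelled subset to assign pseudo-labels to the unlabelled portion.
pseudo_labeled = [(x, nearest_label(x, labeled)) for x in unlabeled]

# Step 2: train on the union of the original and the pseudo-labelled examples.
# For a nearest-neighbour rule, "training" is simply storing the examples.
full_training_set = labeled + pseudo_labeled

print(pseudo_labeled)
print(nearest_label(3.0, full_training_set))
```

Pseudo-labels can be wrong, so in practice they are often accepted only when the model is confident; that filtering step is omitted here.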
There are two subtypes of AI: artificial general intelligence (AGI) and narrow AI.
AGI refers to an intelligence that equals humans in several tasks, i.e., multi-purpose
AI. In contrast, narrow AI is more narrowly focused on solving a particular
task that humans are traditionally good at (e.g., playing a game or driving a car; I
would not go so far as to refer to “image classification” as AI).
[13] David H. Wolpert. “The lack of a priori distinctions between learning
algorithms”. In: Neural Computation 8.7 (1996), pp. 1341–1390.
Figure 14: Relationship between machine learning, deep learning, and artificial intelligence. Note that there is
also overlap between Machine learning and data mining, data science, statistics, etc. (not shown).
REGRESSION
Linear regression:
The linear regression algorithm models a linear relationship between a dependent variable (y) and
one or more independent variables (x), hence the name linear regression.
The linear regression model provides a sloped straight line representing the relationship between
the variables. Consider the below image:
In simple terms, linear regression answers the question “How can I use X to predict Y?”,
where X is some information that you have and Y is some information that you want.
Let’s say you want to sell a house and you want to know how much you can sell it for.
The information you have about the house is your X, and the selling price that you want to
know is your Y.
Linear regression creates an equation in which you input your given numbers (X) and it
outputs the target variable that you want to find out (Y).
Linear Regression model representation
Linear regression is such a useful and established algorithm that it is both a statistical model
and a machine learning model. Linear regression tries to draw a best-fit line that is close to the
data by finding the slope and intercept.
The linear regression equation is
y = a + bx
In this equation:
y is the output variable. It is also called the target variable in machine learning, or the
dependent variable.
x is the input variable. It is also referred to as the feature in machine learning, or the
independent variable.
a is the constant (the intercept of the line).
b is the coefficient of x (the slope of the line).
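To make the equation concrete, the sketch below estimates a and b from data. It assumes the ordinary least-squares criterion for "best fit", which these notes do not name explicitly; the house data and the `fit_line` helper are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares estimates of intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # Intercept: the fitted line passes through the point (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data: house size (x) and selling price (y), in arbitrary units.
sizes = [1.0, 2.0, 3.0, 4.0, 5.0]
prices = [2.2, 4.1, 6.2, 7.9, 10.1]

a, b = fit_line(sizes, prices)
predicted_price = a + b * 6.0   # use the fitted equation on a new house
print(round(a, 2), round(b, 2), round(predicted_price, 2))
```

Once a and b are known, predicting the price of a new house is just a matter of substituting its size for x.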
The cost function measures the difference, or error, between the actual values and the
values predicted with the current parameters, and expresses it as a single real number.
Mean Squared Error represents the average of the squared difference between the original
and predicted values in the data set. It measures the variance of the residuals.
Root Mean Squared Error is the square root of Mean Squared error. It measures the
standard deviation of residuals.
The coefficient of determination or R-squared represents the proportion of the variance in the
dependent variable which is explained by the linear regression model. It is a scale-free score, i.e.
irrespective of the values being small or large, the value of R-squared will be at most one.
Evaluation Metrics
Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) penalize large prediction errors
more heavily than Mean Absolute Error (MAE) does. However, RMSE is more widely used than MSE to
evaluate the performance of a regression model against other models, as it has the same units as the
dependent variable (Y-axis).
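The metrics discussed so far can be computed in a few lines. The following is a small sketch using their standard definitions; the function name `regression_metrics` and the sample values are made up for illustration.

```python
import math


def regression_metrics(actual, predicted):
    """Return MSE, RMSE, MAE and R-squared for paired actual and predicted values."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mse = sum(r ** 2 for r in residuals) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(r) for r in residuals) / n
    mean_actual = sum(actual) / n
    total_ss = sum((a - mean_actual) ** 2 for a in actual)   # total variation in y
    residual_ss = sum(r ** 2 for r in residuals)             # variation left unexplained
    r_squared = 1 - residual_ss / total_ss
    return mse, rmse, mae, r_squared


# Hypothetical actual and predicted values.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]
mse, rmse, mae, r_squared = regression_metrics(actual, predicted)
print(mse, rmse, mae, r_squared)
```

Because the residuals are squared before averaging, a single large error raises MSE and RMSE much more than it raises MAE.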
R Squared and Adjusted R Squared are used for explaining how well the independent variables
in the linear regression model explain the variability in the dependent variable. The R Squared value
never decreases when independent variables are added, which might lead to the addition
of redundant variables to our model. The adjusted R-squared addresses this
problem by penalizing additional variables. For comparing the accuracy of different linear
regression models, RMSE is a better choice than R Squared.
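For reference, the usual textbook definitions of the two scores are given below. The symbols are assumptions introduced here, not notation from these notes: n is the number of observations, p the number of independent variables, y_i the actual values, \hat{y}_i the predicted values and \bar{y} the mean of the actual values.

```latex
\[
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
\]
```

Adding a variable increases p, so the adjusted score rises only if the gain in R-squared outweighs the larger penalty factor.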
Logistic Regression
o Logistic regression is one of the most popular Machine Learning algorithms, which comes
under the Supervised Learning technique. It is used for predicting the categorical dependent
variable using a given set of independent variables.
o Logistic regression predicts the output of a categorical dependent variable. Therefore the
outcome must be a categorical or discrete value. It can be either Yes or No, 0 or 1, true or
False, etc. but instead of giving the exact value as 0 and 1, it gives the probabilistic values
which lie between 0 and 1.
o Logistic Regression is very similar to Linear Regression except in how they are used.
Linear Regression is used for solving regression problems, whereas Logistic Regression is
used for solving classification problems.
o In Logistic Regression, instead of fitting a regression line, we fit an "S"-shaped logistic
function, whose output is bounded by two limiting values (0 and 1).
o The curve from the logistic function indicates the likelihood of something such as whether the
cells are cancerous or not, a mouse is obese or not based on its weight, etc.
o Logistic Regression is a significant machine learning algorithm because it has the ability to
provide probabilities and classify new data using continuous and discrete datasets.
o Logistic Regression can be used to classify the observations using different types of data and
can easily determine the most effective variables used for the classification. The below image
is showing the logistic function:
o In Logistic Regression y can be between 0 and 1 only, so let's divide the above
equation by (1-y):
o But we need a range from -infinity to +infinity, so we take the logarithm of the equation,
and it becomes:
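The equations that accompanied these steps are not reproduced in the notes. As a hedged sketch, assume the linear part has the form z = b0 + b1*x (the names b0, b1 and their values are hypothetical). The logistic function then maps z to a value y between 0 and 1, the odds y/(1-y) range from 0 to infinity, and the logarithm of the odds recovers z, which can take any real value.

```python
import math


def sigmoid(z):
    """Logistic ("S"-shaped) function: maps any real z to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def log_odds(y):
    """Logarithm of the odds y / (1 - y): maps a probability in (0, 1) back to the real line."""
    return math.log(y / (1.0 - y))


# Hypothetical coefficients of the linear part z = b0 + b1 * x.
b0, b1 = -4.0, 2.0

for x in [0.0, 1.0, 2.0, 3.0, 4.0]:
    z = b0 + b1 * x
    y = sigmoid(z)
    label = 1 if y >= 0.5 else 0   # assumed threshold of 0.5 to turn a probability into a class
    print(x, round(y, 3), label, round(log_odds(y), 3))
```

The last printed column matches z for every row, which shows that taking the log of the odds undoes the logistic function.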
Dependent Variable:
The dependent variable can have two or more possible outcomes/classes.
The dependent variable is nominal in nature, meaning there is no ordering among the target
classes, i.e. these classes cannot be meaningfully ordered.
The dependent variable to be predicted belongs to a limited, defined set of items.
Basic Steps
This is very nice and easy, but finding the best margin is an optimization problem that is not trivial (it is easy in 2D, when
we have only two attributes, but what if we have N dimensions, with N a very big number?).
Reference Textbooks:
Saikat Dutt, Subramanian Chandramouli, Amit Kumar Das, Machine Learning, Pearson.
Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar, Foundations of Machine Learning, MIT Press.
Kevin Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012.
Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning, Springer, 2009.
Jiawei Han and Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann.