UNIT - I
Machine Learning
Machine learning is a branch of artificial intelligence that enables algorithms to uncover hidden patterns
within datasets, allowing them to make predictions on new, similar data without explicit programming for
each task. Traditional machine learning combines data with statistical tools to predict outputs, yielding
actionable insights. This technology finds applications in diverse fields such as image and speech
recognition, natural language processing, recommendation systems, fraud detection, portfolio
optimization, and automating tasks.
For instance, recommender systems use historical data to personalize suggestions. Netflix, for example,
employs collaborative and content-based filtering to recommend movies and TV shows based on user
viewing history, ratings, and genre preferences. Reinforcement learning further enhances these systems
by enabling agents to make decisions based on environmental feedback, continually refining
recommendations.
Traditional programming is totally dependent on the intelligence of developers, so it has very
limited capability. AI sometimes uses a combination of both data and pre-defined rules, which
gives it a great edge in solving complex tasks with good accuracy that would seem impossible to
humans. ML, in turn, can find patterns and insights in large datasets that might be difficult for
humans to discover.
The lifecycle of a machine learning project involves a series of steps that include:
1. Study the Problem:
The first step is to study the problem. This step involves understanding the business problem and
defining the objectives of the model.
2. Data Collection:
When the problem is well-defined, we can collect the relevant data required for the model. The data
could come from various sources such as databases, APIs, or web scraping.
3. Data Preparation:
Once the problem-related data is collected, it should be checked thoroughly and converted into
the desired format so that the model can use it to find the hidden patterns. This can be done in
the following steps:
● Data cleaning
● Data Transformation
● Exploratory Data Analysis and Feature Engineering
● Split the dataset for training and testing.
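The last preparation step can be sketched in code. Below is a minimal, illustrative split helper; the 80/20 ratio, the fixed seed, and the `split_dataset` name are arbitrary choices for the example, not a fixed rule:

```python
import random

def split_dataset(data, train_fraction=0.8, seed=42):
    """Shuffle the records and split them into train and test subsets."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = data[:]          # work on a copy so the original order is kept
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

records = list(range(100))      # stand-in for 100 preprocessed rows
train, test = split_dataset(records)
print(len(train), len(test))    # 80 20
```

In practice a library routine (such as a dedicated train/test-split utility) would be used, but the idea is the same: shuffle first, then cut at the chosen fraction.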
4. Model Selection:
The next step is to select the appropriate machine learning algorithm that is suitable for our problem.
This step requires knowledge of the strengths and weaknesses of different algorithms. Sometimes we
use multiple models and compare their results and select the best model as per our requirements.
5. Model building and Training:
● After selecting the algorithm, we have to build the model.
● In the case of traditional machine learning, building the model is easy; it is just a few
hyperparameter tunings.
● In the case of deep learning, we have to define layer-wise architecture along with input and output
size, number of nodes in each layer, loss function, gradient descent optimizer, etc.
● After that, the model is trained using the preprocessed dataset.
6. Model Evaluation:
Once the model is trained, it can be evaluated on the test dataset to determine its accuracy and
performance using different techniques such as the classification report, F1 score, precision,
recall, ROC curve, mean squared error, mean absolute error, etc.
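As a small illustration of the metrics named above, precision, recall, and the F1 score can be computed directly from true/false positive and negative counts; the label vectors below are made up for the example:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for a single positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp)              # of predicted positives, how many were right
    recall = tp / (tp + fn)                 # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return precision, recall, f1

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
p, r, f = precision_recall_f1(y_true, y_pred)
print(p, r, f)   # 0.75 0.75 0.75
```

Libraries such as scikit-learn provide these metrics ready-made; the point here is only how they are derived from the confusion counts.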
7. Model Tuning:
Based on the evaluation results, the model may need to be tuned or optimized to improve its
performance. This involves tweaking the hyperparameters of the model.
8. Deployment:
Once the model is trained and tuned, it can be deployed in a production environment to make
predictions on new data. This step requires integrating the model into an existing software system or
creating a new system for the model.
9. Monitoring and Maintenance:
Finally, it is essential to monitor the model’s performance in the production environment and
perform maintenance tasks as required. This involves monitoring for data drift, retraining the model
as needed, and updating the model as new data becomes available.
Probability measures the likelihood of an event’s occurrence. In situations where the outcome of
an event is uncertain, we discuss the probability of specific outcomes to understand their chances
of happening. The study of events influenced by probability falls under the domain of statistics.
Random Experiment
In probability theory, any experiment that can be repeated multiple times without its outcome
being affected by the repetition is called a Random Experiment. Tossing a coin, rolling dice, etc.
are random experiments.
Sample Space
The set of all possible outcomes for any random experiment is called sample space. For example,
throwing dice results in six outcomes, which are 1, 2, 3, 4, 5, and 6. Thus, its sample space is {1,
2, 3, 4, 5, 6}.
Event
An outcome or a set of outcomes of an experiment is called an event. Various types of events used in probability
theory are,
Independent Events: The events whose outcomes are not affected by the outcomes of other
future and/or past events are called independent events. For example, the output of tossing a
coin in repetition is not affected by its previous outcome.
Dependent Events: The events whose outcomes are affected by the outcome of other events
are called dependent events. For example, picking oranges from a bag that contains 100
oranges without replacement.
Mutually Exclusive Events: The events that can not occur simultaneously are called mutually
exclusive events. For example, obtaining a head or a tail in tossing a coin, because both (head
and tail) can not be obtained together.
Equally likely Events: The events that have an equal chance or probability of happening are
known as equally likely events. For example, observing any face in rolling dice has an equal
probability of 1/6.
Random Variable
A variable that can assume the value of all possible outcomes of an experiment is called a
random variable in Probability Theory. Random variables in probability theory are of two types
which are discussed below,
Variables that can take countable values such as 0, 1, 2,… are called discrete random variables.
Variables that can take an infinite number of values in a given range are called continuous
random variables.
There are various formulas that are used in probability theory and some of them are discussed
below,
Bayes’ Theorem
Bayes’ Theorem is used to determine the conditional probability of an event. It is named after
the English statistician Thomas Bayes, whose formula was published posthumously in 1763.
Bayes’ Theorem is a very important theorem in mathematics that laid the foundation of a unique
statistical inference
approach called the Bayes’ inference. It is used to find the probability of an event, based on
prior knowledge of conditions that might be related to that event.
For example, if we want to find the probability that a white marble drawn at random came from
the first bag, given that a white marble has already been drawn, and there are three bags each
containing some white and black marbles, then we can use Bayes’ Theorem.
Bayes theorem (also known as the Bayes Rule or Bayes Law) is used to determine the
conditional probability of event A when event B has already occurred.
The general statement of Bayes’ theorem is: “The conditional probability of an event A, given
the occurrence of another event B, is equal to the product of the probability of B given A and the
probability of A, divided by the probability of event B,” i.e.
P(A|B) = [P(B|A) × P(A)] / P(B)
where,
P(A) and P(B) are the probabilities of events A and B, and P(B) is never equal to zero,
P(A|B) is the probability of event A when event B happens,
P(B|A) is the probability of event B when event A happens.
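The three-bag marble scenario mentioned earlier can be worked through numerically. The bag contents below are invented purely for illustration (the original text does not specify them):

```python
# Hypothetical contents: (white, black) marbles in each of three bags.
bags = [(3, 1), (2, 2), (1, 3)]
prior = 1 / 3                      # each bag is equally likely to be chosen

# Total probability: P(white) = sum over bags of P(bag) * P(white | bag)
p_white = sum(prior * w / (w + b) for w, b in bags)

# Bayes' theorem: P(bag1 | white) = P(white | bag1) * P(bag1) / P(white)
p_white_given_bag1 = bags[0][0] / sum(bags[0])
p_bag1_given_white = p_white_given_bag1 * prior / p_white
print(round(p_bag1_given_white, 3))    # 0.5
```

With these assumed contents, P(white) = (1/3)(3/4 + 2/4 + 1/4) = 1/2, so the posterior for bag 1 is (3/4 · 1/3) / (1/2) = 1/2.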
Discrete Random Variable
Understanding discrete random variables is essential for making informed decisions in various
fields, such as finance, engineering, and healthcare.
In probability theory, a discrete random variable is a type of random variable that can take on a
finite or countable number of distinct values. These values are often represented by integers or
whole numbers, other than this they can also be represented by other discrete values.
For example, the number of heads obtained after flipping a coin three times is a discrete random
variable. The possible values of this variable are 0, 1, 2, or 3.
A very basic and fundamental example that comes to mind when talking about discrete random
variables is the rolling of an unbiased standard die. An unbiased standard die is a die that has six
faces and equal chances of any face coming on top. Considering we perform this experiment, it is
pretty clear that there are only six outcomes for our experiment.
Thus, our random variable can take any of the following discrete values from 1 to 6.
Mathematically the collection of values that a random variable takes is denoted as a set. In this
case, let the random variable be X.
Thus, X = {1, 2, 3, 4, 5, 6}
Another popular example of a discrete random variable is the number of heads when tossing
two coins. In this case, the random variable X can take only one of three values, i.e., 0, 1,
and 2.
Other than these examples, there are various other examples of discrete random variables, such
as the number of defective items in a batch, the number of customers arriving in an hour, or the
number of calls received by a call center in a day.
The probability distribution of a discrete random variable is described by its probability mass
function (PMF), which assigns a probability to each possible value of the variable. The key
properties of a PMF are:
● Every probability lies between 0 and 1: 0 ≤ P(X = x) ≤ 1.
● The probabilities over all possible values sum to 1: ∑P(X = x) = 1.
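For the two-coin example above (X = number of heads), a quick sketch checking that the PMF is valid, i.e. that every probability lies in [0, 1] and that they sum to 1:

```python
from fractions import Fraction

# PMF of X = number of heads in two fair coin tosses
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Property 1: every probability lies in [0, 1]
assert all(0 <= p <= 1 for p in pmf.values())
# Property 2: the probabilities sum to 1
assert sum(pmf.values()) == 1
print(sum(pmf.values()))   # 1
```

Using `Fraction` keeps the arithmetic exact, so the sum-to-one check holds without floating-point tolerance.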
Consider a generalized experiment rather than some particular experiment. Suppose that in your
experiment, the outcome can take values in some interval (a, b). That means every single point
in the interval can occur as an outcome when you perform the experiment. Hence, the set of
possible values is not a collection of discrete values but rather an interval.
Random Variable
A random variable is a fundamental concept in statistics that bridges the gap between theoretical
probability and real-world data. A random variable in statistics is a function that assigns a real
value to an outcome in the sample space of a random experiment. For example, if you roll a die,
you can assign a number to each possible outcome.
A random variable is considered a discrete random variable when it takes specific, or distinct
values within an interval. Conversely, if it takes a continuous range of values, then it is classified
as a continuous random variable.
Random variable in statistics is a variable whose possible values are numerical outcomes of a
random phenomenon. It is a function that assigns a real number to each outcome in the sample
space of a random experiment.
We define a random variable as a function that maps from the sample space of an experiment
to the real numbers. Mathematically, Random Variable is expressed as,
X: S → R
where,
S is the sample space of the experiment, and
R is the set of real numbers.
Example: If two unbiased coins are tossed, find the random variable associated with that event.
Solution:
Let X be the number of heads obtained. Then X = {0, 1, 2}, where m = 3.
In general, if a discrete random variable X takes the values x1, x2, …, xm with probabilities
p1, p2, …, pm, then
P(X = xi) = pi, where 1 ≤ i ≤ m
0 ≤ pi ≤ 1, where 1 ≤ i ≤ m
p1 + p2 + p3 + … + pm = 1, or we can say 0 ≤ pi ≤ 1 and ∑pi = 1
For example,
Suppose a dice is thrown (X = outcome of the dice). Here, the sample space S = {1, 2, 3, 4, 5, 6}.
The output of the function will be:
P(X=1) = 1/6
P(X=2) = 1/6
P(X=3) = 1/6
P(X=4) = 1/6
P(X=5) = 1/6
P(X=6) = 1/6
These probabilities satisfy 0 ≤ pi ≤ 1 and ∑pi = 1, where the sum is taken over all possible
values of x.
Example: The probability distribution of a random variable X is given below. Find P(X = 0).

xi          0     1     2
P(X = xi)   P1    0.3   0.5

Solution:
Since the probabilities must sum to 1,
P1 + 0.3 + 0.5 = 1
P1 = 0.2
Then, P(X = 0) is 0.2
Continuous Random Variable takes on an infinite number of values. The probability function
associated with it is called the PDF (Probability Density Function).

Example: Find the value of k such that the following is a valid PDF, and then find P(1 ≤ X ≤ 2):
f(x) = kx^3 for 0 ≤ x ≤ 3, and f(x) = 0 otherwise.

Solution:
For a valid PDF, the total probability must equal 1:
∫ f(x) dx = 1 over [0, 3]
∫ kx^3 dx = k[x^4/4] from 0 to 3 = 1
k[3^4 - 0^4]/4 = 1
k(81/4) = 1
k = 4/81
Thus,
P(1 ≤ X ≤ 2) = ∫ (4/81)x^3 dx from 1 to 2 = (4/81) × [16 - 1]/4
P(1 ≤ X ≤ 2) = 15/81
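The value k = 4/81 and the interval probability 15/81 can be verified numerically with a simple midpoint Riemann sum; the step count is an arbitrary choice for the sketch:

```python
def integrate(f, a, b, steps=100_000):
    """Approximate the definite integral of f over [a, b] (midpoint rule)."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

k = 4 / 81

def pdf(x):
    return k * x**3   # the density from the worked example, valid on [0, 3]

total = integrate(pdf, 0, 3)       # should be approximately 1
p_1_to_2 = integrate(pdf, 1, 2)    # should be approximately 15/81
print(round(total, 4), round(p_1_to_2, 4))   # 1.0 0.1852
```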
For any random variable X, where P is its respective probability, we define its mean as,
Mean(μ) = ∑X·P
where,
the sum runs over all values that X can take, each weighted by its probability P.
The variance of a random variable tells us how the random variable is spread about the mean
value of the random variable. Variance of Random Variable is calculated using the formula,
Var(X) = σ^2 = E(X^2) - [E(X)]^2
where,
E(X^2) = ∑X^2·P
E(X) = ∑X·P
For any random variable X, if it assumes the values x1, x2, …, xn, where the probability
corresponding to each value is P(x1), P(x2), …, P(xn), then the expected value of the
variable is,
E(X) = x1·P(x1) + x2·P(x2) + … + xn·P(xn) = ∑xi·P(xi)
Now, for any new random variable Y defined as a function of X, i.e. Y = g(X), the cumulative
distribution function of Y is,
FY(y) = P(Y ≤ y) = P(g(X) ≤ y)
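Applying the mean and variance formulas to the die-roll variable from earlier (X = outcome of a fair die), a small sketch using exact fractions:

```python
from fractions import Fraction

outcomes = range(1, 7)
p = Fraction(1, 6)                       # P(X = x) for every face of a fair die

mean = sum(x * p for x in outcomes)      # E(X)   = sum of x * P
e_x2 = sum(x**2 * p for x in outcomes)   # E(X^2) = sum of x^2 * P
var = e_x2 - mean**2                     # Var(X) = E(X^2) - [E(X)]^2

print(mean, var)                         # 7/2 35/12
```

So a fair die has mean 3.5 and variance 35/12 ≈ 2.92.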
1. Find the mean value for the continuous random variable, f(x) = x^2, 1 ≤ x ≤ 3
Solution:
Given, f(x) = x^2, 1 ≤ x ≤ 3
E(X) = ∫ x·f(x) dx = ∫ x^3 dx from 1 to 3 = [x^4/4] from 1 to 3
E(X) = (1/4){3^4 - 1^4} = (1/4){81 - 1}
E(X) = (1/4){80} = 20

2. Find the mean value for the continuous random variable, f(x) = e^x, 1 ≤ x ≤ 3
Solution:
Given, f(x) = e^x, 1 ≤ x ≤ 3
E(X) = ∫ x·f(x) dx = ∫ x·e^x dx from 1 to 3
E(X) = [x·e^x - e^x] from 1 to 3 = (3e^3 - e^3) - (e - e)
E(X) = 2e^3 ≈ 40.17

(In problems 3-10, f(x) does not integrate to 1 over the given interval, so the mean value is
computed as the normalized ratio ∫ x·f(x) dx / ∫ f(x) dx.)

3. Find the mean value for the continuous random variable, f(x) = x^2, 1 ≤ x ≤ 3
Solution:
f(x) = x^2, 1 ≤ x ≤ 3
Mean = [(1/4)x^4] from 1 to 3 / [(1/3)x^3] from 1 to 3
= 20 / (26/3)
= 30/13 ≈ 2.31

4. Find the mean value for the continuous random variable, f(x) = 2x + 1, 0 ≤ x ≤ 4
Solution:
f(x) = 2x + 1, 0 ≤ x ≤ 4
Mean = [(2/3)x^3 + (1/2)x^2] from 0 to 4 / [x^2 + x] from 0 to 4
= (152/3) / 20
= 38/15 ≈ 2.53

5. Find the mean value for the continuous random variable, f(x) = x^3, -1 ≤ x ≤ 2
Solution:
f(x) = x^3, -1 ≤ x ≤ 2
Mean = [(1/5)x^5] from -1 to 2 / [(1/4)x^4] from -1 to 2
= (33/5) / (15/4)
= 44/25 = 1.76

6. Find the mean value for the continuous random variable, f(x) = √x, 1 ≤ x ≤ 9
Solution:
f(x) = √x, 1 ≤ x ≤ 9
Mean = [(2/5)x^(5/2)] from 1 to 9 / [(2/3)x^(3/2)] from 1 to 9
= (484/5) / (52/3)
= 363/65 ≈ 5.58

7. Find the mean value for the continuous random variable, f(x) = 3x^2 - 2x, 0 ≤ x ≤ 3
Solution:
f(x) = 3x^2 - 2x, 0 ≤ x ≤ 3
Mean = [(3/4)x^4 - (2/3)x^3] from 0 to 3 / [x^3 - x^2] from 0 to 3
= (171/4) / 18
= 19/8 ≈ 2.38

8. Find the mean value for the continuous random variable, f(x) = sin(x), 0 ≤ x ≤ π
Solution:
f(x) = sin(x), 0 ≤ x ≤ π
Mean = [sin(x) - x·cos(x)] from 0 to π / [-cos(x)] from 0 to π
= π / 2
≈ 1.57

9. Find the mean value for the continuous random variable, f(x) = e^x, 0 ≤ x ≤ 2
Solution:
f(x) = e^x, 0 ≤ x ≤ 2
Mean = [x·e^x - e^x] from 0 to 2 / [e^x] from 0 to 2
= (e^2 + 1) / (e^2 - 1)
≈ 1.31

10. Find the mean value for the continuous random variable, f(x) = ln(x), 1 ≤ x ≤ e
Solution:
f(x) = ln(x), 1 ≤ x ≤ e
Mean = [(x^2/2)·ln(x) - x^2/4] from 1 to e / [x·ln(x) - x] from 1 to e
= ((e^2 + 1)/4) / 1
≈ 2.10
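These mean values are easy to double-check numerically. A small sketch using a midpoint Riemann sum, applied here to problem 3 (f(x) = x^2 on [1, 3], whose mean should be 30/13 ≈ 2.31):

```python
def integrate(f, a, b, steps=100_000):
    """Approximate the definite integral of f over [a, b] (midpoint rule)."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h

def mean_value(f, a, b):
    """Mean of x under the (unnormalized) weight f: ∫x·f(x)dx / ∫f(x)dx."""
    return integrate(lambda x: x * f(x), a, b) / integrate(f, a, b)

# Problem 3: f(x) = x^2 on [1, 3]  ->  20 / (26/3) = 30/13
m = mean_value(lambda x: x**2, 1, 3)
print(round(m, 4))   # 2.3077
```

The same `mean_value` call can be pointed at any of the other problems' densities and intervals to confirm those answers as well.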
Probability of an Interval: for a discrete random variable, the probability of an interval is the
sum of the probabilities of each value in the interval; for a continuous random variable, it is the
area under the PDF over the interval.
Types of Quantiles
Quantiles are cut points that divide sorted data into equal-sized groups: quartiles split the data
into 4 groups, deciles into 10, and percentiles into 100.
Let’s consider a dataset: [5, 10, 15, 20, 25, 30, 35, 40, 45, 50].
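For the dataset above, the quartiles can be computed with Python's standard library. Note that `statistics.quantiles` uses the "exclusive" interpolation method by default; other methods give slightly different cut points:

```python
import statistics

data = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]

quartiles = statistics.quantiles(data, n=4)  # three cut points: Q1, Q2 (median), Q3
median = statistics.median(data)

print(quartiles)   # [13.75, 27.5, 41.25]
print(median)      # 27.5
```

The middle cut point agrees with the median, as expected; passing `n=10` instead would return the nine decile cut points.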
The problem is that we don’t always know the full probability distribution for
a random variable. This is because we only use a small subset of observations
to derive the outcome. This problem is referred to as Probability Density
Estimation as we use only a random sample of observations to find the
general density of the whole sample space.
A PDF is a function that gives the probability of a random variable falling within a particular
range of values, rather than taking on any one exact value. It tells how likely the range of values
observed in a random sample is to match that of the whole sample space.
Density Estimation: It is the process of finding out the density of the whole population by
examining a random sample of data from that population. One of the best ways to achieve a
density estimate is by using a histogram plot. After fitting, the histograms of different random
samples should match the histogram plot of the whole population.
A normal distribution has two given parameters, mean and standard deviation.
We calculate the sample mean and standard deviation of the random sample
taken from this population to estimate the density of the random sample. The
reason it is termed as ‘parametric’ is due to the fact that the relation between
the observations and its probability can be different based on the values of the
two parameters.
Fig: PDF fitted over histogram plot with one peak value
In some cases, the PDF may not fit the random sample as it doesn’t follow a
normal distribution (i.e instead of one peak there are multiple peaks in the
graph). Here, instead of using distribution parameters like mean and standard
deviation, a particular algorithm is used to estimate the probability
distribution. Thus, it is known as a ‘nonparametric density estimation’.
NOTE: MLE assumes that every candidate PDF is a likely candidate for being the best-fitting
curve. Hence, it is a computationally expensive method.
Intuition:
Fig 1 : MLE Intuition
Fig 1 shows multiple attempts at fitting the PDF bell curve over the random
sample data. Red bell curves indicate poorly fitted PDF and the green bell
curve shows the best fitting PDF over the data. We obtained the optimum bell
curve by checking the values in Maximum Likelihood Estimate plot
corresponding to each PDF.
As observed in Fig 1, the red plots poorly fit the normal distribution, hence
their ‘likelihood estimate’ is also lower. The green PDF curve has the
maximum likelihood estimate as it fits the data perfectly. This is how the
maximum likelihood estimate method works.
Mathematics Involved
In the intuition, we discussed the role that the likelihood value plays in determining the
optimum PDF curve. Let us understand the math involved in the MLE method.
For a random sample x1, x2, …, xn drawn independently from a distribution with parameter θ,
the likelihood of observing the whole sample is the product of the individual densities:
L(θ) = f(x1; θ) · f(x2; θ) · … · f(xn; θ) = ∏ f(xi; θ)
MLE selects the value of θ that maximizes L(θ). This way, we can obtain the PDF curve that has
the maximum likelihood of fit over the random sample data.
Log Likelihood
Taking the log of the likelihood function gives the same maximizer as before, due to the
increasing nature of the log function. But it becomes less computationally expensive due to the
property of logarithms:
log(a · b) = log a + log b
so that
log L(θ) = log f(x1; θ) + log f(x2; θ) + … + log f(xn; θ) = ∑ log f(xi; θ)
Now, we can easily differentiate log L with respect to the parameter, set the derivative to zero,
and obtain the desired result.
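For a normal distribution, carrying out that differentiation in closed form gives the sample mean and the mean squared deviation (dividing by n, not n - 1) as the MLE parameter estimates. A small sketch with made-up observations:

```python
import math

sample = [4.9, 5.1, 5.0, 4.8, 5.2, 5.0]   # made-up observations

# Closed-form MLE for a normal distribution:
# mu_hat = sample mean; sigma2_hat = mean squared deviation (divide by n)
n = len(sample)
mu_hat = sum(sample) / n
sigma2_hat = sum((x - mu_hat) ** 2 for x in sample) / n

def log_likelihood(mu, sigma2):
    """Sum of log N(x | mu, sigma2) over the sample."""
    return sum(-0.5 * math.log(2 * math.pi * sigma2) - (x - mu) ** 2 / (2 * sigma2)
               for x in sample)

# The MLE estimate scores at least as high as any other candidate mean.
assert log_likelihood(mu_hat, sigma2_hat) >= log_likelihood(5.3, sigma2_hat)
print(round(mu_hat, 2))   # 5.0
```

This mirrors the figure's intuition: parameter values away from the MLE (the "red curves") always produce a lower log-likelihood.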
What is Covariance?
Covariance is a statistical measure that indicates the direction of the
linear relationship between two variables. It assesses how much two
variables change together from their mean values.
Types of Covariance:
● Positive Covariance: the variables tend to move in the same direction; when one increases,
the other tends to increase as well.
● Negative Covariance: the variables tend to move in opposite directions; when one increases,
the other tends to decrease.
Covariance is an essential tool for understanding how variables change together and is widely
used in various fields, including finance, economics, and science.
Covariance Formula
For Population:
Cov(X, Y) = ∑(xi - μX)(yi - μY) / N
For Sample:
Cov(X, Y) = ∑(xi - x̄)(yi - ȳ) / (n - 1)
where μX and μY are the population means, x̄ and ȳ are the sample means, N is the population
size, and n is the sample size.
What is Correlation?
Correlation is a standardized measure of the strength and direction of the
linear relationship between two variables. It is derived from covariance and
ranges between -1 and 1. Unlike covariance, which only indicates the
direction of the relationship, correlation provides a standardized measure.
The correlation coefficient ρ (rho) for variables X and Y is defined as the covariance of X and Y
divided by the product of their standard deviations. Key properties of correlation:
1. It shows whether and how strongly pairs of variables are related to each other.
2. Correlation takes values between -1 and +1, where values close to +1 represent a strong
positive correlation and values close to -1 represent a strong negative correlation.
3. It is a standardized (unitless) measure, so it is not affected by the scale of the variables.
4. It gives the direction and strength of the relationship between variables.
Correlation Formula
ρ = Cov(X, Y) / (σX · σY)
where σX and σY are the standard deviations of X and Y.
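A small worked example computing both sample covariance and correlation directly from their formulas; the two series are invented for illustration:

```python
import math

x = [2, 4, 6, 8, 10]
y = [1, 3, 7, 9, 15]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sample covariance: sum of co-deviations from the means, divided by (n - 1)
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Correlation: covariance divided by the product of the standard deviations
std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
corr = cov / (std_x * std_y)

print(cov)              # 17.0
print(round(corr, 3))   # 0.981
```

The covariance (17.0) says only that the variables move in the same direction; the correlation (≈ 0.98) additionally says the linear relationship is very strong, on a scale that does not depend on the units of x and y.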
Applications of Correlation