
Chameli Devi Group of Institutions, Indore

Department of Computer Science and Engineering

Unit-I
CS601-Machine Learning

Syllabus
Unit –I
Introduction to machine learning, scope and limitations, regression,
probability, statistics and linear algebra for machine learning, convex
optimization, data visualization, hypothesis function and testing, data
distributions, data preprocessing, data augmentation, normalizing data sets,
machine learning models, supervised and unsupervised learning.

Course Outcomes:
CO601.1: The student will be able to explain the fundamental concepts of machine
learning and apply knowledge of computing and mathematics to machine learning
problems, models, and algorithms.

Introduction to Machine Learning

Concept of Machine Learning

Application Areas of Machine Learning

Scope and Limitations

Regression

• In a statistical context, "regression" refers to a technique used to model the
relationship between a dependent variable and one or more independent variables,
aiming to predict or understand the influence of the independent variables on the
dependent variable.

Key Concepts:
• Dependent Variable (Y): The variable you're trying to predict or explain.
• Independent Variable(s) (X): The variable(s) used to predict the dependent variable.
• Regression Model: A mathematical model that describes how the independent variables
influence the dependent variable.
• Prediction: Regression can be used to predict the value of the dependent variable
based on the values of the independent variables.
• Relationship: Regression helps to understand and quantify the relationship between
the dependent and independent variables.
Examples of Regression in Machine Learning:
• Predicting house prices: understanding how factors like size, location, and age
affect the price of a house.
• Predicting sales: understanding how factors like advertising spending, pricing, and
seasonality affect sales.

Common Types of Regression:
• Linear Regression: Aims to model the relationship between variables using a straight
line.
• Multiple Linear Regression: Uses multiple independent variables to predict the
dependent variable.
• Logistic Regression: Used when the dependent variable is categorical (e.g., yes/no,
success/failure).
• Polynomial Regression: Models non-linear relationships with polynomial terms.
Linear Regression

It works on continuous data.
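As a minimal sketch (not the slide's own example), a straight line can be fit to
made-up continuous data with numpy's least-squares polyfit:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # dependent variable (assumed values)

w, b = np.polyfit(x, y, deg=1)             # slope and intercept by least squares
print(f"y = {w:.2f}x + {b:.2f}")           # roughly y = 1.96x + 0.14
print(w * 6.0 + b)                         # predict y for a new x = 6.0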

A fitted regression line can be adjusted through two basic operations:
• Rotation (changing the slope of the line)
• Translation (shifting the line up or down, i.e. changing the intercept)



Probability

• Probability implies 'likelihood' or 'chance'.


• When an event is certain to happen then the probability of occurrence of
that event is 1 and when it is certain that the event cannot happen then
the probability of that event is 0.
• Hence the value of probability ranges from 0 to 1.
• Definition: if there are n exhaustive, mutually exclusive and equally likely
cases, out of which m cases are favorable to the happening of event A,
• then the probability of event A is given by the following probability function:

P(A) = m / n

Problem Statement:
A coin is tossed. What is the probability of getting a head?
Solution:
Total number of equally likely outcomes (n) = 2 (i.e. head or tail)
Number of outcomes favorable to head (m) = 1
Therefore, P(head) = m / n = 1/2 = 0.5

Statistics

Descriptive Statistics

Inferential Statistics

Inferential statistics allows us to make predictions, draw conclusions,
and generalize insights from a sample to an entire population.


Centrality Measures

Three kinds of centrality measures belong to descriptive statistics:

• The "mean" is the "average" you're used to, where you add up all the
numbers and then divide by the number of numbers.

• The "median" is the "middle" value in the list of numbers. To find the median,
your numbers have to be listed in numerical order from smallest to largest,
so you may have to rewrite your list before you can find the median.

• The "mode" is the value that occurs most often/frequent. If no number in the
list is repeated, then there is no mode for the list.

13, 18, 13, 14, 13, 16, 14, 21, 13

mean: 15
median: 14
mode: 13 (it occurs most often, four times)

1, 2, 4, 7

mean: 3.5
median: 3
mode: none

8, 9, 10, 10, 10, 11, 11, 11, 12, 13

mean: 10.5
median: 10.5
modes: 10 and 11
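These results can be verified with Python's statistics module:

import statistics

data = [13, 18, 13, 14, 13, 16, 14, 21, 13]
print(statistics.mean(data))        # 15
print(statistics.median(data))      # 14
print(statistics.mode(data))        # 13 (occurs four times)
print(statistics.multimode([8, 9, 10, 10, 10, 11, 11, 11, 12, 13]))  # [10, 11]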
Measures of Spread

Range is the difference between the minimum and maximum values of your dataset.
For example: MST marks of students vary from 8 (minimum) to 19 (maximum), so the
range is 11.

Range is easily distorted by outliers: if a single student appeared in the MST test
but scored 0 (zero) marks, the range would become 19 even though most marks are
unchanged.


Percentile is a comparison of a value to the others in a set of data.

The 80th percentile means that 80% of the scores are lower than yours and 20% of
the scores are higher than yours.

For example, if you are the fourth tallest person in a group of 20, then 16 people
(80%) are shorter than you, so you are at the 80th percentile.


Quartiles: first sort your data, then divide the sorted data into four equal parts
(Q1 to Q4) of 25% each. The boundaries fall at 25%, 50% (the median), 75% and 100%.

The inter-quartile range (IQR) spans the two middle quarters, i.e. the middle 50%
of the data, which is typically its most dense part.
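A quick numpy sketch of quartiles and the inter-quartile range on made-up marks:

import numpy as np

marks = np.array([8, 9, 10, 10, 10, 11, 11, 11, 12, 13, 15, 19])
q1, q2, q3 = np.percentile(marks, [25, 50, 75])
print(q1, q2, q3)    # 10.0, 11.0 (median), 12.25
print(q3 - q1)       # inter-quartile range: 2.25, the middle 50% of the data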


Mean Absolute Deviation (MAD): take the mean of the whole dataset, then average
the absolute distance of each data point from that mean.

Variance: the average of the squared distances of each data point from the mean
(note: this is not the square of the mean absolute deviation).

Standard Deviation: the square root of the variance.
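A small sketch computing these three measures on a made-up dataset:

import statistics

data = [4, 8, 6, 5, 3, 7]
mean = statistics.mean(data)                         # 5.5
mad = sum(abs(x - mean) for x in data) / len(data)   # mean absolute deviation: 1.5
var = statistics.pvariance(data)                     # population variance: about 2.92
std = statistics.pstdev(data)                        # standard deviation: about 1.71
print(mean, mad, var, std)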


Normal distribution

In machine learning, a "normal distribution" (also called a Gaussian distribution)
refers to a bell-shaped probability distribution where most data points cluster
around the mean value, with progressively fewer data points tapering off
symmetrically towards the extremes.

Key points about the normal distribution in machine learning:
• Bell-shaped curve: the visual representation of a normal distribution is a
symmetrical bell-shaped curve, with the highest point at the mean value.
• Mean and standard deviation: a normal distribution is completely defined by its
mean (the center of the distribution) and standard deviation (the spread of the
data).
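A small sampling sketch (the mean and standard deviation are assumptions for
illustration) showing the empirical bell-curve behavior:

import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(loc=170, scale=10, size=100_000)   # made-up heights: mean 170, std 10
print(samples.mean())    # close to 170
print(samples.std())     # close to 10
# About 68% of values fall within one standard deviation of the mean:
print(((samples > 160) & (samples < 180)).mean())        # roughly 0.68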


Linear Algebra for Machine Learning

• Linear Algebra is a branch of mathematics concerned with vectors, matrices, and
linear transforms; it lets you concisely describe coordinates and interactions of
planes in higher dimensions and perform operations on them.
• Although linear algebra is integral to the field of machine learning, the tight
relationship is often left unexplained or explained using abstract concepts such as
vector spaces or specific matrix operations.
• Linear Algebra is required:
• When working with data, such as tabular datasets and images.
• When working with data preparation, such as one-hot encoding and dimensionality
reduction.
• In the ingrained use of linear algebra notation and methods in sub-fields such as
deep learning, natural language processing, and recommender systems.
Applications of Linear Algebra in Machine Learning
Examples of linear algebra in machine learning:
1. Dataset and Data Files
2. Images and Photographs
3. Linear Regression
4. Regularization
5. Principal Component Analysis
6. Singular-Value Decomposition
7. Latent Semantic Analysis
8. Recommender Systems
9. Deep Learning
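
A small numpy sketch of the basic objects involved (values are illustrative):

import numpy as np

v = np.array([1.0, 2.0])                  # a vector (e.g. one data row)
A = np.array([[2.0, 0.0],
              [0.0, 3.0]])                # a matrix (a linear transform)
print(A @ v)                              # matrix-vector product: [2. 6.]
X = np.array([[1.0, 2.0], [3.0, 4.0]])    # a tiny tabular dataset
print(X.T @ X)                            # X^T X, as used in the normal equations of linear regression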

Convex Optimization
• Convex optimization in machine learning involves finding the best solution (minimum or maximum)
of a problem where the objective function is convex and all constraints are also convex.

Here's a breakdown of key aspects:


What it is:
• Convex optimization focuses on minimizing a convex objective function (or maximizing a concave one) subject
to convex constraints.
Why it's important:
• A key advantage is that any local optimum found by an optimization algorithm is also the global optimum,
ensuring you find the best possible solution.

Examples:
• Linear Programming: Where the objective function and constraints are linear.
• Support Vector Machines (SVMs)
• Least Squares Problems: Where the objective is to minimize the sum of squared errors.
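
As a minimal sketch of why convexity matters, gradient descent on the convex
function f(x) = (x - 3)^2 converges to its single global minimum from any start:

def f_grad(x):
    return 2 * (x - 3)       # derivative of (x - 3)^2

x = 0.0                      # arbitrary starting point
lr = 0.1                     # learning rate (assumed)
for _ in range(100):
    x -= lr * f_grad(x)      # step downhill along the gradient
print(x)                     # converges to 3.0, the global minimum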
Data Visualization
• Data visualization is an important skill in applied statistics and machine learning.
• Statistics does indeed focus on quantitative descriptions and estimations of data. Data visualization provides an
important suite of tools for gaining a qualitative understanding.
• This can be helpful when exploring and getting to know a dataset and can help with identifying patterns, corrupt
data, outliers, and much more.
• With a little domain knowledge, data visualizations can be used to express and demonstrate key relationships in
plots and charts that are more visceral to yourself and stakeholders than measures of association or significance.

There are four key plots that you need to know well for basic data visualization. They are:
• Line Plot
• Bar Chart
• Histogram Plot
• Scatter Plot
With knowledge of these plots, you can quickly get a qualitative understanding of most data
that you come across.

Line Plot
A line plot is generally used to present observations collected at regular intervals.
The x-axis represents the regular interval, such as time. The y-axis shows the
observations, ordered by the x-axis and connected by a line.
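
A minimal matplotlib sketch with made-up monthly values:

import matplotlib.pyplot as plt

months = range(1, 13)
sales = [5, 6, 8, 7, 9, 11, 12, 11, 10, 9, 7, 6]   # assumed data
plt.plot(months, sales)        # observations connected by a line
plt.xlabel("Month")
plt.ylabel("Sales")
plt.show()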

Bar Chart
A bar chart is generally used to present relative quantities for multiple categories.
A bar chart can be created by calling the bar() function and passing the category names
for the x-axis and the quantities for the y-axis.
Bar charts can be useful for comparing multiple point quantities or estimations.

Figure: 1.5 Bar Chart
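
A minimal sketch using the bar() function mentioned above (category names and
quantities are made up):

import matplotlib.pyplot as plt

categories = ["A", "B", "C"]
quantities = [10, 25, 17]
plt.bar(categories, quantities)   # category names on the x-axis, quantities on the y-axis
plt.show()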

Histogram Plot
A histogram plot is generally used to summarize the distribution of a data sample.
The x-axis represents discrete bins or intervals for the observations. For example,
observations with values between 1 and 10 may be split into five bins: the values [1,2]
would be allocated to the first bin, [3,4] to the second bin, and so on.
The y-axis represents the frequency or count of the number of observations in the
dataset that belong to each bin.
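
A minimal sketch of the five-bin example above with a made-up sample:

import matplotlib.pyplot as plt

values = [1, 2, 2, 3, 4, 4, 4, 5, 6, 6, 7, 8, 8, 9, 10]
plt.hist(values, bins=5)          # five bins over the range 1..10
plt.xlabel("Value bins")
plt.ylabel("Frequency")
plt.show()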

Scatter Plot
A scatter plot is generally used to summarize the relationship between two paired data samples.
Paired data samples means that two measures were recorded for a given observation, such as the
weight and height of a person.
The x-axis represents observation values for the first sample, and the y-axis represents the
observation values for the second sample. Each point on the plot represents a single observation.
Scatter plots are useful for showing the association or correlation between two variables.
A correlation can be quantified, and a line of best fit can be drawn as a line plot on the
same chart, making the relationship clearer.
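
A minimal sketch of paired samples (the heights and weights are made up):

import matplotlib.pyplot as plt

heights = [150, 160, 165, 170, 175, 180, 185]
weights = [50, 56, 61, 66, 72, 77, 83]
plt.scatter(heights, weights)     # each point is one observation
plt.xlabel("Height (cm)")
plt.ylabel("Weight (kg)")
plt.show()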

Hypothesis Function and Testing
Hypothesis testing is a statistical method used to make statistical decisions from
experimental data. A hypothesis is basically an assumption that we make about a population
parameter, and the equations used to represent the testing methods are called hypothesis
functions.

Example: say the average mark of students in a class is 40, or boys are taller than girls.

For drawing some inferences, we have to make some assumptions that lead to two terms that are used in the hypothesis
testing.

• Null hypothesis: the assumption that there is no anomaly or real effect, i.e. that any
observed pattern is consistent with the assumption made.

• Alternative hypothesis: contrary to the null hypothesis, it states that the observation is the result of a real effect.

Some widely used types of hypothesis tests:

1. T-test (Student's t-test)
2. Z-test
3. ANOVA test
4. Chi-square test
Hypothesis Testing
• All hypotheses are tested using a four-step process:
• The first step is for the analyst to state the two hypotheses so that only one can
be right.

• The next step is to formulate an analysis plan, which outlines how the data will
be evaluated.

• The third step is to carry out the plan and analyze the sample data.

• The fourth and final step is to analyze the results and either reject the null
hypothesis, or state that the null hypothesis is plausible, given the data.

• If, for example, a person wants to test that a coin has exactly a 50% chance of
landing on heads, the null hypothesis would be yes (it does), and the alternative
hypothesis would be no (it does not land on heads 50% of the time). Mathematically,
the null hypothesis would be represented as H0: P = 0.5.

• The alternative hypothesis would be denoted as "Ha" and be identical to the null
hypothesis, except with the equal sign struck-through, meaning that it does not
equal 50%.

• A random sample of 100 coin flips is taken, and the null hypothesis is then tested.
• If it is found that the 100 coin flips were distributed as 40 heads and 60 tails, the
analyst would assume that a coin does not have a 50% chance of landing on heads
and would reject the null hypothesis and accept the alternative hypothesis.
• If, on the other hand, there were 48 heads and 52 tails, then it is plausible that the
coin could be fair and still produce such a result.
• In cases such as this where the null hypothesis is "accepted," the analyst states
that the difference between the expected results (50 heads and 50 tails) and the
observed results (48 heads and 52 tails) is "explainable by chance alone."
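
As a sketch, this decision can be checked with a simple two-sided z-test for a
proportion (one of several possible tests; the numbers come from the example above):

import math

n, heads, p0 = 100, 40, 0.5
p_hat = heads / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)   # z = -2.0 for 40 heads in 100 flips
phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))      # standard normal CDF at z
p_value = 2 * min(phi, 1 - phi)                   # two-sided p-value, about 0.0455
print(z, p_value)                                  # p < 0.05, so reject the null hypothesis
# With 48 heads instead, z = -0.4 and p is about 0.69:
# the difference is "explainable by chance alone".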

Decision Errors
• Type I error: A Type I error occurs when the researcher rejects a null hypothesis
when it is true. The probability of committing a Type I error is called the
significance level. This probability is also called alpha, and is often denoted by α.

• Type II error: A Type II error occurs when the researcher fails to reject a null
hypothesis that is false. The probability of committing a Type II error is called
Beta, and is often denoted by β. The probability of not committing a Type II error
is called the Power of the test.

Decision Rules
• P-value: The strength of evidence in support of a null hypothesis is measured by
the P-value.
• Suppose the test statistic is equal to S. The P-value is the probability of observing a
test statistic as extreme as S, assuming the null hypothesis is true.
• If the P-value is less than the significance level, we reject the null hypothesis.
• Region of acceptance: It is a range of values. If the test statistic falls within the
region of acceptance, the null hypothesis is not rejected.
• The region of acceptance is defined so that the chance of making a Type I error is
equal to the significance level.

Hypothesis Testing Scenario

Data Distributions
• From a practical perspective, we can think of a distribution as a function that
describes the relationship between observations in a sample space.
• For example, we may be interested in the age of humans, with individual ages
representing observations in the domain, and ages 0 to 125 defining the extent of the
sample space. The distribution is a mathematical function that describes the
relationship of observations of different ages.
• A distribution is simply a collection of data, or scores, on a variable. Usually, these
scores are arranged in order from smallest to largest and then they can be
presented graphically.
• Once a distribution function is known, it can be used as a shorthand for describing
and calculating related quantities, such as likelihoods of observations, and plotting
the relationship between observations in the domain.

• Datasets are composed of two main types of data: numerical (e.g. integers, floats)
and categorical (e.g. names, laptop brands).
• Numerical data can additionally be divided into two other categories: discrete and
continuous. Discrete data can take only certain values (e.g. the number of students
in a school) while continuous data can take any real or fractional value (e.g. heights
and weights).
• From discrete random variables it is possible to calculate Probability Mass
Functions, while from continuous random variables Probability Density Functions can
be derived.
• The Probability Density Function (PDF) is the probability function representing the
density of a continuous random variable lying within a certain range of values. It is
also called a probability distribution function or just a probability function.

Probability Distributions

Bernoulli Distribution
• The Bernoulli distribution is one of the easiest distributions to understand.
• This distribution has only two possible outcomes and a single trial.
• A simple example is a single toss of a biased/unbiased coin: the probability that the
outcome is heads can be considered equal to p, and (1 - p) for tails (the probabilities
of mutually exclusive events that encompass all possible outcomes need to sum to one).

Uniform Distribution
• The Uniform Distribution can be easily derived from the Bernoulli Distribution. In this
case, a possibly unlimited number of outcomes is allowed and all the events have the
same probability of taking place.
• As an example, imagine the roll of a fair die. In this case, there are multiple possible
events, each with the same probability of happening.
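
A small simulation sketch of the fair-die example:

import random

random.seed(0)
rolls = [random.randint(1, 6) for _ in range(60_000)]   # uniform over faces 1..6
for face in range(1, 7):
    print(face, rolls.count(face) / len(rolls))          # each close to 1/6, about 0.167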

Binomial Distribution
• The Binomial Distribution can instead be thought of as the sum of the outcomes of events
following a Bernoulli distribution.
• The Binomial Distribution is therefore used for binary-outcome events where the
probability of success is the same in all the successive trials. This distribution takes
two parameters as inputs: the number of times an event takes place and the probability
assigned to one of the two classes.
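
A minimal simulation sketch of this idea, summing Bernoulli trials to get one
Binomial sample (the parameter values are illustrative):

import random

random.seed(42)
p, n_trials = 0.5, 10                     # success probability and number of trials

def binomial_sample():
    # each random.random() < p is one Bernoulli trial
    return sum(random.random() < p for _ in range(n_trials))

samples = [binomial_sample() for _ in range(100_000)]
print(sum(samples) / len(samples))        # close to n_trials * p = 5.0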

Normal (Gaussian) Distribution
• Many common phenomena in our daily life follow a Normal Distribution, such as: income
distribution in the economy, students' average grades, the average height in
populations, etc.
• In addition, the sum of many small random variables also usually turns out to follow
a normal distribution.
• The curve is symmetric at the centre, so the mean, mode and median are all equal to
the same value, and the values are distributed symmetrically around the mean.

Poisson Distribution
• Poisson Distributions are commonly used to find the probability that an event will or
will not happen, knowing how often it usually occurs. Additionally, Poisson
Distributions can be used to predict how many times an event might occur in a given
time period.
• Poisson Distributions are for example frequently used by insurance companies to
conduct risk analysis (e.g. predict the number of car crash accidents within a
predefined time span) to decide car insurance pricing.
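
A small sampling sketch (the accident rate is an assumption for illustration):

import numpy as np

rng = np.random.default_rng(0)
accidents = rng.poisson(lam=3, size=100_000)   # assumed: 3 accidents per month on average
print(accidents.mean())                         # close to 3
print((accidents == 0).mean())                  # P(no accident in a month), about e^-3 = 0.05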

Exponential Distribution
• Exponential Distribution is used to model the time taken between the occurrence of
different events.
• As an example, imagine we work at a restaurant and want to predict the time interval
between different customers coming to the restaurant. An Exponential Distribution is a
natural starting point for this type of problem.
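
A small simulation sketch (the arrival rate is an assumption for illustration):

import random

random.seed(1)
rate = 2.0                                   # assumed: 2 customers per unit time on average
gaps = [random.expovariate(rate) for _ in range(100_000)]
print(sum(gaps) / len(gaps))                 # average gap close to 1 / rate = 0.5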

Data Pre-processing
• Pre-processing refers to the transformations applied to our data before feeding it
to the algorithm.
• Data pre-processing is a technique used to convert raw data into a clean data set.
Whenever data is gathered from different sources, it is collected in a raw format that
is not suitable for analysis.

Need of Data Pre-processing
• To achieve better results from the applied model in machine learning projects, the
data has to be in a proper format. Some machine learning models need information in a
specified format; for example, the Random Forest algorithm does not support null
values, so to execute it null values have to be managed in the original raw data set
(as sketched below).
• Another aspect is that the data set should be formatted so that more than one machine
learning or deep learning algorithm can be executed on it, and the best of them chosen.
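
A minimal pandas sketch of managing null values before modeling (the column names
and values are made up):

import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 31, 40],
                   "salary": [50_000, 60_000, np.nan, 80_000]})
df_filled = df.fillna(df.mean())     # replace nulls with each column's mean
df_dropped = df.dropna()             # or drop rows containing nulls entirely
print(df_filled)
print(df_dropped)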

Data Augmentation
Data augmentation is the process of increasing the amount and diversity of data. We do not
collect new data; rather, we transform the data already present. For images, for instance,
there are various ways to transform and augment the data.

Need for data augmentation

Data augmentation is an integral process in deep learning, where we need large amounts of
data and in some cases it is not feasible to collect thousands or millions of images, so
data augmentation comes to the rescue. It helps us increase the size of the dataset and
introduce variability into it.
The most commonly used operations are:
1. Rotation
2. Shearing
3. Zooming
4. Cropping
5. Flipping
6. Changing the brightness level
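
A minimal sketch of these operations with the Pillow library (the filename
"sample.jpg" is an assumption for illustration):

from PIL import Image, ImageEnhance

img = Image.open("sample.jpg")
rotated = img.rotate(15)                                          # 1. rotation
sheared = img.transform(img.size, Image.AFFINE, (1, 0.2, 0, 0, 1, 0))  # 2. shearing
zoomed = img.resize((img.width * 2, img.height * 2))              # 3. zooming (upscale)
cropped = img.crop((10, 10, 100, 100))                            # 4. cropping
flipped = img.transpose(Image.FLIP_LEFT_RIGHT)                    # 5. flipping
brighter = ImageEnhance.Brightness(img).enhance(1.3)              # 6. brightness change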
Normalizing Data Sets
Normalization is a technique often applied as part of data preparation for machine learning.
The goal of normalization is to change the values of numeric columns in the dataset to a
common scale, without distorting differences in the ranges of values or losing information.
Not every dataset requires normalization; it is needed only when features have different
ranges. Transforming features to a similar scale improves the performance and training
stability of the model, and some algorithms require it to model the data correctly.
Four common normalization techniques may be useful:
• Scaling to a range
• Clipping
• Log scaling
• Z-score
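
A minimal sketch of two of these techniques, scaling to a range and z-score, on a
made-up feature column:

import statistics

values = [2.0, 4.0, 6.0, 8.0, 10.0]

# Scaling to a range [0, 1] (min-max scaling):
lo, hi = min(values), max(values)
scaled = [(v - lo) / (hi - lo) for v in values]      # [0.0, 0.25, 0.5, 0.75, 1.0]

# Z-score: subtract the mean, divide by the standard deviation:
mu, sigma = statistics.mean(values), statistics.pstdev(values)
zscored = [(v - mu) / sigma for v in values]         # mean 0, standard deviation 1
print(scaled)
print(zscored)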

Machine Learning Models

Supervised machine learning

Supervised machine learning is a machine learning technique that uses labeled data to train models to
predict outcomes.

How it works:
• A model is trained using input data and desired output values.
• The model compares its predicted value to the actual value.
• The model updates its solution based on the difference between the predicted and actual values.
• The model repeats this process for each labeled example in the dataset.
• The model learns the mathematical relationship between the features and the label.
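
A minimal sketch of this loop for a single-feature model trained by gradient
descent (the data is made up):

features = [1.0, 2.0, 3.0, 4.0]
labels = [2.0, 4.0, 6.0, 8.0]        # true relationship: y = 2x
w, lr = 0.0, 0.05                    # weight and learning rate (assumed)

for epoch in range(200):
    for x, y in zip(features, labels):
        predicted = w * x            # the model's predicted value
        error = predicted - y        # compare prediction to the actual value
        w -= lr * error * x          # update based on the difference
print(w)                             # learns a weight close to 2.0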

Unsupervised Learning
Unsupervised learning is a machine learning technique that analyzes unlabeled
data to find patterns and relationships. It can help you understand your data
more deeply.

How does it work?


• Unsupervised learning doesn't require labeled data, making it faster to work with large datasets.
• It can reduce large amounts of data into simpler forms without losing essential patterns.
• It can uncover patterns and relationships in the data that were previously unknown.

When is unsupervised learning useful?


• When labeled data is limited, unsupervised learning is more flexible than supervised learning.
• It can detect new types of anomalies not seen before.
• It can help simplify the dataset, reduce computation time, reduce memory requirements, and
improve the performance of machine learning algorithms.

Numerical Example Based on K-Means Clustering

K-Means steps (say K = 3; see the sketch below):
1. Decide the number of clusters, K.
2. Initialize the centroids.
3. Assign each data point to the cluster of its nearest centroid.
4. Recompute the centroids and repeat step 3 until the groups become static, i.e. no
point changes its group and no centroid moves.
5. Finish.
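
A minimal 1-D K-Means sketch following these steps (the points are made up):

import random

random.seed(0)
points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0, 20.0, 21.0, 22.0]
k = 3
centroids = random.sample(points, k)               # step 2: initialize centroids

for _ in range(100):                               # repeat until assignments settle
    clusters = [[] for _ in range(k)]
    for p in points:                               # step 3: assign to nearest centroid
        nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
        clusters[nearest].append(p)
    new_centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]   # step 4: recompute centroids
    if new_centroids == centroids:                 # no centroid moved: finished
        break
    centroids = new_centroids
print(sorted(centroids))                           # roughly [1.5, 8.5, 21.0]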

Figure: Initial clusters before K-Means

Figure: Final clusters after K-Means
Reinforcement Learning

Reinforcement Learning (RL) is a machine learning paradigm where an agent learns to make
decisions by interacting with an environment, receiving rewards or punishments for its
actions, and iteratively optimizing its behavior to maximize cumulative rewards.

Key Components:
• Agent: the entity that interacts with the environment and makes decisions.
• Environment: the world or system in which the agent operates.
• State: the current situation or condition of the environment.
• Action: the possible moves or choices the agent can make in a given state.