Unit-1-MLF-1

The document provides an overview of Machine Learning fundamentals, including key concepts such as supervised and unsupervised learning, model selection, and the importance of machine learning in various industries. It discusses essential terminology like data sets, features, and labels, as well as techniques for handling categorical data such as one-hot encoding. Additionally, it covers challenges like underfitting and overfitting, the bias-variance trade-off, and the model selection process.

Uploaded by VANSHIKA GADA
© All Rights Reserved

Machine Learning

Fundamentals
B.Tech (CSBS), B.Tech (CSDS-311)
V Sem

Dr. Shubha Puthran


Mumbai Campus
Machine Learning Fundamentals:
Terminology, Supervised and Unsupervised Learning, Underfitting,
Overfitting, Bias, Variance, Trade-off, Model Selection, Applications
Introduction
• Machine learning is a subset of artificial
intelligence that focuses on developing algorithms
and models that enable computers to learn and
make predictions or decisions without being
explicitly programmed
• Study of algorithms and statistical models that
allow computer systems to improve their
performance on a specific task through iterative
learning from data
Why is machine learning necessary?

• Learning is a hallmark of intelligence; many would argue that a
system that cannot learn is not intelligent.
• Without learning, everything is new; a system that cannot learn
is not efficient because it rederives each solution and
repeatedly makes the same mistakes.
Different Varieties of Machine Learning
• Concept Learning
• Clustering Algorithms
• Genetic Algorithms
• Reinforcement Learning
• Case-based Learning
• Discovery Systems
• Knowledge capture
Importance of machine learning in today's world
•Machine learning has revolutionized various industries and
sectors, including healthcare, finance, e-commerce, transportation,
and more.

•It enables us to extract valuable insights, for example predicting
customer behavior, detecting fraud, diagnosing diseases,
autonomous driving, and personalizing user experiences.

•In an era of data explosion, machine learning provides the tools to
harness the power of data and uncover hidden patterns, making it
a crucial technology for businesses and organizations to stay
competitive and innovative.

•Machine learning is transforming the way we live, work, and
interact with technology, and its impact will only continue to
grow in the future.
Key Terminology
Data Set
• A data set refers to a collection of individual
data points or examples used for training,
testing, and evaluation in machine learning
• It can include various types of data, such as
numerical, categorical, textual, or image data,
depending on the specific problem being
addressed
Feature:
• In machine learning, a feature refers to an individual measurable
property or characteristic of a data point or object
• Features are used as input variables or dimensions in machine
learning algorithms to make predictions or classifications
• Examples of features could include numerical values like age or
income, categorical variables like gender or location, or textual data
like a product description
Label:
• A label, also known as a target or output variable, is the value or
class that we want to predict or classify in a supervised learning
problem.
• In a labeled data set, each data point is associated with a
corresponding label that represents the ground truth or desired
outcome.
• For example, in a spam email classification task, the label can be
binary, indicating whether the email is spam (1) or not spam (0).
Titanic dataset Features and Label
One hot encoding
• These datasets often contain both categorical and numerical
columns. However, many machine learning models cannot work with
categorical data directly, so it must be converted into numerical
data before it can be fitted to the model.
• For example, suppose a dataset has a Gender column with
categorical values like Male and Female. These labels have no
specific order of preference, but because they are strings, a
machine learning model may misinterpret them as having some sort
of hierarchy.
One Hot Encoding
• One hot encoding is a technique that we use to represent categorical
variables as numerical values in a machine learning model.
The advantages of using one hot encoding include:
1.It allows the use of categorical variables in models that require numerical
input.
2.It can improve model performance by providing more information to the
model about the categorical variable.
3.It helps avoid spurious ordinality: encoding categories as integers
implies an ordering (e.g. "small" = 1, "medium" = 2, "large" = 3) that
the model may exploit even when no such order is intended.
• The disadvantages of using one hot encoding include:
1.It can lead to increased dimensionality, as a separate column is created for
each category in the variable. This can make the model more complex and
slow to train.
2.It can lead to sparse data, as most observations will have a value of 0 in
most of the one-hot encoded columns.
3.It can lead to overfitting, especially if there are many categories in the
variable and the sample size is relatively small.
In summary, one-hot encoding is a powerful technique for treating
categorical data, but it can lead to increased dimensionality, sparsity,
and overfitting. It is important to use it cautiously and to consider
other methods such as ordinal encoding or binary encoding.
One hot Encoding

Fruit     Categorical value of fruit    Price
apple     1                             5
mango     2                             10
apple     1                             15
orange    3                             20
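The encoding in the table above can be sketched in code. This is a minimal illustration assuming pandas is available; `get_dummies` replaces the Fruit column with one binary column per category:

```python
import pandas as pd

# The fruit table from the slide above
df = pd.DataFrame({
    "Fruit": ["apple", "mango", "apple", "orange"],
    "Price": [5, 10, 15, 20],
})

# One column per category; 1 marks the category present in that row
encoded = pd.get_dummies(df, columns=["Fruit"], dtype=int)
print(encoded)
```

Note how each row now has exactly one 1 among the Fruit_* columns, so no artificial ordering between apple, mango, and orange is introduced.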
Supervised learning and its characteristics:
• Supervised learning is a type of machine learning
where the model is trained on labeled examples,
consisting of input-output pairs
• The goal of supervised learning is to learn a
mapping or relationship between the input
features and their corresponding output labels
• During the training phase, the model learns from
the labeled data by adjusting its parameters to
minimize the difference between the predicted
outputs and the true labels
• The trained model can then make predictions or
classify new, unseen data based on the learned
patterns
Supervised Learning Examples
• Predicting House Prices:
• Given features like area, number of rooms,
location, etc., predict the price of a house
• The labeled data would consist of houses with
their corresponding prices
• Email Spam Classification:
• Given the content and features of an email,
classify it as either spam or non-spam
• The labeled data would consist of emails marked
as spam or non-spam
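The house-price example above can be sketched as a small regression. This assumes scikit-learn is available, and the areas, room counts, and prices are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical labeled data: [area in sq ft, number of rooms] -> price
X = np.array([[800, 2], [1000, 3], [1200, 3], [1500, 4]])
y = np.array([160_000, 200_000, 240_000, 300_000])

# Training adjusts the model's parameters to minimize the difference
# between predicted outputs and the true labels
model = LinearRegression().fit(X, y)

# The trained model can then predict the price of a new, unseen house
price = model.predict(np.array([[1100, 3]]))[0]
print(round(price))
```

Here the toy prices follow price = 200 × area exactly, so the fitted line recovers that relationship and predicts about 220,000 for an 1100 sq ft house.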
Unsupervised Learning

What is it?
• Unsupervised learning is a type of machine learning where the
model is trained on unlabeled examples without any specific
output or target variable
• The goal of unsupervised learning is to discover patterns,
relationships, or structures in the data
• The model learns to identify clusters, similarities, or anomalies
in the data without any prior knowledge of the desired outcomes

Applications:
• Customer Segmentation: given a dataset of customer attributes and
behaviors, identify distinct groups or segments of customers with
similar characteristics
• The model would cluster the customers based on their similarities
and help businesses tailor their marketing strategies accordingly
Unsupervised Learning
Topic Modeling in Documents:
• Given a large collection of documents, discover the underlying topics
and themes within the text
• The model would identify common patterns, keywords, and
relationships among the documents to categorize them into different
topics.
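The customer-segmentation application can be sketched with k-means clustering. This is a minimal sketch assuming scikit-learn is available; the customer numbers are invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer attributes: [monthly spend, visits per month]
customers = np.array([
    [200, 2], [220, 3], [210, 2],        # occasional shoppers
    [1500, 12], [1600, 14], [1550, 13],  # frequent high spenders
])

# No labels are given; k-means groups the customers purely by similarity
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)
```

The algorithm recovers the two spending segments on its own, which is exactly the "discover structure without labels" idea described above.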
Underfitting:

• Underfitting occurs when a machine learning model is unable to
capture the underlying patterns or relationships in the data,
resulting in poor performance and low accuracy.
• It usually happens when the model is too simple or lacks the
complexity to represent the true underlying function.
• Underfitting can be avoided by increasing model complexity,
adding more informative features, or training for longer.

Causes and consequences of underfitting:
• Insufficient model complexity: The model may not have enough
parameters or flexibility to capture the complexity of the data
• Limited training data: When the training data is limited or not
representative of the entire population, the model may not learn
enough to generalize well
• Consequences: Underfitting leads to high bias, meaning the model
oversimplifies the problem and fails to capture important patterns
or variations in the data
Reasons for Underfitting:
1. High bias and low variance.
2. The size of the training dataset is not large enough.
3. The model is too simple.
4. The training data is not cleaned and contains noise.

Techniques to Reduce Underfitting:
1. Increase model complexity.
2. Increase the number of features by performing feature
engineering.
3. Remove noise from the data.
4. Increase the number of epochs or the duration of training to get
better results.
Overfitting:
• Overfitting occurs when a machine learning model
learns the training data too well, including noise
and irrelevant details, resulting in poor
generalization to new, unseen data
• The model essentially memorizes the training
examples instead of learning the underlying
patterns
Causes and consequences of overfitting:
• Excessive model complexity: The model may have
too many parameters or degrees of freedom,
allowing it to fit the noise or outliers in the training
data
• Lack of regularization: Without proper
regularization techniques, the model can become
overly sensitive to the training data
• Consequences: Overfitting leads to high variance,
meaning the model is too sensitive to small
fluctuations in the training data, making it perform
poorly on new data.
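Underfitting and overfitting can both be seen in one small experiment. This is a sketch with synthetic data (noisy samples of a sine curve, invented for illustration), fitting polynomials of increasing degree with NumPy:

```python
import numpy as np

# Noisy training samples of a sine curve; a dense noise-free test grid
rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 20)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)  # the true underlying function

def errors(degree):
    """Mean squared error on training and test data for a polynomial fit."""
    coeffs = np.polyfit(x_train, y_train, degree)
    train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train, test

# Degree 1 underfits (too simple), degree 15 overfits (memorizes noise),
# degree 3 sits between the two
for d in (1, 3, 15):
    print(d, errors(d))
```

Training error always drops as the degree grows, but the straight line generalizes worst: it is too simple to capture the sine shape, while the high-degree polynomial chases the noise.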
Bias and Variance
Bias and Variance (continued)
Variance and its effects
Bias-Variance Tradeoff
Bias-Variance Trade-off
• A model with high bias makes strong assumptions about the
underlying data, often leading to underfitting. Underfitting occurs
when a model is too simple to capture the true patterns in the data,
resulting in poor performance and low accuracy.
• Variance, on the other hand, refers to the error introduced by the
model's sensitivity to fluctuations in the training data.
• A model with high variance is overly complex and captures noise or
random fluctuations in the training data, leading to overfitting.
• Overfitting occurs when a model fits the training data extremely
well but fails to generalize to new, unseen data, resulting in poor
performance and high error on test data.
• The bias-variance trade-off arises from the fact that decreasing bias
often increases variance and vice versa.
• A complex model can better fit the training data, reducing bias but
increasing variance.
Bias-Variance Trade-off
• To achieve this balance, various techniques can be employed, such
as:
1.Cross-validation: Splitting the data into multiple train-test splits and
using techniques like k-fold cross-validation can help estimate
model performance on unseen data and select the model with the
best trade-off.
2.Feature selection: Choosing the most relevant features and reducing
the dimensionality of the data can help reduce model complexity and
variance.
3.Ensemble methods: Combining predictions from multiple models,
such as bagging, boosting, or stacking, can help reduce variance by
averaging out individual model errors.
4.Model selection: Choosing a model that naturally balances bias and
variance, such as decision trees with limited depth or support vector
machines with appropriate kernel parameters.
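Technique 1 above, cross-validation, can be sketched as follows. This assumes scikit-learn is available and uses synthetic data invented for illustration; the tree depths stand in for "simple" versus "complex" models:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (for illustration only)
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.1, 200)

def cv_score(depth):
    """Mean 5-fold cross-validation R^2 for a tree of the given depth."""
    model = DecisionTreeRegressor(max_depth=depth, random_state=0)
    return cross_val_score(model, X, y, cv=5).mean()

# A depth-1 stump has high bias; deeper trees trade bias for variance
for depth in (1, 4, 20):
    print(depth, round(cv_score(depth), 3))
```

Because each score is computed on held-out folds, it estimates performance on unseen data, which is what lets us compare the depths fairly rather than rewarding whichever model memorized the training set best.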
Model Selection:

The model selection process typically involves the following steps:
1.Define the problem: Clearly understand the problem you are trying to solve,
including the type of task (classification, regression, etc.) and the evaluation metric
to assess model performance (accuracy, precision, recall, etc.).
2.Prepare the data: Preprocess and prepare your dataset by handling missing
values, encoding categorical variables, normalizing or standardizing features, and
splitting the data into training, validation, and test sets.
3.Choose a set of candidate models: Select a set of models that are suitable for
the problem at hand. This can include different algorithms (e.g., decision trees,
support vector machines, neural networks) or variations of the same algorithm
with different hyperparameters.
4.Train the models: Train each model on the training data using an appropriate training algorithm or
optimization procedure. The models learn from the input data and adjust their parameters to minimize a
specific loss or error function.
5.Evaluate the models: Assess the performance of each trained model on the validation set. This involves
computing the evaluation metric(s) defined earlier. This step helps estimate how well the models generalize
to unseen data.
6.Select the best model: Compare the performance of the models on the validation set and select the one
that performs the best according to the chosen evaluation metric. This model is typically expected to have a
good trade-off between bias and variance.
7.Validate the selected model: After choosing the best model, it is essential to validate its performance on
the test set, which serves as a final unbiased evaluation. This step ensures that the selected model performs
well on unseen data and is not overfitted to the validation set.
It is worth noting that model selection is an iterative process, and it may require adjusting hyperparameters,
exploring different architectures, or trying different feature engineering techniques to improve model
performance.
Additionally, techniques like cross-validation can be employed to automate and optimize the model
selection process.
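The steps above can be compressed into a short sketch. This assumes scikit-learn is available and uses a synthetic classification dataset; the candidate models are one algorithm with several hyperparameter settings, compared automatically via cross-validation:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Step 2: prepare data and hold out a final, untouched test set
X, y = make_classification(n_samples=300, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Steps 3-6: candidate models (one algorithm, several hyperparameter
# settings), trained and compared with 5-fold cross-validation
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid={"max_depth": [2, 4, 8, None]}, cv=5)
search.fit(X_train, y_train)
print("best:", search.best_params_)

# Step 7: validate the selected model on the test set,
# a final unbiased check that it was not tuned to the validation folds
test_accuracy = search.score(X_test, y_test)
print("test accuracy:", test_accuracy)
```

GridSearchCV automates steps 3 through 6, while the held-out test set implements step 7: it is never seen during tuning, so the final score is an honest estimate of generalization.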
Applications
• Ask students to identify applications through research papers in the
lab (Experiment 1: Task 1)
• Discuss their research papers in class
• Padlet can be used for understanding the concepts of Unit-1
