
NCVRT - AI & ML
MACHINE LEARNING
UNIT-1
▪ Learning Problems – Perspectives and Issues – Concept Learning – Version Spaces and Candidate Elimination – Inductive Bias – Decision Tree Learning – Representation – Algorithm – Heuristic Space Search.
WHAT IS MACHINE LEARNING?
▪ Machine learning (ML) is a subfield of artificial intelligence (AI) that focuses on enabling
computer systems to learn from data without being explicitly programmed.
▪ Subset of AI: Machine learning is a specific approach to achieving artificial intelligence.
While AI is a broader concept encompassing any technique that allows computers to mimic
human intelligence, ML focuses on learning from data.
▪ Learning from Data: Instead of being given explicit instructions for every task, ML
algorithms are trained on large datasets. They identify patterns, relationships, and insights
within this data.
▪ Without Explicit Programming: The core idea is that the machine learns how to perform a
task by analyzing data, rather than a programmer writing specific code for every possible
scenario.
▪ Improvement Through Experience: As ML models are exposed to more data, their
performance on the given task typically improves.
1. LEARNING PROBLEMS – PERSPECTIVES AND ISSUES
▪ What is a Learning Problem?
▪ A learning problem arises when we want a computer system to improve its performance on a
specific task based on experience (data).
▪ Task (T): The specific problem we want to solve (e.g., classifying emails as spam or not spam,
predicting house prices).
▪ Performance (P): A metric that quantifies how well the system is performing the task (e.g., accuracy, precision, recall, mean squared error).
▪ Experience (E): The data that the system learns from (e.g., a collection of labeled emails,
historical house prices).
▪ The goal of a learning algorithm is to use the experience (E) to improve the system's
performance (P) on the task (T).
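
▪ To make T, P, and E concrete, here is a minimal sketch in Python (assuming scikit-learn is installed; the synthetic dataset and the choice of logistic regression are illustrative, not prescribed by this unit):

```python
# Minimal sketch of the T/P/E framing, assuming scikit-learn is available.
# Task (T): classify synthetic two-class points; Experience (E): labeled
# samples; Performance (P): accuracy on held-out data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)  # E
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # learn from E
print("P (accuracy on T):", accuracy_score(y_test, model.predict(X_test)))
```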
2. PERSPECTIVES ON LEARNING PROBLEMS
▪ 1.Supervised Learning: Learning from labeled data (input-output pairs). The goal is to learn a mapping
function that can predict the output for new, unseen inputs.
▪ Classification: Predicting a discrete output label (e.g., cat/dog, spam/not spam).
▪ Regression: Predicting a continuous output value (e.g., house price, temperature).

▪ 2.Unsupervised Learning: Learning from unlabeled data. The goal is to discover hidden patterns, structures, or
relationships in the data.
▪ Clustering: Grouping similar data points together.
▪ Dimensionality Reduction: Reducing the number of features while preserving important information.
▪ Association Rule Mining: Finding relationships between different items in a dataset.

▪ 3.Reinforcement Learning: Learning through interaction with an environment. An agent learns to take actions
that maximize a reward signal.
▪ 4.Semi-Supervised Learning: Learning from a combination of labeled and unlabeled data.
▪ 5. Active Learning: The learning algorithm strategically queries a user or oracle to label the most informative data points.
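
▪ The sketch below contrasts the first two perspectives on the same data, assuming scikit-learn; the dataset and model choices (KNeighborsClassifier, KMeans) are illustrative:

```python
# Supervised vs. unsupervised learning on the same points, assuming
# scikit-learn is available. Data is synthetic.
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# Supervised: the labels y are part of the experience.
clf = KNeighborsClassifier().fit(X, y)
print("predicted label:", clf.predict(X[:1]))

# Unsupervised: only X is given; cluster structure is discovered.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster assignments:", km.labels_[:10])
```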
KEY ISSUES IN MACHINE LEARNING
▪ 1. Data Acquisition and Preparation: Obtaining sufficient, relevant, and high-quality data is
crucial. This involves data cleaning, preprocessing, feature engineering, and handling missing
values.
▪ 2. Choosing the Right Representation: How the data is represented (features) significantly
impacts the learning process and the model's performance.
▪ 3. Selecting the Appropriate Algorithm: Different algorithms have different strengths and weaknesses and are suited for different types of tasks and data.
▪ 4. Model Complexity and Generalization: Balancing model complexity to avoid overfitting (performing well on training data but poorly on unseen data) and underfitting (failing to capture the underlying patterns).
▪ 5. Bias and Fairness: Ensuring that the learning process and the resulting models are fair and do not perpetuate or amplify existing biases in the data.
▪ 6. Interpretability and Explainability: Understanding why a model makes certain
predictions, especially important in critical applications.
▪ 7. Scalability: Handling large datasets and complex models efficiently.
▪ 8. Evaluation and Validation: Assessing the performance of the learned model on unseen
data to ensure generalization.
▪ 9. Computational Resources: The time and computational power required for training and
deploying models.
▪ 10. Data Privacy and Security: Protecting sensitive data used for training and prediction.
▪ 11. Concept Drift: Dealing with changes in the underlying data distribution over time.
3. CONCEPT LEARNING
▪ What is a Concept?
▪ In machine learning, a concept is a boolean-valued function defined over a set of instances. It represents a category or a set
of items that belong together. The goal of concept learning is to learn this boolean function from a set of positive and
negative examples.
▪ Instance Space (X): The set of all possible objects or examples. Each instance is typically represented by a set of features.
▪ Target Concept (c): The boolean function we want to learn, where c(x)=1 if instance x belongs to the concept and c(x)=0
otherwise.
▪ Training Examples (D): A set of labeled instances, where each instance x is paired with its correct label c(x).
▪ Positive Examples: Instances for which c(x)=1.
▪ Negative Examples: Instances for which c(x)=0.
▪ Hypothesis Space (H): The set of all possible hypotheses (candidate concept definitions) that the learning algorithm can
consider.
▪ Learner’s Task: To find a hypothesis h in H such that h(x)=c(x) for all instances x in the instance space X.
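
▪ A minimal sketch of this notation in Python, using conjunctive hypotheses where "?" accepts any attribute value (the attribute values shown are made up for illustration):

```python
# A concept as a boolean function over attribute tuples. "?" is a
# wildcard constraint; a specific value must match exactly.

def h_of_x(hypothesis, instance):
    """Return True (h(x)=1) iff every constraint accepts the instance."""
    return all(c == "?" or c == v for c, v in zip(hypothesis, instance))

x = ("Sunny", "Warm", "Normal")   # an instance from the instance space X
h = ("Sunny", "?", "?")           # one hypothesis from H
print(h_of_x(h, x))               # True -> x is a positive example under h
```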
4. VERSION SPACES AND CANDIDATE ELIMINATION
▪ Version Space:
▪ The version space, with respect to a hypothesis space H and a set of training examples D, is the subset of hypotheses from H that are consistent with all training examples in D.
▪ In other words, it's the set of all plausible hypotheses that could be the target concept given the
observed data.
▪ Candidate Elimination Algorithm:
▪ The candidate elimination algorithm is a method for finding the version space. It maintains two sets of
hypotheses:
▪ G (General Boundary): The set of maximally general hypotheses in H that are consistent with all the training examples in D (they cover every positive example and exclude every negative example).
▪ S (Specific Boundary): The set of maximally specific hypotheses in H that are consistent with all the training examples in D.
ALGORITHM STEPS
▪ 1. Initialization:
▪ Initialize S to contain the most specific hypothesis (e.g., for conjunctive hypotheses, this could be a hypothesis
that matches no instances).
▪ Initialize G to contain the most general hypothesis (e.g., for conjunctive hypotheses, this could be a hypothesis
that matches all instances).
▪ 2. Processing Positive Examples: For each positive training example x:
▪ Remove any hypothesis in G that is inconsistent with x (i.e., that fails to cover the positive example).
▪ For each hypothesis s in S that is inconsistent with x:
▪ Remove s from S.
▪ Generalize s to the minimal more general hypotheses that are consistent with x.
▪ Add each of these new generalized hypotheses to S if and only if some hypothesis in G is more general than or equal to it.
▪ Remove any hypothesis in S that is more general than another hypothesis in S.
▪ 3. Processing Negative Examples: For each negative training example x:
▪ Remove any hypothesis in S that covers x (i.e., that is inconsistent with the negative example).
▪ For each hypothesis g in G that covers x:
▪ Remove g from G.
▪ Specialize g to the minimal more specific hypotheses that do not cover x.
▪ Add each of these new specialized hypotheses to G if and only if it is more general than or equal to some hypothesis in S.
▪ Remove any hypothesis in G that is more specific than another hypothesis in G.
▪ 4. Termination: The algorithm terminates when S and G converge to a single hypothesis (if
the target concept is learnable and uniquely identifiable within H) or when they define the
boundaries of the version space.
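
▪ The sketch below is a compact, illustrative implementation of these S/G updates for conjunctive hypotheses; the attribute domains and the two training examples are invented, and some boundary pruning steps (e.g., removing redundant S members) are simplified away:

```python
# Compact, illustrative candidate elimination for conjunctive hypotheses.
# "?" accepts any value; "0" is the empty (match-nothing) constraint.
# Attribute domains and the two training examples below are invented.

def covers(h, x):
    """True if hypothesis h classifies instance x as positive."""
    return all(c == "?" or c == v for c, v in zip(h, x))

def generalize(s, x):
    """Minimal generalization of s that covers positive example x."""
    return tuple(v if c == "0" else (c if c == v else "?")
                 for c, v in zip(s, x))

def specializations(g, x, domains):
    """Minimal specializations of g that exclude negative example x."""
    return [g[:i] + (val,) + g[i + 1:]
            for i, c in enumerate(g) if c == "?"
            for val in domains[i] if val != x[i]]

domains = [("Sunny", "Rainy"), ("Warm", "Cold"), ("Normal", "High")]
S = [("0", "0", "0")]      # most specific boundary
G = [("?", "?", "?")]      # most general boundary

examples = [(("Sunny", "Warm", "Normal"), True),
            (("Rainy", "Cold", "High"), False)]

for x, positive in examples:
    if positive:
        G = [g for g in G if covers(g, x)]
        S = [generalize(s, x) if not covers(s, x) else s for s in S]
    else:
        S = [s for s in S if not covers(s, x)]
        new_G = []
        for g in G:
            if not covers(g, x):
                new_G.append(g)
            else:  # keep only specializations above some member of S
                new_G.extend(h for h in specializations(g, x, domains)
                             if any(covers(h, s) for s in S))
        G = new_G

print("S boundary:", S)   # [('Sunny', 'Warm', 'Normal')]
print("G boundary:", G)   # three maximally general consistent hypotheses
```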
5. INDUCTIVE BIAS
▪ What is Inductive Bias?
▪ Inductive bias (also known as learning bias) refers to the set of assumptions that a learning
algorithm makes to generalize from the training data to unseen instances.
▪ It's the preference of the learning algorithm for one hypothesis over another, even if both are
consistent with the observed training data.
▪ Why is Inductive Bias Necessary?
▪ Without any inductive bias, a learning algorithm would have no basis for choosing one
generalization over another when faced with unseen data.
▪ For any set of training examples, there could be infinitely many hypotheses that are consistent
with them but make different predictions on new instances. Inductive bias allows the algorithm
to make reasonable generalizations.
TYPES OF INDUCTIVE BIAS
▪ Hypothesis Space Restriction: The algorithm only considers hypotheses from a specific, limited set. This is a strong form of bias. For example, a linear regression model assumes a linear relationship between features and the target variable.
▪ Preference for Certain Hypotheses: Even within a given hypothesis space, the algorithm might prefer some hypotheses over others (e.g., simpler hypotheses over more complex ones).
▪ Search Bias: The way the learning algorithm searches through the hypothesis space can introduce bias. For example, a greedy search might find a locally optimal solution that is not globally optimal.
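
▪ A small illustration of inductive bias using NumPy: two polynomial models fit the same made-up training points, yet extrapolate very differently; preferring the linear model (a restricted hypothesis space) is an inductive bias:

```python
# Two models fit the same training data but disagree on an unseen input.
# Choosing between them requires an assumption, i.e., an inductive bias.
import numpy as np

x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.array([0.1, 0.9, 2.1, 2.9])   # roughly linear, made-up data

linear = np.poly1d(np.polyfit(x_train, y_train, deg=1))  # strong bias
cubic = np.poly1d(np.polyfit(x_train, y_train, deg=3))   # weaker bias

x_new = 6.0  # unseen input, outside the training range
print("linear model predicts:", linear(x_new))   # about 5.8
print("cubic model predicts: ", cubic(x_new))    # about -5.1
# Both fit the training points closely, but their extrapolations diverge.
```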
6. DECISION TREE LEARNING
▪ In machine learning, a decision tree is a supervised learning algorithm used for both
classification and regression, represented as a flowchart-like structure that makes predictions
by following a series of decision rules.
▪ Decision trees learn from labeled data, meaning they are trained on data where the correct
outcome (or target variable) is known.
▪ Decision rules:
▪ These are the conditions or questions that determine the path taken through the tree.
▪ Training Data: The data used to train the decision tree algorithm, containing features and the
corresponding target variable.
▪ Prediction: Once trained, the decision tree can predict outcomes for new, unseen data by
following the decision rules.
HOW A DECISION TREE WORKS
▪ 1. Data Input: The algorithm is fed with a dataset containing features and the target variable.
▪ 2. Feature Selection: The algorithm selects the most important features to use for splitting the
data.
▪ 3. Splitting: The data is split into subsets based on the selected features and their values.
▪ 4. Recursion: The process of splitting and selecting features is repeated recursively until a
stopping condition is met (e.g., all data in a branch is of the same class, or a maximum tree
depth is reached).
▪ 5. Prediction: When a new data point is presented, it is passed down the tree based on the decision rules, eventually reaching a leaf node, which represents the prediction.
7. DECISION TREE LEARNING – REPRESENTATION
▪ Representation: A decision tree is a tree-like structure where:
▪ Each internal node represents a test on an attribute (feature).
▪ Each branch represents the outcome of the test.
▪ Each leaf node represents a class label (for classification) or a predicted value (for
regression).
▪ To classify a new instance, we start at the root node, perform the test on the attribute, follow
the corresponding branch, and continue this process until we reach a leaf node, which
provides the classification or prediction.
▪ Root Node: The starting point of the tree, representing the entire dataset.
▪ Internal Nodes: Represent decision rules or questions based on features of the data.
▪ Branches: Represent possible outcomes or paths based on the decision rules.
▪ Leaf Nodes: Represent the final predictions or classifications.
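
▪ One possible in-memory representation of this structure, sketched in Python (the Node layout and the attribute values are illustrative, not a standard API):

```python
# Bare-bones sketch of the representation: internal nodes test an
# attribute, branches are outcomes, leaves hold class labels.

class Node:
    def __init__(self, attribute=None, branches=None, label=None):
        self.attribute = attribute      # index of the attribute to test
        self.branches = branches or {}  # outcome value -> child Node
        self.label = label              # class label if this is a leaf

def classify(node, instance):
    """Walk from the root to a leaf following the decision rules."""
    while node.label is None:
        node = node.branches[instance[node.attribute]]
    return node.label

# Tiny hand-built tree: test attribute 0, then predict at a leaf.
root = Node(attribute=0, branches={
    "Sunny": Node(label="Play"),
    "Rainy": Node(label="Don't Play"),
})
print(classify(root, ("Sunny", "Warm")))   # -> "Play"
```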
8. ALGORITHM (A BASIC GREEDY APPROACH, E.G., ID3, C4.5)
▪ 1. Start with all training examples at the root node.
▪ 2. If all examples at the current node belong to the same class, then the current node becomes a leaf node labeled with that class.
▪ 3. If there are no remaining attributes to test, then the current node becomes a leaf node labeled with the most common class among the examples at that node (majority voting).
▪ 4. Otherwise, select the "best" attribute to split the current node. The "best" attribute is typically chosen based on a splitting criterion that aims to maximize the separation of classes in the resulting child nodes.
▪ 5. Create child nodes for each distinct value of the selected attribute.
▪ 6. Distribute the training examples at the current node to the child nodes based on the value of the selected attribute.
▪ 7. Recursively apply steps 2-6 to each child node.
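
▪ An illustrative ID3-style sketch of these steps, using information gain as the splitting criterion (the tiny dataset at the bottom is made up):

```python
# ID3-style sketch of steps 1-7. Rows are tuples of categorical attribute
# values; attrs are attribute indices.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    gain = entropy(labels)
    for value in set(r[attr] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
        gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain

def id3(rows, labels, attrs):
    if len(set(labels)) == 1:                 # step 2: pure node -> leaf
        return labels[0]
    if not attrs:                             # step 3: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))  # step 4
    tree = {best: {}}
    for value in set(r[best] for r in rows):  # steps 5-6: one child per value
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        tree[best][value] = id3([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attrs if a != best])  # step 7
    return tree

rows = [("Sunny", "Hot"), ("Sunny", "Cool"), ("Rainy", "Cool")]
labels = ["No", "Yes", "Yes"]
print(id3(rows, labels, attrs=[0, 1]))   # splits on attribute 1 here
```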
9. HEURISTIC SPACE SEARCH
▪ The process of learning a decision tree can be viewed as a heuristic search through the space of
possible decision trees.
▪ Search Space: The set of all possible decision trees that can be constructed from the given
attributes. This space is typically very large.
▪ Heuristic: The splitting criterion (e.g., information gain, gain ratio, Gini impurity) acts as a
heuristic function that guides the search towards more promising decision trees. The goal of
the heuristic is to find a tree that accurately classifies the training data and generalizes well to
unseen data.
▪ Greedy Search: Most decision tree learning algorithms employ a greedy top-down approach.
At each step, they select the locally optimal attribute to split on without backtracking or
considering alternative splits made earlier in the tree construction process. This greedy nature
means that the algorithm might not find the globally optimal decision tree.
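
▪ The two most common heuristics can be computed directly from the class counts at a node; the sketch below shows entropy and Gini impurity, where lower impurity in the child nodes indicates a more promising split:

```python
# Split heuristics from class proportions p_i:
#   entropy = -sum(p_i * log2 p_i),  Gini impurity = 1 - sum(p_i ** 2).
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

mixed = ["spam", "spam", "ham", "ham"]
pure = ["spam", "spam", "spam", "spam"]
print(entropy(mixed), gini(mixed))   # 1.0, 0.5 (maximally impure, 2 classes)
print(entropy(pure), gini(pure))     # 0, 0 (pure node)
```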
ISSUES AND CONSIDERATIONS IN DECISION TREE LEARNING
▪ Overfitting: Decision trees can easily overfit the training data, especially if they grow very deep (see the sketch after this list).
▪ Handling Continuous Attributes: Continuous attributes need to be discretized or handled
using split points.
▪ Handling Missing Values: Strategies are needed to deal with instances where some attribute
values are missing.
▪ Computational Complexity: Building a decision tree can be computationally expensive,
especially with a large number of attributes and training examples.
▪ Bias: Decision trees built with information gain are biased toward attributes with many distinct values, and they can be sensitive to the order in which attributes are considered.
▪ Representation Power: Decision trees can represent complex decision boundaries, but they might struggle with certain types of functions (e.g., XOR).
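
▪ A hedged sketch of the overfitting issue, assuming scikit-learn: an unconstrained tree tends to memorize noisy training data, while limiting depth (a simple form of pre-pruning) often generalizes better; the exact numbers will vary:

```python
# Comparing an unconstrained tree with a depth-limited one on noisy,
# synthetic data (flip_y injects label noise).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=20, flip_y=0.1,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (None, 3):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_tr, y_tr)
    print(f"max_depth={depth}: train={tree.score(X_tr, y_tr):.2f}, "
          f"test={tree.score(X_te, y_te):.2f}")
```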
