Chapter Introduction

Uploaded by

RAKESH SWAIN

CHAPTER 1

Introduction

The main focus of machine learning (ML) is making decisions or predictions based on data.
There are a number of other fields with significant overlap in technique, but difference in
focus: in economics and psychology, the goal is to discover underlying causal processes,
and in statistics it is to find a model that fits a data set well. In those fields, the end product
is a model. In machine learning, we often fit models, but as a means to the end of making
good predictions or decisions.
[Margin note: This description paraphrased from a post on 9/4/12 at andrewgelman.com]
As ML methods have improved in their capability and scope, ML has become arguably
the best way (measured in terms of speed, human engineering time, and robustness) to
approach many applications. Great examples are face detection, speech recognition, and
many kinds of language-processing tasks. Almost any application that involves
understanding data or signals that come from the real world can be nicely addressed using
machine learning.
One crucial aspect of machine learning approaches to solving problems is that human
engineering plays an important role. [Margin note: and often undervalued] A human still
has to frame the problem: acquire and organize data, design a space of possible solutions,
select a learning algorithm and its parameters, apply the algorithm to the data, validate the
resulting solution to decide whether it’s good enough to use, try to understand the impact
on the people who will be affected by its deployment, etc. These steps are of great importance.
The conceptual basis of learning from data is the problem of induction: Why do we think
that previously seen data will help us predict the future? This is a serious long-standing
philosophical problem. We will operationalize it by making assumptions, such as that all
training data are so-called i.i.d. (independent and identically distributed), and that queries
will be drawn from the same distribution as the training data, or that the answer comes
from a set of possible answers known in advance.
[Margin note: This means that the elements in the set are related in the sense that they all
come from the same underlying probability distribution, but not in any other ways.]
In general, we need to solve these two problems:

• estimation: When we have data that are noisy reflections of some underlying quantity
of interest, we have to aggregate the data and make estimates or predictions
about the quantity. How do we deal with the fact that, for example, the same treatment
may end up with different results on different trials? How can we predict how
well an estimate may compare to future results?
• generalization: How can we predict results of a situation or experiment that we have
never encountered before in our data set?
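
As a minimal sketch of the estimation problem, assume each trial returns the underlying quantity plus independent Gaussian noise. The true value 5.0, the noise level, and the name `estimate_mean` are made up here for illustration. We aggregate the trials into a sample mean and use the standard error as a rough indication of how far the estimate may be from the quantity of interest:

```python
import math
import random


def estimate_mean(samples):
    """Return the sample mean and its standard error."""
    n = len(samples)
    mean = sum(samples) / n
    # Unbiased sample variance (divides by n - 1).
    variance = sum((s - mean) ** 2 for s in samples) / (n - 1)
    standard_error = math.sqrt(variance / n)
    return mean, standard_error


# Hypothetical setup: the true quantity is 5.0 and each trial adds Gaussian noise.
rng = random.Random(0)
true_value = 5.0
trials = [true_value + rng.gauss(0.0, 1.0) for _ in range(100)]

mean, standard_error = estimate_mean(trials)
print(f"estimate = {mean:.3f} +/- {standard_error:.3f}")
```

The standard error shrinks roughly like one over the square root of the number of trials, which is one simple way to reason about how well an estimate may compare to future results under the i.i.d. assumption.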

MIT 6.390 Spring 2024 7

We can describe problems and their solutions using six characteristics, three of which
characterize the problem and three of which characterize the solution:

1. Problem class: What is the nature of the training data and what kinds of queries will
be made at testing time?

2. Assumptions: What do we know about the source of the data or the form of the
solution?

3. Evaluation criteria: What is the goal of the prediction or estimation system? How
will the answers to individual queries be evaluated? How will the overall performance
of the system be measured?

4. Model type: Will an intermediate model of the world be made? What aspects of the
data will be modeled in different variables/parameters? How will the model be used
to make predictions?

5. Model class: What particular class of models will be used? What criterion will we
use to pick a particular model from the model class?

6. Algorithm: What computational process will be used to fit the model to the data
and/or to make predictions?

Without making some assumptions about the nature of the process generating the data, we
cannot perform generalization. In the following sections, we elaborate on these ideas.
[Margin note: Don’t feel you have to memorize all these kinds of learning, etc. We just want
you to have a very high-level view of (part of) the breadth of the field.]

1.1 Problem class

There are many different problem classes in machine learning. They vary according to what
kind of data is provided and what kind of conclusions are to be drawn from it. Five standard
problem classes are described below, to establish some notation and terminology.
In this course, we will focus on classification and regression (two examples of supervised
learning), and we will touch on reinforcement learning, sequence learning, and clustering.

1.1.1 Supervised learning

The idea of supervised learning is that the learning system is given inputs and told which
specific outputs should be associated with them. We divide up supervised learning based
on whether the outputs are drawn from a small finite set (classification) or a large finite
ordered set or continuous set (regression).

1.1.1.1 Regression
For a regression problem, the training data $D_n$ is in the form of a set of $n$ pairs:

$$D_n = \{(x^{(1)}, y^{(1)}), \ldots, (x^{(n)}, y^{(n)})\},$$

where $x^{(i)}$ represents an input, most typically a $d$-dimensional vector of real and/or
discrete values, and $y^{(i)}$ is the output to be predicted, in this case a real number. The $y$
values are sometimes called target values.
[Margin note: Many textbooks use $x_i$ and $t_i$ instead of $x^{(i)}$ and $y^{(i)}$. We find that
notation somewhat difficult to manage when $x^{(i)}$ is itself a vector and we need to talk about
its elements. The notation we are using is standard in some other parts of the ML literature.]
The goal in a regression problem is ultimately, given a new input value $x^{(n+1)}$, to predict
the value of $y^{(n+1)}$. Regression problems are a kind of supervised learning, because the
desired output $y^{(i)}$ is specified for each of the training examples $x^{(i)}$.

Last Updated: 04/07/24 16:49:48

1.1.1.2 Classification
A classification problem is like regression, except that the values that $y^{(i)}$ can take do not
have an order. The classification problem is binary or two-class if $y^{(i)}$ (also known as the
class) is drawn from a set of two possible values; otherwise, it is called multi-class.
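
To make the notation concrete, here is one possible way to hold supervised training data in code: a list of (input, output) pairs. The particular values, the labels "spam" and "not spam", and the helper `is_binary` are invented for illustration.

```python
# Regression: each y is a real number.
regression_data = [
    ((1.0, 2.0), 3.5),
    ((0.5, -1.0), 0.2),
    ((2.0, 0.0), 4.1),
]

# Binary classification: each y comes from a set of two unordered labels.
classification_data = [
    ((1.0, 2.0), "spam"),
    ((0.5, -1.0), "not spam"),
    ((2.0, 0.0), "spam"),
]


def is_binary(data):
    """True when the labels in `data` take exactly two distinct values."""
    return len({y for _, y in data}) == 2


n = len(regression_data)        # number of training pairs
d = len(regression_data[0][0])  # dimension of each input vector
print(n, d, is_binary(classification_data))
```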

1.1.2 Unsupervised learning

Unsupervised learning doesn’t involve learning a function from inputs to outputs based on
a set of input-output pairs. Instead, one is given a data set and generally expected to find
some patterns or structure inherent in it.

1.1.2.1 Clustering
Given samples $x^{(1)}, \ldots, x^{(n)} \in \mathbb{R}^d$, the goal is to find a partitioning (or “clustering”) of
the samples that groups together similar samples. There are many different objectives,
depending on the definition of the similarity between samples and exactly what criterion
is to be used (e.g., minimize the average distance between elements inside a cluster and
maximize the average distance between elements across clusters). Other methods perform
a “soft” clustering, in which samples may be assigned 0.9 membership in one cluster and
0.1 in another. Clustering is sometimes used as a step in the so-called density estimation
(described below), and sometimes to find useful structure or influential features in data.
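
One common concrete choice, which the text above does not single out, is k-means: it uses squared Euclidean distance as the notion of dissimilarity and produces a hard partition. The following is a minimal sketch under those assumptions; the function names, the sample points, and the choice of initial centers are all illustrative.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(points):
    dim = len(points[0])
    return tuple(sum(p[j] for p in points) / len(points) for j in range(dim))


def k_means(samples, initial_centers, iterations=10):
    """Alternate between assigning samples to the nearest center and recomputing centers."""
    centers = list(initial_centers)
    assignment = []
    for _ in range(iterations):
        # Assignment step: each sample joins the cluster with the nearest center.
        assignment = [
            min(range(len(centers)), key=lambda k: squared_distance(x, centers[k]))
            for x in samples
        ]
        # Update step: each center moves to the mean of its assigned samples.
        for k in range(len(centers)):
            members = [x for x, a in zip(samples, assignment) if a == k]
            if members:  # keep the old center if a cluster ends up empty
                centers[k] = centroid(members)
    return assignment, centers


samples = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
assignment, centers = k_means(samples, initial_centers=[samples[0], samples[3]])
print(assignment)
```

The result depends on the initial centers; in practice one would typically try several initializations.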

1.1.2.2 Density estimation

Given samples $x^{(1)}, \ldots, x^{(n)} \in \mathbb{R}^d$ drawn i.i.d. from some distribution $\Pr(X)$, the goal is to
predict the probability $\Pr(x^{(n+1)})$ of an element drawn from the same distribution. Density
estimation sometimes plays a role as a “subroutine” in the overall learning method for
supervised learning, as well.
[Margin note: Using the capital $X$ is a typical practice to emphasize that this is a so-called
random variable. Small letters are often used in probability too; those are typically reserved
to denote the realized values of random variables. It might help to concretely think of coin
tosses; there, the toss outcome is a random variable and it may be realized as a "head".
This paragraph actually talks about both a random variable and a realization of it. Can you
spot that from the notation, and do you feel the difference?]

1.1.2.3 Dimensionality reduction

Given samples $x^{(1)}, \ldots, x^{(n)} \in \mathbb{R}^D$, the problem is to re-represent them as points in a
$d$-dimensional space, where $d < D$. The goal is typically to retain information in the data set
that will, e.g., allow elements of one class to be distinguished from another.
Dimensionality reduction is a standard technique that is particularly useful for visualizing
or understanding high-dimensional data. If the goal is ultimately to perform regression
or classification on the data after the dimensionality is reduced, it is usually best to
articulate an objective for the overall prediction problem rather than to first do dimensionality
reduction without knowing which dimensions will be important for the prediction task.

1.1.3 Sequence learning
In sequence learning, the goal is to learn a mapping from input sequences $x_0, \ldots, x_n$ to output
sequences $y_1, \ldots, y_m$. The mapping is typically represented as a state machine, with one
function $f_s$ used to compute the next hidden internal state given the input, and another
function $f_o$ used to compute the output given the current hidden state.
It is supervised in the sense that we are told what output sequence to generate for which
input sequence, but the internal functions have to be learned by some method other than
direct supervision, because we don’t know what the hidden state sequence is.
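
The state-machine view can be sketched as follows. This is only an illustration of how the two functions fit together: here they are written by hand rather than learned, the state-update function is assumed to take the previous state as well as the input, and the sketch emits one output per input. The running-sum example is invented.

```python
def run_state_machine(f_s, f_o, initial_state, inputs):
    """Feed `inputs` through a state machine and collect one output per input."""
    state = initial_state
    outputs = []
    for x in inputs:
        state = f_s(state, x)       # next hidden state from previous state and input
        outputs.append(f_o(state))  # output computed from the current hidden state
    return outputs


# Hypothetical example: the hidden state is a running sum,
# and the output reports whether that sum is even.
def f_s(state, x):
    return state + x


def f_o(state):
    return state % 2 == 0


print(run_state_machine(f_s, f_o, initial_state=0, inputs=[1, 1, 2, 3]))
```

Only the inputs and outputs would appear in a training set; the intermediate values of `state` are exactly the hidden sequence that is not observed.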

1.1.4 Reinforcement learning

In reinforcement learning, the goal is to learn a mapping from input values (typically
assumed to be states of an agent or system; for now, think, e.g., of the velocity of a moving car)
to output values (typically we want control actions; for now, think, e.g., of whether to accelerate
or hit the brake). However, we need to learn the mapping without a direct supervision signal to
specify which output values are best for a particular input; instead, the learning problem
is framed as an agent interacting with an environment, in the following setting:

• The agent observes the current state $s_t$.

• It selects an action $a_t$.

• It receives a reward, $r_t$, which typically depends on $s_t$ and possibly $a_t$.

• The environment transitions probabilistically to a new state, $s_{t+1}$, with a distribution
that depends only on $s_t$ and $a_t$.

• The agent observes the current state, $s_{t+1}$.

• ...

[Margin note: Note it’s standard practice in reinforcement learning to use $s$ and $a$ instead
of $x$ and $y$ to denote the machine learning model’s input and output. The subscript $t$
denotes the timestep, and captures the sequential nature of the problem.]

The goal is to find a policy $\pi$, mapping $s$ to $a$ (that is, states to actions), such that some
long-term sum or average of rewards $r$ is maximized.
This setting is very different from either supervised learning or unsupervised learning,
because the agent’s action choices affect both its reward and its ability to observe the
environment. It requires careful consideration of the long-term effects of actions, as well as all
of the other issues that pertain to supervised learning.
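
The interaction loop above can be sketched in code. Everything specific here is an assumption made for illustration: the two-state `ToyEnvironment`, its reward and transition probabilities, and the fixed (not learned) `policy`. The sketch only shows the order of events, not a learning algorithm.

```python
import random


class ToyEnvironment:
    """Hypothetical two-state environment, invented here for illustration."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = 0

    def reward(self, state, action):
        # The reward depends on the state and the action.
        return 1.0 if (state == 1 and action == 1) else 0.0

    def step(self, action):
        """Return the reward r_t, then move probabilistically to s_{t+1}."""
        r = self.reward(self.state, action)
        # The next-state distribution depends only on the current state and action.
        if action == 1:
            p_state_1 = 0.9 if self.state == 1 else 0.6
        else:
            p_state_1 = 0.1
        self.state = 1 if self.rng.random() < p_state_1 else 0
        return r, self.state


def policy(state):
    """A fixed policy mapping states to actions (not learned in this sketch)."""
    return 1


def run_episode(env, policy, horizon):
    """Run the observe / act / reward / transition loop and return the total reward."""
    state = env.state
    total_reward = 0.0
    for _ in range(horizon):
        action = policy(state)            # select a_t
        reward, state = env.step(action)  # receive r_t, then observe s_{t+1}
        total_reward += reward
    return total_reward


print(run_episode(ToyEnvironment(seed=0), policy, horizon=20))
```

A learning method would adjust `policy` based on the rewards it observes so that the long-term total increases.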

1.1.5 Other settings

There are many other problem settings. Here are a few.
In semi-supervised learning, we have a supervised-learning training set, but there may
be an additional set of $x^{(i)}$ values with no known $y^{(i)}$. These values can still be used
to improve learning performance (if they are drawn from the distribution $\Pr(X)$ that is the
marginal of the $\Pr(X, Y)$ that governs the rest of the data set).
In active learning, it is assumed to be expensive to acquire a label $y^{(i)}$ (imagine asking a
human to read an x-ray image), so the learning algorithm can sequentially ask for particular
inputs $x^{(i)}$ to be labeled, and must carefully select queries in order to learn as effectively as
possible while minimizing the cost of labeling.
In transfer learning (also called meta-learning), there are multiple tasks, with data drawn
from different, but related, distributions. The goal is for experience with previous tasks to
apply to learning a current task in a way that requires decreased experience with the new
task.

1.2 Assumptions
The kinds of assumptions that we can make about the data source or the solution include:

• The data are independent and identically distributed (i.i.d.).

• The data are generated by a Markov chain (i.e., outputs depend only on the
current state, with no additional memory).

• The process generating the data might be adversarial.

• The “true” model that is generating the data can be perfectly described by one of
some particular set of hypotheses.

The effect of an assumption is often to reduce the “size” or “expressiveness” of the space of
possible hypotheses and therefore reduce the amount of data required to reliably identify
an appropriate hypothesis.

1.3 Evaluation criteria

Once we have specified a problem class, we need to say what makes an output or the
answer to a query good, given the training data. We specify evaluation criteria at two levels:
how an individual prediction is scored, and how the overall behavior of the prediction or
estimation system is scored.
The quality of predictions from a learned model is often expressed in terms of a loss
function. A loss function $L(g, a)$ tells you how much you will be penalized for making a
guess $g$ when the answer is actually $a$. There are many possible loss functions. Here are
some frequently used examples:

• 0-1 Loss applies to predictions drawn from finite domains.

$$L(g, a) = \begin{cases} 0 & \text{if } g = a \\ 1 & \text{otherwise} \end{cases}$$

[Margin note: If the actual values are drawn from a continuous distribution, the
probability they would ever be equal to some predicted $g$ is 0 (except for some weird cases).]

• Squared loss

$$L(g, a) = (g - a)^2$$

• Absolute loss

$$L(g, a) = |g - a|$$

• Asymmetric loss Consider a situation in which you are trying to predict whether
someone is having a heart attack. It might be much worse to predict “no” when the
answer is really “yes”, than the other way around.

$$L(g, a) = \begin{cases} 1 & \text{if } g = 1 \text{ and } a = 0 \\ 10 & \text{if } g = 0 \text{ and } a = 1 \\ 0 & \text{otherwise} \end{cases}$$
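
The four loss functions above translate directly into code. This is a small sketch; the function names are ours, and the asymmetric loss assumes the guess and the answer are encoded as 0 ("no") and 1 ("yes").

```python
def zero_one_loss(g, a):
    return 0 if g == a else 1


def squared_loss(g, a):
    return (g - a) ** 2


def absolute_loss(g, a):
    return abs(g - a)


def asymmetric_loss(g, a):
    """Guess g and answer a are 0 ("no") or 1 ("yes"); missing a "yes" costs 10."""
    if g == 1 and a == 0:
        return 1
    if g == 0 and a == 1:
        return 10
    return 0


print(zero_one_loss("cat", "dog"), squared_loss(3.0, 1.0),
      absolute_loss(3.0, 1.0), asymmetric_loss(0, 1))
```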

Any given prediction rule will usually be evaluated based on multiple predictions and
the loss of each one. At this level, we might be interested in:

• Minimizing expected loss over all the predictions (also known as risk)

• Minimizing maximum loss: the loss of the worst prediction

• Minimizing or bounding regret: how much worse this predictor performs than the
best one drawn from some class

• Characterizing asymptotic behavior: how well the predictor will perform in the limit
of infinite training data

• Finding algorithms that are probably approximately correct: they probably generate
a hypothesis that is right most of the time.
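
A few of these aggregate criteria can be computed directly on a finite set of predictions. The sketch below uses the average loss on a data set as an empirical stand-in for expected loss, and measures regret against the best of a small, hand-picked set of predictors; the data and the predictors `double` and `triple` are invented for illustration.

```python
def squared_loss(g, a):
    return (g - a) ** 2


def average_loss(predictor, data, loss):
    """Empirical stand-in for expected loss: the mean loss over the data."""
    return sum(loss(predictor(x), y) for x, y in data) / len(data)


def maximum_loss(predictor, data, loss):
    """Loss of the worst single prediction."""
    return max(loss(predictor(x), y) for x, y in data)


def regret(predictor, candidates, data, loss):
    """How much worse `predictor` is than the best predictor in `candidates`."""
    best = min(average_loss(c, data, loss) for c in candidates)
    return average_loss(predictor, data, loss) - best


# Hypothetical data and predictors, chosen only for illustration.
data = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 9.0)]


def double(x):
    return 2.0 * x


def triple(x):
    return 3.0 * x


print(average_loss(double, data, squared_loss))
print(maximum_loss(double, data, squared_loss))
print(regret(double, [double, triple], data, squared_loss))
```

Note how the criteria can disagree: `double` has the larger average loss here because of a single bad prediction, which is also what the maximum-loss criterion picks out.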

There is a theory of rational agency that argues that you should always select the action
that minimizes the expected loss. This strategy will, for example, make you the most money
in the long run, in a gambling setting. As mentioned above, expected loss is also sometimes
called risk in ML literature, but that term means other things in economics or other parts
of decision theory, so be careful...it’s risky to use it. We will, most of the time, concentrate
on this criterion.
[Margin note: Of course, there are other models for action selection and it’s clear that people
do not always (or maybe even often) select actions that follow this rule.]

1.4 Model type
Recall that the goal of an ML system is typically to estimate or generalize, based on data
provided. Below, we examine the role of model-making in machine learning.

1.4.1 Non-parametric models

In some simple cases, in response to queries, we can generate predictions directly from
the training data, without the construction of any intermediate model, or more precisely,
without the learning of any parameters.
For example, in regression or classification, we might generate an answer to a new
query by averaging the answers associated with the training inputs closest to the query,
as in the nearest neighbor method.
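
A minimal sketch of a nearest-neighbor prediction for regression: no parameters are fit, the training data are simply stored, and a query is answered by averaging the outputs of the k training inputs closest to it. Euclidean distance, the value of k, and the data are assumptions made for this example.

```python
def k_nearest_neighbor_predict(training_data, query, k=3):
    """Average the y values of the k training inputs closest to `query`."""
    def squared_distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Sort training pairs by distance from the query and keep the closest k.
    neighbors = sorted(
        training_data, key=lambda pair: squared_distance(pair[0], query)
    )[:k]
    return sum(y for _, y in neighbors) / len(neighbors)


training_data = [((0.0,), 1.0), ((1.0,), 2.0), ((2.0,), 3.0), ((10.0,), 50.0)]
print(k_nearest_neighbor_predict(training_data, query=(1.2,), k=2))
```

For classification, the averaging step would typically be replaced by a majority vote over the neighbors' labels.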

1.4.2 Parametric models

This two-step process is more typical:
1. “Fit” a model (with some a priori chosen parameterization) to the training data
2. Use the model directly to make predictions
In the parametric models setting of regression or classification, the model will be some
hypothesis or prediction rule $y = h(x; \Theta)$ for some functional form $h$. The term hypothesis
has its roots in statistical learning and the scientific method, where models or hypotheses
about the world are tested against real data, and refined with more evidence, observations,
or insights. Note that the parameters themselves are only part of the assumptions that
we’re making about the world. The model itself is a hypothesis that will be refined with
more evidence.
The idea is that $\Theta$ is a set of one or more parameter values that will be determined by
fitting the model to the training data and then be held fixed during testing.
Given a new $x^{(n+1)}$, we would then make the prediction $h(x^{(n+1)}; \Theta)$.
[Margin note: We write $f(a; b)$ to describe a function that is usually applied to a single
argument $a$, but is a member of a parametric family of functions, with the particular
function determined by parameter value $b$.]
The fitting process is often articulated as an optimization problem: Find a value of $\Theta$
that minimizes some criterion involving $\Theta$ and the data. An optimal strategy, if we knew
the actual underlying distribution on our data, $\Pr(X, Y)$, would be to predict the value of
$y$ that minimizes the expected loss, which is also known as the test error. If we don’t have
that actual underlying distribution, or even an estimate of it, we can take the approach
of minimizing the training error: that is, finding the prediction rule $h$ that minimizes the
average loss on our training data set. So, we would seek $\Theta$ that minimizes

$$E_n(h; \Theta) = \frac{1}{n} \sum_{i=1}^{n} L(h(x^{(i)}; \Theta), y^{(i)}),$$

where the loss function $L(g, a)$ measures how bad it would be to make a guess of $g$
when the actual value is $a$.
[Margin note on $\Pr(X, Y)$: This notation describes a so-called "joint distribution"; roughly,
as the name suggests, it captures how both random variables $X$ and $Y$ "contribute" to the
chance of something happening.]
We will find that minimizing training error alone is often not a good choice: it is possible
to emphasize fitting the current data too strongly and end up with a hypothesis that does
not generalize well when presented with new $x$ values.
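
The training error is straightforward to compute once a hypothesis and a loss are fixed. The sketch below assumes squared loss and a one-dimensional linear hypothesis with parameters (slope, offset); both choices, and the data, are illustrative only.

```python
def squared_loss(g, a):
    return (g - a) ** 2


def h(x, theta):
    """Hypothetical one-dimensional linear hypothesis: theta = (slope, offset)."""
    slope, offset = theta
    return slope * x + offset


def training_error(h, theta, data, loss):
    """Average loss of h(x; theta) over the n training pairs."""
    return sum(loss(h(x, theta), y) for x, y in data) / len(data)


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
print(training_error(h, (2.0, 1.0), data, squared_loss))  # this theta fits the data exactly
print(training_error(h, (1.0, 1.0), data, squared_loss))  # this theta does not
```

Fitting then amounts to searching over `theta` for the value that makes this number as small as possible.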

1.5 Model class and parameter fitting

A model class $M$ is a set of possible models, typically parameterized by a vector of
parameters $\Theta$. What assumptions will we make about the form of the model? When solving a
regression problem using a prediction-rule approach, we might try to find a linear function
$h(x; \theta, \theta_0) = \theta^T x + \theta_0$ that fits our data well. In this example, the parameter vector
is $\Theta = (\theta, \theta_0)$.
For problem types such as classification, there are huge numbers of model classes that
have been considered; we’ll spend much of this course exploring these model classes,
especially neural network models. We will almost completely restrict our attention to model
classes with a fixed, finite number of parameters. Models that relax this assumption are
called “non-parametric” models.
How do we select a model class? In some cases, the ML practitioner will have a good
idea of what an appropriate model class is, and will specify it directly. In other cases, we
may consider several model classes and choose the best based on some objective function.
In such situations, we are solving a model selection problem: model selection is to pick a
model class $M$ from a (usually finite) set of possible model classes, whereas model fitting is
to pick a particular model in that class, specified by (usually continuous) parameters $\Theta$.
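
The distinction between model fitting and model selection can be sketched with two small model classes for scalar inputs: constants and lines. The objective used to compare classes here, squared error on a held-out validation set, is one common choice and is our assumption; the text does not prescribe it. The data and function names are invented.

```python
def fit_constant(data):
    """Least-squares fit within the class h(x) = theta_0."""
    theta_0 = sum(y for _, y in data) / len(data)
    return lambda x: theta_0


def fit_linear(data):
    """Least-squares fit within the class h(x) = theta * x + theta_0 (scalar x)."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    s_xx = sum((x - mean_x) ** 2 for x, _ in data)
    theta = s_xy / s_xx
    theta_0 = mean_y - theta * mean_x
    return lambda x: theta * x + theta_0


def mean_squared_error(h, data):
    return sum((h(x) - y) ** 2 for x, y in data) / len(data)


def select_model_class(model_classes, train, validation):
    """Fit one model per class, then pick the class with the lowest validation error."""
    scores = {}
    for name, fit in model_classes.items():
        h = fit(train)  # model fitting: choose parameters within the class
        scores[name] = mean_squared_error(h, validation)
    return min(scores, key=scores.get), scores  # model selection: choose the class


train = [(0.0, 0.1), (1.0, 2.1), (2.0, 3.9), (3.0, 6.0)]
validation = [(4.0, 8.1), (5.0, 9.9)]
best, scores = select_model_class(
    {"constant": fit_constant, "linear": fit_linear}, train, validation
)
print(best)
```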

1.6 Algorithm
Once we have described a class of models and a way of scoring a model given data, we
have an algorithmic problem: what sequence of computational instructions should we run
in order to find a good model from our class? For example, determining the parameter
vector which minimizes the training error might be done using a familiar least-squares
minimization algorithm, when the model $h$ is a function being fit to some data $x$.
Sometimes we can use software that was designed, generically, to perform optimization.
In many other cases, we use algorithms that are specialized for ML problems, or for
particular hypothesis classes. Some algorithms are not easily seen as trying to optimize a
particular criterion. In fact, a historically important method for finding linear classifiers,
the perceptron algorithm, has this character.
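
For a flavor of such an algorithm, here is a minimal sketch of the perceptron update rule for a linear classifier. It assumes labels are encoded as -1 and +1 and that a fixed maximum number of passes over the data is acceptable; the data set and function names are illustrative.

```python
def perceptron(data, passes=100):
    """Look for theta, theta_0 whose sign(theta . x + theta_0) matches labels in {-1, +1}."""
    d = len(data[0][0])
    theta = [0.0] * d
    theta_0 = 0.0
    for _ in range(passes):
        mistakes = 0
        for x, y in data:
            activation = sum(t * xj for t, xj in zip(theta, x)) + theta_0
            if y * activation <= 0:  # misclassified, or exactly on the boundary
                theta = [t + y * xj for t, xj in zip(theta, x)]
                theta_0 += y
                mistakes += 1
        if mistakes == 0:  # every training point is classified correctly
            break
    return theta, theta_0


def classify(x, theta, theta_0):
    return 1 if sum(t * xj for t, xj in zip(theta, x)) + theta_0 > 0 else -1


data = [((2.0, 1.0), 1), ((1.0, 3.0), 1), ((-1.0, -2.0), -1), ((-2.0, -1.0), -1)]
theta, theta_0 = perceptron(data)
print([classify(x, theta, theta_0) for x, _ in data])
```

Each update nudges the parameters toward classifying the current mistake correctly, without reference to an explicit overall objective.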
