Chapter 1 Introduction
Motivation
• Machine Learning
• Artificial Intelligence
• Optimization: engineering is all about decision making
• Optimal Control
• Planning
• Markov Decision Process
• Influence Diagram
• Decision Tree
• Dynamic Control
• Game Theory
• Search
• Stochastic Programming
• Reinforcement Learning
• Bandit problem
…
[Diagram: a map of decision-making problems, from single-agent to multi-agent settings:
• Multi-stage, single-agent: optimization / MDP / POMDP, dynamic control, heuristic search
• Multi-agent: repeated games, stochastic games, multi-agent MDPs, learning in multi-agent systems
• Learning: reinforcement learning (including model-free reinforcement learning), optimization-based learning
Problem solving = modeling systems + solving the resulting models.]
max_𝒙 ∑_{𝑖=1} 𝑃_𝑖(𝒙; 𝜃, 𝑈)
Validation: Are we building the right model? Is this solution good for the target system?
Verification: Does the algorithm capture all the essential aspects of the model?
Data can help make the model more realistic and derive more accurate solutions!
How to solve a problem using data
• Prediction: new knowledge obtained by combining the data and the model
Regression
Classification
Clustering
Statistics in Machine Learning
Data: input features 𝑥1, 𝑥2, ⋯, 𝑥𝑛 and output 𝑦
Model: a functional form 𝑓(𝑥; 𝜃), which is usually given
Learning: using the training data set, a learning algorithm finds the parameter 𝜃∗ such that the hypothesis function ℎ(𝑥) = 𝑓(𝑥; 𝜃∗) is believed to accurately predict the output 𝑦 for a given query input 𝑥
Prediction: for a query input 𝑥∗ = (𝑥∗1, 𝑥∗2, …, 𝑥∗𝑛), the predicted output is 𝑦∗ = 𝑓(𝑥∗; 𝜃∗)
If 𝑦∗ ∈ ℝ: Regression
If 𝑦∗ ∈ {1, …, 𝑁}: Classification
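The learning-then-prediction pipeline above can be sketched in a few lines. This is a minimal illustration, assuming the given functional form is the simple line 𝑓(𝑥; 𝜃) = 𝜃𝑥; the training data are synthetic and purely illustrative.

```python
import random

# A minimal sketch of the learning pipeline: the functional form
# f(x; theta) = theta * x is given; learning finds theta*.
# All data below are synthetic (true theta = 2) and purely illustrative.
random.seed(0)

# Training data set: inputs x and noisy outputs y.
xs = [random.uniform(-1, 1) for _ in range(200)]
ys = [2.0 * x + random.gauss(0, 0.05) for x in xs]

# Learning: least squares gives theta* = sum(x*y) / sum(x*x).
theta_star = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Prediction: hypothesis h(x) = f(x; theta*) applied to a query input x*.
x_query = 3.0
y_pred = theta_star * x_query   # y* is real-valued, so this is regression

print(round(theta_star, 2))     # close to the true theta = 2
```

Since 𝑦∗ ∈ ℝ here, this is a regression problem; replacing the output with a class label in {1, …, 𝑁} would make it classification.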
CHAPTER 1
Introduction
Contents
1. Introduction
2. Characterizing a Set of Measurements: Graphical Methods
3. Characterizing a Set of Measurements: Numerical Methods
4. How Inferences Are Made
5. Theory and Reality
6. Summary
1.1 Introduction
Motivations
• It is interesting to note that billions of dollars are spent each year by U.S. industry and government for
data from experimentation, sample surveys, and other data collection procedures.
• This money is expended solely to obtain information about phenomena susceptible to measurement in
areas of business, science, or the arts.
• The implications of this statement provide keys to the nature of the very valuable contribution that the
discipline of statistics makes to research and development in all areas of society.
1.1 Introduction
What is Statistics
• Webster’s New Collegiate Dictionary defines statistics as “a branch of mathematics dealing with the
collection, analysis, interpretation, and presentation of masses of numerical data.”
• Rice (1995), commenting on experimentation and statistical applications, states that statistics is
“essentially concerned with procedures for analyzing data, especially data that in some vague sense
have a random character.”
• Freund and Walpole (1987), among others, view statistics as encompassing “the science of basing
inferences on observed data and the entire problem of making decisions in the face of uncertainty.”
• Mood, Graybill, and Boes (1974) define statistics as “the technology of the scientific method” and add
that statistics is concerned with “(1) the design of experiments and investigations, (2) statistical
inference.”
1.1 Introduction
What is Statistics
[Diagram: Population with 𝑌𝑖 ~ 𝑃(𝜃), the population distribution, where 𝜃 is a fixed parameter.
From the population, several samples 𝑌1, 𝑌2, …, 𝑌𝑛 are drawn, and from each sample the same statistic 𝜃̂ = 𝑇(𝑌1, 𝑌2, …, 𝑌𝑛) is computed.]
1.1 Introduction
Population vs. Sample
• Examples:
The preferences of voters for a presidential candidate
The voltage at three particular points in the guidance system for a spacecraft:
Presumably, this population would possess characteristics similar to those of the three systems in the sample
(conceptual population)
Measurements on patients in a medical experiment represent a sample from a conceptual population
consisting of all patients similarly afflicted today, as well as those who will be afflicted in the near future
(conceptual population)
1.1 Introduction
Goal of Statistics
• The study of statistics is concerned with
The design of experiments or sample surveys to obtain a specified quantity of information at
minimum cost
The optimum use of this information in making an inference about a population.
Making an inference about a population based on information contained in a sample from that
population, and providing an associated measure of goodness for the inference.
1.2 Characterizing a Set of Measurements: Graphical Methods
Descriptive Statistics
• We characterize a person by using height, weight, color of hair and eyes, and other descriptive
measures of the person’s physiognomy.
• Points of subdivision of the axis of measurement should be chosen so that it is impossible for a
measurement to fall on a point of division.
• The second guideline involves the width of each interval and consequently, the minimum
number of intervals needed to describe the data.
Generally speaking, we wish to obtain information on the form of the distribution of the
data.
1.2 Characterizing a Set of Measurements: Graphical Methods
Relative Frequency Histogram
1.2 Characterizing a Set of Measurements: Graphical Methods
Meaning and Limitation of Histogram
• The relative frequency distribution provides meaningful summaries of the information contained in
a set of data.
This is primarily due to the probabilistic interpretation that can be derived from the relative
frequency histogram.
If a measurement is selected at random from the original data set, the probability that it will
fall in a given interval is proportional to the area under the histogram lying over that interval.
• The relative frequency histograms presented provide useful information regarding the distribution
of sets of measurement, but histograms are usually not adequate for the purpose of making
inferences.
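The probabilistic interpretation above can be checked numerically: with equal-width intervals, the relative frequency of an interval is the probability that a measurement selected at random from the data set falls in it. The sketch below uses synthetic uniform data and hand-picked bin edges, both purely illustrative.

```python
import random

# Relative frequency histogram by hand: bin the data into equal-width
# intervals and compute each interval's relative frequency, which equals
# the probability that a randomly selected measurement falls in it.
# The data (uniform on [0, 10)) are synthetic and purely illustrative.
random.seed(2)

data = [random.random() * 10 for _ in range(1000)]
edges = [0, 2, 4, 6, 8, 10]               # bin boundaries

counts = [0] * (len(edges) - 1)
for y in data:
    for i in range(len(counts)):
        if edges[i] <= y < edges[i + 1]:
            counts[i] += 1
            break

rel_freq = [c / len(data) for c in counts]
print(rel_freq)      # the relative frequencies sum to 1
```

Because the data are uniform, each of the five equal-width bins should show a relative frequency near 0.2, matching the area interpretation of the histogram.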
1.2 Characterizing a Set of Measurements: Graphical Methods
Other Examples
1854 Broad Street cholera outbreak
https://en.wikipedia.org/wiki/1854_Broad_Street_cholera_outbreak
1.2 Characterizing a Set of Measurements: Graphical Methods
Other Examples
https://www.gapminder.org/answers/how-does-income-relate-to-life-expectancy/
1.2 Characterizing a Set of Measurements: Graphical Methods
Other Examples
1.3 Characterizing a Set of Measurements: Numerical Methods
Numerical Descriptive Measure
• Mode
• Variance
• Coefficient of variation
1.3 Characterizing a Set of Measurements: Numerical Methods
Measure of Central Tendency
DEFINITION 1.1
The mean of a sample of 𝑛 measured responses 𝑦1, 𝑦2, …, 𝑦𝑛 is given by

𝑦̄ = (1/𝑛) ∑_{𝑖=1}^{𝑛} 𝑦𝑖

The variance of a sample of 𝑛 measurements 𝑦1, 𝑦2, …, 𝑦𝑛 is the sum of the squared differences
between the measurements and their mean, divided by 𝑛 − 1. Symbolically, the sample variance is

𝑠² = (1/(𝑛 − 1)) ∑_{𝑖=1}^{𝑛} (𝑦𝑖 − 𝑦̄)²

The standard deviation of a sample of measurements is the positive square root of the variance; that
is,

𝑠 = √𝑠²

The corresponding population standard deviation is denoted by 𝜎 = √𝜎².
Although it is closely related to the variance, the standard deviation can be used to give a fairly
accurate picture of data variation for a single set of measurements.
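Definition 1.1 can be computed directly and checked against the standard library. The five measurements below are arbitrary illustrative values.

```python
import math
import statistics

# Sample mean, variance (n - 1 divisor), and standard deviation,
# computed exactly as in Definition 1.1.
ys = [4.0, 7.0, 6.0, 5.0, 8.0]   # arbitrary illustrative measurements
n = len(ys)

y_bar = sum(ys) / n                                   # sample mean
s2 = sum((y - y_bar) ** 2 for y in ys) / (n - 1)      # sample variance
s = math.sqrt(s2)                                     # sample std deviation

print(y_bar, s2)   # 6.0 2.5

# The library uses the same n - 1 divisor, so the results agree.
assert y_bar == statistics.mean(ys)
assert abs(s2 - statistics.variance(ys)) < 1e-12
```

Note the 𝑛 − 1 divisor in the variance: Python's `statistics.variance` uses the same convention, while `statistics.pvariance` divides by 𝑛 (the population form).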
1.3 Characterizing a Set of Measurements: Numerical Methods
Mound-Shaped Data Distribution
• Many distributions can be approximated by a bell-shaped frequency distribution known as a normal
curve.
• This example illustrates the potent role played by probability in making inferences.
1.4 How Inferences Are Made
Probability vs. Statistics
• Probabilists assume that they know the structure of the population of interest and use the theory of
probability to compute the probability of obtaining a particular sample.
• Statisticians use probability to make the trip in reverse, from the sample to the population.
• Basic to inference making is the problem of calculating the probability of an observed sample.
As a result, probability is the mechanism used in making statistical inferences.
This is why we learn probability first!
Example: 𝜃 is the probability of obtaining a head on each coin toss; an observed sample might look like (Head, Head, Tail, …).
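The coin-tossing example above is easy to simulate: the statistician does not know 𝜃 but estimates it from the observed proportion of heads. The value 𝜃 = 0.6 below is an arbitrary illustrative choice.

```python
import random

# Coin tossing: theta is the (unknown to the statistician) probability of
# heads; the sample proportion of heads estimates it from the data.
# theta = 0.6 is an arbitrary illustrative choice.
random.seed(3)

theta = 0.6
tosses = ["Head" if random.random() < theta else "Tail" for _ in range(2000)]

theta_hat = tosses.count("Head") / len(tosses)
print(theta_hat)     # close to theta
```

Going from 𝜃 to the likely samples is probability; going from the observed sample back to 𝜃 is statistical inference.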
1.6 Theory and Reality
Why Is Theory Necessary?
[Diagram: repeated samples 𝑦1, 𝑦2, …, 𝑦𝑛 are drawn from the population distribution 𝑌𝑖 ~ 𝑝(𝑌; 𝜃) (Ch. 3, 4, 5); each sample 𝑚 yields an estimate 𝜃̂⁽ᵐ⁾ = 𝑇(𝑦1, 𝑦2, …, 𝑦𝑛). The empirical distribution of these estimates clusters around E[𝜃̂] ≈ 𝜃.]
• Probability Theory (Ch. 2 ~ Ch. 6) plays an important role in inference: it computes the probability of the occurrence of the sample and
connects the computed probability to the most probable target parameter.
• An estimator 𝜃̂ = 𝑇(𝑌1, 𝑌2, …, 𝑌𝑛) for a target parameter 𝜃 is a function of the random variables observed in a sample, and is therefore
itself a random variable.
• The sampling distribution 𝑝(𝜃̂) can be used to evaluate the goodness of the estimator (confidence interval) and the errors (i.e., 𝛼 and 𝛽 errors) of
hypothesis testing.
Road Map on IE241 I
[Diagram: Population → Experiment Design (Ch. 12, not covered in IE241) → a random sample of size 𝑛: 𝑌1, 𝑌2, …, 𝑌𝑛, with 𝑌𝑖 ~ 𝑝(𝑌; 𝜃), the population distribution (Ch. 3, 4, 5) → Statistic 𝑇: 𝜃̂ = 𝑇(𝑌1, 𝑌2, …, 𝑌𝑛), a function of random variables (Ch. 6) → Sampling distribution 𝑝(𝜃̂): a theoretical model for the relative frequency histogram of the possible values of the statistic, centered near E[𝜃̂] ≈ 𝜃; Central Limit Theorem (Ch. 7).]
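The road map above can be walked end to end by simulation: draw many random samples, compute the statistic 𝜃̂ = 𝑇(𝑌1, …, 𝑌𝑛) from each, and look at the resulting empirical sampling distribution. The exponential population (mean 1) below is an arbitrary illustrative choice; by the Central Limit Theorem the estimates cluster near 𝜃 with spread shrinking like 1/√𝑛.

```python
import random
import statistics

# Approximating the sampling distribution of theta_hat = sample mean
# by repeated sampling. The population is exponential with mean theta = 1,
# an arbitrary illustrative choice.
random.seed(4)

n = 50                       # sample size
m = 2000                     # number of repeated samples

estimates = []
for _ in range(m):
    sample = [random.expovariate(1.0) for _ in range(n)]
    estimates.append(statistics.mean(sample))

print(round(statistics.mean(estimates), 2))   # near theta = 1
print(round(statistics.stdev(estimates), 2))  # near 1 / sqrt(50), about 0.14
```

The histogram of `estimates` is the empirical counterpart of the sampling distribution 𝑝(𝜃̂), and its approximate normality is exactly what the Central Limit Theorem (Ch. 7) predicts.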