Chapter1 Introduction

This document introduces the motivation, goals and key concepts of statistics. It discusses how statistics is concerned with collecting and analyzing data to make inferences about populations based on samples. It also distinguishes between populations and samples, and how samples are used to draw conclusions about populations.

Uploaded by

정승안

Introduction of IE241

Motivation

Engineering is all about decision making:

• Machine Learning
• Artificial Intelligence
• Optimization
• Optimal Control
• Planning
• Markov Decision Process
• Influence Diagram
• Decision Tree
• Dynamic Control
• Game Theory
• Search
• Stochastic Programming
• Reinforcement Learning
• Bandit Problem

What are the differences in these decision-making strategies?


What are the common aspects in these decision-making strategies?
Core Ideas

What type of decision making framework will be used?


• Single stage or multiple stages
• A single decision maker or many decision makers
• Model-based or model-free

“Decision making under uncertainty”

How to model uncertainties?


• Epistemic uncertainty (systemic uncertainty):
Uncertainty arising from a lack of knowledge
- Model uncertainty
- State uncertainty
• Aleatoric uncertainty (statistical uncertainty):
Uncertainty arising from an underlying stochastic system
Classification of decision-making methods

[Figure: a chart classifying decision-making methods along two axes, single stage vs. multiple stages and model-based (optimization) vs. data-driven (model-free). Single stage, model-based: optimization/heuristic search and static game/Bayesian game; multiple stages, model-based: MDP/POMDP, dynamic control, and dynamic game, stochastic game, multi-agent MDP; data-driven counterparts: model-free optimization, reinforcement learning, and learning in multi-agents (repeated games, multi-agent reinforcement learning).]
Problem solving

Real-world task → (Modeling) → Formal task (model) → (Solving) → Algorithm (program)

For example, the formal task for a collection of systems may be an optimization model such as

maximize_x Σ_i P_i(x; θ, U), where the sum runs over the systems i

• Validation: Are we building the right model? Is this solution good for the target system?
• Verification: Does the algorithm capture all the essential aspects of the model?

Data can help build a more realistic model and derive a more accurate solution!
How to solve a problem using data

• Problem → Data
• Tools → Algorithm or model
• Measurement tools → Evaluation measures

Statistical Learning vs Machine Learning
Statistics in Machine Learning

data + model = prediction

• Data : observations, experience,…

• Model: a form of prior knowledge, assumptions, belief


 Functional model
 Probabilistic model

• Prediction : the new knowledge obtained by combining the data and model
 Regression
 Classification
 Clustering
Statistics in Machine Learning

Data
• Training data set D = {(x_i, y_i); i = 1, …, m}: m input feature vectors x_i = (x_i1, …, x_in) and m outputs y_i, where y_i ∈ ℝ or y_i ∈ {1, …, N}.

Model
• The functional form f(x; θ) is usually given.
• Learning: using the training data set, a learning algorithm finds the best hypothesis function h(x) that is believed to accurately predict the output y for a given query input x; that is, it finds the best parameter θ*.

Prediction
• For a query input feature vector x* = (x*1, x*2, …, x*n), the hypothesis f(x; θ*) returns the predicted output y*.
• If y* ∈ ℝ: Regression. If y* ∈ {1, …, N}: Classification.
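As a minimal illustrative sketch of this pipeline (the data set and the linear functional form f(x; θ) = θ0 + θ1·x are invented for illustration, not taken from the slides), the following fits θ* by least squares and predicts the output for a query input:

```python
# data + model -> prediction: fit a given functional form f(x; theta)
# to a training set, then predict the output for a query input x*.

def fit_least_squares(D):
    """Find theta* = (theta0, theta1) minimizing sum of (f(x; theta) - y)^2
    for the linear hypothesis f(x; theta) = theta0 + theta1 * x."""
    m = len(D)
    mean_x = sum(x for x, _ in D) / m
    mean_y = sum(y for _, y in D) / m
    sxx = sum((x - mean_x) ** 2 for x, _ in D)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in D)
    theta1 = sxy / sxx
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1

def predict(theta, x_query):
    theta0, theta1 = theta
    return theta0 + theta1 * x_query

# Training data set D = {(x_i, y_i); i = 1, ..., m}
D = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1)]
theta_star = fit_least_squares(D)
y_star = predict(theta_star, 5.0)   # predicted output for query input x* = 5
```

Because y* ∈ ℝ here, this is a regression task; swapping the hypothesis and loss for a class-valued output would make it classification.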
CHAPTER 1
Introduction
Contents

1. Introduction
2. Characterizing a Set of Measurements: Graphical Methods
3. Characterizing a Set of Measurements: Numerical Methods
4. How Inferences Are Made
5. Theory and Reality
6. Summary
1.1 Introduction
Motivations

• It is interesting to note that billions of dollars are spent each year by U.S. industry and government for
data from experimentation, sample surveys, and other data collection procedures.

• This money is expended solely to obtain information about phenomena susceptible to measurement in
areas of business, science, or the arts.

• The implications of this statement provide keys to the nature of the very valuable contribution that the
discipline of statistics makes to research and development in all areas of society.
1.1 Introduction
What is Statistics
• Webster’s New Collegiate Dictionary defines statistics as “a branch of mathematics dealing with the
collection, analysis, interpretation, and presentation of masses of numerical data.”
• Rice (1995), commenting on experimentation and statistical applications, states that statistics is
“essentially concerned with procedures for analyzing data, especially data that in some vague sense
have a random character.”
• Freund and Walpole (1987), among others, view statistics as encompassing “the science of basing
inferences on observed data and the entire problem of making decisions in the face of uncertainty.”
• Mood, Graybill, and Boes (1974) define statistics as “the technology of the scientific method” and add
that statistics is concerned with “(1) the design of experiments and investigations, (2) statistical
inference.”
1.1 Introduction
What is Statistics

Common threads in these definitions: uncertainty, data with a random character, the design of experiments, inference, and making decisions.

Statistics is a theory of information, with inference making as its object


1.1 Introduction
Population vs. Sample
• Population
 The collection of all the elements of a universe of interest, i.e., target of interest.
• Sample
 A set of elements taken from the population under study, i.e., a subset of the population.
 Since it is impossible and/or impractical to examine the entire population in most cases, a sample
is used instead.
 If a sample is selected randomly, i.e., every element in the population has the same chance to be
chosen, then the sample is called a random sample.

Population: Y_i ~ P(θ), the population distribution, where θ is a fixed parameter.
Samples: repeated random samples Y_1, Y_2, …, Y_n are drawn from the population, and each sample yields its own value of the statistic θ̂ = T(Y_1, Y_2, …, Y_n).
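The picture above can be made concrete with a small simulation; the Bernoulli population with fixed θ = 0.3 and the sample mean as the statistic T are illustrative choices, not part of the text:

```python
import random

random.seed(0)
theta = 0.3          # fixed population parameter (unknown in practice)
n = 100              # sample size

def draw_sample(n, theta):
    """A random sample Y1, ..., Yn from a Bernoulli(theta) population:
    every element has the same chance of being chosen."""
    return [1 if random.random() < theta else 0 for _ in range(n)]

def T(sample):
    """Statistic computed from the sample: here, the sample mean."""
    return sum(sample) / len(sample)

# Each random sample gives a (generally different) value of theta-hat.
estimates = [T(draw_sample(n, theta)) for _ in range(5)]
```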
1.1 Introduction
Population vs. Sample
• Examples:
 The preferences of voters for a presidential candidate
 The voltage at three particular points in the guidance system for a spacecraft:
 Presumably, this population would possess characteristics similar to the three systems in the sample
(conceptual population)
 Measurements on patients in a medical experiment represent a sample from a conceptual population
consisting of all patients similarly afflicted today, as well as those who will be afflicted in the near future
(conceptual population)
1.1 Introduction
Goal of Statistics
• The study of statistics is concerned with
 The design of experiments or sample surveys to obtain a specified quantity of information at
minimum cost
 The optimum use of this information in making an inference about a population.
 Making an inference about a population based on information contained in a sample from that
population, and providing an associated measure of goodness for the inference.
1.2 Characterizing a Set of Measurements: Graphical Methods
Descriptive Statistics
• We characterize a person by using height, weight, color of hair and eyes, and other descriptive
measures of the person’s physiognomy.

• Characterizing a population that consists of a set of measurements is important.


 The characterizations must be meaningful so that knowledge of the descriptive measures
enables us to clearly visualize the set of numbers.
 The characterizations must possess practical significance, so that knowledge of the descriptive
measures for a population can be used to solve a practical, non-statistical problem.

• There are two methods to characterize the population:


 Graphical Methods (visualization)
 Numerical Methods (inference)
1.2 Characterizing a Set of Measurements: Graphical Methods
Relative Frequency Histogram

• An individual population (or any set of measurements) can be characterized by a relative


frequency distribution, which can be represented by a relative frequency histogram.
 A graph is constructed by subdividing the axis of measurement into intervals of equal
width.
 A rectangle is constructed over each interval, such that the height of the rectangle is
proportional to the fraction of the total number of measurements falling in each cell.

Characterize the ten measurements:


2.1, 2.4, 2.2, 2.3, 2.7, 2.5, 2.4, 2.6, 2.6, and 2.9
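A minimal sketch of this construction for the ten measurements, with interval boundaries chosen (per the guideline on the next slide) so that no measurement can fall on a point of division:

```python
# Relative frequency histogram for the ten measurements, using five
# intervals of equal width 0.2 whose boundaries (2.05, 2.25, ..., 3.05)
# cannot coincide with any measurement.
measurements = [2.1, 2.4, 2.2, 2.3, 2.7, 2.5, 2.4, 2.6, 2.6, 2.9]
edges = [2.05 + 0.2 * k for k in range(6)]

rel_freq = []
for lo, hi in zip(edges[:-1], edges[1:]):
    count = sum(lo < y < hi for y in measurements)
    # Height of the rectangle over (lo, hi): the fraction of the total
    # number of measurements falling in this cell.
    rel_freq.append(count / len(measurements))

# The fractions sum to 1; the probability that a measurement selected at
# random from the data falls in an interval equals its relative frequency.
```

For these data the relative frequencies over the five intervals are 0.2, 0.3, 0.3, 0.1, and 0.1.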
1.2 Characterizing a Set of Measurements: Graphical Methods
How to Construct a Relative Frequency Histogram

• Points of subdivision of the axis of measurement should be chosen so that it is impossible for a
measurement to fall on a point of division.

• The second guideline involves the width of each interval and consequently, the minimum
number of intervals needed to describe the data.
 Generally speaking, we wish to obtain information on the form of the distribution of the
data.
1.2 Characterizing a Set of Measurements: Graphical Methods
Relative Frequency Histogram
1.2 Characterizing a Set of Measurements: Graphical Methods
Meaning and Limitation of Histogram
• The relative frequency distribution provides meaningful summaries of the information contained in
a set of data.
 This is primarily due to the probabilistic interpretation that can be derived from the relative
frequency histogram.
 If a measurement is selected at random from the original data set, the probability that it will
fall in a given interval is proportional to the area under the histogram lying over that interval.

• The relative frequency histograms presented provide useful information regarding the distribution
of sets of measurements, but histograms are usually not adequate for the purpose of making
inferences.
1.2 Characterizing a Set of Measurements: Graphical Methods
Other Examples
1854 Broad Street cholera outbreak

https://en.wikipedia.org/wiki/1854_Broad_Street_cholera_outbreak
1.2 Characterizing a Set of Measurements: Graphical Methods
Other Examples

https://www.gapminder.org/answers/how-does-income-relate-to-life-expectancy/
1.2 Characterizing a Set of Measurements: Graphical Methods
Other Examples
1.3 Characterizing a Set of Measurements: Numerical Methods
Numerical Descriptive Measure

• Numerical descriptive measure


 Numbers that have meaningful interpretations and that can be used to describe the
frequency distribution for any set of measurements

Numerical Descriptive Measure

• Central Tendency: arithmetic mean, median, mode, geometric mean
• Quartiles
• Variation: range, inter-quartile range, variance, standard deviation, coefficient of variation
• Shape: skewness, kurtosis
1.3 Characterizing a Set of Measurements: Numerical Methods
Measure of Central Tendency
DEFINITION 1.1
The mean of a sample of n measured responses y_1, y_2, …, y_n is given by

 ȳ = (1/n) Σ_{i=1}^{n} y_i

The corresponding population mean is denoted 𝜇


 We usually cannot measure the value of the population mean, 𝜇; rather, 𝜇 is an unknown constant
that we may want to estimate using sample information
 The mean of a set of measurements only locates the center of the distribution of data; by itself, it
does not provide an adequate description of a set of measurements.
 To describe data adequately, we must also define measures of data variability
1.3 Characterizing a Set of Measurements: Numerical Methods
Measure of Central Dispersion (Variation)
DEFINITION 1.2

The variance of a sample of n measurements y_1, y_2, …, y_n is the sum of the squares of the differences
between the measurements and their mean, divided by n − 1. Symbolically, the sample variance is

 s² = (1/(n − 1)) Σ_{i=1}^{n} (y_i − ȳ)²

The corresponding population variance is denoted by the symbol 𝜎 2

 Notice that we divided by 𝑛 − 1 instead of by 𝑛 in our definition of 𝑠 2 .


 The theoretical reason for this choice of divisor is provided later, where we will show that 𝑠 2
defined this way provides a “better” estimator for the true population variance, 𝜎 2 .
 The larger the variance of a set of measurements, the greater will be the amount of variation
within the set.
1.3 Characterizing a Set of Measurements: Numerical Methods
Measure of Central Dispersion (Variation)
DEFINITION 1.3

The standard deviation of a sample of measurements is the positive square root of the variance; that
is,

 s = √s²

The corresponding population standard deviation is denoted by σ = √σ².

 Although it is closely related to the variance, the standard deviation can be used to give a fairly
accurate picture of data variation for a single set of measurements.
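Definitions 1.1-1.3 are short enough to state directly in code; the ten measurements reuse the histogram example from Section 1.2:

```python
from math import sqrt

measurements = [2.1, 2.4, 2.2, 2.3, 2.7, 2.5, 2.4, 2.6, 2.6, 2.9]

def sample_mean(ys):
    """Definition 1.1: y-bar = (1/n) * sum of y_i."""
    return sum(ys) / len(ys)

def sample_variance(ys):
    """Definition 1.2: note the n - 1 divisor, not n."""
    n = len(ys)
    ybar = sample_mean(ys)
    return sum((y - ybar) ** 2 for y in ys) / (n - 1)

def sample_std(ys):
    """Definition 1.3: the positive square root of the variance."""
    return sqrt(sample_variance(ys))

ybar = sample_mean(measurements)   # 2.47
s2 = sample_variance(measurements)
s = sample_std(measurements)
```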
1.3 Characterizing a Set of Measurements: Numerical Methods
Mound-Shaped Data Distribution
• Many distributions can be approximated by a bell-shaped frequency distribution known as a normal
curve.

• Data possessing mound-shaped distributions have definite characteristics of variation, as expressed in


the following statement.
 μ ± σ contains approximately 68% of the measurements.
 μ ± 2σ contains approximately 95% of the measurements.
 μ ± 3σ contains almost all of the measurements.
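These percentages can be checked by simulation; the choices μ = 0, σ = 1, and 100,000 draws are illustrative:

```python
import random

random.seed(1)
mu, sigma = 0.0, 1.0
# Simulated measurements from a mound-shaped (normal) distribution.
ys = [random.gauss(mu, sigma) for _ in range(100_000)]

def frac_within(k):
    """Fraction of the measurements lying in the interval mu +/- k*sigma."""
    return sum(abs(y - mu) <= k * sigma for y in ys) / len(ys)

within_1 = frac_within(1)   # ~0.68
within_2 = frac_within(2)   # ~0.95
within_3 = frac_within(3)   # close to 1
```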
1.4 How Inferences Are Made
Inference Process
• The mechanism instrumental in making inferences can be well illustrated by analyzing our own
intuitive inference-making procedures.
• Suppose that two candidates are running for a public office in our community and that we wish
to determine whether our candidate, Jones, is favored to win.
 Make phone calls to estimate the fraction of voters favoring Jones
 Check whether the relative frequency favoring Jones is > 0.5
 Result: sample size 20, and all 20 favor Jones
• What is our conclusion?
• Reasoning
 It is not impossible to draw 20 out of 20 favoring Jones when less than 50% of the
electorate favor him, but it is highly improbable.
 We conclude that he will win: the relative frequency favoring him is > 0.5

• This example illustrates the potent role played by probability in making inferences.
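The improbability is easy to quantify: under independent sampling, the chance that all 20 sampled voters favor Jones is p^20, which at p = 0.5 is below one in a million (the 90% figure below is an added illustration, not from the slides):

```python
def prob_all_favor(n, p):
    """Probability that all n independently sampled voters favor Jones
    when a fraction p of the electorate favors him."""
    return p ** n

p_at_half = prob_all_favor(20, 0.5)     # 0.5**20, under one in a million
# Even if 90% of the electorate favored Jones, 20 out of 20 would still
# occur only about 12% of the time.
p_at_ninety = prob_all_favor(20, 0.9)
```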
1.4 How Inferences Are Made
Probability vs. Statistics
• Probabilists assume that they know the structure of the population of interest and use the theory of
probability to compute the probability of obtaining a particular sample.
• Statisticians use probability to make the trip in reverse—from the sample to the population
• Basic to inference making is the problem of calculating the probability of an observed sample.
 As a result, probability is the mechanism used in making statistical inferences.
 This is why we learn probability first!

Probability theory: computes the probability of the sample measurements.
 From a probability model with parameters θ (the characteristics of the model) to sample data y = (y_1, …, y_n) (the measurements).
Statistics: infers the causes (i.e., the parameters) that generated the observed data (samples).
 Example: θ is the probability of getting a head on each coin toss; the observed data are (Head, Head, Tail, …).
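A coin-tossing sketch of the two directions (the value θ = 0.6 and the sample sizes are illustrative assumptions):

```python
import random

# Probability direction: from a known model theta to the probability
# of observing a particular sample.
theta = 0.6                                   # assumed P(Head) on each toss
sample = ("Head", "Head", "Tail")
p_sample = 1.0
for toss in sample:
    p_sample *= theta if toss == "Head" else (1 - theta)
# p_sample = 0.6 * 0.6 * 0.4

# Statistics direction: from observed data back to an inferred theta.
# For coin tossing, the maximum-likelihood estimate of theta is simply
# the relative frequency of heads in the observed sample.
random.seed(2)
data = ["Head" if random.random() < theta else "Tail" for _ in range(2000)]
theta_hat = data.count("Head") / len(data)    # close to the true 0.6
```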
1.5 Theory and Reality
Why Is Theory Necessary?

• A theory is a model or an approximation to reality.

• It’s built on simplifying assumptions.

• But it aims to provide good and useful information about reality.


Road Map on IE241 I

Population → Experiment Design (Ch.12, not covered in IE241) → Statistic T: Function of Random Variables (Ch.6) → Sampling Distribution p(θ̂) (Ch.7)

• Population distribution: Y_i ~ p(Y; θ) (Ch.3, 4, 5), with fixed target parameter θ.
• Each random sample y_1, y_2, …, y_n yields an estimate θ̂ = T(y_1, y_2, …, y_n); over m repeated samples, the estimates θ̂^(1), θ̂^(2), …, θ̂^(m) form an empirical distribution that approximates the sampling distribution p(θ̂), with mean E[θ̂].
• Parameter inference (with goodness measures):
 Estimation: E[θ̂] = θ? (Ch.8 & Ch.9)
 Hypothesis Testing: θ̂ = θ? or θ̂ > θ? (Ch.10)

• Probability Theory (Ch.2 ~ Ch.6) plays an important role in inference: it computes the probability of the occurrence of the sample and connects the computed probability to the most probable target parameter.
• The estimator θ̂ = T(Y_1, Y_2, …, Y_n) for a target parameter θ is a function of the random variables observed in a sample and is therefore itself a random variable.
• The sampling distribution p(θ̂) can be used to evaluate the goodness of the estimator (confidence interval) and the errors (i.e., the α and β errors) of hypothesis testing.
Road Map on IE241 I

Population → Experiment Design (Ch.12, not covered in IE241) → Statistic T: Function of Random Variables (Ch.6) → Sampling Distribution p(θ̂): Central Limit Theorem (Ch.7)

• Population distribution: Y_i ~ p(Y; θ) (Ch.3, 4, 5).
• A random sample of size n, Y_1, Y_2, …, Y_n, yields the statistic θ̂ = T(Y_1, Y_2, …, Y_n).
• The sampling distribution p(θ̂), with mean E[θ̂], is a theoretical model for the relative frequency histogram of the possible values of the statistic.
• Parameter inference (with goodness measures):
 Estimation: E[θ̂] = θ? (Ch.8 & Ch.9)
 Hypothesis Testing: θ̂ = θ? or θ̂ > θ? (Ch.10)

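The road map can be rehearsed end to end in a short simulation; the Bernoulli population and the sample mean as the estimator T are illustrative stand-ins for the distributions covered in Ch.3-5:

```python
import random

random.seed(3)
theta, n, m = 0.3, 50, 2000   # population parameter, sample size, number of samples

def T(sample):
    """Estimator theta-hat = T(Y1, ..., Yn): here, the sample mean."""
    return sum(sample) / len(sample)

def draw_sample():
    """A random sample of size n from a Bernoulli(theta) population."""
    return [1 if random.random() < theta else 0 for _ in range(n)]

# m repeated samples -> m realizations of the statistic theta-hat; their
# empirical distribution approximates the sampling distribution p(theta-hat).
theta_hats = [T(draw_sample()) for _ in range(m)]

# The average of the realizations approximates E[theta-hat]; for this
# estimator E[theta-hat] = theta, the estimation question of Ch.8 & Ch.9.
e_theta_hat = sum(theta_hats) / m
```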
