MetNum1 2023 1 Week 9 With Worked Examples


CE23216 MetNum 1

Semester 2023/1

Week 9

Rehan Hussain, Ph.D.

Course overview

This course is divided into two parts:

— Part 1 (50%) consists of MATLAB programming, weeks 1–8 (including UTS), taught by Dr. Gede Adhyaksa.

— Part 2 (50%) consists of statistics and probability, weeks 9–16 (including UAS), taught by Dr. Rehan Hussain.

Evaluation for Part 2

1. Quiz (10%)
2. Homework assignment (10%)
3. Final exam (30%)

— Note: There will be at least 1 quiz and 3 homework assignments. However, the
schedule and topics depend on the flow of the class.

— At least one week’s notice will be given before the quiz.

Topics covered in Part 2

Week Title
Week 9 Introduction to statistics and probability
Week 10 Descriptive statistics and sampling techniques
Week 11 Probability theory
Week 12 Discrete and continuous probability distributions
Week 13 Variance, covariance, and correlation
Week 14 Statistical inference methods
Week 15 Statistical analysis using Octave/MATLAB
Week 16 UAS

Course Materials

— All materials used in the class (e.g. lecture slides, worked examples) will be
uploaded to the e-learning system.

— Note that the content of this course is very introductory in nature and can be
found in many books or sources. Some suggested references include:

1. Johnson, R. A. & Bhattacharyya, G. K., “Statistics: Principles and Methods”, Wiley Global Education, 6th Edition, 2014.

2. Montgomery, D. C. & Runger, G. C., “Applied Statistics and Probability for Engineers”, John Wiley & Sons, 6th Edition, 2014.

Outline

What's in this week’s lectures?

1. What is statistics? Basic definitions and examples, dot diagrams

2. Simple probability: basic definitions, tree diagrams, Venn diagrams

3. Interpolation: basic definitions, linear interpolation

Topic 1:
What is statistics?

Quote of the week

Definition of statistics

— Formally, statistics deals with the collection, presentation, analysis, and interpretation of data.

— As engineers, we can use statistics to:

✓ Make decisions
✓ Solve problems
✓ Design products and processes

What is data?

— Data refers to facts or items of information, often numeric, which can be:
✓ Measured through observation (e.g. lab experiments)
✓ Collected (e.g. into a dataset)
✓ Reported (e.g. in a thesis, article, or presentation)
✓ Analyzed (e.g. through sorting or calculation)
✓ Visualized (e.g. in a table or graph)

Types of data

Interval vs ratio scales

— An interval scale is one where the values have an order and the difference between two values is meaningful, but the zero point is arbitrary (e.g. clock time or calendar dates, temperature in Celsius).

— A ratio scale has all the properties of an interval scale, but also has a true zero point, where 0 means that none of the quantity is present. Hence, the ratio between two quantities is meaningful (e.g. weight, length, temperature in Kelvin).
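A quick numerical check makes the difference between the two scales concrete. This is a small Python sketch (the course itself uses MATLAB/Octave); the two temperatures, 10 °C and 20 °C, and the helper name `celsius_to_kelvin` are illustrative choices, not taken from the slides:

```python
def celsius_to_kelvin(t_c):
    """Convert a temperature from degrees Celsius to kelvin."""
    return t_c + 273.15


t1_c, t2_c = 10.0, 20.0  # illustrative temperatures in °C

# Differences are meaningful on both scales (interval property) and agree
diff_c = t2_c - t1_c
diff_k = celsius_to_kelvin(t2_c) - celsius_to_kelvin(t1_c)

# Ratios are only meaningful on the Kelvin (ratio) scale
ratio_c = t2_c / t1_c                                        # 2.0, yet 20 °C is not "twice as hot"
ratio_k = celsius_to_kelvin(t2_c) / celsius_to_kelvin(t1_c)  # about 1.035

print(f"Difference: {diff_c:.2f} °C = {diff_k:.2f} K")
print(f"Ratio in Celsius: {ratio_c:.3f}, ratio in Kelvin: {ratio_k:.3f}")
```

The Celsius ratio of 2 is an artifact of where 0 °C happens to sit; only the Kelvin ratio reflects the actual amount of thermal energy.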

Statistics in daily life

— Statistics are everywhere in our daily lives! For example, think about the recent Covid-19 pandemic.

— Every day, we were presented with numbers such as the number of cases or number of deaths – these are examples of statistical data.

— The data were often presented to us in the form of graphs or tables, to make them easier to understand and compare.

Statistics in daily life

Sampling

— In statistics, we are often interested in studying large groups of people or objects (e.g. all the citizens of a country) – this is known as a population.

— It would be very difficult and time-consuming to study the whole population, so we usually just take a small subset of the population to study, known as a sample.

— For this method to produce reliable results, the sample must be representative (share important characteristics with the population). Later in the course, we will learn about some different sampling methods.

Example: population and sample

Consider the following statement from a news report:


“According to the New York Times, in a telephone survey, 54% of adult Americans said they
would vote for Candidate B as president. The results are based on a random sampling of 1500
adults from all over the country.”

Population: all 150+ million adult Americans
Sample: 1500 randomly-selected adults

Clearly, it would not be feasible to telephone every single adult, so the survey
makers generalize about the whole population based on the sample!
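The idea of generalizing from a sample can be tried out with a small simulation. This is a Python sketch under stated assumptions: the population is a hypothetical one of 1,000,000 adults (far smaller than the real one) in which exactly 54% support Candidate B, and the random seed is arbitrary:

```python
import random

# Hypothetical population: 1 = supports Candidate B, 0 = does not
POP_SIZE = 1_000_000
TRUE_SHARE = 0.54
n_support = round(POP_SIZE * TRUE_SHARE)
population = [1] * n_support + [0] * (POP_SIZE - n_support)

rng = random.Random(2023)               # fixed seed so the run is repeatable
sample = rng.sample(population, 1500)   # simple random sample, without replacement

sample_share = sum(sample) / len(sample)
print(f"Population share: {TRUE_SHARE:.1%}")
print(f"Sample share:     {sample_share:.1%}")
```

Re-running with other seeds gives sample shares that cluster close to 54%, which is why a properly random sample of 1500 can stand in for the whole population.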
Branches of statistics

— Broadly speaking, there are two main types or branches of statistics: descriptive and inferential.

Descriptive statistics: deals with collecting, organizing, analyzing, and interpreting data.

Inferential statistics: uses information from a sample to draw conclusions about the population from which the sample was taken.

— In this course, we will learn about both types of statistics.

Variability

• Statistical techniques are useful for describing and understanding the concept of variability.
• Variability means that successive observations of a system or phenomenon do not produce the same result.
• Statistics gives us a framework for describing variability and for learning about potential sources of variability.
• For chemical engineers, one example where controlling variability is important is ensuring that the products of a chemical plant or process meet their required specifications.

Example: variability in a process

Dot diagrams
• A dot diagram is a useful plot for displaying small sets of data (up to 20
observations).
• This plot allows us to easily see two features of the data: the location (or the
central value), and the scatter (or how spread apart the measurements are).

Worked Example 1: dot diagrams
A water treatment plant is designed to ensure that the chlorine (Cl−) concentration of effluent water stays between 12 and 14 ppm. For monitoring purposes, 8 effluent samples were collected at different times over a period of 24 hours and their Cl− concentrations (in ppm) were measured, as follows:

12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1

Use a dot diagram to find out if the plant achieved its target.

Worked Example 1: dot diagrams
Solution:

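(Figure: dot diagram of the measured Cl− concentrations, in ppm.)

Using this diagram, we can easily see that the water treatment plant achieved its target: all eight measurements lie between 12.3 and 13.6 ppm, inside the 12–14 ppm range.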
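For a quick check without plotting software, a text-only dot diagram can be printed. This Python sketch is one possible approach, not the method used on the slide: it turns the diagram on its side (one row per 0.1 ppm step, one ● per sample), and the helper name `dot_diagram` is hypothetical:

```python
from collections import Counter

# Cl− concentrations (ppm) from Worked Example 1
samples = [12.6, 12.9, 13.4, 12.3, 13.6, 13.5, 12.6, 13.1]
LOW, HIGH = 12.0, 14.0  # target range for the effluent


def dot_diagram(data, start, stop, step=0.1):
    """Sideways text dot diagram: one row per axis value, one '●' per
    observation. Values are binned to one decimal place."""
    counts = Counter(round(x, 1) for x in data)
    n_steps = round((stop - start) / step)
    rows = []
    for i in range(n_steps + 1):
        value = round(start + i * step, 1)
        rows.append(f"{value:5.1f} | " + "●" * counts[value])
    return rows


for row in dot_diagram(samples, LOW, HIGH):
    print(row)

on_target = all(LOW <= x <= HIGH for x in samples)
print(f"Range of data: {min(samples)} to {max(samples)} ppm; on target: {on_target}")
```

The two dots in the 12.6 row show the repeated measurement, and the empty rows at both ends show the margin left inside the 12–14 ppm target.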

Class Exercise 1: variability

Here is an experiment in variability you can try yourself:

1. Use a virtual stopwatch on your smartphone (or a real stopwatch, if you have one!).
2. Press ‘start’ and then try to stop the stopwatch at exactly 2.00 seconds.
3. Record the value that you stopped the stopwatch at, rounding to the nearest 0.1 seconds.
4. Repeat the experiment 10 times and draw a dot diagram of the results. Use the diagram to see how close your measurements are to the desired time. Are more of the data above or below the target?

Topic 2:
Simple probability

What is probability?

— Probability is the branch of mathematics concerned with numerical descriptions of how likely something is to happen.

— For example, if you roll a die (plural: dice), how likely are you to get a 6? If you toss a coin, how likely are you to get a head?

Quantifying probability

— The probability of an event occurring can be described by a number between 0 and 1 (or a percentage), and is normally denoted as P.
➢ P = 0 (or 0%) means the event has no chance of happening.
➢ P = ½ (or 50%) means the event is just as likely to happen as not to happen.
➢ P = 1 (or 100%) means the event is certain to happen.

Random experiment

— An experiment is the process of observing a phenomenon that has variations in its outcome. An experiment that can be repeated in the same manner every time and whose outcome cannot be predicted in advance is a random experiment.

— Two commonly-cited examples of random experiments are tossing a ‘fair coin’ and rolling an ‘unbiased die’ (note the use of the words fair and unbiased to denote that all the outcomes are equally likely).

— Another example would be choosing an object from a set of objects that look identical, but all have different properties (e.g., a deck of playing cards face down).

Simple probability

— In general, for a random event whose possible outcomes are all equally likely, the probability of the event occurring is given by

$$P(\text{event}) = \frac{\text{number of favorable outcomes}}{\text{number of possible outcomes}}$$

— For example, when throwing a die, this means

$$P(6) = \frac{1}{6}$$

where P(6) is the probability of throwing a six, the numerator 1 is the number of sides which have a six, and the denominator 6 is the total number of sides.

Worked Example 1: simple probability

If I toss two coins in succession, what is the probability of getting one head (H)
and one tail (T)?

Solution:
The four possible outcomes for tossing two coins are
HT, HH, TT, TH

Of these, two outcomes lead to one H and one T. Hence,

$$P(\text{HT or TH}) = \frac{2}{4} = \frac{1}{2}$$
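Counting favorable and possible outcomes is easy to automate by listing the sample space. A minimal Python sketch; the helper `probability` is a hypothetical name and, like the classical formula, it assumes all listed outcomes are equally likely:

```python
from fractions import Fraction
from itertools import product


def probability(event, outcomes):
    """Classical probability: favorable outcomes / possible outcomes.
    Assumes every outcome in `outcomes` is equally likely."""
    outcomes = list(outcomes)
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))


# All outcomes of tossing two fair coins in succession
two_coins = ["".join(t) for t in product("HT", repeat=2)]
print(two_coins)  # ['HH', 'HT', 'TH', 'TT']

# Event: exactly one head and one tail (HT or TH)
p_one_each = probability(lambda o: o.count("H") == 1, two_coins)
print(p_one_each)  # 1/2

# The same function gives P(6) for a single fair die
print(probability(lambda face: face == 6, range(1, 7)))  # 1/6
```

Using `Fraction` keeps the answers exact, so they can be compared directly with hand calculations such as 2/4 = 1/2.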

Visualizing probability

— In probability, it is often helpful to visualize the problem, especially for more complex situations.

— Two commonly used visualization methods are tree diagrams and Venn diagrams.

Tree diagrams

— A tree diagram consists of ‘branches’ that are labelled with probabilities. For example, consider tossing a single fair coin: one branch leads to H and the other to T, and each branch is labelled with probability ½.

— Note that the sum of the probabilities for all the branches leaving a single point (or node) is 1.

Worked Example 2: tree diagrams

In a jar, there are 11 balls (8 blue and 3 red). Andy takes out a ball at random from the jar, then puts it back. He then repeats the process. What is the probability that Andy picks out a red ball both times?

Worked Example 2: tree diagrams

Solution:

1st draw → 2nd draw
R1 (3/11) → R2 (3/11)
R1 (3/11) → B2 (8/11)
B1 (8/11) → R2 (3/11)
B1 (8/11) → B2 (8/11)

R1 = a red ball was picked in draw 1; B1 = a blue ball was picked in draw 1; …and so on.

To find the probability of a particular sequence of outcomes (one per draw), multiply the probabilities on the branches along that path. Hence,

$$P(R_1 \text{ and } R_2) = \frac{3}{11} \times \frac{3}{11} = \frac{9}{121}$$
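The multiplication rule along the branches can be checked exactly with Python's `fractions.Fraction`. This sketch enumerates all four paths of the tree; the dictionary layout (keys such as "RB" for red then blue) is just one convenient choice:

```python
from fractions import Fraction
from itertools import product

# Jar from Worked Example 2: 3 red and 8 blue balls, drawn with replacement,
# so the branch probabilities are the same on both draws
p = {"R": Fraction(3, 11), "B": Fraction(8, 11)}

# Each path through the tree is one (1st draw, 2nd draw) pair;
# its probability is the product of the branch probabilities along it
paths = {first + second: p[first] * p[second] for first, second in product("RB", repeat=2)}

for path, prob in paths.items():
    print(path, prob)

print("P(R1 and R2) =", paths["RR"])                 # 9/121
print("Sum over all paths =", sum(paths.values()))   # 1
```

The path probabilities (9/121, 24/121, 24/121, 64/121) add up to 1, just as the branches leaving each node do.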

Venn diagrams

— A Venn diagram is a picture that represents the outcomes of an experiment. It generally consists of a box or rectangle that represents the set of all possible outcomes (the sample space), S, together with circles (or some other shape) that represent specific events. For example:

(Figure: a rectangle S containing the outcomes H and T, with a circle A drawn around H.)

Here, S represents a single coin toss, and the event A represents getting a head.

Venn diagrams
Consider again the set of total outcomes for tossing two coins. We can write this as
S = { HT, HH, TT, TH }
Define two events A and B, where A is getting tails on the first coin flip and B is getting tails on the second coin flip. Hence,
A = { TT, TH } and B = { TT, HT }
On a Venn diagram, we can illustrate this as:

(Figure: Venn diagram of the events A and B inside the sample space S.)

Notice that the outcome TT is a part of both A and B – hence the overlap between the circles!
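Python's built-in `set` type mirrors this set notation directly. A minimal sketch; the variable names simply follow the slide's S, A and B:

```python
from fractions import Fraction

# Sample space for two coin tosses (first letter = first toss)
S = {"HT", "HH", "TT", "TH"}
A = {o for o in S if o[0] == "T"}  # tails on the first toss
B = {o for o in S if o[1] == "T"}  # tails on the second toss

print(sorted(A), sorted(B))  # ['TH', 'TT'] ['HT', 'TT']
print(A & B)                 # {'TT'}: the overlap of the two circles
print(sorted(A | B))         # ['HT', 'TH', 'TT']: inside at least one circle
print(S - (A | B))           # {'HH'}: outside both circles

# With equally likely outcomes, P(event) = |event| / |S|
print(Fraction(len(A & B), len(S)))  # 1/4
```

The intersection `A & B` is exactly the overlapping region of the Venn diagram, and the complement `S - (A | B)` is the part of the rectangle outside both circles.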

Worked Example 3: Venn diagrams

Worked Example 3: Venn diagrams

Solution:

Class exercise 2: probability

A survey found that 60% of Jakartans like to eat rice-based dishes for breakfast
(such as nasi kuning), 25% like to eat noodle-based dishes (such as mie goreng),
and 15% like to eat bubur ayam. If a person selects a rice-based breakfast, 60%
like to drink teh and the rest like to drink kopi. If they select mie goreng, 70% like to
drink kopi and the remainder like to drink teh. If they select bubur ayam, they are
equally likely to drink kopi or teh.
(a) Draw a probability tree diagram to show the possible outcomes.
(b) Based on this, calculate the probability that a randomly-selected Jakartan likes
to:
i. Eat rice-based dishes and drink teh for breakfast.
ii. Have kopi with their breakfast.
Topic 3:
Interpolation

Relationships between variables

— In science and engineering, we often want to know the mathematical relationship between two or more variables. In other words, we want to know how changing an independent variable, x (e.g. temperature), affects a dependent variable, y (e.g. rate of chemical reaction).

— However, if we try to measure this relationship experimentally, we are limited by time and money, and can only collect a finite number of data points.

What is interpolation?

— Interpolation refers to a method for estimating the values of unknown data points based on the dataset that we have.

— Note that interpolation only allows us to find intermediate values (i.e. those that are within the range of the known values). If we wish to find values that are outside the range of known data, that is known as extrapolation, which is a separate topic (we will look at this more later in the course).

Types of interpolation

— To illustrate some of the different types of interpolation, consider a hypothetical scenario where an engineer measures the response of a system, f(x), to different values of the input variable x, collecting the following dataset:

Types of interpolation

— Nearest-neighbor or piecewise constant interpolation: every point between the data points is assigned the value of the nearest data point.

— Pros: quick and easy, can be useful for systems with large numbers of variables and limited time to solve

— Cons: not very accurate
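Because the slide's dataset is not reproduced here, the following Python sketch uses a few hypothetical points to show how a nearest-neighbor lookup works. The function name is illustrative, and sending ties to the left-hand point is an arbitrary choice:

```python
from bisect import bisect_left


def nearest_neighbor(x, xs, ys):
    """Piecewise-constant interpolation: return the y-value of the known point
    whose x-value is closest to x. `xs` must be sorted in ascending order.
    Outside the data range, the nearest end point is used."""
    i = bisect_left(xs, x)
    if i == 0:
        return ys[0]
    if i == len(xs):
        return ys[-1]
    # Compare the known points on either side of x (ties go to the left)
    left, right = xs[i - 1], xs[i]
    return ys[i - 1] if (x - left) <= (right - x) else ys[i]


# Hypothetical data points (not the dataset from the slides)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 1.0, 4.0]

for x in [0.4, 0.6, 1.5, 2.9]:
    print(x, nearest_neighbor(x, xs, ys))  # 0.0, 2.0, 2.0, 4.0
```

The output jumps abruptly from one value to the next halfway between data points, which is why this method is fast but not very accurate.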

Types of interpolation

— Linear interpolation (lerp): data points are connected by straight lines, hence a point (x, y) between two known points (x1, y1) and (x2, y2) is given by:

$$y = y_1 + \frac{y_2 - y_1}{x_2 - x_1}\,(x - x_1)$$

— Pros: quick and easy, better than nearest-neighbor

— Cons: still may not be very accurate (depending on the function), and the resulting function is not differentiable at the data points.
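The lerp formula translates directly into a short function. A minimal Python sketch; the name `lerp` and the sample points are illustrative:

```python
def lerp(x, x1, y1, x2, y2):
    """Linear interpolation between the known points (x1, y1) and (x2, y2):
    y = y1 + (y2 - y1) / (x2 - x1) * (x - x1)."""
    if x2 == x1:
        raise ValueError("x1 and x2 must be different")
    return y1 + (y2 - y1) / (x2 - x1) * (x - x1)


# Hypothetical points: the straight line through (0, 1) and (4, 9)
print(lerp(1.0, 0.0, 1.0, 4.0, 9.0))  # 3.0
print(lerp(2.0, 0.0, 1.0, 4.0, 9.0))  # 5.0, the midpoint gives the average of y1 and y2
```

The factor (y2 − y1)/(x2 − x1) is simply the slope of the straight line between the two known points.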

Types of interpolation

— Polynomial interpolation: generalization of lerp using a polynomial function (e.g., quadratic, cubic, or higher-order polynomial). A single polynomial of degree n − 1 can be made to pass exactly through all n data points, giving a smooth curve with no kinks or discontinuities.

— Pros: overcomes most of the problems of lerp.

— Cons: can be computationally expensive (takes a long time to calculate). If the order of the polynomial is too high, this can lead to oscillatory artifacts.
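One standard way to build the interpolating polynomial is the Lagrange form. This Python sketch illustrates that technique and is not necessarily how the slide's curve was computed; the data are hypothetical points taken from y = x² + 1, so the interpolant should reproduce that quadratic:

```python
def lagrange(x, xs, ys):
    """Evaluate the unique polynomial of degree len(xs) - 1 that passes
    through every point (xs[i], ys[i]), using the Lagrange form."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # Basis polynomial L_i(x): equals 1 at xi and 0 at every other xj
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


# Hypothetical data from y = x**2 + 1 (three points, so a quadratic)
xs = [0.0, 1.0, 3.0]
ys = [1.0, 2.0, 10.0]
print(lagrange(2.0, xs, ys))  # ≈ 5.0, matching 2**2 + 1
```

Each evaluation loops over every pair of points, so the cost grows with the square of the number of points. With many equally spaced points, a high-degree polynomial also tends to oscillate near the ends of the interval (Runge's phenomenon), which is the artifact mentioned in the Cons.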

Types of interpolation

— Spline interpolation: instead of using a single higher-order polynomial function, the spline algorithm joins together lower-order polynomials in such a way as to obtain a smooth curve with no discontinuities.

— Pros: combines advantages of lerp and polynomial methods.

— Cons: if the basis equations are not properly set up, it may be computationally very inefficient to find the solution.
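The slide does not say which kind of spline is used. As one common choice, here is a Python sketch of a natural cubic spline (second derivative set to zero at both ends); the function name and data points are hypothetical. In practice a library routine would normally be used instead, for example MATLAB's `spline` (which uses not-a-knot rather than natural end conditions) or SciPy's `scipy.interpolate.CubicSpline` with `bc_type='natural'`:

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return S(x), the natural cubic spline through the points (xs[i], ys[i]).
    xs must be strictly increasing; S is meant for xs[0] <= x <= xs[-1]."""
    n = len(xs) - 1                            # number of intervals
    h = [xs[i + 1] - xs[i] for i in range(n)]  # interval widths

    # Tridiagonal system for the second derivatives M[0..n].
    # Rows 0 and n encode the "natural" end conditions M[0] = M[n] = 0.
    sub, diag = [0.0] * (n + 1), [1.0] * (n + 1)
    sup, rhs = [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        sub[i], diag[i], sup[i] = h[i - 1], 2.0 * (h[i - 1] + h[i]), h[i]
        rhs[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])

    # Thomas algorithm: forward elimination, then back substitution (O(n) work)
    for i in range(1, n + 1):
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    M = [0.0] * (n + 1)
    M[n] = rhs[n] / diag[n]
    for i in range(n - 1, -1, -1):
        M[i] = (rhs[i] - sup[i] * M[i + 1]) / diag[i]

    def S(x):
        # Locate the interval [xs[i], xs[i + 1]] that contains x
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return ((M[i] * a ** 3 + M[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - M[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - M[i + 1] * h[i] / 6.0) * b)

    return S


# Hypothetical data points
S = natural_cubic_spline([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 0.0, 1.0])
print(S(1.0))  # ≈ 1.0 (passes through the data point)
print(S(0.5))  # ≈ 0.75
print(S(1.5))  # ≈ 0.5
```

Set up this way, the equations form a tridiagonal system that can be solved in a number of steps proportional to the number of points, which illustrates why a properly formulated spline is cheap to compute.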

Polynomial vs spline interpolation
— You might notice that the polynomial and spline interpolation curves in the previous two
slides look quite similar. However, they follow very different equations!

— The polynomial curve is of order 6:

— The spline curve is cubic, given by:

Multivariate interpolation
— All the types of interpolation described in the previous slides can also be applied to systems with more than one independent variable (e.g. f(x, y)). This is known as multivariate interpolation.

(Figure: comparison of some 1- and 2-dimensional interpolations. The black dots correspond to interpolated points.)
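For two input variables, bilinear interpolation applies linear interpolation twice: first along x, then along y. A minimal Python sketch; the function name, argument order, and grid values are hypothetical, with the grid values taken from f(x, y) = x + 2y, a function that bilinear interpolation reproduces exactly:

```python
def bilinear(x, y, x1, x2, y1, y2, f11, f21, f12, f22):
    """Bilinear interpolation inside the rectangle [x1, x2] x [y1, y2].
    fij is the known value at (xi, yj), e.g. f21 = f(x2, y1)."""
    # Step 1: linear interpolation in x along the two edges y = y1 and y = y2
    fy1 = f11 + (f21 - f11) * (x - x1) / (x2 - x1)
    fy2 = f12 + (f22 - f12) * (x - x1) / (x2 - x1)
    # Step 2: linear interpolation in y between those two results
    return fy1 + (fy2 - fy1) * (y - y1) / (y2 - y1)


# Hypothetical grid values from f(x, y) = x + 2*y at the corners of the unit square
print(bilinear(0.5, 0.25, 0, 1, 0, 1, f11=0, f21=1, f12=2, f22=3))  # 1.0
```

Interpolating in y first and then in x gives the same result, so the order of the two steps does not matter.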

Accuracy of linear interpolation

— Linear interpolation is widely used due to its simplicity.

— However, the greater the curvature of the function we are trying to estimate, the less accurate the interpolated value will be. We can define the interpolation error as:

$$\varepsilon = \text{interpolated value} - \text{true value}$$

— Let’s illustrate this using a known function.

Accuracy of linear interpolation

— Consider the function

$$f(x) = \cos(x^2) - x + 3$$

shown on the right.

— We can see that the equation

$$\cos(x^2) - x + 3 = 0$$

has a solution at x ≈ 2.83.

Accuracy of linear interpolation

— We can use linear interpolation on the interval [2.5, 3] to estimate this solution.

— At $x_1 = 2.5$: $y_1 = \cos(2.5^2) - 2.5 + 3 = 1.4994$

— At $x_2 = 3$: $y_2 = \cos(3^2) - 3 + 3 = -0.9111$

Accuracy of linear interpolation

— Using the linear interpolation formula at y = 0,

$$0 = 1.4994 + \frac{-0.9111 - 1.4994}{3 - 2.5}\,(x - 2.5)$$

we get x = 2.81.

— Therefore,

$$\varepsilon = 2.81 - 2.83 = -0.02$$

In other words, the linear interpolation underestimates the true solution (the value of x at which y = 0).
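The numbers on these slides can be reproduced in a few lines. This Python sketch uses f(x) = cos(x²) − x + 3 as defined above, and uses simple bisection (a root-finding technique not covered on these slides, included only to obtain an accurate "true" root) for comparison:

```python
import math


def f(x):
    """Function from the slides: f(x) = cos(x^2) - x + 3."""
    return math.cos(x ** 2) - x + 3


# Linear interpolation between the end points of [2.5, 3], solved for y = 0
x1, x2 = 2.5, 3.0
y1, y2 = f(x1), f(x2)
x_lerp = x1 + (0.0 - y1) * (x2 - x1) / (y2 - y1)


def bisect_root(g, a, b, tol=1e-12):
    """Bisection: g(a) and g(b) must have opposite signs."""
    while b - a > tol:
        m = 0.5 * (a + b)
        if g(a) * g(m) <= 0:
            b = m  # sign change (root) lies in [a, m]
        else:
            a = m  # sign change (root) lies in [m, b]
    return 0.5 * (a + b)


x_true = bisect_root(f, x1, x2)

print(f"y1 = {y1:.4f}, y2 = {y2:.4f}")       # 1.4994, -0.9111
print(f"interpolated root = {x_lerp:.3f}")   # ≈ 2.811
print(f"true root         = {x_true:.3f}")   # ≈ 2.832
print(f"error             = {x_lerp - x_true:.3f}")
```

Rounded to two decimal places, these values give the 2.81, 2.83 and ε ≈ −0.02 found by hand.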
Worked Example 4: interpolation

A biologist measures the activity of a newly-discovered enzyme as a function of temperature. The following data were collected:

Temperature (°C)    Relative activity (%)
20                  5.7
30                  62.0
40                  93.2
50                  67.1
60                  7.1

Use linear interpolation to estimate the activity at 35 °C.

Worked Example 4: interpolation

Solution:
The interval we are interested in is (30, 62.0) to (40, 93.2). Hence, x1 = 30, x2 = 40, y1 = 62.0, y2 = 93.2. Use the linear interpolation formula:

At x = 35,

$$y = 62.0 + \frac{93.2 - 62.0}{40 - 30}\,(35 - 30) = 77.6$$

Therefore, the activity of the enzyme at 35 °C is 77.6%.
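With a whole table, the extra step is finding which interval contains the query point. This Python sketch does that with the standard `bisect` module; the helper name `interp_table` is hypothetical, and it deliberately refuses to extrapolate beyond the table:

```python
from bisect import bisect_right

# Enzyme data from Worked Example 4
temps = [20, 30, 40, 50, 60]             # temperature, °C
activity = [5.7, 62.0, 93.2, 67.1, 7.1]  # relative activity, %


def interp_table(x, xs, ys):
    """Piecewise linear interpolation in a table whose xs are sorted ascending.
    Raises ValueError outside the table range (that would be extrapolation)."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError(f"{x} is outside the table range [{xs[0]}, {xs[-1]}]")
    # Index of the left end of the interval [xs[i], xs[i + 1]] containing x
    i = min(bisect_right(xs, x) - 1, len(xs) - 2)
    x1, x2, y1, y2 = xs[i], xs[i + 1], ys[i], ys[i + 1]
    return y1 + (y2 - y1) / (x2 - x1) * (x - x1)


print(f"Activity at 35 °C: {interp_table(35, temps, activity):.1f} %")  # 77.6 %
```

NumPy's `numpy.interp` performs the same piecewise linear lookup but, by default, returns the end values instead of raising an error outside the range; raising an error here keeps interpolation and extrapolation clearly separate.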

Class Exercise 3: interpolation

A function f(x) is known to pass through the points (24, 81) and (25, 83).
Using linear interpolation, find an estimate for the value f(24.75).

Problem Set 1

Question 1

Toss a 6-sided die 24 times (if you do not have a real die, use a
virtual one on your smartphone/computer) and record your
results, then answer the following:
(a) Make a dot diagram of the results
(b) Based on the dot diagram, did all the different outcomes occur
the same number of times? If not, which outcome occurred the
most times?
(c) What would you expect the dot diagram to look like if you
tossed the die 600 times instead?

Question 2

A coin is tossed three times.

(a) Draw a tree diagram to show all the possible outcomes.
(b) Use the diagram to calculate the probability of getting: (i) three tails, (ii) two heads, (iii) no tails, and (iv) at least one tail.

Question 3
The Venn diagram shows the different majors that a group of high
school students have chosen to study in university.

Find the probability that:

(a) a student selected at random studies art or history or both but not business studies.
(b) a student selected at random studies any two majors.

Question 4
A chemical engineering student wants to find out the specific heat capacity of water at 70 °C and 16 bar. She looks in her thermodynamics textbook and sees the following table:

Using linear interpolation, help the student find the desired value based on the given table.
Note: you will need to interpolate in two dimensions (bilinear interpolation)!