MECH 262 - Notes (Statistics)
Terminology:
● Population: Entire data set
● Sample: Subset of population
● Sample space: set of all possible outcomes
● Discrete: countable number of possible values
● Continuous: infinitely many possible values within a range
● Random/Stochastic variable: number assigned to each outcome to identify it
● Distributions
○ Symmetric
○ Uniform
○ Bimodal
○ Skewed
○ J-Shaped
● Stochastic process
○ Random process
Describing data set:
● Central tendency
○ Mean
○ Median
○ Mode
● Dispersion
○ Standard deviation (root mean square of the deviations from the mean)
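The central tendency and dispersion measures listed above can be computed directly. This is a minimal sketch using Python's standard `statistics` module; the data values are made up for illustration and are not from the notes.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)        # arithmetic average
median = statistics.median(data)    # middle value of the sorted data
mode = statistics.mode(data)        # most frequent value

# Population form: root mean square of the deviations from the mean.
pop_std = statistics.pstdev(data)
# Sample form: divides by n - 1 instead of n.
sample_std = statistics.stdev(data)

print(mean, median, mode, pop_std, sample_std)
```

The two standard deviations differ only in the divisor, so they converge as the number of data points grows.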
Normal distribution:
● Also called gaussian distribution or bell curve
● Shape is set by the mean (µ, centre) and the standard deviation (σ, spread)
Probability axioms:
● Probability: Likelihood that event will happen
● Axiom 1: probability is between 0 and 1
● Axiom 2: P=1 means event must happen
● Axiom 3: sum of all probabilities equals 1
Probability rules:
● Mutually exclusive
○ Events cannot occur at same time
○ RULE: 𝑃(𝐴∪𝐵) = 𝑃(𝐴) + 𝑃(𝐵)
○ eg. Flipping coin and getting heads or tails
● Mutually inclusive
○ Two events may or may not occur together
○ RULE: 𝑃(𝐴∪𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴∩𝐵)
■ So we don’t double count overlap
● Independent events
○ Outcome of A doesn’t influence B
○ RULE: 𝑃(𝐴∩𝐵) = 𝑃(𝐴) × 𝑃(𝐵)
○ eg. Flipping two coins separately
● Dependent events:
○ Also called conditional probabilities
○ Probability of A given B happening
○ Denoted: 𝑃(𝐴|𝐵)
○ RULE: 𝑃(𝐴∩𝐵) = 𝑃(𝐵) 𝑃(𝐴|𝐵)
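The rules above can be checked by brute-force counting over a small sample space. The sketch below assumes a hypothetical experiment (rolling two fair dice); the events `A`, `B`, `S` and the helper `prob` are illustrative names, not part of the course material.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a true/false test on one outcome."""
    favourable = sum(1 for outcome in space if event(outcome))
    return Fraction(favourable, len(space))

A = lambda o: o[0] == 6          # first die shows 6
B = lambda o: o[1] == 6          # second die shows 6
S = lambda o: o[0] + o[1] >= 11  # sum is at least 11

p_a, p_b = prob(A), prob(B)
p_a_and_b = prob(lambda o: A(o) and B(o))
p_a_or_b = prob(lambda o: A(o) or B(o))

# Events that can overlap: subtract the overlap once.
union_rule_holds = p_a_or_b == p_a + p_b - p_a_and_b
# Independent events: the joint probability is the product.
independent_rule_holds = p_a_and_b == p_a * p_b

# Dependent events: P(A|S) = P(A ∩ S) / P(S)
p_s = prob(S)
p_a_and_s = prob(lambda o: A(o) and S(o))
p_a_given_s = p_a_and_s / p_s

print(p_a_or_b, p_a_and_b, p_a_given_s)
```

Knowing that the sum is at least 11 changes the probability of a 6 on the first die, which is what makes A and S dependent.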
Probability distribution:
● Probability mass functions (PMF)
○ For discrete random variables
○ Mean: µ = Σ𝑥𝑖 𝑃(𝑥𝑖)
○ Variance: σ² = Σ(𝑥𝑖 − µ)² 𝑃(𝑥𝑖)
● Probability density function (PDF)
○ Since infinite number of outcomes, probability of given outcome is 0
■ ∴intervals must be used
○ Use integral instead of sums
○ 𝑃(𝑎 ≤ 𝑥 ≤ 𝑏) = ∫𝑓(𝑥) 𝑑𝑥, integrated from 𝑎 to 𝑏
○ Mean: µ = ∫𝑥 𝑓(𝑥) 𝑑𝑥, integrated over all 𝑥
○ Variance: σ² = ∫(𝑥 − µ)² 𝑓(𝑥) 𝑑𝑥, integrated over all 𝑥
● Cumulative distribution function (CDF)
○ Use the probability function to find the probability of an event within a given range
■ Adjust bounds of integration
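● For the normal distribution: 𝑃(𝑥1 ≤ 𝑥 ≤ 𝑥2) = ∫(1/(σ√(2π))) 𝑒^(−(𝑥−µ)²/(2σ²)) 𝑑𝑥, integrated from 𝑥1 to 𝑥2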
○ But hard to solve so use change of variables
○ 𝑧 = (𝑥 − µ)/σ
■ How many standard deviations away from mean
○ Set standard integral:
■ 𝑃(𝑧1 ≤ 𝑧 ≤ 𝑧2) = (1/√(2π)) ∫𝑒^(−𝑧²/2) 𝑑𝑧, integrated from 𝑧1 to 𝑧2
■ Set z1=0 to make it a single variable function
■ Use tables
● Using matlab to find probability
○ p=normcdf(z) where z is the changed variable OR
■ Area from -∞ to z (NOT from 0)
○ p=normcdf(x, μ, σ) to use x directly
■ Area from -∞ to x (NOT from 0)
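The change of variables and the cumulative lookup can be sketched in a few lines. This example uses Python's standard-library `statistics.NormalDist` as a stand-in for the matlab call; the values of `mu`, `sigma`, `x1` and `x2` are assumptions made up for illustration.

```python
from statistics import NormalDist

# Assumed example values (not from the notes).
mu, sigma = 50.0, 5.0
x1, x2 = 45.0, 60.0

# Change of variables: number of standard deviations away from the mean.
z1 = (x1 - mu) / sigma
z2 = (x2 - mu) / sigma

std_normal = NormalDist()  # mean 0, standard deviation 1

# cdf(z) is the area from -infinity to z, so a range is a difference of two.
p_from_z = std_normal.cdf(z2) - std_normal.cdf(z1)

# Same probability without the change of variables.
dist = NormalDist(mu, sigma)
p_direct = dist.cdf(x2) - dist.cdf(x1)

print(z1, z2, p_from_z, p_direct)
```

Both routes give the same probability because the cumulative area is measured from -∞ in each case, not from 0 as in a half-table.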
Standard lognormal distribution:
● For variables that are strictly positive and occasionally very large
○ eg. Lifetime of equipment
● Variable whose logarithm is normally distributed
○ Take ln of the variable, then apply the standard normal distribution
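As a rough sketch of the "take ln, then use the normal distribution" step: the parameters `mu_ln` and `sigma_ln` below are assumed to be the mean and standard deviation of ln(x), and all numbers are invented for illustration.

```python
import math
from statistics import NormalDist

# Assumed mean and standard deviation of ln(x) (illustrative only).
mu_ln, sigma_ln = 2.0, 0.5
x = 10.0  # e.g. an equipment lifetime in arbitrary units

# Take ln of the variable, then standardize as for a normal distribution.
z = (math.log(x) - mu_ln) / sigma_ln

# Probability that the variable is at most x.
p = NormalDist().cdf(z)

print(z, p)
```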
Exponential distribution
● Likelihood of event increases or decreases exponentially with time
● PDF: 𝑓(𝑥, λ) = λ𝑒^(−λ𝑥) where λ is the rate parameter
● Mean (μ) = standard deviation (σ) = 1/λ
● CDF: 𝑃(𝑥1 ≤ 𝑥 ≤ 𝑥2) = 𝑒^(−λ𝑥1) − 𝑒^(−λ𝑥2)
■ If n>30 →
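For the exponential distribution above, the closed-form interval probability can be cross-checked against a direct numerical integration of the PDF. The rate `lam` and the interval bounds are assumed values, and the midpoint sum is only one simple way to integrate.

```python
import math

lam = 0.5            # assumed rate parameter
x1, x2 = 1.0, 3.0    # assumed interval

def pdf(x, lam):
    """Exponential probability density: lam * exp(-lam * x)."""
    return lam * math.exp(-lam * x)

# Closed-form probability for x1 <= x <= x2.
p_closed = math.exp(-lam * x1) - math.exp(-lam * x2)

# Cross-check: midpoint Riemann sum of the PDF over the same interval.
n = 100_000
dx = (x2 - x1) / n
p_numeric = sum(pdf(x1 + (i + 0.5) * dx, lam) for i in range(n)) * dx

# Mean and standard deviation are both 1 / lam.
mean = 1 / lam
std = 1 / lam

print(p_closed, p_numeric, mean, std)
```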
Interval estimation of mean:
● Determining error in our sample mean
● µ = x̄ ± δ where x̄ is the sample mean
○ δ is the half-width of the confidence interval
○ Standard is the 95% confidence interval (ASME standard)
● Confidence level (C)
○ Probability that population mean (μ) lies within confidence interval
○ 𝐶 = 1 − α where α is level of significance
○ C is the % chance that the interval contains μ
○ α is the % chance that it does not
● Assume standard deviation of sample is equal to standard deviation of population
○ 𝑆= σ
● δ = 𝑧α/2 𝑆 / √𝑛 where α is the level of significance
○ To find zα/2 reverse the z process using tables
○ Using matlab
■ z = norminv(p)
■ Links probability to z value
○ z1 is at norminv(α/2) AND z2 is at norminv(C+α/2)
● One-sided intervals
○ Only interested in upper or lower limit
■ Upper: µ ≤ x̄ + 𝑧α 𝑆 / √𝑛
■ Lower: µ ≥ x̄ − 𝑧α 𝑆 / √𝑛
○ DON’T divide α by 2 since all area (probability) is on one side
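The two-sided and one-sided calculations can be sketched as follows, with Python's `NormalDist().inv_cdf` standing in for the inverse lookup. The sample size, sample mean and sample standard deviation are assumed values for illustration, with n large enough that z is used.

```python
import math
from statistics import NormalDist

# Assumed sample summary (illustrative only).
n = 36
x_bar = 10.0   # sample mean
S = 1.2        # sample standard deviation, taken as equal to sigma
C = 0.95       # confidence level
alpha = 1 - C  # level of significance

# Two-sided: upper z value sits at cumulative probability C + alpha/2.
z_half = NormalDist().inv_cdf(C + alpha / 2)
delta = z_half * S / math.sqrt(n)
two_sided = (x_bar - delta, x_bar + delta)

# One-sided: all of alpha is in one tail, so it is not divided by 2.
z_one = NormalDist().inv_cdf(C)
upper_limit = x_bar + z_one * S / math.sqrt(n)
lower_limit = x_bar - z_one * S / math.sqrt(n)

print(two_sided, upper_limit, lower_limit)
```

The one-sided z value is smaller than the two-sided one at the same confidence level, so a one-sided limit sits closer to the sample mean.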
Student’s t-distribution:
● Use when n<30
● Same procedure as normal distribution BUT
○ Use t instead of z
○ Matlab:
■ tinv(p, nu) where 𝑝𝑢𝑝𝑝𝑒𝑟 = 𝐶 + α/2
■ ν (nu) is the number of degrees of freedom (ν = n − 1 when estimating a mean)
● As ν→∞, distribution approaches normal distribution
● As ν decreases, distribution flattens and widens
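Python's standard library has no t-distribution, so the sketch below swaps in numerical integration (Simpson's rule) and bisection to mimic an inverse lookup. The helpers `t_pdf`, `t_cdf` and `t_inv` are hypothetical names written for this example, not library functions, and the sample size is an assumed value.

```python
import math

def t_pdf(x, nu):
    """Density of Student's t-distribution with nu degrees of freedom."""
    coef = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2))
    coef /= math.sqrt(nu * math.pi)
    return coef * (1 + x * x / nu) ** (-(nu + 1) / 2)

def t_cdf(t, nu, steps=2000):
    """Area from -infinity to t: Simpson's rule on [0, |t|] plus symmetry."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, nu) + t_pdf(a, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_inv(p, nu):
    """Value t whose cumulative probability is p, found by bisection."""
    lo, hi = -1000.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Assumed small sample (n < 30), 95% confidence, two-sided.
n = 10
nu = n - 1
C = 0.95
alpha = 1 - C
t_half = t_inv(C + alpha / 2, nu)   # plays the role of tinv(p, nu)
t_large_nu = t_inv(C + alpha / 2, 1000)

print(t_half, t_large_nu)
```

With ν = 9 the t value is noticeably larger than the z value at the same confidence level, and with ν = 1000 it is nearly equal to it.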
Estimation of population variance:
● Use chi-squared (χ²) distribution
● Use matlab
○ chi2inv(p, nu)
● χ² is only positive therefore bounds are:
○ 𝑝1 = α/2
○ 𝑝2 = 𝐶 + α/2
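A pure-Python sketch of the two lookups follows. The helpers `chi2_cdf` and `chi2_inv` are hypothetical names for this example (a series for the incomplete gamma function plus bisection), not library functions. The interval formula in the last lines, (n − 1)S²/χ² evaluated at each bound, is the standard result and is an addition not stated in the notes. The sample values are assumed.

```python
import math

def chi2_cdf(x, nu, terms=500):
    """Area from 0 to x under the chi-squared density with nu degrees of
    freedom (series for the regularized lower incomplete gamma function)."""
    if x <= 0:
        return 0.0
    a, half_x = nu / 2, x / 2
    term = 1 / a
    total = term
    for k in range(1, terms):
        term *= half_x / (a + k)
        total += term
    return total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))

def chi2_inv(p, nu):
    """Value x whose cumulative probability is p, found by bisection."""
    lo, hi = 0.0, max(float(nu), 1.0)
    while chi2_cdf(hi, nu) < p:   # widen until the target is bracketed
        hi *= 2
    for _ in range(80):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Assumed sample: n points with sample standard deviation S.
n, S = 10, 1.5
nu = n - 1
C = 0.95
alpha = 1 - C

chi2_low = chi2_inv(alpha / 2, nu)        # p1 = alpha/2
chi2_high = chi2_inv(C + alpha / 2, nu)   # p2 = C + alpha/2

# Larger chi-squared value gives the lower bound on the variance.
var_lower = nu * S**2 / chi2_high
var_upper = nu * S**2 / chi2_low

print(chi2_low, chi2_high, var_lower, var_upper)
```

The two χ² values are not mirror images of each other, which is why both bounds have to be looked up separately.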
Correlation
Linear correlation:
● Linear correlation coefficient (𝑟𝑥𝑦)
○ 𝑟 = 1 → strong positive correlation
○ 𝑟 = −1 → strong negative correlation
○ 𝑟 = ±0.1 → no correlation
● Only provides data on correlation
○ NO slope
○ NO non-linear correlation
● Matlab:
○ corr(x, y) where x and y are arrays of values
● Significance of linear correlation coefficient
○ ↗ data points → ↗ significance
○ Table gives minimum correlation coefficient needed to accept correlation
○ Depends on
■ # of data points sampled
■ Significance level wanted (α) (% chance that correlation is due to pure chance)
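To make the "NO slope" and "NO non-linear correlation" points concrete, here is a small sketch that computes the linear correlation coefficient by hand. `pearson_r` is a hypothetical helper name and the data sets are invented for illustration. The significance table lookup is not reproduced.

```python
import math

def pearson_r(x, y):
    """Linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y_line = [2 * v + 1 for v in x]       # exact straight line, slope 2
y_steep = [10 * v + 1 for v in x]     # exact straight line, slope 10
y_curve = [(v - 3) ** 2 for v in x]   # symmetric parabola

r_line = pearson_r(x, y_line)
r_steep = pearson_r(x, y_steep)
r_curve = pearson_r(x, y_curve)

print(r_line, r_steep, r_curve)
```

Both straight lines give the same r despite different slopes, and the parabola gives r near 0 even though y depends entirely on x.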