Stats Midterms Cheat Sheet

This document defines statistical terms and concepts related to experimental design, sampling methods, data organization and visualization, measures of central tendency and variation, probability, and the relationship between two variables. It discusses population vs. sample, types of variables and scales of measurement, sources of data, types of samples including probability and non-probability, comparing sampling methods, errors in surveys, methods for organizing and visualizing data, measures of central tendency like mean and median, measures of variation like range and standard deviation, probability formulas, and the relationship between two variables.

Uploaded by

jibberish yo
Copyright
© All Rights Reserved

Stats Definitions
- Variable: characteristic of an item or individual
- Data: set of individual values associated with a variable
- Statistics: methods that help transform data into useful information
- Population: all the items/individuals about which you want to draw a conclusion
- Sample: portion of population selected for analysis
- Population Parameter: summarizes the value of a specific variable of a population
- Sample Statistic: summarizes the value of a specific variable for sample data

Types of Variables
- Categorical
  • Nominal (defined categories) → nominal scale
  • Ordinal (ordered categories) → ordinal scale
- Numerical
  • Discrete (counting)
  • Continuous (measurement)
  • Uses either interval scale (no 0 point) or ratio scale (true 0 point)

Sources of Data
- Data distributed by organizations or individuals (eg weather report, financial statements)
- Survey
- Data collected by ongoing business activities (eg Big Data)
- Designed experiment (direct control over who gets treatment)
- Observational studies (no control)

Types of Samples
- Non-probability
  • Judgement (get opinions of experts)
  • Convenience (easy)
- Probability
  • Simple Random (equal chance of being picked)
  • Systematic (pick every kth person, where $k = \frac{\text{number of people}}{\text{sample size}}$)
  • Stratified (divide pop into strata & select sample to mimic its characteristics)
  • Cluster (divide pop into clusters; each representative of pop)

Comparing Sampling Methods
- Simple Random/Systematic
  • Simple to use
  • Not a good representative of population's characteristics
- Stratified
  • Ensures representation
- Cluster
  • Cost effective
  • Less efficient

Survey Errors
- Coverage Error/Selection Bias (exclude people)
- Nonresponse Error
- Sampling Error (always exists)
- Measurement Error (eg bad qn/Hawthorne effect)

Organizing
- Categorical
  • Summary Table (1 variable)
  • Contingency Table (2 variables)
- Numerical
  • Ordered Array (rank from min to max)
  • Frequency Distribution
    o Class, frequency, frequency %, cum frequency, cum %
    o 5-15 classes
    o $\text{class interval} = \frac{\text{range}}{\text{no. of groups desired}}$
  • Cumulative Distribution

Visualizing
- Summary
  • Bar Chart
  • Pie/Doughnut Chart
  • Pareto Chart (bar chart with decreasing order of frequency + cumulative polygon)
- Contingency
  • Side by Side Bar Chart
  • Doughnut Chart
- Ordered Array
  • Stem-and-Leaf Display
- Frequency/Cumulative Distribution
  • Histogram (x-axis: midpoint, y-axis: freq/ rel freq/ freq %)
  • Polygon (x-axis: midpoint, y-axis: freq %)
  • Ogive/Cumulative % Polygon (x-axis: variable of interest, y-axis: cumulative %)
- 2 Numerical Variables
  • Scatter Plot
  • Time Sequence

Central Tendency
- Central Tendency: extent to which all data values group around a central value
- Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
- Median (position) = $\frac{n+1}{2}$
- Mode = most common value
- Should use both mean and median since mean is affected by extreme outliers

Variation
- Variation: amount of dispersion/scattering of values
- Range = $X_{max} - X_{min}$ (ignores distribution of data & is sensitive to outliers)
- $S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$, $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
- $S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$, $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
- Coefficient of Variation: $CV = \frac{S}{\bar{x}} \times 100$
  • Always in %
  • Measures relative variation to mean
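The sample formulas for central tendency and variation on this sheet can be sketched in Python as below. This is only an illustration under stated assumptions: the function name `describe` and the data values are hypothetical and do not come from the sheet.

```python
import math
import statistics


def describe(sample):
    """Sample mean, median, range, variance (n - 1), std dev and CV in %."""
    n = len(sample)
    mean = sum(sample) / n                      # x-bar = sum(x_i) / n
    median = statistics.median(sample)          # value at position (n + 1) / 2
    value_range = max(sample) - min(sample)     # Xmax - Xmin
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)   # S^2
    std_dev = math.sqrt(variance)               # S
    cv = std_dev / mean * 100                   # CV = S / x-bar * 100
    return {
        "mean": mean,
        "median": median,
        "range": value_range,
        "variance": variance,
        "std_dev": std_dev,
        "cv_percent": cv,
    }


# Hypothetical data: the second set swaps one value for an extreme outlier.
print(describe([2, 4, 6, 8, 10]))
print(describe([2, 4, 6, 8, 100]))
```

Comparing the two printed results shows why the sheet suggests reporting both mean and median: the outlier moves the mean and the range a lot, while the median stays where it was.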
- Coefficient of Variation can be used to compare the variability of 2 or more data sets with different units
- Z-score: $Z = \frac{x - \bar{x}}{S}$
  • Extreme outlier if Z < -3.0 or Z > +3.0

Shape
- Shape: pattern of distribution
- Skewness: measures extent to which data values are not symmetrical
  • Left-skewed (-ve): mean < median < mode
  • Symmetric: mean = median = mode
  • Right-skewed (+ve): mode < median < mean
- Kurtosis: affects peakedness of curve

5 Number Summary
- min, Q1, Q2, Q3, max
- Q1 (position) = $\frac{n+1}{4}$
- Q2 (position) = $\frac{n+1}{2}$
- Q3 (position) = $\frac{3(n+1)}{4}$
- IQR = Q3 - Q1

Left-skewed          Symmetric            Right-skewed
Q2-min > max-Q2      Q2-min = max-Q2      Q2-min < max-Q2
Q1-min > max-Q3      Q1-min = max-Q3      Q1-min < max-Q3
Q2-Q1 > Q3-Q2        Q2-Q1 = Q3-Q2        Q2-Q1 < Q3-Q2
*Just need to fulfill 2/3 conditions

Measure of Relationship Between 2 Variables
- Covariance: measures the strength of the linear relationship between X & Y
- $cov(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$
  • cov(X,Y) > 0 → X & Y tend to move in the same direction
  • cov(X,Y) < 0 → X & Y tend to move in the opposite direction
  • cov(X,Y) = 0 → no linear relationship between X & Y
- Coefficient of Correlation: measures the relative strength of the linear relationship between X & Y
- $r = \frac{cov(X, Y)}{S_X S_Y}$
  *population coefficient of correlation = ρ, sample coefficient of correlation = r
  • Closer to -1 → stronger -ve relationship
  • Closer to +1 → stronger +ve relationship
  • Closer to 0 → weaker linear relationship

Assessing Probability
- A priori: based on prior knowledge
- Empirical: based on data collected
- Subjective: based on experience, opinion and analysis of a situation

Probability Formulas
- P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
- $P(A|B) = \frac{P(A \cap B)}{P(B)}$
- If 2 events are independent, then P(A|B) = P(A)

Discrete Random Variables
- $\mu = E(X)$
- $\sigma^2 = E(X^2) - [E(X)]^2$

Binomial Distribution
- Characteristics:
  • Fixed number of observations
  • Each observation is categorized into success or failure
  • Constant probability
  • Observations are independent
- $P(X = x | n, \pi) = \frac{n!}{x!(n-x)!} \pi^x (1-\pi)^{n-x}$
- $\mu = E(X) = n\pi$
- $\sigma^2 = n\pi(1-\pi)$
- $\sigma = \sqrt{n\pi(1-\pi)}$
- Shape:
  • π < 0.5 → right-skewed
  • π > 0.5 → left-skewed

Poisson Distribution
- Characteristics:
  • Probability that an event occurs in one area of opportunity is the same for all areas of opportunity
  • No. of events in one area of opportunity is independent of the no. of events in other areas of opportunity
  • Probability that 2 or more events occur in an area of opportunity approaches 0 as the area of opportunity shrinks
  • Avg no. of events = λ
- $P(X = x | \lambda) = \frac{e^{-\lambda} \lambda^x}{x!}$
- $\mu = \lambda$
- $\sigma^2 = \lambda$
- $\sigma = \sqrt{\lambda}$

Normal Distribution
- $Z = \frac{X - \mu}{\sigma}$
- Empirical Rules (proportion of values within each interval):
  1. μ ± σ = 0.6826
  2. μ ± 2σ = 0.9544
  3. μ ± 3σ = 0.9973
- Evaluating Normality
  • Construct charts/graphs (check for symmetry and bell shape)
  • Compute descriptive stats
    o Mean ≈ Median ≈ Mode
    o IQR ≈ 1.33σ
    o Range ≈ 6σ
  • Check theoretical properties
    o Empirical rule holds
    o ~80% lies within x̄ ± 1.28σ
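The binomial and Poisson probability formulas and the normal empirical rule from this sheet can be checked numerically with a short Python sketch. The function names (`binomial_pmf`, `poisson_pmf`, `normal_within`) and the example parameters are hypothetical, chosen only for illustration. The normal check uses the standard identity P(|Z| < k) = erf(k / √2), which is not stated on the sheet.

```python
import math


def binomial_pmf(x, n, pi):
    """P(X = x | n, pi) = n! / (x! (n - x)!) * pi^x * (1 - pi)^(n - x)."""
    return math.comb(n, x) * pi ** x * (1 - pi) ** (n - x)


def poisson_pmf(x, lam):
    """P(X = x | lambda) = e^(-lambda) * lambda^x / x!."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def normal_within(k):
    """Proportion of a normal distribution within mu +/- k * sigma."""
    return math.erf(k / math.sqrt(2))


# Hypothetical parameters: n = 10 trials with success probability 0.3.
n, pi = 10, 0.3
mean = sum(x * binomial_pmf(x, n, pi) for x in range(n + 1))
second_moment = sum(x ** 2 * binomial_pmf(x, n, pi) for x in range(n + 1))
variance = second_moment - mean ** 2    # sigma^2 = E(X^2) - [E(X)]^2

print(mean, n * pi)                     # both should be close to n * pi
print(variance, n * pi * (1 - pi))      # both should be close to n * pi * (1 - pi)
print(poisson_pmf(2, 3.0))              # P(X = 2) when lambda = 3
print([round(normal_within(k), 4) for k in (1, 2, 3)])
```

Computing the mean and variance directly from the probabilities, rather than from the shortcut formulas, ties the binomial results back to the general discrete random variable definitions.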

Uniform Distribution
- Uniform Distribution: probability distribution that has equal probabilities for all possible outcomes
- Probability = base x height
- $f(x) = \frac{1}{b-a}$ if a ≤ X ≤ b
- Otherwise, $f(x) = 0$
- $\mu = \frac{a+b}{2}$
- $\sigma = \sqrt{\frac{(b-a)^2}{12}}$
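
The uniform distribution formulas, including the "base x height" rule for probabilities, can be sketched in Python as below. The helper names and the interval a = 0, b = 10 are hypothetical examples and are not taken from the sheet.

```python
import math


def uniform_pdf(x, a, b):
    """f(x) = 1 / (b - a) if a <= x <= b, otherwise 0."""
    return 1 / (b - a) if a <= x <= b else 0.0


def uniform_prob(lo, hi, a, b):
    """P(lo <= X <= hi) as base x height, clipping the interval to [a, b]."""
    base = max(min(hi, b) - max(lo, a), 0.0)
    height = 1 / (b - a)
    return base * height


def uniform_mean(a, b):
    """mu = (a + b) / 2."""
    return (a + b) / 2


def uniform_std(a, b):
    """sigma = sqrt((b - a)^2 / 12)."""
    return math.sqrt((b - a) ** 2 / 12)


# Hypothetical example: X uniform on [0, 10].
print(uniform_pdf(5, 0, 10))
print(uniform_prob(2, 5, 0, 10))
print(uniform_mean(0, 10), uniform_std(0, 10))
```

Clipping the requested interval to [a, b] keeps the probability at 0 outside the support, which matches the "otherwise f(x) = 0" case.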
