
Statistics - Reviewer

The document provides an overview of statistics, including types of data (categorical and scale), measures of central tendency (mean, median, mode), and measures of variation (range, variance, standard deviation). It also covers covariance, correlation, skewness, kurtosis, and statistical tests such as the Chi-Square test and T-tests. The content is structured into weekly sections, detailing key concepts, formulas, and examples relevant to statistical analysis.

Uploaded by

Divlen Jacutin

WEEK 1 - PRELIMS

TYPES OF DATA
1. Categorical (Qualitative Data) - Categorical data represents labels, groups, or categories. It does not have a numerical meaning but is used to classify observations.
  a. Nominal: Categories with no specific order (e.g., gender, colors, car brands).
  b. Ordinal: Categories with a meaningful order/rank but unequal intervals (e.g., customer satisfaction ratings: Poor, Fair, Good, Excellent).
2. Scale (Quantitative Data) - Scale data consists of numerical values that represent measurable quantities like height, weight, or temperature.
  a. Discrete: integers, whole numbers, or counting numbers (e.g., number of employees, number of products sold, allowance).
  b. Continuous: can take fractions and decimals (e.g., weight, height, temperature).

SCALE MEASUREMENT:
● Ratio - has a true zero.
● Interval - has no true zero.
● Ordinal - ranking.
● Nominal - names or labels (any numbers used serve only as codes).

WHAT IS STATISTICS?
- Statistics is a study that deals with the collection, organization, analysis, interpretation, and presentation of data to understand or solve problems. It also helps in making informed decisions based on numerical/empirical evidence.

TYPES OF STATISTICS:
1. Descriptive Statistics - Summarizes and presents given data in a meaningful way. The data can represent an entire population or a sample.
  ○ Measures of Central Tendency: Mean, Median, Mode
  ○ Measures of Dispersion: Range, Variance, Standard Deviation
2. Inferential Statistics - Analyzes sample data to make predictions or inferences about a larger population. It involves drawing a conclusion.
  ○ Hypothesis Testing
  ○ Confidence Intervals
  ○ Regression Analysis
WEEK 2 - PRELIMS

Measures of Central Tendency:
- Indicates the center of the data set. The mean in particular is sensitive to outliers, which is a weakness when means are used to compare data sets.

1. Mean (Average)
● The sum of all values divided by the number of values.
● Formula: Mean = ∑X / N
● Best for: Normally distributed data without extreme values.

2. Median
● The middle value when the data is arranged in ascending order.
● Best for: Skewed data or data with outliers.
● More stable than the mean because it is not susceptible to outliers (it does not change much even if there is one very large amount).

PARAMETER - describes a population
STATISTIC (n) - describes a sample

3. Mode
● The most frequently occurring value in a dataset.
● A dataset can have one mode (unimodal), two modes (bimodal), or more (multimodal).
● Example: In 2, 3, 3, 4, 5, 5, 5, 6, the mode is 5 (appears most frequently).
● Best for: Categorical data or identifying common values in a dataset.

Measures of Variation:
- Refers to how spread out or dispersed the data points are from the central tendency. It helps measure the consistency or inconsistency of values and provides insight into the reliability of data.

1. Range
● The difference between the highest and lowest values in a dataset (a higher range means the data are more spread out).
● Formula: Range = Maximum Value − Minimum Value
● Example: If the highest salary in a company is ₱80,000 and the lowest is ₱20,000, the range is ₱60,000.
● Limitation: Only considers two values and ignores the distribution of the rest.

2. Variance
● Measures the average squared difference from the mean.
● High variance = More spread out data.
● Low variance = Data points are close to the mean.
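As a rough illustration of the measures above, the sketch below computes them with Python's standard `statistics` module. The dataset is the one from the mode example; the variable names are chosen here for illustration only. Both the population variance (divide by N) and the sample variance (divide by n − 1) are shown, since the notes do not say which one is intended.

```python
import statistics

# Dataset taken from the mode example in the notes.
data = [2, 3, 3, 4, 5, 5, 5, 6]

mean = statistics.mean(data)        # sum of all values / number of values
median = statistics.median(data)    # middle value of the sorted data
mode = statistics.mode(data)        # most frequently occurring value
data_range = max(data) - min(data)  # maximum value - minimum value

# Variance: average squared difference from the mean.
pop_variance = statistics.pvariance(data)    # divides by N (population)
sample_variance = statistics.variance(data)  # divides by n - 1 (sample)

print("mean:", mean)
print("median:", median)
print("mode:", mode)
print("range:", data_range)
print("population variance:", pop_variance)
print("sample variance:", sample_variance)
```

Adding one very large value to `data` and rerunning shifts the mean noticeably while the median barely moves, which is the outlier sensitivity described above.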
Standard Deviation
- The square root of the variance; shows the typical distance of data points from the mean.

Formulas:
● Population: σ² = ∑(X − μ)² / N, σ = √σ²
● Sample: s² = ∑(X − x̄)² / (n − 1), s = √s²

N (big N) usually refers to a population size, while n (small n) refers to a sample size.

Interquartile Range (IQR)
- The Interquartile Range (IQR) is a measure of statistical dispersion that shows the spread of the middle 50% of a dataset. The quartiles divide the ordered data into 4 equal parts (25% each). It is useful for identifying variability while reducing the impact of outliers.

FIVE-NUMBER SUMMARY:
1. Minimum - X smallest
2. Q1
3. Q2 (median)
4. Q3
5. Maximum - X largest

How to Calculate IQR:
1. Arrange the data in ascending order.
2. Find the first quartile (Q1) - the median of the lower half (25th percentile).
3. Find the third quartile (Q3) - the median of the upper half (75th percentile).
4. Compute the IQR using the formula: IQR = Q3 − Q1

Example Calculation:
Consider the dataset: 5, 7, 9, 12, 15, 18, 22, 26, 30
● Q1 (25th percentile) = 9
● Q3 (75th percentile) = 22
● IQR = 22 − 9 = 13
(Here the median, 15, is counted in both halves. Quartile conventions differ, and excluding the median from the halves gives slightly different values.)

NOTE: If the sample has only 5 values (n = 5), the five-number summary applies directly: X smallest, Q1, Q2, Q3, X largest.
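The IQR example can be checked with a short sketch. It assumes the "inclusive" quartile convention of Python's `statistics.quantiles`, which reproduces the Q1 = 9 and Q3 = 22 given in the notes; other conventions (including the function's default) give slightly different quartiles for the same data.

```python
import statistics

# Dataset from the example calculation in the notes.
data = sorted([5, 7, 9, 12, 15, 18, 22, 26, 30])

# method="inclusive" is an assumption made here so that the result
# matches the Q1 and Q3 values stated in the notes.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Five-number summary: X smallest, Q1, Q2 (median), Q3, X largest.
five_number_summary = (min(data), q1, q2, q3, max(data))

# Standard deviation is the square root of the variance.
population_sd = statistics.pstdev(data)  # uses N
sample_sd = statistics.stdev(data)       # uses n - 1

print("five-number summary:", five_number_summary)
print("IQR:", iqr)
print("population SD:", population_sd)
print("sample SD:", sample_sd)
```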
COVARIANCE VS. CORRELATION

Covariance
- Covariance measures the direction of the relationship between two variables. It indicates how changes in one variable are related to changes in another.
● Positive Covariance: Both variables move in the same direction, meaning if one increases, the other increases as well, and if one decreases, the other decreases.
● Negative Covariance: The variables move in opposite directions, meaning if one increases, the other decreases, and vice versa.
● Zero Covariance: A covariance of 0 means there is no linear relationship between the variables. This alone does not prove that they are independent, because a non-linear relationship can still exist.

Correlation
- Correlation measures both the strength and direction of the relationship between two variables. Unlike covariance, which only indicates the direction of the relationship, correlation provides a clearer picture of how strongly the two variables are related.

Strength: The strength of the relationship is measured by the correlation coefficient, which ranges from -1 to +1:
  ○ -1: Perfect negative correlation
  ○ +1: Perfect positive correlation
  ○ 0: No correlation

The strength of the relationship can be categorized by the size of the coefficient (ignoring its sign):
● Weak: A correlation coefficient close to 0
● Moderate: A correlation coefficient between 0.3 and 0.7
● Strong: A correlation coefficient between 0.7 and 1
● Very Strong: A correlation coefficient close to 1
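To make the difference between covariance and correlation concrete, here is a minimal sketch using the standard textbook definitions: sample covariance with an n − 1 divisor, and Pearson's r as covariance divided by the product of the standard deviations. The data and the names `hours`, `scores`, and `errors` are hypothetical and not taken from the notes.

```python
import math


def covariance(x, y):
    """Sample covariance: sum of paired deviations divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Pearson's r: covariance rescaled so it always lies in [-1, +1]."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


# Hypothetical data for illustration only.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]   # rises with hours
errors = [10, 8, 6, 4, 2]   # falls as hours rise

print("cov(hours, scores):", covariance(hours, scores))  # positive direction
print("r(hours, scores):", correlation(hours, scores))
print("cov(hours, errors):", covariance(hours, errors))  # negative direction
print("r(hours, errors):", correlation(hours, errors))
```

The covariance changes if the units change (for example minutes instead of hours), while the correlation does not, which is why only correlation is used to judge strength.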
WEEK 3 - PRELIMS

Skewness
- Skewness indicates the degree of asymmetry or lack of symmetry in a data distribution.
● Positively Skewed (Right Skewed):
  ○ The data is concentrated on the left side of the distribution.
  ○ The mean is greater than the median, and the median is greater than the mode.
  ○ Example: Sales data, where a few very high values pull the mean to the right.
● Negatively Skewed (Left Skewed):
  ○ The data is concentrated on the right side of the distribution.
  ○ The mean is less than the median, and the median is less than the mode.
  ○ Example: Retirement data, where a few people retire early, so the mean is pulled lower.

Pearson's Second Coefficient of Skewness
This measures the skewness of a distribution: Sk = 3 × (Mean − Median) / Standard Deviation
● Symmetrical distribution: Between -0.5 and 0.5 (close to 0).
● Negatively skewed: Between -1 and -0.5.
● Positively skewed: Between 0.5 and 1.
● Moderate skew: Between -1 and -0.5 or between 0.5 and 1.
● Extreme skew: Below -1 (negatively skewed) or above 1 (positively skewed).

Can We Run Statistical Tests?
● Yes, you can run tests on skewed data, but you might need to normalize or transform the data (e.g., log transformation) to meet assumptions of normality.

Kurtosis
Kurtosis measures the "tailedness" of the distribution and the extent of outliers.
● Leptokurtic:
  ○ Steeper, more peaked curve (heavy-tailed).
  ○ High kurtosis (> 3).
● Platykurtic:
  ○ Flatter curve (light-tailed).
  ○ Low kurtosis (< 3).
● Kurtosis values:
  ○ 3: Normal distribution (mesokurtic).
  ○ Less than 3: Negative excess kurtosis (platykurtic).
  ○ Greater than 3: Positive excess kurtosis (leptokurtic).
● Kurtosis Classification (for excess kurtosis, i.e., kurtosis minus 3, where 0 corresponds to a normal distribution):
  ○ Between -1 and +1: Excellent.
  ○ Between -2 and +2: Acceptable.

Measures of Variation
Variation or dispersion measures how spread out the data is. Key measures include:
● Range: Difference between the maximum and minimum values.
● Variance: Measure of the average squared deviations from the mean.
● Standard Deviation: Square root of the variance.
● Interquartile Range (IQR): The difference between the 3rd quartile (Q3) and the 1st quartile (Q1).
  ○ IQR = Q3 − Q1.
  ○ The median is also known as Q2.
  ○ IQR is resistant to outliers.

Variables
● Independent Variable (X):
  ○ Also called the predictor or explanatory variable.
● Dependent Variable (Y):
  ○ Also called the outcome or result variable.

Relationship Between Two Numerical Variables
● Discrete vs. Discrete: Both variables are discrete (countable).
● Continuous vs. Continuous: Both variables are continuous (measurable).

Covariance
Covariance measures the direction of the relationship between two variables:
● Positive Covariance: The two variables move in the same direction.
  ○ Example: High IQ tends to go with high academic performance.
● Negative Covariance: The two variables move in opposite directions.

Correlation
Correlation measures both the strength and direction of the relationship between two variables:
● Range: From -1 to +1.
  ○ +1: Perfect positive correlation.
  ○ -1: Perfect negative correlation.
  ○ 0: No correlation.
  ○ Closer to +1: Stronger positive correlation.
  ○ Closer to -1: Stronger negative correlation.

Cohen's D
Cohen's D measures the effect size, i.e., the size of the difference between two groups.
● Weak: Small effect size.
● Moderate: Medium effect size.
● Strong: Large effect size.
● Very Strong: Very large effect size.
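The Week 3 shape and effect-size measures can be sketched as follows. The notes give no formulas for them, so the usual textbook definitions are assumed here: Pearson's second coefficient as 3 × (mean − median) / SD with the population SD, kurtosis as the fourth central moment over the squared variance (normal = 3), and Cohen's d as the mean difference over the pooled sample SD. All data values are hypothetical.

```python
import math
import statistics


def pearson_second_skewness(data):
    """3 * (mean - median) / SD; the population SD is an assumption."""
    spread = statistics.pstdev(data)
    return 3 * (statistics.mean(data) - statistics.median(data)) / spread


def kurtosis(data):
    """Fourth central moment / squared variance (normal distribution = 3)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m4 / m2 ** 2


def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled sample standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * statistics.variance(group_a)
        + (nb - 1) * statistics.variance(group_b)
    ) / (na + nb - 2)
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    return diff / math.sqrt(pooled_var)


# Hypothetical data for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]
skew = pearson_second_skewness(values)
kurt = kurtosis(values)
d = cohens_d([2, 4, 6, 8], [1, 3, 5, 7])

print("Pearson's second skewness:", skew)
print("kurtosis:", kurt, "excess kurtosis:", kurt - 3)
print("Cohen's d:", d)
```

Statistical packages often report excess kurtosis (kurtosis minus 3) and apply small-sample corrections, so their output can differ from this plain moment-based version.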
WEEK 4 - PRELIMS

Statistical Analysis and Chi-Square Test

Understanding Frequency and Percentages
1. Frequency - The number of times a response appears in the dataset.
Example: Counting how many answered 4 or 5 in a survey.

2. Steps to Calculate Frequency in Excel: [figure not available]

3. Determining the Range:
Use the MAX function to find the highest value in the dataset (and the MIN function to find the lowest).

Parametric vs. Non-Parametric Tests

Parametric Tests
● Used for numerical data when making inferences about a population.
● Involves measuring magnitude, effect, or influence between variables.
● Common Parametric Tests:
  ○ T-test: Determines if there is a significant difference between two groups.
    ■ Independent Sample T-test: Compares two different groups (e.g., male vs. female intelligence).
    ■ Paired Sample T-test: Compares the same group at two different times (e.g., pre-test vs. post-test).
  ○ Regression Analysis & SEM (Structural Equation Modeling):
    ■ More advanced tests requiring statistical software like SPSS or AMOS.

Non-Parametric Tests
● Used when working with categorical data (nominal or ordinal).
● Measures association or relationships rather than magnitude.
● Common Non-Parametric Test:
  ○ Chi-Square Test
    ■ Test of Independence: Examines the relationship between two categorical variables.
    ■ Test of Correlation: Examines the relationship between two numerical variables.
  ○ Example: Analyzing the relationship between gender and pet preference (dog vs. cat).

Chi-Square Test (χ² Test)
1. Understanding the 2x2 Table Structure:
  ○ Rows represent one categorical variable (e.g., gender).
  ○ Columns represent another categorical variable (e.g., pet preference).
  ○ Degrees of Freedom (df) Formula: df = (rows − 1) × (columns − 1)
Key Terms:
  ○ Fo (Observed Frequency): The actual count from the dataset.
  ○ Fe (Expected Frequency): The theoretically expected count. For a single variable with equally likely categories it is calculated as Fe = N / k, where N is the total sample size and k is the number of categories. For a two-way table it is calculated per cell from the row and column totals.

2. Example Data:

Gender    Dog    Cat    Total
Male        5      6       11
Female     20     13       33
Total      25     19       44

  ○ Fo values are directly taken from the table.
  ○ Fe values are computed for each cell using the formula: Fe = (Row Total × Column Total) / Grand Total

Steps to Perform Chi-Square Test in Excel
1. Input the observed frequencies (Fo) into an Excel table.
2. Calculate the expected frequencies (Fe). Use the formula:
=(Row Total * Column Total) / Grand Total
(type in the grand total as a number).
3. Compute the chi-square value: χ² = ∑ (Fo − Fe)² / Fe. In Excel, the formula
  ○ =CHITEST(Highlight Fo,Highlight Fe)
returns the p-value of the test rather than χ² itself.
4. Find the Chi-Square critical value using the CHIINV function:
=CHIINV(probability,degrees_of_freedom)
  ○ Use a significance level (alpha) of 0.05.
  ○ Compare the computed χ² value to the critical value:
  ○ If χ² > critical value, reject the null hypothesis (the variables are dependent).
  ○ If χ² ≤ critical value, fail to reject the null hypothesis (there is no evidence that the variables are dependent).
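The chi-square steps can be traced by hand on the gender and pet preference table. The sketch below is plain Python rather than Excel, and the critical value 3.841 (df = 1, α = 0.05) is typed in from a standard chi-square table instead of being computed, which is an assumption of this example.

```python
# Observed frequencies (Fo) from the example table:
# rows = Male, Female; columns = Dog, Cat.
observed = [
    [5, 6],
    [20, 13],
]

row_totals = [sum(row) for row in observed]        # [11, 33]
col_totals = [sum(col) for col in zip(*observed)]  # [25, 19]
grand_total = sum(row_totals)                      # 44

# Expected frequency per cell: (row total * column total) / grand total.
expected = [
    [row_total * col_total / grand_total for col_total in col_totals]
    for row_total in row_totals
]

# Chi-square statistic: sum over all cells of (Fo - Fe)^2 / Fe.
chi_square = sum(
    (fo - fe) ** 2 / fe
    for fo_row, fe_row in zip(observed, expected)
    for fo, fe in zip(fo_row, fe_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Critical value taken from a chi-square table for df = 1, alpha = 0.05.
CRITICAL_VALUE = 3.841
reject_null = chi_square > CRITICAL_VALUE

print("expected frequencies:", expected)
print("chi-square:", round(chi_square, 4), "df:", df)
print("reject the null hypothesis:", reject_null)
```

For this table the statistic is well below the critical value, so the null hypothesis is not rejected: the data give no evidence that pet preference depends on gender.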
Conclusion
● If the χ² test shows a significant relationship, the two categorical variables are dependent.
● If not, there is no strong evidence of a relationship.

1. Statistical Inference
Statistical inference is the process of using sample data to make conclusions about a population. It involves estimating population parameters (e.g., mean, proportion) and testing hypotheses based on sample statistics. Methods of inference include estimation (confidence intervals) and hypothesis testing.

2. Tests of Hypothesis
A test of hypothesis is a statistical procedure used to decide whether a statement (hypothesis) about a population parameter is supported by sample data. It involves formulating a null hypothesis (H₀) and an alternative hypothesis (H₁), selecting a significance level (α), computing a test statistic, and making a decision based on critical values or p-values.

3. Statistical Hypothesis - Rejection vs. Non-Rejection of Hypothesis
A statistical hypothesis is an assumption about a population parameter. The two possible decisions in hypothesis testing are:
● Rejection of the null hypothesis (H₀): If the sample evidence strongly contradicts H₀, we reject it in favor of H₁.
● Non-rejection of H₀: If the evidence is insufficient to reject H₀, we do not reject it, though this does not confirm that H₀ is true.

4. Level of Significance (Alpha, α)
The level of significance (α) represents the probability of rejecting a true null hypothesis (Type I error). Common values are 0.05 (5%) or 0.01 (1%), meaning there is a 5% or 1% risk of concluding that an effect exists when none does.

5. One-Tailed Test vs. Two-Tailed Test
● One-tailed test: Tests if a parameter is greater than or less than a certain value (directional). Example: Testing if a new drug increases effectiveness.
● Two-tailed test: Tests if a parameter is different from a certain value, regardless of direction (non-directional). Example: Checking if a new teaching method results in a different average test score.

6. Independent Variable and Dependent Variable in a Schema
A schema representing the relationship between independent and dependent variables typically looks like this:
Independent Variable (IV) → Affects → Dependent Variable (DV)
● Independent Variable (IV): The variable that is manipulated or categorized (e.g., teaching method).
● Dependent Variable (DV): The outcome that is measured (e.g., student test scores).
Example schema:
Teaching Method (IV) → Student Performance (DV)
7. Statement of the Problem
A statement of the problem defines the main research question(s) being investigated. It should be clear, concise, and researchable, specifying the key variables and the study's purpose.
Example:
"Does implementing a digital budgeting system improve financial management among small business owners?"

WEEK 5 - PRELIMS (LAST DISCUSSION)

Excel Analysis Steps (Hospital length of stay):

1. Identify Data Type:
  ○ Compare ordinal vs. ordinal data for the length of stay and coverage categories.
2. Calculate Totals:
  ○ Compute column totals and row totals for accurate data interpretation.
3. Compute Expected Frequency:
  ○ Required for the chi-square test.
4. CHITEST Function:
  ○ Purpose: Calculates the p-value of the chi-square test to determine the significance of the observed data compared to the expected data.
  ○ If the p-value is less than the significance level (α = 0.05), reject the null hypothesis. Otherwise, fail to reject the null hypothesis.
5. CHIINV Function:
  ○ Purpose: Calculates the critical value of the chi-square distribution for a given significance level and degrees of freedom.
  ○ The critical value obtained from CHIINV is compared to the calculated chi-square test statistic to decide whether to reject or fail to reject the null hypothesis.

Hypothesis Testing Details:

1. Research Question:
  ○ Can you conclude that there is a relationship between the length of hospital stay and hospitalization cost coverage?
2. Hypotheses:
  ○ Null Hypothesis (H₀): Length of stay and hospitalization cost coverage are not related.
  ○ Alternative Hypothesis (H₁): Length of stay and hospitalization cost coverage are related.
3. Significance Level (Alpha):
  ○ α = 0.05 (threshold level of significance)
4. P-value Interpretation:
  ○ Less than 0.01: Highly significant. Thus, reject the null hypothesis.
  ○ Less than 0.05: Significant. Thus, reject the null hypothesis.
  ○ Greater than 0.05: Not significant. Thus, do not reject the null hypothesis.
5. Test of Hypothesis:
  ○ Null Hypothesis (H₀): Reject or do not reject based on the p-value or the chi-square comparison.
  ○ Alternative Hypothesis (H₁): Accepted if H₀ is rejected.

Conclusion and Interpretation:
● If the null hypothesis is rejected, it indicates a significant relationship between the length of hospital stay and the hospitalization cost coverage.
● If the null hypothesis is not rejected, it suggests no significant relationship between the two variables.

T-TEST ANALYSIS

Variance Assumptions:
● The decision on the variance assumption is based on the difference between the variances of the two groups (this rule of thumb is more commonly stated in terms of the ratio of the larger variance to the smaller one):
  ○ If the difference is greater than 4: Assume unequal variances.
  ○ If the difference is less than 4: Assume equal variances.

T-test Analysis Steps (based on assignment):

1. Hypotheses Formulation:
● Null Hypothesis (H₀):
  ○ Method B is as effective as Method A.
● Alternative Hypothesis (H₁):
  ○ Method B is more effective than Method A.
  ○ This suggests a one-tailed test since the direction of effectiveness is specified.

Type of T-test:
● One-tailed T-test:
  ○ Used when the research hypothesis suggests a specific direction (e.g., Method B is greater than Method A).
  ○ Rejection Region:
    ■ If Right-tailed: The rejection region is on the right side of the distribution (for hypotheses suggesting an increase or greater than).
    ■ If Left-tailed: The rejection region is on the left side (for hypotheses suggesting a decrease or less than).
● Two-tailed T-test:
  ○ Used when the hypothesis does not suggest a specific direction (e.g., there is a difference but it is unclear whether it is greater or lesser).
  ○ Rejection Region: Both ends of the distribution.

2. Level of Significance (Alpha):
● It is the threshold for rejecting the null hypothesis.
● Standard values are 0.05 or 0.01, depending on the desired confidence level (95% or 99%).

3. Calculate Test Statistic:
● T-stat (or T-value): The calculated value from the T-test formula, which measures the difference relative to the variation in your sample data.

4. Determine Critical Value:
● T-critical (or Tabulated T-value):
  ○ Obtained from the T-distribution table using the degrees of freedom and the chosen alpha level.
  ○ It serves as the cutoff point for the rejection region.

5. Compare and Decision Rule:
● For a positive T-stat (right tail):
  ○ If T-stat > T-critical or P-value < Alpha → Reject the Null Hypothesis
  ○ If T-stat ≤ T-critical or P-value ≥ Alpha → Do Not Reject the Null Hypothesis
● For a negative T-stat (left tail):
  ○ If T-stat < -T-critical or P-value < Alpha → Reject the Null Hypothesis
  ○ If T-stat ≥ -T-critical or P-value ≥ Alpha → Do Not Reject the Null Hypothesis

6. Conclusion

7. Insight
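As a sketch of steps 3 to 5, the example below computes an independent-samples T-stat under the equal-variances assumption (pooled variance) and applies the critical-value decision rule. The scores for Method A and Method B are hypothetical, the function names are made up for this illustration, and the critical value 1.943 (one-tailed, α = 0.05, df = 6) is typed in from a standard t-table rather than computed.

```python
import math
import statistics


def pooled_t_statistic(group_1, group_2):
    """Independent-samples t statistic assuming equal variances."""
    n1, n2 = len(group_1), len(group_2)
    pooled_var = (
        (n1 - 1) * statistics.variance(group_1)
        + (n2 - 1) * statistics.variance(group_2)
    ) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(group_1) - statistics.mean(group_2)) / standard_error


def reject_null(t_stat, t_critical, tail):
    """Critical-value decision rule; t_critical must match the chosen tail."""
    if tail == "right":
        return t_stat > t_critical
    if tail == "left":
        return t_stat < -t_critical
    return abs(t_stat) > t_critical  # two-tailed: both ends of the distribution


# Hypothetical scores for illustration only.
method_b = [2, 4, 6, 8]
method_a = [1, 3, 5, 7]

t_stat = pooled_t_statistic(method_b, method_a)
degrees_of_freedom = len(method_a) + len(method_b) - 2

# One-tailed critical value for alpha = 0.05 and df = 6, from a t-table.
T_CRITICAL = 1.943

# H1: Method B is more effective than Method A, so the test is right-tailed.
decision = reject_null(t_stat, T_CRITICAL, tail="right")

print("T-stat:", round(t_stat, 4), "df:", degrees_of_freedom)
print("reject the null hypothesis:", decision)
```

With these made-up scores the T-stat falls short of the critical value, so the null hypothesis is not rejected. Using a p-value instead, the equivalent rule is to reject when the p-value is less than alpha.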
