Statistics - Exam Reviewer (Final)

The document provides a comprehensive overview of key concepts in statistics, including definitions, types of statistics, data types, measurement scales, and hypothesis testing. It explains descriptive and inferential statistics, measures of central tendency and variation, and outlines the formulation of research questions and hypotheses. Additionally, it discusses the identification of appropriate statistical tests, the interpretation of results, and the importance of frequency in data analysis.

Uploaded by Divlen Jacutin

Definition of Key Concepts in Statistics

What is Statistics?

●​ Statistics is a study that deals with the collection, organization, analysis, interpretation, and presentation of data to understand or solve problems. It helps in making informed decisions based on numerical/empirical evidence.

Types of Statistics:

1.​ Descriptive Statistics – Summarizes and presents given data in a meaningful way, which can be a representation of the entire population or a sample.
○​ Measures of Central Tendency: Mean, Median, Mode
○​ Measures of Dispersion: Range, Variance, Standard Deviation
2.​ Inferential Statistics – Analyzes sample data to make predictions or
inferences about a larger population. It involves drawing a conclusion.
○​ Hypothesis Testing
○​ Confidence Intervals
○​ Regression Analysis
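As a sketch of one inferential tool, a confidence interval for a mean can be computed in Python. The scores below are invented for illustration, and the normal z value of 1.96 is an approximation (a t critical value would be more precise for a small sample):

```python
import math
import statistics

# Hypothetical sample of exam scores (invented for illustration)
scores = [78, 82, 85, 88, 90, 75, 84, 91, 79, 86]

n = len(scores)
mean = statistics.mean(scores)
s = statistics.stdev(scores)  # sample standard deviation (divides by n - 1)

# 95% confidence interval using the normal approximation (z = 1.96)
z = 1.96
margin = z * s / math.sqrt(n)
lower, upper = mean - margin, mean + margin
print(f"mean = {mean:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The interval estimates the population mean from the sample, which is the essence of inferential statistics.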

Identification of Data Types and Measurement Scales

Types of Data

1.​ Categorical (Qualitative Data) - Represents labels, groups, or categories. It does not have numerical meaning but is used to classify observations.
○​ Nominal: Categories with no specific order (e.g., gender, colors, car
brands).
○​ Ordinal: Categories with a meaningful order/rank but unequal
intervals (e.g., customer satisfaction ratings: Poor, Fair, Good,
Excellent).
2.​ Scale (Quantitative Data) - Consists of numerical values that represent
measurable quantities like height, weight, or temperature.
○​ Discrete: Integers, whole numbers, or counting numbers (number of
employees, number of products sold, allowance).
○​ Continuous: Contains fractions and decimals (e.g., weight, height,
temperature).

Scale Measurement:

●​ Ratio: There is a true zero (e.g., height, weight, income).
●​ Interval: No true zero (e.g., temperature in °C).
●​ Ordinal: Ranked categories with unequal intervals.
●​ Nominal: Numbers serve only as labels for categories.

Problems for Descriptive Statistics

Measures of Central Tendency - Indicate the center of the data set. The mean is subject to outliers, which makes comparisons based on it vulnerable to extreme values.

1.​ Mean (Average) - The sum of all values divided by the number of values.
○​ Formula: Mean = ΣX / N
○​ Best for: Normally distributed data without extreme values.
2.​ Median - The middle value when data is arranged in ascending order.
○​ Best for: Skewed data or data with outliers.
○​ More stable than the mean because it is not susceptible to outliers.
3.​ Mode - The most frequently occurring value in a dataset.
○​ Example: In 2, 3, 3, 4, 5, 5, 5, 6, the mode is 5.
○​ Best for: Categorical data or identifying common values in a
dataset.
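The three measures above can be checked with Python's standard statistics module, using the mode example from the text:

```python
import statistics

# The mode example from the text: 2, 3, 3, 4, 5, 5, 5, 6
data = [2, 3, 3, 4, 5, 5, 5, 6]

print(statistics.mean(data))    # 4.125 (sum 33 divided by 8 values)
print(statistics.median(data))  # 4.5 (average of the two middle values)
print(statistics.mode(data))    # 5 (most frequent value)
```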

PARAMETER (N) - describes a POPULATION
STATISTIC (n) - describes a SAMPLE
Measures of Variation - Refers to how spread out or dispersed the data points
are in a dataset from the central tendency. It helps measure the consistency or
inconsistency of values and provides insight into the reliability of data.

1.​ Range - Difference between the highest and lowest values.


○​ Formula: Range= Max−Min
○​ Example: Highest salary is ₱80,000, lowest is ₱20,000 → Range =
₱60,000.
○​ Limitation: Only considers two values and ignores distribution.
2.​ Variance
○​ Measures the average squared difference from the mean.
○​ High Variance: More spread out data.
○​ Low Variance: Data points are close to the mean.

3.​ Standard Deviation


○​ The square root of variance; shows the average distance of data
points from the mean.
4.​ Interquartile Range (IQR) - A measure of statistical dispersion that shows the spread of the middle 50% of a dataset; the quartiles divide the data into 4 equal parts of 25% each. It is useful for identifying variability while reducing the impact of outliers.
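Range, variance, and standard deviation can be sketched with the standard statistics module; the data set below is invented for illustration:

```python
import statistics

# Illustrative data set (invented for the example)
data = [4, 8, 6, 5, 3, 7, 9]

data_range = max(data) - min(data)      # Range = Max - Min
var_sample = statistics.variance(data)  # sample variance (divides by n - 1)
sd_sample = statistics.stdev(data)      # standard deviation = sqrt of variance

print(data_range)  # 6
print(var_sample)
print(sd_sample)
```

A low standard deviation means the points cluster near the mean; a high one means they are spread out.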

IQR FIVE-NUMBER SUMMARY:


○​ Minimum - X smallest
○​ Q1 (first quartile)
○​ Q2 (median)
○​ Q3 (third quartile)
○​ Maximum - X largest

How to Calculate IQR:

1.​ Arrange the data in ascending order.


2.​ Find the first quartile (Q1) – the median of the lower half (25th percentile).
3.​ Find the third quartile (Q3) – the median of the upper half (75th
percentile).
4.​ Compute the IQR using the formula: IQR=Q3−Q1

​N (big N) usually refers to a population size;

n (small n) refers to a sample size.

Example:

Consider the dataset: 5, 7, 9, 12, 15, 18, 22, 26, 30

■​ Q1 (25th percentile): 9
■​ Q3 (75th percentile): 22
■​ IQR: 22 - 9 = 13

NOTE: If the sample size (n) is only 5, automatically apply the IQR:


= Xsmallest, Q1, Q2, Q3, Xlargest
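The IQR example above can be reproduced with the standard statistics module; method='inclusive' matches the quartiles given in the example (Q1 = 9, Q3 = 22):

```python
import statistics

# The example data set from the text
data = [5, 7, 9, 12, 15, 18, 22, 26, 30]

# method='inclusive' interpolates over the full range, matching the example
q1, q2, q3 = statistics.quantiles(data, n=4, method='inclusive')
iqr = q3 - q1

five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary)  # minimum, Q1, median, Q3, maximum
print(iqr)                  # 13.0
```

Note that different quartile conventions (e.g., the default method='exclusive') can give slightly different Q1 and Q3 on small samples.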

Formulation of Research Question, Hypothesis, Type of Variables

Variables

●​ Independent Variable (X): The variable that is manipulated or categorized.
●​ Dependent Variable (Y): The outcome that is measured.

Formulating Hypotheses

1.​ Null Hypothesis (H₀): No significant relationship/difference.


2.​ Alternative Hypothesis (H₁): A significant relationship/difference exists.

Identifying the Appropriate Test

Parametric vs. Non-Parametric Tests

●​ Parametric Tests: Used to answer questions involving the magnitude, effect, or influence of one variable on another, or tests of differences (i.e., comparison of means). Applied to numerical data when making inferences about a population (e.g., t-test).
●​ Non-Parametric Tests: Used for categorical data (i.e., nominal vs. nominal, nominal vs. ordinal, ordinal vs. ordinal), e.g., the chi-square test.

The chi-square test, or test of independence, is a common non-parametric test. Key concepts for the chi-square test are:

●​ Contingency table (e.g., 2 x 2, 3 x 5)


●​ Observed frequency (fo) - actual counts
●​ Expected frequency (fe) - calculated as (row total × column total) / grand total
●​ Grand total (N) - the sum of the column totals or the row totals
●​ Degrees of freedom (df) - calculated as (no. of rows − 1) × (no. of columns − 1)
●​ the probability - to be compared with the level of significance (to conclude
whether the variables are independent or not)
●​ the Chi-square value
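These key concepts can be sketched in Python for a hypothetical 2 x 2 contingency table (the gender x pet preference setup mentioned later in the text, with invented counts):

```python
# Hypothetical 2 x 2 contingency table (gender x pet preference);
# the counts are invented for illustration
observed = [
    [20, 30],  # male:   dog, cat
    [25, 25],  # female: dog, cat
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)  # N

# Expected frequency: fe = (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (fo - fe)^2 / fe over every cell
chi_square = sum(
    (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(observed))
    for j in range(len(observed[0]))
)

# Degrees of freedom: (rows - 1) * (columns - 1)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(chi_square, df)
```

The chi-square value would then be compared with a critical value (or its p-value with alpha) for df = 1.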

Limitations of Chi-square test


* All participants measured must be independent, meaning that an individual cannot fit in more than one category. For example, in our sample exercise, we tested the association between gender (male & female) and pet preference (dog & cat). A male respondent must not prefer both dog and cat, and a female respondent must not prefer both dog and cat; otherwise, the chi-square test will not be appropriate.
* The data must be frequency data, meaning actual counts.
* No more than 20% of the expected frequencies should be less than 5 in a contingency
table and all individual expected counts are 1 or greater. If this rule is violated, the
Chi-Square test may not be valid, and an alternative test (such as Fisher’s Exact Test)
should be considered. See examples for this rule in the next material.
*If the sample size is less than 50, Chi-square test is not appropriate. Instead, Fisher's
Exact Test is recommended.

Tests of Relationship

1.​ Test of Independence or Association (Chi-Square Test)


○​ Determines if two categorical variables are related.
○​ Formula: χ² = Σ (O − E)² / E,
where O = Observed frequency, E = Expected frequency.
2.​ Test of Relationship for Numerical Variables (Correlation Test)
○​ Measures the strength and direction of a relationship between two
numerical variables.
○​ Correlation Coefficient (r) - interprets the strength of the relationship between two numerical variables:

■​ 0 - 0.19: Very weak


■​ 0.20 - 0.39: Weak
■​ 0.40 - 0.59: Moderate
■​ 0.60 - 0.79: Strong
■​ 0.80 - 1.00: Very strong
3.​ T-test (Comparison of Means)
○​ Independent Sample T-test: Compares two different groups.
○​ Paired Sample T-test: Compares the same group at two different
times.
○​ One-tailed T-test: Tests if one mean is greater or less than another.
○​ Two-tailed T-test: Tests if means are different in any direction.
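The correlation strength labels above can be sketched as a small helper, with Pearson's r computed from its definition on invented paired data:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient computed from its definition."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def strength(r):
    """Map |r| to the descriptive labels used in the text."""
    a = abs(r)
    if a < 0.20: return "Very weak"
    if a < 0.40: return "Weak"
    if a < 0.60: return "Moderate"
    if a < 0.80: return "Strong"
    return "Very strong"

# Hypothetical paired data (e.g., study hours vs. exam score), invented
x = [1, 2, 3, 4, 5, 6]
y = [50, 55, 62, 64, 71, 78]

r = pearson_r(x, y)
print(round(r, 3), strength(r))
```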

Identifying One-Tailed or Two-Tailed Test

●​ One-tailed test: Tests if a parameter is greater or less than a value.


●​ Two-tailed test: Tests if a parameter is different from a value, regardless of
direction.

In detail:

●​ One-tailed T-test:
○​ Used when the research hypothesis suggests a specific direction (e.g.,
Method B is greater than Method A).
○​ Rejection Region:
■​ If Right-tailed: The rejection region is on the right side of the
distribution (for hypotheses suggesting an increase or greater than).
■​ If Left-tailed: The rejection region is on the left side (for hypotheses
suggesting a decrease or less than).
●​ Two-tailed T-test:
○​ Used when the hypothesis does not suggest a specific direction (e.g.,
there is a difference but unsure whether greater or lesser).
○​ Rejection Region: Both ends of the distribution.

Level of Significance (Alpha):

●​ It is the threshold for rejecting the null hypothesis.


●​ Standard values are 0.05 or 0.01, depending on the desired confidence level

Interpreting Results of Hypothesis Testing

1.​ Statistical Hypothesis - Rejection vs. Non-Rejection


○​ Reject H₀: If the sample evidence strongly contradicts it.
○​ Do not reject H₀: If evidence is insufficient to reject it.
2.​ Level of Significance (α)
○​ Common values: 0.05 (5%) or 0.01 (1%).
○​ If p-value < α, reject H₀.
○​ If p-value ≥ α, do not reject H₀.
3.​ Decision Rule for T-Test
○​ For a positive T-stat:
i.​ If T-stat > T-critical or P-value < Alpha → Reject the Null Hypothesis
ii.​ If T-stat ≤ T-critical or P-value ≥ Alpha → Do Not Reject the Null
Hypothesis
○​ For a negative T-stat:
i.​ If -T-stat < -T-critical or P-value < Alpha → Reject the Null Hypothesis
ii.​ If -T-stat ≥ -T-critical or P-value ≥ Alpha → Do Not Reject the Null
Hypothesis
4.​ Conclusion
○​ If the test shows significance, the alternative hypothesis is
supported.
○​ If not, the null hypothesis is not rejected.
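The p-value decision rule can be sketched as a small helper (alpha defaults to the standard 0.05):

```python
def decide(p_value, alpha=0.05):
    """Compare the p-value with the level of significance."""
    if p_value < alpha:
        return "Reject H0"        # the result is statistically significant
    return "Do not reject H0"     # insufficient evidence against H0

print(decide(0.03))  # Reject H0
print(decide(0.20))  # Do not reject H0
```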

OTHER IMPORTANT NOTES NOT IN THE POINTERS:


FREQUENCY
In statistics, frequency is the number of times a value appears in a set of data. It can
also be described as the number of times an event occurs in an experiment or study.

How is frequency expressed?


• As the number of times a value occurs (absolute frequency)
• As the number of times a value occurs compared to the total number of values
(relative frequency, often expressed as a proportion or percentage)

For example, in our exercise:


* The number of times males or females appear in our data set.
* The number of times job experience of below 5 years or 5 years and above occurs in
our data set.
* The number of times very good, good, fair, or poor job performance occurs in our data
set.
Other examples:
• Counting how many students scored a particular mark on a test
• Counting how many people own dogs in a survey
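Counting frequencies, such as the dog-ownership example, is straightforward with collections.Counter; the survey responses below are invented:

```python
from collections import Counter

# Hypothetical survey responses on pet ownership (invented data)
responses = ["dog", "cat", "dog", "dog", "none", "cat", "dog"]

freq = Counter(responses)                           # absolute frequencies
total = sum(freq.values())
rel_freq = {k: v / total for k, v in freq.items()}  # relative frequencies

print(freq["dog"])                # 4 (absolute frequency)
print(round(rel_freq["dog"], 3))  # 0.571 (relative frequency, 4 out of 7)
```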

Importance of Frequency in Statistics


In statistics, "frequency" is important because it tells you how often a specific value
appears within a dataset. This allows you to understand the distribution of data, identify
patterns, and visualize how frequently certain values occur, which is crucial for
interpreting and analyzing data effectively. Essentially, it provides a clear picture of
which data points are most common and which are less frequent within a set.

Key points about frequency in statistics:


• Frequency distribution: A table or graph that displays the frequency of each unique
value in a dataset, providing a visual representation of data distribution.
• Frequency table: A table that shows how often an event happened
• Understanding patterns: By looking at the frequency of different values, researchers
can identify trends, outliers, and areas of concentration within the data.
• Data Analysis Applications: Frequency analysis is used in various fields like market
research, quality control, social science, and healthcare to analyze customer behavior,
product defects, demographic trends, and more.
• Relative Frequency: The proportion of times a value appears within a dataset, often
expressed as a percentage, which allows for comparison between different data sets
even if the total sample sizes vary.

Elements in Hypothesis Testing (Steps) for T-tests


1. Ho (null hypothesis)
H1 (alternative hypothesis)
2. α = level of significance
3. T statistic (T-value) = calculated T
4. Rejection region = t-critical or tabulated t
5. Compare: T > t or -T < -t ; P < α
6. Conclusion: Reject or Do not reject Ho
7. Insights: Interpretation of results, added with related literature (APA style, 7th edition)
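Steps 3-5 can be sketched for an independent-samples t-test using the pooled-variance formula; the two groups below are invented for illustration:

```python
import math
import statistics

# Hypothetical scores for two independent groups (invented data)
a = [70, 72, 68, 75, 71]  # e.g., Method A
b = [78, 80, 76, 82, 79]  # e.g., Method B

na, nb = len(a), len(b)
mean_a, mean_b = statistics.mean(a), statistics.mean(b)

# Pooled variance (independent-samples t-test, equal variances assumed)
sp2 = ((na - 1) * statistics.variance(a)
       + (nb - 1) * statistics.variance(b)) / (na + nb - 2)

# Calculated T (step 3)
t_stat = (mean_a - mean_b) / math.sqrt(sp2 * (1 / na + 1 / nb))

# Degrees of freedom, used to look up t-critical (step 4)
df = na + nb - 2

# Step 5: compare |t_stat| with the tabulated t-critical for df and alpha
print(round(t_stat, 3), df)
```

A large |t| relative to t-critical (or p < α) would lead to rejecting H₀ in step 6.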

Explore the following concepts:


1. statistical inference
2. tests of hypothesis
3. statistical hypothesis - rejection vs. non-rejection of hypothesis
4. level of significance (alpha)
5. one-tailed test vs. two-tailed test
6. independent variable and dependent variable in a schema
7. statement of the problem
