STATS 201

The document provides an overview of various statistical concepts, including data types (qualitative and quantitative), frequency distribution, graphical representation, measures of central tendency, binomial distribution, hypothesis testing, correlation, and regression. Each section defines key terms, explains types and methods, and discusses advantages and limitations. The content serves as a foundational guide for understanding statistical analysis and its applications.

1. Data and Its Types
● Definition: Data are facts, figures, or information collected, analyzed, and interpreted.
● Types (illustrated in the code sketch after this list):
○​ Qualitative (Categorical): Descriptive, non-numerical.
■​ Nominal: Categories with no inherent order (e.g., blood groups A, B, O).
■​ Ordinal: Categories with a meaningful order but unequal intervals (e.g.,
pain scale: mild, moderate, severe).
○​ Quantitative (Numerical): Measurable, numerical.
■​ Discrete: Countable, usually whole numbers (e.g., number of patients).
■​ Continuous: Can take any value within a range (e.g., height, weight,
temperature).
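
A minimal sketch of how the four types above can be represented in code, assuming pandas is installed; the column names and values are invented for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    # Nominal: categories with no inherent order
    "blood_group": pd.Categorical(["A", "B", "O", "A"]),
    # Ordinal: categories with a meaningful order
    "pain": pd.Categorical(["mild", "severe", "moderate", "mild"],
                           categories=["mild", "moderate", "severe"],
                           ordered=True),
    # Discrete: countable whole numbers
    "n_visits": [3, 1, 4, 2],
    # Continuous: any value within a range
    "temperature": [36.6, 37.2, 38.9, 36.9],
})

print(df.dtypes)
print(df["pain"].min())  # ordered categories support comparison: "mild"
```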

2. Frequency Distribution
●​ Definition: A table or graph that shows the number of times each value or group
of values occurs in a dataset.
●​ Types:
○​ Ungrouped: Lists each distinct value and its frequency.
○​ Grouped: Organizes data into class intervals and shows the frequency within
each interval. Includes class limits, class boundaries, class width, and class
midpoint.
○ Relative Frequency: Proportion of observations in each category/interval
(frequency / total number of observations).
○ Cumulative Frequency: Total number of observations up to and including a
specific category/interval. (Both are computed in the sketch after this list.)
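
A minimal sketch of an ungrouped frequency table with relative and cumulative frequencies, using only Python's standard library; the data values are invented:

```python
from collections import Counter

data = [2, 3, 3, 1, 2, 3, 4, 2, 2, 1]
freq = Counter(data)   # ungrouped frequency distribution
total = len(data)

cumulative = 0
print(f"{'value':>5} {'freq':>5} {'rel_freq':>9} {'cum_freq':>9}")
for value in sorted(freq):
    cumulative += freq[value]
    # relative frequency = frequency / total number of observations
    print(f"{value:>5} {freq[value]:>5} {freq[value] / total:>9.2f} {cumulative:>9}")
```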

3. Graphical Representation
● Purpose: To visually summarize and present data, making patterns and trends
easier to understand (two of the chart types below are sketched in code after this list).
●​ Types:
○​ Categorical Data:
■​ Bar Chart: Compares frequencies of different categories (bars don't
touch).
■​ Pie Chart: Shows proportions of different categories as slices of a circle.
○​ Numerical Data:
■​ Histogram: Represents the frequency distribution of continuous data (bars
touch).
■​ Frequency Polygon: Line graph connecting the midpoints of the tops of
histogram bars.
■​ Ogive (Cumulative Frequency Curve): Line graph showing cumulative
frequencies.
■​ Scatter Plot: Shows the relationship between two quantitative variables.
■​ Box Plot (Box and Whisker Plot): Displays the distribution of data based on
quartiles, median, and outliers.
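
A minimal sketch of two of these chart types, assuming matplotlib is installed; the blood-group counts and heights are invented:

```python
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))

# Bar chart for categorical data: bars do not touch.
ax1.bar(["A", "B", "O", "AB"], [12, 9, 15, 4])
ax1.set_title("Bar chart: blood groups")

# Histogram for continuous data: bars touch.
heights = [160, 162, 165, 165, 168, 170, 171, 174, 176, 180]
ax2.hist(heights, bins=4)
ax2.set_title("Histogram: height (cm)")

plt.tight_layout()
plt.show()
```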

4. Measures of Central Tendency: Mean, Median, and Mode
●​ Definition: Single values that attempt to describe the "center" or "typical" value
of a dataset.
○ Mean (Average): Sum of all values divided by the number of values: x̄ = (Σxᵢ)/n (see the sketch after this list).
■​ Advantages: Uses all data points, easy to calculate.
■​ Disadvantages: Sensitive to outliers.
○ Median (Middle Value): The middle value when data is arranged in order. For an
even number of data points, it's the average of the two middle values.
■​ Advantages: Not affected by outliers.
■​ Disadvantages: Doesn't use all data points.
○​ Mode (Most Frequent Value): The value that appears most often in the
dataset. A dataset can have no mode, one mode (unimodal), or multiple
modes (bimodal, trimodal, etc.).
■​ Advantages: Easy to identify, useful for categorical data.
■​ Disadvantages: Not always unique, may not represent the center well.
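
A minimal sketch using Python's built-in statistics module (multimode requires Python 3.8+) on an invented dataset; note how the outlier pulls the mean but not the median:

```python
import statistics

data = [2, 3, 3, 5, 7, 9, 100]  # 100 is an outlier

print(statistics.mean(data))       # ≈ 18.43, dragged up by the outlier
print(statistics.median(data))     # 5, unaffected by the outlier
print(statistics.multimode(data))  # [3], the most frequent value
```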

5. Binomial Distribution
●​ Definition: A discrete probability distribution that describes the probability of
obtaining a certain number of successes in a fixed number of independent
Bernoulli trials (experiments with only two possible outcomes: success or failure).
●​ Conditions (BINS):
○​ Binary outcome (success or failure).
○​ Independent trials.
○​ Number of trials is fixed (n).
○​ Same probability of success (p) for each trial.
● Probability Mass Function (evaluated in the sketch after this list):
P(X = k) = C(n, k) · p^k · (1 − p)^(n−k), where:
○ n = number of trials
○ k = number of successes
○ p = probability of success in a single trial
○ C(n, k) = n! / (k! (n−k)!) (the binomial coefficient)
● Mean: μ = np
● Variance: σ² = np(1 − p)
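
A minimal sketch evaluating the PMF, mean, and variance with the standard library (math.comb is the binomial coefficient C(n, k)); n = 10 and p = 0.3 are arbitrary example values:

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a Binomial(n, p) random variable."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3
print(binomial_pmf(3, n, p))  # ≈ 0.2668
print(n * p)                  # mean μ = 3.0
print(n * p * (1 - p))        # variance σ² = 2.1
```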

6. Hypothesis Testing
●​ Definition: A formal procedure used to determine whether there is enough
statistical evidence to reject a null hypothesis in favor of an alternative
hypothesis.
●​ Hypotheses:
○ Null Hypothesis (H₀): A statement of no effect or no difference (the status
quo).
○ Alternative Hypothesis (H₁ or Hₐ): A statement that contradicts the null
hypothesis (what the researcher wants to find evidence for). Can be
one-tailed (directional) or two-tailed (non-directional).
●​ Types of Errors:
○​ Type I Error (False Positive, α): Rejecting a true null hypothesis. The
probability of making a Type I error is the significance level (α).
○​ Type II Error (False Negative, β): Failing to reject a false null hypothesis. The
power of a test is 1−β, the probability of correctly rejecting a false null
hypothesis.
●​ Significance Level (α): The probability of rejecting the null hypothesis when it is
true (commonly 0.05).
● P-value: The probability of obtaining test results at least as extreme as the
observed results, assuming the null hypothesis is true. If the p-value is less than
α, we reject H₀.
● Parametric Tests: Statistical tests that assume the data follows a specific
distribution (usually normal) and make assumptions about population parameters.
Examples: t-tests, ANOVA, Pearson correlation (a t-test is sketched after this list).
●​ Non-Parametric Tests: Statistical tests that do not rely on specific distributional
assumptions. Used when data is not normally distributed or is ordinal/nominal.
Examples: Chi-square tests, Mann-Whitney U test, Kruskal-Wallis test, Spearman
correlation.
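
A minimal sketch of one parametric test from the list above, a two-tailed one-sample t-test, assuming SciPy is available; the sample values and the hypothesized mean of 37.0 are invented:

```python
from scipy import stats

sample = [36.8, 37.4, 37.1, 37.9, 37.3, 37.6, 37.2, 37.5]
alpha = 0.05

# H0: population mean = 37.0 vs. H1: population mean != 37.0
result = stats.ttest_1samp(sample, popmean=37.0)
print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")

if result.pvalue < alpha:
    print("Reject H0 at the 5% significance level.")
else:
    print("Fail to reject H0.")
```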

7. Correlation
●​ Definition: A statistical measure that describes the extent to which two or more
variables fluctuate together. It indicates the strength and direction of a linear
relationship.
●​ Types:
○​ Positive Correlation: Both variables increase or decrease together (e.g.,
height and weight).
○​ Negative Correlation: As one variable increases, the other decreases (e.g.,
study time and exam anxiety).
○​ No Correlation: No linear relationship between the variables.
● Measures (compared in the code sketch after this list):
○ Pearson Correlation Coefficient (r): Measures the strength and direction of
a linear relationship between two continuous variables. Ranges from −1 to +1.
■​ r=+1: Perfect positive correlation
■​ r=−1: Perfect negative correlation
■​ r=0: No linear correlation
○​ Spearman's Rank Correlation Coefficient (ρ): Measures the strength and
direction of a monotonic relationship (not necessarily linear) between two
ordinal or continuous variables after ranking them.
●​ Advantages: Helps identify relationships between variables, useful for prediction
(in conjunction with regression).
●​ Importance: Provides insights into how variables are associated, guides further
research.
●​ Limitations: Correlation does not imply causation! Can be affected by outliers.
Only measures linear (Pearson) or monotonic (Spearman) relationships.
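
A minimal sketch contrasting the two coefficients, assuming SciPy is available; the x/y pairs are invented and deliberately monotonic but non-linear:

```python
from scipy import stats

x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]  # y = x², monotonic but not linear

r, _ = stats.pearsonr(x, y)     # strength of the *linear* relationship
rho, _ = stats.spearmanr(x, y)  # strength of the *monotonic* relationship

print(f"Pearson r    = {r:.3f}")    # below 1, since the relation is not linear
print(f"Spearman rho = {rho:.3f}")  # exactly 1.000: perfectly monotonic
```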

8. Regression
● Definition: A statistical method used to model the relationship between a
dependent variable (outcome) and one or more independent variables
(predictors). It aims to predict the value of the dependent variable based on the
values of the independent variables.
●​ Types:
○​ Simple Linear Regression: One independent variable predicts a dependent
variable. The model is a straight line: Y=a+bX, where:
■​ Y = dependent variable
■​ X = independent variable
■​ a = y-intercept (value of Y when X=0)
■​ b = slope (change in Y for a one-unit change in X)
○​ Multiple Linear Regression: Two or more independent variables predict a
dependent variable.
●​ Purpose: Prediction, explanation of relationships between variables.
●​ Assumptions of Linear Regression: Linearity, independence of errors,
homoscedasticity (constant variance of errors), normality of errors.
● R-squared (R²): Coefficient of determination; represents the proportion of the
variance in the dependent variable that is predictable from the independent
variable(s). Ranges from 0 to 1 (computed in the sketch after this list).
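
A minimal sketch fitting Y = a + bX by least squares and computing R², using only the standard library; the data points are invented:

```python
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]
n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# Slope b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²; intercept a = ȳ − b·x̄
b = (sum((x - mean_x) * (y - mean_y) for x, y in data)
     / sum((x - mean_x) ** 2 for x, _ in data))
a = mean_y - b * mean_x

# R² = 1 − SS_residual / SS_total
ss_res = sum((y - (a + b * x)) ** 2 for x, y in data)
ss_tot = sum((y - mean_y) ** 2 for _, y in data)
r_squared = 1 - ss_res / ss_tot

print(f"Y = {a:.3f} + {b:.3f}X,  R² = {r_squared:.4f}")
```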
