Basic Stats
1. Terminology:
● Data: Data can be classified into two types: qualitative data (non-numerical) and quantitative data (numerical).
● Variables: Variables are characteristics or attributes that can take on different values. They can be classified as qualitative variables (nominal or ordinal) or quantitative variables (discrete or continuous).
● Frequency Distribution: A frequency distribution is a tabular representation of data that shows the number of times each value occurs. It helps in understanding the pattern and distribution of data.
● Measures of Central Tendency: Measures of central tendency are used to determine the
centre or average of a set of data. The three commonly used measures are mean, median,
and mode.
● Measures of Dispersion: Measures of dispersion provide information about the spread or variability of data. The commonly used measures are range, quartiles, variance, and standard deviation.
● Random Variables: Random variables are variables whose values depend on the outcome of a random event. They can be classified as discrete random variables or continuous random variables.
● Correlation: Correlation measures the strength and direction of the linear relationship
between two variables. It is represented by the correlation coefficient, which ranges from
-1 to 1.
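To make the frequency distribution and correlation definitions concrete, here is a minimal Python sketch. The data values and the helper name `pearson_r` are hypothetical; the coefficient is computed with the usual Pearson formula (covariance divided by the product of the spreads of the two variables).

```python
from collections import Counter
from math import sqrt

# A small made-up data set, used only for illustration.
marks = [4, 7, 4, 9, 7, 4, 10]

# Frequency distribution: how many times each value occurs.
freq = Counter(marks)
for value in sorted(freq):
    print(value, freq[value])

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # ≈ 1.0: perfect positive linear relation
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # ≈ -1.0: perfect negative linear relation
```

Any pair of samples gives a value between -1 and 1; values near 0 indicate little linear relationship.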
2. Measures of Central Tendency:
Measures of central tendency describe a complete data set by a single central value of that data set. The three commonly used measures of central tendency are the mean, median, and mode. Let's explore each of them in more detail:
A) MEAN: The mean is the ratio of the sum of the values of the items in a series to the total number of items.
i) Arithmetic Mean: The arithmetic mean, often referred to as the average, is the most commonly used measure of central tendency. It is calculated by summing up all the values in a dataset and dividing the sum by the total number of values:
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
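The arithmetic mean can be sketched in a few lines of Python. The function name `arithmetic_mean` and the data are illustrative assumptions; `statistics.mean` from the standard library is shown alongside for comparison.

```python
import statistics

def arithmetic_mean(values):
    """Sum of the values divided by how many values there are."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(arithmetic_mean(data))   # 40 / 8 = 5.0
print(statistics.mean(data))   # standard-library equivalent
```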
ii) Geometric Mean: The geometric mean is used when dealing with quantities that are multiplied together, such as growth rates. It is calculated by taking the nth root of the product of n values in a dataset, where n represents the total number of values. If x1, x2, …, xn are the observations, the G.M. is defined as:
$$GM = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n} = (x_1 x_2 \cdots x_n)^{1/n}$$
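A minimal sketch of the geometric mean, assuming all observations are positive and Python 3.8 or later (for `math.prod`). The function name `geometric_mean` is illustrative; for very long data sets, averaging logarithms avoids overflow of the product.

```python
import math

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    product = math.prod(values)
    return product ** (1 / len(values))

print(geometric_mean([2, 8]))     # square root of 16, ≈ 4
print(geometric_mean([1, 3, 9]))  # cube root of 27, ≈ 3
```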
iii) Harmonic Mean: The harmonic mean is used when dealing with rates, ratios, or average speeds. It is the reciprocal of the arithmetic mean of the reciprocals of the values in the dataset. If we have a set of observations x1, x2, x3, …, xn, the reciprocals of these terms are 1/x1, 1/x2, 1/x3, …, 1/xn. Thus, the harmonic mean formula is:
$$HM = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}}$$
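The average-speed use of the harmonic mean can be checked with a small Python sketch. The trip (equal distances at 40 km/h and 60 km/h) is a made-up example, and `harmonic_mean` is an illustrative name that only handles positive values.

```python
def harmonic_mean(values):
    """Reciprocal of the arithmetic mean of the reciprocals."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("this sketch only handles positive values")
    return len(values) / sum(1 / v for v in values)

# Hypothetical trip: two equal distances driven at 40 km/h and then 60 km/h.
print(harmonic_mean([40, 60]))  # average speed for the whole trip, ≈ 48 km/h
```

Note that the simple arithmetic mean (50 km/h) overstates the average speed, because more time is spent at the slower speed.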
Relation between AM, GM and HM:
For two positive numbers, the product of the harmonic mean (HM) and the arithmetic mean (AM) is always equal to the square of the geometric mean (GM):
$$GM^2 = HM \times AM$$
Also, for any set of positive values, HM ≤ GM ≤ AM, with equality only when all the values are equal.
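A quick numerical check of these relations, as a Python sketch with made-up numbers. It shows that GM² = AM × HM holds for a pair of positive numbers, that it generally fails for three values such as 1, 2, 3, and that the ordering HM ≤ GM ≤ AM still holds.

```python
import math

def am(xs):
    return sum(xs) / len(xs)

def gm(xs):
    return math.prod(xs) ** (1 / len(xs))

def hm(xs):
    return len(xs) / sum(1 / x for x in xs)

pair = [4, 9]
# For two positive numbers, GM^2 equals AM x HM (up to floating-point rounding).
print(gm(pair) ** 2, am(pair) * hm(pair))

# For three or more numbers the identity generally fails, but the ordering holds.
triple = [1, 2, 3]
print(gm(triple) ** 2, am(triple) * hm(triple))
print(hm(triple) <= gm(triple) <= am(triple))  # True
```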
Note the following:
The arithmetic mean is used when the data values have the same units.
The geometric mean is used when the data set values have differing units.
The harmonic mean is used when the values are expressed as rates.
B) MEDIAN: The value of the middle-most observation obtained after arranging the data in
ascending order is called the median of the data.
i) Median Formula when ‘n’ is odd: For a set of numbers with an odd number n of observations, the median is the middle observation:
$$\text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ observation}$$
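ii) Median Formula when ‘n’ is even: For a set of numbers with an even number n of observations, the median is the average of the two middle observations:
$$\text{Median} = \frac{\left(\frac{n}{2}\right)^{\text{th}} \text{ observation} + \left(\frac{n}{2}+1\right)^{\text{th}} \text{ observation}}{2}$$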
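Both median rules can be sketched in Python. The function name and data are illustrative; the positions are 1-based in the formulas and shifted to Python's 0-based indexing in the code.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    if n % 2 == 1:
        # ((n + 1) / 2)-th observation, shifted to 0-based indexing
        return xs[(n + 1) // 2 - 1]
    # average of the (n/2)-th and (n/2 + 1)-th observations (1-based)
    return (xs[n // 2 - 1] + xs[n // 2]) / 2

print(median([7, 1, 5, 3, 9]))      # n = 5: 3rd value of [1, 3, 5, 7, 9] -> 5
print(median([7, 1, 5, 3, 9, 11]))  # n = 6: (5 + 7) / 2 -> 6.0
```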
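C) MODE: In statistics, the mode formula is used to calculate the mode or modal value of a given set of data. The mode is the value that occurs most often in the data set; that is, the value with the highest frequency is called the mode or modal value. For grouped data, it can be expressed as:
$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times h$$
where l is the lower limit of the modal class, h is the class width, f1 is the frequency of the modal class, and f0 and f2 are the frequencies of the classes preceding and succeeding it.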
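A small Python sketch of the mode. For raw data, `statistics.multimode` (Python 3.8+) returns the most frequent value(s). For grouped data, the sketch assumes the common textbook formula Mode = l + (f1 − f0)/(2f1 − f0 − f2) × h; the class table and the name `grouped_mode` are hypothetical.

```python
from statistics import multimode

# Ungrouped data: the value(s) with the highest frequency.
data = [3, 5, 5, 2, 8, 5, 3]
print(multimode(data))  # [5]

def grouped_mode(l, h, f0, f1, f2):
    """Mode of grouped data: l + (f1 - f0) / (2*f1 - f0 - f2) * h.
    l: lower limit of the modal class, h: class width,
    f1: modal-class frequency, f0/f2: frequencies of the classes before/after it."""
    return l + (f1 - f0) / (2 * f1 - f0 - f2) * h

# Hypothetical table: classes 10-20, 20-30, 30-40 with frequencies 5, 12, 7.
print(grouped_mode(l=20, h=10, f0=5, f1=12, f2=7))  # ≈ 25.83
```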
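The arithmetic mean, median, and mode change by the same amount as a change of origin and in the same ratio as a change of scale (the geometric and harmonic means do not follow the change-of-origin rule), i.e. if
$$y = \frac{x - a}{b}, \quad \text{then} \quad C_x = b\,C_y + a$$
where C denotes the measure of central tendency.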
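The change-of-origin-and-scale rule can be verified numerically with Python's `statistics` module. The data and the constants a = 10, b = 2 are arbitrary choices for illustration.

```python
from statistics import mean, median, mode

x = [12, 15, 15, 18, 30]
a, b = 10, 2
y = [(xi - a) / b for xi in x]  # change of origin a and scale b

# C_x = b * C_y + a for the mean, median and mode.
print(mean(x), b * mean(y) + a)
print(median(x), b * median(y) + a)
print(mode(x), b * mode(y) + a)
```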
3. Measures of Dispersion:
The dispersion or scatter in the data is measured based on the observations and the types of the
● Range: Given a data set, the range is the difference between the maximum and minimum values:
$$\text{Range} = H - S$$
where H is the largest value and S is the smallest value in the data set.
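As a one-line Python sketch of the range (the name `data_range` is illustrative and avoids shadowing Python's built-in `range`):

```python
def data_range(values):
    """Range = H - S, the largest value minus the smallest value."""
    return max(values) - min(values)

print(data_range([12, 7, 3, 21, 9]))  # 21 - 3 = 18
```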
● Variance: The average squared deviation from the mean of the given data set is known as the variance. This measure of dispersion checks the spread of the data about the mean. It can be expressed as
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
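● Standard Deviation: The square root of the variance gives the standard deviation. Thus, the standard deviation also measures the variation of the data about the mean. It can be expressed as
$$\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
where xi are the observations, x̄ is their mean, and n is the number of observations.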
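Variance and standard deviation as defined above (dividing by n, i.e. the population versions) can be sketched in Python. The helper names are illustrative; the standard library's `statistics.pvariance` and `statistics.pstdev` compute the same quantities, while `statistics.variance` and `statistics.stdev` divide by n − 1 instead (sample versions).

```python
import math
import statistics

def variance(values):
    """Population variance: mean of squared deviations from the mean."""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / n

def std_dev(values):
    """Population standard deviation: square root of the variance."""
    return math.sqrt(variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data), std_dev(data))                        # 4.0 2.0
print(statistics.pvariance(data), statistics.pstdev(data))  # same values from the standard library
```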
● Mean Deviation: The mean deviation gives the average of the absolute deviations of the data about a central value. This central value could be the mean, median, or mode. It can be expressed as
$$MD = \frac{1}{n}\sum_{i=1}^{n} |x_i - a|$$
where a is the chosen central value.
The mean deviation is least when it is measured about the median.
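A short Python sketch comparing the mean deviation about the mean and about the median for a small skewed data set (made up for illustration). The function name `mean_deviation` is an assumption, not a library call.

```python
from statistics import mean, median

def mean_deviation(values, centre):
    """Average absolute deviation of the values about a chosen central value."""
    return sum(abs(x - centre) for x in values) / len(values)

data = [1, 2, 3, 4, 20]
print(mean_deviation(data, mean(data)))    # about the mean (6): 28 / 5 = 5.6
print(mean_deviation(data, median(data)))  # about the median (3): 21 / 5 = 4.2, smaller
```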
● Quartile Deviation: Quartile deviation can be defined as half of the difference between the third quartile and the first quartile in a given data set. It can be expressed as
$$QD = \frac{Q_3 - Q_1}{2}$$
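A Python sketch of quartile deviation. Quartile conventions vary between textbooks; this sketch assumes the (n + 1)/4 and 3(n + 1)/4 position rule, which is what `statistics.quantiles` uses with its default `method="exclusive"`. The data are made up.

```python
from statistics import quantiles

def quartile_deviation(values):
    """QD = (Q3 - Q1) / 2, with quartiles at the (n+1)/4 and 3(n+1)/4 positions."""
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    return (q3 - q1) / 2

data = [3, 5, 7, 8, 10, 12, 15]
print(quantiles(data, n=4))      # [Q1, Q2, Q3] = [5.0, 8.0, 12.0]
print(quartile_deviation(data))  # (12 - 5) / 2 = 3.5
```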
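All the measures of dispersion above remain unchanged by a change of origin but change with a change of scale, i.e. if
$$y = \frac{x - a}{b}, \quad \text{then} \quad D_x = |b|\,D_y$$
for the range, standard deviation, mean deviation, and quartile deviation, while the variance changes by the square of the scale factor:
$$\sigma_x^2 = b^2\,\sigma_y^2$$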
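A numerical check of the scale rule for dispersion, as a Python sketch with arbitrary data and constants. A negative b is used on purpose to show why the factor is |b| (dispersion measures are never negative), and why the variance picks up b².

```python
from statistics import pstdev, pvariance

x = [12, 15, 15, 18, 30]
a, b = 10, -2
y = [(xi - a) / b for xi in x]

# Standard deviation and range scale by |b| and ignore the shift a.
print(pstdev(x), abs(b) * pstdev(y))
print(max(x) - min(x), abs(b) * (max(y) - min(y)))
# Variance scales by b squared.
print(pvariance(x), b ** 2 * pvariance(y))
```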
4. Probability:
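Probability can be defined as the ratio of the number of favourable outcomes to the total number of outcomes of an event, when all outcomes are equally likely:
$$P(E) = \frac{\text{Number of favourable outcomes}}{\text{Total number of outcomes}}$$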
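The ratio definition can be sketched in Python by listing a sample space and counting favourable outcomes. The sketch assumes two fair dice (so all 36 ordered outcomes are equally likely) and uses `Fraction` to keep the answer exact; the helper name `probability` is illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair dice: all 36 ordered pairs.
sample_space = list(product(range(1, 7), repeat=2))

def probability(event, space):
    """P(E) = favourable outcomes / total outcomes (equally likely outcomes assumed)."""
    favourable = [outcome for outcome in space if event(outcome)]
    return Fraction(len(favourable), len(space))

p_sum_4 = probability(lambda o: sum(o) == 4, sample_space)
print(p_sum_4)  # 3/36 reduces to 1/12
```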
The following terms in probability theory help in a better understanding of the concepts of probability.
❖ Favourable Outcome: An outcome that produces the desired or expected result is called a favourable outcome. For example, when we roll two dice, the favourable outcomes for getting a sum of 4 on the two dice are (1,3), (2,2), and (3,1).
❖ Trial: A trial denotes doing a random experiment.
❖ Random Experiment: An experiment that has a well-defined set of possible outcomes, but whose result cannot be predicted with certainty, is called a random experiment. For example, when we toss a coin, we know that we will get a head or a tail, but we are not sure which one will appear.
❖ Event: A set of outcomes of a random experiment (a subset of the sample space) is called an event.
❖ Equally Likely Events: Events that have the same chance, or probability, of occurring are called equally likely events; neither is favoured over the other. For example, when we toss a fair coin, there are equal chances of getting a head or a tail.
❖ Exhaustive Events: A set of events is exhaustive when together they cover the entire sample space, so that at least one of them must occur.
❖ Mutually Exclusive Events: Events that cannot happen simultaneously are called mutually exclusive events. For example, the weather at a given moment can be either hot or cold; we cannot experience both at the same time.
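NOTE: Rules for Probability:
● The maximum probability of an event is 1, which is the probability of the sample space (the sample space is the set of all possible outcomes).
● If A and B are mutually exclusive events (they cannot occur at the same time), then the probability of A or B occurring is the probability of A plus the probability of B:
$$P(A \cup B) = P(A) + P(B)$$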
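Both rules can be checked on the two-dice sample space with a short Python sketch. The events "sum is 4" and "sum is 5" are chosen for illustration because they cannot occur together; all helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability over the 36 equally likely outcomes of two dice."""
    return Fraction(sum(1 for o in space if event(o)), len(space))

def sum_is_4(o):
    return sum(o) == 4

def sum_is_5(o):
    return sum(o) == 5

print(prob(lambda o: True))  # probability of the whole sample space: 1

# The two events never happen together, so they are mutually exclusive.
assert prob(lambda o: sum_is_4(o) and sum_is_5(o)) == 0
both = prob(lambda o: sum_is_4(o) or sum_is_5(o))
print(both, prob(sum_is_4) + prob(sum_is_5))  # 7/36 7/36
```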
SIS 2018 Q14, 16
SIS 2019 Q28,30
SIS 2020 Q15, 26, 34
SIS 2021 Q48
DSE 2001 Q30
2021 Q14, 47, 48
2022 Q46
CUET 2019 Q63, 96
2020 Q72
2021 Q77,18
2022 Q21
Hyderabad 2017 Q8
2021 Q21