Statistic Interview Questions and Answers by Jeevan Raj
1. Introduction to Statistics
Statistics is a branch of mathematics that deals with the collection, analysis,
interpretation, and presentation of data.
One can derive meaningful insights and trends from data by applying various statistical
methods to it.
2. Why Statistics
Statistics is important for making investments, predictions, and decisions.
It is a powerful tool in the practice of data science.
It is used to solve complex real-world problems.
3. What is Population & Sample
Population: The population is the set of all objects or individuals considered for a study.
It may contain a finite or infinite number of elements.
Sample: A sample is a subset of the population that is used as a representative of
the population.
# Before understanding sampling, we need to understand biased samples
Biased Samples:- Biased samples occur when one or more parts of the population
are favored over others.
◼ There are 2 types of biased samples:
1. Convenience Sample
2. Voluntary Response Sample
Convenience Sample:- Includes only people who are easy to reach. This is a biased
way of conducting a survey: not everyone in the population has an equal
chance of being part of the sample, because only people who are convenient for the
researcher are interviewed.
Voluntary Response Sample:- A voluntary response sample consists of people who
have chosen to include themselves. In this sample the researcher lets people come
to them, so this is also a biased sampling method.
An example of a simple random sample would be the names of 25 employees being chosen out
of 250 employees. In this case, the population is all 250 employees, and the sample is random
because each employee has an equal chance of being chosen.
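The 25-out-of-250 example above can be sketched in Python as follows. This is a minimal illustration, not a prescribed method: the integer employee IDs and the fixed seed are assumptions made only so the example is reproducible.

```python
import random

# Hypothetical employee IDs 1..250 standing in for the 250 employees.
employees = list(range(1, 251))

rng = random.Random(42)  # fixed seed (an assumption) so the draw is reproducible

# random.sample draws without replacement, so every employee has the same
# chance of being chosen and nobody is chosen twice.
sample = rng.sample(employees, 25)

print(sorted(sample))
```

A convenience sample, by contrast, would look like `employees[:25]`: only the first 25 people the researcher happens to reach.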
Statistical Analysis
Mean:-
• It is one of the measures of central tendency. It is the most commonly used
measure
• Defined as the sum of all the observations divided by the total number of
observations
Merits:-
• It is based on all the observations
Demerits:-
• It cannot be calculated for categorical data
• It is affected by extreme observations (outliers)
MEDIAN:-
• It is the middlemost observation when the data is arranged in ascending
or descending order
• If the number of observations in the data is even, the median is the average of
the two middlemost values
• If the number of observations in the data is odd, the median is the
middlemost observation
• The median divides the data into two equal parts. i.e., there will be an equal
number of observations below and above the median
Merits:
• It is not affected by the presence of outliers
Demerits:
• It is not based on all observations; thus, it is not a very good representative of the data
Mode:-
• It is the observation in the data with the highest frequency, i.e. the most
frequently occurring observation
• The data can have more than one mode
• A group of observations with one mode is called unimodal
• A group of observations with two modes is called bimodal
• A group of observations with more than two modes is called
multimodal
Merits:
• It is not affected by the presence of outliers
• It can be calculated for numerical as well as categorical data
Demerits:
• It is not based on all observations; thus, it is not a very good representative of the data
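The merits and demerits of the mean, median, and mode listed above can be checked with a short sketch using Python's standard `statistics` module. The numbers and color labels are made-up illustration data, not taken from the notes.

```python
import statistics

data = [2, 3, 3, 5, 7]
with_outlier = [2, 3, 3, 5, 70]  # same data with the last value replaced by an outlier

# The mean uses every observation, so the outlier pulls it upward.
mean_plain = statistics.mean(data)
mean_outlier = statistics.mean(with_outlier)

# The median is the middlemost value, so the outlier leaves it unchanged.
median_plain = statistics.median(data)
median_outlier = statistics.median(with_outlier)

# With an even number of observations, the median averages the two middle values.
median_even = statistics.median([1, 2, 3, 4])

# multimode returns every value that shares the highest frequency,
# and it also works on categorical data.
modes_numeric = statistics.multimode(data)
modes_categorical = statistics.multimode(["red", "blue", "red", "green", "blue"])

print(mean_plain, mean_outlier)
print(median_plain, median_outlier, median_even)
print(modes_numeric, modes_categorical)
```

Here the categorical list is bimodal ("red" and "blue" both occur twice), while the numeric list is unimodal.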
Quantiles:-
• These are the values that divide the dataset into equal parts
• Quantiles are also called ‘partition values’
• The median divides the dataset into two equal parts
• Quartiles divide the dataset into four equal parts, deciles divide the dataset
into 10 equal parts, and percentiles divide the dataset into 100 equal parts
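As a rough illustration of the partition values above, the sketch below uses `statistics.quantiles` from the Python standard library on made-up data (the integers 1 to 11). Note that several conventions exist for interpolating quantiles; this uses the function's default ("exclusive") method, so other tools may return slightly different cut points.

```python
import statistics

data = list(range(1, 12))  # made-up data: 1, 2, ..., 11

# n=4 returns the 3 cut points that split the data into 4 equal parts (quartiles).
quartiles = statistics.quantiles(data, n=4)

# n=10 returns the 9 cut points for deciles; n=100 returns 99 for percentiles.
deciles = statistics.quantiles(data, n=10)
percentiles = statistics.quantiles(data, n=100)

print(quartiles)               # the middle cut point equals the median
print(statistics.median(data))
print(len(deciles), len(percentiles))
```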
Quartiles:-
Deciles:-
Measure of Dispersion:-
● A measure of dispersion summarizes the variation in the data points
● The smaller the dispersion of a set of observations, the more reliable the
measures of central tendency are
● Range, variance, and standard deviation are some of the measures of dispersion
Range
● It is defined as the difference between the largest and smallest observation
i.e. Range = Xn - X1
Where Xn is the largest value and X1 is the smallest value in the dataset
● It depends only on the two extreme observations in the data
● It does not provide a proper idea about the variation in the data
7. What is Variance?
● It measures the spread of the observations around their mean
● It is defined as the arithmetic mean of the squared deviations from
the mean. It is given by the formula:
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$$
Where,
σ² = Variance
Xi = The i-th observation
X̄ = Mean of the observations
n = Total number of observations
Note:-
The higher the variance, the more the data is spread out.
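The measures of dispersion described above (range, variance, standard deviation) can be computed as in the following sketch. The data values are made up for illustration, and the variance here is the population form, i.e. the mean of squared deviations from the mean, matching the definition in the text.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations; their mean is 5

# Range: largest observation minus smallest observation.
data_range = max(data) - min(data)

# Variance computed by hand as the mean of squared deviations from the mean.
mean = sum(data) / len(data)
variance_by_hand = sum((x - mean) ** 2 for x in data) / len(data)

# The same quantities from the standard library (population versions).
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)  # square root of the variance

print(data_range, variance_by_hand, variance, std_dev)
```

If the data are a sample rather than the whole population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead of n.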
Sk = 0: the distribution is symmetric (Sk denotes the coefficient of skewness).
Where,
X̄, Ȳ = Means of X and Y, respectively
Xi, Yi = Elements of X and Y
n = Number of observations
• The normal distribution is the proper term for the bell-shaped probability curve.
• In a standard normal distribution the mean is 0 and the standard deviation
is 1
• It has zero skewness and a kurtosis of 3
• Normal distributions are symmetric, but not all symmetric distributions are
normal.
• The normal distribution has 2 parameters:-
(i) Mean
(ii) Standard Deviation
For the normal distribution, the empirical rule states that about 68% of observations
lie within 1 standard deviation of the mean, about 95% within 2 standard deviations, and
about 99.7% within 3 standard deviations.
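The 68 / 95 / 99.7 figures can be verified numerically. The sketch below relies on the standard identity that, for a normal distribution, the probability of falling within k standard deviations of the mean is erf(k / √2); the helper name `share_within` is hypothetical.

```python
import math

def share_within(k):
    """Probability that a normal observation lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# prints:
# 1 0.6827
# 2 0.9545
# 3 0.9973
```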
T-Test vs Z-Test
T-Test: A t-test is a type of parametric test that is applied to identify how the
means of 2 sets of data differ from one another when the variance is not given
(unknown).
Z-Test: A z-test is a hypothesis test used to determine whether the means of
2 data sets differ from each other when the variance is given (known).
Permutation vs Combination
Permutation: A permutation is one of the different ways of arranging a set of objects
in a sequential order.
Combination: A combination is one of the different ways of choosing or selecting items
from a larger set of objects, such that their order doesn't matter.
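The difference between the two counts can be seen in a small sketch using Python's standard library (`math.perm` and `math.comb` require Python 3.8 or later). The three letters are a made-up set of objects.

```python
import math
from itertools import combinations, permutations

items = ["A", "B", "C"]  # made-up set of objects

# Ordered arrangements of 2 items: ("A", "B") and ("B", "A") count separately.
ordered = list(permutations(items, 2))

# Unordered selections of 2 items: {"A", "B"} is counted once.
unordered = list(combinations(items, 2))

print(len(ordered), math.perm(3, 2))    # 6 6
print(len(unordered), math.comb(3, 2))  # 3 3
```

For the same n and r, the number of permutations is never smaller than the number of combinations, because each selection of r items can be arranged in r! orders.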
7. If you have skewed data, how do you make it normally distributed?
We can bring skewed data closer to a normal distribution by using data
transformation techniques such as the log transformation, square root transformation,
etc.
Note:-
If the data is right-skewed, use a square root, cube root, or log transformation.
If the data is left-skewed, use a square or exponential transformation.
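A minimal sketch of the idea for right-skewed data: compute a skewness coefficient before and after transforming. The data are made up, and the helper `skewness` is a hypothetical function using the moment-based (population) formula m3 / m2^1.5; other skewness definitions exist. The log transformation assumes all values are positive.

```python
import math

def skewness(values):
    """Moment-based skewness: third central moment divided by variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up right-skewed data: most values are small, a few are very large.
raw = [1, 2, 2, 3, 3, 3, 4, 5, 8, 20, 50]

rooted = [math.sqrt(v) for v in raw]  # milder correction
logged = [math.log(v) for v in raw]   # stronger correction

print(skewness(raw), skewness(rooted), skewness(logged))
```

On this data the skewness shrinks with each step (raw, then square root, then log), although it does not reach exactly zero; a transformation reduces skew rather than guaranteeing normality.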
The Central Limit Theorem states that if we have a population with mean (μ)
and finite standard deviation (σ), and we repeatedly take sufficiently large random
samples of independent, identically distributed (i.i.d.) observations from it with replacement,
then the distribution of the sample means will be approximately normal,
whatever the shape of the population distribution. The population itself does not need to be normal.
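The theorem can be illustrated with a small simulation. This is only a sketch under stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (clearly not normal), the sample size of 50, the 2000 repetitions, and the seed are arbitrary choices, and `sample_mean` is a hypothetical helper.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed (an assumption) for reproducibility

def sample_mean(size):
    """Mean of `size` independent draws from an exponential population (mean 1, sd 1)."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(size)])

sample_size = 50
sample_means = [sample_mean(sample_size) for _ in range(2000)]

center = statistics.fmean(sample_means)  # should be close to the population mean, 1
spread = statistics.stdev(sample_means)  # should be close to 1 / sqrt(50)

# For a roughly normal distribution, about 68% of values lie within 1 sd of the center.
share_within_1sd = sum(abs(m - center) <= spread for m in sample_means) / len(sample_means)

print(center, spread, share_within_1sd)
```

Even though individual exponential draws are strongly right-skewed, the sample means cluster symmetrically around the population mean, with a spread of roughly σ / √n.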
To illustrate with a coin flip: the Bernoulli distribution gives you the
probability of “success” (say, landing on heads) when flipping the coin just once
(that’s your Bernoulli trial). If you flip the coin five times, the binomial distribution
gives the probability of each possible number of successes (heads) across all five
flips.
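The coin-flip example can be worked out directly from the binomial formula P(X = k) = C(n, k) · p^k · (1 − p)^(n − k). The sketch below assumes a fair coin (p = 0.5); the function name `binomial_pmf` is hypothetical.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

p_heads = 0.5  # assumption: a fair coin

# One flip (a Bernoulli trial) is the special case n = 1.
single_flip = binomial_pmf(1, 1, p_heads)

# Five flips: probability of 0, 1, 2, 3, 4, or 5 heads.
five_flips = [binomial_pmf(k, 5, p_heads) for k in range(6)]

print(single_flip)
print(five_flips)
print(sum(five_flips))  # the probabilities of all possible outcomes add up to 1
```

With a fair coin, 2 or 3 heads are the most likely outcomes (10/32 each), while 0 or 5 heads are the least likely (1/32 each).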