Statistical Interview Questions By Jeevan Raj

1. Introduction to Statistics
Statistics is a branch of mathematics that deals with the collection, analysis,
interpretation and presentation of data.
One can derive meaningful insights and trends from data by applying various statistical
methods to it.
2. Why Statistics
Statistics is important for making investments, predictions and decisions.
It is a powerful tool in data science work.
It is used to solve complex real-world problems.
3. What is Population & Sample
Population: The population is the set of all objects or individuals considered for a study.
It may contain a finite or infinite number of elements.
Sample: A sample is a subset of the population that is used as a representative of
the population.
# Before understanding sampling, we need to understand what biased samples are
Biased Samples:- Biased samples occur when one or more parts of the population
are favored over others.
◼ There are 2 types of Biased Samples
1. Convenience Sample
2. Voluntary Response Sample
Convenience Sample:- Includes only people who are easy to reach. This is a
biased way of conducting a survey: not everyone in the population has an equal
chance of being part of the sample, because only people who are convenient for the
researcher are interviewed.
Voluntary Response Sample:- A voluntary response sample consists of people who
have chosen to include themselves. Here the researcher lets people come
to him or her, so this is also a biased sampling method.

So, to overcome this bias in sampling, different sampling techniques are used.
--- There are mainly 2 types of sampling techniques
◼ Probability Sampling
◼ Non-Probability Sampling
Probability Sampling:- Probability sampling is based on random selection: every
member of the population has a known, non-zero chance of being selected (an equal
chance in simple random sampling).
Non-Probability Sampling:- Non-probability sampling involves non-random
selection based on convenience, and not every individual has a chance of being
included.
Probability Sampling has 4 types:-
1. Simple Random Sampling
2. Systematic Sampling
3. Cluster Sampling
4. Stratified Sampling
Simple Random Sampling:- A probability sample in which every member of a
study population has an equal chance of selection.

An example of a simple random sample would be the names of 25 employees being chosen out
of 250 employees. In this case, the population is all 250 employees, and the sample is random
because each employee has an equal chance of being chosen.

Systematic Sampling:- In systematic sampling, the first element is selected
randomly from a list or sequence, and then every nth element is selected.

Cluster Sampling:- Cluster sampling involves randomly selecting clusters of
elements from the population and then including every element in each
selected cluster.
Stratified Sampling:- Stratified sampling is a procedure that involves dividing the
population into groups, or strata, defined by the presence of certain characteristics and
then taking a random sample from each stratum.

The difference between cluster sampling and stratified sampling is that a
stratified sample includes elements from every stratum, whereas a cluster
sample includes elements only from the randomly selected clusters.
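
The four probability sampling techniques above can be sketched in a few lines of Python. This is only an illustration: the 250 numbered employees echo the earlier example, while the five departments, the function names and the fixed random seed are assumptions made for the sketch.

```python
import random

def simple_random_sample(population, k, rng):
    """Every member has an equal chance of selection."""
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    """Random start within the first `step` items, then every `step`-th item."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, k_per_stratum, rng):
    """Draw a random sample of k items from every stratum."""
    return {name: rng.sample(members, k_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every element in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(42)              # fixed seed so the run is repeatable
employees = list(range(1, 251))      # 250 employees, identified by number
# Hypothetical grouping: 5 departments of 50 employees each
departments = {f"dept_{i}": employees[i * 50:(i + 1) * 50] for i in range(5)}

srs = simple_random_sample(employees, 25, rng)
systematic = systematic_sample(employees, 10, rng)
stratified = stratified_sample(departments, 5, rng)
clustered = cluster_sample(departments, 2, rng)

print(len(srs), len(systematic), len(clustered))
```

Note how the stratified sample touches all five departments, while the cluster sample contains every employee of only the two departments that were drawn.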

4. Random Variable (x):-
A random variable is a variable whose possible values are the outcomes of a random
experiment
A random variable is generally denoted by x
There are 2 types of random variable
1. Discrete Random Variable
2. Continuous Random Variable
5. Discrete Random Variable [Countable]
It is a variable that takes a finite or countably infinite number of distinct values
Ex:-
i. No. of pages in a book
ii. No. of tyres produced in a factory on a particular day
iii. No. of vehicles passing through a security gate in an hour
6. Continuous Random Variable [Uncountable]
It is a variable that can take any of the infinitely many values within a range
Ex:-
i. Height of a person
ii. Weight of a person
7. Types of Analysis
There are 2 types of analysis
i. Descriptive Analysis
ii. Inferential Analysis

Statistical Analysis

Descriptive Analysis                              Inferential Analysis
1. Used to describe the sample or summarize       1. Used to draw conclusions and make
   the insights about the sample                     generalizations about the population
2. It is also called descriptive statistics       2. It is also called inferential statistics
3. Types of descriptive analysis measures are:    3. Types of inferential analysis:
   (i) Measure of Central Tendency                   (i) Linear Regression Analysis
   (ii) Measure of Dispersion                        (ii) Analysis of Variance
   (iii) Skewness & Kurtosis                         (iii) Confidence Interval
   (iv) Covariance & Correlation

8. Measure of Central Tendency:-
The central position of the data is described by the measures of central
tendency
The common measures are
i. Mean
ii. Median
iii. Mode
iv. Quantiles (strictly, measures of position)
[Figure: Mean, Median and Mode]

Mean:-
• It is one of the measures of central tendency, and the most commonly used
one
• Defined as the sum of all the observations divided by the total number of
observations:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

where N is the total number of observations and $x_i$ is the ith observation

Merits:-
• It is based on all the observations
Demerits:-
• It cannot be calculated for categorical data
• It is affected by extreme observations (outliers)
Median:-
• It is the middlemost observation when the data is arranged in the ascending
or descending order
• If the number of observations in the data is even, the median is average of
two middlemost values
• If the number of observations in the data is odd, the median is the
middlemost observation
• The median divides the data into two equal parts. i.e., there will be an equal
number of observations below and above the median
Merits:
• It is not affected by the presence of outliers
Demerits:
• It is not based on all observations; thus, it is not a very good representative

Mode:-
• It is the observation in the data with highest frequency. i.e. the most
occurring observation
• The data can have more than one mode
• A group of observations with one mode is called unimodal
• A group of observations with two modes is called bimodal
• A group of observations with more than two modes is called
multimodal
Merits:
• It is not affected by the presence of outliers
• It can be calculated for numerical as well as categorical data
Demerits:
• It is not based on all observations; thus, it is not a very good representative
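
A small sketch using Python's standard `statistics` module shows the three measures side by side. The dataset is made up for illustration and deliberately contains one outlier (40) to show the merit and demerit noted above.

```python
import statistics

# Hypothetical observations; 40 is an outlier
data = [2, 3, 3, 5, 7, 10, 40]

mean = statistics.mean(data)        # sum of observations / number of observations
median = statistics.median(data)    # middlemost value of the sorted data
modes = statistics.multimode(data)  # every value with the highest frequency

print(mean, median, modes)

# The mode also works for categorical data, where the mean cannot be computed
colour_mode = statistics.mode(["red", "blue", "red", "green"])
print(colour_mode)
```

The outlier pulls the mean up to 10, while the median stays at 5 and the mode at 3.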

Quantiles:-
• These are the values that divide the dataset into equal parts
• Quantiles are also called ‘partition values’
• The median divides the dataset into two equal parts
• Quartiles divide the dataset into four equal parts, deciles divide the dataset
into 10 equal parts and percentiles divide the dataset into 100 equal parts
Quartiles:-

• Quartiles divide the dataset into four equal parts


• The first quartile (Q1) divides the data such that, 25% of the data is below
Q1 and 75% above it
• The second quartile (Q2) is the median as it divides the data in two equal
halves
• The third quartile (Q3) divides the data such that, 75% of the data lies below
Q3 and 25% above it

Deciles:-

● Deciles divide the dataset into ten equal parts
● The first decile (D1) divides the data such that 10% of the data is below D1
and 90% above it
● The fifth decile (D5) divides the data into two equal parts;
i.e. D5 = median = Q2
Percentiles:-
● Percentiles divide the dataset into 100 equal parts
● The first percentile (P1) divides the data such that, 1% of the data is below
P1 and 99% above it
● The 25th percentile (P25) divides the data such that, 25% of the data is below
P25 and 75% above it; this is equal to the first quartile (Q1)
● The 50th percentile (P50) divides the data into two equal parts;
i.e. P50 = D5 = Q2 = Median
● The 75th percentile (P75) divides the data such that, 75% of the data is below
P75 and 25% above it; this is equal to the third quartile (Q3)
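
The relations between quartiles, deciles and percentiles can be checked with `statistics.quantiles`. The data below is invented, and the `method="inclusive"` interpolation rule is an assumption of this sketch; other quantile conventions can give slightly different cut points on small datasets.

```python
import statistics

# Hypothetical dataset of 9 sorted observations
data = [2, 4, 4, 5, 7, 8, 9, 11, 15]

quartiles = statistics.quantiles(data, n=4, method="inclusive")      # Q1, Q2, Q3
deciles = statistics.quantiles(data, n=10, method="inclusive")       # D1 .. D9
percentiles = statistics.quantiles(data, n=100, method="inclusive")  # P1 .. P99

q1, q2, q3 = quartiles
d5 = deciles[4]         # fifth decile
p25 = percentiles[24]   # 25th percentile
p50 = percentiles[49]   # 50th percentile
p75 = percentiles[74]   # 75th percentile

print(q1, q2, q3)
print(q2 == d5 == p50 == statistics.median(data))
```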

Measure of Dispersion:-
● It is a measure that summarizes the variation in the data points
● The smaller the dispersion of a set of observations, the more reliably the
measures of central tendency represent the data
● Range, variance and standard deviation are some of the measures of dispersion

Range
● It is defined as the difference between the largest and smallest observation
i.e. Range = Xn - X1
Where Xn is the largest value and X1 is the smallest value in the dataset
● It depends only on the two extreme observations in the data
● Because it ignores every other observation, it gives only a rough idea of the variation in the data
7. What is Variance ?
● It measures the spread of the observations from their mean
● It is defined as the arithmetic mean of the squared deviations from
the mean. It is given by the formula:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2$$

Where,
$\sigma^2$ = Variance
$x_i$ = The value of the ith element
$\bar{x}$ = Mean of all the observations
N = Total number of observations

Note:-
The higher the variance, the more the data is spread out.

8. What is Standard Deviation ?
It is obtained by taking the positive square root of the variance
Note:- The standard deviation tells how far the observations are spread from the
mean, in the same units as the data.

9. What is Interquartile Range (IQR)
It is defined as the difference between the third and the first quartile
It is given as:

$$IQR = Q_3 - Q_1$$

The IQR gives the range of the middle 50% of the data. It is used to detect outliers in
the data.
Usually we consider an observation to be an outlier if it is outside the interval
(Q1 - 1.5*IQR, Q3 + 1.5*IQR)
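
The measures of dispersion above can be computed together in one short sketch. The data is hypothetical, the variance is the population form (dividing by N), and the quartiles use the `method="inclusive"` rule of `statistics.quantiles`, which is an assumption; other quartile conventions may move the outlier fences slightly.

```python
import statistics

# Hypothetical observations
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)       # largest minus smallest value
variance = statistics.pvariance(data)    # mean of squared deviations (divides by N)
std_dev = statistics.pstdev(data)        # positive square root of the variance

q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                            # spread of the middle 50% of the data

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(data_range, variance, std_dev, iqr, outliers)
```

With this data the fences are 1.75 and 7.75, so the value 9 is flagged as an outlier.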

10. What is Skewness


● It is defined as the lack of symmetry or departure from the symmetry
● The distribution is said to be skewed if it is elongated on either side
● The distribution of the data is right-skewed if it is elongated on the right side
● The distribution of the data is left-skewed if it is elongated on the left side

Coefficient of skewness    Interpretation
Sk < 0                     Distribution is negatively skewed (left-skewed).
Sk = 0                     Distribution is symmetric.
Sk > 0                     Distribution is positively skewed (right-skewed).
11. What is Kurtosis
● It is the measure of the tailedness or peakedness of the distribution
● It is a measure that describes how the tails of the distribution differ from those of the
normal distribution
● The values below are excess kurtosis (kurtosis minus 3), so the normal distribution scores 0

Value           Thickness of Tails    Interpretation
Kurtosis < 0    Thin                  Distribution is platykurtic
Kurtosis = 0    Normal                Distribution is mesokurtic
Kurtosis > 0    Thick                 Distribution is leptokurtic
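
As a rough illustration, skewness and excess kurtosis can be computed from central moments. The moment-based formulas used here (third moment over the 1.5 power of the second, and fourth moment over the square of the second, minus 3) are one common convention and an assumption of this sketch; statistical packages may apply small-sample corrections. The datasets are made up.

```python
def central_moment(data, order):
    """Mean of the deviations from the mean raised to the given power."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** order for x in data) / len(data)

def skewness(data):
    """Moment coefficient of skewness: m3 / m2^1.5."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Kurtosis minus 3, so a normal distribution scores 0."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]
left_skewed = [-10, -3, -2, -2, -1, -1]

print(skewness(symmetric), skewness(right_skewed), skewness(left_skewed))
print(excess_kurtosis(symmetric))
```

The flat, evenly spaced `symmetric` data has thin tails, so its excess kurtosis comes out negative (platykurtic).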


12. What is Covariance
It measures how two variables vary together
It is calculated as:

$$\mathrm{Cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})$$

Where,
$\bar{X}$, $\bar{Y}$ = Mean of X and Y resp.
Xi, Yi = Elements in X and Y
n = Number of observations
(The sample covariance divides by n - 1 instead of n.)

● The magnitude of the covariance is not easy to interpret as it is not
normalized; the normalized version of covariance, known as the
‘correlation coefficient’, is often calculated instead
● The covariance only provides the direction of the relationship

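13. What is Correlation ?
It measures the degree to which two variables increase or decrease together
It is the normalized form of covariance. It ranges from -1 to +1
It is calculated as:

$$r = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \, \sigma_Y}$$

where $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y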
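
A minimal sketch of both quantities, written out by hand so each step is visible. It assumes the population form (dividing by n) for both the covariance and the standard deviations; the two small datasets are invented.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Population covariance: average product of paired deviations."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    """Covariance normalized by both standard deviations; lies in [-1, +1]."""
    sx = math.sqrt(covariance(xs, xs))   # covariance of X with itself = variance
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)

xs = [1, 2, 3, 4, 5]
ys_up = [2, 4, 6, 8, 10]     # rises with xs
ys_down = [10, 8, 6, 4, 2]   # falls as xs rises

print(covariance(xs, ys_up), correlation(xs, ys_up))
print(covariance(xs, ys_down), correlation(xs, ys_down))
```

The covariance only signals the direction (+4 and -4 here), while the correlation shows that both relationships are perfectly linear (close to +1 and -1).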

--------------------------------- Basics are clear ----------------------------------------


Very important interview questions
1. What is Normal Distribution?
The normal distribution, also known as the Gaussian distribution, is a probability
distribution that is symmetric about the mean, showing that data near the mean are
more frequent in occurrence than data far from the mean. In graphical form, the
normal distribution appears as a bell curve

• The normal distribution is the proper term for a probability bell curve.
• In a standard normal distribution the mean is 0 and the standard deviation
is 1
• It has zero skewness and a kurtosis of 3 (that is, an excess kurtosis of 0)
• Normal distributions are symmetric, but not all symmetric distributions are
normal distributions.
• The normal distribution has 2 parameters:-
(i) Mean
(ii) Standard Deviation
For the normal distribution, the empirical rule states that about 68% of observations
lie within 1 standard deviation of the mean, 95% within 2 standard deviations and
99.7% within 3 standard deviations
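
The 68-95-99.7 figures can be verified from the cumulative distribution function of the standard normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `share_within` is made up for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Probability that a value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```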

2. When to use Z-Test & T-Test

Conditions for using T-Test
1. Population standard deviation (σ) is unknown
2. Sample size (n) is less than 30
Use the Z-Test when σ is known and the sample size is large.

3. Difference Between T-Test & Z-Test

T-Test                                            Z-Test
A type of parametric test applied to identify     A hypothesis test used to determine whether
how the means of 2 sets of data differ from       the means of 2 data sets differ from each
one another when the variance is not given        other when the variance is given
Based on Student's t-distribution                 Based on the normal distribution
Population variance is unknown                    Population variance is known
Sample size is small (n < 30)                     Sample size is large (n >= 30)
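
To make the contrast concrete, here is a sketch of the one-sample versions of both test statistics. The only difference is which standard deviation goes in the denominator: the known population σ for the z statistic, or the sample standard deviation (with n - 1 degrees of freedom) for the t statistic. The sample, the hypothesized mean and the value of σ are invented for illustration.

```python
import math
import statistics

def z_statistic(sample, mu0, sigma):
    """One-sample z statistic; sigma is the KNOWN population standard deviation."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))

def t_statistic(sample, mu0):
    """One-sample t statistic; sigma is unknown, so the sample sd is used."""
    n = len(sample)
    s = statistics.stdev(sample)          # sample sd, divides by n - 1
    t = (statistics.mean(sample) - mu0) / (s / math.sqrt(n))
    return t, n - 1                       # statistic and degrees of freedom

sample = [12, 14, 16, 18, 20]   # hypothetical small sample, mean = 16
z = z_statistic(sample, mu0=14, sigma=2)
t, dof = t_statistic(sample, mu0=14)

print(round(z, 4), round(t, 4), dof)
```

The z value is compared against the normal distribution and the t value against Student's t-distribution with `dof` degrees of freedom.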
4. What is the Difference Between Permutation and Combination?

Permutation                                       Combination
The different ways of arranging a set of          The different ways of choosing or selecting
objects in a sequential order, so the order       items from a larger set of objects, such that
matters                                           their order doesn't matter
nPr = n!/(n-r)!                                   nCr = n!/[r! (n-r)!]
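
Both formulas are available in Python's `math` module (Python 3.8 or later) as `math.perm` and `math.comb`. The sketch below checks them against the factorial formulas for an arbitrary choice of n = 5 and r = 2.

```python
import math

n, r = 5, 2

n_p_r = math.perm(n, r)   # ordered arrangements of r items out of n
n_c_r = math.comb(n, r)   # unordered selections of r items out of n

# The same values from the factorial formulas above
perm_by_formula = math.factorial(n) // math.factorial(n - r)
comb_by_formula = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(n_p_r, n_c_r)
```

Each combination of 2 items can be arranged in 2! = 2 orders, which is why there are twice as many permutations as combinations here.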

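5. What is Confidence Interval?

A confidence interval is a range of values, computed from sample data, that is
expected to contain the true value of a population parameter at a specified
confidence level. For example, about 95% of the 95% confidence intervals built
from repeated samples would contain the parameter.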
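
As an illustration, a confidence interval for a population mean can be sketched as the sample mean plus or minus a margin of error. This version assumes the population standard deviation is known, so it uses the normal distribution; the function name and the numbers (mean 100, σ = 15, n = 25) are hypothetical.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sigma, n, level=0.95):
    """CI for the mean with KNOWN sigma: mean +/- z * sigma / sqrt(n)."""
    # Critical value that leaves (1 - level) / 2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = mean_confidence_interval(sample_mean=100, sigma=15, n=25)
print(round(low, 2), round(high, 2))
```

When σ is unknown and the sample is small, the critical value would come from Student's t-distribution instead.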

6. What is P-Value & Mention its Significance?

• The p-value is a probability: it measures the probability of
obtaining a result at least as extreme as the one observed, assuming that the null hypothesis is true.
• The lower the p-value, the greater the statistical significance of the
observed difference
• The p-value helps you decide whether to reject the null hypothesis in favor of the alternative
hypothesis
• In general, if the p-value is < 0.05 (a 5% level of significance),
we reject the null hypothesis. If it is >= 0.05,
we fail to reject the null hypothesis
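
A short sketch of the decision rule, assuming a two-sided z-test so the p-value can be read from the standard normal distribution. The function names and the z values are made up for the example.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability of a z statistic at least this extreme if H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def decide(p_value, alpha=0.05):
    """Compare the p-value with the level of significance."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

for z in (1.0, 1.96, 2.5):
    p = two_sided_p_value(z)
    print(z, round(p, 4), decide(p))
```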

7. If you have skewed data, how do you make it normally distributed?
We can bring skewed data closer to a normal distribution by using data
transformation techniques such as the log transformation, square root transformation,
etc.
Note:-
If the data is right-skewed, we use a square root, cube root or log transformation.
If the data is left-skewed, we use a square or exponential transformation.
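
The effect of a log transformation on right-skewed data can be seen in a small sketch. The data is a deliberately extreme, made-up example (values that double each time), chosen so that the logarithm makes it perfectly symmetric; real data will usually only move closer to symmetry. The moment-based `skewness` helper is an assumption of the sketch.

```python
import math

def skewness(data):
    """Moment coefficient of skewness: m3 / m2^1.5."""
    mean = sum(data) / len(data)
    m2 = sum((x - mean) ** 2 for x in data) / len(data)
    m3 = sum((x - mean) ** 3 for x in data) / len(data)
    return m3 / m2 ** 1.5

# Hypothetical right-skewed data: a long tail of large values
data = [1, 2, 4, 8, 16, 32, 64]
log_data = [math.log2(x) for x in data]   # becomes 0, 1, 2, ..., 6

print(round(skewness(data), 3), round(skewness(log_data), 3))
```

The log transformation only works for positive values; data containing zeros or negatives needs a shift or a different transformation.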

8. What is Central Limit Theorem?

The Central Limit Theorem states that if we have a population with mean (μ)
and standard deviation (σ), whatever the shape of its distribution, and we take
sufficiently large random samples from it with replacement (so the observations are
independent and identically distributed), then the distribution of the sample means
will be approximately normal, with mean μ and standard deviation σ/√n, where n is
the sample size

9. What General Conditions must be satisfied for the central limit theorem to hold?
1. The data must be sampled randomly
2. The sample values must be independent of each other
3. The sample size must be sufficiently large. Generally it should be greater than or
equal to 30
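
The theorem can be checked with a small simulation. This sketch draws samples from an exponential distribution (mean 1, standard deviation 1), which is strongly right-skewed and clearly not normal; the choice of distribution, the seed, the sample size of 30 and the 2000 repetitions are all assumptions made for illustration.

```python
import math
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is repeatable
SAMPLE_SIZE = 30
NUM_SAMPLES = 2000

# Each sample: 30 independent draws from an exponential population (mu = 1, sigma = 1)
sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [rng.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.fmean(sample))

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = 1 / math.sqrt(SAMPLE_SIZE)   # sigma / sqrt(n)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_sd, 3))
```

The sample means cluster around the population mean of 1 with a spread close to σ/√n, even though the population itself is far from bell-shaped.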

10. What is Hypothesis Testing?

Hypothesis testing is used to evaluate two mutually exclusive statements about a
population using a sample

There are 2 types of hypothesis:
1. Null Hypothesis (Ho)
2. Alternative Hypothesis (Ha)
Ho- There is no significant difference between the population mean and the hypothesized value, i.e., mu = muo
Ha- There is a significant difference between the population mean and the hypothesized value, i.e., mu != muo

11. What are the Steps to be followed in Hypothesis Testing?

1. Formulate the null and alternative hypotheses
2. Specify the level of significance
3. Select the test statistic
4. Find the critical value from the table, using the level of significance and
degrees of freedom
5. Decide whether to reject the null hypothesis by comparing the
test statistic with the critical value
6. Write the conclusion based on the problem statement

12. What is the difference between Type 1 and Type 2 error?

They are known as decision errors in statistics.
Type 1 Error:- Rejecting the null hypothesis when it is true (a false positive)
Type 2 Error:- Failing to reject the null hypothesis when it is false (a false negative)
Ex:-
In a health care use case, suppose we want to predict whether a person has cancer or not
Null Hypothesis (Ho) = Person does not have cancer
Alternative Hypothesis (Ha) = Person has cancer
In the medical use case we should reduce Type 2 error, because missing a real cancer case is costlier than a false alarm
In mail spam detection (Ho = mail is not spam) we should reduce Type 1 error, so that genuine mail is not marked as spam

13. What is the Difference between PDF & PMF?


PDF:- Probability Density Function is used for continuous distribution
Ex:- Normal Distribution
PMF:- Probability Mass Function is used for Discrete Distribution
Ex:- Bernoulli Distribution, Binomial Distribution, Poisson Distribution

14. What is Bernoulli Distribution?


The Bernoulli distribution is a special case of the binomial distribution where a
single trial is conducted (so n would be 1 for such a binomial distribution).
Bernoulli distribution is a discrete probability distribution, meaning it’s
concerned with discrete random variables. A discrete random variable is one that
has a finite or countable number of possible values—the number of heads you get
when tossing three coins at once, or the number of students in a class.
Bernoulli distribution applies to events that have one trial and two possible
outcomes. These are known as Bernoulli trials. Think of any kind of experiment
that asks a yes or no question—for example, will this coin land on heads when I
flip it? Will I roll a six with this die? Will I pick an ace from this deck of cards?
Will voter X vote “yes” in a political referendum? Will student Y pass their Stats
Mock?

15. What is Binomial Distribution?

The binomial distribution is the discrete probability distribution of the number of
successes in a fixed number of independent trials, where each trial has only two
possible results, either success or failure.

1. There are 2 potential outcomes per trial (head or tail)
2. The probability of success (p) is the same across all trials
ex :- If you toss a coin 10 times, the probability of getting a head on each toss = 0.5 or 1/2
3. The number of trials (n) is fixed
4. Each trial is independent

16. Difference Between Bernoulli and Binomial Distribution?

A Bernoulli distribution is a special case of the binomial distribution. We know that the Bernoulli
distribution applies to events that have one trial (n = 1) and two possible outcomes,
for example, one coin flip (that’s the trial) and an outcome of either heads or tails.
When we have more than one trial (say, we flip a coin five times), the binomial
distribution gives the discrete probability distribution of the number of “successes”
in that sequence of independent coin flips (or trials).

So, to continue with the coin flip example: the Bernoulli distribution gives you the
probability of “success” (say, landing on heads) when flipping the coin just once
(that’s your Bernoulli trial). If you flip the coin five times, the binomial distribution
gives the probability of each possible number of successes (0 to 5 heads) across the five coin
flips.
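
The coin-flip example can be worked through with the binomial probability mass function, P(X = k) = nCk * p^k * (1 - p)^(n - k). The function name below is made up for the sketch; setting n = 1 reduces it to the Bernoulli case.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Bernoulli trial: one fair coin flip (n = 1)
p_head_once = binomial_pmf(1, 1, 0.5)

# Binomial: number of heads in five fair coin flips
distribution = {k: binomial_pmf(k, 5, 0.5) for k in range(6)}

print(p_head_once)
for k, prob in distribution.items():
    print(k, prob)
```

The probabilities for 0 to 5 heads add up to 1, with 2 and 3 heads the most likely outcomes.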

17. What is Poisson Distribution?


A Poisson distribution is a discrete probability distribution. It gives the probability
of an event happening a certain number of times (k) within a given interval of time
or space. The Poisson distribution has only one parameter, λ (lambda), which is the
mean number of events.
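
The Poisson probability of seeing exactly k events is P(X = k) = e^(-λ) * λ^k / k!. A minimal sketch follows; the rate of 2 events per interval and the function name are assumptions for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean number of events is lam."""
    return exp(-lam) * lam ** k / factorial(k)

lam = 2.0   # hypothetical mean: 2 events per interval
probabilities = [poisson_pmf(k, lam) for k in range(6)]

for k, prob in enumerate(probabilities):
    print(k, round(prob, 4))
```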