Data Science Interview Preparation 1
First Edition
2. INTERVIEW QUESTIONS
2.1 Statistics interview Questions
2.2 Regression analysis Questions
2.3 Logistic regression Questions
2.4 Machine learning Questions
2.5 Coding Questions
2.6 Guesstimate Questions
2.7 Puzzle Questions
2.8 HR Questions
3. MISCELLANEOUS
3.1 Tell me about yourself.
3.2 How to be motivated during placement season?
Clearing a data science interview isn't a big deal if you
are well prepared. But the fact is that data science
borrows concepts from many fields and that's why you
need to have knowledge of all those fields to clear a
data science interview.
Technical Tests
Interviews
10-2-4 Rule
Also, appear for the online mock test that you can get
on the website given above.
Nitin Mukesh
While preparing for data science interviews, you
should take notes and also practise with questions
that companies have already asked in their
interviews.
Basic Statistics
Probability
Bayes Theorem
Mean, Median, Mode and their properties
Random variables
Skewness, Kurtosis
Expectations and Variance with properties
Correlation and Covariance
Statistical Distributions
Binomial distribution
Poisson Distribution
Geometric Distribution
Exponential Distributions
Normal Distributions
Uniform Distributions
Gamma and Beta Distributions
Hypothesis Testing
t-test
f-test
chi-square test
ANOVA
Interpretation of p-value
Central Limit Theorem
Recommended books:
Fundamentals of Mathematical Statistics
by Gupta and Kapoor
An Introduction to Probability and
Statistics by Rohatgi and Saleh
Machine learning Algorithms
Linear regression
Logistic regression
KNN
SVM
Decision Tree
LDA and QDA
Random Forest
Bagging
Boosting
PCA
Naive Bayes classifier
Ridge Regression
Lasso Regression
Elastic Net
K-Means clustering
Data Science Concepts
Bias-variance tradeoff
Precision and recall
ROC-AUC curve
Handling missing data
Outlier handling
Cross-validation
Overfitting and Underfitting
Regularization
SQL Concepts
Joins in SQL
Constraints in SQL
Primary Key and Foreign Key
Queries in SQL
TRUNCATE, DELETE and DROP
Use of Aggregate Functions
WHERE, HAVING and GROUP BY clause
Data Structure Concepts
Array
Linked List
Stack
Queue
Binary search tree
Binary Search
Sorting algorithms
Programming Concepts
Other concepts
Puzzles
Case studies and Guesstimates
If you are interviewing for a data science profile,
most of the questions will be based on the topics
listed above.
Make a nice resume: A resume is very important
for any job interview. Resume-making can be a
tedious task, and a good resume is not made in a
single day.
Simplicity is key.
This way, even if you are not selected, you will have
gained some experience. That is how ML models learn
too: with more data, the models get better.
Do your own research: If you are selected for an
interview, read about the company. Find out what the
company does, what its work environment is like, and
what its working culture is.
ASKED IN DATA
SCIENCE
INTERVIEWS
Commonly Asked Topics
Probability Theory
Sampling Theory
Hypothesis Testing
Distribution Theory
Time series Analysis
Design of Experiment
Prove E(X + Y) = E(X) + E(Y). (Delhivery)
What are strong and weak stationarity? (ICICI
Securities)
Explain ARIMA forecasting step by step. (ICICI
Securities)
What is CLT, z-test, t-test, F test? (ICICI Securities)
When is a t-test used instead of a z-test? (ICICI
Securities)
What do you mean by Replication and
Randomization? (IPSOS MMA)
Trick question on the application of chi-square
test. (ICICI Securities)
Test of equality of proportions. (ICICI Securities)
Structure of a box plot. (ICICI Lombard)
What does correlation mean? What type of
relationship can be found by this? (ANZ Bank)
What is the importance of stationarity in a time
series? (ANZ Bank)
What is skewness in data and how will you remove
it? (LafargeHolcim)
Role of R squared, r, F test in determining features
in Machine learning.
What is a stationary time series and why is it
important? (Cognizant)
Define Randomised Block Design (RBD). Where do we
use RBD? (Cognizant)
What is stratified random sampling and how is it
different from systematic random sampling?
(Cognizant)
What is the p-value? Explain with the help of an
example related to marketing research. (IPSOS
MMA)
What are the assumptions of ANOVA? (JSS, ANZ)
Explain CRD (Completely Randomized Design).
(IPSOS MMA)
What is A/B testing? (ICICI Securities)
Consider two fair dice that have been rolled. What
is the probability of obtaining a sum of 8? (IPSOS MMA)
The lengths of a professor's classes have a
continuous uniform distribution between 50.0 min
and 52.0 min. If one such class is randomly
selected, find the probability that the class length
is more than 51.7 min. (64squares)
Difference between stratified and cluster sampling.
(JSS)
When do we use censoring, and what are type-1 and
type-2 censoring in reliability analysis? (JSS)
ANOVA is generally used when we want to test the
significance of the difference between more than
two independent sample means. Why don't we use
pairwise t-tests in this case? (JSS, Accenture)
What are type-1 error and p-value in inference?
(JSS, Accenture)
What do you mean by hypothesis testing? (Eli Lilly)
There are 3 ants sitting on three corners of a
triangle. All ants randomly pick a direction and
start moving along the edge of the triangle. What is
the probability that any two ants collide?
(Accenture, Delhivery)
You have 9 red balls and 1 black ball in a bag. You
are picking balls 10 times with replacement. What
is the probability of getting at least one black ball
from the bag? (Accenture)
Explain Central Limit Theorem. (ANZ, Wipro)
What is the normal distribution and what is the
shape of the curve? (WNS Global)
What is the relationship between median, mode and
mean in positively skewed data? (ANZ)
What are statistics and data science? (ANZ)
You have to estimate the average height of the
population of India, how will you go about it? (ICICI
Lombard)
What is the use of binomial distribution and its
PMF? (Wipro)
What do you mean by selection bias? (Wipro)
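Several of the probability questions in this list have short closed-form answers that can be checked by direct enumeration. The following Python sketch is one way to verify them, not an official answer key. It assumes the three ants move at the same speed and that each ant picks one of its two edges with equal probability, which the question leaves implicit. All variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Two fair dice: P(sum == 8), by enumerating all 36 equally likely outcomes.
rolls = list(product(range(1, 7), repeat=2))
p_sum_8 = Fraction(sum(1 for a, b in rolls if a + b == 8), len(rolls))

# Class length ~ Uniform(50.0, 52.0):
# P(length > 51.7) = (52.0 - 51.7) / (52.0 - 50.0).
low, high, cutoff = Fraction(50), Fraction(52), Fraction(517, 10)
p_long_class = (high - cutoff) / (high - low)

# Three ants on a triangle, each choosing one of two directions.
# Assumption: equal speeds, so nobody collides only when all three
# ants travel the same way round the triangle.
choices = list(product(("clockwise", "anticlockwise"), repeat=3))
no_collision = sum(1 for c in choices if len(set(c)) == 1)
p_collision = 1 - Fraction(no_collision, len(choices))

# 9 red balls and 1 black ball, 10 draws with replacement:
# P(at least one black) = 1 - P(all ten draws are red).
p_at_least_one_black = 1 - Fraction(9, 10) ** 10

print(p_sum_8)                      # 5/36
print(p_long_class)                 # 3/20
print(p_collision)                  # 3/4
print(float(p_at_least_one_black))  # roughly 0.651
```

Using `Fraction` keeps the answers exact, which makes it easy to compare them with a hand calculation during preparation.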
ASKED IN REAL
INTERVIEWS
1. What are the assumptions of linear regression?
(Delhivery, ANZ bank, Citi Bank, Accenture)
2. What is the meaning of multicollinearity? (ANZ,
Amazon)
3. How to detect multicollinearity? (Amazon,
Delhivery)
4. What do you understand by VIF (Variance Inflation
Factor)? (Amazon)
5. What is the difference between R-squared and
adjusted R-squared? (Delhivery, ANZ bank, Citi
Bank, Accenture)
6. How to deal with multicollinearity in data? (Citi,
Accenture)
7. Explain forward and backward elimination?
(Accenture)
8. How does PCA work? (ICICI securities, Amazon,
Miko.ai)
9. Explain Ridge and Lasso Regression? (Delhivery,
ANZ bank, Citi Bank, Accenture, Amazon)
10. Can SVM be used for regression? (Miko.ai)
11. What is the curse of dimensionality? Can you give
an example?
12. What is the difference between the coefficient of
determination, and coefficient of correlation?
13. Give methods of variable selection in Regression
Analysis? (Delhivery, ANZ bank, ICICI securities)
14. Why do we perform the residual analysis? (ANZ)
15. What are L1 and L2 penalization? (Miko.ai)
16. What is heteroscedasticity? How does it affect the
regression coefficients? (ANZ)
17. Why does only VIF > 10 imply that there is
multicollinearity? Why not choose VIF > 8? (IDFC First
Bank)
18. If my dataset has 100 observations and 1500
features, do you think I would be able to fit a
regression model to it? (IDFC First
Bank)
19. For a single variable, how will you detect outliers?
(ICICI Lombard)
20. How will the correlation between two variables
change in the presence of an outlier? Will it increase,
decrease or remain constant? Explain how, using its
formula. (ICICI Lombard)
21. What are influential and leverage points? Which
of them has more effect on the model? (ANZ, Wells
Fargo)
22. Does multicollinearity impact the prediction of a
machine learning algorithm? (Wells Fargo)
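Questions 4, 5 and 17 all rest on a few short formulas. The sketch below writes them out in plain Python as a revision aid. The function names and the toy numbers are illustrative assumptions, not taken from any particular library or dataset.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_features):
    """R-squared penalised for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_features - 1)


def vif(r2_j):
    """Variance Inflation Factor of predictor j, where r2_j is the
    R-squared from regressing predictor j on all other predictors."""
    return 1 / (1 - r2_j)


# Toy (hypothetical) observed values and fitted values from a
# one-feature model.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
y_hat = [1.1, 1.9, 3.2, 3.8, 5.0]

r2 = r_squared(y, y_hat)
adj_r2 = adjusted_r_squared(r2, n_obs=len(y), n_features=1)

print(round(r2, 4), round(adj_r2, 4))
print(round(vif(0.9), 2))    # the VIF = 10 rule of thumb means R_j^2 = 0.9
print(round(vif(0.875), 2))  # a VIF = 8 cutoff would mean R_j^2 = 0.875
```

Seen this way, the choice between VIF > 10 and VIF > 8 is a choice between two nearby thresholds on the same R-squared scale, which is why the cutoff is a convention rather than a hard rule.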
***
LOGISTIC REGRESSION
INTERVIEW QUESTIONS
ASKED IN
PLACEMENT
Logistic Regression
Logistic regression is a supervised Machine
learning algorithm used for classification.
It is mainly used to model a binary dependent
variable.
What is the difference between linear and logistic
regression? (ANZ, Wells Fargo, Accenture)
Can you use Logistic regression for regression
problems? (Accenture)
Explain the working of Logistic Regression? (ANZ,
Delhivery)
What do you mean by Generalized Linear Model?
(ICICI Lombard)
Is logistic regression a Generalized Linear Model?
How is a Support Vector Machine different from
Logistic regression?
Write the expression for the Logistic function.
What do you mean by deviance in Logistic
regression? (ANZ)
What are penalized logistic regression models?
Why did you use them? How are they better than
logistic regression models? (FCS Limited)
Differentiate between Lasso, Ridge, and Elastic
Net. (Wells Fargo)
How to handle imbalanced data in classification?
Which metric to use in this case? Why is accuracy
not a good measure in this case? (FCS Limited)
How will you deal with overfitting in the case of
Logistic Regression? (Delhivery)
Interpretation of parameters of the logistic
regression model. (ANZ)
Steps to build a Logistic Regression model from
scratch. (ANZ)
If the accuracy of the model is 95%, is it good?
(ANZ)
Difference between Logistic regression and
decision tree. (Delhivery)
Difference between Logistic regression and
Random Forest. (ANZ)
What do you mean by cost function? (Amazon)
What is a convex function? (Amazon)
Explain Gradient Descent Algorithm. (Meru cab)
How will you use logistic regression if your data
has more than 2 classes?
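Three of the questions in this list (the logistic function, the cost function, and gradient descent) fit together in a few lines of code. The sketch below is a minimal from-scratch illustration for a single feature, assuming a fixed learning rate and a tiny hypothetical dataset. It is meant for revision, not as a production implementation.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(ys, probs):
    """Average binary cross-entropy, the usual logistic regression cost."""
    return -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(ys, probs)
    ) / len(ys)


def fit_logistic_1d(xs, ys, lr=0.1, n_iter=1000):
    """Batch gradient descent for p = sigmoid(w * x + b)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        probs = [sigmoid(w * x + b) for x in xs]
        # Gradients of the average log loss with respect to w and b.
        grad_w = sum((p - y) * x for p, y, x in zip(probs, ys, xs)) / n
        grad_b = sum(p - y for p, y in zip(probs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical toy data: negative x belongs to class 0, positive x to class 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

loss_before = log_loss(ys, [sigmoid(0.0) for _ in xs])
w, b = fit_logistic_1d(xs, ys)
loss_after = log_loss(ys, [sigmoid(w * x + b) for x in xs])

print(sigmoid(0.0))              # 0.5
print(loss_before > loss_after)  # True: training reduced the cost
```

Because the log loss is convex in `w` and `b`, gradient descent with a small enough step keeps lowering the cost, which is the link between the convexity question and the gradient descent question.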
***
INTERVIEW QUESTIONS
ON CLASSIFICATION
ALGORITHMS
Commonly Asked
Topics
Logistic regression
KNN
SVM
Decision Tree
Random Forest
Bagging
LDA & QDA
Naive Bayes classifier
Classification Metrics
Overfitting and underfitting
Difference between logistic and SVM. (Accenture)
How to deal with overfitting in Logistic Regression?
(Delhivery)
If the accuracy of the model is 95%, is it good?
(ANZ)
Does standardization have any impact on the
performance of the model in the case of a random
forest? (Wells Fargo)
How to handle imbalanced data in classification?
Which metric to use in this case? Why is accuracy
not a good measure in this case? (FCS Limited,
Citi)
Explain the Logistic regression model. (ANZ,
Miko.ai, IPSOS)
Suppose you have 200 variables, will you consider
all the variables to build a model? (ANZ)
Difference between random forest and logistic
regression? (ANZ)
What is a random forest and how does it differ from
AdaBoost? (Meru cabs)
Explain bagging in Machine learning? (TCS
innovation lab)
Explain F score, precision, recall, ROC curve. (Meru
cabs)
When can you say that a KNN model is overfitted?
(Delhivery)
Write the loss function of logistic and linear
regression. (Delhivery)
How is the prediction made in the random forest?
(FCS Limited)
How is XGBoost different from gradient boosting?
(Wells Fargo)
What will happen if we use MSE in logistic
regression? (Delhivery)
If you have categorical features in your data, then
how will you find the Nearest Neighbor using KNN?
(ANZ)
Explain the working of the Support Vector Machine
Algorithm? (ICICI Lombard, Cognizant)
What is the kernel trick in SVM? Why is it
computationally efficient? (FCS Limited)
Explain the complete working of the decision tree
algorithm. (WNS Global, TCS innovation lab)
What is a confusion matrix? (ANZ)
Difference between Linear regression vs logistic
regression. (Wells Fargo)
What are underfitting and overfitting? Give an example.
(Accenture)
What are the steps to build a model from scratch?
(ANZ)
Why are random forests called random? (FCS
Limited)
Can Ridge or Lasso be used for NLP? (Miko.ai,
NLP)
What is the difference between Bagging and
Boosting? (Miko.ai, TCS Innovation Lab)
Differentiate between SVM and Random Forest?
(Miko.ai)
Which is better, Random forest or decision tree?
(TCS Innovation Lab)
What are the Gini index and entropy in the decision
tree? (Cognizant)
What do you mean by deviance in Logistic
regression? (ANZ)
What is a convex function? (Amazon)
What are the criteria to evaluate a Logistic
regression model? (MMA IPSOS)
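The questions on the confusion matrix, precision, recall, F score, and "is 95% accuracy good?" can all be answered from four counts. The sketch below computes the metrics in plain Python. The two sets of counts are hypothetical, chosen to show why accuracy misleads on imbalanced data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical data: 1000 cases, only 50 of them positive.
# Model A predicts "negative" for everything.
model_a = classification_metrics(tp=0, fp=0, fn=50, tn=950)

# Model B finds 40 of the 50 positives at the cost of 60 false alarms.
model_b = classification_metrics(tp=40, fp=60, fn=10, tn=890)

print(model_a)  # accuracy 0.95, but recall 0.0
print(model_b)  # accuracy 0.93, recall 0.8
```

Model A reaches 95% accuracy without detecting a single positive case, so on imbalanced data recall, precision, and F1 say more about usefulness than accuracy does.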
***
Commonly Asked
Topics
Loops
Functions
Recursion
Dynamic Programming
Sorting algorithm
Binary Search
Coding questions are commonly asked in data science
interviews. But if you are sitting for banking
companies like ANZ, Citi, ICICI, IDFC First Bank, etc., you
can expect few or no coding questions.
Codebasics: Link
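As a warm-up for the topics listed above, binary search is a typical question that touches loops, functions, and recursion at once. The sketch below shows one iterative and one recursive version. The function names are illustrative, and both assume the input list is already sorted in ascending order.

```python
def binary_search(sorted_items, target):
    """Iterative binary search; returns the index of target or -1."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1   # target can only be in the right half
        else:
            high = mid - 1  # target can only be in the left half
    return -1


def binary_search_recursive(sorted_items, target, low=0, high=None):
    """Recursive version of the same search."""
    if high is None:
        high = len(sorted_items) - 1
    if low > high:
        return -1
    mid = (low + high) // 2
    if sorted_items[mid] == target:
        return mid
    if sorted_items[mid] < target:
        return binary_search_recursive(sorted_items, target, mid + 1, high)
    return binary_search_recursive(sorted_items, target, low, mid - 1)


data = [2, 5, 8, 12, 16, 23, 38]
print(binary_search(data, 23))           # 5
print(binary_search_recursive(data, 7))  # -1 (not present)
```

Each step halves the search range, so the search takes O(log n) comparisons, a point interviewers often ask candidates to justify.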
***
ASKED IN
PLACEMENT
Why Guesstimates?
We’ll start by discussing guesstimates, for which
candidates are asked to come up with a figure,
usually the size of a market, the number of
objects in an area, or some other specific quantity.
How to solve?
The most important thing to remember about
brainteasers, guesstimates, or even simple math
questions that are designed to be stressful is to
let your interviewer see how your mind works.
And, that's what your interviewer wants to see.
Conclusion
See, you don't have to be exact in your answer.
But you do have to show how you can break down a
problem to solve it efficiently. That is what the
interviewer is interested in.
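Breaking a guesstimate into factors can be practised by writing the factors down explicitly. The sketch below does this for a campus noodle-consumption question of the kind asked in interviews. Every number in it is a hypothetical placeholder chosen only to show the structure, not real data, and the function name is illustrative.

```python
def estimate_monthly_packets(population, share_eating, packets_per_week, weeks):
    """Top-down estimate: people x share who eat x frequency x duration."""
    return population * share_eating * packets_per_week * weeks


# Hypothetical assumptions, each of which should be stated aloud and
# justified to the interviewer.
students_on_campus = 10_000       # assumed campus population
share_who_eat_noodles = 0.6       # assumed fraction of regular eaters
packets_per_eater_per_week = 2    # assumed consumption frequency
weeks_per_month = 4

estimate = estimate_monthly_packets(
    students_on_campus,
    share_who_eat_noodles,
    packets_per_eater_per_week,
    weeks_per_month,
)
print(round(estimate))  # 48000 under the assumptions above
```

The final figure matters less than the chain of factors: each assumption can be challenged and refined separately, which is the breakdown the interviewer wants to see.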
Amount of revenue from the mobile industry in India
(KPMG)
Estimate the number of trees on the IIT Bombay campus.
Number of Maggi consumed on campus in a month.
(Deloitte)
Number of smokers in Hyderabad (Deloitte)
Number of Vada Paos served in the XYZ Mess (Deloitte)
Number of Maggi packets sold on the first day of the
relaunch (Deloitte)
Number of flights taking off in a day from the Delhi Airport
(Deloitte)
The number of taxis in Bombay. (Deloitte)
Number of people who have ever lived on the Earth
(Deloitte)
Number of vehicles in a toll gate of your choice (Amazon)
Number of calls to our customer care in a week (Amazon)
Number of people using FB on campus (Amazon)
The market for cricket bats in the country (Amazon)
The market for leather shoes (Amazon)
Estimate the number of people on Patna Gandhi Maidan.
(ANZ)
Market size for electric insect repellent (KPMG)
Estimate market size of electric bulbs in India (KPMG)
Guesstimate number of daily flights in India (KPMG)
Estimate market size of online retail in India (Amazon)
Estimate the number of office chairs in Delhi (EY)
Estimate number of autos in India (Accenture)
Estimate number of Gmail app users in India
(Accenture)
Number of ticket counters required for Lucknow metro
station (Amazon)
Titan is launching a watch priced at Rs. 12,000. Estimate
market size (Accenture)
Royal Enfield is launching a new bike priced at 2 lakh in India.
Estimate market size. (Deloitte)
Estimate the revenue of MakeMyTrip in a year (Deloitte)
Number of people at Lucknow airport in a day (i3)
Number of cabs required at Lucknow airport (i3)
Calculate the total amount of revenue from the mobile
industry in India (KPMG)
Estimate the amount of food wasted inside IIT Bombay
hostels. (Accenture)
You have to estimate the average height of the population of
India, how will you go about it? (ICICI Lombard)
Suppose the vaccine for Corona is ready. How will you plan
its delivery? (ICICI Lombard)
Estimate the number of masks which are disposed of every
day in a city. (ANZ Bank)
Estimate the number of flights going from Mumbai airport in
a week? (ANZ Bank)
Estimate the number of persons you need to run the
campaign at the airport? (ANZ Bank)
***
Puzzles for Data
science interview
ASKED IN
PLACEMENT
How to answer puzzle
questions?
When the interviewer asks you a puzzle, you don't
need to give the answer immediately or jump to a
conclusion at once.
***
HR QUESTIONS FOR
DATA SCIENCE
INTERVIEW
ASKED IN
PLACEMENT
Why HR Questions?
After the initial rounds of candidate selection, every
company holds an HR interview round to get to know
the potential candidates personally.
A variety of questions can be asked in the HR round,
depending on the job role and your degree, but here
are some questions that are common to most
interviews:
***
"Tell me about yourself" is the most important question in
the interview, and usually the first one you will be asked.
If it is asked in the technical round, though, your answer
should differ from the one you would give in the HR round.
Educational Background
Strength
Hobbies
Interests
Conclusion
You see your peers getting placed while you are still
waiting for another chance. Maybe someone with
no experience or fewer technical skills than you is
getting placed, and still you are not.
Self-doubt starts to kick in and you start thinking
that maybe you are not made for this job. All those
lucky people got selected and you didn't.
But believe me, your lucky day will come. You can get
through the sadness; it is a natural reaction to what is
happening to you.
Keep going. You will have learned a lot by the day
you are placed.
That's all.
All the best for your next data science job interview!!!
😇
***
For more data science placement
preparation resources, please scan:
https://forms.gle/JNMLr6zK6yYAo7UG9