170 Machine Learning Interview Questions and Answer For 2021
A Machine Learning interview is a rigorous process in which candidates are judged
on various aspects such as technical and programming skills, knowledge of methods,
and clarity of basic concepts. If you aspire to apply for machine learning jobs, it
is crucial to know what kind of interview questions recruiters and hiring managers
generally ask.
Before we deep dive further, if you are keen to explore a course in Artificial
Intelligence & Machine Learning do check out our AIML Courses available at Great
Learning. Anyone could expect an average Salary Hike of 48% from this course.
Participate in Great Learning’s career accelerate programs and placement drives
and get hired by our pool of 500+ Hiring companies through our programs.
https://www.mygreatlearning.com/blog/machine-learning-interview-questions/ 1/65
14/09/2021 170 Machine Learning Interview Questions and Answer for 2021
This is an attempt to help you crack the machine learning interviews at major
product-based companies and start-ups. Usually, machine learning interviews at
major companies require a thorough knowledge of data structures and algorithms.
In the upcoming series of articles, we shall start from the basics of concepts and
build upon these concepts to solve major interview questions. Machine learning
interviews comprise many rounds, which begin with a screening test. This comprises
solving questions either on the whiteboard or solving it on online platforms like
HackerRank, LeetCode etc.
Here, we have compiled a list of frequently asked top 100 machine learning interview
questions that you might face during an interview.
Additional Information: ASR (Automatic Speech Recognition) & NLP (Natural Language
Processing) fall under AI and overlap with ML & DL, as ML is often utilized for NLP
and ASR tasks.
The machine learns using labelled data. The model is trained on an existing data set
before it starts making decisions with the new data.
The target variable is categorical: Logistic regression, Naive Bayes, KNN, SVM, Decision
Tree, Gradient Boosting, ADA boosting, Bagging, Random forest etc.
The machine is trained on unlabelled data and without any proper guidance. It
automatically infers patterns and relationships in the data by creating clusters. The
model learns through observations and deduced structures in the data.
C. Reinforcement Learning:
The model learns through a trial and error method. This kind of learning involves an
agent that will interact with the environment to create actions and then discover
errors or rewards of that action.
Machine Learning involves algorithms that learn from patterns in data and then
apply what they have learned to decision making. Deep Learning, on the other hand,
is able to learn by processing data on its own and is loosely modelled on the human
brain: it identifies something, analyses it, and makes a decision.
Supervised learning needs labelled data to train the model. For example, to solve a
classification problem (a supervised learning task), you need labelled data to
train the model and to classify new data into your labelled groups.
Unsupervised learning does not need any labelled dataset. This is the key
difference between supervised learning and unsupervised learning.
There are various means to select important variables from a data set that include
the following:
So, there is no certain metric to decide which algorithm to be used for a given
situation or a data set. We need to explore the data using EDA (Exploratory Data
Analysis) and understand the purpose of using the dataset to come up with the best
fit algorithm. So, it is important to study all the algorithms in detail.
Covariance measures how two variables are related to each other and how one
would vary with respect to changes in the other variable. If the value is positive,
there is a direct relationship between the variables: one would increase or
decrease as the base variable increases or decreases, given that all other
conditions remain constant. Its magnitude depends on the units of the variables.
Correlation quantifies the relationship between two random variables on a unit-free
scale and takes any value from -1 to 1: 1 is a perfect positive linear relationship,
-1 is a perfect negative one, and 0 means no linear relationship.
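The difference in scale between the two measures can be seen in a small sketch. The
data values and the helper names covariance and correlation below are illustrative
assumptions, not part of the original article. Rescaling one variable changes the
covariance but leaves the correlation, which stays between -1 and 1, unchanged.

```python
import math

def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

# Made-up data: hours studied and exam score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [52.0, 55.0, 61.0, 64.0, 70.0]
score_x10 = [s * 10 for s in score]  # same scores expressed in different units

cov_a = covariance(hours, score)
cov_b = covariance(hours, score_x10)      # ten times larger than cov_a
corr_a = correlation(hours, score)
corr_b = correlation(hours, score_x10)    # same as corr_a
```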
Deep Learning is a part of machine learning that works with neural networks. It
involves a hierarchical structure of networks that set up a process to help machines
learn the human logic behind any action. We have compiled a list of the frequently
asked deep learning interview questions to help you prepare.
What is overfitting?
Overfitting is a type of modelling error which results in the failure to predict future
observations effectively or fit additional data in the existing model. It occurs when a
function is too closely fit to a limited set of data points and usually ends with more
parameters read more…
Both are errors in Machine Learning Algorithms. When the algorithm has limited
flexibility to deduce the correct observation from the dataset, it results in bias. On the
other hand, variance occurs when the model is extremely sensitive to small
fluctuations.
If one adds more features while building a model, it will add more complexity and we
will lose bias but gain some variance. In order to maintain the optimal amount of
error, we perform a tradeoff between bias and variance based on the needs of a
business.
Bias stands for the error caused by erroneous or overly simplistic assumptions in
the learning algorithm. These assumptions can lead to the model underfitting the
data, making it hard for it to have high predictive accuracy and for you to
generalize your knowledge from the training set to the test set.
Variance is the error caused by too much complexity in the learning algorithm.
This can make the algorithm highly sensitive to variation in the training data,
which can lead your model to overfit the data. The model then carries too much
noise from the training data to be very useful on your test data.
Standard deviation refers to the spread of your data from the mean. Variance is the
average of the squared differences between each point and the mean, i.e. the
average of all data points. The two are related because standard deviation is the
square root of variance.
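A quick check of this relationship, using Python's standard statistics module on a
small made-up data set (the numbers are only for illustration) and treating the
data as a whole population:

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

mean = statistics.mean(data)            # average of all data points
variance = statistics.pvariance(data)   # mean of squared deviations from the mean
std_dev = statistics.pstdev(data)       # square root of the variance
```

For this data the mean is 5, the variance is 4 and the standard deviation is 2.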
14. A data set is given to you and it has missing values which
spread along 1 standard deviation from the mean. How much
of the data would remain untouched?
It is given that the data is spread around the mean. So, we can presume that it is
a normal distribution. In a normal distribution, about 68% of the data lies within
1 standard deviation of the mean (which coincides with the mode and the median).
That means about 32% of the data remains unaffected by the missing values.
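The 68% and 32% figures can be reproduced from the normal distribution itself. The
sketch below assumes a perfectly normal distribution and uses the standard identity
that the fraction within k standard deviations of the mean is erf(k / sqrt(2)); the
function name fraction_within is an illustrative choice.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

within_one = fraction_within(1)   # roughly 0.68
outside_one = 1 - within_one      # roughly 0.32
```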
A data set about utilities fraud detection is imbalanced. In such a data set, the
accuracy score cannot be the measure of performance, as the model may predict only
the majority class label correctly while our point of interest is the minority
label. Often minorities are treated as noise and ignored, so there is a high
probability of misclassifying the minority label as compared to the majority label.
For evaluating model performance on imbalanced data sets, we should use Sensitivity
(True Positive rate) or Specificity (True Negative rate) to determine the class
label wise performance of the classification model. If the minority class label's
performance is not so good, we could do the following:
Identifying missing values and dropping the rows or columns that contain them can
be done using the isnull() and dropna() functions in Pandas. The fillna() function
in Pandas replaces the missing values with a placeholder value.
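As a minimal sketch of these three Pandas calls, assuming a small made-up DataFrame
(the column names and values are hypothetical):

```python
import numpy as np
import pandas as pd

# Toy data with one missing value in each column.
df = pd.DataFrame({
    "age": [25, np.nan, 31, 47],
    "city": ["Pune", "Delhi", None, "Chennai"],
})

missing_per_column = df.isnull().sum()   # count of missing values in each column
complete_rows = df.dropna()              # keep only rows with no missing values
# Fill numeric gaps with the column mean and text gaps with a placeholder.
filled = df.fillna({"age": df["age"].mean(), "city": "unknown"})
```

Whether to drop or fill depends on how much data is missing and why; the mean and
the "unknown" placeholder here are just one possible choice.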
A Time series is a sequence of numerical data points in successive order. It tracks the
movement of the chosen data points, over a specified period of time and records
the data points at regular intervals. Time series doesn’t require any minimum or
maximum time input. Analysts often use Time series to examine data according to
their specific requirement.
Gradient Descent and Stochastic Gradient Descent are algorithms that find the
set of parameters that minimizes a loss function.
The difference is that in Gradient Descent, all training samples are evaluated for
each parameter update, while in Stochastic Gradient Descent only one training
sample is evaluated for each parameter update.
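The contrast can be sketched on a one-parameter problem. Everything below (the toy
data generated from y = 3x, the learning rate, the epoch count and the function
names) is an illustrative assumption rather than a reference implementation.

```python
import random

# Toy data generated from y = 3x, so the ideal weight is 3.0.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]

def gradient(w, x, y):
    """Derivative of the squared error (w*x - y)**2 with respect to w."""
    return 2 * (w * x - y) * x

def batch_gradient_descent(lr=0.01, epochs=200):
    w = 0.0
    for _ in range(epochs):
        # One update per pass, using the average gradient over ALL samples.
        grad = sum(gradient(w, x, y) for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

def stochastic_gradient_descent(lr=0.01, epochs=200, seed=0):
    rng = random.Random(seed)
    samples = list(zip(xs, ys))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            # One update per sample, using only that sample's gradient.
            w -= lr * gradient(w, x, y)
    return w

w_batch = batch_gradient_descent()
w_sgd = stochastic_gradient_descent()
```

Both runs end close to 3.0 here because the toy data is noise-free; on real data
the stochastic updates are noisier but much cheaper per step.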
tune.
On the other hand, the disadvantage is that they are prone to overfitting.
Random forests are a large number of decision trees pooled using averages or
majority rules at the end. Gradient boosting machines also combine decision trees,
but they start combining them at the beginning of the process rather than at the
end. Random forest creates each tree independently of the others, while gradient
boosting develops one tree at a time. Gradient boosting yields better outcomes than
random forests if parameters are carefully tuned, but it's not a good option if the
data set contains a lot of outliers/anomalies/noise, as it can result in
overfitting of the model. Random forests perform well for multiclass object
detection. Gradient Boosting performs well when the data is not balanced, such as
in real time risk assessment.
Confusion matrix (also called the error matrix) is a table that is frequently used to
illustrate the performance of a classification model i.e. classifier on a set of test data
for which the true values are well-known.
Support is a measure of how often the “item set” appears in the data set and
Confidence is a measure of how often a particular rule has been found to be true.
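Both measures are simple ratios over a list of transactions. The sketch below uses
a made-up set of shopping baskets; the function names support and confidence are
illustrative.

```python
# Each transaction is the set of items bought together (made-up data).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the item set."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """How often the rule antecedent -> consequent holds when the antecedent appears."""
    return support(antecedent | consequent) / support(antecedent)

support_bread_milk = support({"bread", "milk"})              # 3 of 5 baskets
confidence_bread_to_milk = confidence({"bread"}, {"milk"})   # 3 of the 4 bread baskets
```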
P(X=x) = ∑_Y P(X=x, Y)
Given the joint probability P(X=x,Y), we can use marginalization to find P(X=x). So, it is
to find distribution of one random variable by exhausting cases on other random
variables.
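Marginalization amounts to summing the joint probability over every value of the
other variable. A small sketch, assuming a made-up joint distribution over weather
(X) and traffic (Y):

```python
# Joint distribution P(X, Y); the probabilities are invented and sum to 1.
joint = {
    ("rain", "heavy"): 0.30,
    ("rain", "light"): 0.10,
    ("sun", "heavy"): 0.15,
    ("sun", "light"): 0.45,
}

def marginal_x(x):
    """P(X = x), obtained by summing the joint probability over every value of Y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

p_rain = marginal_x("rain")   # 0.30 + 0.10
p_sun = marginal_x("sun")     # 0.15 + 0.45
```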
The phrase is used to express the difficulty of using brute force or grid search to
optimize a function with too many inputs.
Dimensionality reduction techniques like PCA come to the rescue in such cases.
The idea here is to reduce the dimensionality of the data set by reducing the number
of variables that are correlated with each other, while retaining the variation to
the maximum extent.
The variables are transformed into a new set of variables that are known as
'Principal Components'. These PCs are the eigenvectors of a covariance matrix and
are therefore orthogonal.
NLP or Natural Language Processing helps machines analyse natural languages with
the intention of learning them. It extracts information from data by applying
machine learning algorithms. Apart from learning the basics of NLP, it is important to
prepare specifically for the interviews.
Which of the following architecture can be trained faster and needs less amount of
training data
b. Transformer architecture
Read more…
A data point that is considerably distant from the other similar data points is known
as an outlier. They may occur due to experimental errors or variability in
measurement. They are problematic and can mislead a training process, which
eventually results in longer training time, inaccurate models, and poor results.
Univariate method – looks for data points having extreme values on a single variable
Minkowski error – reduces the contribution of potential outliers in the training process
Normalisation adjusts the data; regularisation adjusts the prediction function. If
your data is on very different scales (especially low to high), you would want to
normalise the data: alter each column to have compatible basic statistics. This can
be helpful to make sure there is no loss of accuracy. One of the goals of model
training is to identify the signal and ignore the noise. If the model is given free
rein to minimize error, there is a possibility of overfitting. Regularisation
imposes some control on this by favouring simpler fitting functions over complex
ones.
Normalization and Standardization are the two most popular methods used for
feature scaling. Normalization refers to re-scaling the values to fit into a range
of [0,1]. Standardization refers to re-scaling data to have a mean of 0 and a
standard deviation of 1 (unit variance). Normalization is useful when all
parameters need to have the identical positive scale; however, it is sensitive to
outliers, which squeeze the remaining values into a narrow part of the range.
Hence, standardization is recommended for most applications.
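The two scaling methods can be compared side by side. This is a plain-Python sketch
on made-up values that include one large value; the function names are illustrative.

```python
import statistics

values = [10.0, 20.0, 30.0, 40.0, 100.0]  # 100.0 is much larger than the rest

def min_max_normalize(xs):
    """Re-scale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Re-scale values to have mean 0 and standard deviation 1."""
    mean = statistics.mean(xs)
    std = statistics.pstdev(xs)
    return [(x - mean) / std for x in xs]

normalized = min_max_normalize(values)
standardized = standardize(values)
```

In the normalized output the four smaller values all fall in the lower third of the
[0, 1] range, which shows how a single large value affects min-max scaling.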
Bernoulli Distribution models a single trial with two possible outcomes: a team
will win a championship or not, a newborn child is either male or female, you
either pass an exam or not, etc.
Binomial distribution gives the probability of a given number of successes in a
fixed number of independent trials, each with only two possible outcomes; the
prefix 'bi' means two or twice. An example of this would be the number of heads in
a series of coin tosses, where the outcome of each toss will either be heads or
tails.
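A Bernoulli trial is one two-outcome experiment, and counting successes across
repeated independent trials gives a binomial distribution. The sketch below writes
out both probability mass functions; the function names and the fair-coin example
are illustrative.

```python
import math

def bernoulli_pmf(k, p):
    """Probability of outcome k (1 = success, 0 = failure) in a single trial."""
    return p if k == 1 else 1 - p

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

p_heads_once = bernoulli_pmf(1, 0.5)             # one toss of a fair coin
p_two_heads_in_three = binomial_pmf(2, 3, 0.5)   # 3 tosses, exactly 2 heads
```

With n = 1 the binomial formula reduces to the Bernoulli case.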
Poisson distribution helps predict the probability of certain events happening when
you know how often that event has occurred. It can be used by businessmen to
make forecasts about the number of customers on certain days and allows them to
adjust supply according to the demand.
Exponential distribution is concerned with the amount of time until a specific event
occurs. For example, how long a car battery would last, in months.
Visually, we can check it using plots. There is also a list of normality tests, as
follows:
Shapiro-Wilk W Test
Anderson-Darling Test
Martinez-Iglewicz Test
Kolmogorov-Smirnov Test
D’Agostino Skewness Test
At any given value of X, one can compute the value of Y, using the equation of Line.
This relation between Y and X, with a degree of the polynomial as 1 is called Linear
Regression.
The value of B1 and B2 determines the strength of the correlation between features
and the dependent variable.
variable in the regression is numerical (or continuous) while that for classification is
categorical (or discrete).
Suppose you have a categorical variable as the target. If you cluster the values
together or perform a frequency count on them and find that certain categories
outnumber the others by a very significant margin, this is known as target
imbalance.
Example: Target column – 0,0,0,1,0,2,0,0,1,1 [0s: 60%, 1: 30%, 2: 10%]. 0s are in
the majority. To fix this, we can perform up-sampling or down-sampling. Before
fixing this problem, let's assume that the performance metric used was the
confusion matrix. After fixing this problem we can shift the metric to AUC-ROC.
Since we added/deleted data [up-sampling or down-sampling], we can go ahead with a
stricter algorithm like SVM, Gradient boosting or ADA boosting.
40. List all assumptions for data to be met before starting with
linear regression.
Linear relationship
Multivariate normality
No or little multicollinearity
No auto-correlation
Homoscedasticity
41. When does the linear regression line stop rotating and find
an optimal spot where it is fitted on data?
The place where the highest RSquared value is found is the place where the line
comes to rest. RSquared represents the amount of variance captured by the linear
regression line with respect to the total variance in the dataset.
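As a sketch of this idea, the code below fits a least-squares line to made-up data
and computes RSquared as 1 minus the ratio of residual to total sum of squares. The
data and function names are illustrative assumptions. Tilting the fitted line away
from the least-squares slope lowers RSquared.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for a single feature."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys, slope, intercept):
    """Share of the total variance in ys captured by the line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
best_r2 = r_squared(xs, ys, slope, intercept)
tilted_r2 = r_squared(xs, ys, slope + 0.5, intercept)  # a rotated line fits worse
```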
Since the target column is categorical, it uses linear regression to create an odds
function that is wrapped with a log function (the log-odds, or logit) to use
regression as a classifier. Hence, it is a type of classification technique and not
a regression. It is derived from cost function.
43. What could be the issue when the beta value for a certain
variable varies way too much in each subset when regression
is run on different subsets of the given dataset?
Variations in the beta values in every subset imply that the dataset is
heterogeneous. To overcome this problem, we can use a different model for each of
the clustered subsets of the dataset or use a non-parametric model such as
decision trees.
Here’s a list of the top 101 interview questions with answers to help you prepare. The
first set of questions and answers are curated for freshers while the second set is
designed for advanced users.
Functions in Python refer to blocks that have organised, and reusable codes to
perform single, and related events. Functions are important to create better
modularity for applications which reuse high degree of coding. Python has a number
of built-in functions read more…
columns).
Yes, it is possible to use KNN for image processing. It can be done by converting the
3-dimensional image into a single-dimensional vector and using the same as input
to KNN.
K-Means is Unsupervised Learning, where we don’t have any Labels present, in other
words, no Target Variables and thus we try to cluster the data based upon their
coordinates and try to establish the nature of the cluster based on the elements
filtered for that cluster.
SVM has a learning rate and expansion rate which takes care of this. The learning
rate compensates or penalises the hyperplanes for making all the wrong moves and
expansion rate deals with finding the maximum separation area between classes.
49. What are Kernels in SVM? List popular kernels used in SVM
along with a scenario of their applications.
The function of kernel is to take data as input and transform it into the required form.
A few popular Kernels used in SVM are as follows: RBF, Linear, Sigmoid, Polynomial,
Hyperbolic, Laplace, etc.
function, be it linear or radial, which purely depends upon the distribution of data,
one can build a classifier.
They are superior to individual models as they reduce variance, average out biases,
and have lesser chances of overfitting.
In decision trees, overfitting occurs when the tree is designed to perfectly fit all
samples in the training data set. This results in branches with strict rules or sparse
data and affects the accuracy when predicting samples that aren’t part of the
training set.
Outlier is an observation in the data set that is far away from other observations in
the data set. We can discover outliers using tools and functions like box plot, scatter
plot, Z-Score, IQR score etc. and then handle them based on the visualization we
have got. To handle outliers, we can cap at some threshold, use transformations to
reduce skewness of the data and remove outliers if they are anomalies or errors.
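The IQR approach to finding and capping outliers can be sketched with the standard
library. The data is made up, and the 1.5 x IQR fences are a common convention
assumed here rather than a rule stated in the article.

```python
import statistics

data = [12, 13, 14, 15, 15, 16, 17, 18, 19, 95]

# Quartiles split the sorted data into four equal parts.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Points beyond 1.5 * IQR from the quartiles are flagged as outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < lower or x > upper]

# Capping: pull flagged points back to the nearest fence instead of dropping them.
capped = [min(max(x, lower), upper) for x in data]
```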
There are mainly six types of cross validation techniques. They are as follows:
K fold
Stratified k fold
Leave one out
Bootstrapping
Random search cv
Grid search cv
Yes, it is possible to test for the probability of improving model accuracy without
cross-validation techniques. We can do so by running the ML model for say n
number of iterations, recording the accuracy. Plot all the accuracies and remove the
5% of low probability values. Measure the left [low] cut off and right [high] cut off. With
the remaining 95% confidence, we can say that the model can go as low or as high
[as mentioned within cut off points].
Principal Component Analysis creates one or more index variables from a larger set
of measured variables. Factor Analysis is a model of the measurement of a latent
variable. This latent variable cannot be measured with a single variable and is seen
through a relationship it causes in a set of y variables.
59. How can we use a dataset without the target variable into
supervised learning algorithms?
Input the data set into a clustering algorithm, generate optimal clusters, label the
cluster numbers as the new target variable. Now, the dataset has independent and
target variables present. This ensures that the dataset is ready to be used in
supervised learning algorithms.
Linear separability in feature space doesn't imply linear separability in input
space. So, inputs are non-linearly transformed using vectors of basis functions
with increased dimensionality. Limitations of fixed basis functions are:
Inductive Bias is a set of assumptions that a learning algorithm uses to predict
outputs for inputs that it has not encountered yet. When we are trying to learn Y
from X and the hypothesis space for Y is infinite, we need to reduce the scope
using our beliefs/assumptions about the hypothesis space, which is also called
inductive bias. Through these assumptions, we constrain our hypothesis space and
also get the capability to incrementally test and improve on the data using
hyper-parameters. Examples:
The metric used to assess the performance of the classification model is the
Confusion Matrix. The Confusion Matrix can be further interpreted with the
following terms:
True Positives (TP) – These are the correctly predicted positive values. It implies that
the value of the actual class is yes and the value of the predicted class is also yes.
True Negatives (TN) – These are the correctly predicted negative values. It implies
that the value of the actual class is no and the value of the predicted class is also no.
False positives and false negatives, these values occur when your actual class
contradicts with the predicted class.
Now,
Recall, also known as Sensitivity or true positive rate, is the ratio of true
positives (TP) to all observations in the actual class – yes
Recall = TP/(TP+FN)
Precision, also known as positive predictive value, measures the number of accurate
positives the model predicted vis-à-vis the number of positives it claims.
Precision = TP/(TP+FP)
Accuracy = (TP+TN)/(TP+FP+FN+TN)
F1 Score is the harmonic mean of Precision and Recall, F1 = 2 x (Precision x
Recall)/(Precision + Recall). Therefore, this score takes both false positives and
false negatives into account. Intuitively it is not as easy to understand as
accuracy, but F1 is usually more useful than accuracy, especially if you have an
uneven class distribution. Accuracy works best if false positives and false
negatives have a similar cost. If the costs of false positives and false negatives
are very different, it's better to look at both Precision and Recall.
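The formulas above can be applied to a small set of made-up labels with an uneven
class distribution. The labels and the helper name confusion_counts are
illustrative assumptions; class 1 is treated as the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

# Made-up labels: only 3 of the 10 observations are positive.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```

Here accuracy is 0.7 while F1 is only 0.4, because the model finds just one of the
three positives.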
68. Plot validation score and training score with data set size
on the x-axis and another plot with model complexity on the
x-axis.
For high bias in the models, the performance of the model on the validation data set
is similar to the performance on the training data set. For high variance in the
models, the performance of the model on the validation set is worse than the
performance on the training set.
Chain rule for Bayesian probability can be used to predict the likelihood of the next
word in the sentence.
Naive Bayes classifiers are a series of classification algorithms that are based on the
Bayes theorem. This family of algorithm shares a common principle which treats
every pair of features independently while being classified.
Naive Bayes is considered Naive because each attribute is assumed to be
independent of the others given the class. This assumed lack of dependence between
attributes of the same class creates the quality of naiveness.
Naive Bayes classifiers are a family of algorithms which are derived from the Bayes
theorem of probability. It works on the fundamental assumption that every set of two
features that is being classified is independent of each other and every feature
makes an equal and independent contribution to the outcome.
Prior probability is the proportion of each value of the dependent binary variable
in the data set. Suppose you are given a dataset in which the dependent variable is
either 1 or 0, the percentage of 1 is 65% and the percentage of 0 is 35%. Then the
probability that the variable is 1 for any new input would be 65%.
Marginal likelihood is the denominator of the Bayes equation and it makes sure that
the posterior probability is valid by making its area 1.
Probability is the measure of how likely it is that an event will occur, that is,
how certain we are that a specific event will occur. A likelihood function, on the
other hand, is a function of parameters within the parameter space that describes
the probability of obtaining the observed data.
This is a trick question. One should first get a clear idea of what Model
Performance means. If Performance means speed, then it depends upon the nature of
the application: any application related to a real-time scenario will need high
speed as an important feature. Example: the best of Search Results will lose its
virtue if the Query results do not appear fast.
If Performance is hinting at why Accuracy is not the most important virtue: for any
imbalanced data set, more than Accuracy, it will be an F1 score that explains the
business case, and when data is imbalanced, Precision and Recall will be more
important than the rest.
Temporal Difference Learning Method is a mix of Monte Carlo method and Dynamic
programming method. Some of the advantages of this method include:
1. It is a biased estimation.
2. It is more sensitive to initialization.
In Under Sampling, we reduce the size of the majority class to match minority class
thus help by improving performance w.r.t storage and run-time execution, but it
potentially discards useful information.
For Over Sampling, we upsample the Minority class and thus solve the problem of
information loss, however, we get into the trouble of having Overfitting.
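Both resampling strategies can be sketched with the standard library. The 90/10
class split, the seed and the variable names below are illustrative assumptions;
the code works on row indices so it applies to any data set with a label list.

```python
import random
from collections import Counter

rng = random.Random(42)           # fixed seed so the sketch is repeatable
labels = [0] * 90 + [1] * 10      # made-up imbalanced target: 90% class 0

majority = [i for i, y in enumerate(labels) if y == 0]
minority = [i for i, y in enumerate(labels) if y == 1]

# Under-sampling: keep a random subset of the majority, the same size as the minority.
under_idx = rng.sample(majority, len(minority)) + minority

# Over-sampling: draw minority rows with replacement until they match the majority.
over_idx = majority + rng.choices(minority, k=len(majority))

under_counts = Counter(labels[i] for i in under_idx)
over_counts = Counter(labels[i] for i in over_idx)
```

Under-sampling leaves 20 rows in total and discards 80 majority rows, while
over-sampling produces 180 rows in which each minority row is repeated many times,
which is where the overfitting risk comes from.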
clusters of the same class have an equal number of instances and all classes have
the same size
Exploratory Data Analysis (EDA) helps analysts to understand the data better and
forms the foundation of better models.
Visualization
Univariate visualization
Bivariate visualization
Multivariate visualization
Outlier Detection – Use Boxplot to identify the distribution of Outliers, then apply
IQR to set the boundary for outliers
Scaling the Dataset – Apply MinMax, Standard Scaler or Z Score Scaling mechanism
to scale the data.
Feature Engineering – Need of the domain, and SME knowledge helps Analyst find
derivative fields which can fetch more information about the nature of the data
Prepare the suitable input data set to be compatible with the machine
learning algorithm constraints.
Enhance the performance of machine learning models.
Some of the techniques used for feature engineering include Imputation, Binning,
Outliers Handling, Log transform, grouping operations, One-Hot encoding, Feature
split, Scaling, Extracting date.
Decision trees are highly sensitive to the data they are trained on. Hence,
generalization of results is often hard to achieve with them, even after extensive
fine-tuning. The results vary greatly if the training data is changed.
Hence bagging is utilised: multiple decision trees are trained on bootstrap samples of
the original data, and the final result is the average (or majority vote) of all these
individual models.
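The two ingredients of bagging, bootstrap sampling and aggregation, can be sketched as follows. To keep the example short, the base learner is a deliberately trivial stand-in that predicts the mean of its bootstrap sample; it is not a decision tree, and all names here are hypothetical.

```python
import random
import statistics

def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows with replacement."""
    return [rng.choice(rows) for _ in rows]

def fit_mean_model(targets):
    """Stand-in base learner: always predicts the mean of its training targets."""
    mean = statistics.fmean(targets)
    return lambda: mean

def bagged_prediction(targets, n_models=25, seed=0):
    """Train n_models base learners on bootstrap samples and average their outputs."""
    rng = random.Random(seed)
    models = [fit_mean_model(bootstrap_sample(targets, rng)) for _ in range(n_models)]
    return statistics.fmean([model() for model in models])

targets = [3.0, 5.0, 4.0, 10.0, 6.0]  # made-up regression targets
print(bagged_prediction(targets))
```

With real decision trees as base learners, averaging over many bootstrap samples is what dampens the sensitivity to the training data described above.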
Boosting is the process of using a system of n weak classifiers for prediction such that
every weak classifier compensates for the weaknesses of its predecessors. By weak
classifier, we imply a classifier which performs only slightly better than random
guessing on a given data set.
It's evident that boosting is not an algorithm; rather, it's a process. The weak
classifiers used are generally logistic regression, shallow decision trees etc.
There are many algorithms which make use of boosting processes, but three of them
are mainly used: AdaBoost, Gradient Boosting and XGBoost.
When choosing a classifier, we need to consider the type of data to be classified, and
the capacity of a classifier can be described by its VC dimension. It is defined as the
cardinality of the largest set of points that the classifier can shatter. In order to
have a VC dimension of at least n, a classifier must be able to shatter at least one
configuration of n points.
Operations (insertion, deletion) are faster Linked list takes linear time, making
Arrays are of fixed size, whereas linked lists are dynamic and flexible.
The meshgrid() function in numpy takes two arguments as input: the range of x-values
in the grid and the range of y-values in the grid. The meshgrid needs to be built before
the contourf() function in matplotlib is used, which takes in many inputs: x-values,
y-values, the values to be contoured over the grid, colours etc.
meshgrid() is used to create a grid from 1-D arrays of x-axis inputs and y-axis inputs,
returning coordinate matrices. contourf() is used to draw filled contours using the
given x-axis inputs, y-axis inputs, contour values, colours etc.
Hashing is a technique for identifying unique objects from a group of similar objects.
In hashing, a hash function converts large keys into small keys. The resulting
values are stored in a data structure known as a hash table.
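As a rough illustration of the idea, here is a tiny hash table that resolves collisions by chaining. It is a teaching sketch only (Python's built-in dict already does this far better); the class name and bucket count are arbitrary choices.

```python
class HashTable:
    """Minimal hash table using separate chaining."""

    def __init__(self, n_buckets=8):
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, key):
        # The hash function turns an arbitrary key into a small bucket index.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        return default

table = HashTable()
table.put("apple", 3)
table.put("banana", 7)
table.put("apple", 5)
print(table.get("apple"), table.get("cherry", 0))
```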
Advantages:
Disadvantages:
Neural Networks require processors which are capable of parallel processing. The
unexplained functioning of the network is also quite an issue, as it reduces the trust in
the network in some situations, for example when we have to justify a result it produced.
The training duration of the network is mostly unknown in advance. We can only tell that
training has finished by looking at the error value, and that does not guarantee optimal
results.
Conversion of data into binary values on the basis of certain threshold is known as
binarizing of data. Values below the threshold are set to 0 and those above the
threshold are set to 1 which is useful for feature engineering.
Code:
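import pandas
import numpy
from sklearn.preprocessing import Binarizer
# dataframe is assumed to be a pandas DataFrame loaded earlier,
# with 7 feature columns followed by one target column
array = dataframe.values
A = array[:, 0:7]   # feature columns
B = array[:, 7]     # target column
# threshold=0.0 is an assumed example value
binarizer = Binarizer(threshold=0.0).fit(A)
binaryA = binarizer.transform(A)
numpy.set_printoptions(precision=5)
print(binaryA[0:7, :])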
Example:
In the above case, fruits is a list that comprises three fruits. To access them
individually, we use their indexes. Python and C are 0-indexed languages, that is, the
first index is 0. MATLAB, on the contrary, starts from 1 and is thus a 1-indexed language.
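A short sketch of zero-based indexing follows. The contents of the list are hypothetical; only the name fruits and the fact that it holds three items come from the text.

```python
# Hypothetical contents; the text only says that fruits holds three fruits.
fruits = ["apple", "banana", "cherry"]

first = fruits[0]                # the first element lives at index 0
last = fruits[len(fruits) - 1]   # the last element lives at index n - 1
also_last = fruits[-1]           # negative indexes count from the end

print(first, last, also_last)
```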
Now that we know what arrays are, we shall understand them in detail by solving
some interview questions. Before that, let us see the functions that Python as a
language provides for arrays, also known as lists.
Python provides us with a function called copy. We can copy a list to
another just by calling the copy function.
new_list = old_list.copy()
We need to be careful while using the function. copy() is a shallow copy function, that
is, it only stores the references of the original list's elements in the new list. If the
given argument is a compound data structure like a list, then Python creates another
object of the same type (in this case, a new list), but for everything inside the old
list, only the reference is copied. Essentially, the new list consists of references to
the elements of the older list.
Hence, upon mutating a nested element (such as an inner list) through the original list,
the change is also visible through the new list. This can be dangerous in many
applications. Therefore, Python provides us with another functionality called deepcopy.
Intuitively, we may consider that deepcopy() would follow the same paradigm, and the
only difference would be that for each element we recursively call deepcopy.
Practically, this is not the case.
deepcopy() preserves the graph structure of the original compound data. Let us
understand this better with the help of an example:
from copy import deepcopy
a = [1,2]
b = [a, a]  # b holds two references to the same object a
c = deepcopy(b)
This is the tricky part: during the process of deepcopy(), a hashtable implemented as
a dictionary in Python is used to map each old_object reference onto its new_object
reference.
Therefore, this prevents unnecessary duplicates and thus preserves the structure of
the copied compound data structure. Thus, in this case, c[0] is not the same object as
a, as internally their addresses are different, even though their values are equal.
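The effect of that old-to-new mapping can be checked directly. In this small sketch the variable names on the left of each assignment are chosen for illustration; copy.deepcopy itself is the standard library function.

```python
from copy import deepcopy

a = [1, 2]
b = [a, a]        # two references to one and the same inner list
c = deepcopy(b)

# The copy contains a brand new inner list, not the original a ...
copied_is_new_object = c[0] is not a
# ... which still holds the same values ...
values_still_equal = c[0] == a
# ... and both slots of c point to that single new list, just as in b.
sharing_preserved = c[0] is c[1]

print(copied_is_new_object, values_still_equal, sharing_preserved)
```

Had deepcopy() simply recursed into every element without remembering what it had already copied, c[0] and c[1] would have ended up as two separate lists.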
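Normal copy
>>> a = [[1, 2, 3], [4, 5, 6]]  # assumed example value
>>> b = list(a)
>>> a
[[1, 2, 3], [4, 5, 6]]
>>> b
[[1, 2, 3], [4, 5, 6]]
>>> a[0][1] = 10
>>> a
[[1, 10, 3], [4, 5, 6]]
>>> b  # the shallow copy shares the inner lists, so it changes too
[[1, 10, 3], [4, 5, 6]]
Deep copy
>>> import copy
>>> b = copy.deepcopy(a)
>>> a
[[1, 10, 3], [4, 5, 6]]
>>> b
[[1, 10, 3], [4, 5, 6]]
>>> a[0][1] = 9
>>> a
[[1, 9, 3], [4, 5, 6]]
>>> b  # the deep copy is unaffected
[[1, 10, 3], [4, 5, 6]]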
Now that we have understood the concept of lists, let us solve interview questions to
get better exposure on the same.
We need to reach the end. Therefore, let us have a count that tells us how near we
are to the end. Consider the array A=[1,2,3,1,1]
Hence, we have a fair idea of the problem. Let us come up with a logic for the same.
Let us start from the end and move backwards, as that makes more sense
intuitively. We will use variables right and prev_r (denoting the previous right) to
keep track of the jumps.
Initially, right = prev_r = the index of the last element. For each element to the left
of prev_r, we consider its distance to prev_r and the number of jumps possible from that
element. If the element's index plus its jump value reaches prev_r, it becomes a
candidate, and we keep the leftmost such candidate as the new right, discarding the
earlier ones. Try it out using a pen and paper first. The logic will seem very
straightforward to implement. Later, implement it on your own and then verify
with the result.
def min_jmp(arr):
    n = len(arr)
    right = prev_r = n - 1
    count = 0
    # We start from the rightmost index and traverse the array to find the
    # leftmost index from which we can reach index prev_r
    while True:
        for j in range(prev_r - 1, -1, -1):
            # index j can reach prev_r in a single jump
            if j + arr[j] >= prev_r:
                right = j
        if prev_r != right:
            prev_r = right
        else:
            break
        count += 1
    # -1 signals that the end cannot be reached from index 0
    return count if right == 0 else -1

arr = [1, 2, 3, 1, 1]  # the example array from the discussion above
print(min_jmp(arr))
98. Given a string S consisting only ‘a’s and ‘b’s, print the last
index of the ‘b’ present in it.
When we are given a string of a's and b's, we can immediately find out the first
location of a character occurring. Therefore, to find the last occurrence of a
character, we reverse the string and find the first occurrence, which is equivalent to
the last occurrence in the original string.
Here, we are given input as a string. Therefore, we begin by splitting the characters
element-wise using the function split. Later, we reverse the array, find the first
occurrence position value, and get the index by finding the value len - position - 1,
where position is the index value in the reversed array.
def split(word):
    return [char for char in word]

a = input()
a = split(a)
a_rev = a[::-1]
pos = -1
for i in range(len(a_rev)):
    if a_rev[i] == 'b':
        # the first 'b' in the reversed list is the last 'b' in the original
        pos = len(a_rev) - i - 1
        print(pos)
        break
    else:
        continue
if pos == -1:
    print(-1)
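As a cross-check, Python strings already offer rfind(), which returns the highest index of a substring or -1 when it is absent. The wrapper name below is just for illustration.

```python
def last_index_of_b(s):
    """Return the last index of 'b' in s, or -1 if there is no 'b'."""
    return s.rfind("b")

print(last_index_of_b("abba"))
print(last_index_of_b("aaaa"))
```

In an interview it is still worth being able to write the manual loop, but knowing the built-in shows familiarity with the language.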
A = [1,2,3,4,5]
A <<2
[3,4,5,1,2]
A<<3
[4,5,1,2,3]
There exists a pattern here, that is, the first d elements are being interchanged with
the last n-d elements. Therefore we can just swap the elements. Correct? What if the
size of the array is huge, say 10000 elements? Copying large blocks at once may cause
memory errors, run-time errors etc. Therefore, we do it more carefully. We rotate the
elements one by one in order to prevent the above errors in the case of large arrays.
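def rot_left_once(arr):
    n = len(arr)
    tmp = arr[0]
    # shift every element one position to the left
    for i in range(n - 1):
        arr[i] = arr[i + 1]
    arr[n-1] = tmp

# Use the above function to repeat the process d times.
def rot_left(arr, d):
    for _ in range(d):
        rot_left_once(arr)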
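For comparison, the swap idea mentioned above can be sketched with list slicing. This version is not in-place: it builds a new list, so it trades extra memory for brevity. The function name is illustrative.

```python
def rotate_left(arr, d):
    """Return a new list equal to arr rotated left by d positions."""
    if not arr:
        return []
    d %= len(arr)  # rotating by len(arr) is a no-op
    return arr[d:] + arr[:d]

A = [1, 2, 3, 4, 5]
print(rotate_left(A, 2))
print(rotate_left(A, 3))
```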
Given an array arr[] of N non-negative integers which represents the height of blocks
at index i, where the width of each block is 1, compute how much water can be
trapped in between the blocks after raining.
Solution: We are given an array, where each element denotes the height of the block.
One unit of height is equal to one unit of water, given there exists space between the
2 elements to store it. Therefore, we need to find out all such pairs that exist which
can store water. We need to take care of the possible cases:
Therefore, let us start with the extreme elements, and move towards the centre.
n = int(input())
arr = [int(i) for i in input().split()]
# we use two arrays left[] and right[]: left[i] is the tallest block in arr[0..i]
# and right[i] is the tallest block in arr[i..n-1]
left, right = [arr[0]], [0] * n
right[n - 1] = arr[-1]
for elem in arr[1:]:
    left.append(max(left[-1], elem))
for i in range(n - 2, -1, -1):
    right[i] = max(arr[i], right[i + 1])
water = 0
# once we have the arrays left and right, we can find the water held above each block
for i in range(1, n - 1):
    add_water = min(left[i - 1], right[i]) - arr[i]
    if add_water > 0:
        water += add_water
print(water)
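The idea of starting at the extreme elements and moving towards the centre can also be sketched with two pointers, which avoids the two helper arrays. This is an alternative formulation, not the listing above; the function name and test heights are chosen for illustration.

```python
def trapped_water(heights):
    """Two-pointer sketch: O(n) time, O(1) extra space."""
    left, right = 0, len(heights) - 1
    left_max = right_max = 0
    water = 0
    while left < right:
        if heights[left] <= heights[right]:
            # The right side is at least as tall, so left_max bounds the water here.
            left_max = max(left_max, heights[left])
            water += left_max - heights[left]
            left += 1
        else:
            # The left side is taller, so right_max bounds the water here.
            right_max = max(right_max, heights[right])
            water += right_max - heights[right]
            right -= 1
    return water

print(trapped_water([3, 0, 2, 0, 4]))
```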
Simply put, eigenvectors are the directions that a linear transformation leaves
unchanged apart from scaling, i.e. the directions along which effects like compression,
stretching or flipping are applied.
Eigenvalues are the magnitudes of that scaling along the direction of each
eigenvector.
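A quick way to make this concrete is to check the defining relation A v = lambda v by hand. The 2x2 matrix below is an arbitrary example chosen so that the arithmetic stays in whole numbers; the helper names are illustrative.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def is_eigenpair(matrix, vector, value):
    """Check whether matrix @ vector equals value * vector."""
    return mat_vec(matrix, vector) == [value * x for x in vector]

A = [[2, 1],
     [1, 2]]

# Along direction (1, 1) the transformation stretches by a factor of 3.
print(is_eigenpair(A, [1, 1], 3))
# Along direction (1, -1) it leaves lengths unchanged (factor 1).
print(is_eigenpair(A, [1, -1], 1))
# (1, 0) is not an eigenvector: it gets rotated as well as scaled.
print(is_eigenpair(A, [1, 0], 2))
```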
Ans. Basic logistic regression is a binary classifier, so on its own it handles only 2
classes. It can be extended to more classes through one-vs-rest or multinomial
(softmax) logistic regression. Algorithms like Decision Trees and Naïve Bayes
classifiers handle multi-class classification natively.
Ans. Classifier penalty, classifier solver and classifier C are the tunable
hyperparameters of a Logistic Regression Classifier. These can be specified
with candidate values in Grid Search to tune a Logistic Classifier.
Ans. The most important hyperparameters which one can tune in decision trees are:
1. Splitting criteria
2. Min_leaves
3. Min_samples
4. Max_depth
Ans. It is a situation in which the variance of a variable is unequal across the range of
values of the predictor variable.
111. Is ARIMA model a good fit for every time series problem?
Ans. No, the ARIMA model is not suitable for every type of time series problem. There
are situations where the ARMA model and others also come in handy.
ARIMA is best when several standard temporal structures, such as trend and
autocorrelation, need to be captured in time series data.
Ans. If very few data samples are there, we can make use of oversampling to
produce new data points.
Ans. The gamma value, C value and the type of kernel are the hyperparameters of an
SVM model.
Ans. Pandas profiling is a step to find the effective amount of usable data. It gives us
statistics on NULL values and usable values, and thus makes variable selection
and data selection for building models in the preprocessing phase very effective.
PCA takes into consideration the variance. LDA takes into account the distribution of
classes.
Manhattan
Minkowski
Tanimoto
Jaccard
Mahalanobis
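A few of these distance metrics are simple enough to sketch in plain Python. Only Manhattan, Minkowski and Jaccard are shown; Tanimoto and Mahalanobis are left out to keep the sketch short. The function names and sample inputs are illustrative.

```python
def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p):
    """Generalised distance: p=1 gives Manhattan, p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def jaccard_distance(a, b):
    """1 minus the ratio of shared items to all items across two sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)

print(manhattan([0, 0], [3, 4]))
print(minkowski([0, 0], [3, 4], 2))
print(jaccard_distance({1, 2, 3}, {2, 3, 4}))
```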
Ans. We should use ridge regression when we want to use all predictors and not
remove any as it reduces the coefficient values but does not nullify them.
Ans. Random Forest, Xgboost and plot variable importance charts can be used for
variable selection.
Ans. Bagging is the technique used by Random Forests. Random forests are a
collection of trees which work on sampled data from the original dataset with the
final prediction being a voted average of all trees.
Ans. High bias error means that the model we are using is ignoring the important
trends in the data and the model is underfitting.
To reduce underfitting:
Sometimes it also gives the impression that the data is noisy. Hence noise from data
should be removed so that most important signals are found by the model to make
effective predictions.
Increasing the number of epochs results in increasing the duration of training of the
model. It’s helpful in reducing the error.
1 = not correlated.
Between 1 and 5 = moderately correlated.
Greater than 5 = highly correlated.
Ans. A categorical predictor can be treated as a continuous one when the nature of
the data points it represents is ordinal. If the predictor variable has ordinal data,
then it can be treated as continuous, and its inclusion in the model can improve the
performance of the model.
Ans. A pipeline is a sophisticated way of writing software such that each intended
action while building a model can be serialized, and the process calls the individual
functions for the individual tasks. The tasks are carried out in sequence for a given
sequence of data points, and the entire process can be run on n threads by use of
composite estimators in scikit-learn.
Ans. We can use a custom iterative sampling such that we continuously add
samples to the train set. We should only keep in mind that the sample used for
validation should be added to the next train sets and a new sample is used for
validation.
1. Reduces overfitting
2. Shortens the size of the tree
3. Reduces complexity of the model
4. Increases bias
Ans. The normal distribution is a bell-shaped curve. Most of the data points are
around the mean, which coincides with the median because the curve is symmetric with
no skewness. Approximately 68 per cent of the data lies within one standard deviation
of the mean.
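The 68 per cent figure can be checked numerically. For a normal distribution, the fraction of data within k standard deviations of the mean equals erf(k / sqrt(2)); the helper name below is illustrative.

```python
import math

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(fraction_within(k), 4))
```

This is the familiar 68-95-99.7 rule.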
A very small chi-square test statistic implies that the observed data fits the expected
data extremely well.
Example – “Stress testing, a routine diagnostic tool used in detecting heart disease,
results in a significant number of false positives in women”
Example – “it’s possible to have a false negative—the test says you aren’t pregnant
when you are”
Ans. In regression, the expected error is the sum of the squared bias, the variance and
the irreducible error. The bias and variance components can be reduced, but not the
irreducible error.
Type I and Type II errors in machine learning refer to false values. Type I is
equivalent to a False positive while Type II is equivalent to a False negative. In a
Type I error, a null hypothesis which is actually true gets rejected. Similarly, in a
Type II error, a null hypothesis which is actually false fails to get rejected.
Naive Bayes:
Works well with small datasets, compared to Decision Trees which need more data
Lesser overfitting
Smaller in size and faster in processing
Decision Trees:
Decision Trees are very flexible, easy to understand, and easy to debug
No preprocessing or transformation of features required
Prone to overfitting but you can use pruning or Random forests to avoid
that.
Receiver operating characteristic (ROC) curve: The ROC curve illustrates the diagnostic
ability of a binary classifier. It is created by plotting the True Positive Rate against
the False Positive Rate at various threshold settings. The performance metric of the
ROC curve is AUC (area under the curve). The higher the area under the curve, the better
the prediction power of the model.
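Both quantities can be sketched without any library. The first helper computes one point of the ROC curve for a given threshold. The second computes AUC through its ranking interpretation, i.e. the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count as half). Labels are assumed to be 0/1 with both classes present; the names and sample data are illustrative.

```python
def roc_point(labels, scores, threshold):
    """Return (false positive rate, true positive rate) at one threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return fp / negatives, tp / positives

def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs that are ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_point(labels, scores, 0.5))
print(auc(labels, scores))
```

The pairwise version is quadratic in the number of samples, so it is meant for understanding rather than for large data sets.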
The same calculation can be applied to a naive model that assumes absolutely no
predictive power, and a saturated model assuming perfect predictions.
The likelihood values are used to compare different models, while the deviances
(test, naive, and saturated) can be used to determine the predictive power and
accuracy. The accuracy of a logistic regression model is usually highest on the
development data set, and it typically drops once the model is applied to
another data set.
How well does the model fit the data?, Which predictors are most important?, Are the
predictions accurate?
1. Akaike Information Criterion (AIC): In simple terms, AIC estimates the relative amount
of information lost by a given model. So the less information lost the higher the
quality of the model. Therefore, we always prefer models with minimum AIC.
3. Confusion Matrix: In order to find out how well the model does in predicting the
target variable, we use a confusion matrix/ classification rate. It is nothing but a
tabular representation of actual Vs predicted values which helps us to find the
accuracy of the model.
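A confusion matrix is easy to tabulate by counting (actual, predicted) pairs. The sketch below uses made-up binary labels, and the helper name is illustrative.

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) pair occurs."""
    return Counter(zip(actual, predicted))

# Made-up labels for illustration.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(actual, predicted)
tp, fn = cm[(1, 1)], cm[(1, 0)]
fp, tn = cm[(0, 1)], cm[(0, 0)]
accuracy = (tp + tn) / len(actual)

print("TP", tp, "FN", fn, "FP", fp, "TN", tn)
print("accuracy", accuracy)
```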
The first reason is that XGBoost is an ensemble method that uses many trees to make a
decision, so it gains power by building trees repeatedly, each one correcting the
errors of the previous ones.
SVM is a linear separator. When data is not linearly separable, SVM needs a kernel to
project the data into a space where it can separate it. Therein lies its greatest
strength and weakness: by being able to project data into a high-dimensional space, SVM
can find a linear separation for almost any data, but at the same time it needs to use
a kernel, and we can argue that there is not a perfect kernel for every dataset.
One is used for ranking and the other is used for regression.
In ranking, the only thing of concern is the ordering of a set of examples. We only
want to know which example has the highest rank, which one has the second-
highest, and so on. From the data, we only know that example 1 should be ranked
higher than example 2, which in turn should be ranked higher than example 3, and so
on. We do not know by how much example 1 is ranked higher than example 2, or
whether this difference is bigger than the difference between examples 2 and 3.
You have the basic SVM – hard margin. This assumes that data is very well behaved,
and you can find a perfect classifier – which will have 0 error on train data.
Soft-margin
Data is usually not well behaved, so SVM hard margins may not have a solution at all.
So we allow for a little bit of error on some points. So the training error will not be 0,
but average error over all points is minimized.
Kernels
The above assumes that the best classifier is a straight line. But what if it is not a
straight line (e.g. it is a circle, where inside the circle is one class and outside is
another class)? If we are able to map the data into higher dimensions, the higher
dimension may give us a straight line.
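The circle example can be made concrete with an explicit feature map. In this sketch each 2-D point gets a third coordinate x^2 + y^2, and a flat threshold on that new coordinate separates the classes. The points, the radius of 1 and the function names are assumptions for illustration; a real kernel SVM performs such a mapping implicitly.

```python
def lift(point):
    """Map (x, y) to (x, y, x^2 + y^2)."""
    x, y = point
    return (x, y, x * x + y * y)

def classify(point, radius_squared=1.0):
    """A linear rule in the lifted space: threshold on the third coordinate."""
    return "inside" if lift(point)[2] < radius_squared else "outside"

inside_points = [(0.1, 0.2), (-0.5, 0.3), (0.0, -0.7)]
outside_points = [(1.5, 0.0), (-1.2, 1.1), (0.3, -2.0)]

print([classify(p) for p in inside_points])
print([classify(p) for p in outside_points])
```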
Linear classifiers learn linear functions from your data that map your input to
scores like so: scores = Wx + b, where W is a matrix of learned weights, b is a learned
bias vector that shifts your scores, and x is your input data. This type of function may
look familiar to you if you remember y = mx + b from high school.
A typical SVM loss function (the function that tells you how good your calculated
scores are in relation to the correct labels) would be hinge loss. It takes the form:
Loss = sum over all scores except the correct score of max(0, score - score(correct
class) + 1).
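The hinge loss formula translates almost word for word into code. This is a minimal sketch for a single example; the function name and the sample scores are made up, and the margin of 1 matches the formula above.

```python
def hinge_loss(scores, correct_class, margin=1.0):
    """Multiclass hinge loss for one example."""
    correct = scores[correct_class]
    return sum(
        max(0.0, s - correct + margin)
        for j, s in enumerate(scores)
        if j != correct_class
    )

# The wrong class at index 1 outscores the correct class at index 0, so the loss is positive.
print(hinge_loss([3.2, 5.1, -1.7], correct_class=0))
# The correct class wins by more than the margin, so the loss is zero.
print(hinge_loss([10.0, 1.0, 2.0], correct_class=0))
```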
P(X|Y,Z)=P(X|Z), whereas more general Bayes Nets (sometimes called Bayesian Belief
Networks) will allow the user to specify which attributes are, in fact, conditionally
independent.
For the Bayesian network as a classifier, the features are selected based on some
scoring functions like the Bayesian scoring function and minimal description length
(the two are equivalent in theory to each other given that there is enough training
data). The scoring functions mainly restrict the structure (connections and directions)
and the parameters (likelihood) using the data. After the structure has been learned,
the class is only determined by the nodes in the Markov blanket (its parents, its
children, and the parents of its children), and all variables outside the Markov
blanket can be discarded.
First, Naive Bayes is not one algorithm but a family of algorithms that inherit the
following attributes:
1.Discriminant Functions
3.Bayesian Theorem
Since these are generative models, based upon the assumptions made about the
distribution of each feature, they may be classified as Gaussian
Naive Bayes, Multinomial Naive Bayes, Bernoulli Naive Bayes, etc.
Selection bias stands for the bias which is introduced by the selection of
individuals, groups or data for analysis in such a way that proper randomization
is not achieved. It results in a sample that is not representative of the
population intended to be analyzed, and it is sometimes referred to as the selection
effect. It is the distortion of a statistical analysis which results from the
method of collecting samples. If you don't take selection bias into account,
then some conclusions of the study may not be accurate.
Recall is also known as sensitivity: the fraction of the total amount of relevant
instances which were actually retrieved.
Both precision and recall are therefore based on an understanding and measure of
relevance.
The information gain is based on the decrease in entropy after a dataset is split on
an attribute. Constructing a decision tree is all about finding the attribute that
returns the highest information gain (i.e., the most homogeneous branches). Step 1:
Calculate entropy of the target.
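Entropy and information gain can be sketched in a few lines. The labels and the candidate splits below are made up for illustration; groups stands for the subsets of the target produced by splitting on one attribute.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def information_gain(labels, groups):
    """Entropy of the target minus the weighted entropy of the split groups."""
    total = len(labels)
    remainder = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - remainder

target = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]
useless_split = [["yes", "no"], ["yes", "no"]]

print(entropy(target))
print(information_gain(target, perfect_split))
print(information_gain(target, useless_split))
```

The perfect split produces homogeneous branches and recovers all of the target's entropy, while the useless split gains nothing.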
SVM algorithms basically have advantages in terms of complexity. First, I would like to
make clear that both Logistic regression and SVM can form non-linear decision
surfaces and can be coupled with the kernel trick. If Logistic regression can be
coupled with a kernel, then why use SVM?
Linear Regression Analysis consists of more than just fitting a straight line through a
cloud of data points. It consists of 3 stages–
“KickStart your Artificial Intelligence Journey with Great Learning which offers high-
rated Artificial Intelligence courses with world-class training by industry leaders.
Whether you’re interested in machine learning, data mining, or data analysis, Great
Learning has a course for you!”
FAQ:
There is no fixed or definitive guide through which you can start your machine
learning career. The first step is to understand the basic principles of the subject and
learn a few key concepts such as algorithms and data structures, coding
capabilities, calculus, linear algebra, statistics. The next step would be to take up an
ML course, or read the top books for self-learning. You can also work on projects to get
hands-on experience.
Any way that suits your style of learning can be considered as the best way to learn.
Different people may enjoy different methods. Some of the common ways would be
through taking up a Machine Learning Course, watching YouTube videos, reading
blogs on relevant topics, and reading books which can help you self-learn.
Most hiring companies will look for a masters or doctoral degree in the relevant
domain. The field of study includes computer science or mathematics. But having
the necessary skills even without the degree can help you land an ML job too.
Machine Learning for beginners will consist of the basic concepts such as the types of
Machine Learning (Supervised, Unsupervised, Reinforcement Learning). Each of these
types of ML has different algorithms and libraries within it, such as
Classification and Regression. There are various classification and regression
algorithms, such as Linear Regression. This would be the first thing you will
learn before moving ahead with other concepts.
You will need to know statistical concepts, linear algebra, probability, Multivariate
Calculus, Optimization. As you go into the more in-depth concepts of ML, you will
need more knowledge regarding these topics.
Stay tuned to this page for more such information on interview questions and career
assistance. You can check our other blogs about Machine Learning for more
information.
You can also take up the PGP Artificial Intelligence and Machine Learning Course
offered by Great Learning in collaboration with UT Austin. The course offers online
learning with mentorship and provides career assistance as well. The curriculum has
been designed by faculty from Great Lakes and The University of Texas at Austin-
McCombs and helps you power ahead your career.
Tanuja Bahirat
Tanuja is a content writer who enjoys spending time in nature, watching football, and journaling. She loves
attending music festivals and reading. In her current journey, she writes about recent advancements in
technology and its impact on the world.