Interview questions by company
Company: Uber
Role: Data Scientist
Company: TCS
Role: Data Scientist
Train-test split.
Modelling.
Hyperparameter Tuning.
o Transaction amount,
o Transaction count,
o Transaction frequency,
o Transaction category: bar, grocery, jewellery, etc.
o Transaction channel: credit card, debit card, international wire transfer, etc.
o Distance between the transaction address and the mailing address,
o Fraud/risk score.
Mean
Median
The central limit theorem (CLT) says that, given a sufficiently large sample size, the distribution of sample means is approximately normal, regardless of the shape of the underlying population distribution. Putting it all together, the CLT says that when you have roughly 30 or more observations in your sample, the averages of such samples follow an approximately bell-shaped (normal) curve.
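As an illustration, a quick NumPy simulation (the population, sample size, and seed are invented for this sketch) shows sample means of a clearly skewed population settling into a bell shape:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw 10,000 samples of size 30 from a non-normal (exponential)
# population and record each sample's mean.
sample_means = rng.exponential(scale=1.0, size=(10_000, 30)).mean(axis=1)

# By the CLT, the means cluster around the population mean (1.0)
# with standard deviation close to sigma / sqrt(n) = 1 / sqrt(30).
print(round(sample_means.mean(), 2))  # close to 1.0
print(round(sample_means.std(), 2))   # close to 0.18
```

A histogram of `sample_means` would look approximately normal even though the exponential population itself is heavily skewed.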
6. How will you find outliers in a dataset, and is it always advisable to remove them?
7. Explain Machine Learning.
Mean imputation. Simply calculate the mean of the observed values for that
variable for all individuals who are non-missing.
Substitution.
Hot deck imputation.
Cold deck imputation.
Regression imputation.
Stochastic regression imputation.
Interpolation and extrapolation.
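The first item, mean imputation, can be sketched with pandas (the column and values below are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy column with missing values (illustrative data).
df = pd.DataFrame({"income": [40.0, 50.0, np.nan, 60.0, np.nan]})

# Mean imputation: compute the mean of the observed (non-missing)
# values and substitute it for each NaN.
observed_mean = df["income"].mean()          # NaNs are ignored -> 50.0
df["income_imputed"] = df["income"].fillna(observed_mean)

print(df["income_imputed"].tolist())  # [40.0, 50.0, 50.0, 60.0, 50.0]
```

Mean imputation is simple but shrinks the variable's variance, which is why the alternatives listed above exist.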
Company: Verizon
Role: Data Scientist
OR
Families with more than one vehicle; cars sold and not sold; cars registered in the city; cars from outside the city; cars on government duty, etc.
3. OLS vs MLE?
The least squares method is the most widely used procedure for developing estimates of the model parameters. For simple linear regression, the least squares estimates of the model parameters β0 and β1 are denoted b0 and b1. Using these estimates, an estimated regression equation is constructed: ŷ = b0 + b1x.
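The closed-form least squares estimates can be sketched in NumPy (the data here is synthetic, generated for illustration):

```python
import numpy as np

# Illustrative data generated from y = 2 + 3x plus small noise.
rng = np.random.default_rng(1)
x = np.arange(20, dtype=float)
y = 2.0 + 3.0 * x + rng.normal(0.0, 0.5, size=20)

# Closed-form estimates for simple linear regression:
#   b1 = sum((x - x̄)(y - ȳ)) / sum((x - x̄)^2),   b0 = ȳ - b1 * x̄
b1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean()) ** 2).sum()
b0 = y.mean() - b1 * x.mean()

print(round(b0, 1), round(b1, 1))  # close to the true values (2, 3)
```

MLE, by contrast, maximises the likelihood of the data under an assumed distribution; for linear regression with Gaussian errors, the two approaches give the same coefficient estimates.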
Company: Fractal
Role: Data Scientist
2. Map function
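A short sketch of how map works (the list and function below are hypothetical examples):

```python
# map applies a function lazily to every element of an iterable;
# wrap it in list() to materialise the results.
nums = [1, 2, 3, 4]
squares = list(map(lambda n: n * n, nums))
print(squares)  # [1, 4, 9, 16]

# A list comprehension is the idiomatic equivalent in most Python code:
same = [n * n for n in nums]
```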
To put it another way: a list lookup is fast for the first item because there is nothing to search; it is the first item, done. But to reach the second item the list has to look through the first item, for the third it looks through the first and then the second, and so on. So each lookup takes more and more time, and the larger the list, the longer it takes (O(n) on average). A dictionary, by contrast, has a more or less fixed lookup time: it also increases as the dictionary gets larger, but at a much slower pace, so by comparison it is effectively constant (O(1) on average).
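This difference can be demonstrated with a quick timing comparison (the sizes and repeat counts are arbitrary illustrations):

```python
import timeit

n = 100_000
data_list = list(range(n))
data_dict = {i: i for i in range(n)}
target = n - 1  # worst case for the list: it must scan every element

# Membership test: O(n) linear scan for the list vs O(1) hash lookup
# for the dictionary.
list_time = timeit.timeit(lambda: target in data_list, number=100)
dict_time = timeit.timeit(lambda: target in data_dict, number=100)

print(list_time > dict_time)  # the linear scan is far slower
```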
8) How do you split time series data, and which evaluation metrics are used for time series data?
9) How did you deploy your model in production? How often do
you retrain it?
Company: Wipro
Role: Data Scientist
The HAVING clause is used to filter records from groups based on a specified condition.
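A minimal sketch using Python's built-in sqlite3 module (the table and values are invented for illustration):

```python
import sqlite3

# In-memory table of (department, salary) rows -- illustrative data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (dept TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("sales", 100), ("sales", 200), ("hr", 50), ("hr", 60), ("it", 300)],
)

# WHERE filters rows before grouping; HAVING filters the groups
# themselves, so it can reference aggregates like AVG(salary).
rows = conn.execute(
    "SELECT dept, AVG(salary) FROM emp GROUP BY dept HAVING AVG(salary) > 100"
).fetchall()

print(sorted(rows))  # only groups whose average salary exceeds 100 remain
```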
Step 3: Finally, you can just cut that stack of 4 pieces in half –
using your third and final cut – and then you will end up with
8 pieces of cake!
7. Explain k-means clustering
Now, look at the two figures above. What do you observe? The first figure shows the data before applying the k-means clustering algorithm: all three categories are mixed together, and faced with such data in the real world you would not be able to tell the categories apart.
Now look at the second figure (fig 2). It shows the data after applying the k-means clustering algorithm: the three different kinds of items have been separated into three different groups, called clusters.
How does the k-means clustering algorithm work?
K-means clustering tries to group similar items into clusters: it measures the similarity between items and groups them accordingly. The algorithm works in three steps: (1) choose k initial centroids, (2) assign each point to its nearest centroid, and (3) recompute each centroid as the mean of its assigned points, repeating steps 2 and 3 until the assignments stop changing.
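The steps above can be sketched in NumPy on toy data (the blobs, seed, and iteration count are illustrative choices, not part of the question):

```python
import numpy as np

rng = np.random.default_rng(42)

# Three well-separated 2-D blobs, 50 points each (illustrative data).
X = np.vstack([
    rng.normal(loc, 0.3, size=(50, 2))
    for loc in ([0.0, 0.0], [5.0, 5.0], [0.0, 5.0])
])

k = 3
# Step 1: choose k initial centroids (here, one point from each blob,
# picked by index so the run is reproducible).
centroids = X[[0, 50, 100]].copy()

for _ in range(10):
    # Step 2: assign every point to its nearest centroid.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Step 3: move each centroid to the mean of its assigned points;
    # steps 2 and 3 repeat until the assignments stop changing.
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(np.bincount(labels))  # each blob recovered as one cluster of 50
```

Library implementations (e.g. scikit-learn's KMeans) add smarter initialisation and convergence checks, but the loop is the same idea.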
Classification
Company: Accenture
Role: Data Scientist
10. Explain bias - variance trade off. How does this affect the
model?
Stats:
What is a p-value?
What are histograms?
Basic Architecture
There are two main parts to a CNN architecture:
A convolution tool that separates and identifies the various features of the image for analysis, in a process called Feature Extraction.
A fully connected layer that uses the output of the convolution process and predicts the class of the image based on the features extracted in the previous stages.
Convolution Layers
There are three types of layers that make up the CNN
which are the convolutional layers, pooling layers, and
fully-connected (FC) layers. When these layers are stacked,
a CNN architecture will be formed. In addition to these
three layers, there are two more important parameters
which are the dropout layer and the activation function
which are defined below.
1. Convolutional Layer
This layer is the first layer that is used to extract the various
features from the input images. In this layer, the mathematical
operation of convolution is performed between the input image
and a filter of a particular size MxM. By sliding the filter over
the input image, the dot product is taken between the filter and
the parts of the input image with respect to the size of the filter
(MxM).
The output is termed the feature map, which gives us information about the image such as corners and edges. Later, this feature map is fed to subsequent layers to learn several other features of the input image.
2. Pooling Layer
In most cases, a Convolutional Layer is followed by a Pooling Layer. The primary aim of this layer is to decrease the size of the convolved feature map to reduce computational cost. It does so by reducing the connections between layers, and it operates independently on each feature map. Depending upon the method used, there are several types of Pooling operations.
In Max Pooling, the largest element is taken from each section of the feature map. Average Pooling calculates the average of the elements in a predefined-size section, while Sum Pooling computes the total sum of the elements in that section. The Pooling Layer usually serves as a bridge between the Convolutional Layer and the FC Layer.
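Max and average pooling can be sketched in NumPy (the window size and feature map below are illustrative):

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling with a size x size window (stride = size)."""
    n = feature_map.shape[0] // size
    # Split rows and columns into size x size blocks.
    blocks = feature_map[:n * size, :n * size].reshape(n, size, n, size)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))  # average pooling

fmap = np.array([[1., 2., 5., 6.],
                 [3., 4., 7., 8.],
                 [9., 8., 1., 0.],
                 [7., 6., 3., 2.]])

print(pool2d(fmap))               # max of each 2x2 block
print(pool2d(fmap, mode="mean"))  # average of each 2x2 block
```

Either way, the 4x4 feature map shrinks to 2x2, which is exactly the cost reduction described above.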
4. Dropout
Usually, when all the features are connected to the FC layer, the model can overfit the training dataset. Overfitting occurs when a model performs so well on the training data that its performance degrades when it is used on new data.
To overcome this problem, a dropout layer is used, in which a fraction of the neurons is dropped from the neural network during the training process, reducing the effective size of the model. With a dropout rate of 0.3, 30% of the nodes are dropped out randomly from the neural network.
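A sketch of dropout in NumPy, assuming the common "inverted dropout" convention of rescaling the surviving activations (the rate, seed, and input are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

def dropout(x, rate=0.3):
    """Inverted dropout: zero out ~`rate` of units, rescale the rest."""
    mask = rng.random(x.shape) >= rate
    # Dividing by (1 - rate) keeps the expected activation unchanged,
    # so no rescaling is needed at inference time.
    return np.where(mask, x / (1.0 - rate), 0.0)

x = np.ones(1000)
out = dropout(x, rate=0.3)
print(round((out == 0).mean(), 2))  # roughly 0.3 of the units are dropped
```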
5. Activation Functions
Finally, one of the most important components of the CNN model is the activation function. Activation functions are used to learn and approximate any kind of continuous, complex relationship between variables of the network. In simple words, an activation function decides which information should be passed forward through the network and which should not. It adds non-linearity to the network. There are several commonly used activation functions, such as ReLU, Softmax, tanh and Sigmoid, each with a specific usage. For a binary classification CNN model, sigmoid and softmax functions are preferred, and for multi-class classification, softmax is generally used.
2) If we put a 3×3 filter over a 6×6 image, what will be the size of the output image?
We get a 4×4 image: with no padding and stride 1, the output size is n − f + 1 = 6 − 3 + 1 = 4.
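This can be verified with a minimal NumPy sketch of a "valid" convolution (strictly speaking a cross-correlation, as CNNs use; the image and kernel values are arbitrary):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the kernel over the image with stride 1 and no padding."""
    n, f = image.shape[0], kernel.shape[0]
    out = n - f + 1                       # output size: n - f + 1
    result = np.empty((out, out))
    for i in range(out):
        for j in range(out):
            # Elementwise product of the window with the kernel, summed.
            result[i, j] = (image[i:i + f, j:j + f] * kernel).sum()
    return result

image = np.arange(36, dtype=float).reshape(6, 6)  # 6x6 input
kernel = np.ones((3, 3)) / 9.0                    # 3x3 averaging filter

print(conv2d_valid(image, kernel).shape)  # (4, 4)
```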
4. Chi-Square test
A chi-square test is a statistical test used to compare observed
results with expected results. The purpose of this test is to
determine if a difference between observed data and expected
data is due to chance, or if it is due to a relationship between
the variables you are studying.
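The statistic can be computed by hand (assuming SciPy is available for the p-value; the die-roll counts below are invented for illustration):

```python
import numpy as np
from scipy.stats import chi2

# Illustrative experiment: 60 die rolls, expecting 10 of each face.
observed = np.array([8, 9, 12, 11, 10, 10])
expected = np.full(6, 10.0)

# Chi-square statistic: sum((O - E)^2 / E), df = categories - 1.
stat = ((observed - expected) ** 2 / expected).sum()
p_value = chi2.sf(stat, df=len(observed) - 1)

print(round(stat, 1), p_value > 0.05)  # small statistic -> consistent with chance
```

A large p-value here means the observed/expected gap is plausibly due to chance, which is exactly the question the test answers.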
5. A/B testing
8. ANOVA test
9. Cross validation
The Area Under the Curve (AUC) is the measure of the ability of
a classifier to distinguish between classes and is used as a
summary of the ROC curve. The higher the AUC, the better the
performance of the model at distinguishing between the
positive and negative classes.
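AUC can also be computed directly from its probabilistic definition: the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one (ties count half). The labels and scores below are made up for illustration:

```python
import numpy as np

# Illustrative classifier scores and true labels.
y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])

pos = y_score[y_true == 1]
neg = y_score[y_true == 0]

# Compare every positive score against every negative score.
pairs = pos[:, None] - neg[None, :]
auc = ((pairs > 0).sum() + 0.5 * (pairs == 0).sum()) / pairs.size

print(auc)  # fraction of correctly ordered positive/negative pairs
```

This matches what a library routine such as scikit-learn's roc_auc_score would return for the same inputs.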
14. Which metric is used to split a node in a Decision Tree?
Company: Genpact
Role: Data Scientist
2. Linear Function
We saw the problem with the step function: its gradient is zero, because there is no component of x in the binary step function. Instead of a binary step, we can use a linear function, defined as f(x) = ax.
Although the gradient here does not become zero, it is a constant that does not depend on the input value x at all. This implies that the weights and biases will be updated during backpropagation, but the updating factor will be the same every time.
In this scenario, the neural network will not really reduce the error, since the gradient is the same for every iteration. The network will not be able to train well and capture complex patterns from the data. Hence, a linear function might only be suitable for simple tasks where interpretability is highly desired.
3. Sigmoid
The next activation function we will look at is the Sigmoid function, one of the most widely used non-linear activation functions. Sigmoid transforms values into the range 0 to 1; its mathematical expression is f(x) = 1 / (1 + e^(−x)).
4. Tanh
The tanh function is very similar to the sigmoid function; the only difference is that it is symmetric around the origin. The range of values in this case is from −1 to 1, so the inputs to the next layers will not always have the same sign. The tanh function is defined as tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)).
5. ReLU
The ReLU function is another non-linear activation function that has gained
popularity in the deep learning domain. ReLU stands for Rectified Linear Unit.
The main advantage of using the ReLU function over other activation functions is that it does not activate all the neurons at the same time: a neuron is deactivated only if the output of the linear transformation is less than 0, i.e. f(x) = max(0, x). For negative input values the result is zero, meaning the neuron does not get activated. Since only a certain number of neurons are activated at once, the ReLU function is far more computationally efficient than the sigmoid and tanh functions.
6. Leaky ReLU
The Leaky ReLU function is an improved version of the ReLU function. As we saw, for the ReLU function the gradient is 0 for x < 0, which deactivates the neurons in that region.
Leaky ReLU is defined to address this problem. Instead of defining the function as 0 for negative values of x, we define it as an extremely small linear component of x: f(x) = x for x ≥ 0 and f(x) = ax for x < 0, where a is a small constant such as 0.01.
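NumPy sketches of the activations discussed above (the sample inputs are arbitrary):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))       # squashes into (0, 1)

def tanh(x):
    return np.tanh(x)                     # symmetric, range (-1, 1)

def relu(x):
    return np.maximum(0.0, x)             # zero for negative inputs

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)  # small slope instead of zero

x = np.array([-2.0, 0.0, 2.0])
print(relu(x))        # negatives clipped to zero
print(leaky_relu(x))  # negatives kept, scaled by alpha
print(sigmoid(0.0))   # 0.5
```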
9. What are the conditions that should be satisfied for a time
series to be stationary?
Company: Quantiphi
Role: Machine Learning Engineer
Company: Cognizant
Role: Data Scientist
Company: Deloitte
Role: Data Scientist
1. Conditional Probability
Company: Axtria
------------
1. RNN, NN and CNN difference.
Supervised Learning
Types of Problems
Regression problems
Linear Regression
Nonlinear Regression
Bayesian Linear Regression
Unsupervised Learning
Neural Networks
Principal Component Analysis
Reinforcement Learning
5. What is Multicollinearity
Company: Bridgei2i
Role: Senior Analytics Consultant
3) How can you iterate over a list and also retrieve element
indices at the same time?
Use the enumerate function. It takes each element in a sequence (like a list) and pairs its position with it. For example:
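A minimal illustration (the list contents are hypothetical):

```python
# enumerate yields (index, element) pairs as you iterate.
fruits = ["apple", "banana", "cherry"]

pairs = list(enumerate(fruits))
print(pairs)  # [(0, 'apple'), (1, 'banana'), (2, 'cherry')]

# Typical loop form; start=1 shifts the numbering if needed.
for i, fruit in enumerate(fruits, start=1):
    print(i, fruit)
```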
Company: Deloitte
Role: Data Scientist
1. Conditional Probability
6. Why is the t-value the same for a 90% two-tailed test and a 95% one-tailed test?
Company: Latentview.
Working Procedure:
First, the total number of oversampled observations, N, is set. Generally it is chosen so that the binary class distribution becomes 1:1, but it can be tuned down as needed. The iteration then starts by selecting a positive-class instance at random. Next, the K nearest neighbours (by default 5) of that instance are obtained, and N of these K instances are chosen to interpolate new synthetic instances. To do this, the difference between the feature vector and each chosen neighbour is calculated using any distance metric. This difference is multiplied by a random value in (0, 1] and added to the original feature vector.
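The procedure above can be sketched in NumPy (a simplified illustration, not the reference SMOTE implementation; the data, seed, and parameters are invented):

```python
import numpy as np

rng = np.random.default_rng(7)

def smote_sample(X_minority, n_new, k=5):
    """Generate n_new synthetic points by SMOTE-style interpolation."""
    new_points = []
    for _ in range(n_new):
        # Pick a random minority-class instance.
        i = rng.integers(len(X_minority))
        x = X_minority[i]
        # Find its k nearest minority neighbours (index 0 is itself).
        dists = np.linalg.norm(X_minority - x, axis=1)
        neighbours = np.argsort(dists)[1:k + 1]
        # Interpolate: x + gap * (neighbour - x), gap drawn from (0, 1).
        nb = X_minority[rng.choice(neighbours)]
        gap = rng.uniform(0.0, 1.0)
        new_points.append(x + gap * (nb - x))
    return np.array(new_points)

X_min = rng.normal(size=(20, 2))          # illustrative minority class
synthetic = smote_sample(X_min, n_new=10)
print(synthetic.shape)                    # (10, 2)
```

Each synthetic point lies on the line segment between a real minority instance and one of its neighbours, so the new samples stay inside the minority region rather than being exact duplicates.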
Explain your project in detail and mention how you overcame its challenges.