DS Interview Questions Guide 365DataScience
We wrote this book to help you master the art of interviewing for a data science position. From job-specific technical questions to tricky behavioral inquiries and unexpected brainteasers and guesstimates, we will prepare you for any job candidacy in the field: data scientist, data analyst, BI analyst, data engineer or data architect.
Data Science Interview Questions and Answers is the result of our data science expertise, direct
experience interviewing at companies, and countless conversations with job candidates.
Its goal is to teach by example: not only by giving you a list of interview questions and their answers, but also by sharing the techniques and thought processes behind each question and the expected answer. Once you read it, you’ll have all the knowledge and tools to succeed during the data science interview.
How to Use This Book for Best Results?
Give yourself enough time to work through the questions. This way, you’ll really understand what they are asking and what information you should highlight for the best response. If studied well, this book will enhance both your technical and communication skills.
We are excited for your data science journey to begin! Do your very best, practice, and best of luck!
CONTENTS
How to prepare for data science interview questions?
HOW TO PREPARE FOR DATA SCIENCE INTERVIEW QUESTIONS?
If you want to successfully land a job in data science, knowing your stuff and putting it in a neat package with an impressive CV, an outstanding portfolio, and a flashy resume will only get you halfway through the door.
What will open it is understanding the whole data science interview process and how to navigate it smoothly, from seeing that job posting to closing the deal with a welcome-to-the-team handshake.
And with this in-depth resource, we’re going to show you how to get there.
We’ve prepared a collection of 180 straight-to-the-point data science questions paired with their answers, categorized by career path: data scientist, data analyst, BI analyst, data engineer, and data architect.
We’ll guide you all the way:
• from the general interview questions that you need to make a great first impression;
• through the fundamental technical part;
• to the behavioral questions, brainteasers, and guesstimates that will help you sign that contract.
In short, this is everything you need to know at every level of preparation. And those are the insights that will ultimately help you get the job you want and are qualified for.
So let’s dive right in.
DATA SCIENTIST INTERVIEW QUESTIONS
Being familiar with the types of questions you can encounter is an important part of your preparation process.
Below you’ll find examples of real-life questions and answers. Reviewing them should help you assess the areas you’re confident in and where you should invest additional effort to improve.
Complete Data Science Training
If you need to build up your data science skill set from scratch, feel free to explore the 365 Data Science Program. It covers everything to get you from beginner to job-ready, from the basics to advanced data science topics.
DATA SCIENTIST INTERVIEW QUESTIONS / GENERAL DATA SCIENTIST INTERVIEW QUESTIONS
Prepare a script that addresses each of the points above and practice answering the “Tell me about yourself” question, as you know it’s coming your way once an interview starts.
company, or maybe you believe that one or two years at this job would allow you to pursue much more interesting opportunities with other companies. This is why you need to be prepared and have a good answer in mind.
6. What is the difference between WHERE and HAVING clauses in SQL?
Adding a WHERE clause to a query allows you to set a condition, which you can use to specify what part of the data you want to retrieve from the database.
HAVING is a clause frequently implemented with
distribution, or the bell curve, is probably the most common distribution. There are several important reasons:
• It approximates a wide variety of random variables.
• Distributions of sample means with large enough sample sizes can be approximated by a normal distribution, following the Central Limit Theorem.
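The Central Limit Theorem point above can be checked with a quick simulation. This is a minimal sketch using only Python’s standard library; the exponential population, the sample size of 50 and the 2,000 repetitions are arbitrary choices made for illustration. Even though an exponential distribution is heavily skewed, the sample means should center on the population mean (1) with a spread close to σ/√n ≈ 0.141.

```python
import random
from statistics import fmean, stdev

rng = random.Random(0)   # fixed seed so the run is reproducible
rate = 1.0               # exponential population: mean = sd = 1 / rate = 1 (strongly skewed)
n = 50                   # observations per sample
reps = 2000              # number of samples drawn

# Draw `reps` samples of size n and keep only each sample's mean
sample_means = [fmean(rng.expovariate(rate) for _ in range(n)) for _ in range(reps)]

# CLT: the means should be roughly normal, centered on 1 / rate, with sd close to (1 / rate) / sqrt(n)
print("mean of sample means:", round(fmean(sample_means), 3))
print("sd of sample means:  ", round(stdev(sample_means), 3))
print("theoretical sd:      ", round((1 / rate) / n ** 0.5, 3))
```

Plotting a histogram of `sample_means` would show the familiar bell shape, even though a histogram of the raw exponential draws would not.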
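Returning to question 6: the WHERE/HAVING answer is cut short in this excerpt, but the core distinction is that WHERE filters individual rows before any grouping, while HAVING filters groups after aggregation, which is why HAVING usually appears together with GROUP BY. The sketch below illustrates this with Python’s built-in `sqlite3` module and a hypothetical `sales` table; other SQL engines behave the same way for these queries, although their exact error messages differ.

```python
import sqlite3

# Hypothetical table, kept in memory so the example is self-contained
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 100), ("North", 300), ("South", 50), ("South", 20), ("East", 500)],
)

# WHERE filters individual rows BEFORE grouping: the 20 in South never reaches SUM()
where_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "WHERE amount >= 30 GROUP BY region ORDER BY region"
).fetchall()

# HAVING filters whole groups AFTER aggregation, so it can reference SUM(amount)
having_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "GROUP BY region HAVING SUM(amount) > 100 ORDER BY region"
).fetchall()

# An aggregate inside WHERE is rejected, because rows are filtered before any groups exist
try:
    conn.execute("SELECT region FROM sales WHERE SUM(amount) > 100")
    aggregate_in_where_failed = False
except sqlite3.Error:
    aggregate_in_where_failed = True

print(where_rows)    # [('East', 500), ('North', 400), ('South', 50)]
print(having_rows)   # [('East', 500), ('North', 400)]
```

Note that South survives the WHERE query with a total of 50 but is dropped by the HAVING query, because its group total (70) fails the group-level condition.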
10. R has several packages for solving a particular problem. How do you decide which one is best to use?
R has extensive documentation online. There is usually a comprehensive guide for the use of popular packages in R, including the analysis of concrete data sets. These can be useful to find out which approach is best suited to solve the problem at hand.
Just like with any other scripting language, it is the responsibility of the data scientist to choose the best approach to solve the problem at hand. The choice usually depends on the problem itself or the specific nature of the data (e.g., the size of the data set, the type of values, and so on).
Something to consider is the tradeoff between how much work the package is saving you and how much of the functionality you are sacrificing.
It is also worth mentioning that packages come with limitations as well as benefits, so if you are working in a team and sharing your code, it might be wise to adopt a shared package culture.
There is one subtle difference, though.
Say the range of values we’ve got is in the interval (a, b). If the values we are predicting are inside the interval (a, b), we are talking about interpolation (inter = between). If the values we are predicting are outside the interval (a, b), we are talking about extrapolation (extra = outside).
Here’s one example.
Imagine you’ve got the number sequence: 2, 4, _, 8, 10, 12. What is the number in the blank spot? It is obviously 6. By solving this problem, you interpolated the value.
Now, with this knowledge, you know the sequence is 2, 4, 6, 8, 10, 12. What is the next value in line? 14, right? Well, we have extrapolated the next number in the sequence.
Finally, we must connect this question with data science a bit more. If they ask you this question, they are probably looking for you to elaborate on that.
Whenever we are doing predictive modeling, we will be trying to predict values; that’s no surprise. Interpolated values are generally considered reliable, while extrapolated ones are less reliable or sometimes invalid. For instance, in the sequence from
extremely rare to find cases where interpolation is problematic. Please bear in mind that last bit and don’t forget to mention it in the interview!
12. What is the difference between population and sample in data?
A population is the collection of all items of interest to our study and is usually denoted with an uppercase N. The numbers we’ve obtained when using a population are called parameters.
A sample is a subset of the population and is denoted with a lowercase n, and the numbers we’ve obtained when working with a sample are called statistics.
That’s more or less what you are expected to say.
Further, you can spend some time exploring the peculiarities of observing a population. More likely, though, you’ll be asked to dig deeper into why we work with samples in statistics and what types of samples there are.
In general, samples are much more efficient and much less expensive to work with. With the proper statistical tests, 30 sample observations may be enough for you to make a data-driven decision.
Finally, samples have two properties: randomness and representativeness. A sample can be one of those, both, or neither. To conduct statistical tests whose results you can use later on, your sample needs to be both random and representative.
Consider this simplified situation.
Say you work in a firm with 4 departments: IT, Marketing, HR, and Sales. There are 1000 people in each department, so a total of 4000 people. You want to evaluate the general attitude towards a decision to move to a new office, which is much better on the inside but is located on the other side of the city.
You decide you don’t really want to ask 4000 people, but 100 is a nice sample. Now, we know that the 4 groups are exactly equal. So, we expect that in those 100 people, we would have 25 from each department.
1) We pick 100 people (out of the 4000) at random and realize that we have 30 IT, 30 Marketing, 30 HR, and 10 from Sales. Obviously, the opinion of the Sales department is underrepresented. We have a sample which is random but not representative.
2) I’ve been working in this firm for quite a while now, so I have many friends all over it. I decide to ask the opinion of my friends from each department because I want them to feel comfortable in the workplace. I pick 25 people from each department. The sample is representative but is not random.
In the first case, we have underrepresented some group of people. In the second case, we’ve made a decision based on a specific circle of people and not the general ‘public’.
If I want it to be random and representative, I will pick 25 people from IT at random, then 25 people from Marketing at random, and the same for HR and Sales.
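The random-and-representative procedure just described (drawing at random separately within each department) is usually called stratified random sampling. Here is a minimal Python sketch of it for the hypothetical 4,000-person firm from the example; the employee identifiers and the seed values are made up for illustration.

```python
import random
from collections import Counter

# Hypothetical firm from the example: 4 departments with 1000 employees each
departments = ["IT", "Marketing", "HR", "Sales"]
employees = [(dept, f"{dept}-{i}") for dept in departments for i in range(1000)]

def stratified_sample(population, per_group, seed=None):
    """Draw `per_group` people at random from every department (random AND representative)."""
    rng = random.Random(seed)
    groups = {}
    for dept, person in population:
        groups.setdefault(dept, []).append(person)
    sample = []
    for dept, people in groups.items():
        # random.sample draws without replacement, so nobody is picked twice
        sample.extend((dept, person) for person in rng.sample(people, per_group))
    return sample

# Case 1 from the text: a simple random sample of 100 is random, but department counts can drift
simple = random.Random(7).sample(employees, 100)
print(Counter(dept for dept, _ in simple))

# Stratified version: exactly 25 from each department, chosen at random within it
stratified = stratified_sample(employees, per_group=25, seed=7)
print(Counter(dept for dept, _ in stratified))  # 25 per department
```

The simple random sample is random but only representative by luck; the stratified one guarantees both properties as long as the department sizes are known in advance.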
In this way, all groups will be represented, and the sample will be random.
You can decide to skip that detailed explanation, or better, ask them if they want you to dive deeper into the topic and then impress them with your detailed understanding!
ating a decision tree. Whether to include these extra steps really depends on the position you are applying for.
If you are applying for some data science project management position, you may be expected to say: ‘Validate with all stakeholders to ensure the quality of the decision tree’.
If you are applying for a data scientist position,
This part of the process is always done by the data scientist, ML engineer, or whoever is in charge of the model training.
2) Computing instance. AWS and Microsoft Azure offer computing instances or cloud-based environments that can run the model you’ve just created. Of course, you can share the file with your colleagues through email or Messenger, but more often, there will be some cloud that handles the deployment.
The computing instance should be set up to communicate with all other systems that feed the inputs and/or require the outputs of the model.
3) Job scheduler. Having a model and a place to run it, you can specify when and how to run it. That could be once a week, once per day, or every time an event occurs (e.g., a transaction, a new user registration, etc.). At the desired time, new data would be taken, loaded, cleaned, preprocessed, fed to the model, and so on, until you reach the desired outcome.
Needless to stress, points 2 and 3 would rarely be a data scientist’s primary job. Still, in a smaller team, that may fall on them, too!
15. What is K-means clustering? How can you select K for K-means?
The main goal of clustering is to group individual observations so that the observations from one group are very similar to each other. In addition, we’d like them to be very different from the observations in other groups. There are two main types of clustering: flat and hierarchical. Hierarchical clustering is much more spectacular because of the dendrograms we can create, but flat clustering techniques are much more computationally efficient. Therefore, we usually opt for the latter.
K-means clustering is the most prominent example of flat clustering.
It consists of finding K clusters such that the sum of squared distances between the observations and the centers (means) of their clusters is as small as possible. K stands for the number of clusters we are trying to identify. This is a value selected prior to the clustering.
Now, the optimal number of clusters is obviously what we are usually interested in.
There are several ways to approach that, but the most common one is called ‘The Elbow Method’.
There, we solve the clustering problem with 1, 2, 3, 4, 5, 6, and so on clusters. We then plot the results on a graph where the x-axis shows the number of clusters and the y-axis shows the WCSS (within-cluster sum of squares). The resulting image resembles a human elbow. The place where the kink is signifies the optimal clustering solution. And that’s how you choose the ‘K’ in K-means!
16. What are the disadvantages of a linear model?
This is one of the strangest questions you could be asked. It is like being asked: ‘What are the disadvantages of playing tennis barefoot?’ You don’t need shoes to play tennis, but it is much better if you have them.
Now, the most common linear models are the linear regression model and the linear time series model. Therefore, let’s answer the question in that context.
The single biggest advantage of a linear model is that it is simple. From there, there are mainly disadvantages and limitations.
Therefore, let’s focus on the top 3 cons of using a linear model.
1) A linear model implies linear relationships.
A linear model assumes that the independent variables explain the dependent one(s) in a linear way, e.g., a = bx + c. No powers, exponents, logarithms, etc. are allowed.
Obviously, this is a great simplification, as the real world is not linear. Using a linear model would either disregard some patterns or force us to execute complicated transformations to reach a linear representation.
2) Data must be independent.
In the general case, that’s not always true, but in 95+% of the linear models used in practice, it is. Most linear models assume that the variables in the model are not collinear. Otherwise, we observe multicollinearity, or the math behind the model estimation ‘breaks’. Assuming that the variables are independent is obviously a very brave statement, especially because we are limited to a linear relationship (if we had exponents and logarithms, the probability that they are collinear would drop dramatically).
3) Outliers are a big, big issue.
Since linear models assume linearity, having values that are too big or too small regarding any feature may be devastating for the model. All points are expected to be close to some line, which, as you can imagine, is rather unrealistic. To deal with that, we often complicate the linear model in ways that practically make it behave like a non-linear one.
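To see point 3 in numbers, here is a minimal sketch that fits a simple least-squares line with Python’s standard library, first to hypothetical points lying exactly on y = 2x + 1 and then to the same points with one value replaced by an outlier. A single extreme value is enough to pull the slope from 2 to roughly 6.3.

```python
from statistics import mean

def ols_fit(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

xs = list(range(1, 11))
ys = [2 * x + 1 for x in xs]        # hypothetical data exactly on y = 2x + 1

ys_outlier = ys[:-1] + [100]        # the last point should be 21; replace it with 100

print(ols_fit(xs, ys))              # (2.0, 1.0)
print(ols_fit(xs, ys_outlier))      # slope jumps to about 6.31
```

Because least squares penalizes squared errors, the fit bends strongly toward the single distant point, which is exactly why outliers are so damaging for linear models.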
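Returning to question 15, the elbow method can be illustrated with a small from-scratch K-means (Lloyd’s algorithm) in plain Python. This is a sketch under stated assumptions: the toy data are three hypothetical, well-separated groups of points, and the centers are initialised deterministically with a farthest-first rule rather than the randomised k-means++ seeding that libraries such as scikit-learn use by default. The WCSS should fall sharply up to K = 3 and only marginally afterwards, which is the kink the elbow method looks for.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def init_centers(points, k):
    """Farthest-first initialisation: deterministic, chosen here only to keep the sketch reproducible."""
    centers = [points[0]]
    while len(centers) < k:
        # next center = the point whose nearest existing center is farthest away
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centers, labels, WCSS)."""
    centers = init_centers(points, k)
    labels = None
    for _ in range(max_iter):
        # assignment step: each point goes to its nearest center
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the algorithm has converged
        labels = new_labels
        # update step: move each center to the mean of its points (keep it if the cluster is empty)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    wcss = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, wcss

# Hypothetical toy data: three tight groups of four points around (0, 0), (10, 0) and (0, 10)
group_centers = [(0, 0), (10, 0), (0, 10)]
offsets = [(0.5, 0.5), (-0.5, 0.5), (0.5, -0.5), (-0.5, -0.5)]
points = [(cx + dx, cy + dy) for cx, cy in group_centers for dx, dy in offsets]

# Elbow method: solve the problem for K = 1..6 and look at how WCSS falls
wcss = {k: kmeans(points, k)[2] for k in range(1, 7)}
for k, value in wcss.items():
    print(k, round(value, 2))
```

Plotting `wcss` against K gives the elbow curve described in the answer; with real, noisier data the kink is usually less sharp and the choice of K becomes a judgment call.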
DATA SCIENTIST INTERVIEW QUESTIONS / TECHNICAL DATA SCIENTIST INTERVIEW QUESTIONS
[Figure: example decision tree for scheduling a meeting, with Yes/No branches on Monday, 1PM to 2PM, Room 160, 3PM to 4PM, Room 155, and so on]
Based on this tree, we would normally estimate the probabilities of having the meeting in one place or another.
The main issue is that a single tree like this is a very bad classifier. However, by combining many such trees, we reach a random forest. The underlying assumption is that many bad classifiers equal a good classifier. Each tree makes a prediction (which observation to put in what class), and then the class with the most “votes” across all trees becomes our random forest prediction.
20. What’s wrong with training and testing a machine learning model on the same data?
This is one of the more common questions. When we are training a model, we are exposing it to the ‘training data’. This means it is learning the patterns from it. By the end of the training, it becomes very good at predicting this particular dataset. However, sometimes we may overfit. This is a situation where we keep improving the accuracy, not because the model is good, but just because it has learned every little detail about the data it is given.
If we test on that data, we will be checking the accuracy of the training. This is not a test per se. That’s simply a ‘train accuracy check’. Our model will seem to be very accurate and working properly, but that is because we trained it on that same data. We are essentially asking the model to predict what was already predicted, which is not a hard task.
To truly test a model, we must expose it to data it has never seen before. This will reveal whether it learned the patterns of the population, or simply the noise in the training data.
21. How to make sure you are not overfitting while training a model?
First, we need to clarify what overfitting is exactly. Usually, overfitting happens when your model fits the training data so well that it misses the point. In other words, it doesn’t look for the general patterns, but for the noise in the data provided. If that happens, when provided with new data, the model behaves disastrously in a real-life setting.
Regularization: in the context of machine learning, regularization refers to the process of modifying a learning algorithm to make it simpler, often to prevent overfitting or to solve an ill-posed problem.
• Early stopping: early stopping is one of the most common types of regularization. It is designed precisely to prevent overfitting. It consists of techniques that interrupt the training process once the model starts overfitting.
* Here you may be expected to say ‘validation’ or ‘cross-validation’. In fact, early stopping methods typically use the outputs from validation to determine whether to stop the training process.
• Feature selection: for some models, having useless input features leads to much worse performance. Therefore, you have to make sure to choose only the most relevant features for your problem; otherwise, this may contribute to (among other things) overfitting.
• Ensembling. Ensembles are methods to combine several base models in order to produce one optimal predictive model. A good example of the ensemble method is Random Forest (a collection of decision trees).
It is crucial to realize that overfitting is a serious issue. Practically any flexible model will overfit if no preventive techniques have been implemented. Therefore, you should always aim to apply one or more of these techniques in your model-building efforts.
Cross-validation refers to many model validation techniques that use the same dataset for both training and validation. Usually, it works on a rotational basis, so that observations are not overexposed to the training process and thus can serve as better validation. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice.
Why do we even need to validate?
Well, when you use sample data (so most of the time), you need to make sure that your model is not overfitting the parameters.
So how do we validate? We take out about 10% of the data for later use and train on the remaining 90%. Once we are done, we validate on the 10% we set aside at the beginning. This is a pretty common practice but has one major drawback: some of the data (precisely these 10%) is not really utilized in the training process.
That is where cross-validation comes in.
Cross-validation does the same thing as simple validation, but it first divides the dataset into equal parts (5, 10, or 20, depending on the size of the data). To cross-validate, it sets aside the first part, trains on the remaining parts, and validates on the part it set aside. Then it sets aside the 2nd part and trains on the remaining ones (this time, including the first part). We continue in that way, utilizing a different subset for each validation. In that way, the model gets exposed to all the data, in contrast to conventional validation.
This is practical knowledge that can be tested with a coding task, but it’s possible you are asked this question as a stand-alone. In that case, you can ask about the use case for the numbers you’re generating.
First, if you need a data table for the sake of having data to test on, you can just use one of R’s preloaded datasets. You can access the list by calling data().
If you’d still like to create a table from scratch, you can use any of the random generator functions in R to generate random numbers according to a distribution, and store them in a matrix or a data frame.
The functions are:
• runif()
• rnorm()
• rbinom()
• rexp()
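The R functions listed above have rough counterparts in Python’s standard `random` module; this sketch assumes you want a small reproducible table in Python instead of R. The column names, sizes and parameters are arbitrary, and since the standard library has no binomial generator before Python 3.12, the binomial column is built by counting successes in 10 Bernoulli trials.

```python
import random

rng = random.Random(123)   # seeded, like set.seed() in R, so the table is reproducible
n = 5                      # number of rows

# Each column mirrors one of the R generators (R argument names shown in the comments)
columns = {
    "unif": [rng.uniform(0, 1) for _ in range(n)],                            # runif(n, min = 0, max = 1)
    "norm": [rng.gauss(0, 1) for _ in range(n)],                              # rnorm(n, mean = 0, sd = 1)
    "binom": [sum(rng.random() < 0.5 for _ in range(10)) for _ in range(n)],  # rbinom(n, size = 10, prob = 0.5)
    "exp": [rng.expovariate(2.0) for _ in range(n)],                          # rexp(n, rate = 2)
}

# Turn the columns into a list of row dictionaries, i.e. a minimal "data frame"
rows = [dict(zip(columns, values)) for values in zip(*columns.values())]
for row in rows:
    print(row)
```

Like R’s `rexp`, `expovariate` is parameterised by the rate (1 / mean), so the two calls describe the same distribution.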
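Returning to the cross-validation procedure described above, here is a minimal K-fold sketch in plain Python. The fold logic follows the description (split into equal parts, hold each part out once, train on the rest, validate on the held-out part); the straight-line model, the `fit_line` and `mse` helpers, and the toy data are hypothetical stand-ins for whatever model you are validating. In practice you would usually shuffle the data before splitting, for example with scikit-learn’s `KFold(n_splits=5, shuffle=True)`.

```python
from statistics import fmean

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_line(xs, ys):
    """Hypothetical model: least-squares line, returned as (slope, intercept)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

def mse(model, xs, ys):
    """Mean squared error of the line on held-out points."""
    slope, intercept = model
    return fmean((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))

def cross_validate(xs, ys, k):
    """Hold out each fold once, train on the remaining folds, score on the held-out one."""
    scores = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mse(model, [xs[i] for i in held_out], [ys[i] for i in held_out]))
    return scores

xs = [float(i) for i in range(20)]
ys = [3 * x - 2 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]  # toy data: a line plus small noise
scores = cross_validate(xs, ys, k=5)
print([round(s, 3) for s in scores], "average:", round(fmean(scores), 3))
```

Every observation is used for validation exactly once and for training K − 1 times, which is the advantage over the single 90/10 split; the average of `scores` is the cross-validated estimate of out-of-sample error.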
tion, relations between tables are expressed in the following way: the column name that designates the logical match is a foreign key in one table, and it is connected with a corresponding column from another table. Often, the relationship goes from a foreign key to a primary key, but in more advanced circumstances, this will not be the case. To catch the
In most cases, the tools from the Data Manipulation Language (DML) will allow you to do that. Usually, you could either use a SELECT DISTINCT statement to select distinct rows only or apply a GROUP BY clause to a join to filter the data in the desired way.
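The question behind this answer is cut off in this excerpt, but assuming it concerns duplicate rows that appear after joining a foreign key to a primary key, here is a minimal sketch using Python’s built-in `sqlite3` module. The `customers` and `orders` tables are hypothetical: the join repeats a customer once per order, and both SELECT DISTINCT and GROUP BY collapse those repeats.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(customer_id));
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cai');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2);
""")

# Foreign key (orders.customer_id) joined to primary key (customers.customer_id):
# the join repeats a customer once per matching order, so Ana appears twice
join = "FROM customers c JOIN orders o ON o.customer_id = c.customer_id"
joined = conn.execute(f"SELECT c.name {join} ORDER BY c.name").fetchall()

# Option 1: SELECT DISTINCT keeps each distinct row only once
distinct = conn.execute(f"SELECT DISTINCT c.name {join} ORDER BY c.name").fetchall()

# Option 2: GROUP BY collapses the duplicates and lets you aggregate at the same time
grouped = conn.execute(f"SELECT c.name, COUNT(*) {join} GROUP BY c.name ORDER BY c.name").fetchall()

print(joined)    # [('Ana',), ('Ana',), ('Ben',)]
print(distinct)  # [('Ana',), ('Ben',)]
print(grouped)   # [('Ana', 2), ('Ben', 1)]
```

DISTINCT is the simpler choice when you only need unique rows; GROUP BY is preferable when you also want a per-group summary such as the order count.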
DATA SCIENTIST INTERVIEW QUESTIONS / BEHAVIORAL QUESTIONS
BEHAVIORAL QUESTIONS
process. Instead, focus on some of the aspects that we listed above and customize them to the specific position that you are applying for.
What you say while answering this question is not the only important thing. Your interviewer will be eager to see that all signs point in the same direction. Try to show that you are excited through your voice, posture and body language. This can be the critical difference that will determine whether or not you will be selected.
important. It means that you want to put quality in your work and create value for the company. Internal drive is probably the best reason to go the extra mile; you are willing to do what is necessary in order to be good at what you do.
An example of such a situation:
During your previous internship experience, you put in a lot of extra effort in order to show that your tutor, who also recruited you, did not make a mistake. You stayed late and studied during the weekend be-
heading in the right direction. By the end of the second semester, your GPA was slightly higher than the average for the class.
is OK to say that you do not like to speak in public. However, if you are applying for a consulting or an investment bank job, you should not say that, because public speaking can be essential for those professions.
Choose a weakness that you can turn into a positive: “I am usually not good at… but I am making an effort to improve that”. Avoid cliché answers like “I work too hard” and “I am a perfectionist”. No one is perfect; that is why you need to indicate a weakness when you are asked about one. This shows that you are self-aware and have listened to feedback.
An example of such a situation:
The tutor at my previous internship gave me some interesting feedback: “Don’t try to do too much.” I remembered that and had a chance to reflect on it once the internship was over. He was right; I tried to do too much. I was eager to prove myself and implement everything that I learned in university so I could perform well. Trying to implement complex models and “doing too much” is something that I need to control in the future.
This experience allowed me to understand that
clients multiple tasks/stress?
Each of these aspects can be really important for a given position, and the Hiring Manager will want to make sure that you are the right person that he/she is looking for. Try to figure out the most important characteristics of the job that you are applying for. Are you expected to multitask? What part of your overall responsibilities would be related to financial figures? Are you going to interact with many people?
Based on your findings, you will know what to expect. Prepare good examples from your past that can serve as proof of your statements.
DATA SCIENTIST INTERVIEW QUESTIONS / BRAINTEASERS
BRAINTEASERS
DATA SCIENTIST INTERVIEW QUESTIONS / GUESSTIMATE
GUESSTIMATE
DATA ANALYST INTERVIEW QUESTIONS
DATA ANALYST INTERVIEW QUESTIONS / GENERAL DATA ANALYST INTERVIEW QUESTIONS
Answer Example
“I believe the largest data set I’ve worked with
was within a joint software development project.
The data set comprised more than a million records
and 600-700 variables. My team and I had to work
with Marketing data which we later loaded into an
analytical tool to perform EDA.”
How to Answer
Strong presentation skills are extremely valuable for any data analyst. Employers are looking for candidates who not only possess brilliant analytical skills, but also have the confidence and eloquence to present their results to different audiences, including upper-level management and executives, and non-technical coworkers.
So, when talking about the audiences you’ve presented to, make sure you mention the following:
time-efficient when it comes to passing through all the security. Moreover, I learned how important it is to clearly state the reasons for requiring certain data for my analysis.”
These are the most important Python libraries you should mention.
NumPy is an essential library, as it is used for matrices and arrays and includes methods for their manipulation.
Pandas is the second library, which is used in al-
SciPy boasts an impressive number of mathematical algorithms and high-level commands and classes to help data scientists in their data analysis tasks. Scikit-learn was originally developed during a “Google Summer of Code” project, as a third-party extension for SciPy. Scikit-learn includes various classification, regression, and clustering algorithms, designed to be incorporated with the SciPy and NumPy packages.
And once you’re done with machine learning, you’ll also need a good way of visualizing the results. Matplotlib and Seaborn are visualization libraries, which are great for that.
TensorFlow, Keras and PyTorch are libraries for deep learning. If you want to train neural networks, for example in the context of NLP or computer vision, these are the way to go. Here, knowing the difference between TensorFlow 1 and TensorFlow 2 could be a bonus during an interview.
The first use case is whenever we’ve got a categorical outcome. Examples are: Yes/No, Will buy/Won’t buy, and 0/1 situations. As with any other classification method, a logistic regression would output the category it deems most probable to be the answer.
Speaking of probabilities, we reach the second use case. We could employ a logistic regression to estimate the probability that an event is going to occur.
The mechanics of the two use cases follow the same path.
For instance, imagine a logistic regression predicts that a customer is 70% likely to buy and 30% likely to not buy. Under these conditions, with the usual 0.5 cutoff, the prediction will be classified as ‘Will buy’. Depending on our needs, we could use either the probabilistic representation or simply the output class.
Finally, it is useful to note that we were discuss-
10. Have you worked with comparatively large data sets in a project?
11. Which tools have you used in each stage of your previous data analysis projects?
12. In large companies, data is often stored in multiple data warehouses.
How to Answer
The technical complexity of your work as a data analyst may vary depending on the size of the companies you have worked for in the past. That said, a strong technical skillset is always a plus in the eyes of your future employer. Having retrieved data from multiple data warehouses in past projects demonstrates your understanding of databases, data structures, and programming languages.
Answer Example
“I’ve had the chance to work for a big corporation in the past. My work there was of great importance in developing my technical skillset. Once, I queried 5 different data warehouses to retrieve the data for a large-scale company project. After collecting all the necessary records and variables, I built a dataset that I later used in my analysis.”
DATA ANALYST INTERVIEW QUESTIONS / TECHNICAL DATA ANALYST INTERVIEW QUESTIONS
Answer Example
“When it comes to data analysis tools, I can say I’m a traditionalist. That’s why I find Microsoft Excel and Microsoft Access most useful. I feel truly comfortable working with those, and they’re available in almost every company out there. Moreover, you can achieve great results with them with the right training.”
models?
Answer Example
“In my line of work, I’ve used basic statistics, mostly calculating means and standard deviations, as well as significance testing. The latter helped me determine the statistical significance of measurement differences between two populations for a project. I’ve also determined the relationship between 2 variables in a data set, working with correlation coefficients.”
18. How many years of SQL programming experience do you have? In your latest job, how many of your analytical projects involved using SQL?
How to Answer
SQL is considered one of the easiest languages to learn. So, if you want to be competitive in the job market as a Data Analyst, you should
DATA ANALYST INTERVIEW QUESTIONS / BEHAVIORAL QUESTIONS
BEHAVIORAL QUESTIONS
Answer Example
“In my work with stakeholders, it often comes down to the same challenge: facing a question I don’t have the answer to, due to limitations of the gathered data or the structure of the database. In such cases, I analyze the available data to deliver answers to the most closely related questions. Then, I give the stakeholders a basic explanation of the current data limitations and propose the development of a project that would allow us to gather the
professional experience?
How to Answer
More and more data analyst job postings require
28. Give me an example of a time when you worked as a team.
• Listen actively
• Respect others
• Appreciate other work styles
Keep these qualities in mind when you think of a time you were part of a team. The story should demonstrate not only that you were part of the team, but also that you were a great team member.
Here’s an example of such a situation:
A group assignment during the last year of my studies required me and four of my classmates to perform a detailed Company Valuation.
This was a pretty difficult task that involved a significant amount of work. The deadline for submitting the complete work was in 2 weeks. At the time, I was busy filling out internship applications and had to prepare for some of my other exams. This was the case for the other team members as well.
Nevertheless, all of us concentrated full-time on the project, as I understood that this was the only way we could meet the tight deadline. Another interesting thing about the project was that we managed to work well together, despite the different styles that each group member had. We listened actively and were open to the ideas that the others had. Given that we came from different backgrounds, each of us certainly added value to the project. Good communication helped us coordinate our responsibilities and integrate the separate pieces of work that we were assigned individually.
29. Describe a time when you failed to meet your goals.
Some failure in life is inevitable. Those who are brave and bold attempt many new things and thus fail much more often. Don’t be afraid to explain a time when you wanted to achieve something, but you were not able to do it. Chances are that the interviewer is more interested in learning how you handled the failure that you experienced. He/she wants to know whether you learned from your mistakes and whether you are motivated to succeed in the future.
When you think of a story, don’t pick a major failure, and try to choose one where external factors also influenced the outcome. Inexperience on your part is OK too, given that you are in the early stages of your career. Don’t cite as the reason for your failure a weakness that could have a negative impact on your future work (for example, poor attention to detail or an inability to handle pressure).
It is very important to show that you turned a negative situation into a valuable learning experience. This will make a great impression on the interviewer.
Here’s an example of such a situation:
Last year, I was eager to find a summer internship opportunity, but I wasn’t able to do that. One of the main reasons behind this was the tough job market that we are currently facing. Along with that, I believe I was too inexperienced and did not realize how difficult it was to find a good opportunity.
This year I had a totally different approach. You could say I learned my lesson perfectly. So, I started preparing myself in November and created
a shortlist of opportunities that I wanted to pursue. Then I researched all potential employers and chose the ones that were really interesting. I had more time to work on my CV and Cover Letters and to prepare for interviews. Of course, I wasn’t going to make the same mistake twice.
30. Why should we hire you?
This question is very similar to “How would you add value to our company?”. The Hiring Manager challenges you to sell him/her the idea of hiring you. Your profile is the product that needs to be sold. Remember the example that we gave with the pen?
Most people will start listing their qualities and qualifications, hoping that they will touch the right nerve along the way. But that is not the way to go. The Hiring Manager has read your CV, so he/she already knows about your credentials. What he/she wants to understand is whether you can handle a tough question and be persuasive while making a valid point. Try to open your answer with a question instead:
Manager: Let me ask you, with so many people applying for this job, why should we hire you?
Job-Seeker: A great question. But I would like to ask you something as well. Can I?
Manager: Sure, go ahead.
Job-Seeker: What makes a great Analyst with your firm?
Manager: We are looking for people who are very independent and are able to learn fast, even when they are under pressure. Does that make sense?
Job-Seeker: Sure, it does. I can imagine that the environment in which your firm operates requires such qualities. This is precisely what made me apply for this position in the first place. I want to be a part of your dynamic environment. I am able to learn fast and adapt to changing circumstances quite easily.
Sounds much better, right?
In order to respond successfully to this question, you need to communicate well with the interviewer and understand exactly what they are looking for. Otherwise, you simply don’t know why they should hire you, and your answer is a shot in the dark.
DATA ANALYST INTERVIEW QUESTIONS / BRAINTEASERS
BRAINTEASERS
DATA ANALYST INTERVIEW QUESTIONS / GUESSTIMATE
GUESSTIMATE
BI ANALYST INTERVIEW QUESTIONS
BI ANALYST INTERVIEW QUESTIONS / GENERAL BI ANALYST INTERVIEW QUESTIONS
Answer Example
“I’m a Finance graduate specialized in Business Administration. My education has helped me greatly on my business intelligence career path, as my interest and expertise evolved in fields such as business law, microeconomics, and financial accounting.”
How to Answer
A seasoned BI analyst will have exposure to the systems development life cycle (SDLC) and user acceptance testing (UAT). When a company introduces new software or a new application to its business, the transition must be well thought out, carefully tested,
Answer Example
“Although I have limited exposure to SDLC, I’ve been involved in the UAT phase of some projects. I enjoy analyzing which aspects of a new software program or application are the hardest to implement, which are the easiest to accommodate, and how to proceed from there.”
4. What is your opinion about Agile software development for BI projects?
Do you support employing Agile methodologies with your company’s clients?
How to Answer
Agile software development has received a warm
a company?
into practice in your organization.”
How to Answer
Being able to work in a cross-functional environment is certainly a plus for larger companies. Hiring
managers are aware that you’ll probably have to collaborate on projects with teams from other departments, such as HR, IT, or Marketing. Therefore, they want to know more about your exposure to the challenges that may arise in this line of work. That said, make sure you share how you’ve solved any issues you’ve faced in your experience.
Answer Example
“In my last job as a business intelligence analyst, I was often exposed to cross-functional teamwork. I’ve mostly worked with our HR and IT departments. In my experience, if the team is attuned to the needs of the company for that particular project, it can turn out to be a huge success. I do my best to communicate expectations clearly. In addition, I take into account that everyone has different work styles, strengths, and weaknesses. Usually, that largely depends on their expertise and job role.”
How to Answer
As a business intelligence analyst, giving presentations to the executives of your company or the company’s clients will be an important part of your work. You’ll often be expected to extract the insights from the data, prepare the presentation along with compelling visuals and dashboards, and then deliver it – all on your own. If you have plenty of experience, discuss the topic of your presentations and the feedback you received. If you’re straight out of college, think of a presentation you had to prepare as part of your education. Of course, it would be even better if you have a sample of your best presentation on your phone or tablet to show the hiring manager.
Answer Example
“One of the presentations I’m proud of was related to the launch of a client’s new app. I had to share the results from the preliminary user testing. What I came up with was an engaging presentation with lots of eye-catching visuals. I believe the latter, together with intriguing content, is key to a well-received presentation. I highlighted both the areas of strength and the areas of improvement. After that, I shared some actionable tips for product improvement with the client. The feedback was positive, and I can actually show you a copy of my presentation on my tablet.”
How to Answer
As a business intelligence analyst, you should understand what the acronym INVEST means to technical teams and product managers. It stands for:
– Independent
– Negotiable
– Valuable
– Estimable
– Sized appropriately
– Testable
If you’re familiar with the term, break down each word to show the interviewers you know what you’re talking about. If not, make sure you show interest in understanding the concept and which industries mostly use it.
Answer Example
“I’ve mostly worked in the banking and telecom-
10. Are you Six Sigma certified? Do you think that’s important and why?
How to Answer
A Six Sigma certification is not a must, but it’s certainly a plus for a BI analyst. Six Sigma certifications have different levels, starting from white belt through yellow, green, and black belts to master black belt and champion belt. If you have completed the training, talk about your experience, the skills you’ve acquired, and how you apply them in your job as a BI analyst. If not, share your perspective on why you would consider taking the training.
Answer Example
“Although I haven’t started any Six Sigma training yet, I’m aware that expertise in lean management will certainly be helpful to my clients as I build up my professional portfolio. So, earning a Six Sigma certification is definitely an option I intend to explore in the future.”
11. What does the acronym PEST mean?
have some knowledge of PEST and how it works. However, if you haven’t had the chance to employ PEST in your work experience, show the hiring manager you have a basic idea of the concept and that you’re more than willing to apply this form of analysis in your future job.
Answer Example
“I am just starting my career in business intelligence, so I haven’t applied PEST analysis in my work just yet. Nevertheless, I’ve applied PEST in a case study while in college. I had to identify the political, economic, social, and technological factors affecting the airline industry in recent years. I think it’s a really efficient type of analysis, and I’d be happy to become proficient in it in the future.”
BI ANALYST INTERVIEW QUESTIONS / TECHNICAL BI ANALYST INTERVIEW QUESTIONS
Answer Example
“I do most of my data modeling in Excel, as I find it most convenient for data mapping. I have some exposure to Power BI as well. However, I believe I can benefit from sharpening my skills in that program. That’s why I’m currently taking an online Power BI course.”
How to Answer
Your BI analyst experience and skillset are closely related to the focus of your career. Depending on whether you are a data BI analyst, an IT BI analyst, or
Answer Example
“Benchmarking is the practice of comparing your business against other businesses that are already very successful. It’s like a smart, analytical comparison. I believe it’s essential to benchmark when a company is looking at making a significant change, is seeing a loss of revenue, is anticipating the launch of a new product, or needs to recalibrate its business operations in one way or another.”
How to Answer
The interviewer wants to see what you know about decision-making and what techniques you use to arrive at reliable conclusions in your projects. Some common decision-making techniques are T-Chart Analysis and Pareto Analysis, a.k.a. the 80/20 rule. Discuss the techniques you use with the interviewer and the reasons for your preferences.
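Pareto Analysis (the 80/20 rule) ranks contributing causes by size and focuses on the "vital few" that account for roughly 80% of the effect. As a minimal sketch, assuming Python and using a hypothetical function name (`pareto_vital_few`) and made-up complaint counts, the snippet below sorts categories from largest to smallest and keeps adding them until their cumulative share reaches a chosen threshold:

```python
def pareto_vital_few(contributions, threshold=0.8):
    """Return the smallest set of top contributors whose cumulative share
    of the total reaches `threshold`, together with that cumulative share.
    (Hypothetical helper written for illustration.)"""
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("contributions must sum to a positive number")
    # Rank categories from the largest contribution to the smallest
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    vital_few, running = [], 0
    for name, value in ranked:
        vital_few.append(name)
        running += value
        # Stop as soon as the cumulative share reaches the threshold
        if running / total >= threshold:
            break
    return vital_few, running / total


# Hypothetical complaint counts by cause (made-up numbers)
complaints = {
    "late delivery": 120,
    "damaged item": 45,
    "wrong item": 20,
    "billing error": 10,
    "other": 5,
}
causes, share = pareto_vital_few(complaints)
print(causes, f"{share:.1%}")
```

With these made-up numbers, two of the five causes account for 82.5% of all complaints. The 80% cut-off is a rule of thumb rather than a law, which is why the threshold is left as a parameter.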
BI ANALYST INTERVIEW QUESTIONS / BEHAVIORAL QUESTIONS
BEHAVIORAL QUESTIONS
Answer Example
“As a business intelligence analyst, I like to keep everyone in the loop about the development of a project. I often promote the use of project management apps that make collaboration easier and give access to every detail of the project at any stage.”
complete?
How to Answer
A great business analyst knows that when a client signs off on a project, it doesn’t mean it’s successful (or finished) yet. So, make sure you explain to the interviewer that you remain available to your clients and support them until you’re sure their expectations are met and they are happy with the results.
coworkers?
How to Answer
Having regular discussions with other team members is of great importance when it comes to project plans and aligning ideas. Let the interviewer know that you’re a team player who is open to others’ views and opinions.
Answer Example
“I believe learning from each other’s working styles and approaches is invaluable for any project. I support the collaborative spirit in my team, and I’m sure we always come up with better ideas together rather than individually.”
28. How do you respond when you’re unhappy with the end result of a project?
How to Answer
Even the best BI analysts experience failure at times. Not all projects are perfect, and not all clients can be satisfied. What the interviewer would like to know is whether you’re capable of accepting disappointment and responding in a mature and productive way.
Answer Example
“I think business intelligence requires perfectionism at all times. When I’m not happy with my performance, or I make a mistake, I take a step back and
How to Answer
Confidentiality agreements ensure the protec-
year?
How to Answer
Employers are seeking BI analysts who are constantly upgrading their skills and strive to stay relevant. You can set career development goals and accomplish them by attending conferences, earning online certificates, listening to podcasts, or even joining a mentoring program. When you mention some of these examples and the goals you’ve set for yourself this year, make sure you connect the knowledge you’ll gain with the benefits you’ll bring to the company.
Answer Example
“This year, I’ve enrolled in an online Power BI course to refresh my expertise, and I’ve also signed up for a few TDWI seminars in Predictive Analytics and Data Modeling. I can’t wait to take my skills to another level and, hopefully, apply what I’ve learned as a BI analyst in your company.”
interested in learning saucy details about the bad habits of that other person. Instead, he/she wants to know more about your conflict management abilities. He/she is eager to learn whether you are an active listener and whether you are good at persuading people.
Every behavioral question comes with a story that supports the candidate’s answer.
When you answer this question, try to think of a disagreement that was not personal, but stemmed from different views on how to execute a certain task. It is much safer to describe this type of disagreement, as it does not suggest you are someone who is difficult to work with.
There are a few key points you should concentrate on:
• You listened actively
• You looked for the best possible solution
• You cared about the team’s success rather than flexing your muscles
• You were persuasive
BI ANALYST INTERVIEW QUESTIONS / BRAINTEASERS
BRAINTEASERS
BI ANALYST INTERVIEW QUESTIONS / GUESSTIMATE
GUESSTIMATE
DATA ENGINEER INTERVIEW QUESTIONS
DATA ENGINEER INTERVIEW QUESTIONS / GENERAL DATA ENGINEER INTERVIEW QUESTIONS
Answer Example
“Ever since I was a child, I have had a keen interest in computers. When I reached senior year in high school, I already knew I wanted to pursue a degree in Information Systems. While in college, I took some math and statistics courses which helped me land my first job as a Data Analyst for a large healthcare company. However, as much as I liked applying my math and statistical knowledge, I wanted to develop more of my programming and data management skills. That’s when I started looking into data engineering. I talked to experts in the field and took online courses to learn more about it. I discovered it was the ideal career path for my combination of interests and skills. Luckily, within a couple of months,
2. What do you think is the hardest aspect of being a data engineer?
How to Answer
Smart hiring managers know not all aspects of a job are easy. So, don’t hesitate to answer this question honestly. You might think its goal is to make you pinpoint a weakness. In fact, what the interviewer wants to know is how you managed to resolve something you struggled with.
Answer Example
“As a data engineer, I’ve mostly struggled with fulfilling the needs of all the departments within the company. Different departments often have conflicting demands. So, balancing them with the capabilities of the company’s infrastructure has been quite challenging. Nevertheless, this has been a valuable learning experience for me, as it’s given me the chance to learn how these departments work and their role in the overall structure of the company.”
How did you eventually solve it?
How to Answer
This question gives you the perfect opportunity to demonstrate your problem-solving skills and how you respond to sudden changes in the plan. The question could be data-engineer specific or a more general one about handling challenges. Even if you don’t have this particular experience, you can still give a satisfactory hypothetical answer.
Answer Example
“In my previous work experience, my team and I have always tried to be prepared for any issues that may arise during the ETL process. Nevertheless, every once in a while, a problem will occur completely out of the blue. I remember when that happened while I was working for a franchise company. Its system required data to be collected from various systems and locations. So, when one of the franchises changed their system without prior notification, it created quite a few loading issues for their store’s data. To deal with this issue, first I came up with a short-term solution to get the essential data into the company’s corporate-wide reporting system. Once I took care of that, I started developing a long-term solution to prevent such complications from happening again.”
R reads data from a decent number of sources, like text, Excel, SPSS, SAS, Stata, systat… with text, and more specifically CSV, being the most popular. Depending on the format of the data, you’d need to use different packages to import it into R. In terms of syntax, there is nothing too shocking about the operations – a standard read call is used in most situations.
Importing text files is fairly straightforward. The user can use the barebones read.table() function from the built-in {utils} package and set all relevant arguments, or opt for using read.csv() which has
The UNION command is often compared to the JOIN command, as both combine information from multiple tables. However, they work differently: a JOIN matches rows from two tables on a related column and places their columns side by side, while UNION stacks the result sets of two or more SELECT statements on top of each other. For this to work, each SELECT must return the same number of columns, with compatible data types in the same order. Furthermore, UNION returns distinct rows only, removing duplicates from the combined result set. In contrast, UNION ALL returns all rows (without eliminating duplicate rows).
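To make the difference between UNION, UNION ALL, and JOIN concrete, here is a minimal sketch using Python’s built-in sqlite3 module and an in-memory SQLite database. The tables store_a and store_b and their rows are hypothetical, and other database engines may differ in details such as default row ordering, which is why the UNION queries end with an explicit ORDER BY:

```python
import sqlite3

# In-memory database with two hypothetical tables sharing the same column layout
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE store_a (customer TEXT)")
cur.execute("CREATE TABLE store_b (customer TEXT)")
cur.executemany("INSERT INTO store_a VALUES (?)", [("Ann",), ("Bob",)])
cur.executemany("INSERT INTO store_b VALUES (?)", [("Bob",), ("Cara",)])

# UNION stacks the two result sets and removes duplicate rows
union_rows = cur.execute(
    "SELECT customer FROM store_a UNION SELECT customer FROM store_b "
    "ORDER BY customer"
).fetchall()

# UNION ALL stacks the result sets and keeps every row, duplicates included
union_all_rows = cur.execute(
    "SELECT customer FROM store_a UNION ALL SELECT customer FROM store_b "
    "ORDER BY customer"
).fetchall()

# A JOIN instead matches rows across tables on a related column
join_rows = cur.execute(
    "SELECT a.customer, b.customer FROM store_a AS a "
    "JOIN store_b AS b ON a.customer = b.customer"
).fetchall()

conn.close()

print(union_rows)      # [('Ann',), ('Bob',), ('Cara',)]
print(union_all_rows)  # [('Ann',), ('Bob',), ('Bob',), ('Cara',)]
print(join_rows)       # [('Bob', 'Bob')]
```

Bob appears in both tables, so UNION returns him once while UNION ALL returns him twice. Because UNION ALL skips the duplicate-elimination step, it is usually the cheaper choice when duplicates are impossible or should be kept.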
8. What programming/scripting languages have you used?
Answer Example
“I have worked with both Python and SQL. However, I’m most comfortable using Python, due to the
nature of the tasks in the previous company I worked for. I understand that SQL is preferred, and I can assure you I can advance my SQL skills quickly on the job. I’m a quick learner, and learning new concepts has always come easily to me.”
showed that certain employee profiles result in considerable increases in sales for a significant period of time. I take pride in this discovery, as HR data had never been cross-referenced with sales data for analytical purposes in this company before.”
Pipeline or Database, or a more Generalist role?
How to Answer
A data engineer’s role heavily depends on the size of the company and the specific tasks they’re assigned. Generalists employ a variety of skills, as they are responsible for many different tasks. If you’re focused on Pipeline, this means you have experience working closely with data scientists and have a better understanding of how to prepare data for analysis. Data engineers who have worked mostly in Database have in-depth knowledge of the ETL process and table schemas. No matter which role(s) you have been in, include all your experiences in your answer. You can also go into moderate detail in explaining why you prefer one type over the other.
Answer Example
“I’ve always worked in more of a Generalist role. I prefer it to the other types because I like having a broader scope of expertise. I enjoy being in-the-know about the whole structure and process, as opposed to focusing on just one subset of skills I’ve acquired.”
way. Yes, it’s true that compared to a data analyst, a data engineer’s work is much less analytical in nature. However, this doesn’t mean that data engineers lack analytical skills or that they don’t apply them at all. When giving your answer, tell the hiring manager how you view your role as a data engineer and how you’ve used your analytical skills on the job.
Answer Example
“I’d have to say I firmly disagree with this statement. I’ve used my analytical skills on numerous occasions. As a data engineer, I’ve often performed analyses to ensure the high quality and integrity of the data. My analytical skills have also helped me immensely in my joint projects with data scientists and data analysts. Thanks to my analytical mindset, I’ve been able to identify their data needs and help them meet those needs.”
How to Answer
Technology is constantly changing, so, if you’re setting high goals for yourself, this question may prompt you to list several training courses you’d like to fit into your schedule. However, make sure you convey that you’d like to complete these courses because they cover topics that interest you, not to make up for weaknesses in your preparation. Balance your answer by mentioning your strengths and the skills you’ve already acquired.
Answer Example
“I think enrolling in training courses is crucial for any data engineer who wants to stay up to date with the advancements in the industry. Personally, I’d like to expand my current expertise in ETL processes and the cloud environment. Although I have significant experience working with both, I believe my future work can only benefit from continuous learning.”
DATA ENGINEER INTERVIEW QUESTIONS / TECHNICAL DATA ENGINEER INTERVIEW QUESTIONS
Answer Example
“I have experience with various ETL tools, such as IBM InfoSphere, SAS Data Management, and SAP Data Services. However, if I have to pick one as my favorite, that would be Informatica’s PowerCenter. In my opinion, what makes it the best out there is its efficiency. PowerCenter offers top performance and high flexibility which, I believe, are the most important properties of an ETL tool. They guarantee access to the data and smoothly running business data operations at all times, even if changes in the business or its structure take place.”
15. Have you built data systems using the Hadoop framework?
If so, please describe a particular project you’ve worked on.
How to Answer
Hadoop is a tool that many hiring managers ask about during interviews. You should know that whenever there’s a specific question like that, it’s highly likely that you’ll be required to use this particular tool on the job. So, to prepare, do your homework and make sure you’re familiar with the languages and tools the company uses. More often than not, you can find that information in the job description. If you’re experienced with the tool, give a detailed explanation of your project to highlight your skills and knowledge of the tool’s capabilities. In case you haven’t worked with this tool, the least you could do is some research to demonstrate basic familiarity with the tool’s attributes.
Answer Example
“I’ve used the Hadoop framework while working on a team project focused on increasing data processing efficiency. We chose to implement it because of its ability to increase data processing speeds while, at the same time, preserving quality through its distributed processing. We also decided to implement Hadoop because of its scalability, as the company I worked for expected a considerable increase in its data processing needs over the next few months. In addition, Hadoop is an open-source framework, which made it the best option, keeping in mind the limited resources for the project. Not to mention that it’s Java-based, so it was easy for everyone on the team to use, and no additional training was required.”
16. Do you have experience with a cloud computing environment?
What are the pros and cons of working in one?
How to Answer
Data engineers are well aware that there are pros and cons to cloud computing. That said, even if you lack prior experience working in cloud computing, you must be able to demonstrate a certain level of understanding of its advantages and shortcomings. This will show the hiring manager that you’re aware of the present technological issues in the industry. Plus, if the position you’re interviewing for requires using a cloud computing environment, the hiring manager will know that you’ve got a basic idea of the possible challenges you might face.
Answer Example
“I haven’t had the chance to work in a cloud computing environment yet. However, I have a good overall idea of its pros and cons. On the plus side, cloud computing is more cost-effective and reliable. Most providers sign service-level agreements that guarantee a high level of service availability, which should keep downtime to a minimum. On the negative side, the cloud computing environment may compromise data security and privacy, as the data
18. What is your experience level with NoSQL databases?
more coming from NoSQL databases in the future.
That said, the additional training some developers
might need is certainly worth it.
20. Have you ever taken part in a data disaster recovery situation?
If so, describe what happened and how you solved the issue at hand.
21. Have you ever created custom analytics applications?
If so, please share details about the application you’ve built.
DATA ENGINEER INTERVIEW QUESTIONS /BEHAVIORAL DATA ENGINEER INTERVIEW QUESTIONS
Answer Example
“It’s true that data maintenance may come across as routine. But, in my opinion, it’s always a good idea to closely monitor the specified tasks, and that includes making sure the scripts are executed successfully. Once, while I was conducting an integrity check, I located a corrupt index that could have caused some serious problems in the future. This prompted me to come up with a new maintenance task that prevents corrupt indexes from being added to the company’s databases.”
How to Answer
One of the things hiring managers value most is constant improvement of the existing environment, especially if you initiate those improvements yourself, as opposed to being assigned to do it. So, if you’re a self-starter, definitely point this out. This will showcase your ability to think creatively and the importance you place on the company’s overall success. If you lack such experience, explain what changes you would propose as a data engineer. In case your ideas were not implemented for reasons such as lack of financial resources, you can mention that. However, try to focus on your continuous efforts to find novel ways to improve data quality.
Answer Example
“Data quality and reliability have always been a top priority in my work. While working on a specific project, I discovered some discrepancies and outliers in the data stored in the company’s database. Once I had identified several of those, I proposed to develop and implement a data quality process in my department’s routine. This included bi-weekly meetups with coworkers from different departments where we would identify and troubleshoot data issues. At first, everyone was worried that this would take too much time away from their current projects. How-
26. Have you ever played an active role in solving a business problem through the innovative use of existing data?
How to Answer
Hiring managers are looking for self-motivated people who are eager to contribute to the success of a project. Try to give an example where you came up with a project idea or took charge of a project. It’s best to point out the novel solution you proposed, instead of focusing on a detailed description of the problem you had to deal with.
Answer Example
“In the last company I worked for, I took an active part in a project that aimed to identify the reasons for the high employee turnover rate. I started by closely observing data from other areas of the company, such as Marketing, Finance, and Operations. This helped me find some strong correlations between data in these key areas and employee turnover rates. Then, I collaborated with the analysts in those departments to gain a better understanding of the correlations in question. Ultimately, our efforts resulted in strategic changes that had a positive influence on employee turnover rates.”
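The answer above describes finding strong correlations between data from other departments and employee turnover rates. As a minimal sketch of how such a relationship can be quantified, here is the Pearson correlation coefficient written out with Python’s standard library. The function name `pearson_r`, the variable names, and the monthly figures are hypothetical, made up for illustration:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences
    (hypothetical helper written for illustration)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, without the common 1/n factor (it cancels out)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical monthly figures: an operations workload metric vs. turnover (%)
overtime_hours = [10, 12, 15, 18, 20, 25]
turnover_rate = [1.0, 1.2, 1.4, 1.9, 2.1, 2.6]
r = pearson_r(overtime_hours, turnover_rate)
print(f"r = {r:.3f}")
```

A coefficient close to +1 or -1 signals a strong linear relationship, but on its own it does not show causation, which is why following up with the analysts in the relevant departments, as the answer describes, matters.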
How to Answer
Although technical skills are of major importance
if you want to advance your data engineer career,
there are many non-engineering skills that could aid
your success. In your answer, try to avoid the most
obvious examples, such as communication or inter-
personal skills.
Answer Example
“I’d say the most useful skills I’ve developed over
the years are multitasking and prioritizing. As a data
engineer, I have to prioritize and balance various
tasks daily. I work with many departments in the
company, so I receive lots of different requests from
my coworkers. To handle them efficiently, I put the
most urgent company needs first without neglecting
the other requests. Strengthening these skills has
really helped me out.”
DATA ENGINEER INTERVIEW QUESTIONS / BRAINTEASERS
BRAINTEASERS
DATA ENGINEER INTERVIEW QUESTIONS / GUESSTIMATE
GUESSTIMATE
DATA ARCHITECT INTERVIEW QUESTIONS
DATA ARCHITECT INTERVIEW QUESTIONS / GENERAL DATA ARCHITECT INTERVIEW QUESTIONS
1. Have you ever taken part in improving a company’s existing data architecture?
Answer Example
“In my work experience, marrying external data
with internal data in corporate systems can pose
a variety of threats to data integrity. That’s why I
launched a project where I established a step-by-step
screening process for the data we purchased from
third parties. I also managed to further improve the
relationship with our data supplier, who, in turn,
agreed to run a few checks on their data before
sending it to us. This initiative had a positive impact
on the company’s data reliability and decreased
database errors by 29% within one year.”
2. As a data architect, have you faced any challenges related to the company’s data security? How did you ensure the integrity of the data was not compromised?
How to Answer
Data security is a top priority for every company. That’s why hiring managers would like to learn more about your experience with data security issues. When answering this question, emphasize that data security is an important aspect of your job, even if your background isn’t focused on that particular field.
Answer Example
“When working in a team, it’s sometimes hard to agree on what could pose a security risk. I remember a situation when some colleagues of mine wanted to change the established process for uploading franchise data to our system. I was sure these changes could result in security risks. So, to validate my point, I calculated the possible financial loss to the company in case security was compromised. This prompted the team members to modify their plan to strengthen data security measures.”
3. As a data architect, you should be up to date with the latest technologies and developments in the field. How do you keep yourself informed about the new trends in data architecture?
How to Answer
When working in a technical role, it’s common to get absorbed in the company’s current processes and miss out on the latest industry developments. Hiring managers will value your willingness to educate yourself despite your busy schedule. So, try to list the news resources you’re subscribed to, and mention the conferences, trainings, or industry events you attend when you have the chance.
Answer Example
“I do my best to stay informed about the latest industry trends and technology advancements. I believe this helps me learn things that can improve my work or inspire me to come up with ideas that will benefit the company. I’m subscribed to newsfeeds such as InformationWeek and TechNewsWorld. I also attend 2-3 conferences a year where I network with other professionals in the field. Whenever my schedule allows it, I attend specialized trainings and seminars.”
linked between tables. Referential integrity is critically important: if a database lacks referential integrity, queries can return incomplete data without any indication of an error.
For instance, we can say that the foreign key in a certain child table maintains the referential integrity within the database by referencing a valid, existing primary key in the parent table.
A foreign key in SQL is defined through a foreign key constraint. This type of constraint verifies that every value in the child table’s foreign key column matches an existing value in the parent table. Therefore, referential integrity doesn’t allow us to add records to a related table unless there is an associated record in the primary table. It also prevents us from changing values in a primary table that would lead to orphaned records in a related table. Moreover, by default it makes it impossible to delete records from a primary table if there are matching related records (unless the constraint defines a cascading action).
To visualize how the fields from the various tables within a database refer to each other, people usually use Entity-Relationship diagrams (ER diagrams) or the simpler and handier tool: relational schemas.
• You’re a team player who is willing to help others
• You relate well to people
The second important aspect of this question is the method you used when you were teaching. How did you share your knowledge? Did you have to use a special technique to explain a given concept? Did you have a strategy that helped facilitate learning? Perhaps you provided valid practical examples?
Here’s an example of such a situation:
You can say that you always wanted to teach your younger brother how to create good PowerPoint presentations. At first, it was difficult because it was very hard to get his attention. Then you proposed creating a presentation together: a presentation about his favorite motorbike company. He instantly agreed because it was something he was interested in sharing with his friends and perhaps posting on one of his favorite forums. At first, you were the one who was working with the mouse and the keyboard,
how different departments use the company’s stored data?
How to Answer
Different departments have different data needs. And, as a data architect, you must be able to work with people from non-technical backgrounds to understand how they use the available data. When you answer this question, do your best to convey that you’re willing to educate yourself to improve your work and better serve the company’s data requirements.
Answer Example
“As a data architect, understanding the work of my colleagues in different departments has always been important to me. In my previous workplace, I’d regularly meet with reps from other teams to discuss their current and future projects. I would ask a series of questions instead of making assumptions. This approach has allowed me to correctly identify and plan for their data needs.”
it is important for a data architect to have an in-depth understanding of the business and its strategic challenges. How have you approached this requirement in your past position?
How to Answer
Missing the bigger picture is a common problem for data architects, due to the technical nature of their work. With your answer, you have to reassure the hiring manager that you’re capable of taking proactive steps to stay on track with the company’s overall business strategy and goals.
Answer Example
“In my experience as a data architect, I’ve learned that in order to improve my performance, I have to be constantly aware of the company’s short-term and long-term goals. This is why I’ve been proactive in my communication with management and C-level executives. I’ve also attended corporate trainings on a regular basis. This has given me a chance to ask the right questions to the right people.”
8. What is (are) your greatest strength(s)?
vor than “What is your biggest weakness?” Nevertheless, you need to prepare to answer it, because it
• Communicator
• Motivator
• Team player
• Problem Solver
DATA ARCHITECT INTERVIEW QUESTIONS / TECHNICAL DATA ARCHITECT INTERVIEW QUESTIONS
Answer Example
“In my work experience, the cause of external
data integration issues is usually a different system
that creates the data in an incompatible format.
Unfortunately, it isn’t possible for all companies to
use the same systems. So, I solved this problem by
creating and running a script prior to uploading the
data into my company’s warehouse tables. The script
not only converted the external data format but also
ran tests to ensure the new format was compatible
with our systems.”
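The answer above describes a pre-load script that converts external data and tests it before it reaches the warehouse. Below is a minimal Python sketch of that idea, not the script from the answer: the external column names (`OrderID`, `Date`, `Amount`), the DD/MM/YYYY input format, and the `WAREHOUSE_SCHEMA` target are hypothetical assumptions chosen for illustration.

```python
from datetime import datetime

# Hypothetical warehouse schema: target column name -> expected Python type.
WAREHOUSE_SCHEMA = {"order_id": int, "order_date": str, "amount": float}


def convert_record(raw: dict) -> dict:
    """Convert one external record (assumed layout) into the warehouse layout."""
    return {
        "order_id": int(raw["OrderID"]),
        # Assumed external date format DD/MM/YYYY, stored as ISO 8601 (YYYY-MM-DD).
        "order_date": datetime.strptime(raw["Date"], "%d/%m/%Y").date().isoformat(),
        # Assumed external amounts use a comma as the thousands separator.
        "amount": float(raw["Amount"].replace(",", "")),
    }


def validate_record(record: dict) -> list[str]:
    """Run simple compatibility tests; an empty list means the record can be loaded."""
    problems = []
    if set(record) != set(WAREHOUSE_SCHEMA):
        problems.append(f"unexpected columns: {sorted(record)}")
    for column, expected_type in WAREHOUSE_SCHEMA.items():
        if column in record and not isinstance(record[column], expected_type):
            problems.append(f"{column} is not {expected_type.__name__}")
    if isinstance(record.get("amount"), float) and record["amount"] < 0:
        problems.append("amount is negative")
    return problems


def prepare_batch(raw_rows: list[dict]) -> tuple[list[dict], list[tuple[dict, str]]]:
    """Split external rows into loadable records and rejected rows with a reason."""
    ready, rejected = [], []
    for raw in raw_rows:
        try:
            record = convert_record(raw)
        except (KeyError, ValueError) as exc:
            rejected.append((raw, f"conversion failed: {exc}"))
            continue
        problems = validate_record(record)
        if problems:
            rejected.append((raw, "; ".join(problems)))
        else:
            ready.append(record)
    return ready, rejected


if __name__ == "__main__":
    rows = [
        {"OrderID": "101", "Date": "31/01/2024", "Amount": "1,250.50"},
        {"OrderID": "102", "Date": "2024-02-01", "Amount": "99.00"},  # not DD/MM/YYYY
    ]
    ready, rejected = prepare_batch(rows)
    print(ready)
    print(rejected)
```

Rejected rows are kept together with a reason rather than silently dropped, so they can be reviewed with the data supplier instead of disappearing before the load.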
10. Have you worked with open source technology? Tell us about some issues you have come across when using it.
How to Answer
When an interviewer asks a specific question like that, the company is either considering using open source technology in the future or is already using it. If you have relevant experience, give some particular examples. And be sure to highlight your ability to modify open-source code. If you haven’t encountered any problems using it, mention any disadvantages of open source technology you’re aware of.
Answer Example
“I’ve worked with both Hadoop and MySQL without facing any major problems. Nevertheless, I realize that using open source databases or software utilities has its drawbacks. For example, you often have to rely on advice from user forums, as there is no formal customer support to address your issue. Another thing is that developers don’t spend a lot of time on the user interface, so you may lack the resources you need to get started.”
How to Answer
The basic types of SQL joins are inner, left, and right (in SQL theory, there is one more type of join, the full join; however, it is rarely used today). The easiest and most intuitive way to explain the difference between the inner, left, and right joins is by using a Venn diagram, which shows all possible logical relations between data sets.
The INNER JOIN returns only the records from Table A and Table B that have matching values in the columns being joined.
(Figure: Venn diagrams of an inner join and a left join.)
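To see the difference between an inner join and a left join without drawing Venn diagrams, you can run both against a tiny database. This sketch uses Python’s built-in `sqlite3` module; the `customers` and `orders` tables, their columns, and the data are hypothetical and exist only for illustration. The inner join keeps only customers that have matching orders, while the left join (shown for contrast) also keeps the customer with no orders and fills the order columns with NULL, which Python returns as `None`.

```python
import sqlite3


def run_join_demo():
    # In-memory database with two small, hypothetical tables.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cai');
        INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 20.0), (12, 2, 35.0);
        """
    )
    # INNER JOIN: only rows whose customer_id appears in both tables.
    inner = conn.execute(
        """
        SELECT c.name, o.order_id
        FROM customers AS c
        INNER JOIN orders AS o ON o.customer_id = c.customer_id
        ORDER BY o.order_id
        """
    ).fetchall()
    # LEFT JOIN: every customer, with NULL order columns when nothing matches.
    left = conn.execute(
        """
        SELECT c.name, o.order_id
        FROM customers AS c
        LEFT JOIN orders AS o ON o.customer_id = c.customer_id
        ORDER BY c.customer_id, o.order_id
        """
    ).fetchall()
    conn.close()
    return inner, left


if __name__ == "__main__":
    inner, left = run_join_demo()
    print(inner)  # [('Ana', 10), ('Ana', 11), ('Ben', 12)]
    print(left)   # [('Ana', 10), ('Ana', 11), ('Ben', 12), ('Cai', None)]
```

The only difference between the two queries is the join keyword, which makes it easy to point out in an interview that Cai is dropped by the inner join and kept by the left join.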
The SQL LEFT JOIN returns all records from the left table, plus the matched values from the right table. If there are no matches, the left join still returns all rows from the left table, with NULL values in the columns from the right table.
As for the SQL RIGHT JOIN, it works like the LEFT JOIN, but in the opposite direction: it returns all rows from the right table plus the matched values from the left table.
How to Answer
A primary key is a column (or a set of columns) whose value exists and is unique for every record in a table. It’s important to know that each table can have one and only one primary key.
Therefore, you can think of a primary key as the field (or group of fields) that identifies each record of a table in a unique way. For this reason, primary keys are also called the unique identifiers of a table.
Another crucial feature of primary keys is that they cannot contain null values. This means that, in an example with a single-column primary key, there must always be a value inserted in the rows under this column. You cannot leave it blank.
One last remark about primary keys: not all tables you work with will have a primary key, although almost all tables in any database will have a single-column or a multi-column primary key.
A foreign key, in contrast, is a column (or a set of columns) that references a column (most often the primary key) of another table. Foreign keys can be called identifiers, too, but they identify the relationships between tables, not the tables themselves.
In the relational schema form of representation, relations between tables are expressed in the following way: the column name that designates the logical match is a foreign key in one table, and it is connected with a corresponding column from another table. Often, the relationship goes from a foreign key to a primary key, but in more advanced circumstances, this will not be the case. To catch the relations on which a database is built, we should al-
13. How many types of data structures does R have?
How to Answer
This question is important because virtually everything you do in R involves data in some shape or form. The most commonly used data structures in R are these:
• Vectors (atomic and lists);
• Matrices;
• Data frames;
• Factors.
How to Answer
Having established processes to ensure data quality is key to a company’s data infrastructure. With this question, the hiring manager wants to assess your relevant experience. Make sure you highlight the particular dimensions you’ve monitored to validate data quality.
Answer Example
“I’ve always been involved in ensuring data quality in my job as a data architect. My team and I monitored some specific dimensions to validate the quality of data. These included completeness, uniqueness, timeliness, validity, accuracy, and consistency. Monitoring these dimensions helped us detect inconsistencies that could negatively affect the accuracy of data analysis.”
to a company’s data management systems and
The data needs of companies change, and hiring managers want to make sure they hire an architect who will not only adapt to the new requirements but also take the initiative to implement these changes and introduce new improvements. If you are just beginning your career as a Data Architect and don’t have experience dealing with such changes, think of a hypothetical situation that will demonstrate your problem-solving skills and hands-on approach to challenges.
Here’s an example of such a situation.
While working for my previous employer, I was part of a project aiming to make data more accessible to all company employees. Each department’s data was siloed, and team members in other departments couldn’t access it. Acquiring data from outside one’s own department was a dull and tiresome process that prevented timely analyses. I actively took part in making data sharing among the company’s departments easy without compromising data security. Thus, analysts were able to complete their projects on time using a much more robust dataset than before. This made it possible for senior management to make faster and better-informed strategic decisions.
18. What issues have you faced while leading teams? Tell us how you solved these issues.
How to Answer
You can approach this question in a more general way, or describe a real situation you and your team have faced when working on a specific task. Either way, make sure you point out your problem-solving skills and your ability to work in a team to reach a common goal.
Answer Example
“In my experience as a data architect, I’ve often
worked with teams to develop changes in the data
architecture of our company. Of course, people on
a team come from different backgrounds and have
varying opinions that affect their priorities. What
I’ve discovered is that making compromises is cru-
cial to the success of the task, along with staying
open-minded to others’ ideas. That said, once we’ve
identified our common goals, a consensus has al-
ways been easy to reach.”
DATA ARCHITECT INTERVIEW QUESTIONS / BEHAVIORAL DATA ARCHITECT INTERVIEW QUESTIONS
Answer Example
“I believe a good data architect should under-
stand the needs of the different departments across
the company. That said, I’ve had to work with people
who don’t fully understand my role and responsibilities
on numerous occasions. Some of my coworkers would
make requests that I had to reject due to our data
architecture limitations. That has led to certain
tensions. I’d say overcoming such challenges takes
time. Gradually, we learned more about each other’s
work, which helped us brainstorm possible solutions.
All in all, taking the extra step to educate myself and
others has made all the difference.”
Answer Example
“I’d describe my work style as collaborative. I like to work on full-team participation projects and co-create with my teammates. If I’m not sure of the direction I should take on a project, I always consult with my team. This way, we can work toward consensus and align our ideas.”
How to Answer
There are a lot of factors that may influence your decision to take on a new job. These include:
- career growth opportunity;
- compensation;
- work/life balance;
Answer Example
“The most important factors for me, as a data architect, are the company’s industry and the workplace culture. The first one determines the projects I’ll be involved in. The second one indicates whether the work environment will be positive and teamwork-oriented. To me, those are as important as compensation and benefits.”
24. How would you assess your performance in the interview so far?
How to Answer
This is a question you should answer openly. Generally, you would know if you performed well or if your interview was a disaster. In fact, if you address the issues of your performance, you might get
DATA ARCHITECT INTERVIEW QUESTIONS / BRAINTEASERS
BRAINTEASERS
DATA ARCHITECT INTERVIEW QUESTIONS / GUESSTIMATE
GUESSTIMATE
This brings our list of 180 data science interview questions to an end.
We believe this concise guide will help you “expect the unexpected” and enter
your first data science interview with confidence.
Of course, when it comes to preparing for a data science career, and data science
interview questions in particular, more is more. So, make sure you check out our
career resources, as they will help you on the path towards your professional data
science goals.
ABOUT THE AUTHORS
Ready to take the next step?
The 365 Data Science Program offers a complete data science training
taught by expert instructors with a fun, interactive, and beginner-friendly ap-
proach. The courses start with the fundamentals, cover in-demand program-
ming languages like Python, R, and SQL, visualization tools like Power BI and
Tableau, and finish off with advanced specialized courses, including
state-of-the-art Machine and Deep Learning.
You can try the course (12 hours of video instruction) for free.