EDA Interview Questions
Interview Questions
(Practice Project)
Easy Questions
Answer: Graphical representations are used to visualize data in a way that highlights patterns, relationships,
and trends. They simplify complex data, making it easier to interpret and communicate insights.
Answer: A histogram is best suited for visualizing the distribution of continuous variables.
Answer: A box plot provides information about the median, quartiles, interquartile range (IQR), and potential
outliers in a dataset.
Answer: A scatter plot is used to explore the relationship between two continuous variables.
Answer: A bar chart displays categorical data with rectangular bars representing the frequency or value of
each category, while a histogram displays the distribution of a continuous variable by grouping data into bins.
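The contrast between the two chart types can be sketched with matplotlib. This is a minimal illustration, not a prescribed method: the fruit counts and the height sample are made-up data, and all variable names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical categorical data: one value per named category
categories = ["apple", "banana", "cherry"]
category_counts = [12, 7, 9]

# Hypothetical continuous data: 200 heights in cm
heights = rng.normal(loc=170, scale=8, size=200)

fig, (ax_bar, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

# Bar chart: one bar per category, bar height is the given value
bars = ax_bar.bar(categories, category_counts)
ax_bar.set_title("Bar chart: one bar per category")

# Histogram: the continuous range is cut into bins, bar height is the bin count
bin_counts, bin_edges, _ = ax_hist.hist(heights, bins=10)
ax_hist.set_title("Histogram: one bar per bin")

fig.tight_layout()
```

Calling `fig.savefig(...)` with a file name of your choice writes the figure to disk.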
Answer: Line charts are used in time series analysis because they effectively display trends and changes over time.
Answer: The height of bars in a histogram represents the frequency of data points within each bin or interval.
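To make the link between bar height and frequency concrete, the bin counts can be computed directly with `numpy.histogram`. The ten values and the choice of four equal-width bins are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical sample of ten continuous values
values = np.array([1.0, 1.5, 2.0, 2.5, 2.7, 3.1, 3.9, 4.0, 4.2, 4.8])

# Four equal-width bins over [1, 5]; each count is the height of one bar
counts, edges = np.histogram(values, bins=4, range=(1.0, 5.0))

# Bins are half-open [left, right), except the last one, which also includes its right edge
for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {count} values")
```

The counts always sum to the number of observations, so changing the bin width changes the bar heights but not their total.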
Medium Questions
What are the advantages of using a violin plot over a box plot?
Answer: Violin plots provide additional information by combining box plots and kernel density plots, allowing for
a visual comparison of data distribution across different levels of a categorical variable. They can reveal
variations and multimodal distributions that might not be visible in a box plot.
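A small sketch of this point, assuming a synthetic bimodal sample (two clusters centred at 0 and 6): the box plot summarizes it with a single box, while the violin plot shows two density bulges. The data and names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical bimodal sample: two clusters of 300 points each
bimodal = np.concatenate([rng.normal(0, 1, 300), rng.normal(6, 1, 300)])

fig, (ax_box, ax_violin) = plt.subplots(1, 2, figsize=(6, 3), sharey=True)

# Box plot: median and quartiles only, the two modes are not visible
ax_box.boxplot(bimodal)
ax_box.set_title("Box plot")

# Violin plot: the kernel density outline shows both modes
parts = ax_violin.violinplot(bimodal, showmedians=True)
ax_violin.set_title("Violin plot")

fig.tight_layout()
```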
Answer: Outliers in a box plot are typically represented as individual points that fall outside the whiskers, which
extend at most 1.5 times the interquartile range (IQR) below the first quartile and above the third quartile.
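The 1.5 × IQR rule can be worked through numerically. This is a sketch on a made-up sample; it uses NumPy's default (linear interpolation) quartile definition, and other tools may compute quartiles slightly differently.

```python
import numpy as np

# Hypothetical sample with one suspiciously large value
data = np.array([2, 4, 4, 5, 6, 7, 8, 9, 30])

# First and third quartiles (NumPy's default linear interpolation)
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1

# Fences: points beyond these limits are flagged as potential outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = data[(data < lower_fence) | (data > upper_fence)]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, fences=({lower_fence}, {upper_fence})")
print("Potential outliers:", outliers)
```

For this sample the quartiles are 4 and 8, so the fences sit at -2 and 14 and only the value 30 is flagged.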
Answer: Heatmaps are significant in EDA as they provide a visual representation of data values through color
coding, which helps in identifying patterns, correlations, and the intensity of values within a matrix.
Describe a situation where you would prefer a pair plot over a scatter plot.
Answer: A pair plot is preferred when you want to analyze the relationships between multiple variables
simultaneously, as it creates a matrix of scatter plots for each pair of variables, making it easier to observe
PW Skills
What insights can be gained from a 3D scatter plot that might be missed in a 2D scatter plot?
Answer: A 3D scatter plot allows for the exploration of relationships between three variables simultaneously,
providing a more comprehensive view of the data. It can reveal complex interactions and patterns that are not
visible in a 2D plot.
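As an illustration, the sketch below builds a synthetic dataset in which `z` depends on the product of `x` and `y`, an interaction that no single 2D projection shows clearly. The data-generating formula and variable names are assumptions chosen for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers the 3D projection on older matplotlib)

rng = np.random.default_rng(2)

# Hypothetical data: z depends on the interaction x * y plus a little noise
x = rng.uniform(-1, 1, 200)
y = rng.uniform(-1, 1, 200)
z = x * y + rng.normal(0, 0.05, 200)

fig = plt.figure(figsize=(5, 4))
ax = fig.add_subplot(projection="3d")

# Colour the points by z as a second cue for the third dimension
points = ax.scatter(x, y, z, c=z, cmap="viridis")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
```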
Answer: A heatmap can be used to visualize a correlation matrix by representing the correlation coefficients
between variables with different colors. This makes it easy to identify strong positive or negative correlations.
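A correlation heatmap can be sketched with NumPy and plain matplotlib. The four variables here are synthetic and their names are hypothetical; libraries such as seaborn offer a higher-level heatmap function, which this sketch does not use.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical variables with known relationships to `a`
a = rng.normal(size=500)
b = 0.9 * a + rng.normal(scale=0.3, size=500)  # strong positive correlation with a
c = -a + rng.normal(scale=0.5, size=500)       # strong negative correlation with a
d = rng.normal(size=500)                       # unrelated to a
names = ["a", "b", "c", "d"]

# Pearson correlation matrix; each row of the input is one variable
corr = np.corrcoef([a, b, c, d])

fig, ax = plt.subplots(figsize=(4, 3.5))

# Diverging colour map fixed to [-1, 1] so that zero correlation is the neutral colour
image = ax.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(names)))
ax.set_xticklabels(names)
ax.set_yticks(range(len(names)))
ax.set_yticklabels(names)
fig.colorbar(image, ax=ax, label="Pearson r")
```

Fixing `vmin` and `vmax` to the full correlation range keeps colours comparable across different datasets.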
What is the main difference between a line chart and a scatter plot?
Answer: A line chart connects data points with a continuous line and is commonly used for time series data,
while a scatter plot shows individual data points without connecting lines, typically used to explore the
relationship between two variables.
Hard Questions
Answer: A 3D surface plot is used to visualize the relationship between three variables by plotting data points
on a three-dimensional surface. This can help identify peaks, valleys, and trends in the data that represent
interactions between the variables. It's particularly useful for visualizing complex functions or models.
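A minimal surface-plot sketch, assuming a simple bell-shaped function of two variables so that the single peak is easy to recognize. The function and the grid resolution are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers the 3D projection on older matplotlib)

# Regular grid over the two input variables
x = np.linspace(-3, 3, 61)
y = np.linspace(-3, 3, 61)
X, Y = np.meshgrid(x, y)

# Hypothetical response: a bell-shaped surface with its peak at the origin
Z = np.exp(-(X**2 + Y**2) / 2)

fig = plt.figure(figsize=(5, 4))
ax = fig.add_subplot(projection="3d")

# Colour encodes height, which helps when reading peaks and valleys
surface = ax.plot_surface(X, Y, Z, cmap="viridis")
fig.colorbar(surface, ax=ax, shrink=0.6, label="z")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_zlabel("z")
```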
What challenges might arise when interpreting a heatmap with a large dataset?
Answer: Interpreting a heatmap with a large dataset can be challenging because cells and labels become crowded, which
may obscure patterns. Additionally, selecting appropriate color scales and managing large amounts of
Answer: Pair plots can help detect multicollinearity by visualizing the relationships between multiple variables.
Strong linear relationships between two or more independent variables in the pair plots suggest
multicollinearity.
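The idea can be sketched with a hand-built grid of scatter plots, standing in for a library pair-plot function, plus a numeric check on the correlation matrix. The three features are synthetic, and the 0.9 cutoff is an arbitrary assumption rather than a standard threshold.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical features: x2 is almost a linear function of x1, x3 is independent
x1 = rng.normal(size=300)
x2 = 2 * x1 + rng.normal(scale=0.2, size=300)
x3 = rng.normal(size=300)
features = {"x1": x1, "x2": x2, "x3": x3}
names = list(features)

# Hand-rolled pair plot: histograms on the diagonal, scatter plots elsewhere
fig, axes = plt.subplots(len(names), len(names), figsize=(6, 6))
for i, row_name in enumerate(names):
    for j, col_name in enumerate(names):
        ax = axes[i, j]
        if i == j:
            ax.hist(features[row_name], bins=20)
        else:
            ax.scatter(features[col_name], features[row_name], s=5)
        if i == len(names) - 1:
            ax.set_xlabel(col_name)
        if j == 0:
            ax.set_ylabel(row_name)

# Numeric follow-up: flag feature pairs whose absolute correlation exceeds the cutoff
cutoff = 0.9  # assumed threshold for illustration
corr = np.corrcoef([features[name] for name in names])
suspect_pairs = [
    (names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > cutoff
]
print("Pairs that look collinear:", suspect_pairs)
```

In the grid, the `x1` versus `x2` panel shows a tight straight band, which is the visual signature that the numeric check confirms.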
Answer: A violin plot might provide misleading information if the data distribution is heavily skewed or if there
are too few data points, as the density estimation might exaggerate certain aspects of the distribution, leading
to incorrect interpretations.
Answer: To enhance the readability of a complex 3D plot, you can use techniques such as rotating the plot for
different perspectives, adjusting the color scheme for better contrast, adding grid lines or contours, and
Answer: Graphical representations in EDA are limited by their subjective nature, as interpretations can vary
between viewers. They may also oversimplify complex data or obscure details in large datasets. Additionally,
creating accurate and effective visualizations requires skill, as poorly designed graphs can mislead or fail to