Problem Statement II
Problem Statement II
Problem Statement - II
Identify the missing data and use appropriate method to deal with it. (Remove
Hint: Note that in EDA, since it is not necessary to replace the missing value, but if you
have to replace the missing value, what should be the approach. Clearly mention the
approach.
Identify if there are outliers in the dataset. Also, mention why do you think it is an
outlier. Again, remember that for this exercise, it is not necessary to remove any data
points.
Identify if there is data imbalance in the data. Find the ratio of data imbalance.
Hint: How will you analyse the data in case of data imbalance? You can plot more than
one type of plot to analyse the different aspects due to data imbalance. For example, you
can choose your own scale for the graphs, i.e. one can plot in terms of percentage or
absolute value. Do this analysis for the ‘Target variable’ in the dataset ( clients with
payment difficulties and all other cases). Use a mix of univariate and bivariate analysis
etc.
Hint: Since there are a lot of columns, you can run your analysis in loops for the
appropriate columns
upGrad & IIITB | and find the insights.Live
Learn Jobs Discussions
Data Science
Program -
Explain the results
August 2023
of univariate, segmented univariate, bivariate analysis, etc. in business
terms.
Find the top 10 correlation for the Client with payment difficulties and all other cases
(Target variable). Note that you have to find the top correlation by segmenting the data
frame w.r.t to the target variable and then find the top correlation for each of the
segmented data and find if any insight is there. Say, there are 5+1(target) variables in a
dataset: Var1, Var2, Var3, Var4, Var5, Target. And if you have to find top 3 correlation, it
can be: Var1 & Var2, Var2 & Var3, Var1 & Var3. Target variable will not feature in this
or decreasing.
Include visualisations and summarise the most important results in the presentation.
You are free to choose the graphs which explain the numerical/categorical variables.
Insights should explain why the variable is important for differentiating the clients with
You need to submit one/two Ipython notebook which clearly explains the thought
process behind your analysis (either in comments of markdown text), code and relevant
plots. The presentation file needs to be in PDF format and should contain the points
discussed above with the necessary visualisations. Also, all the visualisations and plots
must be done in Python(should be present in the Ipython notebook), though they may be
Report an error
PREVIOUS NEXT