13. Exploratory Data Analysis
Instructions:
Please share your answers filled inline in the Word document. Submit Python code and R code
files wherever applicable.
Problem Statements:
Q1) Calculate Skewness, Kurtosis using R/Python code & draw inferences on the following data.
Hint: [Insights drawn from the data such as data is normally distributed/not, outliers, measures
like mean, median, mode, variance, std. deviation]
a. Cars speed and distance
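As a minimal sketch of the calculation Q1 asks for, the moment-based (population) skewness and excess kurtosis can be computed in plain Python as below. The function names and the `sample` values are hypothetical and only illustrate the method; they are not the cars speed and distance data.

```python
def central_moment(values, order):
    """Return the population central moment of the given order."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** order for x in values) / n


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (0 for symmetric data)."""
    m2 = central_moment(values, 2)
    m3 = central_moment(values, 3)
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    m2 = central_moment(values, 2)
    m4 = central_moment(values, 4)
    return m4 / m2 ** 2 - 3.0


# Hypothetical sample with one large value on the right (assumption, not the cars data)
sample = [4, 7, 8, 9, 10, 11, 12, 13, 15, 24]
print("skewness:", round(skewness(sample), 3))
print("excess kurtosis:", round(excess_kurtosis(sample), 3))
```

A positive skewness indicates a longer right tail and a negative value a longer left tail; a positive excess kurtosis indicates heavier tails than a normal distribution.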
Answer (b) : -
Answer :-
Histogram :
1. The Chick Weight data is right-skewed (positively skewed).
2. More than 50% of the chick weights lie between 50 and 150.
3. Most of the chick weights lie between 50 and 100.
Mean 41
Median 40.5
Variance 25.52
Standard Deviation 5.05
The above figure is a box plot of the data distribution, produced using
Python code.
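The reported variance and standard deviation can be cross-checked, since the standard deviation is the square root of the variance. A short sketch using only the values stated above:

```python
import math

variance = 25.52          # variance reported above
reported_std_dev = 5.05   # standard deviation reported above

# The standard deviation is the square root of the variance
std_dev = math.sqrt(variance)
print(round(std_dev, 2))  # 5.05
```

The two reported figures are therefore consistent with each other.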
Q5) What is the nature of skewness when mean, median of data is equal?
Answer :- Zero skewness. The distribution is symmetric, as in a normal distribution.
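One way to see the link between mean, median and skewness is Pearson's second (median) skewness coefficient, 3 × (mean − median) / standard deviation, which is zero whenever the mean and median coincide. The sketch below uses two hypothetical samples (assumptions for illustration only).

```python
import statistics


def pearson_median_skewness(values):
    """Pearson's second skewness coefficient: 3 * (mean - median) / std dev."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    std_dev = statistics.stdev(values)  # sample standard deviation
    return 3 * (mean - median) / std_dev


# Hypothetical samples (assumptions, not data from the assignment)
symmetric = [2, 4, 6, 8, 10]     # mean == median == 6
right_skewed = [2, 4, 6, 8, 30]  # mean (10) pulled above median (6)

print(pearson_median_skewness(symmetric))
print(pearson_median_skewness(right_skewed))
```

The coefficient is zero for the symmetric sample and positive for the sample whose mean is pulled above its median by a large value.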
Q10) Answer the below questions using the below boxplot visualization.
Draw an inference from the distribution of data for Boxplot 1 with respect to Boxplot 2.
Hint: [Compare both plots and check whether the data is normally distributed, whether outliers
are present, the skewness, etc.]
Q12)
Answer :- Since the longer tail of the data lies toward the higher values, the data is
positively skewed.
(iii) If it were found that the data point with the value 25 is 2.5, how would the new
boxplot be affected?
(Hint: On changing the data point from 25 to 2.5 in the data, how is it different from the
current one.)
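The effect of such a correction can be sketched with the five-number summary and the usual 1.5 × IQR outlier rule. The data below is hypothetical (the actual values behind the boxplot are not given here); only the change from 25 to 2.5 is taken from the question.

```python
import statistics


def boxplot_summary(values):
    """Return quartiles, whisker fences and outliers using the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}


# Hypothetical data containing the value 25 (assumption for illustration)
before = [5, 6, 7, 7, 8, 9, 10, 12, 25]
# Same data with the value 25 corrected to 2.5
after = [2.5 if x == 25 else x for x in before]

print("before:", boxplot_summary(before))
print("after: ", boxplot_summary(after))
```

In this hypothetical sample the value 25 lies above the upper fence and is drawn as an outlier. After the correction, 2.5 falls inside the fences, so the outlier disappears and the quartiles shift slightly toward lower values.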
Q13)
Answer :- The mode of this dataset lies at 7 on the x-axis (values of Y).
(iii) Suppose that the above histogram and the boxplot in question 2 are plotted for
the same dataset. Explain how these graphs complement each other in providing
information about any dataset. Hint: [Visualizing both the plots, draw the
insights]
Answer :- Both plots show the same skewness: the data is right-skewed.
3.3. R & Python code for univariate analysis (histogram, box plot, bar plots, etc.) of the data
distribution is to be attached
4. All the codes (executable programs) should execute without errors
5. Code modularization should be followed
6. Each line of code should have comments explaining the logic and why you are using that