
13. Exploratory Data Analysis

EDA

Uploaded by

Rohit Paul
Copyright
© All Rights Reserved

Exploratory Data Analysis

Instructions:

Please fill in your answers inline in the Word document. Submit Python and R code
files wherever applicable.

Please ensure you update all the details:

Name: ROHIT PAUL

Batch Id: DS-25012022


Topic: Exploratory Data Analysis

Problem Statements:
Q1) Calculate skewness and kurtosis using R/Python code and draw inferences from the following data.
Hint: [Insights drawn from the data such as data is normally distributed/not, outliers, measures
like mean, median, mode, variance, std. deviation]
a. Cars speed and distance

© 2013 - 2020 360DigiTMG. All Rights Reserved.


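Answer (a) :-

             Speed       Distance
Skewness    -0.11751     0.806895
Kurtosis    -0.50899     0.405053

Inferences:
1. Speed: skewness is close to 0, so the data is approximately symmetric; the negative (excess) kurtosis means a flatter peak and lighter tails than a normal distribution.
2. Distance: positive skewness means the data is right-skewed (a tail of long distances pulls the mean above the median); kurtosis slightly above 0 means tails a little heavier than normal.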
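A minimal, standard-library-only sketch of how skewness and kurtosis values like those above can be computed. It assumes the bias-corrected sample estimators that pandas `Series.skew()` and `Series.kurt()` use (kurtosis reported as excess kurtosis, so a normal distribution scores 0); the function names are illustrative, and loading the cars data from the assignment's file is left out because that file is not reproduced here.

```python
import math
from typing import Sequence


def _central_moments(values: Sequence[float]) -> tuple[int, float, float, float]:
    """Return n and the 2nd, 3rd and 4th central moments (divided by n)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return n, m2, m3, m4


def sample_skewness(values: Sequence[float]) -> float:
    """Adjusted Fisher-Pearson skewness G1 (needs at least 3 values)."""
    n, m2, m3, _ = _central_moments(values)
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def sample_excess_kurtosis(values: Sequence[float]) -> float:
    """Bias-corrected excess kurtosis G2 (needs at least 4 values)."""
    n, m2, _, m4 = _central_moments(values)
    g2 = m4 / m2 ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5]
    print(round(sample_skewness(sample), 6))         # symmetric data -> 0.0
    print(round(sample_excess_kurtosis(sample), 6))  # flat data -> -1.2
```

Applied to the speed and distance columns, these estimators should agree with pandas' `df.skew()` and `df.kurt()`; R's `e1071::skewness()` and `kurtosis()` default to a different estimator (type 3), so R results can differ slightly.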



b. Top Speed (SP) and Weight (WT)

Answer (b) :-



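             SP          WT
Skewness     1.61145    -0.61474
Kurtosis     2.977329    0.950291

Inferences:
1. SP: strongly right-skewed with a high positive kurtosis, i.e. a sharp peak and heavy tails; a few very high top speeds are likely outliers.
2. WT: moderately left-skewed with positive kurtosis, so its tails are somewhat heavier than normal.
3. Neither variable looks normally distributed.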
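Turning the numbers into verbal inferences like those above can be sketched with simple rules of thumb. The thresholds here (|skewness| < 0.5 as roughly symmetric, > 1 as highly skewed, and a ±0.1 band around zero excess kurtosis) are common conventions assumed for illustration, not fixed rules, and the helper names are hypothetical.

```python
def describe_skewness(skew: float) -> str:
    """Label skewness using a common rule of thumb (thresholds are assumptions)."""
    if abs(skew) < 0.5:
        return "approximately symmetric"
    direction = "right (positive)" if skew > 0 else "left (negative)"
    strength = "highly" if abs(skew) > 1 else "moderately"
    return f"{strength} {direction} skewed"


def describe_kurtosis(excess_kurt: float, tol: float = 0.1) -> str:
    """Label excess kurtosis relative to the normal distribution (0)."""
    if abs(excess_kurt) <= tol:
        return "mesokurtic (normal-like tails)"
    if excess_kurt > 0:
        return "leptokurtic (sharper peak, heavier tails)"
    return "platykurtic (flatter peak, lighter tails)"


if __name__ == "__main__":
    # Values reported in the table above for SP and WT.
    for name, skew, kurt in [("SP", 1.61145, 2.977329), ("WT", -0.61474, 0.950291)]:
        print(name, "->", describe_skewness(skew), "/", describe_kurtosis(kurt))
```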

Q2) Draw inferences about the following boxplot & histogram.


Hint: [Insights drawn from the plots about the data such as whether data is normally
distributed/not, outliers, measures like mean, median, mode, variance, std. deviation]

Answer :-
Histogram :
1. The chick weight data is right (positively) skewed.
2. More than 50% of the chick weights lie between 50 and 150.
3. Most chick weights are concentrated between 50 and 100.



Answer :-
1. The boxplot also shows right (positive) skew.
2. There are outliers on the upper (high-value) side.

Q3) Below are the scores obtained by a student in tests:


34,36,36,38,38,39,39,40,40,41,41,41,41,42,42,45,49,56
1) Find mean, median, variance, standard deviation.

Mean 41
Median 40.5
Variance 25.53 (sample variance, 434 / 17)
Standard Deviation 5.05
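The Q3 figures can be reproduced with Python's standard `statistics` module. This minimal sketch assumes the *sample* variance and standard deviation (divisor n - 1), which is what the values above correspond to:

```python
import statistics

# Test scores from Q3, as listed above.
scores = [34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56]

mean = statistics.mean(scores)          # arithmetic mean
median = statistics.median(scores)      # average of the 9th and 10th sorted values
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # square root of the sample variance

print(f"Mean {mean}, Median {median}, Variance {variance:.2f}, Std. dev. {std_dev:.2f}")
```

`statistics.pvariance` would give the population variance (434 / 18, about 24.11) instead.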

The figure above is a box plot of the score distribution, generated with Python.



2) What can we say about the student marks? [Hint: Looking at the various measures
calculated above whether the data is normal/skewed or if outliers are present].

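Answer :- The scores are not uniformly distributed: most of them lie between 38 and 42. The mean (41) is slightly greater than the median (40.5), so the data is mildly right (positively) skewed. The high scores 49 and 56 lie above the upper fence Q3 + 1.5 × IQR = 47 (with Q1 = 38.25, Q3 = 41.75, IQR = 3.5), so they are outliers.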
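A sketch of the outlier check behind this answer, using the 1.5 × IQR (Tukey) rule. Quartile conventions differ slightly between tools; `method="inclusive"` is assumed here, and switching to `method="exclusive"` flags the same two scores.

```python
import statistics

scores = [34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56]

# Quartiles with linear interpolation between order statistics (the "inclusive" method).
q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
iqr = q3 - q1

# Tukey's fences: points more than 1.5 * IQR beyond the box are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [s for s in scores if s < lower_fence or s > upper_fence]

print(f"Q1={q1}, Q3={q3}, IQR={iqr}, fences=({lower_fence}, {upper_fence})")
print("Outliers:", outliers)
```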

Q5) What is the nature of skewness when the mean and median of the data are equal?
Answer :- Zero skewness: the distribution is symmetric (as in a normal distribution).

Q6) What is the nature of skewness when mean > median?


Answer :- Right Skewed.

Q7) What is the nature of skewness when median > mean?


Answer :- Left Skewed.
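The mean-versus-median comparison in Q5 to Q7 can be expressed as a small helper. It is only a heuristic (the rule can fail, for example, for multimodal or discrete distributions), and the function name and tolerance are assumptions for illustration.

```python
import statistics
from typing import Sequence


def skew_from_mean_median(values: Sequence[float], tol: float = 1e-9) -> str:
    """Rule-of-thumb skew direction from the mean-median gap (tol is an assumption)."""
    gap = statistics.mean(values) - statistics.median(values)
    if abs(gap) <= tol:
        return "symmetric (zero skew)"
    return "right (positive) skew" if gap > 0 else "left (negative) skew"


print(skew_from_mean_median([1, 2, 3]))   # mean 2 equals median 2
print(skew_from_mean_median([1, 2, 10]))  # mean about 4.33 > median 2
print(skew_from_mean_median([1, 9, 10]))  # mean about 6.67 < median 9
```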

Q8) What does a positive kurtosis value indicate for the data?


Answer :- A positive (excess) kurtosis indicates a leptokurtic distribution: it has a sharper peak and heavier
tails than the normal distribution, so the tails stay further above the x-axis and extreme values (outliers) are more likely.

Q9) What does a negative kurtosis value indicate for the data?


Answer :- A negative (excess) kurtosis indicates a platykurtic distribution: it has a flatter, broader peak and
lighter tails than the normal distribution, so the tails approach the x-axis sooner and extreme values are less likely.
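For reference, the quantity behind Q8 and Q9 is the excess kurtosis, the fourth standardized moment minus the normal distribution's value of 3:

```latex
\[
\text{Excess kurtosis} \;=\; \frac{\mathbb{E}\big[(X-\mu)^4\big]}{\sigma^4} - 3,
\qquad
\begin{cases}
> 0 & \text{leptokurtic (heavier tails than normal)}\\
= 0 & \text{mesokurtic (normal)}\\
< 0 & \text{platykurtic (lighter tails than normal)}
\end{cases}
\]
```

For example, a uniform distribution has excess kurtosis -1.2 (platykurtic) and a Laplace distribution has +3 (leptokurtic).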

Q10) Answer the questions below using the boxplot visualization below.



What can we say about the distribution of the data?
Answer :- The data is not normally distributed: the boxplot is asymmetric, with the values stretched toward the lower end.
What is the nature of the skewness of the data?
Answer :- The data is left (negatively) skewed.
What will be the IQR of the data (approximately)?
Answer :- The IQR of the data is calculated as Q3 - Q1.
Q3 = 18
Q1 = 10
Q3 - Q1 = 8
Hence, the IQR of the data is approximately 8.
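With the quartiles read off the plot, the same arithmetic gives the usual 1.5 × IQR outlier fences. A small sketch; `tukey_fences` is a hypothetical helper name and the quartiles are the approximate values estimated above:

```python
def tukey_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float, float]:
    """Return (IQR, lower fence, upper fence) for quartiles read off a boxplot."""
    iqr = q3 - q1
    return iqr, q1 - k * iqr, q3 + k * iqr


# Approximate quartiles estimated from the Q10 boxplot.
iqr, lower, upper = tukey_fences(10, 18)
print(iqr, lower, upper)  # 8 -2.0 30.0
```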

Q11) Comment on the below Boxplot visualizations?

Draw an inference from the distribution of data for Boxplot 1 with respect to Boxplot 2.
Hint: [On comparing both the plots, and check if the data is normally distributed/not, outliers
present, skewness etc.]



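Answer :- Boxplot 1 is drawn with a whisker range of 3 (whiskers extend up to 3 × IQR beyond the box), while Boxplot 2 uses a range of 1.5 (the usual 1.5 × IQR). If both plots show the same data, points flagged as outliers in Boxplot 2 can fall inside the whiskers of Boxplot 1, so the wider range hides outliers that the 1.5 × IQR rule would reveal.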
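The effect of the whisker range can be demonstrated directly. In R, `boxplot()` controls it with the `range` argument and in matplotlib `Axes.boxplot()` with `whis`; both default to 1.5. The sketch below uses hypothetical data and standard-library quartiles (tools compute quartiles slightly differently, so exact fences can vary), and `flag_outliers` is a hypothetical helper:

```python
import statistics
from typing import Sequence


def flag_outliers(values: Sequence[float], k: float) -> list[float]:
    """Return values more than k * IQR beyond the quartiles (Tukey-style whiskers)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]


# Hypothetical data, chosen so that one point lies between the two fences.
data = [10, 11, 12, 13, 14, 15, 16, 24]
print(flag_outliers(data, k=1.5))  # [24]
print(flag_outliers(data, k=3.0))  # []
```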

Q12)

Answer the following three questions based on the boxplot above.


(i) What is inter-quartile range of this dataset? [Hint: IQR = Q3 – Q1]
In one line, explain what this value implies. (Hint: Based on IQR definition)

Answer :- Assuming Q3 to be 12 and Q1 to be 5,

IQR = Q3 - Q1 = 12 - 5 = 7. This means the middle 50% of the data spans about 7 units.

(ii) What can we say about the skewness of this dataset?

Answer :- The data is positively (right) skewed: the upper tail is long, reaching the high
value at 25.

(iii) If it were found that the data point with the value 25 is 2.5, how would the new
boxplot be affected?
(Hint: On changing the data point from 25 to 2.5 in the data, how is it different from the
current one.)

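Answer :- The high outlier would disappear. With Q1 of about 5 and Q3 of about 12, the lower fence is 5 - 1.5 × 7 = -5.5, so 2.5 would be covered by the lower whisker (extending it if 2.5 is the new minimum) rather than plotted as an outlier. Because one value moves from the top of the data to the bottom, the median and quartiles may shift down slightly, the upper whisker would end at the next-largest value, and the plot would look less right-skewed.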
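The change in (iii) can be simulated on a small hypothetical dataset whose quartiles match those assumed in (i) (Q1 = 5, Q3 = 12) and which contains a 25. The data and the `box_summary` helper are illustrative assumptions, not the dataset behind the plot:

```python
import statistics
from typing import Sequence


def box_summary(values: Sequence[float], k: float = 1.5) -> dict:
    """Quartiles, whisker ends and outliers for a Tukey-style boxplot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low <= v <= high]
    return {
        "q1": q1, "median": median, "q3": q3,
        "whiskers": (min(inside), max(inside)),  # furthest points within the fences
        "outliers": sorted(v for v in values if v < low or v > high),
    }


# Hypothetical data with Q1 = 5 and Q3 = 12, as assumed in (i).
before = [2, 4, 5, 6, 7, 9, 12, 14, 25]
after = [2.5 if v == 25 else v for v in before]  # the 25 is corrected to 2.5

print(box_summary(before))
print(box_summary(after))
```

In this hypothetical case the outlier disappears, 2.5 falls inside the fences, and Q1, the median and Q3 all move down (5 to 4, 7 to 6, 12 to 9) because one value moved from the top of the data to the bottom.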

Q13)



Answer the following three questions based on the histogram above.
(i) Where would the mode of this dataset lie? Hint: [In terms of values On Y-axis]

Answer :- The mode of this dataset lies at about 7 on the x-axis (the values of Y), under the tallest bar of the histogram.
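In a histogram the mode is read as the bin with the tallest bar, so it depends on the bin width. A minimal sketch with hypothetical data and a hypothetical `modal_bin` helper:

```python
from collections import Counter
from typing import Sequence


def modal_bin(values: Sequence[float], width: float, start: float = 0.0) -> tuple[float, float]:
    """Return the [low, high) histogram bin holding the most values (ties -> lowest bin)."""
    counts = Counter(int((v - start) // width) for v in values)
    best = max(counts.values())
    index = min(i for i, c in counts.items() if c == best)
    return start + index * width, start + (index + 1) * width


# Hypothetical values; with bins of width 5 the tallest bar is [5, 10).
print(modal_bin([1, 2, 6, 7, 7, 8, 9, 15], width=5))  # (5.0, 10.0)
```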

(ii) Comment on the skewness of the dataset

Answer :- The dataset is right (positively) skewed.

(iii) Suppose that the above histogram and the boxplot in question 2 are plotted for
the same dataset. Explain how these graphs complement each other in providing
information about any dataset. Hint: [Visualizing both the plots, draw the
insights]

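Answer :- Both plots show the same right (positive) skew, so they agree. They complement each other: the histogram shows the overall shape, the modal class and how the frequencies are spread, while the boxplot summarizes the median, quartiles (IQR) and whiskers and flags individual outliers, which a histogram can hide inside a sparse bin.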
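Drawing both plots from the same data makes their complementary roles visible. A sketch assuming matplotlib is available; the sample values are hypothetical stand-ins because the original dataset is not included, and `plot_hist_and_box` is an illustrative name:

```python
from typing import Sequence

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt


def plot_hist_and_box(values: Sequence[float], bins: int = 10):
    """Draw a histogram and a boxplot of the same data side by side."""
    fig, (ax_hist, ax_box) = plt.subplots(1, 2, figsize=(9, 4))
    ax_hist.hist(values, bins=bins, edgecolor="black")  # shape, modal bin, skew
    ax_hist.set_title("Histogram")
    ax_hist.set_xlabel("Value")
    ax_hist.set_ylabel("Frequency")
    ax_box.boxplot(values, whis=1.5)  # median, quartiles, whiskers, outliers
    ax_box.set_title("Boxplot")
    ax_box.set_ylabel("Value")
    fig.tight_layout()
    return fig


if __name__ == "__main__":
    # Hypothetical right-skewed sample standing in for the real dataset.
    sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 9, 12, 25]
    plot_hist_and_box(sample, bins=6).savefig("hist_box.png")
```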



Hints:
For each assignment, the solution should be submitted in the below format
1. Research and perform all possible steps to obtain the solution
2.
3. For statistics calculations, the explanation of the solutions should be documented in black and
white along with the code.
Must follow these guidelines:
3.1. Be thorough with the concepts of probability and the Central Limit Theorem, and perform the
calculations step by step
3.2. For True/False questions or short-answer questions, an explanation is a must

3.3. R & Python code for univariate analysis of the data distribution (histogram, box plot, bar plots, etc.)
must be attached
4. All the codes (executable programs) should execute without errors
5. Code modularization should be followed
6. Each line of code should have comments explaining the logic and why you are using that
