UNIT - 1 EDA Continuation

The document provides an overview of Exploratory Data Analysis (EDA) and data visualization techniques using Python libraries such as Matplotlib and Seaborn. It covers various visual aids including line charts, bar charts, scatter plots, and histograms, along with practical examples of data generation and manipulation. Additionally, it discusses data transformation techniques and compares different visualization libraries based on syntax, plot types, interactivity, and customization.

Uploaded by

mk4997320


TOPIC - 7

VISUAL AIDS
FOR EDA
AD3301 DATA EXPLORATION AND VISUALIZATION
L T P C: 3 0 2 4

OBJECTIVES:

TO OUTLINE AN OVERVIEW OF EXPLORATORY DATA ANALYSIS.

TO IMPLEMENT DATA VISUALIZATION USING MATPLOTLIB.
TO PERFORM UNIVARIATE DATA EXPLORATION AND ANALYSIS.
TO APPLY BIVARIATE DATA EXPLORATION AND ANALYSIS.
TO USE DATA EXPLORATION AND VISUALIZATION TECHNIQUES FOR MULTIVARIATE
AND TIME SERIES DATA.

UNIT I EXPLORATORY DATA ANALYSIS

EDA FUNDAMENTALS – UNDERSTANDING DATA SCIENCE – SIGNIFICANCE OF EDA –

MAKING SENSE OF DATA – COMPARING EDA WITH CLASSICAL AND BAYESIAN
ANALYSIS – SOFTWARE TOOLS FOR EDA - VISUAL AIDS FOR EDA- DATA
TRANSFORMATION TECHNIQUES-MERGING DATABASE, RESHAPING AND PIVOTING,
• Line chart
• Bar chart
• Scatter plot
• Area plot & stacked plot
• Pie chart
• Table chart
• Polar chart
• Histogram
LINE CHART
• Line chart is used to illustrate
the relationship between two
or more continuous variables.
• Used to plot time series lines.
‘faker’ - Python library - We have
created a function using the faker
Python library to generate the
dataset.
• Generate a simple dataset with just two columns. The first column is 'Date' and the second column is 'Price', indicating the stock price on that date.
# Import Necessary Libraries

import datetime
import random
import radar
import pandas as pd

datetime - Provides classes for manipulating dates and times
random - Generates random numbers
radar - A third-party library for generating random dates and datetimes
pandas - A powerful data analysis and manipulation library
# Function Definition

def generateData(n):

A function named "generateData" which takes an integer "n" as input

# Variable Initialization

listdata = []
start = datetime.datetime(2019, 8, 1)
end = datetime.datetime(2019, 8, 30)

• Initializes an empty list “listdata“ to store generated data.

• Defines start and end variables representing the start and
end dates (August 1, 2019, to August 30, 2019).
# Data Generation Loop

for _ in range(n):
    date = radar.random_datetime(start='2019-08-01',
                                 stop='2019-08-30').strftime("%Y-%m-%d")
    price = round(random.uniform(900, 1000), 4)
    listdata.append([date, price])

• Iterates “n“ times.

• Generates a random date between August 1, 2019, and
August 30, 2019, using the radar.random_datetime function.
It then formats the date to the "YYYY-MM-DD" format.
• Generates a random floating-point number between 900 and
1000 and rounds it to 4 decimal places.
• Appends the date and price as a list to listdata.
# Creating DataFrame
df = pd.DataFrame(listdata, columns=['Date', 'Price'])
Converts the “listdata” list of lists into a pandas DataFrame
with columns 'Date' and 'Price’

# Date Formatting
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')

Converts the 'Date' column in the DataFrame to datetime format for proper date handling

# Data Aggregation

df = df.groupby(by='Date').mean()
Groups the DataFrame by the 'Date' column and calculates
the mean (average) of the 'Price' for each unique date
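The effect of the aggregation step is easiest to see on a tiny input. The sketch below uses three made-up rows (illustrative values, not output of generateData) in which two prices share one date, so the grouped result has one row per date and the shared date holds the average of its two prices.

```python
import pandas as pd

# Illustrative rows only: two prices share the date 2019-08-01.
rows = [
    ["2019-08-01", 900.0],
    ["2019-08-01", 1000.0],
    ["2019-08-02", 950.0],
]
sample = pd.DataFrame(rows, columns=["Date", "Price"])
sample["Date"] = pd.to_datetime(sample["Date"], format="%Y-%m-%d")

# One row per unique date; 'Date' becomes the index of the result.
daily_mean = sample.groupby(by="Date").mean()
print(daily_mean)
```

Because 'Date' ends up as the index, a later plt.plot(df) call uses the dates on the x-axis and the mean price on the y-axis.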
import datetime
import random
import radar
import pandas as pd

def generateData(n):
    listdata = []
    start = datetime.datetime(2019, 8, 1)
    end = datetime.datetime(2019, 8, 30)
    delta = end - start
    for _ in range(n):
        date = radar.random_datetime(start='2019-08-01', stop='2019-08-30').strftime("%Y-%m-%d")
        price = round(random.uniform(900, 1000), 4)
        listdata.append([date, price])
    df = pd.DataFrame(listdata, columns=['Date', 'Price'])
    df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
    df = df.groupby(by='Date').mean()
    return df
# Generate 50 rows and display the first 10 aggregated rows
df = generateData(50)
df.head(10)
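generateData depends on the third-party radar package, and its delta variable is computed but never used. If radar is not installed, one possible alternative is to draw a random whole-day offset from the start date using only the standard library. The function name generate_data_stdlib and its seed parameter are hypothetical and not part of the original slides; this is a sketch of the same idea, not the author's method.

```python
import datetime
import random

import pandas as pd


def generate_data_stdlib(n, seed=None):
    """Hypothetical variant of generateData that avoids the radar dependency."""
    rng = random.Random(seed)  # private generator, so a seed gives repeatable data
    start = datetime.date(2019, 8, 1)
    end = datetime.date(2019, 8, 30)
    span_days = (end - start).days  # 29 days between the two dates

    rows = []
    for _ in range(n):
        # Whole-day offset in [0, span_days], so both end dates can be drawn.
        date = start + datetime.timedelta(days=rng.randint(0, span_days))
        price = round(rng.uniform(900, 1000), 4)
        rows.append([date, price])

    df = pd.DataFrame(rows, columns=["Date", "Price"])
    df["Date"] = pd.to_datetime(df["Date"])
    return df.groupby(by="Date").mean()


df_alt = generate_data_stdlib(50, seed=0)
print(df_alt.head(10))
```

The result has the same layout as the radar-based version: a 'Date' index and a single 'Price' column holding the daily mean.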
CREATING THE LINE CHART
# Import the matplotlib library
import matplotlib.pyplot as plt
# Plot the graph
plt.plot(df)
# Display it on the screen
plt.show()
BAR CHART
• Bars can be drawn horizontally or vertically to
represent categorical variables.
• Bar charts are frequently used to distinguish objects
between distinct collections in order to track variations
over time.
• Bar charts are very convenient when the changes are
large.
REAL TIME EXAMPLE
• A pharmacy in Norway keeps track of the
amount of Zoloft sold every month using Bar
chart.
Note: Zoloft is a medicine prescribed to patients
suffering from depression.
‘Calendar’ Python library to keep track
of the months of the year (1 to 12)
corresponding to January to December.
# Import Necessary Libraries

import random
import numpy as np
import calendar
import matplotlib.pyplot as plt

random - Generates the random sales quantities used below
Numpy - Library for numerical computations in Python
Matplotlib - Plotting library in Python for creating visualizations
# Creating Lists

months = list(range(1, 13))

sold_quantity = [round(random.uniform(100, 200)) for x in range(1, 13)]

• 'months' is a list containing numbers from 1 to 12, representing the months of the year.
• 'sold_quantity' is a list comprehension that generates 12 random floating-point numbers between 100 and 200, rounded to the nearest integer.
import random
import numpy as np
import calendar
import matplotlib.pyplot as plt

months = list(range(1, 13))
sold_quantity = [round(random.uniform(100, 200)) for x in range(1, 13)]

figure, axis = plt.subplots()
plt.xticks(months, calendar.month_name[1:13], rotation=20)
plot = axis.bar(months, sold_quantity)

# Label each bar with its height
for rectangle in plot:
    height = rectangle.get_height()
    axis.text(rectangle.get_x() + rectangle.get_width() / 2., 1.002 * height,
              '%d' % int(height), ha='center', va='bottom')

plt.show()
SCATTER PLOT
• Scatter plots are also called scatter
graphs, scatter charts, scattergrams,
and scatter diagrams.
• They use a Cartesian coordinates
system to display values of typically
two variables for a set of data.
WHEN SHOULD WE USE A
SCATTER PLOT?
• Scatter plots can be constructed in the following two situations:
￭ When one continuous variable is dependent on another
variable, which is under the control of the observer
￭ When both continuous variables are independent
• Scatter plots are used when we need to show the relationship
between two variables, and hence are sometimes referred to as
correlation plots
REAL TIME EXAMPLE
1. The number of hours of sleep required by a
person depends on the age of the person.
2. The average income for adults is based on the
number of years of education.
• Display a scatter plot for sleep vs. age
dataset and Iris dataset
BUBBLE CHART
• A bubble plot is a manifestation
of the scatter plot where each
data point on the graph is shown
as a bubble.
• Each bubble can be illustrated
with a different color, size, and
appearance.
Display a Bubble plot for Iris
dataset
SCATTER PLOT USING SEABORN

• A scatter plot can also be

generated using the seaborn
library.
• Seaborn makes the graph visually
better.
AREA & STACKED PLOT
• The stacked plot owes its name to the
fact that it represents the area under a
line plot and that several such plots
can be stacked on top of one another,
giving the feeling of a stack.
• The stacked plot can be useful when
we want to visualize the cumulative
effect of multiple variables
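A minimal sketch of a stacked plot with Matplotlib's stackplot follows. The three series are made-up monthly amounts (hypothetical, not real data), and draw_stacked_plot is an illustrative helper name. The running total computed with np.cumsum corresponds to the top edge of each layer, which is the cumulative effect described above.

```python
import matplotlib.pyplot as plt
import numpy as np


def draw_stacked_plot(x, series, labels):
    """Draw a stacked area plot and return the axes it was drawn on."""
    fig, ax = plt.subplots()
    ax.stackplot(x, *series, labels=labels)
    ax.legend(loc="upper left")
    ax.set_xlabel("Month")
    ax.set_ylabel("Amount")
    return ax


# Illustrative values only (hypothetical monthly expenses, not real data).
months = np.arange(1, 7)
rent = np.array([10, 10, 10, 10, 10, 10])
food = np.array([4, 5, 4, 6, 5, 4])
travel = np.array([1, 2, 3, 1, 2, 6])

# Row i of 'cumulative' is the top edge of layer i in the stack.
cumulative = np.cumsum([rent, food, travel], axis=0)
ax = draw_stacked_plot(months, [rent, food, travel], ["rent", "food", "travel"])
```

Call plt.show() afterwards to display the figure.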
PIE CHART
• This is one of the more interesting types of
data visualization graphs.
• The main reason is that people love circles.
• The purpose of the pie chart is to
communicate proportions.
• Use “Pokemon dataset” to visualize pie
chart.
TABLE CHART
• A table chart combines a bar chart
and a table.
• Consider standard LED bulbs that
come in different wattages.
• The standard Philips LED bulb can
be 4.5 Watts, 6 Watts, 7 Watts, 8.5
Watts, 9.5 Watts, 13.5 Watts, and
15 Watts.
• Year, Wattage and Units are the
attributes.
POLAR CHART
• A polar chart is a diagram that is plotted on
a polar axis.
• Its coordinates are angle and radius.
• It is also referred to as a spider web plot.
• Assume you have five courses in your academic year and you planned to obtain the following grades in each subject (each list has six values because the first grade is repeated at the end, so that the plotted outline closes):
plannedGrade = [90, 95, 92, 68, 68, 90]
but after your final examination, these are the grades you got:
actualGrade = [75, 89, 89, 80, 80, 75]
Create a polar plot for the above grades.
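One way to draw this with Matplotlib is sketched below. The course labels are hypothetical placeholders, since the slide lists only the grades, and draw_polar_chart is an illustrative helper name. The sketch assumes the sixth value in each grade list repeats the first, so the angle array is extended the same way to close the outline.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical course labels; the slide gives only the grades.
subjects = ["Course 1", "Course 2", "Course 3", "Course 4", "Course 5"]
plannedGrade = [90, 95, 92, 68, 68, 90]
actualGrade = [75, 89, 89, 80, 80, 75]

# Five evenly spaced angles, then repeat the first one so the outline closes.
theta = np.linspace(0, 2 * np.pi, len(subjects), endpoint=False)
theta = np.append(theta, theta[0])


def draw_polar_chart():
    """Plot planned and actual grades on a polar axis and return the axes."""
    fig = plt.figure()
    ax = fig.add_subplot(projection="polar")
    ax.plot(theta, plannedGrade, label="Planned grades")
    ax.plot(theta, actualGrade, label="Actual grades")
    ax.set_xticks(theta[:-1])
    ax.set_xticklabels(subjects)
    ax.legend()
    return ax


ax = draw_polar_chart()
```

Call plt.show() afterwards to display the figure. The angle is the course position around the circle and the radius is the grade.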
HISTOGRAM
• A histogram is a type of frequency graph used to depict the distribution of a continuous variable.
• These types of plots are very popular in statistical analysis.
• Consider the following use case. A survey conducted in vocational training sessions for developers had 100 participants. Their Python programming experience ranged from 0 to 20 years. Use a histogram to plot the distribution of Python programming experience in the vocational training.
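The survey responses themselves are not included in the slides, so the sketch below simulates 100 answers with NumPy purely for illustration. It assumes bins of five years each, which is a choice made here rather than something the exercise prescribes; draw_histogram is an illustrative helper name.

```python
import matplotlib.pyplot as plt
import numpy as np

# Simulated survey answers (not real data): 100 participants, 0 to 20 years.
rng = np.random.default_rng(seed=0)
years_experience = rng.integers(low=0, high=21, size=100)  # high is exclusive

# Four bins of width five; the last bin [15, 20] includes its right edge.
bin_edges = [0, 5, 10, 15, 20]
counts, edges = np.histogram(years_experience, bins=bin_edges)


def draw_histogram():
    """Draw the same binning as a histogram and return the axes."""
    fig, ax = plt.subplots()
    ax.hist(years_experience, bins=bin_edges, edgecolor="black")
    ax.set_xlabel("Years of Python programming experience")
    ax.set_ylabel("Number of participants")
    return ax


ax = draw_histogram()
print(counts, counts.sum())
```

Each bar height is the number of participants whose experience falls in that bin, so the heights add up to the 100 participants. Call plt.show() afterwards to display the figure.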
DIFFERENCE BETWEEN BAR CHART & HISTOGRAM?
LOLLIPOP CHART
• A lollipop chart can be used to
display ranking in the data.
• It is similar to an ordered bar
chart.
• Create a lollipop chart for
Highway Mileage using car
dataset.
CHOOSING THE BEST CHART?
OTHER LIBRARIES TO EXPLORE

• So far, we have seen different types of 2D and 3D visualization techniques using matplotlib and seaborn.
MATPLOTLIB
• Matplotlib is the most widely used data visualization
library in Python.
• It provides a low-level API for creating a wide range of
plots, from simple line graphs to complex 3D plots.
• Matplotlib is highly customizable and provides
complete control over every aspect of the plot.
SEABORN
• Seaborn is a high-level data visualization library built
on top of Matplotlib.
• It provides a wide range of statistical visualizations
and is particularly useful for exploring relationships
between variables.
• Seaborn has a clean and modern look and can
generate complex plots with minimal code.
PLOTLY
• Plotly is a web-based data visualization library that
provides highly interactive and customizable plots.
• It provides a wide range of visualizations, from basic
line and scatter plots to complex 3D plots and maps.
• Plotly is particularly useful for creating interactive
dashboards and reports.
COMPARE SEABORN VS MATPLOTLIB VS PLOTLY
• Seaborn, Matplotlib and Plotly are compared based on
four factors:

1. Syntax and API

2. Types of Plots

3. Interactivity

4. Customization
SYNTAX & API
• Seaborn provides a high-level API that is easy to use
and requires minimal code to generate complex plots.
• Matplotlib, on the other hand, provides a low-level API
that provides complete control over every aspect of the
plot but can be challenging to use.
• Plotly provides an intermediate-level API that is easy to
use and provides a wide range of customization
options.
TYPES OF PLOTS
• Seaborn provides a wide range of statistical visualizations
that are particularly useful for exploring relationships
between variables.
• Matplotlib provides a broad range of plot types, from simple
line and scatter plots to complex 3D plots.
• Plotly provides a wide range of interactive visualizations
that are useful for creating interactive dashboards and
reports.
INTERACTIVITY

• Seaborn and Matplotlib provide limited interactivity, while Plotly provides highly interactive and responsive plots that can be zoomed, panned, and rotated.

CUSTOMIZATION
• Seaborn provides a limited range of customization options
but can generate visually appealing plots with minimal code.
• Matplotlib provides complete control over every aspect of the
plot, making it highly customizable.
• Plotly provides a wide range of customization options and
provides support for themes and color scales.
TRY IT!
• Consider we have a data set of dimension 300
(n) × 50 (p). n represents the number of
observations, and p represents the number of
predictors/ attributes. How many scatter plots
are possible to analyze the variable
relationship?
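A short worked check, assuming a scatter plot pairs two distinct attributes and that swapping the axes does not count as a new plot. Under those assumptions the count is the number of unordered pairs, p(p - 1)/2, and the number of observations n only determines how many points each plot contains.

```python
import math
from itertools import combinations

n_observations = 300  # affects the points per plot, not the number of plots
n_attributes = 50

# Closed form: choose 2 attributes out of p, ignoring which one is on the x-axis.
pair_count = math.comb(n_attributes, 2)

# Cross-check by enumerating the pairs explicitly.
enumerated = sum(1 for _ in combinations(range(n_attributes), 2))

print(pair_count)  # 1225
```

If the two axis orders were counted separately, the figure would double to 2450.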
TOPIC - 8
DATA
TRANSFORMATION
DATA TRANSFORMATION
• Data transformation is a set of techniques used to convert data from
one format or structure to another format or structure.
• Data transformation is the process where you extract data, sift through
data, understand the data, and then transform it into something you
can analyze.
• Raw or source data is often:
• Inconsistent: It contains both relevant and irrelevant data.
• Imprecise: It contains incorrectly entered information or missing
values.
• Repetitive: It contains duplicate data.
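The last two problems can be detected directly in pandas. The sketch below uses a small made-up table (illustrative records only) with one exact duplicate row and one missing score. drop_duplicates, isna and dropna are standard pandas methods; whether dropping incomplete rows is the right treatment depends on the analysis.

```python
import numpy as np
import pandas as pd

# Illustrative records only: one exact duplicate row and one missing score.
raw = pd.DataFrame(
    {
        "StudentID": [1, 2, 2, 3],
        "Score": [78.0, 65.0, 65.0, np.nan],
    }
)

deduplicated = raw.drop_duplicates()             # removes the repeated row for StudentID 2
missing_per_column = deduplicated.isna().sum()   # counts NaN values in each column
complete_rows = deduplicated.dropna()            # keeps only rows without missing values

print(len(raw), len(deduplicated), len(complete_rows))
```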
Types of Data Transformation
1. Data Deduplication
2. Key Restructuring
3. Data Cleansing
4. Data Validation
5. Format Revisioning
6. Data Derivation
7. Data Aggregation
8. Data Integration
9. Data Filtering
10. Data Joining
11. Binning
12. Data Splitting
13. Data Summarization
14. Normalization &
TOPIC - 9
MERGING DATABASE,
RESHAPING AND PIVOTING,
TRANSFORMATION
TECHNIQUES
• Consider two courses, a Software Engineering course and an
Introduction to Machine Learning course, each with
enough students to split into two classes.
• The examination for each class and for each course was done
in two separate buildings and graded by four different
professors.
• Create a dataset using the above information.
DATASET
Methods used for merging
• pd.concat()
• df.merge()
• df.append() (removed in pandas 2.0; use pd.concat() instead)
• df.join()
Pandas concat() method:

dataframe = pd.concat([dataFrame1, dataFrame2], ignore_index=True)
dataframe
IGNORE_INDEX
• The ignore_index argument creates a new index; in its absence, we'd keep the original indices.
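A small sketch of the difference, using two made-up two-row dataframes (the names first and second are illustrative):

```python
import pandas as pd

first = pd.DataFrame({"StudentID": [1, 3], "Score": [39, 49]})
second = pd.DataFrame({"StudentID": [2, 4], "Score": [93, 44]})

kept = pd.concat([first, second])                            # index labels repeat: 0, 1, 0, 1
renumbered = pd.concat([first, second], ignore_index=True)   # fresh index: 0, 1, 2, 3

print(list(kept.index))
print(list(renumbered.index))
```

With repeated labels, a lookup such as kept.loc[0] returns two rows, which is usually why a fresh index is preferred after concatenation.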

AXIS
• To stack the dataframes one below the other (along
the rows), axis=0 is used.
• To combine the dataframes side by side, axis=1 is
used.

pd.concat([dataFrame1, dataFrame2], axis=1)
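The sketch below contrasts the two settings on made-up dataframes of different lengths (left and right are illustrative names). One point worth noting: with axis=1, pandas aligns rows by index label rather than by any column such as StudentID, so side-by-side concatenation is not a substitute for a join.

```python
import pandas as pd

left = pd.DataFrame({"StudentID": [1, 2, 3], "ScoreML": [39, 93, 49]})
right = pd.DataFrame({"StudentID": [1, 2], "ScoreSE": [22, 98]})

# axis=0: rows of 'right' go below rows of 'left'; columns are the union of both.
stacked_rows = pd.concat([left, right], axis=0, ignore_index=True)

# axis=1: columns are placed side by side; rows are matched on the index labels.
side_by_side = pd.concat([left, right], axis=1)

print(stacked_rows.shape)   # (5, 3)
print(side_by_side.shape)   # (3, 4)
```

In side_by_side, the third row has NaN in both columns that came from right, because right has no row with index label 2.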

TRY IT!
• Assume your head of department walked up to your desk
and started bombarding you with a series of questions:
• How many students appeared for the exams in total?
• How many students only appeared for the Software
Engineering course?
• How many students only appeared for the Machine
Learning course?
Dataframes for both subjects
import pandas as pd

df1SE = pd.DataFrame({'StudentID': [9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29],
                      'ScoreSE': [22, 66, 31, 51, 71, 91, 56, 32, 52, 73, 92]})
df2SE = pd.DataFrame({'StudentID': [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30],
                      'ScoreSE': [98, 93, 44, 77, 69, 56, 31, 53, 78, 93, 56, 77, 33, 56, 27]})
df1ML = pd.DataFrame({'StudentID': [1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29],
                      'ScoreML': [39, 49, 55, 77, 52, 86, 41, 77, 73, 51, 86, 82, 92, 23, 49]})
df2ML = pd.DataFrame({'StudentID': [2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
                      'ScoreML': [93, 44, 78, 97, 87, 89, 39, 43, 88, 78]})
1. Concatenate along an axis
dfSE = pd.concat([df1SE, df2SE], ignore_index=True)
dfML = pd.concat([df1ML, df2ML], ignore_index=True)
df = pd.concat([dfML, dfSE], axis=1)
df
Pandas df.merge() method
The df.merge() method can be used along with joins
Types of Joins

• Inner Join
• Outer Join
• Left Join
• Right Join
• The inner join takes the intersection from two or more
dataframes, which corresponds to the INNER JOIN in SQL.
• The outer join takes the union from two or more
dataframes, which corresponds to the FULL OUTER JOIN in
SQL.
• The left join uses the keys from the left-hand dataframe
only, which corresponds to the LEFT OUTER JOIN in SQL.
• The right join uses the keys from the right-hand dataframe
only, which corresponds to the RIGHT OUTER JOIN in SQL
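The join types can also be compared in a single pass. The sketch below rebuilds only the StudentID columns of the four course dataframes and uses the indicator=True option of merge, which labels each row of an outer join as coming from the left side only, the right side only, or both. The names dfSE_ids and dfML_ids are introduced here for illustration; this is one possible way to count the groups, not the method used in the slides.

```python
import pandas as pd

# StudentID values matching df1SE/df2SE and df1ML/df2ML; scores are left out
# because the counts below depend only on the IDs.
se_ids = list(range(9, 30, 2)) + list(range(2, 31, 2))
ml_ids = list(range(1, 30, 2)) + list(range(2, 21, 2))
dfSE_ids = pd.DataFrame({"StudentID": se_ids})
dfML_ids = pd.DataFrame({"StudentID": ml_ids})

# indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'.
merged = dfSE_ids.merge(dfML_ids, on="StudentID", how="outer", indicator=True)
counts = merged["_merge"].value_counts()

total_students = len(merged)            # union of both courses (outer join)
only_se = int(counts["left_only"])      # Software Engineering only
only_ml = int(counts["right_only"])     # Machine Learning only
both = int(counts["both"])              # intersection (inner join)
print(total_students, only_se, only_ml, both)
```

The left join size is both + only_se and the right join size is both + only_ml, which ties the four join types to the three groups.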
2. Use df.merge with an inner join
df.merge() is used to get a list of students who
appeared in both the courses.
dfSE = pd.concat([df1SE, df2SE], ignore_index=True)
dfML = pd.concat([df1ML, df2ML], ignore_index=True)
df = dfSE.merge(dfML, how='inner')
21 students took both the courses
3. Use df.merge with a left join
dfSE = pd.concat([df1SE, df2SE], ignore_index=True)
dfML = pd.concat([df1ML, df2ML], ignore_index=True)
df = dfSE.merge(dfML, how='left')
df
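How many students only appeared for the Software Engineering course?
The left join returns 26 rows, one for each student who appeared for the Software Engineering exam.
Note that 5 of these students (StudentID 22, 24, 26, 28 and 30) did not appear for the Machine Learning exam and hence their ScoreML values are marked as NaN. These 5 are the students who appeared only for the Software Engineering course.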
4. Use df.merge with a right join
dfSE = pd.concat([df1SE, df2SE], ignore_index=True)
dfML = pd.concat([df1ML, df2ML], ignore_index=True)
df = dfSE.merge(dfML, how='right')
df
right join is used to get a list of all the
students who appeared in the Machine
Learning course.
5. Use df.merge with an outer join
dfSE = pd.concat([df1SE, df2SE], ignore_index=True)
dfML = pd.concat([df1ML, df2ML], ignore_index=True)
df = dfSE.merge(dfML, how='outer')
df
Merging on Index?
Reshaping and Pivoting
• Pivoting: data in a dataframe can be rearranged
with hierarchical indexing using two
actions:
1. Stacking: stack() rotates the columns
of the data into the rows.
2. Unstacking: unstack() rotates the
rows into the columns.
TRY IT!

Create a dataframe that records the rainfall, humidity, and wind conditions of five different counties in Norway

data = np.arange(15).reshape((3, 5))
indexers = ['Rainfall', 'Humidity', 'Wind']
dframe1 = pd.DataFrame(data, index=indexers,
                       columns=['Bergen', 'Oslo', 'Trondheim', 'Stavanger', 'Kristiansand'])
dframe1
Method for Stacking?
Reshaping and Pivoting
• Using the stack() method on the preceding
dframe1, we can pivot the columns into rows to
produce a series:
stacked = dframe1.stack()
stacked
Method for Unstacking?
stacked.unstack()
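The round trip can be checked end to end. The sketch below rebuilds dframe1, stacks it into a Series with a two-level index, and unstacks it again. The final reordering with .loc is a defensive step added here in case the label order differs after the round trip; it is an assumption of this sketch, not part of the slides.

```python
import numpy as np
import pandas as pd

data = np.arange(15).reshape((3, 5))
dframe1 = pd.DataFrame(
    data,
    index=["Rainfall", "Humidity", "Wind"],
    columns=["Bergen", "Oslo", "Trondheim", "Stavanger", "Kristiansand"],
)

stacked = dframe1.stack()      # Series indexed by (measurement, city) pairs
restored = stacked.unstack()   # the city level moves back into the columns

# Reorder defensively before comparing, in case label order differs.
restored = restored.loc[dframe1.index, dframe1.columns]
print(stacked.index.nlevels, stacked.shape)
```

A single value can be read from the stacked Series with a pair of labels, for example stacked.loc[("Humidity", "Oslo")].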
• Unstacking will create missing
data if all the values are not
present in each of the sub-groups.
series1 = pd.Series([000, 111, 222, 333],
                    index=['zeros', 'ones', 'twos', 'threes'])
series2 = pd.Series([444, 555, 666],
                    index=['fours', 'fives', 'sixes'])
frame2 = pd.concat([series1, series2],
                   keys=['Number1', 'Number2'])
frame2.unstack()
Reshaping
• Reshaping means changing the shape of an
array.
• The shape of an array is the number of
elements in each dimension.
• By reshaping we can add or remove dimensions
or change number of elements in each
dimension.
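A minimal NumPy sketch of these points follows. The variable names are illustrative. The total number of elements has to stay the same across reshapes, which is why 12 elements can become 3 x 4 or 2 x 3 x 2.

```python
import numpy as np

values = np.arange(12)           # one dimension, shape (12,)
matrix = values.reshape(3, 4)    # two dimensions: 3 rows, 4 columns
cube = matrix.reshape(2, 3, 2)   # three dimensions, still 12 elements
flat = cube.reshape(-1)          # -1 lets NumPy infer the length, back to (12,)

print(values.shape, matrix.shape, cube.shape, flat.shape)
```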
Transformation Techniques?
Data Deduplication?
Replacing Values?
Handling missing data?
