ML Lac0 Notes

The document provides an overview of Exploratory Data Analysis (EDA) and its application in spatial data analysis, emphasizing the importance of EDA in the data analysis workflow. It discusses techniques for data ingestion and cleaning, statistical analysis, visualization methods, and introduces Exploratory Spatial Data Analysis (ESDA) concepts. A case study involving a teahouse location decision illustrates the practical application of EDA techniques in real-world scenarios.

Uploaded by

odpc4979

Introduction to Exploratory (Spatial) Data Analysis - Summary

Overview

This summary covers Exploratory Data Analysis (EDA) and its application to spatial data. The
presentation covers the fundamentals of EDA, the importance of EDA before modelling, and the
specific techniques used in spatial data analysis.

Learning Objectives

The primary learning objectives of the lesson are:

• Explaining the fundamentals and importance of EDA to peers.

• Applying statistical and visualization methods to different types of data.

• Developing familiarity with Python for data analysis tasks.

Data Analysis Workflow

The presentation outlines a typical data analysis workflow. It starts with data preparation
(ingesting and cleaning the data), followed by EDA, which summarizes the characteristics of the
data using statistics and visualizations.

Data Ingestion and Cleaning

Data ingestion involves reading data from various formats using Python libraries such as:

• pandas.read_csv() for CSV files.

• pandas.read_excel() for Excel files.

• scipy.io.loadmat() for MATLAB files.

• geopandas.read_file() for shapefiles and GeoJSON files.

• rasterio.open() for GeoTIFF files.

• matplotlib.pyplot.imread() for images.

Data cleaning is emphasized as a crucial step to transform messy data into tidy data suitable for
modeling.

Exploratory Data Analysis (EDA)

EDA is described as a method to summarize data characteristics with statistical measures and
visualizations. Key benefits of EDA include:

• Providing an overview of the data.

• Guiding further analysis and method selection.

• Generating hypotheses.

• Identifying data problems.

• Understanding variable properties and relationships.

Statistical Analysis and Visualization

The presentation highlights the importance of combining statistical analysis with visualization to
maximize data insights and uncover underlying structures. Examples include:

• Histograms and Probability Density Functions (PDFs) for univariate analysis.

• Box plots for summarizing data distributions.

• Bar plots for categorical data.

Bi-Variate Analysis

Bi-variate analysis techniques are discussed to understand relationships between two variables.
Methods include:

• Correlation analysis to quantify relationships.

• 2-D scatter plots to visualize linear relationships.

• Pair plots to show pairwise relationships and identify patterns and outliers.

Exploratory Spatial Data Analysis (ESDA)

ESDA applies traditional EDA techniques to spatial datasets, connecting variables to specific
locations or times and considering spatial autocorrelation. Key concepts include:

• Spatial autocorrelation: Describing how variable values are correlated across space.

o Positive spatial autocorrelation: Similar values cluster together.

o Zero spatial autocorrelation: Random distribution of values.

o Negative spatial autocorrelation: Neighboring values are dissimilar, giving a dispersed pattern.

Visualization Techniques in ESDA

Several ESDA mapping techniques are introduced, including:

• Box maps to identify outliers and visualize data distribution.

• Connection maps to show spatial relationships.

• Various advanced mapping methods like conditional choropleth maps and Voronoi
diagrams.

Case Study: Ghelgheli’s Teahouse

The presentation includes a team-based learning assignment involving a hypothetical scenario
where Ghelgheli, a tea lover, uses data analysis to find a suitable location for his teahouse. The
case study emphasizes:

• Data collection on potential locations, foot traffic, competitor locations, rent prices, and
demographics.

• Data cleaning and imputation to handle missing and anomalous values.

• Statistical analysis to extract descriptive statistics.

• Visualization techniques to identify patterns and trends.

• Decision-making based on data insights to select the best location.

Key Takeaways

The document concludes with several important lessons:

• The critical role of data in decision-making processes.

• The effectiveness of EDA techniques in uncovering insights.

• The transformation of messy data into valuable insights through proper cleaning and
analysis.

This comprehensive presentation provides a solid foundation for understanding and applying
EDA and ESDA techniques in various data analysis scenarios.

Introduction to Exploratory (Spatial) Data Analysis

Mahdi KHODADADZADEH
Assistant Professor
Faculty of Geo-Information Science and Earth Observation (ITC)
Department of Geo-information Processing (GIP)

May 2024

Exploratory Data Analysis

From: https://xkcd.com

This lesson’s learning objectives

Explain to peers
• the fundamentals of E(S)DA
• the importance of E(S)DA before modelling
Apply statistical and visualization methods to different types of data
Develop familiarity with Python

You are a Python master. Congrats!

Magic Box

Model = Algorithm_X(Data)

You’ve learned how to build a model in Python. Congrats!

Magic Box

But you run into some issues!

Data Analysis Workflow

Data Preparation

From: https://davpy.netlify.app/3-data-workflow.html

Ingesting Data

Getting data into a shape that we can use to start our analysis.

Python:
Reading comma-separated value (CSV) data: pandas.read_csv()
Reading an Excel file: pandas.read_excel()
Reading a MATLAB file: scipy.io.loadmat()
Reading shapefile and GeoJSON files: geopandas.read_file()
Reading GeoTIFF: rasterio.open()
Reading an image: matplotlib.pyplot.imread()
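As a minimal sketch of the first reader in this list, the snippet below loads a small CSV and checks what was read. The column names and values are hypothetical, and the text is read from an in-memory string only so that the example runs without a file on disk; with real data a file path would be passed instead.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a path such as "data.csv".
csv_text = """station,temperature,humidity
A,21.5,40
B,,55
C,19.0,
"""

# pandas.read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.dtypes)        # column types inferred by pandas
print(df.isna().sum())  # number of missing values per column
```

Empty fields are read as missing values (NaN), which is often the first sign that the data will need cleaning before analysis.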

Data Cleaning

Data preparation: messy data → tidy data

Rectangular data structures → Data modelling

From: https://www.openscapes.org/blog/2020/10/12/tidy-data/

Exploratory Data Analysis (EDA)

EDA aims at summarizing the characteristics of a dataset with statistics and graphs

Statistical Analysis + Visualization

Get an overview of the data
Orient further analysis → choose correct methods/approaches
Help you to generate hypotheses
Spot problems in data
Understand properties of the variables (e.g., mean)
Understand relationships between variables

Statistics + Visualization

From: https://en.wikipedia.org/wiki/Anscombe%27s_quartet

Statistics + Visualization

Visualization
Maximize insight into a data set
Uncover underlying structure

From: https://en.wikipedia.org/wiki/Anscombe%27s_quartet
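The point of Anscombe's quartet can be checked numerically. The sketch below uses the quartet's values as they are widely published (for example on the Wikipedia page cited above) and computes the mean of x, the mean of y, and the correlation for each of the four datasets. The variable names are illustrative.

```python
import numpy as np

# Anscombe's quartet: datasets I-III share the same x values.
x123 = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
x4 = np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float)

quartet = {
    "I": (x123, np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                          7.24, 4.26, 10.84, 4.82, 5.68])),
    "II": (x123, np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                           6.13, 3.10, 9.13, 7.26, 4.74])),
    "III": (x123, np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                            6.08, 5.39, 8.15, 6.42, 5.73])),
    "IV": (x4, np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                         5.25, 12.50, 5.56, 7.91, 6.89])),
}

summary = {}
for name, (x, y) in quartet.items():
    r = np.corrcoef(x, y)[0, 1]  # correlation between x and y
    summary[name] = (x.mean(), y.mean(), r)
    print(f"{name}: mean_x={x.mean():.2f} mean_y={y.mean():.2f} r={r:.3f}")
```

The four rows of summary statistics are nearly identical, yet the scatter plots of the four datasets look very different, which is why the numbers alone are not enough.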

Univariate Analysis

Mean and Standard Deviation

Histogram and PDF

A histogram shows the distribution of the data as the number of observations that fall within each bin.
The probability density function (PDF) is the continuous counterpart of the histogram.
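A small sketch of the link between bin counts and a density, assuming a synthetic normally distributed sample (the seed, sample size, and number of bins are arbitrary choices for illustration). With `density=True`, NumPy rescales the counts so that the bar areas sum to 1, as a PDF integrates to 1.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
sample = rng.normal(loc=0.0, scale=1.0, size=1000)

# Raw histogram: number of observations per bin.
counts, edges = np.histogram(sample, bins=20)

# Density histogram: counts rescaled so that the total bar area is 1.
density, _ = np.histogram(sample, bins=20, density=True)
widths = np.diff(edges)

print(counts.sum())              # every observation falls in exactly one bin
print((density * widths).sum())  # total area under the density histogram
```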

Univariate Analysis

Min, Max, Median, Percentile, Quartile

Percentile: Given a vector V of length N, the q-th percentile of V (0 ≤ q ≤ 100) is the value q/100 of the way from the minimum to the maximum in a sorted copy of V.
Quantile: The q-th quantile of V (0 ≤ q ≤ 1) is the value a fraction q of the way from the minimum to the maximum in a sorted copy of V. The quartiles are the 0.25, 0.5, and 0.75 quantiles.

five-number summary →
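The five-number summary can be sketched with `numpy.percentile`, whose default linear interpolation matches the "q/100 of the way through the sorted copy" definition. The vector below is a made-up example.

```python
import numpy as np

v = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])

# Five-number summary: minimum, first quartile, median, third quartile, maximum.
minimum, q1, median, q3, maximum = np.percentile(v, [0, 25, 50, 75, 100])

# The same first quartile expressed as a quantile (q on a 0-1 scale).
same_q1 = np.quantile(v, 0.25)

print(minimum, q1, median, q3, maximum)
```

For this vector the 25th percentile sits at position 0.25 × (N − 1) = 2 in the sorted copy, so it is the third value.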

Univariate Analysis

Box plot: displays the five-number summary (the minimum, first quartile, median, third quartile, and maximum) of a set of data.
It can also show which observations are outliers and what their values are.

https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51
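One common way a box plot flags outliers is the 1.5 × IQR rule. The sketch below applies that rule by hand to a made-up sample; the multiplier 1.5 is the usual convention and is an assumption here, since the slide does not state which rule its plot uses.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 50], dtype=float)

q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1  # inter-quartile range

# Fences: points beyond these are drawn as individual outlier markers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = values[(values < lower_fence) | (values > upper_fence)]
print(lower_fence, upper_fence, outliers)
```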

Univariate Analysis

Bar plots

From: https://matplotlib.org/

From: https://xkcd.com

Bi-Variate Analysis

Correlation
Quantifies the relationship between two variables
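As an illustration, the sketch below computes a correlation coefficient from its definition and compares it with NumPy's built-in result. It assumes the Pearson coefficient, which the slide does not name explicitly; the function name `pearson_r` and the data are hypothetical.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation: covariance of x and y scaled by their spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())


hours = [1, 2, 3, 4, 5]   # hypothetical variable 1
score = [2, 4, 5, 4, 5]   # hypothetical variable 2

r = pearson_r(hours, score)
r_numpy = np.corrcoef(hours, score)[0, 1]
print(round(r, 4), round(r_numpy, 4))
```

The coefficient ranges from -1 (perfect decreasing linear relationship) through 0 (no linear relationship) to +1 (perfect increasing linear relationship).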

Bi-Variate Analysis

2-D Scatter Plots

They can show the linear relationship between two variables

Bi-Variate Analysis

Pair plot

Note: A pair plot is a visualization that shows pairwise relationships between variables in a dataset. It’s a great way to explore how different variables correlate with each other.

What is a Pair Plot?

A pair plot displays scatterplots for each pair of variables in your dataset, usually with histograms or kernel density estimates of each variable on the diagonal.
It’s useful for identifying patterns, correlations, and potential outliers.

Exploratory Spatial Data Analysis

Geospatial data → ESDA

“Traditional” EDA can be applied to spatial datasets to obtain statistics and basic plots (bar plots, histograms, box plots, ...).
ESDA tools connect a specific variable to a location/time.
ESDA takes into account the values of the same variable at different locations/times.

Applying EDA to geospatial data

Spatial autocorrelation

Correlation of a variable with itself across space (in different places in space) → relationships to neighbors

Positive spatial autocorrelation
• values are similar to their neighbors or other close objects
• clusters of similar values on the map
Zero or no spatial autocorrelation
• random values of close objects or neighbors
• no clear pattern visually
Negative spatial autocorrelation
• values are dissimilar to their neighbors or close objects
• dispersed patterns of values on the map

Spatial autocorrelation

From: (Radil, 2011)

Spatial autocorrelation

From: https://mgimond.github.io/Spatial/spatial-autocorrelation.html

SPATIAL AUTOCORRELATION: MORAN’S I

$$I = \frac{n \sum_i \sum_j w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\left(\sum_i \sum_j w_{ij}\right) \sum_i (x_i - \bar{x})^2}$$

• $n$ is the number of cases
• $x_i$ is the variable value at a particular location
• $x_j$ is the variable value at another location
• $\bar{x}$ is the mean of the variable
• $w_{ij}$ is a weight applied to the comparison between location i and location j

-1: high negative spatial autocorrelation
0*: no spatial autocorrelation*
+1: high positive spatial autocorrelation

Check out the link below for more in-depth explanation:

https://rpubs.com/corey_sparks/105700
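A minimal NumPy sketch of Moran's I as defined on this slide. The helper names `morans_i` and `chain_weights` are hypothetical, and the weights are an assumed toy case: six locations in a row, where each location's neighbors are the ones directly next to it (weight 1) and all other pairs have weight 0.

```python
import numpy as np


def morans_i(x, w):
    """Moran's I for values x and an n-by-n spatial weight matrix w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()                                 # deviations from the mean
    numerator = len(x) * (w * np.outer(z, z)).sum()  # n * sum_ij w_ij z_i z_j
    denominator = w.sum() * (z ** 2).sum()           # (sum_ij w_ij) * sum_i z_i^2
    return numerator / denominator


def chain_weights(n):
    """Binary weights for n locations in a row: adjacent locations are neighbors."""
    w = np.zeros((n, n))
    idx = np.arange(n - 1)
    w[idx, idx + 1] = 1.0
    w[idx + 1, idx] = 1.0
    return w


w = chain_weights(6)
clustered = morans_i([1, 1, 1, 5, 5, 5], w)    # similar values sit together
alternating = morans_i([1, 5, 1, 5, 1, 5], w)  # every neighbor is dissimilar

print(clustered, alternating)
```

The clustered arrangement gives a positive value and the alternating arrangement a negative one, matching the positive and negative ends of the scale. Real analyses usually build the weight matrix from the geometry of the data rather than by hand.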

Visualization on map

Connection map

From: https://www.data-to-viz.com/story/MapConnection.html

Box map

Note: A box map (Anselin 1994) is the mapping counterpart of the idea behind a box plot. The point of departure is again a quantile map, more specifically, a quartile map. But the four categories are extended to six bins, to separately identify the lower and upper outliers. The definition of outliers is a function of a multiple of the inter-quartile range (IQR), the difference between the values for the 75th and 25th percentiles. As we will see in a later chapter in our discussion of the box plot, we use two options for these cut-off values, or hinges, 1.5 and 3.0. The box map uses the same convention.

The box map in the figure separates the three lower outliers (the observations with zero values) from the other four observations in the first quartile. They are depicted in dark blue. Similarly, it separates the six outliers in Manhattan from the eight other observations in the upper quartile. The upper outliers are colored dark red.
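The six-bin classification behind a box map can be sketched without any mapping library. The values below are made up, the hinge of 1.5 is one of the two options named in the note, and the handling of values that fall exactly on a break is an assumption of this sketch that may differ from what mapping software does.

```python
import numpy as np

# Hypothetical values, one per map unit.
values = np.array([-40, 10, 11, 12, 13, 14, 15, 16, 17, 18, 70], dtype=float)

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1
hinge = 1.5  # the note also mentions 3.0 as an alternative

# Five break points define six bins.
breaks = [q1 - hinge * iqr, q1, median, q3, q3 + hinge * iqr]
labels = ["lower outlier", "< 25%", "25% - 50%", "50% - 75%", "> 75%", "upper outlier"]

# np.digitize returns, for each value, the index of the bin it falls in.
bins = np.digitize(values, breaks)

for value, b in zip(values, bins):
    print(value, labels[b])
```

Each map unit would then be colored by its bin, so the lower and upper outliers stand out from the four quartile classes.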

ESDA maps

Some examples of ESDA maps:

Box map: https://geodacenter.github.io/workbook/3a_mapping/lab3a.html#extreme-value-maps
Brushing & linking: https://www.spatialanalysisonline.com/HTML/eda__esda_and_estda.htm
Conditional choropleth mapping: http://publichealthintelligence.org/content/geography-diabetes-us-conditioned-map
Voronoi analysis: https://www.gislounge.com/voronoi-diagrams-and-gis/
Cartograms: https://gisgeography.com/cartogram-maps/
Connection map: https://www.data-to-viz.com/story/MapConnection.html

Team-based learning assignment
Ghelgheli decided to change his job and, as a tea lover, opted to open a teahouse. He wanted to find the right location for his business: one with many passers-by and few competitors nearby.
Ghelgheli started by collecting data and organizing it into rows and columns in a table on his computer. However, the data was somewhat messy, containing several missing values and even some anomalies. Nevertheless, Ghelgheli was enthusiastic about working with such a dataset. He used some cool techniques to clean the data, extract statistical measures, and generate plots and maps.
Through his analysis, Ghelgheli pinpointed a suitable location for his teahouse, and soon after opening, it became a local favorite.

Which data and methods do you think Ghelgheli utilized for his analysis?
What interesting learnings did you derive from Ghelgheli's story?
Can you provide some real-life examples similar to Ghelgheli's experience?

Data Collection: Ghelgheli started by collecting data on potential locations for his teahouse. This could include foot traffic data, competitor locations, rent prices, demographic information of the area, etc.

Data Cleaning: The data Ghelgheli collected was described as messy, with missing and strange values. Ghelgheli likely employed techniques like data imputation, outlier detection, and data validation to clean the dataset.

Statistical Analysis: Ghelgheli extracted statistical measures from the cleaned dataset. This could involve calculating means, medians, standard deviations, and other descriptive statistics to understand the characteristics of the data.

Visualization: Ghelgheli created plots and maps to visualize the data. This could include scatter plots, histograms, heatmaps, and geographical maps to identify patterns and trends in the data.

Decision Making: Through the analysis, Ghelgheli identified a suitable location for his teahouse based on the insights gained from the data analysis.
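The cleaning, statistics, and decision steps could look roughly like the pandas sketch below. Everything in it is hypothetical: the table, the column names, the choice of median imputation, and the scoring rule (foot traffic divided by the number of competitors plus one) are illustrative assumptions, not details from the story.

```python
import numpy as np
import pandas as pd

# Hypothetical candidate locations with one missing and one anomalous value.
sites = pd.DataFrame({
    "site": ["A", "B", "C", "D", "E"],
    "foot_traffic": [1200.0, 950.0, np.nan, 1500.0, 800.0],  # passers-by per day
    "competitors": [4, 1, 2, 6, 0],                          # nearby teahouses
    "rent": [3000.0, 2200.0, 2500.0, -1.0, 1800.0],          # -1 is an anomaly
})

# Data cleaning: treat non-positive rents as missing, then impute with the median.
sites["rent"] = sites["rent"].where(sites["rent"] > 0)
for col in ["foot_traffic", "rent"]:
    sites[col] = sites[col].fillna(sites[col].median())

# Statistical analysis: descriptive statistics of the cleaned table.
print(sites.describe())

# Decision making: favor many passers-by and few competitors (assumed rule).
sites["score"] = sites["foot_traffic"] / (sites["competitors"] + 1)
best = sites.loc[sites["score"].idxmax(), "site"]
print("Suggested site:", best)
```

A real decision would also weigh rent and demographics and would be checked against plots and maps rather than a single score.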

• The importance of data in decision-making processes.

• The power of EDA techniques in uncovering insights and making informed decisions.

• How messy data can be transformed into valuable insights through proper cleaning and analysis.
