IMPDAV

The document outlines the syllabus for Unit III of a Data Analytics and Visualization course at MIT School of Computing, focusing on Exploratory Data Analysis (EDA) techniques and tools. It covers the importance of EDA, steps involved in data collection and cleaning, as well as univariate and bivariate analysis methods using Python libraries. Advanced EDA techniques such as outlier detection, time series analysis, and dimensionality reduction are also discussed, along with real-world applications and challenges faced in EDA.

Uploaded by

GAYATRI BHOSALE


MIT Art, Design and Technology University

MIT School of Computing, Pune

21BTCS027 - Data Analytics and Visualization

Class - T.Y. (SEM-II), Core

Unit – III EDA FOR ANALYSIS AND VISUALIZATION
Prof. Dr. Aditya Pai H
Prof. Shubhangi Divekar
Prof. Revati Deshpande

AY 2024-2025 SEM-II
Unit III - Syllabus

Unit III – EDA FOR ANALYSIS AND VISUALIZATION

Exploratory Data Analysis: Basic, Examples, Techniques.
Python libraries for Analysis: Pandas and Numpy, and invoke
APIs and Web Services. Visualize using Python: Matplotlib,
Seaborn, and Folium.
Exploratory Data Analysis
Basic Concepts of EDA

● Overview of Descriptive Statistics

● Central Tendency and Dispersion Measures
● Key Concepts: Mean, Median, Variance, Standard Deviation
Exploratory Data Analysis
Definition of EDA

• Exploratory data analysis is a data analytics process that aims to understand the data in depth and
learn the different data characteristics, often using visual means. This allows you to get a better
feel of your data and find useful patterns.
Exploratory Data Analysis
Importance in the Data Analysis

• It helps you gather insights, make better sense of the data, and remove irregularities and
unnecessary values from it.
• Helps you prepare your dataset for analysis.
• Cleaner, better-understood data lets a machine learning model make better predictions.
• Gives you more accurate results.
• It also helps you choose a more suitable machine learning model.
Exploratory Data Analysis
Goals of EDA
● Discover patterns and trends.
● Spot errors, anomalies, and outliers.
● Visualize relationships between variables
  (e.g., a raw scatterplot vs. a cleaned-up, annotated version).
Exploratory Data Analysis
Steps Involved in Exploratory Data Analysis

1. Data Collection - Data collection is an essential part of exploratory data analysis. It refers to the
process of finding and loading data into our system. Good, reliable data can be found on various
public sites or bought from private organizations. Some reliable sites for data collection are
Kaggle, Github, Machine Learning Repository, etc.

• The data depicted below represents the housing dataset available on Kaggle. It contains
information on houses and their sale prices.
Exploratory Data Analysis
Steps Involved in Exploratory Data Analysis

2. Data Cleaning - Data cleaning refers to removing unwanted variables and values
from your dataset and eliminating any irregularities in it. Such anomalies can
disproportionately skew the data and, hence, adversely affect the results. Some
steps you can take to clean data are:
● Removing missing values, outliers, and unnecessary rows/columns.
● Re-indexing and reformatting the data.

Now, it’s time to clean the housing dataset. First, check the number of
missing values in each column and the percentage of the column that they account for.
Exploratory Data Analysis
Steps Involved in Exploratory Data Analysis

3. Finding Missing Values

Next, drop the columns that are missing more than 15% of their data. Variables such as
'PoolQC', 'MiscFeature', and 'Alley' are missing a significant chunk of their values, so
they fall in this group and carry little usable information.
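
The missing-value check and the 15% rule can be sketched as follows. This is a minimal illustration on a made-up ten-row table, not the actual Kaggle file; the column values and the variable names (missing, to_drop, cleaned) are assumptions for the example.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the housing data; all values are made up for illustration.
df = pd.DataFrame({
    "SalePrice": [208500, 181500, 223500, 140000, 250000,
                  143000, 307000, 200000, 129900, 118000],
    "LotFrontage": [65.0, 80.0, 68.0, 60.0, 84.0,
                    85.0, 75.0, np.nan, 51.0, 50.0],   # 10% missing
    "PoolQC": [np.nan] * 9 + ["Gd"],                   # 90% missing
    "Alley": [np.nan] * 8 + ["Grvl", "Pave"],          # 80% missing
})

# Count and percentage of missing values per column, worst first.
missing = pd.DataFrame({
    "count": df.isnull().sum(),
    "percent": df.isnull().mean() * 100,
}).sort_values("percent", ascending=False)
print(missing)

# Drop every column that is missing more than 15% of its values.
threshold = 15
to_drop = missing.index[missing["percent"] > threshold]
cleaned = df.drop(columns=to_drop)
print(list(cleaned.columns))
```

On real data, the same lines would run on a DataFrame loaded with pd.read_csv().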
Exploratory Data Analysis
Steps Involved in Exploratory Data Analysis

4. Dropping Missing Values

Exploratory Data Analysis
Steps Involved in Exploratory Data Analysis

Your final dataset after cleaning looks as shown below. You now have only 63 columns of
importance.
Exploratory Data Analysis
Univariate Analysis
In Univariate Analysis, you analyze data of just one variable. A variable in your dataset
refers to a single feature/column. You can do this with graphical means, or with non-graphical
means by computing summary statistics of the data. Some visual methods include:

● Histograms: Bar plots in which the frequency of the values in each bin is represented by a
rectangular bar.
● Box plots: A box spans the first to the third quartile with a line at the median, and whiskers
extend over the rest of the non-outlying data.

Let's make a histogram out of our SalePrice column.

Exploratory Data Analysis

Univariate Analysis
Right skew
Also known as positive skew, this distribution has a longer tail on the right
side of its peak. The mean of the data is typically greater than the median.

Left skew
Also known as negative skew, this distribution has a longer tail on the left
side of its peak. The mean of the data is typically less than the median.

Zero skew (symmetric distribution)
A symmetrical distribution, such as the normal distribution, where the graph is the same on
both sides of a central point.
Exploratory Data Analysis

Univariate Analysis
• High kurtosis
  • On a box plot, a narrow box with long whiskers and many points beyond them suggests high
    kurtosis. The distribution has a sharp peak and heavy tails, that is, many extreme values.
• Low kurtosis
  • A wide box with short whiskers suggests low kurtosis. The distribution has a broad peak
    and light tails, that is, few extreme values.
• Normal distribution
  • A bell-shaped curve with a kurtosis of 3 (an excess kurtosis of 0). This is the reference
    level against which tails are judged as heavy or light.
Exploratory Data Analysis

Univariate Analysis
• From the histogram, you can see that the distribution
deviates from the normal and is positively
skewed.

• Now, find the skewness and kurtosis of the
distribution.

Figure: Skewness and kurtosis in your data
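
Skewness and kurtosis can be computed directly with Pandas. The sketch below uses a small made-up price series rather than the real SalePrice column. Note that Series.kurt() reports excess kurtosis, so a normal distribution scores about 0 rather than 3.

```python
import pandas as pd

# Made-up, right-skewed prices: most are moderate, a few are very high.
prices = pd.Series([120, 130, 135, 140, 150, 155, 160, 175, 400, 750],
                   name="SalePrice")

print("Mean:    ", prices.mean())
print("Median:  ", prices.median())
print("Skewness:", prices.skew())  # > 0 indicates a longer right tail
print("Kurtosis:", prices.kurt())  # excess kurtosis; > 0 indicates heavy tails
```

The mean sits well above the median here, which is the usual signature of a right-skewed column.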
Exploratory Data Analysis
Univariate Analysis - To understand exactly which values are outliers, you need to establish a threshold. To
do this, you have to standardize the data so that it has a mean of 0 and a standard deviation of 1.

• The above figure shows that the lower range values fall in a
similar range and are not too far from 0. Meanwhile, the higher
range values are far from 0.

• You cannot consider all of them outliers, but you have to
be careful with the last two values, which are above 7.
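
A minimal sketch of standardization with NumPy, on made-up prices. The cut-off of 2.0 is a hypothetical threshold chosen for this toy data, not the value used on the slide.

```python
import numpy as np

# Made-up sale prices; the last two are deliberately extreme.
prices = np.array([120, 130, 135, 140, 150, 155, 160, 175, 180, 190, 700, 750],
                  dtype=float)

# Standardize: subtract the mean, then divide by the standard deviation.
z = (prices - prices.mean()) / prices.std()

print("mean of z:", round(z.mean(), 6))  # approximately 0
print("std of z: ", round(z.std(), 6))   # approximately 1

# Inspect both ends of the standardized distribution.
ordered = np.sort(z)
print("lowest: ", ordered[:3])
print("highest:", ordered[-3:])

# Flag values whose standardized score exceeds the (hypothetical) threshold.
threshold = 2.0
flagged = prices[np.abs(z) > threshold]
print("flagged:", flagged)
```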
Exploratory Data Analysis
Tools and Libraries
● Python: Pandas, Matplotlib, Seaborn, Plotly.
● R: ggplot2, dplyr.
● Visualization tools: Tableau, Power BI.
Exploratory Data Analysis

Bivariate Analysis - Here, you take two variables and
compare them. This way, you can find how one feature
affects the other. It is done with scatter plots, which plot
individual data points, or with correlation matrices, which show the
correlation as color hues. You can also use boxplots.
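
A correlation matrix can be computed with Pandas before it is drawn as a heatmap. The sketch below uses made-up values; the column names TotalBsmtSF and OverallQual are assumed stand-ins for the basement area and overall quality features.

```python
import pandas as pd

# Made-up values for three housing features.
df = pd.DataFrame({
    "TotalBsmtSF": [600, 850, 900, 1100, 1250, 1500, 1700, 2000],
    "OverallQual": [4, 5, 5, 6, 7, 7, 8, 9],
    "SalePrice": [110000, 135000, 150000, 175000, 210000, 240000, 290000, 360000],
})

# Pearson correlation between every pair of numeric columns.
corr = df.corr()
print(corr.round(2))

# Correlation of each feature with the target, strongest first.
print(corr["SalePrice"].drop("SalePrice").sort_values(ascending=False))
```

Passing corr to a heatmap function such as seaborn.heatmap would render the same matrix in color.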
Exploratory Data Analysis

Bivariate Analysis - Now, plot a scatter plot of the basement
area vs. the sale price and look at their relationship.
You can see that the greater the basement area, the
higher the sale price.
Exploratory Data Analysis
Bivariate Analysis

Now, delete the last two values as they are outliers.

Deleting Outliers
Exploratory Data Analysis
Bivariate Analysis
Now, plot the scatter plot of the basement area vs. the sale price again and look at their
relationship. As before, you can see that the greater the basement area, the higher
the sale price.
Exploratory Data Analysis
Bivariate Analysis
Moving ahead, plot a boxplot of the sale price against overall quality. The overall
quality feature is an ordinal category rated from 1 to 10. Here, you can
see the sale price increase as the quality increases. The rise looks a bit like
an exponential curve.
Exploratory Data Analysis
Advanced EDA Techniques
●Outlier Detection
●Time Series Analysis
●Dimensionality Reduction (PCA)
●Real-world Examples
Exploratory Data Analysis
Advanced EDA Techniques
●Outlier Detection - Ensuring data quality and reliability is crucial
for making informed decisions and extracting meaningful insights.
However, datasets often contain irregularities known as outliers,
which can significantly impact the integrity and accuracy of
analyses. This makes outlier detection a crucial task in data analysis.
Exploratory Data Analysis
Advanced EDA Techniques
Outlier Detection.
Types of Outliers - Outliers can be classified into various types based
on their characteristics:

1. Univariate Outliers: These are outliers that occur in a single variable
or feature.

2. Multivariate Outliers: These outliers occur when considering
multiple variables simultaneously. A data point may not be an outlier
in any single dimension but can be an outlier when considering
multiple dimensions.
Exploratory Data Analysis
Advanced EDA Techniques
Outlier Detection.
Types of Outliers

3. Global Outliers: Also known as point anomalies, these data points
significantly differ from the rest of the dataset.

4. Contextual Outliers: These are data points that are considered outliers in a
specific context. For example, a high temperature may be normal in summer
but an outlier in winter.

5. Collective Outliers: A collection of data points that deviate significantly from
the rest of the dataset, even if individual points within the collection are not
outliers.
Exploratory Data Analysis
Advanced EDA Techniques
●Time Series Analysis - In Exploratory Data Analysis (EDA), "time
series analysis" refers to examining data collected
over time to identify patterns, trends, seasonality, and outliers.
Typical techniques are line plots, autocorrelation plots, and
decomposition. These reveal the underlying structure of the
time series and guide further analysis or modeling decisions.
Exploratory Data Analysis
Advanced EDA Techniques
• Time Series Analysis - The obvious graph to start with is the time
plot. That is, the observations are plotted against the time they
were observed, with consecutive observations joined by lines.
• In Python, we can draw it with Pandas and Matplotlib.
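
A minimal time plot sketch with Pandas and Matplotlib. The monthly series is synthetic (a linear trend plus a yearly cycle), and the output file name time_plot.png is an assumption for the example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic monthly series: upward trend plus a 12-month seasonal cycle.
dates = pd.date_range("2022-01-01", periods=36, freq="MS")
values = np.linspace(100, 170, 36) + 10 * np.sin(2 * np.pi * np.arange(36) / 12)
sales = pd.Series(values, index=dates, name="sales")

# A 12-month rolling mean averages out the seasonal cycle and exposes the trend.
trend = sales.rolling(window=12).mean()

# Time plot: observations against time, consecutive points joined by lines.
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(sales.index, sales.values, marker="o", label="Monthly sales")
ax.plot(trend.index, trend.values, label="12-month rolling mean")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")
ax.set_title("Time plot")
ax.legend()
fig.savefig("time_plot.png")
```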
Exploratory Data Analysis
Advanced EDA Techniques
●Dimensionality Reduction (PCA)
●In Exploratory Data Analysis (EDA), dimensionality
reduction using Principal Component Analysis (PCA)
is a technique used to transform high-dimensional
data into a lower-dimensional space, allowing for
easier visualization and identification of patterns
within complex datasets, while still preserving the
most important information from the original data.
Exploratory Data Analysis
Advanced EDA Techniques
●Dimensionality Reduction (PCA)
●Principal Component Analysis (PCA) is a
dimensionality reduction technique that can be used to
reduce a larger set of feature variables into a smaller
set that still contains most of the variance in the larger
set.

●https://www.kaggle.com/code/prashant111/eda-logistic-regression-pca
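
To make the idea concrete, PCA can be sketched in a few lines of NumPy using a singular value decomposition. This is a hand-rolled illustration on synthetic data, not the code from the linked notebook. In practice a library implementation such as scikit-learn's PCA class would normally be used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 samples, 5 features driven by 2 hidden factors plus noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 5))

# 1. Center each feature at zero.
Xc = X - X.mean(axis=0)

# 2. SVD of the centered data; the rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# 3. Share of the total variance captured by each component.
explained = S**2 / np.sum(S**2)
print("explained variance ratio:", explained.round(3))

# 4. Project onto the first two components (5D -> 2D).
X_2d = Xc @ Vt[:2].T
print("reduced shape:", X_2d.shape)
```

Because the synthetic features come from only two hidden factors, the first two components should carry nearly all of the variance.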
Exploratory Data Analysis
Advanced EDA Techniques Application
● Advanced Exploratory Data Analysis (EDA) in real-world
scenarios includes techniques such as:
● Interaction plots to examine complex relationships between
multiple variables,
● Time series analysis to identify patterns in data over time,
● Dimensionality reduction to visualize high-dimensional data,
● Outlier detection using advanced statistical methods, and
● Clustering algorithms to identify distinct groups within a
dataset.
● These techniques are often applied in fields like customer churn prediction,
fraud detection, healthcare analytics, and market research.
Exploratory Data Analysis
Advanced EDA Techniques Application
A. Customer Churn Analysis:
●Interaction plots: Visualizing how factors like customer
tenure, monthly usage, and recent support interactions
combine to influence churn probability.
●Time series analysis: Identifying patterns in customer
behavior over time to predict churn risk based on
usage trends.
●Clustering: Grouping customers with similar
characteristics to target churn prevention strategies.
Exploratory Data Analysis
Advanced EDA Techniques
B. Healthcare Analytics:
• Dimensionality reduction: Analyzing large medical
datasets with many variables using techniques like
Principal Component Analysis (PCA) to identify key
factors impacting patient outcomes.
• Outlier detection: Identifying unusual patient data
points (e.g., extreme lab values) that could signal
potential health issues.
• Survival analysis: Studying factors influencing patient
survival rates using time-to-event analysis.
Exploratory Data Analysis
Advanced EDA Techniques:
1. Interaction Plot - Used to visualize how two or more variables interact
with each other.
• Example: Interaction between marketing spend and customer age on
sales.
Exploratory Data Analysis
Advanced EDA Techniques:
2. Time Series Analysis Plot
Shows how a variable changes over time.
• Example: Stock market trends, COVID-19 cases over time.
Exploratory Data Analysis
Advanced EDA Techniques:
3. Dimensionality Reduction (PCA, t-SNE, UMAP)
Used to visualize high-dimensional data in a lower-dimensional space.
• Example: PCA visualization of customer segmentation.
Exploratory Data Analysis
Advanced EDA Techniques:
3. Dimensionality Reduction (PCA, t-SNE, UMAP)
A. Interpreting the PCA Cluster Plot
• The X and Y axes represent Principal Component 1 and Principal Component 2, the two
directions that capture the most variance in the data.
• Each point represents a data sample, colored by the cluster it belongs to.
• Even though the data originally had more features (e.g., 5D or 10D), it is compressed to
2D while preserving as much of the structure as two dimensions allow.
B. Advantages of PCA
• Reduces noise and redundancy in the data.
• Speeds up computations in machine learning models.
• Aids visualization of complex datasets.
Exploratory Data Analysis
Advanced EDA Techniques:
4. Outlier Detection (Boxplot, Z-score, Isolation Forest)
Identifies anomalies in data distribution.
• Example: Detecting fraud in credit card transactions.
Exploratory Data Analysis

Advanced EDA Techniques:

4. Outlier Detection (Boxplot)

Normal Transactions (Inside the Box & Whiskers)

• The box spans the interquartile range (IQR), which holds the middle 50% of transactions;
most of the rest fall within the whiskers.
• These are regular spending patterns that follow normal
behavior.
Exploratory Data Analysis

Advanced EDA Techniques:

4. Outlier Detection (Boxplot)

Suspicious Transactions (Outliers - Dots Beyond the Whiskers)

• Transactions outside the whiskers are considered anomalies.
• These may indicate fraudulent activity, such as:
• Unusually high transactions (e.g., a user who normally spends $50 suddenly spends
$5,000).
• Multiple small transactions in a short time (indicative of fraudsters testing a stolen
card).
• Spending in unfamiliar locations (geographical anomalies).
Exploratory Data Analysis

Advanced EDA Techniques:

5. Clustering (K-Means, DBSCAN, Hierarchical Clustering)
Groups similar data points.
• Example: Customer segmentation in market research.
Exploratory Data Analysis
Advanced EDA Techniques:
5. Clustering (K-Means, DBSCAN, Hierarchical Clustering)
The scatter plot above shows the results of applying K-Means clustering for customer
segmentation based on:
• Annual Income ($1000s) (X-axis)
• Spending Score (1-100) (Y-axis)
Interpretation
• Customers are grouped into 4 clusters, represented by different colors.
• Cluster Centroids (black 'X' markers) indicate the center of each group.
• This segmentation helps businesses identify customer behavior patterns, such as:
• High-income, high-spending customers (Luxury buyers)
• Low-income, low-spending customers (Budget-conscious buyers)
• High-income, low-spending customers (Potential luxury market)
• Low-income, high-spending customers (Discount seekers)
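
The segmentation can be reproduced in outline with a small NumPy implementation of K-Means. The customer data below is synthetic, and the farthest-point initialization is a simplification chosen to keep the example deterministic. A library implementation such as scikit-learn's KMeans would normally be used instead.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic customers: columns are annual income ($1000s) and spending score (1-100).
centers = np.array([[20, 20], [20, 80], [90, 20], [90, 80]], dtype=float)
X = np.vstack([c + rng.normal(scale=5, size=(25, 2)) for c in centers])

def kmeans(X, k, n_iter=100):
    # Farthest-point initialization: deterministic, spreads the centroids out.
    centroids = [X[0]]
    for _ in range(k - 1):
        dist = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(dist)])
    centroids = np.array(centroids)

    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: each centroid moves to the mean of its assigned points.
        new = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

labels, centroids = kmeans(X, k=4)
print("cluster sizes:", np.bincount(labels))
print("centroids:")
print(centroids.round(1))
```

Each centroid should land near one of the four income and spending combinations listed above.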
Exploratory Data Analysis
Challenges in EDA
●Dealing with Missing Data
●Addressing Outliers
●Handling Skewed Distributions
●Strategies and Best Practices
Exploratory Data Analysis
1. Dealing with Missing Data
Problem:
● Missing values can lead to biased analysis and reduce model
performance.
● Causes: Human errors, data corruption, sensor failures, or
incomplete records.
Exploratory Data Analysis
1. Dealing with Missing Data
Solutions:

● Imputation Methods:
  o Mean/Median Imputation: Fill in missing values with the mean/median of the column.
  o Mode Imputation: Fill categorical missing values with the most frequent value.
  o KNN Imputation: Use K-Nearest Neighbours to predict missing values.
  o Multiple Imputation: Create multiple datasets with different imputed values.

● Dropping Missing Data: Suitable when a row or column is missing too many values to impute
reliably and the values are missing at random.

● Domain-Specific Handling: e.g., using business rules to infer missing values.
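
The simpler imputation options can be sketched with Pandas as follows. The table and its column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up records with gaps in numeric and categorical columns.
df = pd.DataFrame({
    "age": [25, 30, np.nan, 40, 35],
    "income": [30000, np.nan, 52000, 61000, 1000000],
    "city": ["Pune", "Mumbai", None, "Pune", "Delhi"],
})

filled = df.copy()
# Mean imputation for a roughly symmetric numeric column.
filled["age"] = filled["age"].fillna(filled["age"].mean())
# Median imputation is more robust when a column contains extreme values.
filled["income"] = filled["income"].fillna(filled["income"].median())
# Mode imputation for a categorical column.
filled["city"] = filled["city"].fillna(filled["city"].mode()[0])
print(filled)

# Alternative: drop every row that has at least one missing value.
dropped = df.dropna()
print(dropped)
```

The income column shows why the median is often preferred: its mean would be pulled far upward by the single very large value.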

Exploratory Data Analysis
2. Addressing Outliers
Problem:
● Outliers can skew results and lead to incorrect conclusions.
● Causes: Errors in data entry, fraud, rare but valid occurrences.
Exploratory Data Analysis
2. Addressing Outliers

Solutions:

● Visualization and Statistical Techniques:
  o Boxplots and Z-scores help detect outliers.
  o Interquartile Range (IQR): Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR are considered outliers.

● Transformations:
  o Log transformation to compress extreme values, or Winsorization to cap them.

● Machine Learning Approaches:
  o Isolation Forest, DBSCAN, One-Class SVM for anomaly detection.
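
The IQR rule can be written out with NumPy. The transaction amounts below are made up, and capping at the IQR fences is shown as one Winsorization-style treatment. Classic Winsorization caps at chosen percentiles instead.

```python
import numpy as np

# Made-up transaction amounts; the last one is deliberately extreme.
amounts = np.array([42, 45, 48, 50, 51, 53, 55, 58, 60, 5000], dtype=float)

# Quartiles and the interquartile range.
q1, q3 = np.percentile(amounts, [25, 75])
iqr = q3 - q1

# Fences: anything outside [lower, upper] is flagged as an outlier.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = amounts[(amounts < lower) | (amounts > upper)]
print("fences:", lower, upper)
print("outliers:", outliers)

# Cap values at the fences instead of deleting them.
capped = np.clip(amounts, lower, upper)
print("capped max:", capped.max())
```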
Exploratory Data Analysis
3. Handling Skewed Distributions
Problem:
● Highly skewed data affects the performance of statistical tests and
machine learning models.
● Right-skewed: Income, sales, transaction amounts (long tail on the
right).
● Left-skewed: Scores on an easy exam, where most values sit near the maximum (long tail on the left).
Exploratory Data Analysis
3. Handling Skewed Distributions
Solutions:
● Transformation Methods:
  o Log Transformation: Reduces right skew (requires positive values).
  o Box-Cox Transformation: Can correct both left- and right-skewed
    data; requires strictly positive values.
  o Square Root & Reciprocal Transformations: The square root is a mild
    correction for moderate right skew; the reciprocal is a stronger one.
● Binning Data: Converting continuous data into categorical bins.
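
The effect of these transformations on skewness can be checked numerically. The sketch below draws synthetic right-skewed "income" values from a log-normal distribution, so the exact figures printed are specific to this made-up sample.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic right-skewed data drawn from a log-normal distribution.
income = pd.Series(rng.lognormal(mean=10, sigma=1, size=1000), name="income")

# Square root is a mild correction; log1p computes log(1 + x) and is safe for zeros.
sqrt_income = np.sqrt(income)
log_income = np.log1p(income)

print("skew before:    ", round(income.skew(), 2))
print("skew after sqrt:", round(sqrt_income.skew(), 2))
print("skew after log: ", round(log_income.skew(), 2))
```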
Exploratory Data Analysis
4. Strategies and Best Practices

Best Practices for EDA:

1. Understand the Data Context: Know the domain to guide cleaning and transformations.

2. Use Visualization Techniques:
   o Histograms, Boxplots, Pairplots, and Correlation Heatmaps to explore patterns.

3. Feature Engineering: Create meaningful features to improve analysis.

4. Data Scaling & Normalization: Helps models that rely on distance calculations (e.g., KNN,
SVM).

5. Automate EDA with Tools: Pandas Profiling (now ydata-profiling), Sweetviz, AutoViz for rapid insights.
Exploratory Data Analysis

Interactive EDA Tools

●Introduction to Tools like Jupyter Notebooks, R Shiny,
etc.

●Benefits of Interactive Exploration

●Visual Demonstrations
Exploratory Data Analysis
1. Introduction to Tools like Jupyter Notebooks, R
Shiny, etc.

Jupyter Notebooks (Python)

• Interactive coding environment for Python, R, and Julia.
• Supports live visualizations (Matplotlib, Seaborn, Plotly).
• Allows step-by-step data exploration with Markdown
documentation.
Exploratory Data Analysis
1. Introduction to Tools like Jupyter Notebooks, R
Shiny, etc.

R Shiny (R)
• Web-based interactive dashboards for EDA and data
visualization.
• Ideal for building dynamic reports that update with user
input.
• Used in data science, finance, and healthcare analytics.
Exploratory Data Analysis
1. Introduction to Tools like Jupyter Notebooks, R
Shiny, etc.

Other Tools
• Tableau / Power BI: Drag-and-drop interactive EDA.
• Google Colab: Cloud-based Jupyter alternative with free
GPU/TPU.
• Streamlit / Dash: Python frameworks for custom web-based
data apps.
Exploratory Data Analysis

2. Benefits of Interactive Exploration

● Real-Time Analysis → Immediate feedback on data trends.

● Dynamic Filtering → Select specific ranges, apply filters, and update visualizations.

● Better Collaboration → Share notebooks/dashboards for team analysis.

● Custom Reports → Generate automated insights for decision-making.

Exploratory Data Analysis
3. Visual Demonstrations
• Options for a live interactive EDA demonstration:
  • Jupyter Notebooks with Pandas Profiling
  • Plotly
  • Streamlit
Exploratory Data Analysis
3. Visual Demonstrations
Option 1: Pandas Profiling (Automated EDA)
• Generates a full report of data insights, including missing values, distributions,
correlations, and key statistics.

Option 2: Plotly (Interactive Graphs)

• Creates dynamic visualizations (scatter plots, histograms, and
bar charts).
• Users can zoom, filter, and hover over data points.
https://colab.research.google.com/drive/1IYD4dgd0pCpcx0ZDmnbAydXdnWdFOTeC?usp=sharing
Exploratory Data Analysis
3. Visual Demonstrations
Option 3: Streamlit (Web App for EDA)
• Builds a lightweight web-based dashboard for exploring
datasets interactively.
• Supports real-time filtering, uploading files, and interactive
charts.
Exploratory Data Analysis

Case Study: Retail Sales Analysis

●Walkthrough of a Retail Sales Dataset
●Application of Various EDA Techniques
●Key Findings and Insights

https://colab.research.google.com/drive/1wBByojR4cefelJ1T7z85hEVo1GFcosPB?usp=sharing
Python Libraries for Analysis and Visualization

• NumPy, Pandas, Seaborn, and Sklearn are among the most widely used
libraries in Python programming.

• NumPy is a library for scientific computing, Pandas is a library for
data analysis, Seaborn is a library for data visualization, and Sklearn
is a library for machine learning.

• Each library provides powerful yet simple tools for data manipulation and analysis.
With these libraries, engineers can quickly build capable
applications that harness the power of data science.
Python Libraries for Analysis and Visualization

1. NumPy (numpy)

Purpose: Numerical computations, handling large arrays & matrices efficiently.

Key Features:
• Supports multi-dimensional arrays.

• Provides mathematical & statistical functions.

• Faster than Python lists due to vectorization.

https://colab.research.google.com/drive/1gv_iUCnb301Zqh7UPI9Eq0ga4TJ6-GPn?usp=sharing
Python Libraries for Analysis and Visualization

2. Pandas (pandas)

Purpose: Data manipulation & analysis, primarily using DataFrames & Series.

Key Features:
• Handles missing data efficiently.

• Supports SQL-like operations on data.

• Works well with CSV, Excel, SQL, and JSON files.

https://colab.research.google.com/drive/1gv_iUCnb301Zqh7UPI9Eq0ga4TJ6-GPn?usp=sharing
Python Libraries for Analysis and Visualization

3. Seaborn (seaborn)

Purpose: Advanced data visualization based on Matplotlib.

Key Features:
• Attractive & informative statistical graphics.

• Built-in themes for styling.

• Integrated with Pandas for easy plotting.

https://colab.research.google.com/drive/1gv_iUCnb301Zqh7UPI9Eq0ga4TJ6-GPn?usp=sharing
Python Libraries for Analysis and Visualization

4. Scikit-Learn (sklearn)

Purpose: Machine learning, data preprocessing, and model evaluation.

Key Features:
• Provides algorithms for classification, regression, clustering.

• Supports feature selection and dimensionality reduction.

• Comes with utilities for train-test splitting and performance evaluation.

https://colab.research.google.com/drive/1gv_iUCnb301Zqh7UPI9Eq0ga4TJ6-GPn?usp=sharing
Python Libraries for Analysis and Visualization

Library        Purpose                                  Key Functionality

NumPy          Numerical computing, arrays, matrices    np.array(), np.mean(), np.std()
Pandas         Data handling & analysis                 pd.DataFrame(), df.describe(), df.groupby()
Seaborn        Statistical data visualization           sns.scatterplot(), sns.histplot()
Scikit-Learn   Machine learning & model evaluation      train_test_split(), LinearRegression()
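
A brief sketch of the NumPy and Pandas calls named in the table, applied to made-up numbers. The column names region and sales are assumptions for the example.

```python
import numpy as np
import pandas as pd

# NumPy: vectorized statistics on an array.
scores = np.array([70, 80, 90, 100])
print("mean:", np.mean(scores))
print("std: ", np.std(scores))  # population standard deviation by default

# Pandas: tabular data, summary statistics, and group-wise aggregation.
df = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "sales": [100, 150, 200, 250],
})
print(df.describe())

by_region = df.groupby("region")["sales"].mean()
print(by_region)
```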
Invoke APIs and Web Services

• APIs (Application Programming Interfaces) and Web Services allow software
applications to communicate with each other over a network.

• In Exploratory Data Analysis (EDA), they are often used to access, retrieve, or send data
between different systems or platforms for analysis and visualization.

1. API (Application Programming Interface):

• An API is a set of rules and protocols that allows one software application to interact with another.

• In data analysis, APIs are used to fetch data from online sources, databases, or other systems.

Examples:

1. Weather APIs to fetch weather data.

2. Financial APIs to retrieve stock prices or economic indicators.

Invoke APIs and Web Services

2. Web Services:
• A type of API that operates over a network (commonly the internet) to enable communication
between different systems.

• Web services typically use standard protocols like HTTP/HTTPS to send and receive data.

• Formats: Most web services provide data in structured formats like JSON or XML, which are
easy to process in Python.
Invoke APIs and Web Services

3. Invoking APIs/Web Services:

• Invoking means sending a request to the API endpoint (a URL) and receiving
the response (data).

• Python provides libraries like requests, urllib, and others to simplify this
process.
Invoke APIs and Web Services

Why Use APIs in EDA?

1. Access Live/Real-Time Data: APIs allow analysts to work with up-to-date datasets
from external services (e.g., social media platforms, financial systems, or weather
services).

2. Automation: Automating data retrieval through APIs saves time compared to
manual data collection.

3. Diverse Data Sources: APIs make it easy to combine multiple data sources into a single
analysis, enriching the EDA process.
Invoke APIs and Web Services

Python Libraries for APIs:

1. requests: Used to send HTTP requests to APIs and receive responses.

https://colab.research.google.com/drive/1-B7tpWiHUg15tOTGmT2UWV7ZPFt53bga?usp=sharing

2. json: Used to parse JSON responses from APIs.

3. urllib: The standard-library module for accessing web services; lower-level and less
user-friendly than requests.
Invoke APIs and Web Services
import requests

# Step 1: Define the API endpoint
url = "https://jsonplaceholder.typicode.com/posts/1"

# Step 2: Send a GET request to that URL
response = requests.get(url)

# Step 3: Print the status and the response content
print("Status Code:", response.status_code)
print("Response JSON:", response.json())
Invoke APIs and Web Services
OUTPUT
Status Code: 200
Response JSON: {'userId': 1, 'id': 1, 'title': 'sunt aut facere repellat provident occaecati excepturi optio reprehenderit', 'body': 'quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto'}
Invoke APIs and Web Services
Status Code: 200
• This means the request was successful.

response.json() gives you a dictionary containing the keys userId, id, title, and body.

Invoke APIs and Web Services
• If API works – normal output
Invoke APIs and Web Services
❌ If There’s a Problem

Error Type         Example Trigger                  What You’ll See

HTTPError          Invalid URL or page not found    "HTTP Error: 404 Client Error: Not Found"
ConnectionError    No internet / API server down    "Connection Error: ..."
Timeout            Slow or unresponsive API         "Timeout Error: ..."
RequestException   Catch-all for anything else      "Something went wrong: ..."
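
The error types in the table correspond to exception classes in requests.exceptions. A minimal sketch of how they might be handled is given below. The helper name fetch_json and the 5-second timeout are assumptions for illustration, not part of the original slides.

```python
import requests

def fetch_json(url, timeout=5):
    """Return the parsed JSON body, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()  # turn 4xx/5xx status codes into HTTPError
        return response.json()
    except requests.exceptions.HTTPError as err:
        print("HTTP Error:", err)
    except requests.exceptions.ConnectionError as err:
        print("Connection Error:", err)
    except requests.exceptions.Timeout as err:
        print("Timeout Error:", err)
    except requests.exceptions.RequestException as err:
        # Catch-all: every exception raised by requests inherits from this class.
        print("Something went wrong:", err)
    return None
```

The specific exceptions are listed before RequestException because they are its subclasses; putting the catch-all first would hide them. Calling fetch_json with the jsonplaceholder URL used earlier should return the post as a dictionary when the service is reachable.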

Invoke APIs and Web Services

Example 2: Invoking an API in Python

Let’s fetch weather data from an example API:

import requests

# API endpoint and parameters
url = "http://api.weatherapi.com/v1/current.json"
params = {
    "key": "YOUR_API_KEY",  # Replace with your API key
    "q": "New York",        # Location
    "aqi": "no",            # Air Quality Index (optional)
}

# Send GET request to the API
response = requests.get(url, params=params)

# Parse the JSON response
if response.status_code == 200:
    data = response.json()
    print("Location:", data['location']['name'])
    print("Temperature (C):", data['current']['temp_c'])
    print("Condition:", data['current']['condition']['text'])
else:
    print("Failed to fetch data. Status Code:", response.status_code)
Invoke APIs and Web Services

How Do APIs Work Here?

1. The client sends a request to the server using a specific endpoint.
2. The server processes the request and returns a response in JSON or XML format.
3. The client processes the response for further analysis.

Anatomy of an API Request

• Endpoint: The URL to which a request is sent (e.g., https://api.example.com/data).
• Request Methods:
  • GET: Fetch data.
  • POST: Send new data.
  • PUT: Update existing data.
  • DELETE: Remove data.
• Response: The server returns data in JSON, XML, or plain text format.
Invoke APIs and Web Services

Working with APIs in Python

Objective: How to interact with APIs using Python libraries.

• Python Libraries for APIs: Requests
  • A popular library for making HTTP requests.
  • Install using: pip install requests

• Sending API Requests
  • Syntax: requests.get(endpoint), requests.post(endpoint, data).
  • Parse JSON responses using .json().

• Common API Response Codes
  • 200: Success.
  • 404: Resource not found.
  • 500: Internal server error.
Invoke APIs and Web Services

Practical Example in EDA:

1. Fetch data using APIs (e.g., stock prices).

2. Analyze data trends (e.g., using Pandas and NumPy).

3. Visualize trends (e.g., using Matplotlib, Seaborn).

Integrating APIs and web services with EDA techniques allows analysts to work
efficiently with dynamic and diverse datasets.
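
The three steps can be sketched end to end with Pandas. To keep the example runnable offline, a hard-coded list stands in for the value of response.json(). The field names date and close and all prices are made up, and a real API would define its own schema.

```python
import pandas as pd

# Stand-in for response.json(); a real API would return its own field names.
payload = [
    {"date": "2024-01-01", "close": 100.0},
    {"date": "2024-01-02", "close": 102.0},
    {"date": "2024-01-03", "close": 101.0},
    {"date": "2024-01-04", "close": 105.0},
    {"date": "2024-01-05", "close": 107.0},
]

# 1. Fetch (simulated above) and load the records into a DataFrame.
df = pd.DataFrame(payload)
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")

# 2. Analyze: daily percentage change and a 3-day moving average.
df["pct_change"] = df["close"].pct_change() * 100
df["ma_3"] = df["close"].rolling(window=3).mean()
print(df)

# 3. Visualize: df["close"].plot() would draw the trend if Matplotlib is installed.
```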
Python libraries for Analysis

Applications in EDA

• Cleaning real-world messy data for analysis.

• Extracting key metrics and patterns through grouping and aggregation.

• Transforming data to make it analysis-ready.

Visualize using Python: Matplotlib, Seaborn, and Folium.

Video Reference: https://www.youtube.com/watch?v=OOLlVlleaN4

GitHub Reference: https://github.com/oladapo-joseph/Automobile_Sales_Analysis
Visualize using Python: Matplotlib, Seaborn, and Folium.

1. Matplotlib: Basic Plotting

• Matplotlib is a foundational visualization library for creating static and interactive
plots.
Visualize using Python: Matplotlib, Seaborn, and Folium.

2. Seaborn: Statistical Visualization

• Seaborn builds on Matplotlib and is great for creating complex statistical graphics.
Visualize using Python: Matplotlib, Seaborn, and Folium.

3. Folium: Interactive Maps

• Folium is perfect for creating interactive maps with markers and other features.
Visualize using Python: Matplotlib, Seaborn, and Folium.
ICT Teaching
• Experiential Learning:
https://docs.google.com/document/u/3/d/19I906QSWwJqziroItmA1uTcb3xEjgjVq/edit?usp=drive_web&ouid=107372573615082269577&rtpof=true

• Problem Solving:
https://docs.google.com/document/u/3/d/16ZDBRZvOQAegQ8dqZdm-lEG4o7MVmAvD/edit?usp=drive_web&ouid=107372573615082269577&rtpof=true
