
Ai ML Exp2

The document discusses exploratory data analysis (EDA) of healthcare data. It describes five objectives of EDA: identifying data quality issues; understanding data distribution; exploring relationships; visualizing trends and patterns; and generating hypotheses. The key goals of EDA are outlined as data cleaning, descriptive statistics, data visualization, feature engineering, correlation analysis, data segmentation, and hypothesis generation. The document also differentiates between univariate EDA, which focuses on single variables, and multivariate EDA, which analyzes relationships between multiple variables. Overall, the document provides an overview of EDA techniques and their application to healthcare data.

Uploaded by

Kamat Hrishikesh
Copyright © All Rights Reserved

EXPERIMENT NO-2

Aim: Perform Exploratory data analysis of Healthcare Data.

Objectives:
Exploratory Data Analysis (EDA) is a critical step in understanding and deriving insights from
healthcare data. Here are five objectives for performing EDA on healthcare data:
• Identify Data Quality Issues: The first objective is to assess the quality of the healthcare
data. This includes checking for missing values, outliers, and inconsistencies, checks that are
crucial for data integrity and reliable analysis.
• Understand Data Distribution: EDA helps in understanding the distribution of key
healthcare variables such as patient ages, diagnosis codes, and treatment outcomes. This
understanding can reveal trends and patterns within the data.
• Explore Relationships: EDA allows for the exploration of relationships between different
healthcare variables. For example, you can investigate how patient age impacts the likelihood
of specific medical conditions or treatment effectiveness.
• Visualize Trends and Patterns: EDA involves creating visualizations like histograms,
scatter plots, and box plots to highlight trends and patterns within the data. This helps in
making complex healthcare data more interpretable.
• Hypothesis Generation: EDA can lead to the generation of hypotheses for more focused
research. For instance, you may identify associations between certain patient characteristics
and health outcomes, leading to targeted investigations and studies in the healthcare domain.
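The first objective (checking for missing values, duplicates, and inconsistencies) can be illustrated with a minimal plain-Python sketch. It assumes patient records held as a list of dictionaries with `None` marking a missing value; the records, column names, and the 0 to 120 plausibility range for age are hypothetical choices made only for this example. In pandas, `DataFrame.isna().sum()` and `DataFrame.duplicated()` cover the first two checks.

```python
from collections import Counter

# Hypothetical toy records (made-up values, not from the original dataset).
# None marks a missing value.
records = [
    {"patient_id": 1, "age": 50, "glucose": 148, "bmi": 33.6},
    {"patient_id": 2, "age": 31, "glucose": None, "bmi": 26.6},
    {"patient_id": 3, "age": 32, "glucose": 183, "bmi": None},
    {"patient_id": 3, "age": 32, "glucose": 183, "bmi": None},  # exact duplicate
    {"patient_id": 4, "age": -5, "glucose": 89, "bmi": 28.1},   # impossible age
]


def missing_counts(rows):
    """Count missing (None) values in each column."""
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None:
                counts[column] += 1
    return dict(counts)


def duplicate_rows(rows):
    """Return every row that exactly repeats an earlier row."""
    seen, duplicates = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint, independent of key order
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
    return duplicates


def out_of_range(rows, column, low, high):
    """Return rows whose non-missing value in `column` lies outside [low, high]."""
    return [r for r in rows if r[column] is not None and not low <= r[column] <= high]


print("Missing values per column:", missing_counts(records))
print("Duplicate rows:", len(duplicate_rows(records)))
# The 0-120 plausibility range for age is an assumption chosen for illustration.
print("Implausible ages:", out_of_range(records, "age", 0, 120))
```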

Theory:

Exploratory Data Analysis (EDA): Exploratory Data Analysis is an approach to analyzing data
sets to summarize their main characteristics, often with the help of graphical representations. EDA
is used to gain a better understanding of the data, detect patterns, anomalies, and relationships, and
to inform subsequent data analysis. EDA is an essential step before conducting more advanced
statistical or machine learning analyses.

• The Main Goals of EDA


1. Data Cleaning: EDA involves examining the data for errors, missing values, and
inconsistencies. It includes techniques such as data imputation, handling missing values,
and identifying and removing outliers.

2. Descriptive Statistics: EDA uses descriptive statistics to understand the central tendency,
variability, and distribution of variables. Measures such as the mean, median, mode, standard
deviation, range, and percentiles are commonly used.

3. Data Visualization: EDA employs visual techniques to represent the data graphically.
Visualizations such as histograms, box plots, scatter plots, line plots, heatmaps, and bar
charts help identify patterns, trends, and relationships within the data.

4. Feature Engineering: EDA allows for the exploration of variables and their
transformations to create new features or derive meaningful insights. Feature engineering can
include scaling, normalization, binning, encoding categorical variables, and creating interaction
or derived variables.

5. Correlation and Relationships: EDA helps discover relationships and dependencies between
variables. Techniques such as correlation analysis, scatter plots, and cross-tabulations offer
insights into the strength and direction of relationships between variables.

6. Data Segmentation: EDA can involve dividing the data into meaningful segments based on
certain criteria or characteristics. This segmentation helps gain insights into specific
subgroups within the data and can lead to more focused analysis.

7. Hypothesis Generation: EDA aids in generating hypotheses or research questions based on
the preliminary exploration of the data. It helps form the foundation for further analysis
and model building.

8. Data Quality Assessment: EDA allows for assessing the quality and reliability of the
data. It involves checking data integrity, consistency, and accuracy to make sure the data
is suitable for analysis.
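Several of the goals above (imputation during data cleaning, and scaling, binning, and encoding during feature engineering) can be shown on a toy column. This is a minimal standard-library sketch, not the experiment's own code; the glucose values, the `sex` column, and the 100/140 cut points are made up for illustration and carry no clinical meaning. Median imputation is used here as one common choice; mean or model-based imputation are alternatives.

```python
import bisect
import statistics

# Hypothetical toy columns (made-up values); None marks a missing reading.
glucose = [148, None, 183, 89, 137, None, 116]
sex = ["F", "M", "F", "F", "M", "M", "F"]


def impute_median(values):
    """Data cleaning: replace missing (None) entries with the median of observed ones."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Normalization: rescale values linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def bin_values(values, cut_points, labels):
    """Binning: label each value by the interval it falls in.

    cut_points must be sorted and len(labels) == len(cut_points) + 1;
    a value equal to a cut point goes into the upper bin.
    """
    return [labels[bisect.bisect_right(cut_points, v)] for v in values]


def one_hot(categories):
    """Encoding: turn a categorical column into 0/1 indicator columns."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories], levels


filled = impute_median(glucose)
scaled = min_max_scale(filled)
# The 100/140 cut points are arbitrary illustration values, not clinical thresholds.
glucose_band = bin_values(filled, [100, 140], ["low", "medium", "high"])
sex_encoded, sex_levels = one_hot(sex)

print(filled)
print([round(s, 2) for s in scaled])
print(glucose_band)
print(sex_levels, sex_encoded)
```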

• TYPES OF EDA

1. Univariate Exploratory Data Analysis (EDA): Univariate EDA focuses on the analysis of a
single variable at a time. Its primary goal is to understand and summarize the characteristics
of individual variables, typically using descriptive statistics and visualizations. Univariate
EDA can be further broken down into two main types:

• Descriptive Statistics: This type of univariate EDA involves calculating and examining
summary statistics for a single variable. Common statistics include mean, median, mode,
range, variance, standard deviation, and percentiles. Descriptive statistics provide an overview
of the central tendency, spread, and shape of the variable's distribution.
  ◦ Example: Calculating the mean and standard deviation of patient ages in a healthcare dataset.

• Data Visualization: Univariate EDA also includes creating visual representations of a single
variable's distribution. Common visualizations include histograms, box plots, bar charts, and
density plots. These visualizations help in understanding the shape, spread, and patterns
within the data.
  ◦ Example: Creating a histogram to visualize the distribution of patient ages in a healthcare
dataset.
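Both univariate steps can be sketched for a single variable with Python's `statistics` module. The ages below are made-up toy values, and the text histogram stands in for a plotted one; with pandas and matplotlib the usual equivalents would be `Series.describe()` and `plt.hist()`.

```python
import statistics
from collections import Counter

# Hypothetical patient ages (made-up toy values, not from the original dataset).
ages = [21, 25, 29, 33, 33, 38, 41, 45, 50, 52, 58, 63]

# Descriptive statistics: central tendency, spread, and shape of one variable.
summary = {
    "count": len(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "mode": statistics.mode(ages),
    "std_dev": statistics.stdev(ages),  # sample standard deviation
    "range": max(ages) - min(ages),
    "quartiles": statistics.quantiles(ages, n=4, method="inclusive"),
}


def histogram_counts(values, bin_width):
    """Count values per fixed-width bin, keyed by each bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


BIN_WIDTH = 10
for name, value in summary.items():
    print(f"{name}: {value}")
# Text histogram: one '#' per patient in each 10-year age band.
for lower, count in histogram_counts(ages, BIN_WIDTH).items():
    print(f"{lower}-{lower + BIN_WIDTH - 1}: {'#' * count}")
```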

2. Multivariate Exploratory Data Analysis (EDA): Multivariate EDA focuses on the
simultaneous analysis of relationships between multiple variables in a dataset. It aims to
uncover patterns, dependencies, and interactions between variables. Multivariate EDA can be
categorized into several types:

• Scatterplots: Scatterplots are used to visualize the relationship between two continuous
variables. They help identify correlations, trends, and outliers.
  ◦ Example: Creating a scatterplot to explore the relationship between patient age and
cholesterol levels in a healthcare dataset.

• Correlation Analysis: Correlation analysis quantifies the strength and direction of the
relationship between pairs of continuous variables. Common correlation coefficients include
Pearson's correlation, which measures linear association, and Spearman's rank correlation,
which measures monotonic association (whether one variable tends to rise as the other rises,
not necessarily at a constant rate).
  ◦ Example: Calculating the Pearson correlation coefficient between patient weight and blood
pressure in a healthcare dataset.

• Categorical Data Analysis: Multivariate EDA also involves the analysis of categorical
variables. Techniques like contingency tables and chi-squared tests are used to examine the
relationships between categorical variables.
  ◦ Example: Analyzing the association between patient gender and the presence of specific
medical conditions in a healthcare dataset.

• Heatmaps: Heatmaps are used to visualize the relationships between multiple variables by
displaying a matrix of correlations or other measures.
  ◦ Example: Creating a heatmap to visualize correlations between various medical test results in
a healthcare dataset.
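The multivariate techniques listed above can be sketched with small hand-written functions so the arithmetic stays visible. All measurements and the (gender, condition) pairs are made-up toy data. The numbers are chosen so that blood pressure rises with weight but not at a constant rate, which gives a Spearman coefficient of 1 and a Pearson coefficient slightly below 1. The nested dictionary returned by `correlation_matrix` is the grid a heatmap would colour (for example with `seaborn.heatmap`). Library counterparts are `scipy.stats.pearsonr`, `scipy.stats.spearmanr`, and `scipy.stats.chi2_contingency`; the sketch computes only the chi-squared statistic, not a p-value, and `chi2_contingency` applies Yates' continuity correction to 2×2 tables by default, so its statistic will differ from the uncorrected value computed here.

```python
import math
from collections import Counter

# Hypothetical toy measurements for five patients (made-up values).
weight = [60, 65, 70, 80, 95]                # kg
blood_pressure = [110, 112, 118, 130, 150]   # systolic, mmHg
age = [45, 30, 50, 41, 62]


def pearson(x, y):
    """Pearson r: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rho: Pearson r computed on the ranks (measures monotonic association)."""
    return pearson(ranks(x), ranks(y))


def correlation_matrix(columns):
    """All pairwise Pearson correlations: the grid a correlation heatmap colours."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}


def chi_squared(pairs):
    """Chi-squared statistic for the contingency table of (row, column) category pairs."""
    observed = Counter(pairs)
    row_totals = Counter(r for r, _ in pairs)
    col_totals = Counter(c for _, c in pairs)
    n = len(pairs)
    stat = 0.0
    for r in row_totals:
        for c in col_totals:
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


matrix = correlation_matrix({"weight": weight, "blood_pressure": blood_pressure, "age": age})
# Made-up (gender, has_condition) pairs for illustration.
pairs = [("F", "yes")] * 15 + [("F", "no")] * 5 + [("M", "yes")] * 5 + [("M", "no")] * 15

print("Pearson r:", round(pearson(weight, blood_pressure), 3))
print("Spearman rho:", round(spearman(weight, blood_pressure), 3))
for name, row in matrix.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
print("Chi-squared statistic:", chi_squared(pairs))
```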

Univariate and multivariate EDA are both essential for understanding data and making informed
decisions. While univariate EDA provides insights into individual variables, multivariate EDA
uncovers complex relationships and interactions between variables, offering a more
comprehensive view of the data. These approaches are fundamental for data exploration,
hypothesis generation, and guiding subsequent analyses in a wide range of fields, including
healthcare, finance, and social sciences.

DIAGRAM:

CODE & OUTPUTS

• Loading the dataset and Getting Insights About the Dataset:
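The code for this step is not reproduced in this text. As a hedged stand-in, the sketch below loads a small CSV with the standard library and prints the kind of overview usually obtained in pandas with `pd.read_csv`, `DataFrame.shape`, `DataFrame.head()`, and `DataFrame.describe()`. The column names and rows are hypothetical and are not taken from the diabetes dataset used in the experiment.

```python
import csv
import io

# Hypothetical CSV contents with made-up rows. In practice the text would come
# from a file, e.g. open("diabetes.csv", newline=""); that file name is an assumption.
RAW = """Glucose,BMI,Age,Outcome
148,33.6,50,1
85,26.6,31,0
183,23.3,32,1
89,28.1,21,0
"""


def load_rows(handle):
    """Read CSV rows, converting each field to float when it parses as a number."""
    rows = []
    for record in csv.DictReader(handle):
        converted = {}
        for key, value in record.items():
            try:
                converted[key] = float(value)
            except ValueError:
                converted[key] = value  # keep non-numeric (or empty) text as-is
        rows.append(converted)
    return rows


rows = load_rows(io.StringIO(RAW))
columns = list(rows[0])
shape = (len(rows), len(columns))  # (number of rows, number of columns)

print("Shape:", shape)
print("Columns:", columns)
print("First two rows:", rows[:2])
# Per-column summary over numeric entries only.
for col in columns:
    values = [r[col] for r in rows if isinstance(r[col], float)]
    if values:
        print(f"{col}: min={min(values)}, max={max(values)}, mean={sum(values) / len(values):.2f}")
```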

• EDA and more insight into the dataset

• OUTLIERS
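A common way to flag outliers is the 1.5 × IQR rule (Tukey's fences), the same rule box plots typically use to decide which points are drawn beyond the whiskers. The sketch below assumes a single numeric column of made-up BMI readings with one planted extreme value; it illustrates the technique and is not the experiment's own outlier code.

```python
import statistics

# Hypothetical BMI readings (made-up toy values); 67.1 is planted as an extreme value.
bmi = [22.5, 24.1, 25.3, 26.6, 27.8, 28.1, 29.0, 30.4, 31.2, 33.6, 67.1]


def iqr_fences(values, k=1.5):
    """Tukey fences: Q1 - k*IQR and Q3 + k*IQR, with IQR = Q3 - Q1."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(values, k=1.5):
    """Return (values inside the fences, values outside them)."""
    low, high = iqr_fences(values, k)
    inside = [v for v in values if low <= v <= high]
    outside = [v for v in values if v < low or v > high]
    return inside, outside


low, high = iqr_fences(bmi)
kept, outliers = split_outliers(bmi)
print(f"Fences: {low:.3f} to {high:.3f}")
print("Outliers:", outliers)
print("Kept:", len(kept), "of", len(bmi))
```

Dropping flagged values is only one option; capping them at the fences or inspecting them individually may suit clinical data better, since an extreme reading can be genuine.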

CONCLUSION: In this experiment we studied how to get insights about a dataset and how
to perform EDA (Exploratory Data Analysis), including univariate EDA (histogram) and
multivariate EDA (scatterplot and heatmap), on a diabetes dataset.
