EDA Exploratory Data Analysis (1)
• Exploratory Data Analysis (EDA) is the process of
studying and exploring data sets to understand their
main characteristics, discover patterns, locate outliers,
and identify relationships between variables. EDA is
usually carried out as a preliminary step before
more formal statistical analysis or
modeling.
Goals of EDA
• Data Cleaning
• Descriptive Statistics
• Data Visualization
• Feature Engineering
• Correlation
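A few of the goals listed above (data cleaning, descriptive statistics, and correlation) can be sketched with the Python standard library alone. This is a minimal illustration, not a prescribed workflow: the helper names `describe` and `pearson` and the toy `hours`/`scores` data are hypothetical, and missing values are assumed to be encoded as `None`.

```python
import math
import statistics


def describe(values):
    """Return basic descriptive statistics, ignoring missing (None) entries."""
    clean = [v for v in values if v is not None]  # minimal cleaning step
    return {
        "count": len(clean),
        "missing": len(values) - len(clean),
        "mean": statistics.mean(clean),
        "median": statistics.median(clean),
        "stdev": statistics.stdev(clean),  # sample standard deviation
        "min": min(clean),
        "max": max(clean),
    }


def pearson(xs, ys):
    """Pearson correlation over the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mean_x = statistics.mean([x for x, _ in pairs])
    mean_y = statistics.mean([y for _, y in pairs])
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x, _ in pairs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for _, y in pairs))
    return cov / (spread_x * spread_y)


# Hypothetical toy data: one missing value in `hours`.
hours = [1, 2, 3, None, 5]
scores = [52, 54, 56, 60, 60]

summary = describe(hours)
r = pearson(hours, scores)
print(summary)
print(round(r, 3))
```

In practice these steps are often done with a data-frame library; the plain-Python version is used here only to keep the arithmetic visible.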
Types of EDA
1. Univariate Analysis: This type of analysis focuses on
individual variables in the data set. It involves
summarizing and visualizing a single
variable at a time to understand its distribution, central
tendency, spread, and other relevant statistics. Techniques
like histograms, box plots, bar charts, and summary statistics
are commonly used in univariate analysis.
2. Bivariate Analysis: Bivariate analysis explores
the relationship between two variables. It helps find associations,
correlations, and dependencies between pairs of variables.
Scatter plots, line plots, correlation matrices, and
cross-tabulation are commonly used techniques in bivariate analysis.
3. Multivariate Analysis: Multivariate analysis extends bivariate
analysis to more than two variables. It aims to
understand the complex interactions and dependencies among
multiple variables in a data set. Techniques such as
heatmaps, parallel coordinates, factor analysis, and principal
component analysis (PCA) are used for multivariate analysis.
4. Time Series Analysis: This type of analysis is mainly applied to
data sets that have a temporal component. Time series
analysis involves examining and modeling patterns, trends, and
seasonality in the data over time. Techniques like
line plots, autocorrelation analysis, moving averages, and ARIMA
(AutoRegressive Integrated Moving Average) models are commonly
used in time series analysis.
5. Missing Data Analysis: Missing data is a common
issue in data sets, and it can affect the reliability and validity of the
analysis. Missing data analysis involves identifying missing
values, understanding the patterns of missingness, and using suitable
techniques to handle the missing data. Techniques such as missing
data pattern analysis, imputation methods, and sensitivity analysis are
employed in missing data analysis.
6. Outlier Analysis: Outliers are data points that deviate
significantly from the overall pattern of the data. Outlier analysis
involves identifying and understanding the presence of outliers, their
potential causes, and their impact on the analysis. Techniques
such as box plots, scatter plots, z-scores, and clustering
algorithms are used for outlier analysis.
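The last two analysis types in the list lend themselves to a short sketch: counting missing values per column, and flagging outliers with z-scores and with the interquartile-range rule that underlies box-plot whiskers. The function names, the `None` encoding for missing values, the thresholds, and the toy data are all assumptions made for illustration.

```python
import statistics


def missing_report(rows):
    """Count missing (None) values per column in a list of dict rows."""
    columns = rows[0].keys()
    return {col: sum(1 for row in rows if row[col] is None) for col in columns}


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute z-score exceeds the threshold."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        return []  # all values identical, so nothing can be flagged
    return [v for v in values if abs(v - mean) / sd > threshold]


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (the box-plot rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical toy data.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 48000},
    {"age": 29, "income": None},
    {"age": None, "income": 61000},
]
readings = [10, 12, 11, 13, 12, 11, 10, 95]

missing = missing_report(rows)
# The sample is small, so a lower threshold than the common 3.0 is used here.
by_zscore = zscore_outliers(readings, threshold=2.0)
by_iqr = iqr_outliers(readings)
print(missing, by_zscore, by_iqr)
```

The two outlier rules can disagree: a single extreme value inflates the mean and standard deviation that the z-score depends on, while the quartile-based rule is less affected by it. Checking both is a reasonable first pass before deciding how to treat a flagged point.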