
INCORPORATING MODEL CHECKING INTO EXPLORATORY VISUAL ANALYSIS:
TRENDS, PATTERNS, AND CONSISTENCY IN DATA VISUALIZATION

A MINIPROJECT REPORT

Submitted by

JEYA ARAVINTH S (953621104019)
MUTHULAKSHMI M (953621104030)
PRAVEENA A (953621104033)
SELVARUBA S (953621104038)

BACHELOR OF ENGINEERING

IN

COMPUTER SCIENCE AND ENGINEERING

RAMCO INSTITUTE OF TECHNOLOGY,

RAJAPALAYAM

NOVEMBER 2024

BONAFIDE CERTIFICATE

Certified that this mini-project report, "Incorporating Model Checking into Exploratory
Visual Analysis: Trends, Patterns, and Consistency in Data Visualization," is the bonafide
work of Selvaruba S (953621104038), who carried out the mini-project work under my
supervision.

SIGNATURE
Mrs. S. Vijaya Amala Devi, B.E., M.E.
Faculty In-charge
Assistant Professor
Department of Computer Science and Engineering
Ramco Institute of Technology
North Venganallur Village
Rajapalayam – 626117

SIGNATURE
Dr. K. Vijayalakshmi, M.E., Ph.D.
HEAD OF THE DEPARTMENT
Department of CSE
Ramco Institute of Technology
North Venganallur Village
Rajapalayam – 626117

INTERNAL EXAMINER EXTERNAL EXAMINER

ABSTRACT

Exploratory Visual Analysis (EVA) is an essential technique in data visualization that allows
analysts to investigate datasets interactively, form hypotheses, and uncover insights.
However, current EVA systems often lack rigorous mechanisms to ensure the validity of
findings, leading to potential biases or misinterpretations. This paper presents EVM
(Exploratory Visual Model-checking), a novel framework that incorporates model checking
techniques into EVA to improve analytical reliability. By integrating formal verification
methods, EVM enables users to validate hypotheses directly within the visualization
environment, detecting inconsistencies or unsupported patterns in real time. EVM supports a
wide range of data types and visual analytics tasks, enhancing the accuracy and dependability
of insights derived from visual analysis. Case studies and experimental results demonstrate
that EVM can help analysts mitigate cognitive biases, avoid common logical pitfalls, and
refine hypotheses more effectively than traditional EVA methods. This paper highlights
EVM’s potential to bridge the gap between exploratory data analysis and rigorous model
verification, setting a foundation for future advancements in reliable visual analytics.

As data-driven decision-making becomes increasingly critical across domains, analysts often
rely on Exploratory Visual Analysis (EVA) to identify trends, generate hypotheses, and gain
insights from complex datasets. Despite its value, EVA has inherent limitations, primarily
due to the potential for cognitive biases, subjective interpretation, and a lack of formal
validation tools, which can lead to unreliable conclusions. This paper introduces EVM
(Exploratory Visual Model-checking), a framework designed to incorporate model checking
—a method traditionally used for validating system behaviors—into the EVA process.

EVM enhances traditional EVA by integrating automated model checking to validate user
hypotheses and uncover logical inconsistencies within the data. The system leverages formal
verification techniques to support hypothesis testing in real-time, automatically flagging
possible issues when exploratory insights contradict the underlying data model.

TABLE OF CONTENTS

1. INTRODUCTION
   1.1 Trends, Patterns, and Consistency in Data Visualization
   1.2 Project Objective and Scope
   1.3 Project Specification
2. SYSTEM SPECIFICATION
   2.1 Hardware Specification
   2.2 Software Specification
3. PACKAGES
   3.1 Seaborn
   3.2 Pandas
   3.3 Matplotlib
       3.3.1 Matplotlib Bar Plot
       3.3.2 Matplotlib Histogram
4. APPENDIX
   4.1 Source Code
   4.2 Screenshots
5. CONCLUSION
6. FUTURE WORK
7. REFERENCES

1. INTRODUCTION

1.1 TRENDS, PATTERNS, AND CONSISTENCY IN DATA VISUALIZATION

Incorporating model checking into exploratory visual analysis represents a significant
advancement in the way complex datasets are analyzed and understood. Traditionally,
exploratory data analysis (EDA) relies heavily on human intuition and visual inspection to
identify patterns and trends. However, these methods can be subjective and prone to
overlooking subtle inconsistencies or errors in the data. The integration of model checking
with data visualization enhances the rigor of this process, ensuring that the identified trends
and patterns align with the expected behavior of the data.

This research aims to demonstrate how model checking techniques, often used in formal
verification of software systems, can be effectively applied to the realm of data visualization.
By introducing a structured approach to verifying the consistency and validity of visual
representations, the study helps ensure that the insights drawn from the data are reliable and
not artifacts of flawed analysis. This is particularly crucial when working with large, complex
datasets, where simple visual tools might miss underlying errors or inconsistencies.

The primary focus of this paper is to explore how these two fields—model checking and
exploratory visual analysis—can be integrated to improve the robustness of data
interpretation, enabling more confident decision-making in data-driven domains such as
healthcare, finance, and engineering.

This integration begins with identifying and validating trends in data, such as recurring
movements or sustained directional changes. Automated detection methods, like moving
averages and regression analysis, allow for a systematic approach to observing trends, while
model checking ensures that these detected trends are consistent over time and free from
unexpected deviations. Similarly, patterns within the data, such as cyclic or seasonal
behaviors, can be more rigorously validated through model checking, with specifications in
place to verify that these cycles align with anticipated intervals. In complex datasets, simple
visualizations might miss these nuances, but model checking helps to confirm that the
identified patterns are reliable and representative of actual data behavior rather than random
variations or artifacts.

1. Data Preprocessing and Cleaning Module

This module prepares the dataset for analysis by handling missing values, outliers, and
inconsistent data entries. It includes steps like imputing missing values, filtering noise, and
standardizing formats, ensuring that the data meets quality standards before further analysis.
Effective preprocessing minimizes the risk of distorted insights in visual analysis due to data
anomalies.

Key Steps:

 Imputation of Missing Values: Techniques such as mean imputation, median imputation, or using machine learning models to predict missing data values.
 Outlier Detection and Removal: Identifying outliers using statistical methods like Z-scores or IQR, and either removing or replacing them.
 Noise Filtering: Smoothing data by applying methods like moving averages or Gaussian filters.
 Standardization: Converting all data into a common format or scale (e.g., converting all dates to a single format or normalizing numeric values).

Tools:

 Pandas (Python): Data manipulation and preprocessing.
 Scikit-learn (Python): Imputation and preprocessing utilities.
 R: Outlier detection and data cleaning techniques.
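A minimal sketch of this module in Python is given below. It assumes the economic_indicators.csv file and the inflation and date columns used later in the appendix; the median imputation strategy and the IQR fences are illustrative choices rather than fixed requirements.

import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.read_csv("economic_indicators.csv")

# Impute missing numeric values with the column median
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = SimpleImputer(strategy="median").fit_transform(df[num_cols])

# Detect outliers in inflation with the IQR rule and clip them to the fences
q1, q3 = df["inflation"].quantile([0.25, 0.75])
iqr = q3 - q1
df["inflation"] = df["inflation"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# Standardize dates to a single format
df["date"] = pd.to_datetime(df["date"], errors="coerce")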

2. Trend Detection and Validation Module

This module employs statistical methods such as moving averages, linear and polynomial
regression, and time-series decomposition to identify trends within the dataset. The trends are
then validated through model checking, which confirms that they align with expected
behaviors and are free from unexpected deviations. This module ensures that trends are not
artifacts but are reliable indicators of the data's direction.

Key Steps:

 Moving Averages: Identifying overall trends by smoothing out short-term fluctuations.
 Linear/Polynomial Regression: Identifying the direction (positive or negative) of trends.
 Time-Series Decomposition: Breaking down data into seasonal, trend, and residual components.

Tools:

 Statsmodels (Python): For time-series decomposition and regression.
 SciPy/NumPy: For moving averages and statistical tests.
 Temporal Logic Tools: To verify temporal consistency in time-series data.
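The sketch below shows one possible form of such a trend check in Python, again assuming the appendix dataset and its date and gdp_growth columns; the 7-period window and the sign-agreement rule are illustrative assumptions.

import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("economic_indicators.csv", parse_dates=["date"])
series = df.groupby("date")["gdp_growth"].mean()

# Moving average to smooth out short-term fluctuations
rolling = series.rolling(window=7, min_periods=1).mean()

# Linear regression (OLS) to estimate the overall trend direction
X = sm.add_constant(np.arange(len(series)))
slope = sm.OLS(series.values, X).fit().params[1]

# Consistency check: the regression slope and the moving-average drift
# should agree in sign if the detected trend is genuine
drift = rolling.iloc[-1] - rolling.iloc[0]
print("Trend consistent:", np.sign(slope) == np.sign(drift))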

3. Pattern Recognition and Cyclic Analysis Module

Focused on identifying repeating patterns, such as cyclic or seasonal behaviors, this module
uses Fourier analysis and seasonal decomposition techniques. Model checking within this
module verifies the consistency and expected intervals of these patterns, ensuring they
represent true cyclic behavior rather than random variations.

Key Steps:

 Fourier Analysis: Identifying periodic patterns in the data by transforming the data
into the frequency domain.
 Seasonal Decomposition: Breaking the time-series data into components that
represent trend, seasonal, and residual variations.
 Cyclic Pattern Identification: Using statistical methods or machine learning
algorithms to identify cyclic patterns (e.g., daily, monthly, yearly).

Tools:

 SciPy (Python): Fourier and signal processing.
 Statsmodels (Python): Seasonal decomposition.
 Seasonal-trend decomposition (e.g., STL): Separating seasonal effects from other data components.
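A short sketch of these steps, assuming the appendix dataset, a regularly sampled unemployment_rate series, and an assumed seasonal period of 12 observations:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.fft import rfft, rfftfreq
from statsmodels.tsa.seasonal import seasonal_decompose

df = pd.read_csv("economic_indicators.csv", parse_dates=["date"])
series = df.groupby("date")["unemployment_rate"].mean()

# Fourier analysis: the strongest non-zero frequency suggests the dominant cycle
values = series.values - series.values.mean()
spectrum = np.abs(rfft(values))
freqs = rfftfreq(len(values), d=1)
dominant = freqs[np.argmax(spectrum[1:]) + 1]
print("Dominant cycle length (periods):", round(1 / dominant, 1))

# Seasonal decomposition into trend, seasonal, and residual components
result = seasonal_decompose(series, model="additive", period=12)
result.plot()
plt.show()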

4. Consistency Verification Module

This module cross-validates insights across different types of visualizations, such as
histograms, line charts, and heatmaps, to ensure consistency in interpretation. Model
checking frameworks within this module ensure that the core findings remain consistent,
regardless of visualization style, which is essential for complex datasets that might reveal
different aspects in various visual formats.

Key Steps:

 Visualization Comparison: Comparing trends and patterns identified in different visualizations, such as line charts, bar graphs, and heatmaps.
 Cross-validation: Verifying that insights derived from one visualization type hold true when represented in other forms.

Tools:

 Matplotlib/Seaborn (Python): For generating various types of visualizations.
 Tableau/Power BI: For interactive visual exploration and consistency checking.
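One way such a cross-check could be expressed in Python, assuming the appendix dataset and its country_id, date, and inflation columns:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("economic_indicators.csv", parse_dates=["date"])

# View 1: bar chart of mean inflation per country
per_country = df.groupby("country_id")["inflation"].mean().sort_values()

# View 2: heatmap of mean inflation by country and year
pivot = df.assign(year=df["date"].dt.year).pivot_table(
    index="country_id", columns="year", values="inflation", aggfunc="mean")

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
per_country.plot(kind="bar", ax=axes[0], title="Mean inflation per country")
sns.heatmap(pivot, ax=axes[1], cmap="viridis")
axes[1].set_title("Inflation by country and year")
plt.tight_layout()
plt.show()

# Cross-validation: the country ranking implied by the heatmap rows
# should match the ranking shown in the bar chart
print("Rankings agree:", list(pivot.mean(axis=1).sort_values().index) == list(per_country.index))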

5. Anomaly Detection and Correction Module

This module uses model checking to detect anomalies, such as outliers or unexpected shifts,
that could affect the reliability of trends and patterns. Anomalies are flagged and either
corrected or documented to maintain the dataset's integrity. This is especially crucial in large
datasets where subtle inconsistencies could significantly impact interpretation.

Key Steps:

 Outlier Detection: Using statistical methods such as Z-scores, IQR, or DBSCAN (density-based clustering) to detect outliers.
 Anomaly Correction: Either removing or imputing values that do not fit within expected ranges or behaviors.

Tools:

 Scikit-learn: For outlier and anomaly detection.
 Pandas: For data manipulation and correction of anomalies.
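A brief sketch using a Z-score rule on the appendix dataset; the threshold of 3 and the median replacement are illustrative choices.

import pandas as pd

df = pd.read_csv("economic_indicators.csv")

# Flag values more than 3 standard deviations from the mean
z = (df["gdp_growth"] - df["gdp_growth"].mean()) / df["gdp_growth"].std()
outliers = z.abs() > 3

# Correct: replace flagged values with the median, and document the change
print(int(outliers.sum()), "anomalous gdp_growth values replaced with the median")
df.loc[outliers, "gdp_growth"] = df["gdp_growth"].median()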
6. Visualization and Model Checking Interface Module

Integrating model checking with visualization tools, this module provides an interface for
users to visually inspect data while receiving automated validation feedback on identified
trends and patterns. By linking model checking algorithms to visual feedback, this module
enhances the user’s ability to explore data while ensuring that each insight meets predefined
accuracy standards.

Key Steps:

 User Interface: Allows users to interact dynamically with visualizations (e.g., zoom,
filter, adjust).
 Real-time Feedback: Model checking algorithms run in the background, providing
validation and feedback on the data’s accuracy and trends.

Tools:

 Dash/Streamlit: For creating interactive dashboards.
 D3.js: For interactive web-based data visualizations.
 Jupyter Notebooks: For combining interactive visualizations with code execution.
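As a sketch, a small Streamlit app (run with "streamlit run app.py") could pair an interactive chart with a background check; the file name, columns, and the two-standard-deviation rule are assumptions, not part of the framework itself.

import pandas as pd
import streamlit as st

df = pd.read_csv("economic_indicators.csv", parse_dates=["date"])

country = st.selectbox("Country", sorted(df["country_id"].unique()))
window = st.slider("Rolling window (periods)", 3, 30, 7)

subset = df[df["country_id"] == country].sort_values("date").copy()
subset["trend"] = subset["gdp_growth"].rolling(window, min_periods=1).mean()
st.line_chart(subset.set_index("date")[["gdp_growth", "trend"]])

# Background check: warn when observations diverge strongly from the smoothed trend
deviation = (subset["gdp_growth"] - subset["trend"]).abs().max()
if deviation > 2 * subset["gdp_growth"].std():
    st.warning(f"Large deviation from the smoothed trend ({deviation:.2f})")
else:
    st.success("Observations are consistent with the smoothed trend")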

7. Reporting and Documentation Module

After completing the visual analysis, this module generates reports documenting verified
trends, patterns, and any anomalies. Model checking outcomes are logged to provide
transparency on data consistency and accuracy, helping stakeholders trust the findings and
supporting further data-driven decision-making.

Key Steps:

 Report Generation: Producing comprehensive reports with charts, tables, and findings from the analysis.
 Documentation of Model Checking: Including model checking results to show the validity of the analysis.

Tools:

 Jupyter Notebooks: For dynamic report generation.
 ReportLab: For generating PDF reports.
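A minimal ReportLab sketch that writes a one-page PDF summary; the findings list is a placeholder standing in for the outputs of the earlier modules.

from reportlab.lib.pagesizes import A4
from reportlab.pdfgen import canvas

findings = [
    "Trend check: regression slope agrees with the moving-average drift",
    "Pattern check: assumed 12-period seasonality confirmed",
    "Anomaly check: 3 outliers replaced with the column median",
]

c = canvas.Canvas("validation_report.pdf", pagesize=A4)
c.setFont("Helvetica-Bold", 14)
c.drawString(72, 800, "Model Checking Summary")
c.setFont("Helvetica", 11)
for i, line in enumerate(findings):
    c.drawString(72, 770 - 18 * i, "- " + line)
c.save()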

8. Hypothesis Generation and Testing Module

This module assists analysts in formulating and testing hypotheses based on the visualized
data. Using statistical testing methods, such as chi-square tests for categorical data or t-tests
for continuous data, the module helps to verify or refute assumptions about trends or patterns
in the data. By incorporating model checking, the module ensures that the results of
hypothesis testing are consistent and align with underlying data characteristics, reducing the
likelihood of drawing incorrect conclusions from visual insights.

Key Steps:

 Formulation of Hypotheses: Analysts propose hypotheses based on observed trends or patterns.
 Statistical Testing: Testing hypotheses using methods like t-tests, ANOVA, chi-square tests, etc.
 Model Checking: Ensuring the hypothesis testing results align with the expected model or behaviors.

Tools:

 SciPy/Statsmodels: For hypothesis testing.
 R: For statistical analysis and hypothesis testing.
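The sketch below tests one illustrative hypothesis on the appendix dataset (country IDs 1 and 2 are placeholders); the consistency rule combining a parametric and a rank-based test is one possible model check, not a fixed prescription.

import pandas as pd
from scipy import stats

df = pd.read_csv("economic_indicators.csv")

# Hypothesis: mean GDP growth differs between two countries
a = df.loc[df["country_id"] == 1, "gdp_growth"].dropna()
b = df.loc[df["country_id"] == 2, "gdp_growth"].dropna()

t_stat, p_t = stats.ttest_ind(a, b, equal_var=False)   # Welch's t-test
u_stat, p_u = stats.mannwhitneyu(a, b)                  # rank-based cross-check

# Model check: accept only if both tests agree and samples are reasonably sized
if p_t < 0.05 and p_u < 0.05 and min(len(a), len(b)) >= 30:
    print("Difference confirmed by both tests (p =", round(p_t, 4), ")")
else:
    print("Evidence inconsistent or samples too small; hypothesis not confirmed")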

9. Interactive Data Exploration and User Feedback Module

This module allows users to interact dynamically with visualizations, modifying variables,
selecting data ranges, and adjusting visualization parameters to explore data from multiple
perspectives. The module incorporates user feedback loops, where analysts can mark areas of
interest or concern, prompting model checking to assess specific sections of the data more
closely. This interactive element promotes a hands-on approach to exploration while ensuring
insights remain validated and accurate through real-time model checking feedback.

Key Steps:

 Dynamic Exploration: Users can interact with visualizations, adjust data parameters,
and zoom into specific data points.
 Feedback Loop: Users provide feedback (e.g., marking anomalies or areas of
interest) that triggers further model checking on those specific areas.
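A simplified sketch of this feedback loop: the analyst marks a date range on the chart and a targeted check is re-run on that region (the dates and the Z-score threshold are placeholders).

import pandas as pd

df = pd.read_csv("economic_indicators.csv", parse_dates=["date"])

def check_marked_region(data, start, end, column="inflation", z_threshold=3.0):
    """Re-run an anomaly check on a region the analyst has marked as suspicious."""
    region = data[(data["date"] >= start) & (data["date"] <= end)]
    z = (region[column] - data[column].mean()) / data[column].std()
    return region.loc[z.abs() > z_threshold, ["date", column]]

# Simulated feedback: the analyst marks a one-year window for closer inspection
print(check_marked_region(df, "2020-01-01", "2020-12-31"))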

1.2 PROJECT OBJECTIVE AND SCOPE

The primary objective of this project is to bridge model checking—a technique traditionally
used in software verification—with exploratory data visualization to establish a more robust
framework for data analysis. Exploratory data analysis (EDA) often relies on visual tools to
uncover trends, patterns, and relationships within datasets. However, without formal
validation, these insights may be subject to human error, interpretation bias, or artifacts from
incomplete data. This project proposes to enhance the reliability of visualizations by
embedding model checking within the EDA process, thus providing an added layer of
verification to validate the accuracy and consistency of visual insights.

Through this approach, the project will focus on incorporating model checking methods into
visualizations such as scatter plots and heatmaps. These types of visualizations are commonly
used to represent multidimensional relationships and data distributions. For example, scatter
plots can reveal correlations or clusters, while heatmaps can illustrate concentration patterns.
By applying model checking techniques, these visualizations are examined rigorously to
confirm that any detected patterns—such as a trend line or clustering—are consistent with the
dataset’s expected statistical or structural properties. This mitigates the risk of misinterpreting
random variations as genuine insights, ultimately providing a more dependable analysis
framework.

The scope of this project extends to key domains where data accuracy is critical, including
healthcare, finance, and engineering. For instance, in healthcare, consistent data visualization
can aid in identifying patterns in patient demographics or treatment outcomes, leading to
more effective healthcare planning. In finance, accurate visualization is vital for identifying
market trends and making informed investment decisions, while in engineering, it can support
pattern recognition in performance metrics or quality control. By ensuring the reliability of
visual data insights, the project contributes to more confident decision-making across these
fields.

Furthermore, this integration of model checking and visual analysis will enable analysts to
seamlessly transition between exploring data intuitively and verifying it rigorously. Users can
interact with visualizations, such as adjusting parameters or filtering data, while receiving
real-time feedback on the validity of the visual patterns they observe. This combination aims
to elevate the standard of data-driven insights by aligning intuitive analysis with formal,

systematic validation, creating a powerful toolset that enhances both the interpretability and
trustworthiness of data visualization in complex datasets.

1.3 PROJECT SPECIFICATION

This project is dedicated to advancing the rigor and dependability of exploratory visual
analysis by integrating model checking methodologies. Exploratory visual analysis,
commonly used for initial insights into datasets, typically relies on visualization techniques to
uncover trends, correlations, and patterns. However, while these visual insights can be
compelling, they often lack formal validation, which can lead to incorrect interpretations or
over-reliance on apparent trends that may not hold under closer scrutiny. By incorporating
model checking into this process, the project seeks to apply systematic verification to ensure
the reliability of identified trends and patterns, providing more accurate and dependable
insights.

Specifically, the project will focus on visualizations that are widely used in data exploration,
including scatter plots, heatmaps, and time-series plots. Scatter plots are frequently employed
to show correlations and distribution patterns among variables, heatmaps can reveal
concentration patterns and clustering tendencies, and time-series plots illustrate trends and
changes over time. Each of these visualization types will be analyzed through model
checking to confirm that the displayed relationships align with statistical expectations and
established properties of the data, reducing the likelihood of spurious findings due to
randomness or sampling anomalies.

A key aspect of this approach involves building statistical models or using known data
properties as benchmarks for the visualizations. For instance, trends identified in a time-series
plot might be checked against a moving average model to verify consistency over time. In
scatter plots, detected clusters or correlations will be verified to ensure they match with
known relationships or are not influenced by outlier effects. Heatmaps will similarly be
validated to confirm that density patterns represent genuine underlying data features and are
not skewed by anomalies or biased data segments. This validation process ensures that the
insights derived are truly reflective of the data, rather than artifacts of incomplete or biased
analysis.

In addition to ensuring statistical alignment, the project will incorporate demographic and
contextual elements—such as data variability, range, and potential outliers—to provide a
more comprehensive validation framework. By factoring in these elements, model checking

can account for deviations within expected limits, distinguishing genuine insights from
anomalies. For example, in datasets with significant demographic diversity or high
variability, visualizations might show more fluctuations; model checking will help identify
whether these variations fall within expected bounds or indicate an underlying trend.

Non-functional Requirements

1. Usability

   o Interface design should be intuitive, with clear navigation and user guidance, including tooltips and onboarding tutorials.

2. Reliability

   o Model-checking algorithms should consistently verify data patterns and trends across various datasets and visualizations.
   o Real-time monitoring should operate without lag to ensure immediate validation feedback.

3. Performance

   o Data preprocessing and validation should occur with minimal latency, even for large datasets.
   o Optimized for speed, with real-time model checking integrated into visualizations to maintain smooth user interactions.

4. Scalability

   o The framework should accommodate large datasets and expanding data sources.
   o Scalable architecture to support the addition of more data sources, modules, or visualizations as required.

5. Compatibility

   o Cross-platform compatibility with popular data visualization tools (Tableau, Power BI) and support for major file formats (CSV, Excel, JSON).
   o Compatible with Windows, macOS, and Linux for a wide range of users.

6. Data Security

   o Data processing and analysis should follow strict data integrity protocols.
   o Access control to ensure only authorized users can modify data or visualization settings.

7. Documentation and Support

   o Complete documentation covering module functionality, troubleshooting, and user support.
   o Regular updates for feature improvements and prompt responses to bug reports.

8. Maintainability

   o Modular code design for easy maintenance and updates.
   o Well-documented codebase and modular structure, allowing seamless integration of new features.

9. Error Handling and Logging

   o Detailed logging for all model-checking validation processes, capturing any inconsistencies or anomalies.
   o Error recovery mechanisms to prevent data loss or incorrect validation outcomes.

10. Extensibility

   o The EVM framework should be designed with extensibility in mind, allowing developers to easily add new model-checking algorithms, data analysis techniques, or visualization types without major modifications to the core system. This flexibility supports long-term adaptability, enabling the framework to evolve with advancements in data science and model-checking methodologies.

11. User Customization

   o The system should allow users to customize their visualization and validation settings, such as specifying threshold parameters for trend and anomaly detection, choosing color schemes, and defining the frequency of validation checks. This personalization ensures that users can tailor the framework to best suit their data needs and analytical preferences, enhancing usability across different user profiles.

2. SYSTEM SPECIFICATION

2.1 Hardware specification

 Processor: Intel Dual Core
 Processor speed: 1.04 GHz
 RAM: 1 GB
 Monitor
 Keyboard
 Mouse

2.2 Software specification

 OS: Windows, macOS, or Linux
 Language: Python
 Development environment: Google Colab

Additional Software Considerations:

Libraries and Frameworks:

 Data Analysis: Pandas, NumPy, Matplotlib

 Machine Learning: TensorFlow, Scikit-learn, Keras

 Web Development (if applicable): Flask, Django

 Version Control: Git (for managing code versions and collaborations)

Cloud Storage/Backup:

 Google Drive or other cloud storage solutions for file management and project
collaboration.

Web Browser:

 A modern web browser (e.g., Chrome, Firefox) is essential for accessing Google
Colab, cloud storage, and other online tools.

3. PACKAGES

3.1 SEABORN

 Seaborn is a visualization library for statistical graphics plotting in Python. It provides attractive default styles and color palettes that make statistical plots more readable. It is built on top of the Matplotlib library and is closely integrated with the data structures from Pandas.
 Seaborn aims to make visualization a central part of exploring and understanding data. It provides dataset-oriented APIs, so we can switch between different visual representations of the same variables for a better understanding of the dataset.
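A small example of the dataset-oriented API, using the "tips" dataset bundled with Seaborn (loading it the first time requires an internet connection):

import seaborn as sns
import matplotlib.pyplot as plt

tips = sns.load_dataset("tips")
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="day")
plt.title("Tip versus total bill")
plt.show()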

3.2 PANDAS

 Pandas is a Python library used for working with data sets.
 It has functions for analyzing, cleaning, exploring, and manipulating data.
 The name "Pandas" refers to both "Panel Data" and "Python Data Analysis"; the library was created by Wes McKinney in 2008.

Why Use Pandas

Pandas allows us to analyze big data and draw conclusions based on statistical theories.
Pandas can clean messy data sets and make them readable and relevant.
Relevant data is very important in data science.
Pandas gives you answers about the data, such as:

 Is there a correlation between two or more columns?
 What is the average value?
 What is the maximum value?
 What is the minimum value?

Pandas is also able to delete rows that are not relevant or that contain wrong values, such as empty or NULL values. This is called cleaning the data. A short example of these operations follows the import step below.

INSTALLING PANDAS PACKAGE

pip install pandas

Import Pandas

Once Pandas is installed, import it into your applications with the import keyword (it is commonly aliased as pd):

import pandas as pd
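For example, using the economic indicators dataset from the appendix (the file and column names are assumed from the source code there):

import pandas as pd

df = pd.read_csv("economic_indicators.csv")

print(df["inflation"].corr(df["gdp_growth"]))   # correlation between two columns
print(df["inflation"].mean())                   # average value
print(df["inflation"].max(), df["inflation"].min())
df = df.dropna()                                # drop rows with empty / NULL values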

3.3 MATPLOTLIB

 Matplotlib is a cross-platform data visualization and graphical plotting library for Python and its numerical extension NumPy.
 As such, it offers a viable open-source alternative to MATLAB. Developers can also use Matplotlib's APIs (Application Programming Interfaces) to embed plots in GUI applications.

A Python Matplotlib script is structured so that a few lines of code are all that is required in most instances to generate a visual data plot.

The Matplotlib scripting layer overlays two APIs:

 The pyplot API is a hierarchy of Python code objects topped by matplotlib.pyplot.
 An OO (Object-Oriented) API is a collection of objects that can be assembled with greater flexibility than pyplot. This API provides direct access to Matplotlib's backend layers.

Matplotlib and Pyplot in Python:

The pyplot API has a convenient MATLAB-style stateful interface. In fact, Matplotlib was originally written as an open-source alternative to MATLAB. The OO API is more customizable and powerful than pyplot, but is considered more difficult to use. As a result, the pyplot interface is more commonly used and is referred to by default in this report.

Understanding Matplotlib's pyplot API is key to understanding how to work with plots:

 matplotlib.pyplot.figure: Figure is the top-level container. It includes everything visualized in a plot, including one or more Axes.
 matplotlib.pyplot.axes: Axes contain most of the elements in a plot (Axis, Tick, Line2D, Text, etc.) and set the coordinates. The Axes is the area in which data is plotted, and it includes the X-Axis, Y-Axis, and possibly a Z-Axis as well.
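A minimal example of the object-oriented API, creating a Figure with a single Axes:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()                 # Figure (container) and Axes (plotting area)
ax.plot([1, 2, 3, 4], [10, 20, 15, 30])
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Object-oriented API example")
plt.show()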

Installing Matplotlib:

pip install matplotlib

3.3.1 MATPLOTLIB BAR PLOT:

A bar plot or bar chart is a graph that represents categories of data with rectangular bars whose lengths or heights are proportional to the values they represent. Bar plots can be drawn horizontally or vertically. A bar chart describes comparisons between discrete categories. One axis of the plot represents the specific categories being compared, while the other axis represents the measured values corresponding to those categories.

Creating a bar plot:

The Matplotlib API in Python provides the bar() function, which can be used in MATLAB style or through the object-oriented API. The syntax of the bar() function used with the Axes is as follows: plt.bar(x, height, width, bottom, align).

EXAMPLE:

import matplotlib.pyplot as plt

products = ['Product A', 'Product B', 'Product C', 'Product D']

sales = [250, 350, 200, 450]

plt.bar(products, sales, color='green')

plt.xlabel('Products')

plt.ylabel('Sales')

plt.title('Sales by Product')

plt.show()

Output:

FIGURE 1: BAR CHART

3.3.2 MATPLOTLIB HISTOGRAM:

A histogram is an accurate representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous variable. It is a kind of bar graph.

 Bin the range of values.
 Divide the entire range of values into a series of intervals.
 Count how many values fall into each interval.
 Example: If you have a dataset with values ranging from 0 to 100, you might divide it into bins of size 10, resulting in bins for values 0-10, 10-20, 20-30, and so on.
 Matplotlib allows you to specify the number of bins or the specific bin edges when creating the histogram.

The matplotlib.pyplot.hist() function plots a histogram. It computes and draws the histogram of x.

EXAMPLE:

from matplotlib import pyplot as plt
import numpy as np

expenses = np.array([220, 350, 150, 500, 600, 700, 300, 450, 800, 250, 350, 900, 1000, 500, 450])
fig, ax = plt.subplots(1, 1)
ax.hist(expenses, bins=[0, 250, 500, 750, 1000])
ax.set_title("Histogram of Monthly Expenses")
ax.set_xticks([0, 250, 500, 750, 1000])
ax.set_xlabel('Expenses (USD)')
ax.set_ylabel('Number of People')
plt.show()

OUTPUT:

FIGURE 2: HISTOGRAM

4. APPENDIX

4.1 SOURCE CODE

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

# Load the dataset

df = pd.read_csv("economic_indicators.csv")

# Distribution of Inflation, Unemployment Rate, and GDP Growth

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Inflation Distribution

sns.histplot(df['inflation'], kde=True, ax=axes[0], color="blue")

axes[0].set_title("Distribution of Inflation")

axes[0].set_xlabel("Inflation Rate (%)")

# Unemployment Rate Distribution

sns.histplot(df['unemployment_rate'], kde=True, ax=axes[1], color="green")

axes[1].set_title("Distribution of Unemployment Rate")

axes[1].set_xlabel("Unemployment Rate (%)")

# GDP Growth Distribution

sns.histplot(df['gdp_growth'], kde=True, ax=axes[2], color="purple")

axes[2].set_title("Distribution of GDP Growth")

axes[2].set_xlabel("GDP Growth (%)")

plt.tight_layout()

plt.show()

# GDP Growth Over Time per Country

plt.figure(figsize=(14, 8))

sns.lineplot(data=df, x="date", y="gdp_growth", hue="country_id", marker="o",


palette="tab10")

plt.title("GDP Growth Over Time per Country")

plt.xlabel("Date")

plt.ylabel("GDP Growth (%)")

plt.legend(title="Country ID", bbox_to_anchor=(1.05, 1), loc="upper left")

plt.xticks(rotation=45)

plt.tight_layout()

plt.show()

# Mean Inflation Across All Countries Over Time

daily_mean_inflation = df.groupby("date")["inflation"].mean().reset_index()

plt.figure(figsize=(12, 6))

sns.lineplot(data=daily_mean_inflation, x="date", y="inflation", marker="o", color="red")

plt.title("Mean Inflation Over Time (All Countries)")

plt.xlabel("Date")

plt.ylabel("Mean Inflation Rate (%)")

plt.xticks(rotation=45)

plt.tight_layout()

plt.show()

# Rolling Average Unemployment Rate over Time per Country

df['unemployment_rate_rolling_avg'] = df.groupby('country_id')['unemployment_rate'].transform(
    lambda x: x.rolling(window=7, min_periods=1).mean())

plt.figure(figsize=(14, 8))

sns.lineplot(data=df, x="date", y="unemployment_rate_rolling_avg", hue="country_id",


palette="tab10", marker="o")

plt.title("7-Day Rolling Average Unemployment Rate per Country")

plt.xlabel("Date")

plt.ylabel("Unemployment Rate (7-Day Rolling Avg, %)")

plt.legend(title="Country ID", bbox_to_anchor=(1.05, 1), loc="upper left")

plt.xticks(rotation=45)

plt.tight_layout()

plt.show()

# Plotting Consumer Confidence Index Over Time

plt.figure(figsize=(14, 8))

sns.lineplot(data=df, x="Year", y="Consumer Confidence Index", marker="o",
color="brown")

sns.lineplot(data=df, x="Year", y="Public Debt (% of GDP)", marker="o", color="orange")

sns.lineplot(data=df, x="Year", y="Unemployment Rate (%)", marker="o", color="red")

plt.title("Consumer Confidence Index Over Time")

plt.xlabel("Year")

plt.ylabel("Consumer Confidence Index")

plt.xticks(rotation=45)

plt.tight_layout()

plt.show()

4.2 SCREENSHOTS

FIGURE 3: DATASET IMPORT
FIGURE 4: DISTRIBUTION
FIGURE 5: GDP GROWTH
FIGURE 6: MEAN INFLATION
FIGURE 7: UNEMPLOYMENT ROLLING AVERAGE
FIGURE 8: INFLATION ROLLING AVERAGE
FIGURE 9: GDP GROWTH ROLLING AVERAGE
FIGURE 10: DATASET (ECONOMIC_INDICATORS)

5. CONCLUSION

The introduction of Exploratory Visual Model-checking (EVM) represents a significant
advancement in enhancing the rigor and accuracy of data-driven insights. Traditional
Exploratory Visual Analysis (EVA) techniques, while invaluable for uncovering patterns and
generating hypotheses, often rely heavily on human interpretation. This subjectivity can
introduce cognitive biases and lead to misinterpretation, particularly in complex datasets
where trends are not immediately obvious or are masked by noise. By integrating formal
model-checking techniques into EVA, EVM systematically addresses these limitations,
creating a more structured and reliable framework for data analysis.

Through automated hypothesis validation and real-time detection of inconsistencies, EVM
provides a mechanism for cross-referencing visual insights with underlying data integrity.
This ensures that any observed patterns or correlations are supported by the data, reducing the
chance of overlooking hidden inconsistencies or misinterpreting random artifacts as
meaningful insights. Analysts are thereby able to refine their hypotheses more rigorously,
eliminating logical errors that might otherwise go unnoticed and enhancing the credibility of
their findings. The result is a more trustworthy exploratory analysis process, capable of
producing insights that are both grounded in data and free from subjective biases.

The efficacy of EVM has been demonstrated through case studies, which reveal its utility in
complex, high-stakes environments such as healthcare, finance, and engineering. In these
fields, where data-driven decisions can have substantial impacts, EVM’s ability to detect
inconsistencies and refine hypotheses proves invaluable. Analysts can be more confident that
their conclusions are not merely plausible but are verified against objective criteria,
minimizing the risk of costly misjudgments. This integration of formal verification into
exploratory analysis is particularly beneficial in today’s data-driven landscape, where
decision-makers require accurate and reliable insights to navigate complex challenges.

Looking forward, the future development of EVM will focus on optimizing its scalability to
handle large, real-time datasets without compromising performance. Additionally, further
integration with popular visualization tools is essential to make EVM widely accessible to
data analysts across various domains. By making EVM compatible with existing EVA tools
and scalable for large data environments, this framework has the potential to become a
cornerstone in data analysis workflows.

6. FUTURE WORK

While Exploratory Visual Model-checking (EVM) demonstrates promising improvements
in data analysis by integrating model checking into Exploratory Visual Analysis (EVA),
several areas remain for future development to expand its capabilities and usability:

1. Scalability and Performance Optimization: One of the main challenges for EVM is
ensuring its efficiency when working with large, complex datasets. Future work
should focus on optimizing the underlying model checking algorithms to handle big
data more effectively. This includes developing more efficient algorithms for real-
time validation, parallelization techniques, and approaches that reduce the
computational overhead of model checking, particularly in high-dimensional or
streaming data scenarios.
2. Integration with Existing Data Visualization Tools: For EVM to gain broader
adoption, it is essential to seamlessly integrate with popular visualization tools and
platforms, such as Tableau, Power BI, or D3.js. Future efforts could focus on
developing plug-ins or APIs that allow EVM to work with these existing tools without

requiring significant changes to current workflows. This would make it easier for
analysts to adopt EVM without needing to learn entirely new systems.
3. Extending Support for Diverse Data Types: While EVM is designed to handle a
wide range of data types, there is room to expand its capabilities. Future work should
focus on enhancing EVM's ability to support specialized data types, such as temporal,
spatial, or graph data, and integrate with emerging data formats. Improved support for
different data modalities will make EVM more versatile across various domains,
including healthcare, geospatial analysis, and social network analysis.
4. Improved User Interface and Experience: While EVM offers powerful verification
capabilities, the user interface (UI) could be enhanced to make the framework more
accessible to a broader audience. Future developments could focus on designing
intuitive, user-friendly interfaces that allow users—particularly those without formal
training in model checking—to easily interpret validation results and incorporate them
into their analysis process.
5. Adaptive Model Checking for Dynamic Data: Real-time data streams are
increasingly common in many fields, from financial markets to IoT systems. Future
work could explore how EVM can be adapted to handle dynamic, continuously
updating data in real time.

7. REFERENCES

WEBSITES:

https://arxiv.org/abs/
https://www.researchgate.net/publication/374941893_EVM_Incorporating_Model_Checking_into_Exploratory_Visual_Analysis
https://idl.uw.edu/papers/evm
