
A PROJECT REPORT ON

Time Series-Based AI for Accurate Biomass Management Prediction

Submitted Towards the


Partial Fulfilment of the Requirements of

Bachelor of Engineering (Computer Engineering)

by

Vaidehi Patil Exam No: B190134317


Pranav Shimpi Exam No: B190134324
Sayali Kulkarni Exam No: B190134276
Sanket Shirsath Exam No: B190134342

Under the Guidance of


Prof. C. R. Patil

Department of Computer Engineering


K. K. Wagh Institute of Engineering Education & Research
Hirabai Haridas Vidyanagari, Amrutdham, Panchavati,
Nashik-422003
Savitribai Phule Pune University
A. Y. 2023-2024 Sem I
K. K. Wagh Institute of Engineering Education and Research
Department of Computer Engineering

CERTIFICATE

This is to certify that the Project Titled

Time Series-Based AI for Accurate Biomass Management Prediction

Submitted by

Vaidehi Patil Exam No: B190134317


Pranav Shimpi Exam No: B190134324
Sayali Kulkarni Exam No: B190134276
Sanket Shirsath Exam No: B190134342

is a bonafide work carried out by students under the supervision of Prof. C. R. Patil
and it is submitted towards the partial fulfilment of the requirement of Bachelor of
Engineering (Computer Engineering) during academic year 2023-2024.

Prof. C. R. Patil Prof. Dr. S. S. Sane


Internal Guide Head
Department of Computer Engineering Department of Computer Engineering
Abstract

This research project introduces a comprehensive approach to transform biomass resource management through the utilization of advanced time series analysis techniques, specifically the Prophet algorithm, and the integration of vital environmental variables. By developing a predictive model rooted in historical biomass data, the objective is to provide precise forecasts of future biomass availability, offering valuable insights for strategic planning and resource allocation. The methodology includes the application of interpolation techniques to optimize the distribution network, with a focus on determining optimal distances between harvesting sites, depots, and refineries. The overarching goal is to enhance logistics efficiency while minimizing resource wastage. Crafting cost-effective distribution routes not only reduces transportation expenses but also significantly curtails energy consumption, ensuring timely deliveries and promoting sustainable, eco-conscious operations. To empower decision-makers with actionable insights, statistical dashboards are introduced, seamlessly integrated with Google Maps for geospatial pattern interpretation. These dynamic dashboards serve as visual aids in identifying efficient biomass distribution pathways, facilitating informed resource management. Furthermore, the research extends to the identification and comprehensive analysis of raw materials critical for biomass production and their transformation into biofuel. This aspect of the project aligns seamlessly with broader sustainability goals by contributing to the generation of clean and renewable energy. Importantly, this holistic approach transcends resource optimization, serving as a potent tool in reducing greenhouse gas emissions and mitigating the impacts of climate change.

Keywords: Time series analysis, Transportation cost reduction, Biomass forecasting, Geospatial visualization, Environmental sustainability, Distribution network

Acknowledgment

First and foremost, we would like to thank our project guide, Prof. C. R. Patil, for her guidance and support. We will forever remain grateful for the constant support and guidance she extended in making this project report. Through our many discussions, she helped us form and solidify our ideas.
With a deep sense of gratitude, we wish to express our sincere thanks to Prof. Dr. S. S. Sane for his immense help in planning and executing the work on time. Our grateful thanks go to the departmental staff members for their support.
We would also like to thank our wonderful colleagues and friends for listening to our ideas, asking questions, and providing feedback and suggestions for improving them.

Vaidehi Patil
Pranav Shimpi
Sayali Kulkarni
Sanket Shirsath
(B.E. Computer Engg.)

INDEX

1 Introduction 1
1.1 Project Idea . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Motivation of the Project . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Literature Survey . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 Problem Definition and scope 6


2.1 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.1 Goals and objectives . . . . . . . . . . . . . . . . . . . . . 7
2.1.2 Assumption and Scope . . . . . . . . . . . . . . . . . . . . 7
2.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Outcome . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4 Type of Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

3 Project Plan 13
3.1 Project Timeline . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 Team Organization . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2.1 Team structure . . . . . . . . . . . . . . . . . . . . . . . . 15

4 Software requirement specification 16


4.1 Functional Requirements . . . . . . . . . . . . . . . . . . . . . . . 17
4.2 Non Functional Requirements . . . . . . . . . . . . . . . . . . . . 18
4.3 Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.3.1 User Interface Constraints . . . . . . . . . . . . . . . . . . 18
4.3.2 Hardware Constraints . . . . . . . . . . . . . . . . . . . . . 19
4.3.3 Software Constraints . . . . . . . . . . . . . . . . . . . . . 19
4.4 Hardware Requirements . . . . . . . . . . . . . . . . . . . . . . . 19
4.5 Software Requirements . . . . . . . . . . . . . . . . . . . . . . . . 21
4.6 Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

5 Detailed Design 23
5.1 Architectural Design . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.2 Data design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.2.1 Data structure . . . . . . . . . . . . . . . . . . . . . . . . 25
5.2.2 Database description . . . . . . . . . . . . . . . . . . . . . 30
5.3 Component design/ Data Model . . . . . . . . . . . . . . . . . . . 31
5.3.1 Class Diagram . . . . . . . . . . . . . . . . . . . . . . . . 31
5.3.2 Flow Chart . . . . . . . . . . . . . . . . . . . . . . . . . . 31

6 Experimental setup 33
6.1 Data Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
6.1.1 Biomass History . . . . . . . . . . . . . . . . . . . . . . . 34
6.1.2 Distance Matrix . . . . . . . . . . . . . . . . . . . . . . . . 34
6.2 Technology Used . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
6.2.1 Prophet . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
6.2.2 MERN stack . . . . . . . . . . . . . . . . . . . . . . . . . 35
6.2.3 React Native . . . . . . . . . . . . . . . . . . . . . . . . . 35
6.2.4 Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.2.5 KNN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.2.6 TensorFlow . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.2.7 PyCharm . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.2.8 Visual Studio Code . . . . . . . . . . . . . . . . . . . . . . 37
6.3 Performance Parameters . . . . . . . . . . . . . . . . . . . . . . . 38
6.4 Efficiency Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

7 Summary and Conclusion 40

Annexure A Plagiarism Report 44

Annexure B Paper Published (if any) 45

Annexure C Sponsorship detail (if any) 46

List of Figures

3.1 Gantt Chart A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14


3.2 Gantt Chart B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

5.1 Block Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25


5.2 Use-Case Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.3 Flowchart Diagram . . . . . . . . . . . . . . . . . . . . . . . . . . 32
List of Tables

5.1 Biomass History Table . . . . . . . . . . . . . . . . . . . . . . . . 27


5.2 Distribution Network Data Table . . . . . . . . . . . . . . . . . . . 28
5.3 Distribution Network Data Table . . . . . . . . . . . . . . . . . . . 29
5.4 Raw Material Analysis Table . . . . . . . . . . . . . . . . . . . . . 30
CHAPTER 1

INTRODUCTION
1.1 PROJECT IDEA

To develop a predictive model that utilizes advanced AI and time series analysis techniques, this research project aims to accurately forecast future biomass availability by leveraging historical biomass data and integrating vital environmental factors. The project employs time series analysis algorithms, such as Prophet, ARIMA, and LSTM, to analyze the temporal patterns in the data and make more accurate predictions. It also employs interpolation techniques to optimize the distribution network, calculating optimal distances between harvesting sites, depots, and refineries. This ensures efficient logistics, minimizes resource wastage, and creates cost-effective distribution routes that reduce transportation costs and energy consumption. To support decision-making, the project incorporates statistical dashboards that seamlessly integrate with Google Maps, providing visual aids to decode geospatial patterns and identify efficient distribution pathways. Additionally, the research delves into the identification and analysis of raw materials essential for biomass production and their transformation into biofuel, aligning with sustainability goals and contributing to the generation of clean, renewable energy. This comprehensive approach not only enhances resource management but also serves as a powerful tool in reducing greenhouse gas emissions and mitigating climate change.
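As an illustration of the additive trend-plus-seasonality idea behind models such as Prophet, the following is a minimal, library-free sketch; the series and helper names are invented for demonstration and are not the project's actual data or code:

```python
# Minimal additive decomposition: fit a linear trend by least squares,
# then estimate a repeating seasonal component from the residuals.
# This mirrors the trend + seasonality structure used by Prophet in a
# simplified, dependency-free form (illustrative only).

def fit_trend(y):
    """Least-squares slope and intercept for y indexed by t = 0..n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    return slope, y_mean - slope * t_mean

def seasonal_means(residuals, period):
    """Average residual at each position within the seasonal cycle."""
    return [sum(residuals[i::period]) / len(residuals[i::period])
            for i in range(period)]

def forecast(y, period, steps):
    slope, intercept = fit_trend(y)
    resid = [v - (intercept + slope * t) for t, v in enumerate(y)]
    season = seasonal_means(resid, period)
    n = len(y)
    return [intercept + slope * (n + h) + season[(n + h) % period]
            for h in range(steps)]

# Toy monthly biomass series: upward trend plus a 4-period cycle.
history = [10, 14, 12, 8, 14, 18, 16, 12, 18, 22, 20, 16]
print(forecast(history, period=4, steps=4))
```

The forecast continues both the upward trend and the within-cycle pattern, which is the behaviour the project relies on Prophet to provide at full scale.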

1.2 MOTIVATION OF THE PROJECT

• Accurate Biomass Prediction


The project addresses the need for accurate forecasting of future biomass avail-
ability, which is crucial for strategic planning and resource allocation. This
ensures that resources are used efficiently and effectively, reducing wastage
and optimizing operations.

• Distribution Network Optimization


By employing interpolation techniques, the project aims to optimize the dis-
tribution network, calculating optimal distances between harvesting sites, de-
pots, and refineries. This enhances logistics efficiency and minimizes resource
wastage, leading to a more streamlined and sustainable operation.

KKWIEER, Department of Computer Engineering 2023 2


• Cost-Effective Transportation
The creation of cost-effective distribution routes reduces transportation costs
and energy consumption. This is achieved by minimizing the distances trav-
eled and optimizing the routes taken, leading to a more environmentally friendly
operation.

• Sustainable Operations
The project promotes sustainable and environmentally friendly operations by
ensuring timely deliveries and minimizing resource wastage. This aligns with
broader sustainability goals and contributes to a cleaner and more sustainable
future.

• Decision-Making Support
The project incorporates statistical dashboards that seamlessly integrate with
Google Maps, providing visual aids to decode geospatial patterns and identify
efficient distribution pathways. This supports decision-making processes and
provides actionable insights for resource management.

• Raw Material Analysis


The research delves into the identification and analysis of raw materials essen-
tial for biomass production and their transformation into biofuel. This con-
tributes to a better understanding of the resources required and supports the
generation of clean, renewable energy.

• Contribution to Sustainability Goals


The project aligns with broader sustainability goals by supporting the genera-
tion of clean, renewable energy. This contributes to a reduction in greenhouse
gas emissions and mitigates the impacts of climate change, leading to a more
sustainable future.



1.3 LITERATURE SURVEY

1. "Time-series prediction research based on combined Prophet-LSTM models"
Traditional time series forecasting models are complex and show low accuracy on non-linear time series. The combined Prophet-LSTM model is a new approach that improves accuracy and extracts composite features well. The combined model is simple, efficient, flexible, and highly robust, and it outperforms traditional models in predicting the trend change of temperature.

2. "Evaluation of ARIMA, Facebook Prophet and a boosting algorithm framework for monthly precipitation prediction of a semi-arid district of north Karnataka, India"
Time series prediction of precipitation was carried out using Auto-ARIMA, ThymeBoost, and Prophet. ThymeBoost was the best model for longer raw data and medium normalized data, while Prophet was the best model for normalized data; any of the models could be used for medium raw, short raw, and short normalized data. Exogenous variables and seasonal ARIMA may be considered to improve prediction capability.

3. "Anomaly Detection in Time Series: A Comprehensive Evaluation"
Detecting anomalous subsequences in time series data is important for many applications. There are many anomaly detection algorithms, but no comprehensive study had compared them. This study evaluates 71 state-of-the-art anomaly detection algorithms on 976 datasets, providing a concise overview of the techniques and their strengths and weaknesses. The results should ease the algorithm selection problem and open up new research directions.

4. "Time Series Forecasting using Facebook Prophet for Cloud Resource Management"
Proposed a forecasting model based on Facebook Prophet for Azure VM workload. Used log transformations and a percentile score method to improve accuracy, increasing forecasting accuracy by over 85% on average. CPU usage data is unstable and does not fit the model as well; a neural network combined with machine learning could further improve accuracy.



CHAPTER 2

PROBLEM DEFINITION AND SCOPE


2.1 PROBLEM STATEMENT

Existing biomass management systems are inefficient and inaccurate, leading to resource waste, high transportation costs, and environmental harm. This project aims to address these issues by developing a predictive model, optimizing distribution networks, and supporting decision-making.

2.1.1 Goals and objectives

1. To develop a predictive model that utilizes time series analysis to accurately forecast future biomass availability by leveraging historical biomass data and integrating relevant environmental factors.

2. To employ interpolation techniques to determine an efficient distribution network. This involves calculating optimal distances from harvesting sites to depots and from depots to refineries.

3. To create economical distribution routes that minimize transportation costs and reduce energy consumption, leading to timely deliveries along with sustainable and environmentally friendly operation.

4. To encapsulate visual representation using statistical dashboards, seamlessly integrating Google Maps to decode geospatial patterns and illustrate efficient pathways for distribution.

5. To identify and analyze the raw materials for biomass production and their transformation into biofuel.
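The site-to-depot distance calculation in goal 2 can be sketched with the great-circle (haversine) formula; the coordinates and depot names below are hypothetical placeholders, not the project's real network:

```python
import math

# Great-circle (haversine) distance between two (lat, lon) points in km,
# used to pick the nearest depot for a harvesting site. All coordinates
# here are made-up examples.

def haversine_km(p, q):
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def nearest_depot(site, depots):
    """Return (depot_name, distance_km) of the closest depot."""
    name, pos = min(depots.items(),
                    key=lambda kv: haversine_km(site, kv[1]))
    return name, haversine_km(site, pos)

site = (20.00, 73.78)                       # hypothetical harvesting site
depots = {"D1": (20.10, 73.90), "D2": (19.20, 74.50)}
print(nearest_depot(site, depots))
```

Repeating this over every site/depot and depot/refinery pair yields the distance matrix the optimization stage works from.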

2.1.2 Assumption and Scope

2.1.2.1 Assumptions

• Data Availability: The project assumes that historical biomass data, environ-
mental data, and relevant information on biomass resources are readily avail-
able for analysis and modeling.



• Model Accuracy: The predictive model developed using advanced AI and
time series analysis techniques is assumed to provide accurate forecasts. The
accuracy of the model depends on the quality and representativeness of the
historical data.

• Environmental Factors: The project assumes that the integration of envi-


ronmental factors significantly impacts biomass availability and that relevant
environmental data is accessible.

• Interpolation Techniques: It is assumed that interpolation techniques can


effectively optimize the distribution network by determining optimal distances
between harvesting sites, depots, and refineries.

• Cost-Effective Routes: The project assumes that the creation of cost-effective


distribution routes will result in reduced transportation costs and energy con-
sumption.

• Visual Representation: The assumption is that the integration of statistical


dashboards with Google Maps will effectively provide visual aids for geospa-
tial pattern interpretation and support decision-making processes.

• Raw Material Analysis: It is assumed that the identification and analysis of


raw materials for biomass production and their transformation into biofuel are
feasible and contribute to clean and renewable energy generation.

• Stakeholder Cooperation: The success of the project assumes cooperation


and engagement from relevant stakeholders, including biomass producers, dis-
tributors, and policymakers.

• Environmental Impact: The project assumes that implementing optimized


distribution routes will have a positive impact on reducing the environmental
footprint of biomass management.

• Technological Feasibility: The project assumes that the required technology


for implementing advanced AI, time series analysis, and geospatial integration
is readily available and feasible to implement.



2.1.2.2 Scope

• Development of Predictive Model: The project focuses on the development


of a predictive model that utilizes advanced AI and time series analysis tech-
niques. The model will leverage historical biomass data and integrate relevant
environmental factors to accurately forecast future biomass availability.

• Optimization of Distribution Network: The scope includes employing in-


terpolation techniques to optimize the distribution network. This involves cal-
culating optimal distances between harvesting sites, depots, and refineries, en-
hancing logistics efficiency, and minimizing resource wastage.

• Creation of Cost-Effective Distribution Routes: The project aims to create


distribution routes that are cost-effective, reduce transportation costs and en-
ergy consumption, and ensure timely deliveries while promoting sustainable
and environmentally friendly operations.

• Integration of Visual Representation and Geospatial Analysis: The project


involves encapsulating visual representation using statistical dashboards and
integrating geospatial analysis by seamlessly integrating with Google Maps.
This will provide visual aids for decoding geospatial patterns and identifying
efficient distribution pathways.

• Identification and Analysis of Raw Materials: The scope extends to the


identification and comprehensive analysis of raw materials essential for biomass
production and their transformation into biofuel. This contributes to the gen-
eration of clean and renewable energy.

• Testing and Validation: The project includes extensive testing and valida-
tion to ensure the effectiveness of the predictive model, optimized distribution
network, and visual representation in real-world scenarios and diverse envi-
ronmental conditions.

• Stakeholder Engagement: The scope involves engaging and collaborating with relevant stakeholders, including biomass producers, distributors, and policymakers, to ensure the project's success and alignment with broader sustainability goals.

2.2 METHODOLOGY

• Data Collection and Integration: Compile past biomass and environmental data, then combine them for analysis.

• Time Series Analysis: Develop predictive models, such as ARIMA and Prophet, to analyze biomass trends.

• Distribution Network Optimization: Implement interpolation for effective routing when optimizing the distribution network, along with a supervised learning method such as KNN (K-Nearest Neighbors).

• Decision-Maker Empowerment: Implementing GIS dashboards will help decision-makers manage resources effectively.

• Raw Material Analysis: Analyze the raw materials to find the components needed to convert biomass into biofuel.
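The KNN-with-interpolation step above can be sketched as a k-nearest-neighbour estimate weighted by inverse distance; the sample coordinates and biomass values are invented for illustration:

```python
import math

# Estimate biomass at an unsampled location from its k nearest sampled
# sites, weighting each neighbour by inverse distance. This combines the
# KNN and interpolation ideas named in the methodology (sample data is
# synthetic).

def knn_idw(query, samples, k=3):
    """samples: list of ((x, y), value); returns the weighted estimate."""
    dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    nearest = sorted(samples, key=lambda s: dist(query, s[0]))[:k]
    if dist(query, nearest[0][0]) == 0:       # query hits a sample exactly
        return nearest[0][1]
    weights = [1.0 / dist(query, p) for p, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)

samples = [((0, 0), 10.0), ((1, 0), 20.0), ((0, 1), 30.0), ((5, 5), 90.0)]
print(knn_idw((0.5, 0.5), samples, k=3))
```

Because nearby sites dominate the weighted average, the estimate stays local, which is the behaviour wanted when filling gaps in the harvesting-site grid.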

2.3 OUTCOME

• Accurate Prediction of Biomass Availability: The development of a pre-


dictive model using advanced AI and time series analysis techniques will pro-
vide accurate forecasts of future biomass availability, leading to better strategic
planning and resource allocation.

• Optimized Distribution Network: The project will result in an optimized dis-


tribution network, with calculated optimal distances between harvesting sites,
depots, and refineries, leading to enhanced logistics efficiency and minimized
resource wastage.

• Cost-Effective Distribution Routes: The creation of cost-effective distribution routes will reduce transportation costs and energy consumption, ensuring timely deliveries and promoting sustainable and environmentally friendly operations.

• Visual Representation and Geospatial Analysis: The integration of sta-


tistical dashboards with Google Maps will provide visual aids for decoding
geospatial patterns and identifying efficient distribution pathways, supporting
decision-making processes and resource management.

• Comprehensive Raw Material Analysis: The identification and comprehen-


sive analysis of raw materials essential for biomass production and their trans-
formation into biofuel will contribute to the generation of clean, renewable
energy.

• Effective Decision-Making: The project will provide actionable insights and


visual representation, enabling informed decision-making processes and effi-
cient resource management.

• Enhanced Security Measures: The implementation of advanced authentica-


tion techniques will enhance security measures and prevent spoofing attacks
and identity theft.

• User Adoption and Efficiency: The system will be adopted by users effi-
ciently, leveraging its features to support decision-making processes and re-
source management.

• Positive Environmental Impact: The implementation of optimized distribu-


tion routes will have a positive impact on reducing the environmental footprint
of biomass management.

• Technological Feasibility: The successful implementation of advanced AI,


time series analysis, and geospatial integration technologies will demonstrate
the feasibility and effectiveness of the comprehensive system.



2.4 TYPE OF PROJECT

The project "Time Series-Based AI for Accurate Biomass Management Prediction" can be classified as a "Research Oriented Project", and it falls under multiple domains, including:

• Machine Learning: The project leverages advanced machine learning tech-


niques, including AI and time series analysis, to develop a predictive model
for forecasting future biomass availability. Machine learning is valuable for
tasks like prediction and pattern recognition, which are essential for strategic
planning and resource allocation in biomass management.

• Optimization Techniques: The project employs interpolation techniques to


optimize the distribution network by calculating optimal distances between
harvesting sites, depots, and refineries. Optimization techniques are used to
enhance logistics efficiency, minimize resource wastage, and reduce trans-
portation costs.

• Geospatial Analysis: The project involves the integration of statistical dash-


boards with Google Maps for geospatial pattern interpretation. Geospatial
analysis is used to decode spatial patterns and identify efficient distribution
pathways for biomass resources.

• Data Analysis: The project focuses on the identification and comprehensive


analysis of raw materials essential for biomass production and their transfor-
mation into biofuel. Data analysis is used to understand the resources required
and analyze their transformation process, contributing to the generation of
clean, renewable energy.

• Sustainability and Environmental Studies: The project aims to create cost-


effective distribution routes that reduce transportation costs and energy con-
sumption, promoting sustainable and environmentally friendly operations. It
also aims to reduce the environmental footprint of biomass management.



CHAPTER 3

PROJECT PLAN
3.1 PROJECT TIMELINE

A project timeline chart, or Gantt chart, is a visual tool used in project management
to display project tasks and their timing. It shows tasks as bars over time, with their
start and end dates, dependencies, milestones, and progress. It helps plan, track, and
communicate project schedules.

Figure 3.1: Gantt Chart A

Figure 3.2: Gantt Chart B



3.2 TEAM ORGANIZATION

The team consists of three distinct elements, namely the project mentor, the team leader, and the team members.

1. Project Mentor
The project mentor receives regular updates about progress directly from the team lead. For the team, the team lead is their only point of contact, so the team lead and the mentor work closely together. Here, Prof. C. R. Patil acts as the mentor.

2. Team Leader
This individual coordinates all directions from the mentor and is responsible for guiding the technical aspects of the project. Here, Pranav Shimpi takes up the role of team lead.

3. Team Member(s)
Also known as the workhorses of the team, they carry out the ground implementation. Here, Vaidehi Patil, Sanket Shirsath, and Sayali Kulkarni form the rest of the team.

3.2.1 Team structure

Name Role
Vaidehi Patil Research, Documentation and Implementation
Pranav Shimpi UI Designing , Documentation and Implementation
Sayali Kulkarni Research, Documentation and Implementation
Sanket Shirsath Backend Implementation and Interface Design



CHAPTER 4

SOFTWARE REQUIREMENT
SPECIFICATION
4.1 FUNCTIONAL REQUIREMENTS

• Time Series Analysis (TSA) - Accuracy Enhancement: This functional re-


quirement involves the use of advanced time series analysis techniques to en-
hance the accuracy of predictions related to biomass availability. The goal is to
utilize historical data and relevant environmental factors to make more precise
forecasts.

• Prophet and ARIMA: Prophet and ARIMA (AutoRegressive Integrated Mov-


ing Average) are specific time series analysis algorithms that can be used for
forecasting. Prophet is designed for daily observations that display patterns
on different time scales, while ARIMA is a widely used statistical method for
time series forecasting.

• Interpolation Technique: This requirement involves employing interpola-


tion techniques to optimize the distribution network. This includes calculating
optimal distances between harvesting sites, depots, and refineries to enhance
logistics efficiency and minimize resource wastage.

• Statistical Dashboards: The system must provide statistical dashboards for


visual representation and data analysis. These dashboards should allow users
to analyze data, trends, and patterns, supporting informed decision-making.

• Geospatial Pattern Interpretation: This requirement involves integrating


geospatial analysis, possibly through Google Maps, to interpret spatial pat-
terns related to biomass distribution. The goal is to identify efficient distribu-
tion pathways and support resource management decisions.

• Raw Material Analysis: This requirement involves identifying and analyzing


the raw materials essential for biomass production and their transformation
into biofuel. The analysis should contribute to understanding the resources
required and the transformation process.
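The autoregressive idea at the core of ARIMA can be illustrated with a one-lag (AR(1)) sketch, stripped of the integration and moving-average parts; the series below is synthetic and the estimator is a simplification, not the project's actual model:

```python
# AR(1) sketch: estimate phi from the lag-1 autocovariance of the
# mean-centred series, then forecast forward as
#     y[t+1] = mean + phi * (y[t] - mean).
# This is only the autoregressive (AR) building block of ARIMA, shown
# on an invented series for illustration.

def ar1_forecast(y, steps=1):
    mean = sum(y) / len(y)
    c = [v - mean for v in y]                          # centred series
    phi = (sum(a * b for a, b in zip(c[:-1], c[1:]))   # lag-1 estimate
           / sum(v * v for v in c))
    last, out = c[-1], []
    for _ in range(steps):
        last = phi * last
        out.append(mean + last)
    return out

series = [5.0, 6.0, 5.5, 6.5, 6.0, 7.0, 6.5, 7.5]
print(ar1_forecast(series, steps=2))
```

With |phi| < 1 the forecasts revert toward the series mean, which is the characteristic behaviour of a stationary AR process.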



4.2 NON FUNCTIONAL REQUIREMENTS

• Accuracy: The predictive model can be evaluated using a variety of metrics,


such as mean absolute error and root mean squared error. The model can be
tuned to improve its accuracy by adjusting the parameters of the algorithm and
using more data.

• Reliability: The predictive model and distribution network can be made more
reliable by using redundant components and implementing fail-safe mecha-
nisms. The system can also be monitored for performance and errors to iden-
tify and fix any problems quickly.

• Scalability: The system can be scaled by using cloud computing resources


and distributed databases. The system can also be designed to be modular so
that new components can be added easily.

• Usability: The system can be made more user-friendly by providing clear


instructions and documentation. The system can also be designed with a user-
centered design approach to make it easy to learn and use.

• Security: The system can be made more secure by using encryption, authen-
tication, and authorization mechanisms. The system can also be regularly au-
dited for vulnerabilities to identify and fix any security holes.
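The two accuracy metrics named above, mean absolute error and root mean squared error, compute directly from paired actual and predicted values; the numbers below are illustrative:

```python
import math

# Mean absolute error and root mean squared error for a forecast,
# the evaluation metrics mentioned in the accuracy requirement
# (values are illustrative).

def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))

actual    = [100.0, 120.0, 110.0, 130.0]
predicted = [ 98.0, 125.0, 108.0, 131.0]
print(mae(actual, predicted), rmse(actual, predicted))
```

RMSE is never smaller than MAE and penalises large individual errors more heavily, so tracking both indicates whether the model's misses are uniform or dominated by outliers.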

4.3 CONSTRAINTS

4.3.1 User Interface Constraints

• The dashboard should have a timeline view to show the historical trend of
biomass production and availability.

• The dashboard should have a forecasting view to show the predicted biomass
production and availability for future periods.

• The GIS should allow users to create and manage custom data layers, such as
biomass management zones.



• The GIS should allow users to perform spatial analysis on custom data layers,
such as calculating the biomass potential of a given region.

• The GIS should allow users to export custom data layers to different formats,
such as Shapefile and GeoJSON.

4.3.2 Hardware Constraints

• Ensure ample storage for the dataset and models

• If the framework will be used to deploy the model in production, then more
powerful hardware may be required to handle the load.

4.3.3 Software Constraints

• Develop using an appropriate deep learning framework, such as TensorFlow or PyTorch, to train and deploy the model.

• The framework must use the Prophet library to forecast time series data

• The framework must be able to connect to a database, such as Firestore, to


store and retrieve biomass data and predictions.

• The framework must be able to generate interactive data visualizations, such


as charts and maps, to display the biomass data and predictions.
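The hand-off between the forecaster and the database/visualization layer can be as simple as serialising dated predictions to JSON; the field names below are illustrative examples, not a required Firestore schema:

```python
import json
from datetime import date, timedelta

# Package forecast values as date-keyed JSON documents, the kind of
# shape a dashboard or a document store such as Firestore could
# consume (field names are illustrative, not a fixed schema).

def to_documents(start, values):
    return [
        {"date": (start + timedelta(days=i)).isoformat(),
         "predicted_biomass": round(v, 2)}
        for i, v in enumerate(values)
    ]

docs = to_documents(date(2023, 10, 1), [412.3, 418.9, 407.1])
payload = json.dumps(docs)
print(payload)
```

Keeping the payload as plain date/value documents leaves the storage backend and the charting library free to change independently.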

4.4 HARDWARE REQUIREMENTS

• Data Collection and Processing Hardware

– Industrial-grade sensors for monitoring biomass parameters such as mois-


ture content, density, calorific value, and growth rates.

– Microcontrollers or microprocessors to interface with sensors, collect


data, and perform pre-processing tasks.

• Edge Devices or Gateways



– Edge devices or gateways to aggregate data from multiple sensor nodes
and perform initial data processing, filtering, and cleaning.

– Secure data transmission protocols (e.g., MQTT, AMQP) for sending


processed data to the central server or control system.

• Central Server or Control System

– A high-performance server with sufficient computational resources to handle large volumes of time-series data.

– Data storage systems (e.g., relational databases, data warehouses) to store historical and real-time biomass data.

– Machine learning frameworks (e.g., TensorFlow, PyTorch) for training and running time series-based AI models.

• Data Visualization and User Interface

– Web-based dashboards or visualization tools for real-time monitoring of biomass data and AI model predictions.

– User interface hardware (e.g., touch screens, tablets) for interacting with
the system and providing decision-making support.

• Security Measures

– Encryption modules and secure authentication mechanisms to protect data integrity and confidentiality during transmission and storage.

– Network security protocols (e.g., firewalls, intrusion detection systems) to safeguard the system from cyberattacks.

– Physical security measures to prevent unauthorized access to hardware components and data storage devices.

• Redundancy and Reliability

– Redundant network paths and communication modules to ensure data transmission continuity in case of network failures.

– Backup servers or failover systems to maintain system availability in case
of hardware failures.

– Regular data backups and disaster recovery plans to minimize data loss
and ensure business continuity.

• Environmental Monitoring

– Additional sensors for monitoring environmental conditions (e.g., temperature, humidity, air quality) in the vicinity of biomass storage or production facilities.

– Data integration and analysis to identify potential environmental impacts of biomass management practices.

4.5 SOFTWARE REQUIREMENTS

• Machine Learning Frameworks:

– TensorFlow: Open-source library for numerical computation using data flow graphs. Popular for machine learning, including time series forecasting.

– PyTorch: Open-source machine learning library known for flexibility and ease of use.

– Prophet: Time series forecasting library developed by Facebook, designed for data with strong seasonal patterns.

• Data Analysis Libraries:

– Pandas: Python library for data manipulation and analysis, well-suited for working with time series data.

– NumPy: Python library for numerical computation, fundamental for scientific computing in Python.

– Matplotlib and Seaborn: Python libraries for data visualization, used to explore and understand time series data.

• Geospatial Analysis Libraries:

– GeoPandas: Python library for working with geospatial data in Pandas dataframes, integrating geospatial data with Pandas capabilities.

– PyQGIS: Python interface to QGIS geographic information system (GIS) software, providing tools for geospatial data processing and visualization.

• Additional Software Requirements:

– Database: Necessary for storing historical and real-time biomass and en-
vironmental data. Popular choices include PostgreSQL, MySQL, and
SQLite.

– Web Server: Essential for serving data visualizations, prediction results, and user interfaces. Popular options include Apache, Nginx, and Gunicorn.

– Deployment Tool: Streamlines deploying software and models to a production environment. Popular tools include Ansible, Chef, and Puppet.

4.6 INTERFACES

An interface is a contract or set of rules that defines how different software components or modules can interact. It specifies the methods, properties, or behaviors that must be implemented, allowing for consistent communication and integration between various parts of a software system.

CHAPTER 5

DETAILED DESIGN
5.1 ARCHITECTURAL DESIGN

The architecture diagram of the project shows a high-level overview of the system’s components and how they interact with each other. The system is composed of the following main components:

• Data warehouse: The data warehouse stores all of the historical and real-time
data that is used by the system. This data includes biomass data, environmental
data, and distribution network data.

• Data preprocessing: The data preprocessing component cleans and prepares the data in the data warehouse for use by the predictive model and distribution network.

• Predictive model: The predictive model uses historical biomass data and en-
vironmental data to forecast future biomass availability.

• Distribution network: The distribution network uses the forecasts from the
predictive model to calculate optimal distribution routes and costs.

• Statistical dashboards: The statistical dashboards provide users with a visual representation of the data and calculations performed by the system.

The different components of the system interact with each other as follows:

• The data preprocessing component takes data from the data warehouse and
cleans and prepares it for use by the predictive model and distribution network.

• The predictive model takes the preprocessed data and forecasts future biomass
availability.

• The distribution network takes the forecasts from the predictive model and
calculates optimal distribution routes and costs.

• The statistical dashboards take the data and calculations from the other com-
ponents and provide users with a visual representation.

The architecture diagram also shows the following additional components:

• Real-time data collection: This component collects real-time data from vari-
ous sources, such as sensors and harvesting equipment.

• Distance interpolator: This component uses interpolation techniques to calculate distances between different locations.

• Geospatial mapping: This component uses Google Maps to provide users with a visual representation of the distribution network.

Figure 5.1: Block Diagram

5.2 DATA DESIGN

5.2.1 Data structure

• Biomass History Data

– Internal Data Structure


The model uses an internal data structure to store the biomass history
data, such as the:

* Index: This is a unique identifier for each record in the table.

* Longitude: This field indicates the longitude of the location where the biomass data was collected.

* Latitude: This field indicates the latitude of the location where the
biomass data was collected.

* Year: This field indicates the year in which the biomass data was
collected.

* Biomass: This field indicates the quantity of biomass that was avail-
able at the location in the given year.

– Global Data Structure


It maintains a global database of biomass history data, which can be used
to:

* Forecast future biomass availability: Develop a predictive model using advanced AI and time series analysis techniques to accurately forecast future biomass availability based on historical data and environmental factors.

* Optimize distribution routes: Employ interpolation techniques and optimization algorithms to calculate optimal distances between harvesting sites, depots, and refineries, and create cost-effective distribution routes that minimize transportation costs and energy consumption.

* Assess the environmental impacts of different biomass management practices: Analyze the environmental impacts of various biomass management practices, including transportation, storage, and processing, and identify sustainable and environmentally friendly practices.

– Database Design (Table)

Column Name Data Type Description
Index Integer Unique identifier for the record
Longitude Float Longitude of the location where the biomass data was collected
Latitude Float Latitude of the location where the biomass data was collected
Year Integer Year in which the biomass data was collected
Biomass Float Quantity of biomass available at the location in the given year (in metric tons)
Table 5.1: Biomass History Table

• Distribution Network Data

– Internal Data Structure
The model uses an internal data structure to store the distribution network data, such as the:

* Locations of harvesting sites, depots, refineries, and power plants

* Distances between these locations

* Image of the distribution network

– Global Data Structure


It maintains a global database of distribution network data, which can be
used to:

* Optimize distribution routes

* Minimize transportation costs

– Database Design (Table)

Column Name Data Type Description


Location ID Integer Unique identifier for the distribution net-
work location
Type String Type of distribution network location
(harvesting site, depot, refinery, power
plant)
Address String Address of the distribution network loca-
tion
GPS Coordinates Point GPS coordinates of the distribution net-
work location
Image Blob Image of the distribution network location
Table 5.2: Distribution Network Data Table

• Environmental Data

– Internal Data Structure


The model uses an internal data structure to store the Environmental data,
such as the:

* Weather data (e.g., temperature, precipitation, humidity, etc.)

– Global Data Structure


It maintains a global database of environmental data, which can be
used to:

* Forecast future biomass availability

* Assess the environmental impacts of different biomass management practices

– Database Design (Table)

Column Name Data Type Description


Latitude Float Latitude of the environmental data record
Longitude Float Longitude of the environmental data
record
Time Datetime Time and date at which the environmental
data was recorded
Temperature Float Temperature in degrees Celsius
Atmospheric Float Atmospheric pressure in hectoPascals
pressure
Humidity Float Humidity in percent
Rainfall Float Rainfall in millimeters
Table 5.3: Environmental Data Table

• Raw Material Analysis Database

– Internal Data Structure


Use a distributed data platform, such as Hadoop or Spark, to store and retrieve data for a large number of raw material types, measurement types, and time periods efficiently.

* raw material id: Unique identifier for the raw material type

* measurement type id: Unique identifier for the measurement type

* time period id: Unique identifier for the time period

* value: The value of the measurement

– Global Data Structure


It maintains a global database of raw material analysis data, which can
be used to:

* Use a data warehouse, such as Snowflake or Redshift, to provide a unified view of the data for all users.

* The global data structure should contain the same columns as the
internal data structure, plus an optional location id column to store
the location where the measurement was taken.

– Database Design (Table)

Column Name Data Type Description
raw material id Integer Unique identifier for the raw material type
measurement type id Integer Unique identifier for the measurement type
time period id Integer Unique identifier for the time period
value Float The value of the measurement
location id (Optional) Integer Unique identifier for the location where the measurement was taken
Table 5.4: Raw Material Analysis Table
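As an illustrative sketch, the tables above map directly onto a relational schema. The snippet below uses SQLite purely for demonstration (the deployed system may use PostgreSQL or Firestore instead); the table name, column names, and sample values are our own assumptions based on Table 5.1.

```python
import sqlite3

# In-memory database used only to illustrate the Table 5.1 schema.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE biomass_history (
        idx       INTEGER PRIMARY KEY,  -- unique identifier for each record
        longitude REAL NOT NULL,        -- longitude of the collection site
        latitude  REAL NOT NULL,        -- latitude of the collection site
        year      INTEGER NOT NULL,     -- year of collection
        biomass   REAL NOT NULL         -- biomass available (metric tons)
    )
""")
# Hypothetical sample record.
conn.execute(
    "INSERT INTO biomass_history VALUES (?, ?, ?, ?, ?)",
    (1, 73.78, 19.99, 2019, 452.6),
)
row = conn.execute(
    "SELECT year, biomass FROM biomass_history WHERE idx = 1"
).fetchone()
print(row)  # (2019, 452.6)
```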

5.2.2 Database description

• Historical Biomass Dataset: This dataset contains historical biomass data, in-
cluding the location, biomass type, measurement type, time period, and value.
This data can be used to train and evaluate a time series-based AI model to
predict future biomass availability.

• Distance Matrix Dataset: This dataset contains the distances between all
pairs of locations. This data can be used to calculate the cost and time of
transporting biomass between different locations.

• Environment Variables Dataset: This dataset contains environmental data such as temperature, precipitation, soil type, and other variables for each location. This data can be used to improve the accuracy of biomass predictions by taking into account environmental factors that can affect biomass growth and availability.

• Raw Material Analysis Dataset: This dataset contains information about the
composition, moisture content, and heating value of different biomass types.
This data can be used to identify the best biomass types for different applica-
tions and to optimize biomass supply chains.

5.3 COMPONENT DESIGN/ DATA MODEL

5.3.1 Class Diagram

A class diagram is a type of UML (Unified Modeling Language) diagram used in software engineering to visually represent the structure and relationships of classes and objects in a system or software application.

Figure 5.2: Class Diagram

5.3.2 Flow Chart

A flowchart is a graphical representation or diagram that visualizes a process, system, or algorithm using various symbols and arrows to depict the flow of steps or activities in a sequential manner. Flowcharts are used in various fields, including programming, business process analysis, project management, and problem-solving.
Figure 5.3: Flowchart Diagram

CHAPTER 6

EXPERIMENTAL SETUP
6.1 DATA SET

6.1.1 Biomass History

The internal data structure of the biomass history dataset includes essential attributes
such as a unique index, longitude, latitude, year, and biomass quantity collected at
various locations over time. This rich dataset provides a comprehensive repository
of historical biomass information, encompassing diverse locations, biomass types,
measurement methods, and temporal periods. Leveraging this extensive data, an
AI model can be trained and evaluated using time series analysis, enabling predic-
tive insights into future biomass availability. By utilizing the spatial and temporal
dimensions encapsulated in this dataset, the AI model can forecast and anticipate
the availability of biomass, supporting informed decision-making and planning in
various sectors reliant on sustainable biomass resources.
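As a small illustration of working with this dataset, the sketch below builds a tiny Pandas frame following the attributes named above (Index, Longitude, Latitude, Year, Biomass) and aggregates total biomass per year; the row values are synthetic, not real survey data.

```python
import pandas as pd

# Tiny synthetic sample following the Section 5.2.1 schema; values are illustrative.
df = pd.DataFrame({
    "Index":     [1, 2, 3, 4],
    "Longitude": [73.7, 73.7, 74.1, 74.1],
    "Latitude":  [19.9, 19.9, 20.2, 20.2],
    "Year":      [2018, 2019, 2018, 2019],
    "Biomass":   [410.0, 452.5, 120.0, 131.5],
})

# Total biomass available per year across all locations.
totals = df.groupby("Year")["Biomass"].sum()
print(totals.to_dict())  # {2018: 530.0, 2019: 584.0}
```

The same grouped view is what a forecasting model would consume as its per-year history.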

6.1.2 Distance Matrix

The dataset containing distances between all pairs of locations, represented as a 2418
x 2418 matrix, serves as a critical resource for assessing the transportation logistics
involved in moving biomass between different locations. This comprehensive ma-
trix provides insights into the costs and time requirements for transporting biomass
from a source grid block to various destination grid blocks. Notably, the asymme-
try within the matrix reflects nuanced factors such as one-way routes, U-turns, or
other transport-related variables, leading to differing distances for journeys from a
source to a destination and vice versa. Leveraging this detailed spatial informa-
tion, logistical planning and optimization strategies can be developed, considering
the varying distances and directional dependencies for effective biomass transporta-
tion, ultimately aiding in efficient resource allocation and decision-making within
the biomass industry.
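The asymmetry described above matters in code: a routine must not assume `dist[i][j] == dist[j][i]`. The toy 3 x 3 matrix below stands in for the full 2418 x 2418 matrix; all distances are invented for illustration.

```python
import numpy as np

# Toy stand-in for the distance matrix; entry [i, j] is the road distance (km)
# from grid block i to grid block j. Values are made up.
dist = np.array([
    [0.0, 12.5, 30.0],
    [14.0,  0.0, 18.5],   # 14.0 != 12.5: one-way routes make the matrix asymmetric
    [29.0, 18.5,  0.0],
])

# Check symmetry explicitly rather than assuming dist == dist.T.
is_symmetric = bool(np.allclose(dist, dist.T))
print(is_symmetric)  # False

# Nearest destination from source block 0 (excluding the block itself).
candidates = dist[0].copy()
candidates[0] = np.inf
nearest = int(np.argmin(candidates))
print(nearest)  # 1
```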

6.2 TECHNOLOGY USED

6.2.1 Prophet

Prophet is an open-source forecasting tool developed by Facebook. It is designed for forecasting time series data with daily observations and seasonal patterns. Prophet is
particularly well-suited for data with missing values, outliers, and irregular trends. It
uses an additive model that combines trend, seasonality, and holiday effects to make
accurate forecasts. Prophet is easy to use, with a simple interface that requires min-
imal configuration and tuning. It is available in Python and R, making it accessible
to a wide range of users.

6.2.2 MERN stack

The MERN stack is a collection of technologies used to build web applications. It stands for MongoDB, Express.js, React, and Node.js. MongoDB is a NoSQL
database that stores data in a flexible, JSON-like format. Express.js is a lightweight
web application framework for Node.js, which is a JavaScript runtime for building
scalable network applications. React is a JavaScript library for building user inter-
faces, and it allows developers to create reusable components for building interactive
web applications. The MERN stack is popular for building full-stack web applica-
tions, as it provides a complete end-to-end solution for web development.

6.2.3 React Native

React Native is a framework for building mobile applications using JavaScript and
React. It allows developers to create native mobile apps for iOS and Android using
a single codebase. React Native provides a set of components that map to native
UI components, allowing developers to create a seamless user experience on both
platforms. React Native also supports hot-reloading, which means that changes to
the code can be instantly reflected in the app without requiring a rebuild. React
Native is popular for its ease of use, performance, and ability to create cross-platform
mobile apps.

6.2.4 Interpolation

Interpolation is a mathematical technique used to estimate unknown values within a range of known data points. It involves constructing a function or a polynomial
that passes through the given data points, and then using this function to estimate the
values at other points within the range. Interpolation is commonly used in computer
graphics, image processing, and numerical analysis to estimate values between dis-
crete data points. There are several types of interpolation methods, including linear
interpolation, polynomial interpolation, and spline interpolation. Linear interpola-
tion is the simplest form of interpolation, where a straight line is drawn between
two known data points to estimate the value at an unknown point. Polynomial in-
terpolation involves fitting a polynomial of a given degree through the known data
points, while spline interpolation uses piecewise polynomials to fit the data points
and provide a smooth curve.
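As a concrete example of the simplest case, linear interpolation with NumPy; the survey years and biomass values below are invented for illustration.

```python
import numpy as np

# Known biomass measurements (metric tons) at surveyed years.
years = np.array([2016, 2018, 2020])
biomass = np.array([400.0, 460.0, 500.0])

# Linear interpolation: estimate the unsurveyed year 2019 by drawing a
# straight line between the 2018 and 2020 measurements.
estimate = np.interp(2019, years, biomass)
print(estimate)  # 480.0
```

2019 lies halfway between 2018 (460 t) and 2020 (500 t), so the straight-line estimate is the midpoint, 480 t.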

6.2.5 KNN

K-nearest neighbors (KNN) is a machine learning algorithm used for classification and regression tasks. It is a type of instance-based learning, where the model stores the training data and uses it to make predictions for new data points. In the KNN algorithm, a new data point is classified or regressed based on the majority vote or the average value of its k nearest neighbors in the training data. The distance between data points is usually measured using the Euclidean distance, but other distance metrics can also be used. The value of k is a hyperparameter that determines the number of neighbors to consider when making predictions. A small value of k results in a flexible model that can capture local patterns in the data but may be sensitive to noise. A large value of k results in a smoother model that is less sensitive to noise but may miss local patterns in the data. The KNN algorithm is simple to implement and requires no explicit training phase, but it can be computationally expensive for large datasets and may not perform well for high-dimensional data.
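A brief sketch of KNN regression with scikit-learn, here estimating biomass from coordinates; the training points and the use of `KNeighborsRegressor` are illustrative assumptions, not the project's final model.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic training data: (longitude, latitude) -> biomass (metric tons).
X = np.array([[73.0, 19.0], [73.1, 19.1], [75.0, 21.0], [75.1, 21.1]])
y = np.array([400.0, 420.0, 100.0, 120.0])

# k=2: each prediction averages the 2 nearest neighbors (Euclidean distance).
knn = KNeighborsRegressor(n_neighbors=2)
knn.fit(X, y)

# A query point near the first cluster averages its two nearest neighbors.
pred = knn.predict([[73.05, 19.05]])
print(pred[0])  # 410.0
```

The query point is equidistant from the first two training points, so the prediction is their mean, (400 + 420) / 2 = 410.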

6.2.6 TensorFlow

TensorFlow is an open-source machine learning framework developed by Google. It’s designed for building and training machine learning models, including deep neural networks. TensorFlow is known for its flexibility and scalability, making it a
popular choice for a wide range of applications, from research to production-level
AI systems. It provides a comprehensive ecosystem of tools and libraries for ma-
chine learning, deep learning, and data manipulation, making it a powerful platform
for AI development.

6.2.7 PyCharm

PyCharm is a popular integrated development environment (IDE) for Python programming. Developed by JetBrains, it offers a comprehensive set of features and
tools to streamline Python development. PyCharm provides code highlighting, auto-
completion, debugging, and project management features, making it a favorite among
Python developers. It comes in both a free community edition and a paid professional
edition, offering a powerful and user-friendly environment for Python development.

6.2.8 Visual Studio Code

Visual Studio Code (VSCode) is a highly popular, open-source code editor devel-
oped by Microsoft. It is designed to be lightweight, efficient, and capable of sup-
porting a wide array of programming languages including JavaScript, TypeScript,
Python, Java, C++, C#, and more. VSCode boasts an array of key features that cater
to developers’ needs, including an integrated terminal, rich extension support, Git
integration, debugging tools, remote development capabilities, and real-time code
sharing. The integrated terminal in VSCode enables developers to run shell com-
mands and scripts directly from the editor, while its vast ecosystem of extensions
allows developers to enhance its functionality and customize its appearance. Git
integration facilitates version control and collaboration, and the built-in debugger
assists in identifying and rectifying issues in the code. VSCode also supports remote
development, allowing developers to work on remote machines, containers, and vir-
tual machines, and its Live Share extension enables real-time collaborative coding.

Overall, Visual Studio Code is a versatile and powerful code editor that offers an ef-
ficient and productive development environment for a wide variety of programming
languages and frameworks.

6.3 PERFORMANCE PARAMETERS

• Accuracy: Accuracy is the proportion of correct predictions among all predictions made.
Accuracy = (TP + TN) / (TP + TN + FP + FN)

• Precision: Precision is the proportion of true positive predictions among all positive predictions made.
Precision = TP / (TP + FP)

• Recall: Recall is the proportion of true positive predictions among all actual positive instances.
Recall = TP / (TP + FN)

• Error Rate: Error Rate is the proportion of incorrect predictions among all predictions made.
Error Rate = (FP + FN) / (TP + TN + FP + FN)
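The four metrics can be computed directly from confusion-matrix counts; the counts used below are illustrative, not measured results.

```python
# Computing the Section 6.3 metrics from confusion-matrix counts.
def metrics(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    return {
        "accuracy":   (tp + tn) / total,
        "precision":  tp / (tp + fp),
        "recall":     tp / (tp + fn),
        "error_rate": (fp + fn) / total,
    }

# Illustrative counts: accuracy 0.85, precision ~0.889, recall 0.8, error rate 0.15.
m = metrics(tp=80, tn=90, fp=10, fn=20)
print(m)
```

Note that accuracy and error rate always sum to 1, since every prediction is either correct or incorrect.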

6.4 EFFICIENCY ISSUES

1. Dataset Size: The size of the dataset may vary from one dataset to another, so the model may run into under-fitting or over-fitting. Defining the minimum and maximum size of the dataset is therefore important.

2. Storage: As the volume of real-world data grows, storage issues may arise when the model is actually deployed for real-world applications.

3. Scalability: The system must be ensured to keep working properly as the number of users and the rate at which the system is used increase.

4. Maintenance: When implemented in real-world applications, maintenance must be carried out as per the needs of the users, such as updating the model and upgrading the versions of the frameworks used.

5. Resource Constraints: Working within limitations like computing power, memory, and budget.

6. Model Complexity: Complex models may require more training time, mem-
ory, and processing power. Simplifying model architectures or using model
compression techniques can improve efficiency.

7. Inference Speed: For real-time applications, slow inference can be problematic. Efficient model architectures, quantization, and hardware acceleration can help improve inference speed.

8. Energy Consumption: High computational demands can lead to increased energy consumption. Efficient models are crucial for mobile and edge devices where battery life is a concern.

CHAPTER 7

SUMMARY AND CONCLUSION


Biomass management is a complex and challenging task. It is important to
manage biomass efficiently in order to ensure a sustainable supply of biomass for
energy and other purposes. Advanced AI and time series analysis can be used to en-
hance biomass management efficiency by providing accurate predictions of biomass
availability.
This project proposes a data design and data structure for enhancing biomass
management efficiency through advanced AI and time series analysis for accurate
prediction. The proposed data structure is scalable, flexible, and efficient. It can
be used to store and retrieve data for a large number of locations and time periods.
The data structure can also be used to train and deploy AI models to predict future
biomass availability.
The project’s success highlights the potential of advanced AI and time se-
ries analysis techniques in enhancing biomass management efficiency. The compre-
hensive approach adopted by the project has optimized resource allocation, reduced
transportation costs, and promoted sustainability in biomass management and bio-
fuel production. The project’s outcomes will serve as a valuable guide for future
research and development in the field of sustainable energy and resource manage-
ment.
The project has successfully completed its research phase and is currently
in the deployment phase. The developed data design and data structure have been
implemented, and the AI models are currently undergoing training. Additionally,
the application GUI is being designed to provide a user-friendly interface for model
deployment and data visualization.

REFERENCES
[1] J. Gao, “Time-series prediction research based on combined Prophet-LSTM models,” pp. 143–147, 2022.

[2] C. K. K, S. D. Barma, N. Bhat, R. Girisha, and K. Gouda, “Evaluation of ARIMA, Facebook Prophet and a boosting algorithm framework for monthly precipitation prediction of a semi-arid district of North Karnataka, India,” pp. 1–5, 2022.

[3] M. Daraghmeh, A. Agarwal, R. Manzano, and M. Zaman, “Time series forecasting using Facebook Prophet for cloud resource management,” pp. 1–6, 2021.

[4] D. Ageng, C.-Y. Huang, and R.-G. Cheng, “A short-term household load forecasting framework using LSTM and data preparation,” IEEE Access, vol. 9, pp. 167911–167919, 2021.

[5] G. Dudek, P. Pełka, and S. Smyl, “A hybrid residual dilated LSTM and exponential smoothing model for midterm electric load forecasting,” IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 7, pp. 2879–2891, 2022.

ANNEXURE A

PLAGIARISM REPORT
ANNEXURE B

PAPER PUBLISHED (IF ANY)


ANNEXURE C

SPONSORSHIP DETAIL (IF ANY)
