Enhancing Chemical Risk Prediction With ConvLSTM and Machine Learning Application to Environmental Impact
ABSTRACT
MODULES
SYSTEM SPECIFICATION
CONCLUSION
FUTURE ENHANCEMENT
INTRODUCTION
• Air pollution is caused by harmful substances like toxic gases and particulate matter released into the
atmosphere, mainly from industrial activities and transportation.
• Novel chemical industries in urban areas have increased the release of toxic emissions, harming air quality and
threatening public health.
• Traditional methods for predicting air pollution, such as time series analysis, often struggle to capture the
complex, non-linear patterns in air quality data.
• Accurate air quality forecasting is crucial for timely interventions, but existing solutions fail to provide reliable
predictions in dynamic environments.
• This project proposes a ConvLSTM-based deep learning framework that integrates spatial and temporal
features to forecast air quality, offering real-time monitoring, alerts, and location-based pollution visualization
for better regulatory action.
ABSTRACT
Air pollution from toxic gas emissions of novel chemical industries poses
a significant threat to public health and urban air quality. Traditional
prediction methods often fail to capture the complex non-linear patterns of
air pollution, such as PM2.5 concentrations. This project introduces an
advanced deep learning framework using Convolutional Long Short-Term
Memory (ConvLSTM) networks to accurately forecast air quality by
integrating both spatial and temporal data features. The system enables
real-time monitoring of emissions, with automated alerts to regulatory
bodies and data analysis through a cloud-based platform. Additionally, it
offers intuitive pollution visualization through the Google Maps API, helping
decision-makers take informed actions to protect public health and the
environment.
MODULES
1. Air Quality Forecaster Web App
3. Data Pre-processing
4. Feature Selection
7. AQI Forecasting
The first step involves acquiring training and testing datasets to predict
real-time air pollutant concentrations. These concentrations depend on
meteorological data, traffic, and workday/holiday information.
We collect real-time data for six pollutants (PM2.5, PM10, NO2, SO2, CO,
O3) in Delhi and Agra from the CPCB website every hour. The XML data
is parsed, combined with other data in CSV format, and used to train the
model. Next-hour meteorological and traffic forecasts help predict future
concentrations.
The dataset is split into 80% (28,052 hours) for training and 20% (7,012
hours) for testing to evaluate model accuracy.
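The 80/20 split above can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the split is chronological (earliest hours for training, latest for testing, with no shuffling) and that the training count is rounded up to a whole hour. The function name and the placeholder records are hypothetical.

```python
import math


def chronological_split(records, train_fraction=0.8):
    """Split time-ordered records into (train, test) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    # Assumption: round the training count up to a whole hour.
    n_train = math.ceil(len(records) * train_fraction)
    return records[:n_train], records[n_train:]


# Placeholder data: one integer per hourly record, in time order.
hourly_records = list(range(28052 + 7012))
train, test = chronological_split(hourly_records)
print(len(train), len(test))
```

Keeping the rows in time order means the test hours all come after the training hours, which avoids evaluating the model on data older than what it was trained on.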
3. Data Pre-processing
Ensuring data quality and effective representation is crucial for a forecasting model’s
performance and generalizability. Key preprocessing steps include:
Imputation of Missing Data: More than 50% of the SPM values were missing, so that field
was dropped. Missing values in the other fields were imputed using second-order
polynomial estimation, which outperformed mean substitution and linear interpolation.
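One way to read "second-order polynomial estimation" is sketched below. It is an assumption, not the project's confirmed method: a single quadratic is fitted to the known values of a series with NumPy and evaluated at the missing positions. The function name is hypothetical, and a real pipeline might fit the polynomial locally around each gap instead of over the whole series.

```python
import numpy as np


def impute_quadratic(values):
    """Fill NaN entries using a quadratic fitted to the observed entries."""
    y = np.asarray(values, dtype=float)
    x = np.arange(len(y), dtype=float)   # position in the series stands in for time
    known = ~np.isnan(y)
    if known.sum() < 3:
        raise ValueError("need at least three observed values to fit a quadratic")
    coeffs = np.polyfit(x[known], y[known], deg=2)   # least-squares fit, degree 2
    filled = y.copy()
    filled[~known] = np.polyval(coeffs, x[~known])   # evaluate only at the gaps
    return filled


# Illustrative series following x**2 + 1 with two gaps.
series = [1.0, 2.0, np.nan, 10.0, np.nan, 26.0]
filled_series = impute_quadratic(series)
print(filled_series)
```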
Preprocessing Steps:
Formatting: Unclear or irrelevant attributes were removed, such as "Thal" from the UCI
dataset.
Cleaning: Incomplete and redundant entries were removed.
Sampling: Applied to improve algorithm efficiency.
Outlier Handling: Irregular pollutant data (Aug–Oct 2020) was removed. A power
transformation was then applied to the data to make it more robust to noise.
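The outlier-handling step can be illustrated with the sketch below. The slides do not name the power transformation used, so this assumes a Yeo-Johnson transform restricted to non-negative inputs with a fixed exponent `lam`; scikit-learn's `PowerTransformer` would instead estimate the exponent from the data. Function names, timestamps, and readings are all hypothetical.

```python
import numpy as np


def drop_window(timestamps, values, start, end):
    """Remove readings whose timestamp falls in the half-open range [start, end)."""
    timestamps = np.asarray(timestamps, dtype="datetime64[h]")
    values = np.asarray(values, dtype=float)
    keep = (timestamps < np.datetime64(start)) | (timestamps >= np.datetime64(end))
    return timestamps[keep], values[keep]


def yeo_johnson_nonnegative(values, lam=0.5):
    """Yeo-Johnson power transform for non-negative inputs with a fixed exponent."""
    values = np.asarray(values, dtype=float)
    if np.any(values < 0):
        raise ValueError("this sketch only handles non-negative concentrations")
    if lam == 0:
        return np.log1p(values)
    return ((values + 1.0) ** lam - 1.0) / lam


# Four made-up hourly readings straddling the start of August 2020.
stamps = np.arange("2020-07-31T22", "2020-08-01T02", dtype="datetime64[h]")
readings = [40.0, 55.0, 900.0, 870.0]
kept_stamps, kept_values = drop_window(stamps, readings, "2020-08-01", "2020-11-01")
transformed = yeo_johnson_nonnegative(kept_values)
print(kept_stamps, transformed)
```

The transform compresses large concentrations more than small ones, which is why it reduces the influence of noisy spikes.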
4. Feature Selection
Air Quality Features: Includes pollutants like NOx, NO, NO2, PM2.5, PM10, SO2, CO,
NH3, O3, Benzene, Toluene, and Xylene.
ConvLSTM merges a CNN’s spatial feature extraction with an LSTM’s temporal learning,
making it well suited to spatio-temporal data such as air pollution trends. CNN-LSTM
(LRCN) first extracts features using CNN layers, then processes them with an LSTM for
AQI prediction.
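For reference, the commonly used ConvLSTM cell replaces the matrix products of a standard LSTM with convolutions, so the gates keep the spatial layout of the input. The equations below follow the widely cited formulation with peephole terms; the slides do not state which variant the project uses, so treat this as background rather than the project's exact model. Here $*$ is convolution, $\circ$ is the element-wise product, $X_t$ is the input grid at time $t$, and $H_t$, $C_t$ are the hidden and cell states.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
```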
Advantages:
Better Coverage – Utilizes data from adjacent monitoring stations to predict pollution in
uncovered areas.
Future Potential – Though applied in an Indian city due to data limitations, the model
can be expanded to other locations and time periods.
6. Air Quality Index Prediction
1. The IND-AQI categorizes air quality based on pollutant concentration and its health impact.
Moderate (51-100, Yellow) – Sensitive groups should limit outdoor exercise; keep windows
closed.
Unhealthy for Sensitive Groups (101-150, Orange) – Risk of irritation and respiratory issues;
avoid outdoor exertion.
Unhealthy (151-200, Red) – Increased heart/lung risks; everyone should limit outdoor exposure.
Very Unhealthy (201-300, Purple) – Severe health effects; sensitive groups should stay indoors.
Hazardous (301-500+, Maroon) – High risk for all; avoid outdoor activities, use air purifiers, and
wear masks.
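The category bands listed above can be turned into a simple lookup, sketched here. The band limits, names, and colours for 51 and above are taken from the list; the "Good (0-50, Green)" band is an assumption added to cover low values, since it does not appear in the list. The function and table names are hypothetical.

```python
AQI_BANDS = [
    # (upper limit, category, colour); bands are contiguous and ordered.
    (50, "Good", "Green"),  # assumption: this band is not in the list above
    (100, "Moderate", "Yellow"),
    (150, "Unhealthy for Sensitive Groups", "Orange"),
    (200, "Unhealthy", "Red"),
    (300, "Very Unhealthy", "Purple"),
    (None, "Hazardous", "Maroon"),  # 301 and above, open-ended
]


def aqi_category(aqi):
    """Return (category, colour) for a non-negative AQI value."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    value = int(round(aqi))
    for upper, name, colour in AQI_BANDS:
        if upper is None or value <= upper:
            return name, colour


print(aqi_category(75))
print(aqi_category(320))
```

A forecasted AQI value can be passed through this lookup to choose the marker colour on the map and the advisory text shown to users.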
8. AQI Google Map Visualization
HOME PAGE
ADMIN PAGE
LITERATURE SURVEY-2
Training Phase
Data Analysis
PRE-PROCESSING DATA
MISSING DATA
CLUSTERING
YEARLY DATA
SYSTEM SPECIFICATION
• Hardware Specification
i. Processors : Intel® Core™ i5 processor or higher
ii. RAM : 8 GB or more
iii. Disk Space : 320 GB or more
iv. Operating Systems : Windows® 10/11
• Software Specification
i. Language : Python 3.7.4 (64-bit or 32-bit)
ii. Design : HTML, CSS, Bootstrap
iii. IDE : IDLE, PyCharm, or VS Code
iv. Web Framework : Flask 1.1.1
v. Database : MySQL 5
vi. Web Server : Wampserver 2i (for local deployment)
vii. API : Google Maps API for pollution visualization
viii. Packages : TensorFlow, Keras, Pandas, Scikit-learn, Matplotlib/Seaborn
CONCLUSION
In conclusion, the proposed system for predicting and localizing toxic emissions
from novel chemical industries offers an innovative and integrated solution for
addressing urban air quality challenges. The development focuses on using
advanced deep learning techniques, particularly the ConvLSTM model, to predict air
quality accurately by capturing both spatial and temporal pollution patterns. The
system's ability to monitor toxic gas emissions in real time, combined with
cloud-based data analysis and Google Maps API integration, ensures proactive management
and informed decision-making. The automated alert mechanism for the pollution
control board enhances regulatory response times, helping to mitigate the health
risks associated with industrial emissions. Additionally, the scalability and
efficiency of the system make it adaptable to various urban settings, allowing for
broad implementation. This project successfully combines machine learning,
real-time monitoring, and interactive visualization, providing a comprehensive solution
for urban air quality management. Its impact on public health and environmental
sustainability is significant, as it supports timely interventions and long-term
improvements in air quality. As the system continues to evolve, it has the potential
to contribute significantly to urban planning and environmental policy, creating
healthier, more sustainable cities.
FUTURE ENHANCEMENT