TIJER || ISSN 2349-9249 || © June 2023 Volume 10, Issue 6 || www.tijer.org

A Novel Machine Learning Technique for Air Pollution Detection
Sivamurugan S 1, Vijayashree K 2, Nithya P 3, Vedapriya N 4
1 Assistant Professor, 2 Student, 3 Student, 4 Student
Department of Artificial Intelligence and Data Science,
Sri Sairam Engineering College, West Tambaram, Chennai, India
Abstract
The objective of the air quality detection project is to monitor and measure the quality of air in a particular area using sensors and other data sources. The project aims to classify and track the level of pollutants in the air, such as particulate matter, carbon monoxide, and ozone, and to provide real-time data on air quality to the public. It can help individuals and organisations make decisions about their exposure to air pollution, and can also inform policy decisions related to air quality improvement. As air pollution in cities increases day by day, it has surpassed the air quality index value set by the government, causing heavy damage to human life as well as to the environment around us. This proposal mainly focuses on identifying which air pollutant will have the greatest impact on human life in the future, based on previously collected data. Machine learning algorithms such as Linear Regression, Support Vector Machine, Bagging and Random Forest are trained on the collected dataset to find which gives the best accuracy. A graph of the accuracy level of all four algorithms is plotted in Microsoft Excel. Another graph is plotted to identify the air pollutant likely to create the most harmful situations in the future. This analysis helps us to keep people informed in order to protect public health, maintain environmental sustainability and create awareness about global climatic changes.

Keywords:
Machine Learning Algorithms, Air Pollutant, Pollution, Weka, Excel, Graph, Accuracy, Human Health, Sustainable Environment.

1.Introduction
In this modern era of advancement and urbanization, one of the most crucial problems in society is air pollution. Air pollution is caused by any physical, chemical or biological agent that changes the natural characteristics of the atmosphere. It is a pressing global issue that poses a significant risk to human health, ecosystems and the overall well-being of the planet. Household combustion devices, automobile emissions, industries and forest fires are the most common sources of air pollution; they release Carbon Monoxide, Carbon Dioxide, Nitrogen Dioxide, Sulphur Oxides, Chlorofluorocarbons, Particulate Matter and other air pollutants into the environment. WHO data show that almost 99% of people breathe air that exceeds the WHO guideline limits and are exposed to large amounts of pollutants. Low- and middle-income countries are found to be affected the most. Over several billion years of chemical and biological evolution, the composition of the earth’s atmosphere has changed. Ambient air quality standards define the permissible exposure of all living and nonliving things for 24 hours per day, 7 days per week.

Air pollution causes significant damage to both humans and the environment, so monitoring pollutant levels is crucial. We can do this with the help of Machine Learning models. Machine Learning is a subset of Artificial Intelligence in which the computer learns to build models from training data. It can inspect a wide range of data and recognize particular trends and patterns. In other words, Machine Learning gives a computer program the ability to perform a task without being explicitly programmed for it, using statistical and advanced mathematical algorithms. A machine can be considered to be learning if it gains experience from doing certain tasks and improves its performance on similar tasks in the future. There are essentially three types of Machine Learning: Supervised learning, Unsupervised learning and Reinforcement learning. The four main Machine Learning algorithms used for training the dataset in this project are Linear Regression, Support Vector Machine (SVM), Bagging and Random Forest. The pollutant levels in a location can be collected with the help of sensors, and the resulting dataset can be used to train the Machine Learning model.
2.Literature Survey
In [1] the authors report that Machine Learning models show very good accuracy and efficiency in training. They argue that only Machine Learning models can handle and train on the rigorous datasets collected with advanced techniques and sensors. The KNN algorithm showed an accuracy of 99.1071% in their air pollution prediction.
In [2] the authors conclude that the concentration of air pollutants in ambient air is governed by various parameters such as wind speed, wind direction, relative humidity and temperature. The Air Quality Index (AQI) is used to measure the quality of air. The proposed work is a supervised learning approach using various algorithms such as LR, SVM, DT and RF. The results show that the AQI predictions obtained through RF are promising, and these are analysed alongside the results.
In [3] the authors intend to develop models based on past data and use them to make future decisions: the future is evaluated or forecasted in accordance with the past. A time series adds a time-order dependence among observations. This dependency provides both a knowledge source and a knowledge barrier. According to the authors of this review, the majority of research has concentrated on evaluating or forecasting the AQI and pollutant concentration levels, which gives a precise idea of the AQI. Several researchers opt for Artificial Neural Networks (ANN), the ARIMA model, Linear Regression and Logistic Regression for forecasting the AQI and air pollutant concentrations. When predicting the AQI or the subsequent concentration level of several pollutants, future work may need to take additional attributes into account, including meteorological parameters and air contaminants. As the data changes over particular periods of time, it is also possible to use real-time data analysis through the cloud to get better outcomes and increased performance.

TIJER2306218 TIJER - INTERNATIONAL RESEARCH JOURNAL www.tijer.org 839


In [4] the authors examined the application of machine learning algorithms in the forecasting and prediction of air pollution. The work also analysed the spread of pollutants, their effects and their level of concentration in places away from the source. The dispersion module used the Gaussian air dispersion model, implemented in Python in the Spyder IDE. For air pollution forecasting and prediction, different machine learning algorithms were applied to the data and compared: Random Forest, Multi-layer Perceptron, K-Nearest Neighbour, Support Vector Regression and Multi-linear Regression. The results confirm that the Multi-layer Perceptron algorithm gave the least mean squared error compared to the other machine learning algorithms. The authors suggest that future work could compare several air dispersion models to predict and analyse the spread of air pollution.
In [5] the authors present a spatial-temporal predictive model to overcome the limitation, for which they conducted several experiments using different models. The data set provides information on a city's NO2, O3 and SO2 levels over 10 years. Relating the pollutants to their geographical locations turns the problem into a classification issue. In order to predict the continuous values, they used SVM, SVR, LSTM, the ARIMA model and k-means clustering, and determined a low cost-complexity combination of models.
In [6] the authors proposed a system for monitoring air pollutants that combines IoT with a machine learning algorithm, a Recurrent Neural Network, specifically Long Short-Term Memory (LSTM). They monitor the air quality with the help of IoT devices. The data used in this work is collected from a DHT11 sensor, which generates real-time digital temperature and humidity readings. The system uses air sensors to detect and transmit this data to a microcontroller, which then stores the data on a web server. LSTM is implemented for prediction.
In [7] the authors conducted research to determine the air quality index. They note that various researchers collect data sets from the Kaggle repository and air quality monitoring sites and divide them into training and testing sets, on which various machine learning algorithms are compared irrespective of pollutants. The algorithms are Linear Regression, Decision Tree, Random Forest, Artificial Neural Network and Support Vector Machine. The paper compares the results obtained by various researchers with various algorithms, which used meteorological data such as temperature, wind speed and humidity to predict the upcoming pollutant level accurately.
In [8] the authors developed machine learning techniques to prevent air pollution. They discussed the use of machine learning algorithms for pollution estimation and the Indian air quality index (AQI). They noted that the Decision Tree algorithm gave the best result among all the algorithms, with an overall accuracy of 99.8%. The number of model parameters was reduced and the output optimized with structure regularization, which in turn alleviated model complexity.
In [9] the authors predict the concentration of two pollutants, NOx and CO, in industrial sites using a nonlinear autoregressive (NARX) model based on an Artificial Neural Network (ANN). The database used to train the neural network consists of historical time series of meteorological variables (wind speed, wind direction, temperature and relative humidity) and concentrations of pollutants at the petrochemical plant of the Skikda site. The estimation performance is determined using the Root Mean Square Error (RMSE) and Mean Absolute Error (MAE). The results show the importance of the meteorological variable set for the prediction of pollutant concentrations and the efficiency of the neural network.
3.Machine Learning Algorithms used to train the collected Dataset
After preprocessing the collected dataset, several machine learning algorithms are applied to it. Linear Regression, Support Vector Machine (SVM), Bagging and Random Forest are used to train on the data in our project. We use the Weka software to train the Machine Learning algorithms.
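To make the idea behind Bagging (bootstrap aggregating) concrete, the following is a minimal Python sketch, not the Weka configuration used in this project. It assumes a simple least-squares line as the base learner and uses placeholder toy data that is not taken from the Chennai dataset; the function names `fit_line`, `bagged_lines` and `predict` are hypothetical.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # Every x in this resample is identical: fall back to a constant model.
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def bagged_lines(xs, ys, n_models=25, seed=0):
    """Fit one base model per bootstrap resample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def predict(models, x):
    """Bagging prediction: the average of all base-model predictions."""
    return mean(a + b * x for a, b in models)


# Placeholder toy data (NOT from the paper's dataset): y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2 * x + 1 for x in xs]
models = bagged_lines(xs, ys)
print(len(models), predict(models, 6.0))
```

Random Forest follows the same resample-and-average pattern, but uses decision trees as base learners and additionally restricts each split to a random subset of the features.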
In Linear Regression, Nitric Oxide (NO) shows a correlation coefficient of 0.7057, Mean Absolute Error (MAE) of 2.484, Root Mean Squared Error (RMSE) of 3.9591, Relative Absolute Error (RAE) of 72.8638% and Root Relative Squared Error (RRSE) of 70.6707%. Nitrogen Dioxide (NO2) shows a correlation coefficient of 0.7065, MAE of 3.7507, RMSE of 6.19, RAE of 74.0614% and RRSE of 70.6159%. Carbon Monoxide (CO) shows a correlation coefficient of 0.1758, MAE of 0.9449, RMSE of 1.8374, RAE of 105.766% and RRSE of 98.4775%. Sulphur Dioxide (SO2) shows a correlation coefficient of 0.1801, MAE of 3.508, RMSE of 5.0338, RAE of 98.153% and RRSE of 98.3077%. Ozone (O3) shows a correlation coefficient of 0.0941, MAE of 12.2493, RMSE of 15.1453, RAE of 100.8507% and RRSE of 99.4526%.
Under Support Vector Machine (SVM), Nitric Oxide (NO) shows a correlation coefficient of 0.7169, Mean Absolute Error (MAE) of 2.3729, Root Mean Squared Error (RMSE) of 3.9709, Relative Absolute Error (RAE) of 69.6057% and Root Relative Squared Error (RRSE) of 70.8818%. Nitrogen Dioxide (NO2) shows a correlation coefficient of 0.7056, MAE of 3.6413, RMSE of 6.3438, RAE of 71.9003% and RRSE of 72.3701%. Carbon Monoxide (CO) shows a correlation coefficient of 0.1272, MAE of 0.7592, RMSE of 1.8955, RAE of 84.9777% and RRSE of 101.587%. Sulphur Dioxide (SO2) shows a correlation coefficient of 0.1664, MAE of 3.1706, RMSE of 5.2839, RAE of 88.7125% and RRSE of 103.1914%. Ozone (O3) shows a correlation coefficient of -0.1613, MAE of 12.1523, RMSE of 15.5177, RAE of 100.0525% and RRSE of 101.8978%.
In Bagging, Nitric Oxide (NO) shows a correlation coefficient of 0.7699, Mean Absolute Error (MAE) of 2.2347, Root Mean Squared Error (RMSE) of 3.5658, Relative Absolute Error (RAE) of 65.5517% and Root Relative Squared Error (RRSE) of 63.6497%. Nitrogen Dioxide (NO2) shows a correlation coefficient of 0.7041, MAE of 3.5376, RMSE of 6.2114, RAE of 69.8529% and RRSE of 70.8598%. Carbon Monoxide (CO) shows a correlation coefficient of 0.3985, MAE of 0.8359, RMSE of 1.7193, RAE of 92.4473% and RRSE of 92.1459%. Sulphur Dioxide (SO2) shows a correlation coefficient of 0.2524, MAE of 3.376, RMSE of 4.9515, RAE of 94.461% and RRSE of 96.7001%. Ozone (O3) shows a correlation coefficient of 0.3709, MAE of 11.1896, RMSE of 14.1306, RAE of 92.1257% and RRSE of 92.7893%.

Under Random Forest, Nitric Oxide (NO) shows a correlation coefficient of 0.7776, Mean Absolute Error (MAE) of 2.1459, Root Mean Squared Error (RMSE) of 3.5375, Relative Absolute Error (RAE) of 62.9452% and Root Relative Squared Error (RRSE) of 63.145%. Nitrogen Dioxide (NO2) shows a correlation coefficient of 0.7086, MAE of 3.3567, RMSE of 6.1755, RAE of 66.2814% and RRSE of 70.4507%. Carbon Monoxide (CO) shows a correlation coefficient of 0.4544, MAE of 0.7981, RMSE of 1.6642, RAE of 89.3303% and RRSE of 89.1936%. Sulphur Dioxide (SO2) shows a correlation coefficient of 0.2464, MAE of 3.2566, RMSE of 5.0585, RAE of 91.1204% and RRSE of 98.7898%. Ozone (O3) shows a correlation coefficient of 0.429, MAE of 10.865, RMSE of 13.8247, RAE of 89.4537% and RRSE of 90.7809%.
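
The five measures reported in this section can be computed from a list of actual and predicted values as in the sketch below. It follows the standard textbook definitions, in which RAE and RRSE compare the model's error with that of a baseline that always predicts the mean; here the mean of the supplied actual values is assumed as that baseline, and Weka's own implementation may differ in detail. The function name `regression_metrics` and the sample numbers are illustrative only and are not taken from the dataset.

```python
from math import sqrt
from statistics import mean


def regression_metrics(actual, predicted):
    """Return correlation, MAE, RMSE, RAE (%) and RRSE (%) for two equal-length lists."""
    n = len(actual)
    a_bar, p_bar = mean(actual), mean(predicted)

    abs_err = sum(abs(p - a) for a, p in zip(actual, predicted))
    sq_err = sum((p - a) ** 2 for a, p in zip(actual, predicted))

    # Error of the baseline that always predicts the mean of the actual values.
    base_abs = sum(abs(a - a_bar) for a in actual)
    base_sq = sum((a - a_bar) ** 2 for a in actual)

    cov = sum((a - a_bar) * (p - p_bar) for a, p in zip(actual, predicted))
    var_p = sum((p - p_bar) ** 2 for p in predicted)

    return {
        "correlation": cov / sqrt(base_sq * var_p),
        "MAE": abs_err / n,
        "RMSE": sqrt(sq_err / n),
        "RAE_percent": 100 * abs_err / base_abs,
        "RRSE_percent": 100 * sqrt(sq_err / base_sq),
    }


# Illustrative values only (not from the Chennai dataset).
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 3.0, 3.0])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these definitions, an RAE or RRSE above 100% means the model did worse than simply predicting the mean, which is how the CO, SO2 and O3 values above 100% for Linear Regression and SVM can be read.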

4.Analysis based on Trained Machine Learning Algorithm

Analysing the above results, in the majority of cases Random Forest has the lowest error in Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Relative Absolute Error (RAE) and Root Relative Squared Error (RRSE). Lower error corresponds to higher accuracy. As a result, the Random Forest algorithm has a higher accuracy than Linear Regression, Bagging and Support Vector Machine (SVM). Therefore, we conclude that Random Forest is the best of the four algorithms on the collected dataset.
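
The comparison can be checked directly against the numbers reported in Section 3. The short Python sketch below copies the MAE, RMSE, RAE and RRSE values for each algorithm and pollutant and counts how often each algorithm has the lowest error; the names `RESULTS` and `count_wins` are hypothetical helpers introduced for this illustration.

```python
from collections import Counter

POLLUTANTS = ("NO", "NO2", "CO", "SO2", "O3")
METRICS = ("MAE", "RMSE", "RAE", "RRSE")

# (MAE, RMSE, RAE %, RRSE %) per pollutant, copied from Section 3.
RESULTS = {
    "Linear Regression": {
        "NO": (2.484, 3.9591, 72.8638, 70.6707),
        "NO2": (3.7507, 6.19, 74.0614, 70.6159),
        "CO": (0.9449, 1.8374, 105.766, 98.4775),
        "SO2": (3.508, 5.0338, 98.153, 98.3077),
        "O3": (12.2493, 15.1453, 100.8507, 99.4526),
    },
    "SVM": {
        "NO": (2.3729, 3.9709, 69.6057, 70.8818),
        "NO2": (3.6413, 6.3438, 71.9003, 72.3701),
        "CO": (0.7592, 1.8955, 84.9777, 101.587),
        "SO2": (3.1706, 5.2839, 88.7125, 103.1914),
        "O3": (12.1523, 15.5177, 100.0525, 101.8978),
    },
    "Bagging": {
        "NO": (2.2347, 3.5658, 65.5517, 63.6497),
        "NO2": (3.5376, 6.2114, 69.8529, 70.8598),
        "CO": (0.8359, 1.7193, 92.4473, 92.1459),
        "SO2": (3.376, 4.9515, 94.461, 96.7001),
        "O3": (11.1896, 14.1306, 92.1257, 92.7893),
    },
    "Random Forest": {
        "NO": (2.1459, 3.5375, 62.9452, 63.145),
        "NO2": (3.3567, 6.1755, 66.2814, 70.4507),
        "CO": (0.7981, 1.6642, 89.3303, 89.1936),
        "SO2": (3.2566, 5.0585, 91.1204, 98.7898),
        "O3": (10.865, 13.8247, 89.4537, 90.7809),
    },
}


def count_wins(results):
    """Count, per algorithm, the pollutant/metric pairs where it has the lowest error."""
    wins = Counter()
    for pollutant in POLLUTANTS:
        for i in range(len(METRICS)):
            best = min(results, key=lambda algo: results[algo][pollutant][i])
            wins[best] += 1
    return wins


wins = count_wins(RESULTS)
for algo, n in wins.most_common():
    print(f"{algo}: lowest error in {n} of {len(POLLUTANTS) * len(METRICS)} cases")
```

On the reported values, Random Forest has the lowest error in 14 of the 20 pollutant and metric combinations, SVM in 4 (MAE and RAE for CO and SO2) and Bagging in 2 (RMSE and RRSE for SO2), which is consistent with the "majority of cases" statement above.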

5.Role of Microsoft Excel in Analysing the Air Pollutant data:


The dataset, collected from Kaggle, covers air pollutants monitored in Chennai from 2015 to 2020. It contains air pollutants such as Nitric Oxide (NO), Nitrogen Dioxide (NO2), Carbon Monoxide (CO), Sulphur Dioxide (SO2) and Ozone (O3) generated in Chennai city. The average level of each air pollutant was calculated for every year. The calculated values are tabulated below.

From the tabulations, it is observed that the concentration of ozone (O3) increased every year, except for a fall in 2020. Ground-level O3 is formed from pollutants released by a huge number of vehicles, factories and other industrial activities. Since there was a lockdown in Chennai due to the COVID-19 restrictions, ozone levels fell somewhat in 2020. Ozone can damage the tissues of the respiratory system, causing severe irritation and inflammation. It also reduces the volume of air that the lungs breathe in and causes shortness of breath. Ozone layer depletion, which is distinct from ground-level ozone pollution, increases ultraviolet radiation at Earth’s surface and severely affects human health, causing skin cancer, immune deficiency disorders and eye cataracts. Elevated levels of ozone (O3) lead to reduced agricultural crop and commercial forest yields, reduced growth and survivability of saplings and increased susceptibility to diseases and pests.
From this analysis, our project aims to make both the government and the public aware of the high concentration of ozone and its effects on both human life and environmental sustainability, so that society is able to protect itself from the impacts of ozone and other harmful air pollutants.
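
The yearly averaging in this section was carried out in Microsoft Excel. As an illustration only, an equivalent calculation can be sketched in Python as below. The column name `date`, the ISO date format and the sample rows are assumptions made for this sketch; they are placeholders and do not come from the Kaggle dataset.

```python
from collections import defaultdict
from statistics import mean


def yearly_means(rows, pollutants):
    """Average each pollutant per calendar year, skipping missing readings."""
    buckets = defaultdict(lambda: defaultdict(list))
    for row in rows:
        year = int(row["date"][:4])  # assumes ISO dates such as 2015-01-31
        for name in pollutants:
            value = row.get(name)
            if value not in (None, ""):
                buckets[year][name].append(float(value))
    return {
        year: {name: mean(values) for name, values in by_name.items()}
        for year, by_name in sorted(buckets.items())
    }


# Placeholder rows (hypothetical values, not real measurements).
rows = [
    {"date": "2015-01-01", "O3": "10.0", "CO": "1.0"},
    {"date": "2015-06-01", "O3": "20.0", "CO": ""},
    {"date": "2016-01-01", "O3": "30.0", "CO": "2.0"},
]
averages = yearly_means(rows, ["O3", "CO"])
for year, values in averages.items():
    print(year, values)
```

With real data, the rows could be read with `csv.DictReader` from the downloaded file, and the resulting per-year averages correspond to the tabulated values that the analysis in this section is based on.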

6.Effects and Causes of Air Pollutants

[1] Nitric Oxide (NO) and Nitrogen Dioxide (NO2):


Effects: NO and NO2, collectively known as nitrogen oxides (NOx), contribute to the formation of smog and acid rain. They can irritate the respiratory system, causing symptoms such as coughing and wheezing.
Causes: The primary sources of NO and NO2 are combustion processes such as fossil fuel combustion in power plants and vehicles.

[2] Sulphur Dioxide (SO2):


Effects: SO2 is a major contributor to the formation of acid rain, which can affect ecosystems. Short-term exposure to high levels of SO2 can cause respiratory symptoms, such as difficulty in breathing, and aggravate existing respiratory conditions.
Causes: The combustion of fossil fuels, particularly coal and oil, in power plants, industrial processes and residential heating is the primary source of SO2 emissions. Volcanic eruptions also release significant amounts of SO2 into the atmosphere.

[3] Carbon Monoxide (CO):


Effects: CO is a poisonous gas that interferes with the delivery of oxygen in the body. High levels of CO can lead to headache, dizziness, nausea and even death in severe cases. Individuals with cardiovascular diseases are easily affected by CO.
Causes: Incomplete combustion of fossil fuels such as gasoline, diesel and natural gas in vehicles, power plants and residential heating systems.

[4] Ozone (O3):


Effects: O3 is a primary component of smog and causes respiratory problems, such as coughing, throat irritation and difficulty in breathing.
Causes: Ground-level ozone is formed when nitrogen oxides (NOx) and volatile organic compounds (VOCs) react in the presence of sunlight. The primary sources of these pollutants include vehicle emissions, industrial processes, gasoline vapors and certain consumer products.

7.Block Diagram:

8.Conclusion
In conclusion, our project demonstrates that machine learning models can be used to forecast air quality with a high degree of
accuracy. By analysing historical air quality data, we were able to develop a model that can predict air quality in real-time. This
model has the potential to help individuals and organisations take informed action to reduce their exposure to harmful pollutants
and improve public health. Further research can be done to improve the accuracy of the model and to explore other applications of
machine learning in environmental science.

9.Reference
[1] Deepu B P, Ravindra P Rajput, "Air Pollution Prediction using Machine Learning", International Research Journal of Engineering and Technology (IRJET), Volume 09, Issue 07, July 2022.
[2] Madhuri VM, Samyama Gunjal GH, Savitha Kamalapurkar, "Air Pollution Prediction Using Machine Learning Supervised Learning Approach", International Journal of Scientific & Technology Research, Volume 9, Issue 04, April 2020.
[3] Vidit Kumar, Sparsh Singh, Zaid Ahmed, Nikita Verma, "Air Pollution Prediction using Machine Learning Algorithms: A Systematic Review", International Journal of Engineering Research & Technology (IJERT), Vol. 11, Issue 12, December 2022.
[4] Shreyas Simu, Varsha Turkar, Rohit Martires, Vranda Asolkar, Swizel Monteiro, Vaylon Fernandes, Vassant Salgaoncar, "Air Pollution Prediction using Machine Learning", ETC Department, Don Bosco College Engineering, Fatorda, Goa, India.
[5] K. Rajakumari, V. Priyanka, "Air Pollution Prediction in Smart Cities by using Machine Learning Techniques", International Journal of Innovative Technology and Exploring Engineering (IJITEE), Volume 9, Issue 05, 2020.
[6] Temesgen Walelign Ayele, Rutvik Mehta, "Air pollution monitoring and prediction using IoT", In 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT), pp. 1741-1745, IEEE, 2018.
[7] Venkat Rao Pasupuleti, Uhasri, Pavan Kalyan, "Air Quality Prediction Of Data Log By Machine Learning", IEEE, 2020.
[8] SriramKrishna Yarragunta, Mohammed Abdul Nabi, Jeyanthi P, "Prediction of Air Pollutants Using Supervised Machine Learning", IEEE, 2021.
[9] Nadjet Djebbri, Mounira Rouainia, "Artificial neural networks based air pollution monitoring in industrial sites", In 2017 International Conference on Engineering and Technology (ICET), pp. 1-5, IEEE, 2017.
[10] Ningbo Jiang, Matthew L. Riley, "Exploring the utility of the random forest method for forecasting ozone pollution in Sydney", Journal of Environment Protection and Sustainable Development 1.5 (2015): 245-254.
