Implementation of Random Forest Algorithm for Air Quality Classification: A Case Study of DKI Jakarta's Air Quality Index
Abstract: Air quality monitoring and classification in urban environments present significant challenges for
environmental management and public health policy. This study implements an optimized Random Forest (RF) algorithm
to classify air quality levels in DKI Jakarta, Indonesia, using the Air Quality Index (AQI) data from 2021. The analysis
incorporates six key pollutants: PM10, PM2.5, NO2, SO2, CO, and O3, with data collected from the Environmental
Management Agency of DKI Jakarta. The RF model was developed using 5000 decision trees with optimized parameters
(mtry=2) and evaluated through stratified sampling with a 70:30 train-test split. The model achieved an exceptional
accuracy of 99.09% with a low Out-of-Bag (OOB) error rate of 2.35%. Feature importance analysis revealed that
the particulate matter pollutants (PM2.5 and PM10) were the most influential factors, collectively accounting for 78.70% of the model's
decision-making process. The high performance metrics across all air quality categories (Good, Moderate, and Unhealthy)
demonstrate the model's reliability in classification tasks. This research provides insights into environmental monitoring
and policymaking, presenting a framework adaptable to other urban settings. The findings highlight the crucial role of
particulate matter in air quality assessment and suggest targeted strategies for pollution control.
Keywords: Air Quality Classification, Random Forest, Machine Learning, Air Quality Index, Environmental Monitoring, Jakarta.
How to Cite: Mochammad Junus; Vidorova Nurcahyani; Rachmad Saptono; Nurefa Maulana; Indra Lukmana Putra; Zidan
Fahreza (2025). Implementation of Random Forest Algorithm for Air Quality Classification: A Case Study of DKI
Jakarta's Air Quality Index. International Journal of Innovative Science and Research
Technology, 10(3), 2169-2173. https://doi.org/10.38124/ijisrt/25mar1548
The model achieved an overall accuracy of 99.09% on the test dataset, with an Out-of-Bag (OOB) error rate of
2.35%. For comparison, a similar study by Shaziayani et al. (2022) reported that the Random Forest model
achieved an accuracy of 98.37% in classifying air quality in urban environments. The detailed performance
metrics for each category are presented in Table 1.
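To make the relationship between overall accuracy and per-category metrics concrete, the following is a minimal sketch of how such figures can be derived from actual and predicted labels. It is not the authors' code: the function name `per_class_metrics` and the toy labels are assumptions introduced for illustration, and the numbers it produces are unrelated to the study's results.

```python
from collections import Counter

def per_class_metrics(y_true, y_pred):
    """Return overall accuracy and per-class precision/recall."""
    labels = sorted(set(y_true) | set(y_pred))
    # Count (actual, predicted) pairs: a sparse confusion matrix.
    pairs = Counter(zip(y_true, y_pred))
    correct = sum(pairs[(c, c)] for c in labels)
    accuracy = correct / len(y_true)

    metrics = {}
    for c in labels:
        tp = pairs[(c, c)]
        predicted_c = sum(pairs[(a, c)] for a in labels)  # column total
        actual_c = sum(pairs[(c, p)] for p in labels)     # row total
        metrics[c] = {
            "precision": tp / predicted_c if predicted_c else 0.0,
            "recall": tp / actual_c if actual_c else 0.0,
        }
    return accuracy, metrics

# Toy labels for illustration only; these are NOT the study's data.
y_true = ["Good", "Good", "Moderate", "Moderate", "Moderate", "Unhealthy"]
y_pred = ["Good", "Moderate", "Moderate", "Moderate", "Moderate", "Unhealthy"]
accuracy, metrics = per_class_metrics(y_true, y_pred)
```

Note that the OOB error is a different quantity from test-set accuracy: it is estimated on the training data, using for each sample only the trees whose bootstrap sample did not include it, so the two figures need not sum to 100%.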
Variable Importance Analysis
One of the key advantages of Random Forest is its ability to quantify the relative importance of input variables.
Figure 2 illustrates the contribution of each pollutant to the classification model.
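As a rough illustration of how relative importance can be expressed as a percentage contribution per pollutant, the sketch below normalizes raw importance scores so that they sum to 100. The paper does not state which importance measure or normalization was used, so this is an assumption; the function name `importance_shares` and the raw scores are placeholders, not values from the study.

```python
def importance_shares(raw_scores):
    """Convert raw importance scores to percentage shares that sum to 100."""
    total = sum(raw_scores.values())
    if total <= 0:
        raise ValueError("importance scores must have a positive sum")
    return {name: 100.0 * score / total for name, score in raw_scores.items()}

# Placeholder scores for illustration only; NOT the values from the study.
raw = {"PM2.5": 40.0, "PM10": 30.0, "O3": 12.0, "SO2": 8.0, "NO2": 6.0, "CO": 4.0}
shares = importance_shares(raw)

# Combined share of the two particulate matter variables.
pm_share = shares["PM2.5"] + shares["PM10"]

# Pollutants ordered from most to least important.
ranking = sorted(shares, key=shares.get, reverse=True)
```

A combined figure such as the 78.70% reported for PM2.5 and PM10 would correspond to `pm_share` under this normalization, if the study's percentages were computed this way.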