A novel ensemble machine learning method for accurate air quality prediction

Emeç M., Yurtsever M.

International Journal of Environmental Science and Technology, 2024 (SCI-Expanded)

  • Publication Type: Article
  • Publication Date: 2024
  • DOI Number: 10.1007/s13762-024-05671-z
  • Journal Name: International Journal of Environmental Science and Technology
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Agricultural & Environmental Science Database, Aqualine, Aquatic Science & Fisheries Abstracts (ASFA), Biotechnology Research Abstracts, CAB Abstracts, Compendex, Environment Index, Geobase, INSPEC, Pollution Abstracts, Veterinary Science Database
  • Keywords: Air quality, Ensemble model, Machine learning, Regression
  • Istanbul University Affiliated: Yes


Air pollution remains a major cause of health problems worldwide. Industrial development, increased vehicle traffic, and energy production degrade air quality by releasing harmful gases and particles into the atmosphere, which can lead to respiratory diseases, cardiovascular problems, and other health complications. Predicting air quality is therefore a crucial step in safeguarding human health and informing environmental policy. Many cities employ measurement instruments and data collection systems to monitor and forecast air quality, and these data can be analyzed with machine learning models to predict future air pollution levels. This article examines the performance of a new stacking ensemble model, which combines the predictions of several machine learning models, for estimating PM2.5 on air quality datasets from major cities such as Beijing and Istanbul. In the initial stage of the study, the performance of models commonly used in the literature, such as multi-layer perceptron, support vector regression, and random forest, was evaluated. These models were assessed for their ability to predict PM2.5 using mean absolute error (MAE), root mean squared error (RMSE), and R-squared (R²), metrics that quantify how close the model predictions are to the actual data. The stacking ensemble model yielded the best results for PM2.5 prediction, with an MAE of 6.67, an RMSE of 8.80, and an R² of 0.91. In conclusion, the stacking ensemble model offers a promising approach to air pollution prediction, achieving better results than the traditional machine learning models it was compared with.
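
The stacking idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the meta-learner, the hyperparameters, or the input features, so the Ridge meta-learner, the settings below, and the synthetic data standing in for the Beijing and Istanbul datasets are all assumptions made for illustration. The sketch uses scikit-learn's `StackingRegressor` to combine a multi-layer perceptron, support vector regression, and a random forest, then reports MAE, RMSE, and R² on a held-out split.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption): five made-up predictors and a noisy
# target playing the role of PM2.5. The real datasets are not reproduced here.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
y = (50 + 10 * X[:, 0] - 5 * X[:, 1] + 3 * X[:, 2] * X[:, 3]
     + rng.normal(scale=2.0, size=400))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Base learners named in the abstract; hyperparameters are illustrative guesses.
# MLP and SVR are wrapped in a scaler because both are sensitive to feature scale.
base_learners = [
    ("mlp", make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
    )),
    ("svr", make_pipeline(StandardScaler(), SVR(C=10.0))),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
]

# The meta-learner (Ridge, an assumption) is fitted on out-of-fold predictions
# of the base learners, produced internally with 5-fold cross-validation.
stack = StackingRegressor(estimators=base_learners, final_estimator=Ridge(), cv=5)
stack.fit(X_train, y_train)
y_pred = stack.predict(X_test)

# The three metrics reported in the abstract.
mae = mean_absolute_error(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
r2 = r2_score(y_test, y_pred)
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.2f}")
```

The printed values depend on the synthetic data and are not comparable to the figures reported in the abstract. For real air quality measurements, which are time-ordered, a random train/test split can leak future information into training, so a chronological split may be more appropriate.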