Forecasting PM2.5 Concentrations Using a Transformer Model
Abstract:
Air pollution caused by fine particulate matter (PM2.5) has become a serious environmental problem in many large cities, particularly in rapidly urbanizing regions. Accurate prediction of PM2.5 concentrations is important for air quality management and for the development of air pollution early warning systems. This study evaluates machine learning and deep learning approaches for forecasting PM2.5 concentrations from time-series data. The dataset, collected in Hanoi, contains PM2.5 concentrations together with meteorological variables such as temperature, relative humidity, and wind speed. Data preprocessing includes outlier detection with the Interquartile Range (IQR) method, Z-score normalization, and the construction of time-series features. Five forecasting models were implemented and compared: ARIMA, Random Forest, Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Transformer. The experimental results show that the deep learning models outperform the traditional statistical approach (ARIMA) in PM2.5 prediction. Among the evaluated models, the Transformer achieved the best performance, with the lowest prediction errors and a better ability to capture temporal variations in PM2.5 concentrations. These results demonstrate the potential of deep learning for air quality forecasting and provide a scientific basis for developing air pollution early warning systems in large urban areas.
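The three preprocessing steps named in the abstract (IQR outlier detection, Z-score normalization, and time-series feature construction) can be sketched as follows. This is a minimal, standard-library illustration under stated assumptions, not the study's implementation. It assumes that values outside the usual fences Q1 - 1.5·IQR and Q3 + 1.5·IQR are clipped to the nearest fence, since the abstract does not say whether outliers are removed, clipped, or interpolated. It also assumes that time-series features are sliding windows of lagged values. The helper names (`iqr_bounds`, `clip_outliers`, `zscore_fit`, `zscore_transform`, `make_windows`) and the sample series are hypothetical, and the series is illustrative rather than data from Hanoi.

```python
import statistics
from typing import List, Tuple


def iqr_bounds(values: List[float], k: float = 1.5) -> Tuple[float, float]:
    """Return the IQR fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(values: List[float], k: float = 1.5) -> List[float]:
    """Clip points outside the IQR fences to the nearest fence (assumed treatment)."""
    lo, hi = iqr_bounds(values, k)
    return [min(max(v, lo), hi) for v in values]


def zscore_fit(values: List[float]) -> Tuple[float, float]:
    """Estimate the mean and population standard deviation used for scaling."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("constant series cannot be Z-score normalized")
    return mean, std


def zscore_transform(values: List[float], mean: float, std: float) -> List[float]:
    """Apply z = (x - mean) / std to every value."""
    return [(v - mean) / std for v in values]


def make_windows(series: List[float], window: int) -> Tuple[List[List[float]], List[float]]:
    """Build lag features: X[i] holds the `window` previous values, y[i] the next one."""
    X, y = [], []
    for t in range(window, len(series)):
        X.append(series[t - window:t])
        y.append(series[t])
    return X, y


# Hypothetical hourly PM2.5 readings (µg/m³) with one spike.
pm25 = [35.0, 42.0, 38.0, 40.0, 300.0, 45.0, 41.0, 39.0]
cleaned = clip_outliers(pm25)          # the 300.0 spike is clipped to the upper fence
mean, std = zscore_fit(cleaned)
z = zscore_transform(cleaned, mean, std)
X, y = make_windows(z, window=3)       # 5 (input window, target) pairs
```

For this series the fences are 32.75 and 48.75, so only the spike is altered. In a real forecasting setup, the IQR fences and Z-score statistics would typically be estimated on the training period alone and then reused for validation and test data, which avoids leaking future information. Meteorological variables could be added as extra columns in each window. The sketch is univariate only for brevity.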