PREDICTING FLIGHT DELAY USING RANDOM FOREST ALGORITHM, XGBOOST ALGORITHM, AND STACKING ENSEMBLE METHOD
Abstract
Flight delays are costly for both passengers and airlines. As flight traffic volume grows, punctuality matters more, since it significantly influences passenger satisfaction and airlines' financial performance. Many studies have used machine learning algorithms to predict these delays, and some have found that combining more than one algorithm can improve the prediction results. This research therefore compares three machine learning ensemble methods (bagging, boosting, and stacking) for flight delay prediction, with the objective of finding the best-performing one. The dataset is ‘Flight Status Prediction’ from Kaggle, which is cleaned and modified in a preprocessing stage. It is then fitted to each ensemble model: the Random Forest algorithm as the bagging method, the Extreme Gradient Boosting (XGBoost) algorithm as the boosting method, and a stacking method that combines both algorithms with Random Forest as the first learner. The models are evaluated on accuracy, recall, and precision. Results are obtained under two dimensionality reduction methods: feature selection and principal component analysis (PCA). The XGBoost model performs best, with a mean accuracy above 95% under both dimensionality reduction methods, while the stacking ensemble performs worst, with a mean accuracy below 92% under both.
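The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The data is synthetic (scikit-learn's `make_classification`) rather than the Kaggle dataset. scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that the sketch needs only one library. The abstract names Random Forest as the first learner of the stack but does not specify the final learner, so using the boosting model there is an assumption. The number of PCA components and all hyperparameters are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic binary data standing in for the flight records (assumption).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def build_models():
    """Return one model per ensemble strategy named in the abstract."""
    return {
        # Bagging: Random Forest.
        "bagging_rf": RandomForestClassifier(n_estimators=50, random_state=0),
        # Boosting: gradient boosting as a stand-in for XGBoost (assumption).
        "boosting_gb": GradientBoostingClassifier(n_estimators=50, random_state=0),
        # Stacking: Random Forest as the first learner; the final learner
        # is an assumption, since the abstract does not name it.
        "stacking": StackingClassifier(
            estimators=[
                ("rf", RandomForestClassifier(n_estimators=50, random_state=0))
            ],
            final_estimator=GradientBoostingClassifier(
                n_estimators=50, random_state=0
            ),
            cv=3,
        ),
    }


def evaluate(models, n_components=6):
    """Fit each model behind a PCA step and score it on the held-out split."""
    scores = {}
    for name, model in models.items():
        pipe = make_pipeline(PCA(n_components=n_components), model)
        pipe.fit(X_train, y_train)
        pred = pipe.predict(X_test)
        scores[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "precision": precision_score(y_test, pred),
            "recall": recall_score(y_test, pred),
        }
    return scores


scores = evaluate(build_models())
for name, metrics in scores.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

The sketch covers only the PCA variant. The feature selection variant could be approximated by replacing the PCA step with a selector such as scikit-learn's `SelectKBest`. Scores on synthetic data say nothing about the results reported in the study.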
DOI: https://doi.org/10.24167/proxies.v8i2.13018
Copyright (c) 2025 Proxies : Jurnal Informatika