COMPARISON OF EXTREME GRADIENT BOOSTING ALGORITHM AND ARTIFICIAL NEURAL NETWORK ON DIABETES PREDICTION

Jevon Carla, Y.b Dwi Setianto

Abstract


Diabetes is a serious disease in which the sufferer has high blood sugar because the body cannot produce the amount of insulin required to regulate glucose. It may cause complications or increase the risk of developing other conditions such as heart disease, kidney disease, and blindness. One of the best ways to fight this disease is early diagnosis. When many patient records are available, machine learning classification algorithms can play a major role in predicting whether a person has diabetes. The dataset used is the Diabetes UCI Dataset from Kaggle, which was collected through direct questionnaires from patients of Sylhet Diabetes Hospital in Sylhet, Bangladesh, and approved by a doctor. The dataset has 520 records and 17 attributes. Several studies from the last few decades show that Artificial Neural Networks (ANN) are among the best algorithms for diabetes prediction. Extreme Gradient Boosting (XGBoost) is a popular machine learning algorithm for classification. For that reason, this study examines whether XGBoost can be used for diabetes prediction and compares it with ANN. The models of both algorithms were trained with the same ratios: 80:20, 75:25, 70:30, 60:40, and 50:50. There are four ANN models, with 3, 4, 5, and 6 hidden layers respectively, and two XGBoost models: the first with default parameters and the second with hyperparameter tuning. The accuracy, precision, recall, and F1 score of the models are compared to find out which performs better.
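To make the split ratios concrete, the following minimal sketch computes how many of the 520 records would fall on each side of each ratio. It assumes that each ratio means training:testing and that the split is a plain integer division; the abstract does not state either point, and the names `split_sizes`, `N_RECORDS`, and `SPLIT_RATIOS` are illustrative only.

```python
# Sketch only: assumes each ratio is training:testing and that the 520
# records are divided by simple integer arithmetic (both are assumptions).
N_RECORDS = 520  # dataset size stated in the abstract
SPLIT_RATIOS = [(80, 20), (75, 25), (70, 30), (60, 40), (50, 50)]


def split_sizes(n_records, train_pct):
    """Return (n_train, n_test) for a given training percentage."""
    n_train = n_records * train_pct // 100
    return n_train, n_records - n_train


# Map each ratio label, e.g. "80:20", to its (n_train, n_test) pair.
sizes = {f"{tr}:{te}": split_sizes(N_RECORDS, tr) for tr, te in SPLIT_RATIOS}

for ratio, (n_train, n_test) in sizes.items():
    print(f"{ratio} -> {n_train} training records, {n_test} test records")
```

Under these assumptions, the smallest test set (80:20) holds 104 records and the largest (50:50) holds 260, so scores at different ratios are measured on test sets of quite different sizes.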
Overall, XGBoost achieved better performance, although the third ANN model achieved the highest score at the 80:20 ratio. At 75:25, XGBoost with hyperparameter tuning achieved the highest score, while XGBoost with default parameters had the same score as the third ANN model. At 70:30, the third ANN model and both XGBoost models had the same score, which was the highest among all ratios. At 60:40, the first to third ANN models and XGBoost with default parameters had the same accuracy, but the third ANN model had the highest recall and lower precision than the XGBoost models. At 50:50, the second XGBoost model (with hyperparameter tuning) had the best overall performance among the models.
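The 60:40 result, where models share the same accuracy but differ in recall and precision, can be illustrated with a small sketch of the four metrics. The labels below are made up for illustration and are not data or results from the study. The function `classification_metrics` and the names `model_a` and `model_b` are hypothetical, and label 1 is assumed to mean a positive diabetes case.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for illustration only (not taken from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
model_a = [1, 1, 1, 1, 0, 0, 1, 1]  # finds every positive, raises two false alarms
model_b = [1, 1, 0, 0, 0, 0, 0, 0]  # raises no false alarms, misses two positives

metrics_a = classification_metrics(y_true, model_a)
metrics_b = classification_metrics(y_true, model_b)
print("model_a:", metrics_a)
print("model_b:", metrics_b)
```

Both hypothetical models are correct on 6 of 8 cases, so their accuracy is identical, yet `model_a` has higher recall and lower precision than `model_b`. This is why the abstract reports precision, recall, and F1 score alongside accuracy.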

Keywords


diabetes; ANN; XGBoost; prediction; comparison






DOI: https://doi.org/10.24167/proxies.v5i1.12443

Copyright (c) 2024 Proxies : Jurnal Informatika


