COMPARISON OF DECISION TREE ALGORITHM AND K-NEAREST NEIGHBOR (KNN) ALGORITHM PERFORMANCE IN DIABETES CASE STUDY

Silvano Pratama Jubilate Deo, Y.B. Dwi Setianto

Abstract


Diabetes is a chronic metabolic disease characterized by elevated blood sugar levels, which can damage the eyes and vital organs. Type 2 diabetes is the variant that most often affects adults over 18 years old; its symptoms are not very noticeable, and identifying it requires a lengthy testing process. Using classification algorithms to predict diabetes can help minimize risk in the early stages of the disease and help health practitioners control its impact. In this study, the authors compare the performance of the Decision Tree and K-Nearest Neighbor (KNN) algorithms in predicting diabetes on the Pima Indian Diabetes dataset. Both models were trained with three train-test split ratios: 80:20, 70:30 and 65:35. In addition, the authors applied GridSearchCV hyperparameter tuning to find the best parameters for both models. Accuracy, precision, recall and F1 score are used to determine which model performs best. The results show that the Decision Tree algorithm without hyperparameter tuning performs best at the 70:30 ratio, with an accuracy of 83.33%. K=7 is the optimal K value for the KNN algorithm, giving an accuracy of 77.65%. GridSearchCV hyperparameter tuning works best at the 80:20 and 65:35 ratios when finding the best parameters for the Decision Tree algorithm. However, the Decision Tree models still show overfitting.
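The experimental setup described in the abstract can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' code: the data is a synthetic stand-in generated with `make_classification` (sized like the Pima Indian Diabetes dataset, 768 rows and 8 features), and the parameter grids, the 5-fold cross-validation, the stratified split and the `random_state` values are all assumptions chosen for the example. The names `TEST_SIZES`, `MODELS`, `evaluate` and `run_comparison` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Pima Indian Diabetes dataset (assumption:
# same shape only; the real data would be loaded from a file instead).
X, y = make_classification(
    n_samples=768, n_features=8, n_informative=5, random_state=0
)

# The three train:test ratios named in the abstract, as test fractions.
TEST_SIZES = {"80:20": 0.20, "70:30": 0.30, "65:35": 0.35}

# Hypothetical search grids; the paper's actual grids are not given here.
MODELS = {
    "decision_tree": (
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [3, 5, None], "criterion": ["gini", "entropy"]},
    ),
    "knn": (KNeighborsClassifier(), {"n_neighbors": [3, 5, 7, 9]}),
}


def evaluate(model, X_test, y_test):
    """Return the four metrics used in the study for a fitted model."""
    pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
    }


def run_comparison(X, y):
    """Tune and score each model at each split ratio."""
    results = {}
    for ratio, test_size in TEST_SIZES.items():
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, stratify=y, random_state=0
        )
        for name, (estimator, grid) in MODELS.items():
            search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
            search.fit(X_tr, y_tr)
            best = search.best_estimator_
            scores = evaluate(best, X_te, y_te)
            # A large gap between train and test accuracy hints at overfitting.
            scores["train_accuracy"] = best.score(X_tr, y_tr)
            scores["best_params"] = search.best_params_
            results[(ratio, name)] = scores
    return results


results = run_comparison(X, y)
for (ratio, name), scores in results.items():
    print(ratio, name, scores["best_params"], round(scores["accuracy"], 4))
```

Because the data here is synthetic, the printed scores will not match the figures reported in the abstract. Comparing `train_accuracy` with `accuracy` for the Decision Tree rows is one simple way to look for the kind of overfitting the abstract mentions.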

Keywords


Diabetes; Decision Tree; K-Nearest Neighbor; GridSearchCV






DOI: https://doi.org/10.24167/proxies.v6i1.12455

Copyright (c) 2024 Proxies : Jurnal Informatika


