Comparison Between Random Forest and XGBoost Performance in Text Classification for Emotion Detection

Jessica Angela Lukito, Shinta Estri Wahyuningrum

Abstract


Humans cannot read minds. Today, most people communicate through text on social media, without face-to-face interaction. Unclear messages cause many misunderstandings in online conversations such as texting, and a misunderstood message can lead to negative outcomes such as fights and separations. Researchers have studied this problem extensively. Some studies report that Random Forest is the best algorithm for text classification, while others report that XGBoost, a Decision Tree-based algorithm, is the best. Yet another study found Decision Tree to be the worst-performing algorithm for text classification. This study compares Random Forest and XGBoost, both Decision Tree-based algorithms, across several pre-processing scenarios and methods. The dataset, obtained from the Kaggle website, contains 416,809 unique sentences.


Keywords


Emotion Detection; Decision Tree; Random Forest; XGBoost





DOI: https://doi.org/10.24167/proxies.v9i1.13090

Copyright (c) 2025 Proxies : Jurnal Informatika


