Topic Analysis Using LDA-LSTM on Shopee User Comment

Shannon Dominique Saputra, Albertus Dwiyoga Widiantoro

Abstract


Shopee's growth has shaped online shopping in Indonesia. This study analyzed Shopee user reviews to measure satisfaction and identify key service issues using a hybrid framework: Latent Dirichlet Allocation (LDA) for topic modeling and Long Short-Term Memory (LSTM) for sentiment classification. Class imbalance was addressed by combining Random Oversampling with the Neighborhood Cleaning Rule (ROS-NCL).

The results showed that LSTM with ROS-NCL outperformed the ROS, NCL, and SMOTE alternatives, reaching 95% accuracy with precision, recall, and F1-score of 0.94 each. These findings demonstrate that oversampling combined with noise cleaning improves classification performance on imbalanced data. The topic and sentiment results also provide practical insights for feature development, logistics improvements, and Shopee's promotional strategy.
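
The ROS-NCL resampling named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the toy one-dimensional data, the choice of k = 3, and the decision to protect the originally minority class from removal are all assumptions made for illustration, and the cleaning step is a simplified form of the Neighborhood Cleaning Rule. In practice, the imbalanced-learn library provides `RandomOverSampler` and `NeighbourhoodCleaningRule` for the same purpose.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(pool))
            y_out.append(label)
    return X_out, y_out


def nearest_neighbours(i, X, y, k):
    """Return (index, label) pairs for the k samples closest to sample i."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
        for j in range(len(X))
        if j != i
    )
    return [(j, y[j]) for _, j in dists[:k]]


def neighbourhood_clean(X, y, protected, k=3):
    """Simplified NCL (assumption): remove unprotected samples that disagree
    with the majority label of their k nearest neighbours, and unprotected
    neighbours of any protected sample that its neighbours would misclassify."""
    drop = set()
    for i in range(len(X)):
        neigh = nearest_neighbours(i, X, y, k)
        majority = Counter(lab for _, lab in neigh).most_common(1)[0][0]
        if majority != y[i]:
            if y[i] in protected:
                drop.update(j for j, lab in neigh if lab not in protected)
            else:
                drop.add(i)
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


# Hypothetical toy data: six "pos" reviews (one sits inside the "neg" cluster)
# and two "neg" reviews, each represented by a single numeric feature.
X = [(0.0,), (0.1,), (0.2,), (0.3,), (0.4,), (2.05,), (2.0,), (2.1,)]
y = ["pos", "pos", "pos", "pos", "pos", "pos", "neg", "neg"]

X_bal, y_bal = random_oversample(X, y)  # ROS: balance the classes
X_clean, y_clean = neighbourhood_clean(X_bal, y_bal, protected={"neg"})  # NCL

print(sorted(Counter(y_bal).items()))    # [('neg', 6), ('pos', 6)]
print(sorted(Counter(y_clean).items()))  # [('neg', 6), ('pos', 5)]
```

In this toy run, oversampling first brings the minority class up to the size of the majority class, and the cleaning step then removes the single majority sample that lies among minority neighbours. This is the "oversampling plus noise cleaning" combination the abstract refers to.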


Keywords


LDA; LSTM; NCL; ROS; Sentiment Analysis; Shopee; Text Classification



DOI: https://doi.org/10.24167/sisforma.v13i1.15534





SISFORMA: Journal of Information Systems | p-ISSN: 2355-8253 | e-ISSN: 2442-7888

This work is licensed under a Creative Commons Attribution 4.0 International License.