HATE SPEECH PREDICTION USING K-MEANS ALGORITHM

Lim, Alexandre N Pratama, Hironimus Leong

Abstract


Hate speech has become common on social media. Motivated by this issue, this research applies data mining algorithms and methods to predict and classify it. Using a dataset from Twitter, the research focuses on identifying hate speech. Before the algorithm is applied, the dataset is first cleaned and then converted to numeric values using TF-IDF. Using N-Grams makes the final results more stable in terms of accuracy. After preprocessing, the K-Means algorithm is applied. The results show that Tri-Gram features give better accuracy than Bi-Gram and Uni-Gram features, with the highest accuracy reaching 80%.
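The pipeline summarized in the abstract (cleaning, TF-IDF over N-Gram features, then K-Means) can be sketched roughly as below. This is a minimal, dependency-free Python sketch under stated assumptions, not the authors' implementation. The following are all assumptions: the cleaning rules (lowercasing, stripping URLs, @mentions and non-letters), the plain `tf * log(N / df)` weighting, reading "Tri-Gram" as uni-, bi- and tri-grams combined, seeding the centroids with the first k vectors, and k = 2 (hate vs. non-hate). The function names `clean`, `ngrams`, `features`, `tfidf`, `kmeans`, `cluster_accuracy` and `run` are hypothetical. Because K-Means is unsupervised, accuracy here is measured by mapping each cluster to whichever label agrees best with the ground truth.

```python
import math
import re
from collections import Counter


def clean(text):
    """Assumed cleaning step: lowercase, drop URLs/@mentions, keep letters only."""
    text = text.lower()
    text = re.sub(r"https?://\S+|@\w+", " ", text)
    text = re.sub(r"[^a-z\s]", " ", text)
    return text.split()


def ngrams(tokens, n):
    """Word n-grams of a single order n (1 = uni-gram, 2 = bi-gram, 3 = tri-gram)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(tweet, n):
    # Assumption: an n-gram setting uses every order from 1 up to n combined.
    tokens = clean(tweet)
    return [g for order in range(1, n + 1) for g in ngrams(tokens, order)]


def tfidf(docs):
    """Dense TF-IDF vectors with tf = count / doc length and idf = log(N / df)."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1  # avoid dividing by zero for empty tweets
        vectors.append([tf[t] / length * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Lloyd's K-Means; assumption: the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_accuracy(pred, truth):
    """For k = 2 with 0/1 labels: score the better of the two cluster-to-label mappings."""
    direct = sum(p == t for p, t in zip(pred, truth))
    return max(direct, len(truth) - direct) / len(truth)


def run(tweets, truth, n=3, k=2):
    _, vectors = tfidf([features(t, n) for t in tweets])
    labels = kmeans(vectors, k)
    return labels, cluster_accuracy(labels, truth)


if __name__ == "__main__":
    # Hypothetical toy tweets (1 = hate speech, 0 = not); not the paper's dataset.
    tweets = [
        "@user you people are disgusting go back home",
        "lovely weather today, going to the beach",
        "you people are disgusting and should leave",
        "enjoying coffee with friends at the beach",
    ]
    truth = [1, 0, 1, 0]
    for n in (1, 2, 3):
        labels, acc = run(tweets, truth, n=n)
        print(f"{n}-gram: clusters={labels} accuracy={acc:.2f}")
```

Calling `run` with `n=1`, `n=2` and `n=3` on the same labelled tweets mirrors the Uni-Gram / Bi-Gram / Tri-Gram comparison described above; accuracies printed on toy data say nothing about the 80% reported in the paper.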


Keywords


K-Means; Clustering; TF-IDF; N-Gram; Hate Speech


DOI: https://doi.org/10.24167/proxies.v3i2.12430

Copyright (c) 2024 Proxies : Jurnal Informatika


