Grouping Scientific Journals by Title Using LDA

Yosefina Oktaviani Santoso, R Setiawan Aji Nugroho


Scientific journals are growing rapidly alongside the development of science. The number of scientific journals has reportedly reached over 39 million. This large number makes grouping scientific journals challenging, and the task becomes harder because each journal can cover more than one topic. Special methods are therefore needed to group scientific journals. One well-known topic modeling method is Latent Dirichlet Allocation (LDA). This research implements the LDA algorithm to perform topic modeling on scientific journals, using their titles as the corpus. In the pre-processing stage, the titles are converted into a bag of words so that they can be used in the distribution stage. The results of the distribution stage are then used for sampling with the Gibbs sampling method. During the sampling process, testing can also be carried out to determine the optimal parameters. The testing in this study used perplexity to find the optimal number of iterations and topics. The results show that the LDA algorithm successfully performs topic modeling on scientific journals by generating a list of keywords for each topic and grouping documents under each topic. Based on the perplexity comparison, the optimal parameters are 3 topics and 500 iterations.
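The pipeline described above (titles to bag of words, a distribution stage, Gibbs sampling, and a perplexity check) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy titles, the stop-word list, the hyperparameters `alpha` and `beta`, and all function names are hypothetical, and the distribution stage is assumed here to mean assigning each word a random initial topic. The perplexity is computed on the training titles themselves, which is only one of several possible evaluation setups.

```python
import math
import random

def tokenize(title, stopwords=frozenset({"a", "of", "the", "for", "in", "and", "using"})):
    """Lowercase a title and drop a few stop words (hypothetical minimal pre-processing)."""
    return [w for w in title.lower().split() if w not in stopwords]

def lda_gibbs(docs, num_topics, iterations, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]        # words in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(num_topics)]   # times word v is assigned to topic k
    topic_total = [0] * num_topics                      # total words assigned to topic k
    assignments = []
    # Distribution stage (assumed): give every word a random initial topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    # Sampling stage: resample each word's topic given all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    # Point estimates of the document-topic and topic-word distributions.
    theta = [[(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
              for t in range(num_topics)] for d, doc in enumerate(docs)]
    phi = [[(topic_word[t][v] + beta) / (topic_total[t] + V * beta)
            for v in range(V)] for t in range(num_topics)]
    return vocab, theta, phi

def perplexity(docs, vocab, theta, phi):
    """exp(-average log-likelihood per token); lower values indicate a better fit."""
    word_id = {w: i for i, w in enumerate(vocab)}
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            p = sum(theta[d][t] * phi[t][word_id[w]] for t in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)

# Hypothetical toy corpus of titles, not data from the study.
titles = [
    "Topic modeling of scientific journals using LDA",
    "Gibbs sampling for topic models",
    "Image classification using convolutional networks",
    "Deep networks for image recognition",
]
docs = [tokenize(t) for t in titles]
vocab, theta, phi = lda_gibbs(docs, num_topics=2, iterations=200)
score = perplexity(docs, vocab, theta, phi)
# Keyword list per topic and the most probable topic per title.
keywords = [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:3]]
            for row in phi]
groups = [max(range(2), key=lambda t: theta[d][t]) for d in range(len(docs))]
print(score, keywords, groups)
```

Running the same sketch over a grid of topic counts and iteration counts and comparing the resulting perplexity values mirrors, in miniature, the kind of parameter comparison the abstract describes.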


Topic Modeling; LDA; Perplexity; Scientific Journal




Copyright (c) 2021 Proxies : Jurnal Informatika
