Analysis of User Comments Based on Topic Modeling using LDA on OVO E-Wallet

Abstract: OVO is an important fintech provider of cashless payment services in Indonesia. Users rely on the comment section of the Playstore to send messages to OVO's managers. Hundreds of comments appear every day, and leaving them unanswered becomes a problem. This study applies Latent Dirichlet Allocation (LDA) topic modeling to analyze the topics that users raise. Based on a 6-topic LDA model, we found that the trending topic was topic 1, with a topic probability of 0.235. Topic 1 concerns transaction difficulties with premium services under high OVO usage, while ease of transactions has the lowest total probability. These results can serve as a reference for OVO service providers in focusing their efforts on improving the OVO application. For service providers, the impact of this research is a clearer view of the topics that OVO application users discuss.


I. INTRODUCTION
OVO is one of the fintech e-wallet services used by people in Indonesia. Users of the Android version of OVO submit their complaints through the OVO page on the Playstore. This research takes OVO comment data from the Playstore, extracts meaningful information from it, and generates features from the extracted data using NLTK [1] for sentiment analysis.
Fintech is a cross-disciplinary subject that combines finance, technology management, and innovation management [2]. Fintech improves financial service processes by applying technological solutions in line with existing business models. Fintech can be implemented on mobile devices, including wireless devices, digital assistants, radio frequency devices, and NFC-based devices [3]. Fintech is integrated with mobile payments to complete financial transactions [4]. Fintech is a risky but valuable outcome of financial innovation [5], because it generates value for investors [6].
An online survey of US food delivery app customers analyzed what leads app users to use and recommend technology-based services [7]. This indicates that user comments are very important to a company's development; it is also important to manage information dissemination effectively and strategically [8].
In addition, user engagement was found to play a mediating role between users' multiple attitudes and electronic word of mouth (eWOM). The research results help the mobile sensor computing industry develop effective strategies and build strong consumer-product relationships [9]. eWOM is very effective at conveying brand promotion messages and consumer experience [10].
A topic model (TM) is a statistical model that tries to find hidden topics in a collection of documents. Topics provide a summary of the corpus that is hard to obtain by searching and reading documents manually, even with the help of document decomposition and similarity metrics.

TM is used in text mining to find hidden semantic structures, also called latent variables. Latent variables are hidden variables that underlie the observed variables, namely the documents [11]. TM assumes that each topic is a probability distribution over the vocabulary. TM is a technique that takes unstructured text and automatically extracts common topics. It is an effective way to scan a large collection of text [12].
The key feature that differentiates the topic model from other grouping methods is the notion of mixed membership. Many grouping methods assign each item to exactly one group. However, in most cases it is more realistic to assume that the data belongs to more than one group or category. Because each topic is a probability distribution over the vocabulary, the word probabilities sum to 1 for each topic, although words with low weight are usually truncated in the output. Likewise, we can represent individual documents as probability distributions over topics.
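The two distributions described above can be written compactly. The notation is an assumption borrowed from common LDA conventions rather than from this paper: $\beta_{k,w}$ is the probability of word $w$ under topic $k$, $\theta_{d,k}$ is the share of topic $k$ in document $d$, $V$ is the vocabulary size, and $K$ is the number of topics.

```latex
\sum_{w=1}^{V} \beta_{k,w} = 1 \quad (k = 1, \dots, K), \qquad
\sum_{k=1}^{K} \theta_{d,k} = 1 \quad \text{for each document } d, \qquad
p(w \mid d) = \sum_{k=1}^{K} \theta_{d,k} \, \beta_{k,w}.
```

The last expression is the mixed-membership property: the probability of a word in a document is a mixture over all topics, weighted by how much of each topic the document contains.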
LDA requires documents to be represented as bags of words (as in the Gensim library). This representation ignores the order of words in the document, but stores how many times each word occurs. A good topic model should be able to distinguish different meanings of the same word depending on the context. Since there is no document tagging or human annotation, TM is an example of an unsupervised machine learning technique.
In the Gensim library, the default print behavior is to print each topic as a linear combination of its top words, arranged in descending order of each word's probability of occurring in the topic. The words that appear on the left are therefore the most significant words of the topic. The get_term_topics and get_document_topics functions are also used to further evaluate the results.
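The bag-of-words representation can be illustrated with a minimal sketch in plain Python. It mimics the list of (token id, count) pairs that Gensim's `Dictionary.doc2bow` produces, but the helper names `build_vocabulary` and `to_bag_of_words` and the toy comments are hypothetical and not taken from the study.

```python
from collections import Counter


def build_vocabulary(tokenized_docs):
    """Assign an integer id to every distinct token, in first-seen order."""
    vocab = {}
    for doc in tokenized_docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_bag_of_words(tokens, vocab):
    """Return sorted (token_id, count) pairs; word order is discarded."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    return sorted(counts.items())


# Toy tokenized comments (invented for illustration, not the OVO data).
docs = [
    ["top", "up", "gagal", "top", "up"],
    ["transfer", "bank", "gagal"],
]
vocab = build_vocabulary(docs)
bows = [to_bag_of_words(doc, vocab) for doc in docs]
print(bows)
```

The first document keeps the fact that "top" and "up" each occur twice, but not the order in which they appeared.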
get_term_topics returns the probability that a given word belongs to a given topic.
Customer engagement continues to grow online and in real life. Online customer engagement generates big data in structured and unstructured formats at high speed [13].
LDA combined with TF-IDF and Doc2Vec increases the variety of feature sets for document classification. The experimental results show that the proposed method is robust to parameter changes [14]. A text representation model that combines Word2Vec word embeddings with LDA improves accuracy, and LDA offers a solution to the high-dimensionality and high-sparsity problems caused by the BoW model [15].

II. METHOD
Text mining on Fintech OVO user comments applies data mining concepts to look for text patterns in user comments and find useful information for the developers or owners of Fintech OVO. The initial stage of text mining for Fintech OVO users is text pre-processing, which aims to turn user comment text into structured data that can be processed at a later stage. The stages of text pre-processing are:
1. Removing unnecessary punctuation marks.
2. Converting the entire text to lowercase so that all words are treated equally.
3. Tokenization, which splits Fintech OVO user comments into their constituent words.
4. Normalization, in which non-standard words used in Fintech OVO user comments are changed to their correct forms.
5. Word deletion.
6. Feature selection, the stage that reduces the word dimension.
There are several stages in finding topics in the comment data of Fintech OVO users. The topics are obtained by following the flow of the Gibbs sampling algorithm for LDA. The purpose of this study is to identify trends in OVO users' comments based on the topics of the model. The results are useful for giving companies insight into user comments, so that they can see which topics emerge over a one-year period.
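The six pre-processing stages listed above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' code: the lookup tables `NORMALIZATION` and `STOPWORDS` are tiny hypothetical stand-ins, "word deletion" is read here as stopword removal, and feature selection is approximated by a simple minimum-frequency threshold.

```python
import re
from collections import Counter

# Hypothetical lookup tables; real lists would be far larger.
NORMALIZATION = {"gk": "tidak", "apk": "aplikasi"}
STOPWORDS = {"yang", "dan", "di", "saya"}


def preprocess(comment):
    text = re.sub(r"[^\w\s]", " ", comment)              # 1. remove punctuation
    text = text.lower()                                   # 2. lowercase
    tokens = text.split()                                 # 3. tokenize on whitespace
    tokens = [NORMALIZATION.get(t, t) for t in tokens]    # 4. normalize slang
    return [t for t in tokens if t not in STOPWORDS]      # 5. delete stopwords


def select_features(docs, min_count=2):
    """6. Keep only words occurring at least min_count times in the corpus."""
    freq = Counter(t for doc in docs for t in doc)
    return [[t for t in doc if freq[t] >= min_count] for doc in docs]


# Toy comments (invented for illustration).
comments = ["Apk gk bisa top up!", "Saya gk bisa login, apk error."]
docs = [preprocess(c) for c in comments]
reduced = select_features(docs)
print(docs)
print(reduced)
```

Keeping each stage as one line makes it easy to swap in a different tokenizer or a different feature-selection rule without touching the rest of the pipeline.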

III. RESULTS AND DISCUSSION
A. Processing using Google Colab
There are two ways of processing data using Google Colab: the data can stay on Google Drive and be processed there, or the data on Google Drive can be loaded into the Colab virtual environment. In the first way, the data is processed directly using the Google Drive script as below. In the second way, reading the data requires installing PyDrive, authenticating the user, and then mounting the data. The data files on Google Drive are shared with anyone who has the link, with the role set to viewer, as shown in Figure 1.

The resulting link looks like:
Take the share ID, which is used to download the data with the command
Once the data has been read, it is processed to eliminate duplicates and to delete unneeded words, symbols, and punctuation marks. Using NLTK, the cleaning step removes tabs, newlines, and backslashes; removes non-ASCII characters (emoticons, Chinese characters); removes mentions, links, and hashtags; removes incomplete URLs; removes numbers; removes punctuation; strips leading and trailing whitespace; collapses multiple whitespace characters into a single space; and removes single characters.
NLTK stopwords are used to remove words that are not needed in the topic model analysis, usually words that are common in daily conversation; the list is determined by the researchers themselves. Table 1 is read by the system and then used for the data normalization process.
The next process is stemming using Sastrawi. Python Sastrawi is a simple library that converts words with Indonesian affixes into their base form. This process makes it easier to match words.

A. LDA Implementation Using Gensim
First, determine the number of topics to extract from the OVO comments and the number of words per topic that is considered appropriate, so that no two topics carry the same meaning. In this process, total_topics = 6 and number_words = 8 are set.
The LDA settings in this research, 6 topics and 8 words per topic, were determined by repeated experiments so that the topics do not intersect. With no intersection between topics, topic interpretation becomes easier.
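One way to check for intersections during such repeated experiments is to compare the top-word lists of every pair of topics. The sketch below assumes that "intersection" means two topics sharing a top word, which is an interpretation rather than something the text states; the helper `topic_overlaps` and the toy topics are hypothetical.

```python
from itertools import combinations


def topic_overlaps(top_words_per_topic):
    """Return {(i, j): shared_words} for every topic pair sharing a top word."""
    overlaps = {}
    for (i, a), (j, b) in combinations(enumerate(top_words_per_topic), 2):
        shared = set(a) & set(b)
        if shared:
            overlaps[(i, j)] = shared
    return overlaps


# Toy top-word lists (placeholders, not the paper's topics).
topics = [
    ["top", "up", "pulsa"],
    ["premium", "upgrade", "gagal"],
    ["transfer", "bank", "gagal"],
]
overlaps = topic_overlaps(topics)
print(overlaps)
```

An empty result would indicate that the chosen number of topics and words per topic yields non-intersecting topics under this definition.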

B. Feature Interpretation
The topics that were built are interpreted as follows: top-up for credit services, many bonuses, failed premium processes, good bank transfers, difficulty updating, conditions in which the application cannot be used, satisfaction, and verification difficulties.

C. Determining the Number of Topics
The initial stage of TM with LDA is to determine the right number of topics for the Fintech OVO data. If the wrong number of topics is selected, performance will be sub-optimal, resulting in incorrect OVO topics.
Word analysis on the LDA model must be done carefully, because not every word in a topic can be interpreted correctly. When the LDA model does not converge well, the resulting topics do not point to the same discussion.
The LDA process on Fintech OVO reveals 6 topics, each consisting of the 8 words from user comments that have the highest probability of appearing in that topic. The words from OVO users' comments are grouped into each topic based on their similarities, forming a specific topic. A topic of Fintech OVO user comments is said to be convergent if the distribution of the words that make up the topic leads to a discussion of the same subject. The topic interpretation of Fintech OVO user comments from the LDA model is presented in Table 2.

D. Trend Topic Analysis
LDA detects trending topics in the comments of Fintech OVO users well, because it captures events well when topic coverage is narrow. The trending topic in OVO Fintech user comments is determined by finding the highest probability value among the compiled topics. The probability values of OVO Fintech topics 1 to 6 are presented in Table 3.
Table 3 shows that the highest-probability Fintech OVO topic is topic 1, with a value of 0.235, interpreted as transaction difficulties on premium services. The lowest is ease of transactions.
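Selecting the trending topic can be sketched as follows. How the total probability per topic was computed is not stated, so averaging each topic's probability over all documents is an assumption, and the document-topic values below are invented toy numbers, not those of Table 3.

```python
def topic_totals(doc_topic_rows):
    """Average each topic's probability over all documents."""
    n_docs = len(doc_topic_rows)
    n_topics = len(doc_topic_rows[0])
    return [sum(row[k] for row in doc_topic_rows) / n_docs
            for k in range(n_topics)]


def trending_topic(totals):
    """Return the 1-based number of the topic with the largest total."""
    return max(range(len(totals)), key=totals.__getitem__) + 1


# Toy document-topic probabilities (invented; each row sums to 1).
rows = [
    [0.6, 0.3, 0.1],
    [0.5, 0.2, 0.3],
    [0.2, 0.6, 0.2],
]
totals = topic_totals(rows)
best = trending_topic(totals)
print(totals, best)
```

Sorting the same totals in descending order would give the ranking from the highest to the lowest topic.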

IV. CONCLUSION
Processing Fintech OVO users' comments with the LDA algorithm produces 6 topics, as can be seen in Table 3. They are sorted from the highest to the lowest total topic probability. The trending topic was found to be topic 1, with a topic probability of 0.235. The terms that appear in topic 1 point to the difficulty of transactions on premium services under high OVO usage, which indicates a need to improve the process of moving to premium services. Ease of transactions, in contrast, has the lowest total probability. The results can be used as a reference for OVO service providers in focusing their efforts on improving the OVO application. The impact