Drafting of Online Meeting Minutes Based on Video Recording Using Topic Modelling

Abstract: Meeting minutes are important because they track the decisions and agreements made during a meeting, and they can serve as a benchmark for whether the meeting objectives were achieved. Minutes are taken from the start of the meeting to its end and contain its essential points. Minutes for online meetings are currently still prepared manually; in general, every meeting is recorded as documentation, and additional human resources are needed to turn the recording into minutes. To address this problem, an automatic note-taking system is needed that can assist note-takers in summarizing a meeting, especially in the Information Technology Department. This study uses the Latent Dirichlet Allocation (LDA) method for topic modeling. With LDA, the system obtained an average coherence score of 64.56% and an average similarity score of 57.91%, values that are still less than optimal for use in actual conditions.


I. INTRODUCTION
A meeting is a gathering of two or more people to solve problems in an organization. It brings together several people or organizations to discuss common interests, resolve a problem, and reach a mutually agreed outcome [1]. Taking minutes has become a necessity in meetings. Minutes are written notes that neatly record the key points of a meeting, and they are a very important component for reviewing the results of meetings conducted by organizations [2].
The coronavirus (Covid-19) outbreak has forced organizations to change and to find ways of continuing their activities under the existing conditions [3]. Meetings, which were generally held face-to-face, had to be moved to e-conferences to support government policies that limit the spread of Covid-19, especially in Indonesia.
Minutes for online meetings are currently still prepared manually; in general, every meeting is recorded as documentation, and additional human resources are needed to turn the recording into minutes [4], [5]. Online meetings produce long transcripts that lack document structure, which makes it difficult to identify the content that stands out for inclusion in the minutes [6]. Recording the results of the meeting also takes the note-taker more time, because the note-taker has to listen to the recording again, extract the key points of the meeting, and rewrite them in a text editor before the minutes are given to each meeting participant [4], [7].
For the same problem, previous research offered solutions including ProMETheus [8], which was later developed into SmartMeeting [9]; both are note-taking applications for mobile devices. The first part of SmartMeeting is meeting speech detection, which automatically extracts multiple speech segments from each speaker, sorts them by time, and marks the corresponding speakers separately in different periods. The second part is meeting voice text transcription, in which the voice segments cut in the previous step are transcribed into a complete meeting text. The third part is the extraction of meeting minutes, which involves extracting meaningful key phrases and abstract sentences from the entire meeting text. The fourth part is next work plan extraction, which uses an emotion recognition algorithm to filter out negative-emotion summaries from the multiple meeting minutes and display the meeting's next work plan. The weakness of that approach is its very long processing time, because SmartMeeting separates the conversation by speaker. However, the case study raised in this work does not require it.
To provide a solution to these problems, this study proposes generating minutes automatically using Topic Modelling based on the Latent Dirichlet Allocation (LDA) method. In previous research, Topic Modelling has been used as a solution to various problems. In [10], Ms2lda.org uses tandem mass spectrometry data in a variety of standard formats to let users infer co-occurring sets of fragment and neutral loss features (Mass2Motifs).
The user can additionally decompose a data set onto predefined Mass2Motifs as an alternative approach. The study in [11] applies topic modeling analysis to a corpus of 86 papers in the Technology Innovation Management Review (TIM Review) to understand how the phenomenon of living labs has been treated in the recent innovation management literature. The TIM Review has published the most special issues on living labs so far; therefore, even though the corpus is compiled from just one journal, the analysis still reflects the development of the field in the academic literature. The analysis identifies seven major categories into which research approaches to living labs can be divided: 1) Design, 2) Ecosystem, 3) City, 4) University, 5) Innovation, 6) User, and 7) Living lab.
Additionally, each topic has a collection of distinctive subtopics. The article in [12] explores and critically evaluates the potential contribution to discourse studies of topic modeling, a group of machine learning methods used to automatically discover thematic information in large collections of texts. It highlights the issues associated with the use of topic modeling and offers recommendations and caveats for researchers applying such approaches in future discourse studies. The research in [13] starts from the observation that the complexity and length of the so-called "European refugee crisis" created an atmosphere of uncertainty that gave the media ample opportunity to influence how the public perceived what the arrival of these refugees meant for their countries. It examines national media discourses from this period in Hungary, Germany, Sweden, the United Kingdom, and Spain, and unveils country-specific media frames to trace the overall trajectory of the refugee debate and to identify dynamics and shifts in discourses, using Latent Dirichlet Allocation topic modeling in five languages on N = 130,042 articles from 24 news sites. Because media coverage reacts to actual events, the data show similarities across nations, although media framing also varies. The research in [14] created a theoretical framework based on technology acceptance models and usage and satisfaction theories to better understand user interaction with IPA. To identify important aspects that have different impacts on IPA scores, its method uses machine learning algorithms based on text summarization, structural topic modeling, cluster analysis, sentiment analysis, and XGBoost regression, among others. The study in [15] offers a machine learning methodology that combines electronic word of mouth (eWOM) from tourists to enhance destination marketing initiatives.
Each DT examines various cultural and economic factors and how they connect to the topics covered in the eWOM, looking at patterns associated with the traveler experience to pinpoint potential sources of happiness or dissatisfaction. Using the results of the data analysis, managers can apply this method to create marketing campaigns that better address the wants and desires of visitors from various ethnic and economic backgrounds. The investigation in [16] concluded that there was little disinformation regarding the coronavirus outbreak and that the information flow was accurate and dependable. Sentiment analysis confirmed the predominance of both fear and trust, and LDA analysis revealed the most pertinent themes connected to the coronavirus outbreak.

II. METHOD
The data used in this study are video recordings of online meetings, each accompanied by minutes prepared by the meeting's note-taker. The video recordings serve as the input data processed by the computational method, while the minutes compiled by the note-taker serve as the validator, so that the accuracy of the proposed method can be calculated.
All stages in building the proposed solution are depicted in Figure 1. In the initial stage, the previously collected video recordings of online meetings are converted into audio data in .wav format. The audio is then processed using the library cited in [17] to obtain a transcript.
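The video-to-audio step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the source does not name the conversion tool, so the use of the `ffmpeg` command-line program, the 16 kHz mono settings, and the function names `ffmpeg_wav_command` and `extract_audio` are all assumptions. The transcription step is left out because the library in [17] is not identified here.

```python
import subprocess


def ffmpeg_wav_command(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that extracts mono 16-bit PCM audio.

    Assumption: ffmpeg is installed; the source only says the video
    is converted to .wav and does not name a tool or sample rate.
    """
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", video_path,        # input video recording
        "-vn",                   # drop the video stream
        "-acodec", "pcm_s16le",  # uncompressed 16-bit PCM, i.e. plain .wav
        "-ar", str(sample_rate), # audio sample rate in Hz (assumed value)
        "-ac", "1",              # mix down to one channel (assumed choice)
        wav_path,
    ]


def extract_audio(video_path, wav_path, sample_rate=16000):
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(ffmpeg_wav_command(video_path, wav_path, sample_rate), check=True)
```

Calling `extract_audio("meeting.mp4", "meeting.wav")` (hypothetical file names) would produce the .wav file that is passed on to transcription.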
The resulting transcript then goes through text preprocessing [18], which has 5 stages: (a) Case Folding, which converts all letters in the transcript to lowercase; (b) Tokenizing, which breaks each sentence into individual words; (c) Filtering, which removes words found in the Indonesian stopword list, that is, words that carry no important meaning in the sentence or that appear too frequently in transcripts; (d) Stemming, which removes the prefixes, infixes, and suffixes of each word to obtain the root word; and (e) Lemmatization, which refines the result of the Stemming process so that the resulting base word matches the base word listed in the Indonesian dictionary, Kamus Besar Bahasa Indonesia (KBBI).
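The first three preprocessing stages can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the stopword set `STOPWORDS` is a tiny hypothetical sample rather than the list used in the study, and stemming and lemmatization are represented by a pluggable `stem` function that defaults to the identity, since the source does not name the stemmer it used. An Indonesian stemmer would be passed in through that parameter.

```python
import re

# Hypothetical, deliberately tiny stopword sample for illustration only.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu", "untuk"}


def case_fold(text):
    """(a) Case Folding: convert every letter to lowercase."""
    return text.lower()


def tokenize(text):
    """(b) Tokenizing: split lowercased text into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text)


def filter_tokens(tokens, stopwords):
    """(c) Filtering: drop tokens that appear in the stopword list."""
    return [token for token in tokens if token not in stopwords]


def preprocess(text, stopwords=STOPWORDS, stem=lambda word: word):
    """Run stages (a) to (c), then apply `stem` to each remaining token.

    `stem` stands in for stages (d) and (e); the default leaves words
    unchanged, so a real stemmer must be supplied by the caller.
    """
    tokens = tokenize(case_fold(text))
    kept = filter_tokens(tokens, stopwords)
    return [stem(token) for token in kept]
```

For example, `preprocess("Rapat ini membahas anggaran dan jadwal")` keeps the four content words and drops "ini" and "dan".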

Figure 1. Research Framework
The audio transcript, now a collection of base words, is then processed using Topic Modelling with the LDA formula, whose terms are the number of topics, the index of the document, the number of words in the sentence, and the Dirichlet parameter. The quality of the resulting topics is measured using the Coherence Scoring method [20], which assesses the quality of the topic model and identifies areas where improvements could be made. Next, the degree of similarity between the words that make up each topic and the minutes drawn up by the note-taker, which serve as the validator, is calculated.
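For reference, the textbook form of the LDA model for a single document is given below. It is not necessarily the exact expression used in this study, and the symbols are assumptions chosen to match the quantities named in the text: $K$ is the number of topics, $d$ is the document index, $N_d$ is the number of words in document $d$, and $\alpha$ is the Dirichlet parameter. $\beta$ (the topic-word distributions), $\theta_d$ (the topic proportions of document $d$), $z_{d,n}$ (the topic of the $n$-th word), and $w_{d,n}$ (the $n$-th word) are the usual auxiliary symbols.

```latex
\begin{align}
\theta_d &\sim \mathrm{Dirichlet}(\alpha), \qquad \theta_d \in \Delta^{K-1}, \\
p(\theta_d, \mathbf{z}_d, \mathbf{w}_d \mid \alpha, \beta)
  &= p(\theta_d \mid \alpha)
     \prod_{n=1}^{N_d} p(z_{d,n} \mid \theta_d)\, p(w_{d,n} \mid z_{d,n}, \beta)
\end{align}
```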
The previously compiled minutes also go through text preprocessing to obtain their important words as base words, so that their degree of similarity with the topic modeling results can be measured using the Vector Space Model (VSM) [21], [22]. This is necessary because the degree of similarity between the existing minutes and the topic modeling results represents the level of accuracy.
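A similarity measure of this kind can be sketched as cosine similarity between two bags of words. This is a minimal example under stated assumptions: the source names the Vector Space Model but does not state the term weighting, so raw term counts are used here, and the function name `cosine_similarity` is illustrative.

```python
import math
from collections import Counter


def cosine_similarity(words_a, words_b):
    """Cosine similarity between two word lists in a term-count vector space.

    Assumption: each document is represented by raw term frequencies;
    the study may have used a different weighting such as TF-IDF.
    """
    vec_a, vec_b = Counter(words_a), Counter(words_b)
    # Only terms present in both documents contribute to the dot product.
    dot = sum(vec_a[term] * vec_b[term] for term in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(count * count for count in vec_a.values()))
    norm_b = math.sqrt(sum(count * count for count in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as sharing nothing
    return dot / (norm_a * norm_b)
```

Here `words_a` would hold the topic words and `words_b` the preprocessed words of the manual minutes; a value near 1 means the two share most of their vocabulary in similar proportions.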

III. RESULTS AND DISCUSSION
Data collection yielded 4 online meeting recordings with an average duration of 2 hours. Each video consists of an opening, the core problems, questions from the online meeting participants, and the conclusions of the meeting.
During the audio-to-text process, several adjustments were made. Transcribing the full 2-hour audio in a single pass required a very good internet connection, because even a slight connection interruption caused the process to fail. The audio was therefore transcribed in a loop, 3 minutes at a time.
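The 3-minute loop can be sketched by splitting the .wav file into fixed-length pieces before transcription. This is an illustrative sketch using Python's standard `wave` module, not the authors' code; the function names `chunk_bounds` and `split_wav` and the output file naming are assumptions.

```python
import wave


def chunk_bounds(total, size):
    """Return (start, end) pairs covering 0..total in steps of `size`."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [(start, min(start + size, total)) for start in range(0, total, size)]


def split_wav(path, out_prefix, chunk_seconds=180):
    """Split a .wav file into consecutive chunks of `chunk_seconds` seconds.

    Returns the paths of the chunk files; the last chunk may be shorter.
    """
    out_paths = []
    with wave.open(path, "rb") as src:
        frames_per_chunk = src.getframerate() * chunk_seconds
        bounds = chunk_bounds(src.getnframes(), frames_per_chunk)
        for index, (start, end) in enumerate(bounds):
            src.setpos(start)                     # jump to the chunk's first frame
            frames = src.readframes(end - start)  # read exactly this chunk
            out_path = f"{out_prefix}_{index:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setnchannels(src.getnchannels())
                dst.setsampwidth(src.getsampwidth())
                dst.setframerate(src.getframerate())
                dst.writeframes(frames)
            out_paths.append(out_path)
    return out_paths
```

Each chunk can then be sent to the transcription step on its own, so a connection failure costs at most one 3-minute segment rather than the whole recording.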
The results of the Topic Modelling process are shown in Table 1. Some of the resulting words are non-standard words that are often spoken in online meetings and therefore could not be handled properly at the Filtering stage of text preprocessing, such as "gimana", "loh", and "iman". The Coherence Scoring stage assesses the quality of the topics and identifies where improvements can be made, and the similarity measurement compares the base words from the audio transcripts with the minutes that had already been prepared. Both results are shown in Table 2: across the 4 recordings, the average coherence score is 64.56% and the average similarity level is 57.91%. The coherence score is less than optimal because the processed words need an additional step to produce a better text preprocessing result. The level of similarity with the previously prepared minutes is less than optimal because, when preparing the minutes, the note-takers used their own words based on re-watching the video recording of the online meeting.

IV. CONCLUSION
In this research, the coherence score and similarity score reached average values of only 64.56% and 57.91%, which are still less than optimal for use in actual conditions. The results nevertheless help the note-taker avoid re-watching the video recording repeatedly, because the core words of what was discussed during the online meeting are already available. The note-taker still composes the minutes in their own words, based on their own understanding.
As a suggestion for future research, Text Summarization will be used so that the application's output is already in the form of sentences. Note-takers would then no longer compose the minutes from keywords based on their own understanding; they would only need to correct or extend the sentences generated automatically by Text Summarization.