Javanese Gender Speech Recognition Based on Machine Learning Using Random Forest and Neural Network

Speech is a means of communication between people throughout the world. At present, research in speech recognition continues to develop, producing robust methods across many research variants. However, decreasing the word error rate and reducing noise remain open problems that are still being investigated. The purpose of this study is to find an accurate method for classifying the gender of Javanese speakers' voices. This research uses a dataset of male and female voices from the Javanese ethnic group, which were recorded, preprocessed with a noise reduction technique, converted to features with the MFCC extraction method, and then classified using two machine learning methods, Random Forest and Neural Network. Evaluation results indicate that gender classification of Javanese speech achieves an accuracy of 91.3% using Random Forest and 92.2% using Neural Network.


I. INTRODUCTION
Speech recognition technology continues to advance alongside developments in science and technology. Several major technology companies, including Microsoft, Apple, and Google, have started research in speech recognition and produced applications such as the virtual assistants Google Now, Siri, and Cortana. These technologies are still developing as more research is carried out in the field of speech recognition. Research in speech recognition dates back to 1952, when researchers at Bell Labs built a system named Audrey to recognize single spoken digits. In the 1970s, DARPA funded the Speech Understanding Research program, which aimed to increase the vocabulary size that speech recognition systems could handle. A notable speech recognition product of the 2000s was Google Voice Search, which supports around 30 languages around the world.
Researchers around the world are still trying to build robust, highly accurate methods and algorithms for speech recognition. Previous work includes speech recognition with artificial neural networks using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) techniques [1], voice recognition using Hidden Markov Models [2], which reaches an accuracy of up to 86.67%, speech recognition combining artificial neural networks with Hidden Markov Models [3], Hindi speech recognition with Hidden Markov Models [4], and voice recognition for biometrics with the Vector Quantization method [5]. Research on voice recognition in groundwater has also been carried out using Mel-Frequency Cepstrum Coefficients (MFCC) and an Adaptive Neuro-Fuzzy Inference System (ANFIS), resulting in an accuracy of 95.90% [6]. A study using Markov models tested on different letters reports an accuracy of 54.6% [7].
The studies above still do not consistently achieve the best possible accuracy in speech recognition, and researchers are still working to overcome common problems such as reducing noise and reducing high data dimensionality. This paper discusses the recognition of Javanese speakers' gender using machine learning. The research begins with sound recording; the recordings are then processed in the Adobe Audition application, after which feature extraction is performed. The extracted features are classified, and the resulting speech recognition accuracy is measured and compared across two methods, Random Forest and Neural Network.

A. Dataset
The Javanese speech recognition research in this paper uses a private dataset created by recording the words "eating", "drinking", and "sleeping", each spoken 10 times by 5 men and 5 women. The sound recording settings in Adobe Audition are as follows: After recording, each sound is cut to the same duration, 80631, and then stored in its respective folder.
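The text trims every recording to the same length of 80631 but does not state the unit. Assuming it is a sample count, and that each recording is held as a list of float samples, a minimal sketch of trimming (or zero-padding shorter clips) to that fixed length could look like this. The function name `fix_length` and the zero-padding choice are hypothetical, not the paper's procedure.

```python
def fix_length(signal, target_len=80631):
    """Trim or zero-pad a list of samples to exactly target_len samples.

    Assumption: the "80631" duration in the text is a sample count.
    """
    if len(signal) >= target_len:
        return list(signal[:target_len])  # cut longer recordings
    return list(signal) + [0.0] * (target_len - len(signal))  # pad shorter ones with silence


if __name__ == "__main__":
    short = [0.1] * 1000
    long_ = [0.2] * 90000
    print(len(fix_length(short)), len(fix_length(long_)))  # 80631 80631
```

Giving every clip the same length means every recording yields the same number of frames downstream, which keeps the feature records comparable.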

B. Preprocessing
Preprocessing is the stage of processing the raw dataset; in this case, the sound recordings are cleaned of noise using the noise reduction feature in Adobe Audition.

C. Feature Extraction
The speech dataset of male and female Javanese speakers is still in the form of wav sound files, which, after framing, are processed with feature extraction to produce data that is ready for classification. For feature extraction, this paper uses Matlab software with the MFCC (Mel Frequency Cepstral Coefficients) method, a method used to extract the distinctive characteristics of human voices [8].
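The paper computes MFCC with a Matlab function whose settings are not given. As a rough illustration of the standard MFCC steps (framing, Hamming window, power spectrum, mel filterbank, logarithm, DCT), here is a minimal pure-Python sketch. The frame length, hop, FFT size, number of filters, number of coefficients, and the 16 kHz sample rate in the usage example are all assumptions, not the paper's settings.

```python
import cmath
import math


def hz_to_mel(f):
    # Common mel-scale formula: m = 2595 * log10(1 + f / 700)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def frame_signal(x, frame_len, hop):
    # Split the signal into overlapping frames, each multiplied by a Hamming window.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    return [[x[start + n] * window[n] for n in range(frame_len)]
            for start in range(0, len(x) - frame_len + 1, hop)]


def power_spectrum(frame, n_fft):
    # Naive O(N^2) DFT for readability; a real pipeline would use an FFT.
    padded = frame + [0.0] * (n_fft - len(frame))
    return [abs(sum(padded[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                    for n in range(n_fft))) ** 2 / n_fft
            for k in range(n_fft // 2 + 1)]


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist.
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2)
    mel_points = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):
            filt[k] = (k - left) / (center - left)    # rising edge
        for k in range(center, right):
            filt[k] = (right - k) / (right - center)  # falling edge
        bank.append(filt)
    return bank


def dct_ii(values, n_coeffs):
    # Unnormalized DCT-II; keeps the first n_coeffs cepstral coefficients.
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, v in enumerate(values))
            for k in range(n_coeffs)]


def mfcc(x, sample_rate, frame_len=400, hop=160, n_fft=512, n_filters=26, n_coeffs=13):
    # Assumed defaults: 25 ms frames with a 10 ms hop at 16 kHz, 26 filters, 13 coefficients.
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    features = []
    for frame in frame_signal(x, frame_len, hop):
        spec = power_spectrum(frame, n_fft)
        energies = [sum(w * p for w, p in zip(filt, spec)) for filt in bank]
        log_energies = [math.log(e + 1e-10) for e in energies]  # small offset avoids log(0)
        features.append(dct_ii(log_energies, n_coeffs))
    return features


if __name__ == "__main__":
    sr = 16000
    tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(800)]
    feats = mfcc(tone, sr)
    print(len(feats), len(feats[0]))  # 3 13
```

The output is one 13-value coefficient vector per frame; longer recordings simply produce more rows.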
Using the MFCC function, features are extracted from each recording to produce a distinctive feature set for each sound, resulting in 150 records from the wav files, which are then labeled as required for the classification process.
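The text does not say how each MFCC output (one coefficient vector per frame) becomes a single labeled record. One common approach, shown here purely as an assumption, is to take the per-coefficient mean and standard deviation over all frames and append the gender label. The names `summarize_mfcc` and `make_record` and the record layout are hypothetical.

```python
import math


def summarize_mfcc(frames):
    """Collapse a (num_frames x num_coeffs) MFCC matrix into one fixed-length vector.

    Assumption: per-coefficient mean and population standard deviation over frames.
    """
    n = len(frames)
    n_coeffs = len(frames[0])
    means = [sum(f[c] for f in frames) / n for c in range(n_coeffs)]
    stds = [math.sqrt(sum((f[c] - means[c]) ** 2 for f in frames) / n) for c in range(n_coeffs)]
    return means + stds


def make_record(frames, gender):
    # Hypothetical record layout: feature values followed by the class label.
    if gender not in ("male", "female"):
        raise ValueError("gender must be 'male' or 'female'")
    return summarize_mfcc(frames) + [gender]


if __name__ == "__main__":
    print(make_record([[1.0, 2.0], [3.0, 4.0]], "female"))  # [2.0, 3.0, 1.0, 1.0, 'female']
```

Because every record has the same number of columns regardless of recording length, the table can be loaded directly by a classifier.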

Random Forest Method
Random forest is a machine learning method that is widely used by researchers and is well suited to handling large amounts of data. Random forest was first introduced in a paper by Leo Breiman, which describes constructing a forest of uncorrelated trees using a CART-like (Classification And Regression Trees) procedure combined with randomized node optimization and bagging [9]. Random forest produces a high level of accuracy, but it can require more computation time than other methods [10]. On the other hand, it has the advantage of being suitable for classifying high-dimensional data [11].
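Breiman's method grows full CART trees and draws a random feature subset at each node. The sketch below keeps only the two ideas named above, bagging (bootstrap samples) and random feature selection, and for brevity replaces each CART tree with a one-split decision stump. It is therefore a simplified illustration, not the Orange implementation used in the paper; the toy data and labels are hypothetical.

```python
import math
import random
from collections import Counter


def gini(labels):
    # Gini impurity of a list of class labels.
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels, fallback):
    # Most frequent label, or the fallback when the list is empty.
    return Counter(labels).most_common(1)[0][0] if labels else fallback


def fit_stump(X, y, feature_ids):
    # Choose the (feature, threshold) split with the lowest weighted Gini impurity,
    # searching only the randomly selected features.
    default = majority(y, None)
    best = None
    for j in feature_ids:
        for t in sorted(set(row[j] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t, majority(left, default), majority(right, default))
    return best[1:]  # (feature, threshold, left_label, right_label)


def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = max(1, int(math.sqrt(d)))  # features considered per split; sqrt(d) is a common choice
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, drawn with replacement
        feats = rng.sample(range(d), k)             # random feature subset for this stump
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, row):
    # Each stump votes; the forest returns the majority class.
    votes = [left if row[j] <= t else right for j, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    X = [[0.0, 1.0], [0.1, 1.1], [0.2, 1.2], [1.0, 2.0], [1.1, 2.1], [1.2, 2.2]]
    y = ["male", "male", "male", "female", "female", "female"]
    forest = fit_forest(X, y)
    print(predict(forest, [-1.0, 0.0]), predict(forest, [5.0, 5.0]))
```

Because each stump sees a different bootstrap sample and feature subset, individual errors tend to be uncorrelated, and the majority vote is more stable than any single stump.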

Neural Network Method
Neural Network is a machine learning method designed to imitate the workings of the human brain [12]. Some of the features and advantages of neural networks include [13]:
a. Adaptive learning: a neural network imitates the human brain's ability to learn and to adjust how it works while learning.
b. Parallel operation: neural networks process information in parallel, similar to the human brain.
c. Classification and recognition: neural networks can be used for pattern recognition, data classification, and other applications involving unclear data.
d. Speed: neural networks can process information faster than the human brain.
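The paper does not give the architecture of its neural network. To illustrate the adaptive learning idea above, where weights are adjusted from errors during training, here is a minimal pure-Python sketch of a one-hidden-layer network with sigmoid units trained by per-sample gradient descent (backpropagation). The class name `TinyMLP`, the layer size, learning rate, epoch count, and toy data (0 = male, 1 = female) are assumptions for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One hidden layer, sigmoid activations, trained with per-sample gradient descent."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, out

    def train(self, X, y, epochs=3000, lr=0.5):
        for _ in range(epochs):
            for x, target in zip(X, y):
                h, out = self.forward(x)
                d_out = out - target  # cross-entropy gradient w.r.t. the output pre-activation
                for j, hj in enumerate(h):
                    d_h = d_out * self.w2[j] * hj * (1.0 - hj)  # uses w2[j] before it is updated
                    self.w2[j] -= lr * d_out * hj
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * d_h * xi
                    self.b1[j] -= lr * d_h
                self.b2 -= lr * d_out

    def predict(self, x):
        return 1 if self.forward(x)[1] >= 0.5 else 0


if __name__ == "__main__":
    X = [[0.0, 0.2], [0.1, 0.9], [0.2, 0.5], [0.8, 0.1], [0.9, 0.7], [1.0, 0.4]]
    y = [0, 0, 0, 1, 1, 1]
    net = TinyMLP(2, 4)
    net.train(X, y)
    print([net.predict(x) for x in X])
```

Within each hidden unit the computation is independent of the other units, which is the sense in which such networks lend themselves to parallel operation.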

III. RESULTS AND DISCUSSION
The research on Javanese speech recognition consists of several stages, which include:

A. Orange Application
Orange is an open-source application for machine learning, data mining, data analysis, and data visualization [14]. It was developed by researchers at the University of Ljubljana using the Python, Cython, C++, and C programming languages [15].
The measurement results of the two methods, neural network and random forest, can be seen in Figure 2 above: the random forest method produces a classification accuracy (CA) of 97.4% and an F1 measure, precision, and recall of 91.3%, while the neural network method produces a CA of 98.7% and an F1 measure, precision, and recall of 92.2%.
The confusion matrix of the neural network method in Figure 4 above shows the accuracy level, which can be calculated with the formula:
The result is 92.2%.

Figure 5. RF Confusion Matrix
The confusion matrix of the random forest method in Figure 5 above shows the accuracy level, which can be calculated with the formula:
The result is 91.3%.
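The formula referred to above is not reproduced in the text. Assuming the standard confusion-matrix definitions for a two-class (male/female) problem, the metrics Orange reports (CA, precision, recall, F1) can be computed as in this sketch. The function name `binary_metrics` is hypothetical, and the counts in the usage example are made up for illustration; they are not the paper's confusion matrices.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard metrics for a 2x2 confusion matrix, treating one class as positive.

    tp: positives predicted positive   fn: positives predicted negative
    fp: negatives predicted positive   tn: negatives predicted negative
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts, e.g. with "female" as the positive class.
    m = binary_metrics(tp=45, fn=5, fp=3, tn=47)
    print({k: round(v, 4) for k, v in m.items()})
    # {'accuracy': 0.92, 'precision': 0.9375, 'recall': 0.9, 'f1': 0.9184}
```

Accuracy counts both diagonal cells, while precision, recall, and F1 depend on which class is treated as positive, so values computed by hand can differ from a tool that averages these metrics over classes.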

IV. CONCLUSION
Research on Javanese gender speech recognition has been carried out using a private dataset recorded with a voice recorder and processed in the Adobe Audition application. The dataset was then processed with feature extraction using the MFCC (Mel-Frequency Cepstral Coefficients) technique, followed by labeling. The prepared dataset was then classified using two methods, random forest and neural network. The evaluation results show that the neural network method achieved the highest accuracy, 92.2%, while the random forest method obtained an accuracy of 91.3%.