Recommendation System of Information Technology Jobs using Collaborative Filtering Method Based on LinkedIn Skills Endorsement

Abstract: Students who graduate from Informatics Engineering have wide employment opportunities in the information technology (IT) field, for example as database administrator, data scientist, UI designer, IT project manager, network engineer, system analyst, software engineer, or UX designer. Each IT job has different skill requirements, so IT skill classification is needed to find suitable career recommendations for Informatics Engineering students. Data obtained from the LinkedIn accounts of IT professionals are processed as a reference for students. The data are first processed with the K-Means clustering algorithm to assess whether the IT professionals' data are feasible as a reference. Then the Collaborative Filtering method with the K-NN algorithm is used to classify students based on the proximity between their skills and each IT job field. The output is a recommendation of IT job fields computed from the student's IT skills. In testing, a user originally labeled as a software engineer received software engineer as the recommendation.


I. INTRODUCTION
A. Background
Informatics Engineering is a course of study that focuses on the principles of computer science and mathematical analysis for designing, developing, testing, and evaluating software [1]. Graduates of the Department of Informatics Engineering, a computer science major, can work in many branches of the information technology (IT) field, such as database administration, data science, UI design, IT project management, network engineering, and UX design. This differs from majors such as medicine, nursing, pharmacy, or education, whose graduates typically work as doctors, nurses, pharmacists, or teachers, respectively, in the field they studied during college. However, a degree alone is not enough for informatics engineering students; it must be complemented by skills. Students have to expand their skills to meet the specifications of an expert worker in a particular IT field. Given the many branches of IT work, job applicants must know which specific skills and knowledge are required for the IT field they want to enter. Informatics engineering students often do not know how their abilities compare with the skill requirements of IT fields, which leaves graduates unprepared for the IT workplace. Therefore, a career recommendation system for IT fields based on IT skills and knowledge is needed. Although all these students graduate from the same Informatics Engineering program, they actually have different skills and knowledge.

LinkedIn is a business-oriented social networking website used mainly for professional networking. It promotes a person's professional online profile for employment opportunities and connects users who do not yet know each other within their professional scope [2].
The author uses 400 LinkedIn profiles of IT professionals from 8 IT work fields as real reference data to recommend which of these 8 fields is most suitable for students who already have a LinkedIn account. The eight IT work fields are database administrator, data scientist, UI designer, IT project manager, network engineer, system analyst, software engineer, and UX designer. In this research, we therefore propose a combination of the K-Means and K-NN algorithms to produce recommendations for informatics engineering graduates based on their current expertise.

B. Literature Review
In 2013, Tonni Limbong conducted a study titled "Implementations Methods of Simple Additive Weighting (SAW) for Selection of Informatics Work Field". This research assigned weights to the academic scores of alumni stored in an academic information system database [3].
In 2016, Yurmalin M. Z. also conducted research on a decision-making system (SPK) titled "Decision Making System for Concentration and Interest of Information Technology Course in Yogyakarta Janabadra University". The research determined the weight of each criterion and ranked the majors of interest using the Fuzzy Multiple Attribute Decision Making (FMADM) and Simple Additive Weighting (SAW) methods [4].
The 2016 study "A Multi-Criteria Collaborative Filtering Recommender System Using Clustering and Regression Techniques" by Mehrbakhsh Nilashi, Muhammad Dalvi Esfahani, Morteza Zamani Roudbaraki, T. Ramayah, and Othman Ibrahim discusses a multi-criteria recommendation system that uses ratings on multiple item criteria to improve the accuracy of traditional collaborative filtering. The clustering and regression techniques used are based on Expectation Maximization (EM) and Classification and Regression Trees (CART). Analysis of datasets from Yahoo! Movies and TripAdvisor shows that the proposed method can improve the accuracy of multi-criteria collaborative filtering [5].
In 2017, Laode Aldhi Maulana Ramadhan, Sutardi, and Jumadil Nangi published research titled "Creation of E-Commerce Web at Kenime Store Shop Using Recommendation System based Collaborative Filtering with Adjusted Cosine Similarity Algorithm", in which the system recommends goods to a user according to that user's similarity to other users in purchase patterns. The system was tested with the black box method, and all test scenarios were completed without any errors [6].
In 2018, Sari Rahmawati, Dade Nurjanah, and Rita Rismala studied "Analysis and Implementation of Hybrid Approaches for Recommendation System with Knowledge-Based Recommender System and Collaborative Filtering Methods". The research combines two recommender methods with a weighted technique: the prediction of the knowledge-based recommender system is combined with that of the collaborative filtering method using weights defined for two social types, liberal and moderate. The moderate type gives 75% weight to recommendations based on the user's profile preferences and 25% to recommendations based on other users, while the liberal type gives 50% weight to each. The best interaction prediction was obtained with the moderate hybrid, with an average RMSE of 0.347 at n = 50 [7].
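The weighted hybrid described above is a linear combination of two predictions. The following is a minimal sketch, assuming both recommenders output scores on the same scale; the function name `hybrid_score` and the example scores are hypothetical and are not taken from the reviewed study.

```python
def hybrid_score(kb_score: float, cf_score: float, kb_weight: float) -> float:
    """Weighted hybrid: kb_weight * knowledge-based + (1 - kb_weight) * collaborative."""
    if not 0.0 <= kb_weight <= 1.0:
        raise ValueError("kb_weight must be in [0, 1]")
    return kb_weight * kb_score + (1.0 - kb_weight) * cf_score


# Profile-based weights reported for the two social types in the reviewed study.
WEIGHTS = {"moderate": 0.75, "liberal": 0.50}

kb, cf = 4.0, 2.0  # hypothetical predicted ratings from the two recommenders
for social_type, weight in WEIGHTS.items():
    print(social_type, hybrid_score(kb, cf, weight))  # moderate 3.5, liberal 3.0
```

With the same two predictions, the moderate type leans toward the profile-based score, while the liberal type takes their plain average.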

Another study, by Andrias Eko Wijaya and Deni Alfian in 2018, is titled "Laptop Recommendation System using Collaborative Filtering and Content-Based Filtering". For collaborative filtering, the laptop recommendation system uses the adjusted cosine similarity algorithm to calculate the similarity between users and the weighted sum algorithm to predict the rating a user would give an item. For content-based filtering, it uses the TF-IDF algorithm to find available content to recommend. The results show that manual recommendation calculations match those produced by the system. In addition, the system's execution time depends on the number of items, and collaborative filtering takes longer to execute than content-based filtering [8].
In 2018, Jorge Valverde Rebaza and Paul Bustios also conducted a study titled "Job Recommendation Based on Job Seeker Skills". The research focuses on job recommendations based on the professional skills of job seekers, which are extracted using text processing. The recommendation algorithms used are TF-IDF and four variants of Word2Vec (Word2Vec-CBOW, Word2Vec-SkipGram, Word2Vec-Ngrams-SkipGram, and Word2Vec-Ngrams-CBOW), with data taken from the Catho and LinkedIn sites. The study concluded that not all profiles received good recommendations: the highest minimum average effectiveness, obtained with TF-IDF, Word2Vec-SkipGram, and Word2Vec-Ngrams-CBOW, was 0.96 (only 48 of 50 profiles received recommendations) [9].

C. Operational Definitions
1. Collaborative Filtering
Collaborative filtering is a recommendation method based on the assumption that users whose item ratings are similar to those of other users share similar preferences for items.
Collaborative filtering is divided into three methods, namely:

a) Memory-based Collaborative Filtering
Memory-based collaborative filtering is divided into two types: user-item filtering and item-item filtering. User-item filtering takes a specific user, finds other users similar to that user based on their ratings, and recommends items liked by those similar users. Item-item filtering takes an item, finds the users who like that item, and then finds other items those users also liked. Techniques that use memory-based CF include the user-based Top-N recommendation algorithm and the item-based Top-N recommendation algorithm [10].
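User-item filtering, as described above, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes cosine similarity over co-rated items and scores each unseen item by the similarity-weighted ratings of the k most similar users. The names `user_based_top_n` and `cosine_similarity` and the example ratings are hypothetical.

```python
import math


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two users' ratings, computed over co-rated items only."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def user_based_top_n(ratings: dict[str, dict[str, float]], target: str,
                     k: int = 2, n: int = 3) -> list[str]:
    """Recommend up to n unrated items liked by the target's k most similar users."""
    sims = {u: cosine_similarity(ratings[target], r)
            for u, r in ratings.items() if u != target}
    neighbours = sorted(sims, key=sims.get, reverse=True)[:k]
    scores: dict[str, float] = {}
    for u in neighbours:
        for item, rating in ratings[u].items():
            if item not in ratings[target]:  # only recommend items the target has not rated
                scores[item] = scores.get(item, 0.0) + sims[u] * rating
    return sorted(scores, key=scores.get, reverse=True)[:n]


ratings = {
    "alice": {"A": 5, "B": 3},
    "bob": {"A": 5, "B": 3, "C": 4},
    "carol": {"A": 1, "B": 5, "D": 5},
}
print(user_based_top_n(ratings, "alice", k=2, n=2))  # → ['C', 'D']
```

Bob rates A and B exactly like Alice, so his item C ranks above Carol's item D. Item-item filtering would follow the same pattern with the roles of users and items swapped.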

b) Model-based Collaborative Filtering
Model-based collaborative filtering produces recommendations from a model built by training a specific learning algorithm. Such models can be built in various ways, for example with Bayesian networks, clustering, or rule-based approaches [10].

c) Hybrid Collaborative Filtering
Hybrid collaborative filtering combines collaborative filtering with other methods, such as content-based filtering, so that the strengths of each method compensate for the weaknesses of the others and improve recommender system performance [10].

2. Clustering
A cluster is a collection of data objects that are similar to one another within the same group and dissimilar to the objects in other groups. Clustering, or cluster analysis, groups a set of physical or abstract objects into classes of similar objects [11].

3. Classification
One algorithm for classification is K-Nearest Neighbor (K-NN). K-NN is a supervised method in which a new test sample is classified according to the majority category among its K nearest neighbors. The purpose of this algorithm is to classify new objects based on their attributes and the training samples: for each test point, the K training objects closest to it are found [12].
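The majority-vote rule above can be sketched as follows. This is a minimal illustration that assumes Euclidean distance (the measure the Method section names) and feature vectors of equal length; the function `knn_classify` and the toy training set are hypothetical.

```python
import math
from collections import Counter


def knn_classify(train: list[tuple[tuple[float, ...], str]],
                 query: tuple[float, ...], k: int) -> str:
    """Label the query with the majority category among its k nearest training samples."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training samples")
    # Sort training samples by Euclidean distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common keeps first-encountered order on ties, so among tied labels
    # the one whose nearest sample is closest to the query wins.
    return votes.most_common(1)[0][0]


train = [((1.0, 1.0), "A"), ((1.0, 2.0), "A"),
         ((5.0, 5.0), "B"), ((6.0, 5.0), "B"), ((5.0, 6.0), "B")]
print(knn_classify(train, (1.5, 1.5), k=3))  # → A
```

K-NN has no training phase; all the work happens at query time, which is why it is called a lazy learner.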

II. METHOD
A. Data Preparation
The reference data for making recommendations consist of 400 IT professional profiles obtained from the LinkedIn site. These profiles are raw data containing each professional's list of top skills, knowledge, tools, and other skills.

B. System Development Model
The author designs the system following the waterfall model, which is often referred to as the linear sequential model or the classic life cycle [13]. The waterfall model provides a sequential software development flow that starts with analysis and proceeds through design, coding, and testing. Each stage is explained below:

Requirement Analysis
The requirement analysis phase determines the list of requirements for the recommendation system application to be designed. The application has the following specifications:
a) The application is built with PHP.
b) The application receives user profile input consisting of education data, skills, knowledge, and the endorsement ratings the user has received for each skill.
c) Collaborative filtering uses RapidMiner software to cluster users into IT career fields based on skill endorsement ratings.
d) Collaborative filtering also performs the K-NN calculation to classify the user into an IT career field.
e) The application uses Euclidean distance in the recommendation process.
f) The application outputs a recommendation of the IT career fields best suited to the user.

System Design
In the system design stage, the author creates the application models. The application is developed in PHP with an object-oriented programming (OOP) approach, which uses the Unified Modeling Language (UML) as an abstraction of the system to be built. UML is a popular modeling language with good support for system visualization and documentation [14]. UML models can even be used to generate ready-to-implement program code. UML diagrams include the use case diagram, sequence diagram, class diagram, and activity diagram.

Coding
After the system design, the next step is coding, which implements the design in PHP. The code must follow the system design that has been created.

Testing
The final stage of the waterfall model is testing. The code is tested to check whether the application meets the requirements defined in the initial analysis phase.

A. K-Means
The K-Means algorithm is a clustering algorithm that assigns each data point to the cluster with the nearest centroid. Its purpose is to maximize data similarity within a cluster and minimize similarity between clusters. Similarity is measured with a distance function, so maximizing within-cluster similarity amounts to minimizing the distance between each data point and its centroid [15].
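The assign-then-update loop above is the standard Lloyd formulation of K-Means. The author ran K-Means in RapidMiner, so the following is only a plain-Python sketch of the same idea, assuming Euclidean distance and a simple initialization that samples k data points with a fixed seed (RapidMiner's own initialization may differ). The function `kmeans` and its parameters `max_iter` and `seed` are hypothetical names; the K = 8 and 10-iteration settings used later in this paper would map to `k=8, max_iter=10`.

```python
import math
import random


def kmeans(points: list[tuple[float, ...]], k: int, max_iter: int = 10,
           seed: int = 0) -> tuple[list[list[float]], list[int]]:
    """Lloyd's K-Means: returns (centroids, cluster label of each point)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Initialise centroids with k data points sampled without replacement (assumed strategy).
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster whose centroid is nearest.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments no longer change, so the clustering has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Because each step minimizes the point-to-centroid distances for the current assignment, the loop typically stops well before `max_iter` on well-separated data such as the toy example.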
The author clusters the data with the K-Means algorithm to determine whether the collected data of 400 IT professionals are suitable as a reference for the IT career recommendation system for students. The initial view of RapidMiner version 5.3 is shown below.

Figure 2. Excel File Configuration View
Next, the Excel file is configured so that the data can be clustered in RapidMiner. The configuration covers the attribute names and attribute types in the table.

Figure 3. Input View of Excel File and K-Means Operator
Place the selected Excel file together with the K-Means operator and set K = 8, because the author uses 8 clusters to examine how the data group into 8 categories, which then serve as the reference for the subsequent K-NN step. Connect the Excel file to the K-Means operator, then click Run to process the data.
K-Means was run with K = 8 and a maximum of 10 iterations, and the resulting distribution of each career field's profiles across the clusters was recorded. The results show that the profiles of each career field are spread over at most 3 of the 8 clusters, i.e., no more than half of the specified number of clusters. Therefore, the data are suitable as a reference for the recommendation system.
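The feasibility check above can be expressed as a small computation over the K-Means output: for each career field, count how many distinct clusters its profiles fall into and compare that count with half of K. The following sketch assumes one career-field label and one cluster label per profile; the function names and the toy labels are hypothetical, and the "at most K/2 clusters per field" rule is our reading of the criterion stated in the text.

```python
from collections import Counter, defaultdict


def cluster_spread(fields: list[str], clusters: list[int]) -> dict[str, dict[int, float]]:
    """Share of each career field's profiles that falls in each K-Means cluster."""
    counts = defaultdict(Counter)
    for field, cluster in zip(fields, clusters):
        counts[field][cluster] += 1
    return {field: {c: n / sum(cnt.values()) for c, n in cnt.items()}
            for field, cnt in counts.items()}


def is_feasible_reference(fields: list[str], clusters: list[int], k: int) -> bool:
    """Criterion as read from the text: every field spans at most half of the k clusters."""
    spread = cluster_spread(fields, clusters)
    return all(len(shares) <= k / 2 for shares in spread.values())


fields = ["software engineer"] * 4 + ["data scientist"] * 2
clusters = [0, 0, 2, 2, 1, 1]  # hypothetical K-Means labels for the six profiles
print(cluster_spread(fields, clusters))
print(is_feasible_reference(fields, clusters, k=8))  # → True
```

With K = 8 the threshold is 4 clusters, so the reported maximum spread of 3 clusters per field satisfies it.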

B. K-NN
K-NN is an instance-based learning algorithm and one of the lazy learning techniques. It works by finding the K objects in the training data that are closest to an object in the new (testing) data [16].
In this research, the author uses k = 1, 2, 3, 4, 5, 6, 7, 8 to classify users into the 8 IT career areas, and calculates the distance between the testing user and the data of the 8 IT career fields using Euclidean distance. The IT career field with the smallest distance to the testing user is given as the recommendation.
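The distance and selection rule above can be written out explicitly. In the following, we assume that each user and each career field is represented as a vector of skill endorsement values over the same n skills, with u_i the testing user's value for skill i and f_i the value for career field f, and that F is the set of the 8 IT career fields. How the paper derives a single vector for a career field is not stated here, so f_i is an assumption.

```latex
d(\mathbf{u}, \mathbf{f}) = \sqrt{\sum_{i=1}^{n} \left(u_i - f_i\right)^2},
\qquad
\hat{f} = \arg\min_{f \in F} d(\mathbf{u}, \mathbf{f})
```

For example, with n = 2, u = (3, 0), and f = (0, 4), the distance is the square root of 9 + 16 = 25, which is 5. Sorting all fields by d in ascending order gives the full recommendation ranking.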
The data of testing user SE_19 are used as an example of the K-NN calculation. User SE_19 was originally labeled as a software engineer.
The distance between user SE_19 and each IT career field is calculated by comparing the skills of user SE_19 with the skills of each IT career field. From the table above, the recommendations for user SE_19, in order, are software engineer, data scientist, system analyst, network engineer, IT project manager, UI designer, UX designer, and database administrator.
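The ranking above orders career fields by their distance to the user. The following minimal sketch assumes each field is represented by the mean skill vector of its professionals. That representation, the function names `field_profiles` and `rank_fields`, and the three-skill toy data are all hypothetical and are not the paper's data.

```python
import math


def field_profiles(profiles: list[tuple[str, list[float]]]) -> dict[str, list[float]]:
    """Hypothetical field representative: the mean skill vector of that field's professionals."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for field, vec in profiles:
        if field not in sums:
            sums[field] = [0.0] * len(vec)
            counts[field] = 0
        sums[field] = [s + v for s, v in zip(sums[field], vec)]
        counts[field] += 1
    return {f: [s / counts[f] for s in sums[f]] for f in sums}


def rank_fields(user: list[float], fields: dict[str, list[float]]) -> list[str]:
    """Order career fields from closest to farthest (Euclidean distance) to the user."""
    return sorted(fields, key=lambda f: math.dist(user, fields[f]))


profiles = [
    ("software engineer", [5.0, 4.0, 0.0]),
    ("software engineer", [3.0, 4.0, 0.0]),
    ("data scientist", [2.0, 1.0, 5.0]),
    ("database administrator", [0.0, 0.0, 1.0]),
]
user = [4.0, 3.0, 0.0]  # hypothetical endorsement values for the same three skills
print(rank_fields(user, field_profiles(profiles)))
# → ['software engineer', 'database administrator', 'data scientist']
```

The first element of the ranking is the recommendation; the rest give the second, third, and later recommendations, as in the SE_19 example.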

C. Program Implementation
1. Student Dashboard
Students can manage their profile data, enter their skills and skill endorsement points, and request recommendations.

2. Find Student Recommendations
The recommendation search page displays a list of skills for which the student must enter skill endorsement points.
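Before any distance can be computed, the endorsement points entered on this page must be aligned with a fixed skill order. The following sketch shows one way to do that, assuming that skills the student leaves blank count as zero endorsements. The function `to_skill_vector` and the skill list are hypothetical; the paper's PHP application may handle this differently.

```python
def to_skill_vector(entered: dict[str, int], skill_order: list[str]) -> list[float]:
    """Align a student's endorsement points with the reference skill order.

    Skills the student did not enter are treated as 0 endorsements (an assumption).
    """
    unknown = set(entered) - set(skill_order)
    if unknown:
        raise ValueError(f"unknown skills: {sorted(unknown)}")
    if any(points < 0 for points in entered.values()):
        raise ValueError("endorsement points must be non-negative")
    return [float(entered.get(skill, 0)) for skill in skill_order]


SKILL_ORDER = ["Java", "SQL", "Python"]  # hypothetical reference skill list
print(to_skill_vector({"SQL": 12, "Java": 30}, SKILL_ORDER))  # → [30.0, 12.0, 0.0]
```

Rejecting unknown skills keeps the student's vector and the reference vectors the same length, which the Euclidean distance requires.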

3. Student Recommendation Results
The recommendation results page outputs recommendations according to the skills and skill endorsement points entered by the student.