Page 364 - Proceeding The 2nd International Seminar of Science and Technology : Accelerating Sustainable Innovation Towards Society 5.0
ISST 2022 FST UT 2022
Universitas Terbuka
assumptions that must be met [10]. Random Forest is a method that
consists of a collection of decision trees, each of which casts a unit
vote for a class; the final result is the class that receives the most
votes. The base learner used by Random Forest is the Decision Tree. In
other words, a random forest is a set of decision trees used for
classification and prediction: an input enters each tree at the root
at the top and is passed down to a leaf at the bottom [2].
Random Forest uses an ensemble bagging strategy that can
overcome the overfitting problem that occurs when the training data is
small [11]. For classification, the result of a Random Forest is the
mode of the classes predicted by the trees in the forest, while for
regression the prediction is the average of the values predicted by
the trees [12]. The algorithm for constructing a Random Forest is
divided into two parts. The first is the creation of k trees to form
the random forest. The second is making predictions from the random
forest that has been built [2].
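As a minimal illustration of the two aggregation rules described above (the mode for classification, the average for regression), the following Python sketch combines the outputs of individual trees. Python is used here only for illustration; the function names and the example votes are hypothetical and do not come from the paper.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_votes):
    """Return the class label that occurs most often among the tree votes."""
    # most_common(1) yields the (label, count) pair with the highest count;
    # ties are resolved in favour of the label seen first.
    return Counter(tree_votes).most_common(1)[0][0]


def aggregate_regression(tree_outputs):
    """Return the average of the numeric outputs of the trees."""
    return mean(tree_outputs)


# Hypothetical outputs of individual trees, for illustration only.
votes = ["yes", "no", "yes", "yes", "no"]
outputs = [2.0, 3.0, 4.0]

print(aggregate_classification(votes))
print(aggregate_regression(outputs))
```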
Input:
− D, a dataset consisting of d rows
− k, the number of trees
The Random Forest procedure for constructing the trees:
a. Generate a sample Di by drawing rows at random from
dataset D with replacement.
b. Use the sample Di to build the i-th tree (i = 1, 2, …, k).
c. Repeat steps a and b k times.
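The construction steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the implementation used in the paper: the base learner `fit_stump` is a one-split stand-in for a full decision tree, the rows are hypothetical `(x, label)` pairs with a single numeric feature, and all function names are invented for this example. The sketch follows only the bagging steps as listed here.

```python
import random
from collections import Counter


def majority(labels, default):
    """Most frequent label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def bootstrap_sample(dataset, rng):
    """Step a: draw len(dataset) rows from D at random, with replacement."""
    return [rng.choice(dataset) for _ in dataset]


def fit_stump(sample):
    """Step b (stand-in): fit a one-split 'tree' on rows of (x, label).

    Chooses the threshold that misclassifies the fewest rows of the sample.
    A real decision tree would continue splitting below this first split.
    """
    overall = majority([label for _, label in sample], None)
    best = None
    for threshold in sorted({x for x, _ in sample}):
        left = [label for x, label in sample if x <= threshold]
        right = [label for x, label in sample if x > threshold]
        left_label = majority(left, overall)
        right_label = majority(right, overall)
        errors = (sum(label != left_label for label in left)
                  + sum(label != right_label for label in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def build_forest(dataset, k, seed=0):
    """Step c: repeat steps a and b k times to obtain k trees."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [fit_stump(bootstrap_sample(dataset, rng)) for _ in range(k)]


def predict(forest, x):
    """Classify x by majority vote over all trees in the forest."""
    return majority([tree(x) for tree in forest], None)


# Hypothetical toy dataset D with d = 6 rows.
D = [(1.0, "low"), (2.0, "low"), (3.0, "low"),
     (7.0, "high"), (8.0, "high"), (9.0, "high")]
forest = build_forest(D, k=25)
print(predict(forest, 0.5), predict(forest, 10.0))
```

Because each tree sees a different bootstrap sample, individual trees may place their split at different thresholds; the majority vote smooths out these differences.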
In classification, each individual is assigned the class that receives
the most votes across the collection of trees, while in
regression the prediction is the average of the trees' results.
The stages of analysis with the Random Forest method are as follows:
1) In Random Forest analysis, the first step is to input the data into
the RStudio software.
2) Divide the data into training and testing data. Then build the
Random Forest model on the training data with the predetermined
ntree value (number of trees); the testing data is used to
measure the error rate of the resulting model.