Application of the Naïve Bayes Algorithm and the RapidMiner Application to Determine the Nutritional Status of Toddlers

Determining the nutritional status of toddlers is an important activity carried out by medical personnel at community health centers, which act as technical implementers. With the results of this determination, interventions for toddlers who experience undernutrition, malnutrition, or overnutrition can begin as early as possible. However, medical personnel often find it difficult to determine the nutritional status of toddlers because of limited human resources and equipment, so a method is needed that determines nutritional status quickly, precisely, and accurately. For this reason, this study uses the Naïve Bayes algorithm to classify the nutritional status of toddlers. The results show that the Naïve Bayes algorithm determines the nutritional status of toddlers with an accuracy of 70.33%.


INTRODUCTION
Toddlers are a group in an important period of a child's physical growth and development. "Toddler" here is a general term covering children aged 1-3 years (toddlers) and preschool children (3-5 years). The toddler years are often called the golden age because the success of a person's future growth and development is determined during this period (Nugraha et al., 2017).
One important effort to improve the quality of human resources is improving nutritional status. Nutritional status is one of the factors that determines quality of life and work productivity. Improving nutritional status is especially necessary for toddlers, who are still in their golden age. Assessment of nutritional status is one way to prevent nutritional problems, and it plays a very important role in monitoring children's nutrition. If the assessment is carried out correctly and accurately, signs or symptoms of impaired growth and development can be identified early, so that mitigation and prevention can be carried out to improve the nutritional status of children under five and the occurrence of nutritional problems can be minimized. For this reason, the nutritional status of toddlers must be determined quickly and accurately.
Puskesmas (community health centers), as the technical implementers of the Health Service, have the main task of collecting data, assessing the nutritional status of toddlers, and submitting the results of the assessment to the Health Service. Nutritional status has so far been determined using standard anthropometric tables according to the Decree of the Minister of Health Number 1995/Menkes/SK/XII/2010. Determination of this kind takes a long time when there is a large amount of toddler data. To overcome this problem, a system is needed that helps determine the nutritional status of toddlers using a classification method. The classification method in this research can calculate the nutritional status of toddlers in the future because classification uses training data as part of making decisions and can adjust other parameters for determining nutritional status, so that it produces better results (Nugraha et al., 2017).

LITERATURE REVIEW

A. Data Mining
Data mining is a process of mining information from data. This information is obtained through a very complicated process that uses, for example, AI, statistical techniques, mathematics, and machine learning. These techniques identify and extract useful information from a large database (Sudarsono et al., 2021).
From the definition presented, the important points related to data mining are (Mardi, 2017):
1. Data mining is the automation of existing data.
2. The data to be processed are very large.

The goal of data mining is to obtain patterns that may provide useful indications.
Data mining is an iterative and interactive process for finding new patterns that are valid, useful, and understandable in a very large database. Data mining consists of searching for desired trends in large databases to help decision makers in the future. These patterns are recognized by certain tools that can provide useful and insightful data analysis, which can then be studied more thoroughly, possibly using other decision support tools (Sikumbang, 2018).

B. Decision Tree
A decision tree is a flow diagram shaped like a tree structure in which each internal node represents a test on an attribute, each branch represents an outcome of the test, and the leaf nodes represent the classes or class distributions. The top node is called the root node. A root node has several outgoing edges but no incoming edges, an internal node has one incoming edge and several outgoing edges, and a leaf node has only one incoming edge and no outgoing edges (Qadrini et al., 2021). A tree is a data structure consisting of nodes and edges. Nodes in a tree are divided into three kinds: root nodes, branching/internal nodes, and leaf nodes. A decision tree is a simple representation of a classification technique for a finite number of classes, in which the internal nodes and root node are marked with attribute names, the edges are labeled with possible attribute values, and the leaf nodes are marked with different classes (Nasrullah, 2021).
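The node and edge structure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the `Node` class, the `classify` function, and the toy attribute values and class names are assumptions made for the example and are not taken from the study's data or from RapidMiner.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """A decision-tree node: root/internal nodes test an attribute, leaves hold a class."""
    attribute: Optional[str] = None   # attribute name tested at a root or internal node
    children: Dict[str, "Node"] = field(default_factory=dict)  # edge label -> child node
    label: Optional[str] = None       # class label, set only on leaf nodes

    def is_leaf(self) -> bool:
        return self.label is not None


def classify(node: Node, record: Dict[str, str]) -> str:
    """Follow the edge that matches each tested attribute value until a leaf is reached."""
    while not node.is_leaf():
        node = node.children[record[node.attribute]]
    return node.label


# Hypothetical toy tree; attribute values and classes are illustrative only.
root = Node(attribute="weight", children={
    "low": Node(label="under"),
    "normal": Node(attribute="height", children={
        "short": Node(label="over"),
        "normal": Node(label="good"),
    }),
})

print(classify(root, {"weight": "normal", "height": "normal"}))
```

The root node here has only outgoing edges, the `height` node has one incoming and two outgoing edges, and each leaf has a single incoming edge, matching the description in the text.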

C. RapidMiner
RapidMiner is software created by Dr. Markus Hofmann from the Blanchardstown Institute of Technology and Ralf Klinkenberg from rapid-i.com, with a GUI that makes the software easier to use. The software is open source, written in Java under the GNU Public License, and can be run on any operating system. No special coding skills are needed to use RapidMiner because all the facilities are provided (Sudarsono et al., 2021).

D. Naïve Bayes
Naïve Bayes is an application of Bayes' theorem. It is based on the simplifying assumption that attribute values are conditionally independent of each other given the output value. In other words, given an output value, the probability of observing the attribute values together is the product of their individual probabilities (Putri & Wijayanto, 2022).
Naïve Bayes is one of the algorithms popularly used for data mining because it is easy to use, fast, and easy to implement, with a fairly simple structure and a high level of effectiveness. Naïve Bayes classifies a particular variable using probability and statistical methods. Depending on the probability model, it can be trained very effectively in a supervised learning setting. Naïve Bayes does not require a large amount of training data (Khotimah & Utami, 2022). Bayes classification is a statistical classification that can be used to predict the probability of membership in a class. It is based on Bayes' theorem, named after Thomas Bayes (1702-1761), a mathematician who was also an English Presbyterian minister. Bayes' theorem is written with the following formula:

$$P(x \mid y) = \frac{P(y \mid x)\,P(x)}{P(y)}$$

Information:
y = data with unknown class
x = hypothesis that data y belongs to a specific class
P(x|y) = probability of hypothesis x given condition y
P(x) = prior probability of hypothesis x
P(y|x) = probability of y given the conditions in hypothesis x
P(y) = probability of y

Naïve Bayes is a simplification of the Bayes method. Because P(y) is the same for every class, it can be dropped when classes are compared, and the simplified form of Bayes' theorem is written as follows:

$$P(x \mid y) \propto P(y \mid x)\,P(x)$$
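A minimal Python sketch may help make the simplified rule concrete. It assumes categorical attributes, so that P(y|x) is the product of per-attribute conditional probabilities, as the independence assumption states. The function names and all probability values are hypothetical and chosen only for illustration.

```python
from math import prod


def naive_bayes_score(prior, likelihoods):
    """Return P(y|x) * P(x), with P(y|x) taken as the product of the
    per-attribute conditional probabilities (the naive independence assumption)."""
    return prior * prod(likelihoods)


def normalize(scores):
    """Divide each score by their sum (which plays the role of P(y)) to get P(x|y)."""
    total = sum(scores.values())
    return {cls: score / total for cls, score in scores.items()}


# Hypothetical two-class example; the numbers are not from the study.
scores = {
    "class_a": naive_bayes_score(0.6, [0.5, 0.4]),
    "class_b": naive_bayes_score(0.4, [0.2, 0.9]),
}
posteriors = normalize(scores)
predicted = max(scores, key=scores.get)  # the largest score decides the class
print(predicted, posteriors)
```

Normalizing does not change which class has the largest score, which is why the denominator P(y) can be left out when only the predicted class is needed.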

METHODOLOGY
The methodology used is experimental research. This methodology applies to research that is experimental in nature and that manipulates and influences the variables or attributes involved.

A. Research Variables
In this research, the Naïve Bayes algorithm is used to classify toddler nutritional status data. Attributes and attribute values are obtained from reports that assess the nutritional status of toddlers. The attributes used in this research are:
1. Gender: an attribute that contains the toddler's gender.
2. Age: an attribute that contains the toddler's age.
3. Height: an attribute that contains the toddler's height.
4. Body weight: an attribute that contains the toddler's weight.
5. Remark: an additional attribute that contains the determined nutritional status of the toddler.
The classification uses these 5 attributes; the attributes that serve as parameters are shown in Table 1.

C. Data Analysis Techniques
The knowledge discovery in databases (KDD) process is interactive and iterative, involving many steps with many decisions made by the user. A practical view of the KDD process emphasizes this interactive nature and consists of nine steps.
1. Toddler Nutrition Data
The data set to be used for the research must be selected and created. This includes finding out what data are available, obtaining any additional data needed, and then integrating all the data for knowledge discovery into one data set, including the attributes that will be used in the process.
Table 2. Nutritional Data for Toddlers

1. Data Transformation
At this stage, data better suited to data mining are generated. This stage is also the transformation process for the data that have been selected for the data mining process. It is a creative process and depends greatly on the pattern of information to be searched for in the database.
The data used in the research are transformed into the categories given in Table 1, resulting in the research data shown in Table 3.
Table 3. Nutritional Status Data of Toddlers After Processing, Cleansing, and Transformation
The data prepared for classification are divided into training data and testing data at a ratio of 80% to 20%. The division uses the systematic random sampling technique. In this technique, a random draw is made only once, namely when determining the first element of the sample. The remaining elements are then determined using the sampling interval. The sampling interval is a number that gives the distance between serial numbers in the sampling frame and is used as the benchmark for selecting the second element and so on up to the nth element. The sampling interval is usually denoted by the letter k.
The sampling interval, also called the sampling ratio, is obtained by dividing the population size by the desired sample size (N/n). Applying this calculation to draw the testing data yielded 41 toddler nutrition records as testing data, and the remaining 165 records were used as training data.
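The systematic random sampling procedure can be sketched as follows. The sketch assumes a population of N = 206 records, which is inferred here as the sum of the 41 testing and 165 training records, and it rounds the interval k = N/n down to an integer. The function name and the fixed seed are illustrative choices, not part of the study.

```python
import random


def systematic_sample(n_population, n_sample, seed=None):
    """Return 0-based record indices chosen by systematic random sampling."""
    k = n_population // n_sample       # sampling interval k = N / n, rounded down
    rng = random.Random(seed)
    start = rng.randrange(k)           # the only random draw: the first element
    return [start + i * k for i in range(n_sample)]


N = 41 + 165   # assumed population size, inferred from the testing and training counts
n = 41         # desired number of testing records (about 20% of N)

test_idx = systematic_sample(N, n, seed=0)
test_set = set(test_idx)
train_idx = [i for i in range(N) if i not in test_set]

print(len(test_idx), len(train_idx))
```

With these numbers the interval is k = 5, so every fifth record after the randomly chosen first one goes to the testing set and the rest remain as training data.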

Experiments and Model Testing
Model testing uses four variables and randomly takes 10% of the training data as testing data. This process is repeated 10 times, and the resulting accuracy, precision, and recall are averaged. The testing process is carried out in RapidMiner using the building blocks for prediction.
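The repeated hold-out of 10% of the data can be sketched in Python. The sketch assumes that the procedure corresponds to 10-fold cross-validation, in which each record is used for testing exactly once; this is an interpretation and not a description of RapidMiner's internals. The `evaluate` callback and the placeholder metric are hypothetical.

```python
import random


def k_fold_indices(n_records, k=10, seed=0):
    """Yield (train, test) index lists; each fold holds out roughly 1/k of the records."""
    idx = list(range(n_records))
    random.Random(seed).shuffle(idx)          # random assignment of records to folds
    folds = [idx[i::k] for i in range(k)]     # k disjoint folds covering all records
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_validate(n_records, evaluate, k=10, seed=0):
    """Average a metric returned by evaluate(train, test) over the k folds."""
    scores = [evaluate(tr, te) for tr, te in k_fold_indices(n_records, k, seed)]
    return sum(scores) / len(scores)


# Placeholder metric (hypothetical): the share of records held out in each fold.
# A real run would train a model on `train` and return its accuracy on `test`.
held_out_share = cross_validate(165, lambda tr, te: len(te) / (len(tr) + len(te)))
print(round(held_out_share, 3))
```

In practice the callback would be called three times, or return a tuple, so that accuracy, precision, and recall can each be averaged over the 10 repetitions.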
This research requires experiments and a testing process for the proposed model, carried out on parts of the existing dataset. All datasets were tested with the proposed method in the RapidMiner 5 application. The model implemented in RapidMiner 5 is shown in Figure 1. The dataset is trained as shown in Figure 2 using a decision tree so that the performance of the algorithm can be obtained, with the information gain criterion set using the parameters shown in Figure 3. The results are then compared to find out which method is the most accurate in the proposed model. The results compared include the attributes used in the model, AUC values, accuracy values, and model performance (f-measure, precision, recall).

A. Classification of Naïve Bayes Algorithms
Applying the Naïve Bayes algorithm to the training data in Table 3 begins with calculating the prior probability of each class over all the training data. The training data contain 165 toddlers in total, of which 112 are well-nourished, 23 are under-nourished, 22 are over-nourished, and 8 are malnourished. The results of the prior probability calculations are shown in Table 4.
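The prior probabilities follow directly from the class counts stated above, as the short Python sketch below shows. The dictionary keys are the English class descriptions used in the text; the variable names are illustrative.

```python
# Class counts in the training data, as stated in the text.
counts = {
    "well-nourished": 112,
    "under-nourished": 23,
    "over-nourished": 22,
    "malnourished": 8,
}

total = sum(counts.values())  # number of training records

# Prior probability of each class: class count divided by total records.
priors = {cls: n / total for cls, n in counts.items()}

for cls, p in priors.items():
    print(f"P({cls}) = {counts[cls]}/{total} = {p:.9f}")
```

The prior for the over-nourished class, 22/165, matches the factor 0.133333333 that appears in the posterior calculation later in this section.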
Table 4. Prior Probability Calculation
To determine whether a new case belongs to good nutrition, undernutrition, poor nutrition, or overnutrition, the posterior probability is calculated from the prior probabilities in Table 4. The posterior probability calculation that determines the classification of the testing data is given in Table 5. For example, take the testing data X:
Table 5. Calculations to Determine the Classification of Testing Data X
P(X | remark = GLe) P(remark = GLe) = 0.000115954 x 0.133333333 ≈ 1.546 x 10^-5
From the results of the calculations above, the values of P(X|Ci) and P(X|Ci) P(Ci) are greatest for remark = GBa, so it can be concluded that the testing data belong to the GBa classification.

B. Evaluation and Validation
Following the research steps (Figure III-1 in chapter III), the validity of the results from both the C4.5 and Naïve Bayes algorithms was tested with the training data. Validity testing is carried out using the confusion matrix.
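How accuracy, precision, and recall are read from a confusion matrix can be sketched in Python. The labels below are hypothetical toy data, not the study's results, and the helper functions are illustrative rather than RapidMiner's implementation.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))


def accuracy(cm):
    """Share of records whose predicted class equals the actual class."""
    correct = sum(n for (a, p), n in cm.items() if a == p)
    return correct / sum(cm.values())


def precision(cm, cls):
    """Of the records predicted as cls, the share that are actually cls."""
    predicted_cls = sum(n for (a, p), n in cm.items() if p == cls)
    return cm[(cls, cls)] / predicted_cls if predicted_cls else 0.0


def recall(cm, cls):
    """Of the records that are actually cls, the share predicted as cls."""
    actual_cls = sum(n for (a, p), n in cm.items() if a == cls)
    return cm[(cls, cls)] / actual_cls if actual_cls else 0.0


# Hypothetical toy labels for illustration only.
actual = ["good", "good", "good", "under", "over", "good"]
predicted = ["good", "good", "under", "under", "good", "good"]

cm = confusion_matrix(actual, predicted)
print(accuracy(cm), precision(cm, "good"), recall(cm, "good"))
```

Precision and recall are computed per class, so a multi-class problem such as this one yields one pair of values for each nutritional status category.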

DISCUSSION
The calculations in the RapidMiner application show that the Naïve Bayes algorithm achieved an accuracy of 70.33% on the training data, which consist of 165 records.

Figure 1. Proposed Model in RapidMiner 5

Figure 2. Cross-Validation Method in RapidMiner 5

Figure 3. Parameters for the Decision Tree Process

From this table, there are several steps in the calculation, namely: a. P(X|Ci) = P(X | remark = GBa) = 0.696629213 x 0.789473684 x 0.882352941 x 0

Figure 4. Text View of the Confusion Matrix Model for the Naïve Bayes Algorithm

Table 1. Attributes and Values

B. Data Collection Methods
1. Literature Study
Data collection is theoretical in nature and related to this research.
2. Secondary Data
Data collection in this research used field research, carried out by taking nutritional data for toddlers, in the form of photocopies, from 2 Community Health Centers through the Alor District Health Service.

Table 6. Confusion Matrix Table Model for the Naïve Bayes Algorithm