In this section, the experiments performed and the results achieved are discussed.
A CNN mainly consists of three types of layers: convolutional layers, in which a kernel (or filter) of weights is convolved over the input to extract features; nonlinear layers, which apply an activation function to the feature maps so that the network can model non-linear functions; and pooling layers, which replace a small neighborhood of a feature map with statistical information about that neighborhood, reducing the spatial resolution. An advantage of CNNs is that all receptive fields in a layer share weights, resulting in significantly fewer parameters than in fully-connected neural networks.
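To make the three layer types concrete, the following is a minimal sketch in Keras (assuming TensorFlow); the filter count, kernel size, and input shape are illustrative assumptions, not the configuration used in this study.

```python
import tensorflow as tf

# Minimal sketch of the three CNN layer types; all sizes are assumed for
# illustration, not taken from the models trained in this study.
model = tf.keras.Sequential([
    # Convolutional layer: a 3x3 kernel of shared weights is convolved over
    # the input to produce feature maps.
    tf.keras.layers.Conv2D(32, (3, 3), input_shape=(224, 224, 3)),
    # Nonlinear layer: an activation function applied to the feature maps.
    tf.keras.layers.Activation("relu"),
    # Pooling layer: each 2x2 neighborhood is summarized by its maximum,
    # halving the spatial resolution.
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
])
model.summary()
```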
E. Training and Testing datasets
In this study, the data were split 80/20 and the resulting training portion was split 80/20 again, giving approximately 60 percent of the data for training, 20 percent for validation, and 20 percent for testing. The full dataset contains 422 images of nutritionally deficient coffee plant leaves. From these, 20 percent (84 images) was first held out as the test set, which remained unseen during training, leaving 338 images for training; from this remaining training data, a further 20 percent was then split off as validation data.
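The split described above can be reproduced with scikit-learn's train_test_split; the sketch below uses placeholder arrays in place of the actual coffee-leaf images, so the array shapes and random seed are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays standing in for the 422 coffee-leaf images and labels.
images = np.zeros((422, 224, 224, 3), dtype=np.float32)
labels = np.zeros(422, dtype=np.int64)

# Hold out roughly 20% of the data as the test set (about 84 of 422 images).
X_train, X_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.20, random_state=42)

# From the remaining training data, split off a further 20% for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.20, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # training / validation / test sizes
```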
Table 2
Accuracy and loss of training and testing of the three models
| Metric | Mobile Net | VGG-Net 16 | Inception-Net-V3 |
| Training Accuracy | 0.9911 | 0.6677 | 0.9643 |
| Training Loss | 0.0157 | 0.4853 | 0.0889 |
| Testing Accuracy | 0.9882 | 0.6471 | 0.8095 |
| Testing Loss | 0.0761 | 0.5028 | 0.3820 |
A. Mobile-Net classifier
Mobile-Net is a CNN architecture used for image classification and mobile vision. The Mobile-Net classifier uses depthwise separable convolutions, which significantly reduce the number of parameters compared to a network with regular convolutions; this is why Mobile-Net was used in this research.
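A sketch of the parameter saving from depthwise separable convolutions, using standard Keras layers; the filter count and input shape are assumed for illustration.

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(224, 224, 3))

# Regular convolution: one 3x3 kernel per input/output channel pair.
regular = tf.keras.layers.Conv2D(64, (3, 3), padding="same")(inputs)

# Depthwise separable convolution, as used in Mobile-Net: a per-channel 3x3
# depthwise convolution followed by a 1x1 pointwise convolution.
separable = tf.keras.layers.SeparableConv2D(64, (3, 3), padding="same")(inputs)

print(tf.keras.Model(inputs, regular).count_params())    # 1,792 parameters
print(tf.keras.Model(inputs, separable).count_params())  # 283 parameters
```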
B. VGG 16
VGG16 is another CNN architecture and is considered one of the best-performing CNN architectures for vision tasks. Rather than using a large number of hyper-parameters, VGG16 relies on convolution layers with 3x3 filters and stride 1, always with same padding, together with max-pooling layers with 2x2 filters and stride 2. For this model, the reported values for the deficiency classes were iron = 0.336%, potassium = 0.256%, calcium = 0.226%, and boron = 0.184%.
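The convolution and pooling pattern described above can be sketched as a single VGG-style block in Keras; the filter counts and input shape are illustrative assumptions rather than the full 16-layer architecture.

```python
import tensorflow as tf

# One VGG-style block: 3x3 convolutions with stride 1 and "same" padding,
# closed by a 2x2 max-pooling layer with stride 2. VGG16 stacks several such
# blocks, doubling the number of filters as the network gets deeper.
vgg_block = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, (3, 3), strides=1, padding="same",
                           activation="relu", input_shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(64, (3, 3), strides=1, padding="same",
                           activation="relu"),
    tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2),
])
vgg_block.summary()
```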
C. Inception V3
Inception V3 is a CNN that assists in image analysis and object detection. It is a modification of GoogLeNet and the third edition of Google's Inception CNN, and it is widely used as an image recognition model; it has been shown to attain greater than 78.1% accuracy on the ImageNet dataset. The model is made up of symmetric and asymmetric building blocks, including convolutions, average pooling, max pooling, concatenations, dropout, and fully connected layers. In this experiment, the reported values for the deficiency classes were potassium = 0.289%, calcium = 0.47%, iron = 0.206%, and boron = 0.154%.
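A sketch, assuming a Keras/TensorFlow setup, of loading Inception V3 with ImageNet weights and adapting it to the four deficiency classes (boron, calcium, iron, potassium); the pooling and dense-layer choices are assumptions for illustration, not the exact classification head used in this study.

```python
import tensorflow as tf

# Load Inception V3 pre-trained on ImageNet, dropping its original
# 1000-class classification head.
base = tf.keras.applications.InceptionV3(
    weights="imagenet",
    include_top=False,
    input_shape=(299, 299, 3))

# Attach a simple classification head for the four deficiency classes.
x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
outputs = tf.keras.layers.Dense(4, activation="softmax")(x)
model = tf.keras.Model(base.input, outputs)
model.summary()
```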