2. 1. Image Acquisition
Previously used methods of image acquisition (Femenias et al., 2022; Velesaca et al., 2021), particularly those employing flatbed scanners, have exhibited several drawbacks, including imaging only one side of the grain, uneven or one-sided lighting, and the need for meticulous arrangement of grains on the scanner’s surface to prevent contact or overlap. Industrial devices utilizing linear cameras also fail to capture grains from multiple angles and they produce low-resolution images that are inadequate for qualitative analysis according to applicable standards. Therefore, we propose a device design (Fig. 2) that eliminates these limitations (Dolata and Reiner, 2018; Lampa et al., 2016).
In this device, barley kernels are placed on the surface of a rotating glass disc. A vibratory feeder dispenses the grains individually, ensuring specific distances between them and from the disc's rotational axis. Two Allied Vision Mako G-125C color cameras, each equipped with a 16 mm lens and a 3 mm spacer ring, are positioned above and below the disc surface. When a kernel arrives at the location between the cameras, a ring LED lamp flashes, providing uniform and intense cold white illumination of the kernel. This configuration allows for capturing clear, high-resolution images of both sides of the kernel (Shrestha et al., 2016). Additionally, a third camera with a 25 mm lens and an 8 mm spacer ring is integrated into the system to capture the kernel's side profile. By analyzing the kernel's geometry in these three images, we can estimate its length, width, and height, enabling the evaluation of kernel size uniformity in the sample under examination.
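The size estimate from three views can be illustrated with a minimal sketch. It assumes binary masks of the kernel are already available, that the kernel's long axis runs along the image rows in the top view, and that the disc is horizontal in the side view. The function names and the millimetre-per-pixel scales are hypothetical and would come from a calibration that is not described here.

```python
import numpy as np

def extent_px(mask, axis):
    """Number of rows (axis=0) or columns (axis=1) occupied by the mask."""
    occupied = mask.any(axis=1 - axis)
    return int(occupied.sum())

def kernel_dimensions(top_mask, side_mask, mm_per_px_top, mm_per_px_side):
    """Return (length, width, height) in millimetres.

    Assumes the long axis of the kernel is vertical in the top view and
    that the side view shows the kernel resting on a horizontal disc.
    """
    length = extent_px(top_mask, axis=0) * mm_per_px_top
    width = extent_px(top_mask, axis=1) * mm_per_px_top
    height = extent_px(side_mask, axis=0) * mm_per_px_side
    return length, width, height

# Toy masks standing in for segmented camera images.
top = np.zeros((20, 10), dtype=bool)
top[2:18, 3:8] = True          # 16 rows by 5 columns
side = np.zeros((10, 20), dtype=bool)
side[4:8, 2:18] = True         # 4 rows high
dims = kernel_dimensions(top, side, mm_per_px_top=0.5, mm_per_px_side=0.5)
```

Bounding-box extents are only one possible measure; a real system might fit an ellipse or measure along the estimated symmetry axis instead.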
Once the kernels have passed the cameras, they are conveyed to a pneumatic sorting system, where nozzles direct each grain into one of several containers according to its assigned class. The image analysis and classification of a kernel must therefore be completed within a short time, so that the result is available before the grain reaches the sorting system. During the device's development, we addressed several mechanical challenges related to grain transportation, separation, and sorting into quality classes. The analysis of images captured from three perspectives facilitated measurements of grain geometry, replacing the conventional approach of using slit sieves, and allowed for an assessment of the grains' technological class. The attained efficiency enabled the evaluation of a 100-gram sample (approximately 2000 grains) in approximately 10 minutes.
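The reported throughput implies an average time budget per kernel, which the short calculation below makes explicit. This is only the mean interval between consecutive kernels; the time actually available for image analysis also depends on the disc speed and on the distance between the cameras and the nozzles, which are not given here.

```python
def per_kernel_budget_s(n_kernels: int, total_minutes: float) -> float:
    """Average time per kernel in seconds for a given sample and duration."""
    return total_minutes * 60.0 / n_kernels

budget = per_kernel_budget_s(2000, 10.0)
print(f"{budget:.2f} s per kernel")  # 0.30 s per kernel
```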
One of the technical challenges encountered while using the device was maintaining the quality of the glass surface, which accumulates dust and develops scratches over time. To address this issue, a series of several dozen images of the glass surface is captured immediately after the device is turned on, during the initialization period. These images show only the surface of the glass disc, without any barley kernels. Analyzing the pixel brightness variation across the image sequence makes it possible to assess the level of background variability, which indicates surface contamination and wear. Additionally, the images in the sequence are averaged to obtain a representative background reference for the subsequent segmentation of images that contain barley kernels.
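A minimal sketch of this initialization step is given below, assuming grayscale frames stacked in a NumPy array. The per-pixel standard deviation and its mean are used here as the variability measure; this is an assumption, since the exact statistic is not specified.

```python
import numpy as np

def background_reference(frames):
    """Summarize a stack of kernel-free frames of shape (n, H, W).

    Returns the averaged background image, the per-pixel standard
    deviation across the sequence, and a single variability score
    (the mean standard deviation; an assumed contamination indicator).
    """
    frames = np.asarray(frames, dtype=np.float64)
    reference = frames.mean(axis=0)
    variability = frames.std(axis=0)
    return reference, variability, float(variability.mean())

# Two toy frames: a clean surface gives identical frames and a score of 0;
# differing frames give a positive score.
frames = np.stack([np.full((4, 4), 10.0), np.full((4, 4), 14.0)])
reference, variability, score = background_reference(frames)
```

A score above some empirically chosen limit could then prompt cleaning or replacement of the disc; such a limit would have to be determined experimentally.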
2. 2. Image Preprocessing
Images of kernels captured by the top and bottom cameras undergo processing (Kociołek et al., 2017; Szczypiński and Zapotoczny, 2012a) (Fig. 3) before being passed to the feature extraction procedure or the CNN. First, both images are converted into binary images, which have only two levels of brightness: one indicating the barley kernel area and the other representing the background. Since the kernels appear relatively bright in comparison with the dark background, a brightness thresholding algorithm is applied to obtain the binary image. The threshold value is adjusted locally based on the background reference image: pixels that are significantly brighter than the corresponding pixels of the background reference image are assigned one level of brightness, while all other pixels are set to the other level.
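The locally adjusted thresholding can be sketched as a per-pixel comparison with the background reference. The sketch assumes grayscale inputs; the `margin` parameter, which defines "significantly brighter", is a hypothetical value chosen for illustration.

```python
import numpy as np

def binarize(image, reference, margin=30.0):
    """Boolean mask of pixels brighter than the local background.

    A pixel is marked as kernel (True) when it exceeds the corresponding
    pixel of the background reference by more than `margin`.
    """
    diff = np.asarray(image, dtype=np.float64) - np.asarray(reference, dtype=np.float64)
    return diff > margin

image = np.array([[10, 200], [50, 12]])
reference = np.full((2, 2), 10)
mask = binarize(image, reference)
```

For color images, the brightness would first have to be derived from the color channels, for example as their mean.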
In addition to the barley kernel regions, the binary images may contain relatively small and separate areas representing dust particles, pieces of awns, or husk fragments. To eliminate these unnecessary areas and slightly smooth the contour of the barley kernel region, mathematical morphology operations such as opening and closing are performed using a small circular-shaped structuring element. If multiple areas still remain in the binary image after these operations, only the largest one is retained. This area defines the region of interest for subsequent processing procedures.
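The cleaning step described above can be sketched with NumPy alone. The radius of the structuring element and the 4-connectivity used to find the largest region are assumptions; in this sketch, pixels outside the image are treated as foreground during erosion.

```python
import numpy as np
from collections import deque

def disk(radius):
    """Circular structuring element of the given radius."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return (x * x + y * y) <= radius * radius

def dilate(mask, selem):
    r = selem.shape[0] // 2
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    out = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    for dy, dx in zip(*np.nonzero(selem)):
        out |= padded[dy:dy + h, dx:dx + w]
    return out

def erode(mask, selem):
    # Erosion as the dual of dilation (valid for a symmetric element).
    return ~dilate(~mask, selem)

def largest_component(mask):
    """Keep only the largest 4-connected region of the mask."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    best = []
    for sy, sx in zip(*np.nonzero(mask)):
        if seen[sy, sx]:
            continue
        seen[sy, sx] = True
        queue, pixels = deque([(sy, sx)]), []
        while queue:
            y, x = queue.popleft()
            pixels.append((y, x))
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
        if len(pixels) > len(best):
            best = pixels
    out = np.zeros_like(mask, dtype=bool)
    if best:
        ys, xs = zip(*best)
        out[list(ys), list(xs)] = True
    return out

def clean_mask(mask, radius=1):
    """Opening, then closing, then selection of the largest region."""
    mask = np.asarray(mask, dtype=bool)
    selem = disk(radius)
    opened = dilate(erode(mask, selem), selem)
    closed = erode(dilate(opened, selem), selem)
    return largest_component(closed)

noisy = np.zeros((12, 12), dtype=bool)
noisy[3:9, 3:9] = True   # kernel-like blob
noisy[1, 1] = True       # isolated speck, e.g. a dust particle
roi = clean_mask(noisy)
```

In the toy example the speck is removed by the opening, and the corners of the square blob are rounded off, which corresponds to the slight contour smoothing mentioned above.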
Following the binarization process, both the original and binary images undergo cropping to encompass the region of interest, resulting in reduced-size images that are more suitable for further processing. These cropped images are designated by the letter C. The regions of interest exhibit an elongated shape that narrows on the side corresponding to the kernel's embryo. Analyzing the shape of these regions allows for estimating the elongation direction, approximate axis of symmetry of the kernel, and its orientation relative to the embryo side. This information is then utilized to rotate the image content, aligning the kernel in a vertical orientation with the narrowing pointing upward. The resulting rotated images are indicated by the letter R.
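One way to implement the cropping and the orientation estimate is sketched below. The long axis is taken as the principal axis of the pixel coordinates, and the narrow (embryo) end is identified from the sign of the third moment of the projections onto that axis. Both choices are assumptions made for illustration; the shape analysis actually used may differ.

```python
import numpy as np

def crop_to_roi(image, mask):
    """Crop image and mask to the bounding box of the mask (images C)."""
    ys, xs = np.nonzero(mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    return image[y0:y1, x0:x1], mask[y0:y1, x0:x1]

def narrow_end_direction(mask):
    """Unit vector (dx, dy) along the long axis, pointing to the narrow end."""
    ys, xs = np.nonzero(mask)
    coords = np.stack([xs, ys], axis=1).astype(np.float64)
    centred = coords - coords.mean(axis=0)
    # Long axis: eigenvector of the coordinate covariance with the largest
    # eigenvalue (np.linalg.eigh returns eigenvalues in ascending order).
    _, eigvecs = np.linalg.eigh(np.cov(centred, rowvar=False))
    axis = eigvecs[:, -1]
    # A tapering region has a longer tail toward its narrow end, so the
    # third moment of the projections is positive in that direction.
    t = centred @ axis
    if np.mean(t ** 3) < 0:
        axis = -axis
    return axis

def angle_from_up(direction):
    """Angle in degrees from image 'up' to direction, clockwise on screen."""
    dx, dy = direction
    return float(np.degrees(np.arctan2(dx, -dy)))

# Toy region: wide on the left, tapering to a point on the right.
mask = np.zeros((21, 30), dtype=bool)
for x in range(30):
    half = (29 - x) // 3
    mask[10 - half:10 + half + 1, x] = True
image = np.arange(21 * 30).reshape(21, 30)
cropped_image, cropped_mask = crop_to_roi(image, mask)
direction = narrow_end_direction(mask)
angle = angle_from_up(direction)
```

Rotating the cropped image by the resulting angle, so that the narrow end points upward, would then give the R images.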
Due to the flattened shape of the kernel, it can rest on the glass surface in two ways: with the dorsal or ventral side facing down. In the first scenario, the crease of the kernel is visible in the image from the top camera, while in the second scenario, it is visible in the image from the bottom camera. In the rotated versions (R) of the images captured by the top and bottom cameras, the brightness gradient is computed horizontally, perpendicular to the approximate axis of symmetry. The absolute value of the gradient calculated for the image displaying the ventral side, which exhibits the crease, is significantly larger compared to the other side. This distinction enables the identification of which image represents the ventral and which represents the dorsal side of the kernel. Finally, the rotated images from the top and bottom cameras are combined into a single image, presenting the ventral view of the kernel on the left-hand side and the dorsal view on the right-hand side. These combined-view images are denoted by the letter B.
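The ventral and dorsal assignment and the construction of the combined image can be sketched as follows. The sketch assumes grayscale rotated images of equal size and uses the mean absolute horizontal difference as the gradient measure; the aggregation actually applied to the gradient is not specified, so this is an assumption.

```python
import numpy as np

def horizontal_gradient_energy(image):
    """Mean absolute brightness difference between horizontal neighbours."""
    gray = np.asarray(image, dtype=np.float64)
    return float(np.abs(np.diff(gray, axis=1)).mean())

def combine_views(top, bottom):
    """Build image B: ventral view on the left, dorsal view on the right.

    The view with the larger horizontal gradient is taken as ventral,
    because the crease produces strong brightness changes across the kernel.
    """
    if horizontal_gradient_energy(top) >= horizontal_gradient_energy(bottom):
        ventral, dorsal = top, bottom
    else:
        ventral, dorsal = bottom, top
    return np.hstack([ventral, dorsal])

# Toy views: a uniform surface and one with a dark vertical crease.
smooth = np.full((6, 4), 100.0)
creased = smooth.copy()
creased[:, 2] = 20.0
combined = combine_views(smooth, creased)
```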
The image pre-processing procedures yield three sets of images (C, R, and B) with different levels of pre-processing. This allows us to investigate the hypothesis that by initially processing the images, standardizing the kernel orientation, and identifying the ventral and dorsal sides, we can utilize a less complex CNN architecture while achieving higher classification accuracy.
2. 3. Reference Method
In the previously employed methods for analyzing barley kernel images, a traditional approach (Ramirez-Paredes and Hernandez-Belmonte, 2020; Szczypiński et al., 2015; Szczypiński and Zapotoczny, 2012b; Xu et al., 2021) was utilized, which involved extracting image features, selecting significant features, and applying either an SVM classifier (Chandra and Bedi, 2021) for variety recognition or a cascade of such classifiers for defect recognition. The extracted features included surface texture-related characteristics, color features, and morphological attributes. When using a pair of images capturing both sides of the kernel, feature extraction was performed separately for each side, and subsequently, the features were combined into a single data vector. However, only a small subset of features in this vector exhibited relevance to the recognized classes. As a result, a feature selection procedure was implemented to identify a limited set of features that are highly discriminative based on Fisher's linear discriminant. With this reduced feature set, machine learning processes were conducted using SVM classifiers with polynomial basis functions.
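Feature ranking of this kind can be illustrated with a per-feature Fisher score, that is, the ratio of between-class to within-class variance. The exact criterion used in the reference method is not given beyond its basis in Fisher's linear discriminant, so the formula and function names below are assumptions.

```python
import numpy as np

def fisher_scores(X, y):
    """Between-class over within-class variance, computed per feature."""
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)  # small constant avoids division by zero

def select_features(X, y, k):
    """Indices of the k features with the highest Fisher score."""
    return np.argsort(fisher_scores(X, y))[::-1][:k]

# Feature 0 separates the two classes; feature 1 does not.
X = [[0, 5], [1, 6], [10, 5], [11, 6]]
y = [0, 0, 1, 1]
scores = fisher_scores(X, y)
selected = select_features(X, y, k=1)
```

The selected columns would then be passed to the SVM training procedure.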
In the case of identifying defective kernels, the implementation of a cascade of binary SVM classifiers yielded good classification results. The first classifier in the cascade was trained to recognize broken kernels, and the next one to detect grains with damaged germs. The subsequent classifier was designed to identify whole and healthy kernels, and the final classifier was trained to recognize moldy ones. Kernels that remained unrecognized by any stage of the classification process were considered to be immature or sprouted.
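The control flow of such a cascade can be sketched as below. The stage order follows the description above, while the threshold rules are hypothetical placeholders that stand in for the trained binary SVM classifiers.

```python
from typing import Callable, Sequence, Tuple

Stage = Tuple[str, Callable[[Sequence[float]], bool]]

def classify_with_cascade(features: Sequence[float],
                          stages: Sequence[Stage],
                          fallback: str = "immature or sprouted") -> str:
    """Return the label of the first stage that accepts the kernel."""
    for label, accepts in stages:
        if accepts(features):
            return label
    return fallback

# Placeholder decision rules; real stages would wrap trained SVM models.
toy_stages = [
    ("broken", lambda f: f[0] > 0.5),
    ("damaged germ", lambda f: f[1] > 0.5),
    ("healthy", lambda f: f[2] > 0.5),
    ("moldy", lambda f: f[3] > 0.5),
]

first = classify_with_cascade([0.9, 0.9, 0.0, 0.0], toy_stages)
third = classify_with_cascade([0.0, 0.0, 0.8, 0.0], toy_stages)
none = classify_with_cascade([0.0, 0.0, 0.0, 0.0], toy_stages)
```

Because the stages are evaluated in order, a kernel accepted by an early stage is never seen by the later ones.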
When classifying new kernels with unknown class membership, only the features with high discrimination ability are extracted. By computing a limited number of features, the computational process is accelerated and becomes more efficient. Finally, the trained SVM classifiers utilizing these features are applied to predict the variety or type of defect.
2. 4. Deep Learning Networks
CNNs have become the most advanced tools for many tasks related to the classification and recognition of digital images. As already mentioned, they consist of two main parts: the feature extraction part and the classification part. The feature extraction part contains convolutional (CONV) layers, which use filters to extract features from the input image, and activation layers, which apply non-linear transformations to the features. The most commonly used activation functions in CNNs (Apicella et al., 2021; Dubey et al., 2022) are ReLU, sigmoid, ELU, and GELU. Max-pooling (MaxPool) layers downsample the feature maps to reduce computational complexity and to retain the most salient features. In the classification part, the extracted features are fed into fully connected (FC) layers, which perform the final categorization of the image.
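The building blocks of the feature extraction part can be illustrated with a small NumPy sketch of one CONV filter, a ReLU activation, and 2x2 max pooling on a single-channel image. It is meant only to show the operations; the filter values are arbitrary, and real networks use many learned filters and optimized library routines.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation, the operation used in CONV layers."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectified linear unit: negative responses are set to zero."""
    return np.maximum(x, 0.0)

def maxpool2x2(x):
    """Non-overlapping 2x2 max pooling (odd trailing rows/columns dropped)."""
    h, w = x.shape[0] // 2 * 2, x.shape[1] // 2 * 2
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

image = np.arange(25, dtype=np.float64).reshape(5, 5)
kernel = np.array([[-1.0, 0.0], [0.0, 1.0]])   # arbitrary diagonal-difference filter
feature_map = maxpool2x2(relu(conv2d(image, kernel)))
```

Here the 5x5 input yields a 4x4 response map after the 2x2 filter and a 2x2 feature map after pooling.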
In the learning process of the CNN, images of known class affiliation are presented to the network. The network predicts the class of each image, and in the case of an incorrect prediction the weights of the filters and the FC layers are adjusted through error backpropagation. This process gradually decreases the number of differences between the predicted and the real class affiliations.
The learning process aims to minimize a loss function. There are various types of loss functions that can be used depending on the task and type of data. For classification problems, commonly used loss functions include cross-entropy loss and categorical hinge loss. The choice of loss function depends on the specific problem and often requires comparative experiments. A well-chosen loss function enables the network to achieve high classification accuracy. Moreover, various optimization algorithms can be used to minimize the loss function, such as stochastic gradient descent (SGD) (Bottou, 2010), Adam (Kingma and Ba, 2014), or Nadam (Ruder, 2016). The choice of algorithm affects the final optimization result and thus the classification accuracy.
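The two loss functions named above can be compared on a single example with the sketch below. The cross-entropy is computed from softmax probabilities, and the hinge loss is written in one common multi-class form, max(0, 1 + max over wrong classes of the score minus the score of the true class); other variants exist, so this form is an assumption.

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into probabilities that sum to one."""
    shifted = scores - np.max(scores)   # shift for numerical stability
    e = np.exp(shifted)
    return e / e.sum()

def cross_entropy(scores, target):
    """Negative log-probability assigned to the true class."""
    return float(-np.log(softmax(scores)[target]))

def categorical_hinge(scores, target):
    """Margin loss: zero once the true class leads every other by at least 1."""
    others = np.delete(scores, target)
    return float(max(0.0, 1.0 + others.max() - scores[target]))

scores = np.array([2.0, 0.0, 0.0])
ce_correct = cross_entropy(scores, target=0)
hinge_correct = categorical_hinge(scores, target=0)
hinge_wrong = categorical_hinge(scores, target=1)
```

The example shows a typical difference: the hinge loss is exactly zero once the margin is satisfied, whereas the cross-entropy remains positive and keeps rewarding more confident correct predictions.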
The initial design of the CNN dedicated to barley kernel analysis (CNN-Barley) was developed based on deep networks that achieved high accuracy in image classification in the ImageNet competition (Mishkin et al., 2017). A constant size of convolutional filters was adopted in subsequent layers, and MaxPool layers were used to reduce the size of the feature maps. Finally, SoftMax neurons were used at the output of the FC layers to provide the probabilities that the analyzed image belongs to particular classes.
In a series of experiments, we compared the performance of network variants that differed in the number and size of convolutional filters, the type of activation function, the number of layers, and the optimization algorithm. All classifiers were compared in terms of balanced classification accuracy, computed on the validation set images. Using a separate validation set, not used in the learning process, makes it possible to assess how well the knowledge acquired by the network generalizes, and guards against selecting a network that has merely memorized the training examples without being able to classify other data correctly.
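Balanced accuracy is commonly defined as the mean of the per-class recalls, which makes it insensitive to unequal class sizes. The sketch below uses that standard definition, assumed to match the measure applied here, and contrasts it with plain accuracy on an imbalanced toy set.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

def plain_accuracy(y_true, y_pred):
    """Fraction of all samples that were classified correctly."""
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)

# A classifier that always answers "healthy" on an imbalanced set.
y_true = ["healthy"] * 8 + ["moldy"] * 2
y_pred = ["healthy"] * 10
acc = plain_accuracy(y_true, y_pred)
bal = balanced_accuracy(y_true, y_pred)
```

In this toy case the plain accuracy is 0.8 although no moldy kernel is detected, whereas the balanced accuracy is 0.5.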
The results produced by CNN-Barley were compared with the results obtained with the reference method described earlier. In addition, the performance of the network was compared with that of two other CNNs, ResNet (He et al., 2016) and AlexNet (Krizhevsky et al., 2012). Both are commonly used pre-trained networks for image analysis tasks. AlexNet is a relatively shallow network with 5 CONV layers and 3 FC layers, while ResNet is a deeper network with skip connections to improve performance. In such pre-trained networks, the feature extraction layers are already trained to extract features suitable for recognizing various classes of natural images. Thus, instead of training such networks from scratch, transfer learning is applied. This means that the CONV layers and their weights are transferred to the new network, in which only the FC layers are designed for the specific classification problem and subjected to the learning process.
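The principle of transfer learning, a frozen feature extractor followed by a trainable classification head, can be shown with a deliberately simplified NumPy sketch. The random matrix `W_FROZEN` is a hypothetical stand-in for pre-trained CONV layers, the toy data are invented, and the head is a single softmax layer trained by plain gradient descent. It does not reproduce ResNet or AlexNet.

```python
import numpy as np

rng = np.random.default_rng(0)
W_FROZEN = rng.normal(size=(2, 8))   # stand-in for pre-trained, fixed weights

def extract_features(x):
    """Frozen feature extractor: its weights are never updated."""
    return np.maximum(x @ W_FROZEN, 0.0)

def train_head(features, labels, n_classes, lr=0.01, epochs=300):
    """Train only a softmax layer on top of the fixed features."""
    n = len(labels)
    W = np.zeros((features.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    losses = []
    for _ in range(epochs):
        logits = features @ W + b
        logits = logits - logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        # Cross-entropy loss before this update step.
        losses.append(float(-np.log(probs[np.arange(n), labels]).mean()))
        grad = (probs - onehot) / n          # gradient w.r.t. the logits
        W -= lr * (features.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b, losses

# Invented two-class toy data in place of kernel images.
X = np.array([[2.0, 0.0], [2.5, 0.5], [1.5, -0.5],
              [-2.0, 0.0], [-2.5, -0.5], [-1.5, 0.5]])
y = [0, 0, 0, 1, 1, 1]
frozen_before = W_FROZEN.copy()
W_head, b_head, losses = train_head(extract_features(X), y, n_classes=2)
```

Only `W_head` and `b_head` change during training, which mirrors the idea that the transferred layers keep their weights while the new FC layers are learned.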
In summary, in this work we present a neural network designed specifically for the analysis of barley kernel images. Its structure was optimized, and it was trained from scratch using exclusively images of barley kernels belonging to different classes. The performance of the developed network was compared with the performance of two pre-trained CNNs with transfer learning applied, and with the reference method, based on a traditional feature engineering approach, which was previously used to analyze images of barley kernels.