In crops, the growing tip and the roots, where cell division occurs, are sensitive to the surrounding environment. In particular, hypertrophy in the early reproductive growth of a crop can be determined from the state of the growing truss [1], which can also affect the quality of flowers and fruits. Although experts can identify hypertrophy with the naked eye, visual inspection makes it difficult to collect accurate numerical data and is therefore a disadvantage when setting crop management standards. While studies on analyzing crop diseases from digital images of tomato crops are being actively conducted, few have measured indicators related to tomato growth. In the case of the growing truss, it is difficult to extract numerical information from an image and to determine a label value, given the lack of reference video images.
The development of a future-oriented agricultural robot platform is expected to reduce the challenges of acquiring image data containing growth information. Singh et al. (2020) developed a mechanical robot arm with a high degree of freedom and an intelligent control unit that moves the arm based on its judgment of the captured image. Research is also underway on recognizing objects from diversified views by placing the robot arm in a more advantageous position [2, 3]. Chang et al. (2015) reported a computer-vision algorithm that uses image processing techniques such as color space transformations, morphological operations, and 3D localization to identify objects and grippers in captured images and to estimate their relative positions before determining the movement of the robot arm. In agriculture, measuring growth using computer vision has been studied for a relatively long time [5, 6]. In particular, robots are used in harvesting, and various image processing techniques have been applied to extract fruit and determine ripeness [7, 8]. Zhuang et al. (2019) proposed a computer-vision-based method for locating acceptable picking points on litchi clusters, in which an image processing algorithm tracked the location of the fruit while considering the agronomic characteristics of the picking point.
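The color-space and morphological steps mentioned above can be illustrated in a few lines. The following is a minimal sketch, not any cited author's pipeline: it builds a synthetic RGB patch, separates a red "fruit" region with a simple redness index (real pipelines often use HSV or Lab), and applies a hand-rolled 4-neighbour erosion as the morphological cleanup step.

```python
import numpy as np

# Hypothetical 6x6 RGB patch: a red "fruit" blob on a green background.
img = np.zeros((6, 6, 3), dtype=float)
img[..., 1] = 0.6                 # green foliage everywhere
img[1:5, 1:5] = [0.9, 0.2, 0.1]   # 4x4 red fruit region

# Colour-space step: a simple redness index (R - G) separates ripe
# fruit pixels from foliage.
redness = img[..., 0] - img[..., 1]
binary = redness > 0.3            # boolean fruit mask

# Morphological step: one erosion pass keeps a pixel only if all four
# of its neighbours are also foreground, removing thin noise.
padded = np.pad(binary, 1, constant_values=False)
eroded = (padded[1:-1, 1:-1] & padded[:-2, 1:-1] & padded[2:, 1:-1]
          & padded[1:-1, :-2] & padded[1:-1, 2:])

print(binary.sum(), eroded.sum())  # prints: 16 4
```

The erosion shrinks the 4x4 blob to its 2x2 interior, the same effect libraries such as OpenCV provide with `cv2.erode`.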
Although image processing techniques should be applied according to the characteristics of the crop, no image segmentation method has yet been developed for the growing truss of tomato plants. Because the tomato cultivation environment inside a greenhouse is dense, classifying stems or leaves from images is difficult [10, 11]. In a related study, Xiang (2018) performed crop segmentation on 385 tomato images captured at night using a simplified pulse-coupled neural network; the best segmentation results showed best and false rates of 59.22% and 13.77%, respectively. However, this approach was limited in that it required specific lighting at night for light correction, so measuring the growing truss of tomatoes would require additional mechanical devices and technical improvements. Zhang and Xu (2018) reported an unsupervised method for improving the accuracy of image segmentation in the middle and late stages of fruit growth. However, their segmentation of tomato stems and leaves could not distinguish overlapping surrounding objects, so it did not demonstrate feasibility on RGB images. Many studies have identified tomatoes using RGB images, mainly targeting the fruit, and many results suggest applicability in tomato cultivation, but successful segmentation of tomato stems at the growing point has yet to be reported.
To solve this problem, one potential approach is to use a 3D camera capable of segmentation by distance, together with image processing techniques, to cope with the variable solar light inside a greenhouse. 3D depth cameras are widely used in image acquisition platforms for recognizing objects in various industries, including agriculture [14–16]. It has been reported that depth and color image information can be combined through a stereo camera, one of the 3D camera technologies, and that segmentation of objects can be performed on real images recorded with a stereo camera [17, 18]. Unlike conventional 2D cameras, 3D depth cameras deployed in the field can calculate the depth value of each pixel in an image, and research on growth measurement using 3D cameras is underway.
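As an illustration of segmentation by distance, the per-pixel depth values a 3D camera provides can be thresholded to separate a target plant row from the rows behind it. The following is a minimal sketch with a hypothetical 4x4 depth map and an assumed working range, not a description of any specific camera's API.

```python
import numpy as np

# Hypothetical depth map in metres: the left half is a near plant row
# (~0.8 m), the right half is background foliage (~2.5 m).
depth = np.array([
    [0.8, 0.8, 2.5, 2.5],
    [0.7, 0.9, 2.4, 2.6],
    [0.8, 1.0, 2.5, 2.5],
    [0.9, 0.8, 2.6, 2.4],
])

# Keep only pixels inside an assumed working distance of the target row,
# masking out overlapping background plants that confuse RGB-only methods.
near, far = 0.5, 1.5
mask = (depth >= near) & (depth <= far)

print(mask.sum())  # prints: 8 foreground pixels
```

In practice the mask would be applied to the registered RGB image, so that later color-based steps operate only on the in-range plant.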
Deep learning image processing technology has advanced in recent years. For instance, in image recognition and classification, studies using convolutional neural networks (CNNs) have been effectively applied to various industrial fields [19–22]. Mask R-CNN, which recognizes objects at high speed and is specialized for segmentation, is a promising candidate. In a related study, Afonso et al. (2020) used Mask R-CNN for tomato fruit recognition and confirmed its potential inside a greenhouse. Such CNNs take the form of general supervised learning: annotation of the region of interest (ROI) is required for all image data, and the accuracy of the model depends to some extent on the quantity and quality of the data obtained. Therefore, it is important not only to develop a robot platform that can extract accurate images in an automated greenhouse, but also to apply an algorithm that can learn from an appropriate number of images with minimal supervision.
Generative adversarial networks (GANs) have gained particularly wide attention [24, 25]. The basic GAN configuration is a deep learning technique that trains a discriminator and a generator model simultaneously to obtain the target image from the generator, showing great promise in unsupervised learning. The recently devised CycleGAN enables translation between two unpaired image domains by cycling two generators and two discriminators [26, 27]. A representative application of CycleGAN is a study in which the pattern of a zebra was converted into that of an ordinary horse. Studies have reported [28] that this technology can switch between two image modalities, that is, between an image with depth information and a general image with RGB information. Furthermore, unlike other CNN algorithms, CycleGAN learns while generating images in a self-supervised manner, so the amount of labeled image data required is relatively small. This is expected to enable efficient algorithm application using relatively little data in environments where data acquisition is difficult, such as a greenhouse.
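Formally, in the original CycleGAN formulation [26, 27], the two generators $G: X \rightarrow Y$ and $F: Y \rightarrow X$ are trained with adversarial losses against discriminators $D_Y$ and $D_X$, plus a cycle-consistency term that forces each translated image to map back to its source; it is this term that makes unpaired training possible:

```latex
% Cycle-consistency loss: translating X->Y->X (and Y->X->Y) must
% reconstruct the original image under an L1 penalty.
\mathcal{L}_{cyc}(G, F) =
    \mathbb{E}_{x \sim p_{data}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_{data}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]

% Full objective: two adversarial losses plus the cycle term,
% weighted by a hyperparameter \lambda.
\mathcal{L}(G, F, D_X, D_Y) =
    \mathcal{L}_{GAN}(G, D_Y, X, Y)
  + \mathcal{L}_{GAN}(F, D_X, Y, X)
  + \lambda\, \mathcal{L}_{cyc}(G, F)
```

In the RGB-to-depth setting of this study, $X$ would correspond to the RGB domain and $Y$ to the depth domain, so neither direction of the translation requires pixel-aligned image pairs.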
Considering these points, current research lacks detection technology for determining the tomato growing point, and systematic studies on this topic are necessary. For image acquisition using an unmanned robot, extraction of the tomato growing truss must be performed on-site, which requires segmentation using depth image information. Therefore, the specific objectives of this study are:
1) To identify the tomato growing truss in images, to build a robot-based monitoring system that can measure the height of the growing truss and, using this system, to acquire RGB and depth images of the growing truss.
2) Using the obtained images, to create a conversion model between RGB and depth images through CycleGAN and to combine it with image processing techniques to segment, without overlap, the growing truss of tomatoes growing in the greenhouse.