Pavement condition assessment is a fundamental task in road maintenance programs. Budget restrictions and high maintenance costs call for rationalized efforts that ensure adequate maintenance and reduce life cycle costs (Meneses et al., 2013; Meneses and Ferreira, 2013; Liu et al., 2015; Babashamsi et al., 2016). With the increasing number of vehicles, pavement monitoring has received more attention, given the high number of accidents related to poor road conditions (Chen et al., 2020). This problem is even more noticeable in developing countries, where financial resources for road maintenance are insufficient (Hoang, 2018).
Among all pavement distresses, potholes are one of the most common and dangerous (Omanovic et al., 2013; Akula et al., 2019). They are defined as bowl-shaped depressions in the pavement surface. Aging, heavy traffic, poor drainage, thin asphalt layers, and a weak substructure are frequent causes of potholes (Ryu et al., 2015). In maintenance programs, this distress is generally detected manually by agency inspectors during periodic surveys (Hoang, 2018). Although such procedures may lead to trustworthy results, the task is labor-intensive and time-consuming. In recent years, many researchers have developed automatic pothole detection procedures (Omanovic et al., 2013; Harikrishnan and Gopi, 2017; Ye et al., 2019; Anand et al., 2018; Varona et al., 2019; Chen et al., 2020; Majidifard et al., 2020). Such efforts are also under way in several places in Brazil (Branco and Segantine, 2015; Espindola et al., 2021; Serafim et al., 2022; Antunes and Dantas, 2023), particularly in a partnership between academia and the Court of Audits of the Ceará State (TCE, 2023).
Pothole detection methods can be grouped into three main categories: (i) vibration-based methods (Harikrishnan and Gopi, 2017; Varona et al., 2019); (ii) 3D reconstruction methods (Li et al., 2009; Zhang et al., 2014); and (iii) image-based methods (Anand et al., 2018; Ye et al., 2019; Chen et al., 2020). Vibration-based strategies rely on accelerometers installed on the vehicle and identify potholes from the vibration signatures generated when the vehicle passes over them. Although this method has a reduced cost and is simple to implement, detecting pothole signatures may be challenging due to the large variety of surface shapes (Omanovic et al., 2013). 3D reconstruction methods use laser scanners to construct accurate digital models of pavement surfaces. Their application is limited by the high cost of laser scanners and by the high computational cost of processing the large number of points collected by the scanner. Finally, image-based approaches use images collected by regular cameras, which are analyzed with image processing techniques. Since image-based methods exhibit a good balance between cost and accuracy, this approach is followed in most recent works (Lekshmipathy and Velayudhan, 2020).
Image-based pothole detection is challenging because potholes have various shapes, and other objects (shadows, horizontal road markings, etc.) may be misclassified as potholes under complex real-world conditions (Ye et al., 2019). Hence, over the years, researchers have proposed a variety of methods for pothole detection. The first papers on this topic mainly relied on image processing techniques to extract features from raw images. This process is highly influenced by the knowledge of a specialist, and the features are often related to image texture and color. Since the process is specialist-dependent, the feature design is not fully automated and is usually termed a handcrafted procedure. Classifiers such as Neural Networks (NN) and Support Vector Machines (SVM) are then used to classify pothole image segments (Koch and Brilakis, 2011; Koch et al., 2013; Omanovic et al., 2013; Ryu et al., 2015). More recently, researchers have focused on the use of Deep Neural Networks (DNN) (Anand et al., 2018; Suong and Jangwoo, 2018; Ye et al., 2019; Chen et al., 2020), which stack several layers of artificial neurons to enable the learning of complex patterns from data. DNNs have emerged as promising methods for learning from images, as more data and affordable computational power become available. Apart from their remarkable accuracy, one important advantage of DNNs is that no separate feature extraction step is required. Thus, the process is not specialist-dependent, but fully automated.
Although much work has already been done in image-based pothole detection, two important aspects have been neglected in previous works. First, the available literature tends to focus on locating potholes within an image of a pavement surface. For road maintenance programs, this task is unnecessarily complex. A decision-maker primarily needs to know whether an image contains a pothole, not the precise number of potholes and their respective positions in the image. Such a modification of the original problem may not only align the goals of researchers and practitioners, but also turn the problem into an easier machine learning task. The second aspect is related to the use of pre-trained DNNs. In such a strategy, a DNN is first trained with large amounts of data for a particularly difficult image-based task. Then, given another image recognition task, one reuses several trained intermediate (hidden) layers of that DNN and adapts only the final layers to the new task at hand. This approach, named deep transfer learning, has led to remarkable performance in many areas, with the benefit of requiring a reduced number of training samples (Zoph et al., 2016; Kaya et al., 2017; Yang et al., 2018; Kim et al., 2020). It should also be emphasized that properly training a DNN from scratch would require millions of labeled images, which are seldom available for pavement applications; this is why transfer learning is commonly adopted in such cases. Furthermore, data augmentation techniques can be applied to enhance the neural network's generalization capabilities. Variations in background and texture are known to complicate the classification process, particularly when dealing with roads (Mikolajczyk and Grochowski, 2018; Chen et al., 2022).
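The transfer learning strategy described above can be illustrated with a minimal sketch. It assumes TensorFlow/Keras as the framework, which is not stated here, and uses the ResNet101V2 architecture only as an example of a pre-trained model. The function name `build_pothole_classifier`, the 224x224 input size, the single sigmoid output unit, the two augmentation layers, and the optimizer settings are all illustrative assumptions rather than the configuration adopted in this paper.

```python
import tensorflow as tf


def build_pothole_classifier(weights="imagenet", image_size=(224, 224)):
    """Binary 'pothole / no pothole' image classifier on a frozen backbone.

    All hyperparameters below are illustrative assumptions.
    """
    # Pre-trained hidden layers; include_top=False drops the original
    # ImageNet classification head, pooling="avg" yields one feature vector.
    backbone = tf.keras.applications.ResNet101V2(
        include_top=False,
        weights=weights,
        input_shape=image_size + (3,),
        pooling="avg",
    )
    backbone.trainable = False  # keep the transferred weights fixed

    inputs = tf.keras.Input(shape=image_size + (3,))
    # Simple data augmentation (active only during training).
    x = tf.keras.layers.RandomFlip("horizontal")(inputs)
    x = tf.keras.layers.RandomRotation(0.05)(x)
    # Map pixel values from [0, 255] to [-1, 1], as ResNetV2 models expect.
    x = tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1.0)(x)
    # training=False keeps batch-normalization layers in inference mode.
    x = backbone(x, training=False)
    # New final layer: the only part whose parameters are adjusted.
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

In this sketch, only the weights of the final dense layer are updated during training, which is what makes a few hundred labeled images a plausible training set. A model built this way would be fitted with `model.fit(...)` on a labeled image dataset supplied by the user.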
This paper aims to evaluate several pre-trained DNN architectures in the detection of images with potholes and to select the best strategies for automated detection. The images were collected in a real-world scenario, with a camera mounted in a car that surveyed roads of a Brazilian state. All images were manually labeled by two independent specialists. On the basis of our computational experiments, we can state that most pre-trained DNNs achieved promising results, even though only 360 labeled images were used for training. ResNet101V2 performed best, with a test accuracy of 89.44%.
The remainder of this paper is organized as follows. Section 2 presents the related work. Section 3 introduces some basic concepts of deep transfer learning. The proposed methodology is presented in Section 4, and the empirical results are shown in Section 5. Section 6 contains the conclusions and final remarks.