The breathalyzers used in traffic inspection must meet technical performance requirements intended to minimize measurement errors under severe conditions of use, such as high and low temperatures and variations in relative humidity. Electromagnetic compatibility requirements are also assessed to ensure that external emissions do not affect the measurement.
The two main technologies employed are infrared (IR) spectrometry and fuel cells. While the former is considered the most accurate, it is also the most expensive [10], which leads users to opt for the latter on a larger scale. Brazil has approved models based on both technologies [11]; however, the instruments in use are predominantly based on fuel cells.
The fuel cell construction scheme is briefly shown in Figure 1. The main components involved in the measurements can be clearly observed.
The measurement cycle begins with the sampling of a predetermined volume of exhaled air. Sensors control the temperature and flow, with the microprocessor making any necessary adjustments. The alcohol present in the sample is then electrochemically oxidized, releasing electrons, and the resulting signals are converted into concentration values. The instrument does not measure the concentration of alcohol in the blood directly; rather, it is estimated through an established physical relationship with exhaled air of approximately 2,100:1 (blood-to-breath ratio) [10,12].
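The breath-to-blood conversion described above can be sketched as follows. This is a minimal illustration only: the function and variable names are hypothetical, and only the 2,100:1 partition ratio comes from the text.

```python
# Illustrative sketch of the breath-to-blood conversion; the 2,100:1
# partition ratio is the approximate physical relationship cited in the text.

BLOOD_BREATH_RATIO = 2100  # mg/L of blood per mg/L of exhaled air

def breath_to_bac(breath_alcohol_mg_per_l: float) -> float:
    """Estimate blood alcohol concentration (g/L of blood) from a
    breath alcohol concentration (mg/L of exhaled air)."""
    # mg/L of breath * 2100 gives mg/L of blood; divide by 1000 for g/L
    return breath_alcohol_mg_per_l * BLOOD_BREATH_RATIO / 1000

# Example: a breath reading of 0.38 mg/L corresponds to about 0.8 g/L in blood
bac = breath_to_bac(0.38)
```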
Main flaws found in breathalyzers
Given the construction of a breathalyzer, it can be inferred that a failure in any of its main components can render measurements unfeasible or, in the worst case, produce incorrect readings.
The microprocessor monitors the entire measurement cycle, whose parameters are predetermined. Faults in the temperature, flow, and pressure sensors are usually detected, and alarms are issued so that the user can repeat the operation and complete the measurement. The fuel cell, however, generates a current through the oxidation of ethanol on its surface. As the cell ages, or owing to other environmental factors [13], this process becomes less efficient, causing inaccurate readings that the microprocessor cannot detect. This component is therefore critical to the measurement process.
Data generated by repair and maintenance workshops (Figure 2) for breathalyzers used in Brazil showed that, of the six main components, the fuel cell (or electrochemical cell) had the highest incidence of repair (40.2%) [14]. This was followed by the flow sensor (26.51%), which interrupts the reading if the flow of exhaled air is interrupted, and by the temperature sensor (16.96%), which ensures that the instrument does not operate outside the range specified by the manufacturer.
A temporal analysis revealed that, over the years, the incidence of repairs in fuel cells increased relative to the other components (Figure 3). This increase in fuel cell failures implies growing imprecision in the results generated by these instruments.
These failures imply legal uncertainties. They can serve as a basis for lawsuits brought against the state and for annulling fines issued by instruments that failed the verification following the assessment. However, simply nullifying all results generated between checks is not the best way to solve this problem, as it may mean impunity for drivers who were actually driving under the influence of alcohol.
This scenario is most likely because the errors found in verifications conducted by State Metrological Bodies [15] were negative. In other words, when a reference solution was used to verify accuracy, the instrument displayed lower values (Figure 4). This suggests that the likelihood of a driver being unfairly fined is minimal. However, a large number of readings below the true value implies the release of drivers who may cause traffic accidents due to driving under the influence of alcohol.
Thus, solutions that increase the reliability of the metrological control of breathalyzers must be found: ignoring the occurrence of failures between checks can lead to unfair decisions, whereas voiding all punishments imposed on drivers can increase the occurrence of accidents owing to a loss of credibility in the system.
Many studies have used machine learning models to predict instrument failures. Such algorithms can learn behavior and detect anomalies, and this trend has also reached metrology. The following paragraphs discuss examples of studies and implementations.
In metrology, the use of machine learning algorithms was reported by VIOREL-MIHAI [16] to identify the tendency of measuring instruments to fail. To build the algorithm, the authors used initial and subsequent verification data. The proposed methodology involved three steps: statistical conformity control, establishment of control limits, and subsequent metrological verification. The model thus prevented instruments from exhibiting errors above the limits established in regulation before the next verification, ensuring user reliability between verifications. The study concluded that decision-making based on the generated data is possible.
Metrology faces many challenges in fulfilling its mission, such as the metrological control of road scales. These instruments require a standard weight corresponding to the usual bearing load, rendering the process exhausting owing to the dimensions and transport conditions involved. To improve the control of these instruments, a study was conducted using a machine learning algorithm with neural networks [17]. Based on verification data collected over two years, the authors created a classification algorithm capable of identifying whether a given instrument would pass the indication error test, achieving a reliability of 83.3%. Knowing the trend of the result in advance, users can take steps to reduce weighing errors.
In the medical field, systems based on classifiers (Random Forest, Decision Tree, and K-Nearest Neighbors) can reduce costs related to clinical engineering services and adapt maintenance protocols for defibrillators [18]. In this study, the authors used a set of measurement data acquired from periodic inspections to overcome the challenges of maintenance protocols and increase clinical accuracy and efficiency in the diagnosis and treatment of patients.
In the area of instrument quality control, models based on regression and artificial neural networks (ANN) have been developed to predict measurement errors in instruments and standards [19]. Characterizing and quantifying measurement errors makes it possible to apply the necessary corrections, schedule the next calibration, or reject the instrument in advance. In addition to reducing measurement errors, these algorithms provide savings for laboratories, as the general rule of thumb of a 12-month calibration interval can be extended.
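The regression idea above can be illustrated with a minimal sketch: fit a line to past calibration errors and extrapolate to the next calibration date. All data points, the time step, and the helper function are hypothetical; the cited studies use more sophisticated models (e.g., ANNs).

```python
# Illustrative sketch: ordinary least squares fit of error vs. time,
# then extrapolation to decide whether the next calibration can be deferred.
# The data points below are invented for demonstration only.

def fit_line(ts, errors):
    """Fit error = a*t + b by ordinary least squares; return (a, b)."""
    n = len(ts)
    mt = sum(ts) / n
    me = sum(errors) / n
    a = sum((t - mt) * (e - me) for t, e in zip(ts, errors)) \
        / sum((t - mt) ** 2 for t in ts)
    return a, me - a * mt

months = [0, 6, 12, 18]
errors = [0.010, 0.014, 0.018, 0.022]   # error drifting upward over time
a, b = fit_line(months, errors)
predicted_at_24 = a * 24 + b            # extrapolated error at month 24
```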
With the increasing use of ML algorithms in measurement and analysis processes, their application in legal metrological control is only a matter of time; adapting existing applications will be the next step. An important example is the remote calibration of instruments [20]. Processes that combine intelligent instruments, communication technology, and machine learning can improve remote calibration and reduce service time and cost.
The cited examples reveal a strong movement toward applying artificial intelligence to the metrological control of instruments. The use of machine learning in all areas of legal metrology in Brazil is anticipated. The next section describes the application of this tool to the surveillance of breathalyzers used in traffic control.
Types of classification models
Machine-learning algorithms are increasingly used for complex pattern recognition and process automation based on intelligent decisions. This section highlights regression and classification problems, two widely used subareas in the construction of these algorithms.
Regression problems use predictors to estimate numerical responses; for example, predicting a property's value from its location, area, and number of rooms. Classification algorithms, in turn, predict categories or classes from observed behavior patterns, such as predicting whether a particular patient has a disease. Decision trees and linear classifiers are among the most commonly used models.
A decision tree resembles a flowchart. The most important input variable forms the root node, the remaining variables are tested at internal nodes, and the labels are held at leaf nodes. The flow proceeds as a test is applied at each node, passing through the input variables to find the best route to an answer (output). For these models, categorical data need not be converted to numeric data, which makes them a good choice for many classification problems.
Examples of decision tree models applied in metrology include the diagnosis of faults in wind turbine gears [21] and electronic equipment [22], as well as the detection and classification of power quality disturbances [23]. Owing to their popularity, many adaptations have been developed, including random forest and gradient-boosting algorithms, which use ensemble techniques to optimize the model.
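The root-to-leaf flow described above can be sketched with a tiny hand-written tree. The feature names and split thresholds are purely illustrative and are not taken from the study; a real model would learn them from data.

```python
# Minimal hand-written decision tree (pure Python, no ML library)
# illustrating the flowchart logic: root node tests the most informative
# variable, and leaf nodes return the class label.

def classify_instrument(mean_error: float, drift: float) -> int:
    """Return 1 ("inappropriate": likely to fail the next check)
    or 0 ("suitable"), following a root-to-leaf path of tests."""
    # Root node: the most informative variable is tested first
    if abs(mean_error) > 0.032:      # threshold chosen for illustration only
        return 1                     # leaf node: label "inappropriate"
    # Internal node: a secondary variable is tested next
    if abs(drift) > 0.020:
        return 1
    return 0                         # leaf node: label "suitable"

labels = [classify_instrument(e, d)
          for e, d in [(0.010, 0.005), (0.040, 0.001), (0.015, 0.030)]]
```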
Linear classifiers assign data to classes according to their position in a feature space, separated by a line, plane, or hyperplane. Commonly used examples include Logistic Regression, Support Vector Machines, and Naive Bayes. In virtual metrology, they can be used for product quality control [24] and for the detection and diagnosis of multiple failures in industrial processes [25].
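The decision rule shared by these linear models can be sketched as a hyperplane test. The weights and features below are invented for illustration; a fitted model would estimate them from training data.

```python
# Sketch of a linear classifier's decision rule: the hyperplane
# w.x + b = 0 separates the two classes. Weights are illustrative, not fitted.

def linear_classify(x, w, b):
    """Label a feature vector by which side of the hyperplane it falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score > 0 else 0

w = [2.0, -1.0]   # hypothetical weights for two features
b = -0.5
pred = linear_classify([1.0, 0.5], w, b)   # score = 2.0 - 0.5 - 0.5 = 1.0
```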
In this work, the results of several classification techniques are compared, and the best model for the problem addressed here is chosen according to the metrics.
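The comparison step can be sketched as evaluating each candidate model on held-out data with a common metric and keeping the best. The dummy models, data, and accuracy metric below are illustrative; the study's actual metrics and models may differ.

```python
# Illustrative model-selection step: score each candidate classifier
# on held-out data and keep the one with the best metric.

def accuracy(y_true, y_pred):
    """Fraction of correctly predicted labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def pick_best(models, X_test, y_test):
    """models: dict of name -> predict function. Returns (name, score)."""
    scores = {name: accuracy(y_test, [m(x) for x in X_test])
              for name, m in models.items()}
    return max(scores.items(), key=lambda kv: kv[1])

models = {
    "always_pass": lambda x: 0,                      # trivial baseline
    "threshold":   lambda x: 1 if x[0] > 0.03 else 0,  # hypothetical rule
}
best = pick_best(models, [[0.01], [0.05], [0.02], [0.06]], [0, 1, 0, 1])
```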
Similar to the works cited above, this study presented a way to solve metrology gaps using machine learning techniques, defining trends in measurement instrument behavior, and increasing the reliability of the results.
Data collection and pre-processing
Integrating metrological control and artificial intelligence is a recent practice; the existing databases were not created for this purpose, which makes them unfriendly and difficult to mine for processing and analysis.
The two main metrological control data storage systems in Brazil are the Integrated Management System (SGI) and the Inmetro Services Portal in the States (PSIE). The first was conceived to standardize the processes and procedures of Inmetro's Brazilian Metrology and Quality Network (RBMLQ-I) [26], whereas the second was built to increase the reliability of repaired measuring instruments, since this service is performed by private entities [27].
Access to both systems is restricted to authorized persons; an external user cannot obtain this information. Once the locations for data extraction were identified, data were collected. Because each breathalyzer is checked, on average, every 12 months, a relatively long period is required to capture the instrument's behavior and drift. The period studied therefore comprised five years (2016–2020).
Sampling and preprocessing are decisive steps for the quality of the analysis. The raw data included both categorical and numerical variables (Table 1). The categorical variables covered the instrument model, reference material, laboratory responsible for the tests, and others; the numerical variables included the concentrations of the reference solutions, test dates, errors, and standard deviations. In addition, the 'drift' variable was proposed to capture the error variation over the years explicitly.
The selection of predictor variables is an important step in building a model. Many analysts use feature selection techniques to find the best variables, thus reducing computation time and improving the algorithm performance [28].
The goal was to predict instrument failures before their occurrence, and these failures are known to be directly related to fuel-cell wear. The following predictive variables were selected because they can provide indications of instrument failure: error indications, standard deviation, and drift for each concentration (Table 2). In addition, to reduce noise and standardize the types of inputs, only samples whose tests were performed with reference materials in gas mixtures and by a single laboratory were considered. Although this reduced the generalization power of the algorithm, it improved the reproducibility of the technique. Because this was a supervised learning classification problem, samples were separated and labeled "0" (suitable) if they passed the next check and "1" (inappropriate) if they failed it.
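The labeling step described above can be sketched as follows. The record structure and field names are hypothetical; only the 0/1 labeling convention (pass/fail at the next check) comes from the text.

```python
# Hypothetical sketch of the supervised-learning labeling step: each
# verification record is labeled 0 ("suitable") if the instrument passed
# the *next* check and 1 ("inappropriate") if it failed.

def label_samples(records):
    """records: list of dicts with numeric features and a boolean
    'passed_next_check' flag. Returns (features, labels) for training."""
    features, labels = [], []
    for r in records:
        features.append([r["mean_error"], r["std_dev"], r["drift"]])
        labels.append(0 if r["passed_next_check"] else 1)
    return features, labels

X, y = label_samples([
    {"mean_error": 0.01, "std_dev": 0.003, "drift": 0.002,
     "passed_next_check": True},
    {"mean_error": 0.05, "std_dev": 0.010, "drift": 0.030,
     "passed_next_check": False},
])
```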
Table 1 - Information available in the integrated management system for breathalyzers
Table 2 - Input variables used for breathalyzer classification models
Input Variables
Mean Error conc. I
Mean Error conc. II
Mean Error conc. III
Deviation conc. I
Deviation conc. II
Deviation conc. III
Drift I
Drift II
Drift III
A common problem in supervised classification is unbalanced data, as observed here: the samples selected for training had an approximate ratio of 1:6 (classes 1 and 0, respectively) (Figure 5). Training a model on unbalanced data can cause it to learn mostly the characteristics of the majority class, while the minority class is often misclassified [29]. Among the various techniques for eliminating this class disproportion, this study combined the Neighborhood Cleaning Rule (NCL) and Random Under-Sampling (RUS), which remove outliers and exclude samples from the majority class, respectively. Once the preprocessing phase was concluded, the models were applied and their metrics analyzed.
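The under-sampling half of this pipeline can be sketched in a few lines. This shows only the random under-sampling (RUS) step on a synthetic ~6:1 imbalance; the neighborhood-based cleaning step, which removes ambiguous majority samples near the class boundary, would precede it in the actual pipeline.

```python
import random

# Sketch of random under-sampling (RUS): drop majority-class (label 0)
# samples at random until both classes are balanced. Data are synthetic,
# mimicking the ~1:6 class ratio reported in the text.

def random_under_sample(X, y, seed=42):
    """Return a class-balanced subset of (X, y)."""
    rng = random.Random(seed)
    minority = [i for i, lbl in enumerate(y) if lbl == 1]
    majority = [i for i, lbl in enumerate(y) if lbl == 0]
    kept = minority + rng.sample(majority, len(minority))
    kept.sort()
    return [X[i] for i in kept], [y[i] for i in kept]

X = [[i] for i in range(14)]
y = [0] * 12 + [1] * 2          # ~6:1 imbalance
Xb, yb = random_under_sample(X, y)
```

In practice, libraries such as imbalanced-learn provide ready-made implementations of both NCL and RUS.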