The validation of the Arabic version of the DBQ comprised two stages. The first stage included the translation and cross-cultural adaptation of the DBQ into Arabic, and the second stage focused on testing the psychometric properties of the Arabic version of the DBQ among Lebanese drivers.
Adaptation of DBQ
Translation and back-translation:
The original 50-item version of the DBQ was translated through a forward-backward translation process that included translation synthesis [27] (Figure 1). First, two independent bilingual translators whose mother tongue was Arabic and who were also proficient in English translated the English version of the DBQ into Arabic. One of the translators was a road safety specialist who was aware of the DBQ concept. The second translator was selected independently from the language department at the Lebanese University and was unaware of the concept. Any discrepancy found between the two translations was discussed, and the two translated versions were then synthesized into one. At that point, the initial Arabic version was back-translated into English by two independent translators whose native language was English and who were also fluent in Arabic. Of note, these translators had no background in behavioral science or road safety and had no access to the original version of the DBQ.
A committee of experts, which included a road safety specialist, a linguistic professional, a psychologist, and the principal investigator, was formed to verify the clarity and suitability of the items in terms of wording and the specific features of the Lebanese context. Inquiries raised in committee meetings were communicated to the forward and back translators. At the end of this procedure, the experts reviewed and compared the two versions to discuss any discrepancy or inconsistency arising in the previous stage of translation. Any change made to the translated version was resolved by consensus. Consensus was reached by removing five items and incorporating their content into other items to avoid redundancy. The following items were removed: (1) Become impatient with a slow driver in the outer lane and overtake on the inside, (2) On turning left, nearly hit a cyclist who has come up on your inside, (3) Fail to notice someone stepping out from behind a bus or parked vehicle until it is nearly too late, (4) Overtake a slow-moving vehicle on the inside lane or hard shoulder of a motorway, and (5) Fail to notice pedestrians crossing when turning into a side street from the main road. The rest of the modifications suggested by the review committee are presented in Annex A1.
Once these changes were communicated to them, the translators were responsible for addressing them and updating the translated version.
The pre-final version of the translated DBQ-L was piloted on a small sample of 35 drivers to evaluate the questionnaire’s comprehensibility and provide final input on its language. After completing the questionnaire, each participant was asked to elaborate on what they thought about each questionnaire item and what their corresponding response meant. Based on their feedback, minor revisions were made. These included changing confusing words and potentially misleading items, in particular those containing technical terms not commonly used by Lebanese drivers. Such terms were replaced with colloquial language while preserving their meaning and ensuring that the items could still feed into the relevant scale. Finally, the Arabic version of the DBQ was produced and was ready for psychometric testing.
Psychometric testing
Participants and sampling
As part of a large project exploring driver behavior in Lebanon, a cross-sectional study was carried out using a convenience sampling technique among Lebanese drivers aged 18 years old and above from all Lebanese provinces over the period extending from October to December 2019. To ensure the representativeness of the sample in terms of age, gender, and location and to align the sample distribution with the population for those variables, a weighting procedure was used following an iterative proportional fitting. This procedure involved setting predetermined target figures across gender and age for each Lebanese governorate (Bekaa, Baalbeck-Hermel, Mount-Lebanon, Beirut, North, Akkar, South, and Nabatyeh).
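The weighting step can be illustrated with a minimal sketch of iterative proportional fitting (raking) on a two-way table. The cell counts and target margins below are hypothetical placeholders, not the study's data, and the function name `rake` is introduced here for illustration only; the actual procedure was applied within each governorate.

```python
def rake(sample, row_targets, col_targets, max_iter=200, tol=1e-9):
    """Iterative proportional fitting on a 2-D table of sample counts.

    Rows and columns are rescaled in turn until both sets of margins
    match their targets (within tol) or max_iter is reached.
    """
    table = [[float(v) for v in row] for row in sample]
    for _ in range(max_iter):
        # Step 1: rescale each row so that its total matches the row target.
        for i, row in enumerate(table):
            total = sum(row)
            if total > 0:
                table[i] = [v * row_targets[i] / total for v in row]
        # Step 2: rescale each column so that its total matches the column target.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns now match exactly; stop once the rows match as well.
        row_error = max(abs(sum(row) - t) for row, t in zip(table, row_targets))
        if row_error < tol:
            break
    return table


# Hypothetical sample counts: rows = gender (2 groups), columns = age bands (3 groups).
sample = [[80, 120, 50], [40, 60, 50]]
row_targets = [200.0, 200.0]         # assumed population margin for gender
col_targets = [130.0, 170.0, 100.0]  # assumed population margin for age

fitted = rake(sample, row_targets, col_targets)
# Weight for a respondent in cell (i, j) = fitted count / observed count.
weights = [[f / s for f, s in zip(frow, srow)] for frow, srow in zip(fitted, sample)]
row_sums = [sum(row) for row in fitted]
col_sums = [sum(row[j] for row in fitted) for j in range(len(col_targets))]
```

Each respondent receives the weight of the cell they fall into, so that the weighted sample reproduces both target margins.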
All active Lebanese drivers who were aged 18 years or over, held a driver's license, drove regularly in Lebanon, were literate in Arabic, and agreed to participate were eligible for this study.
This study excluded drivers who were not currently driving, illiterate drivers who could not understand the questions, non-Lebanese drivers, and drivers who refused to participate in the study. Drivers who rarely drove, defined as driving less than once per month, were also excluded. The research protocol was evaluated and approved by IPNET. All methods were performed following the relevant guidelines and regulations. The study design assured adequate protection of study participants, and neither included clinical data about patients nor configured itself as a clinical trial. None of the survey questions asked for information that could harm the respondent in any way.
Minimal sample size calculation
According to the sample size guidance of Comrey and Lee, five to ten observations per scale item are necessary to establish sufficient evidence of scale validity and reliability [28]. The original version of the DBQ scale consists of 50 items; therefore, at the upper bound of ten observations per item, 500 participants were needed to run the exploratory factor analysis. To increase the validity of the study, we used two different samples: one for the exploratory factor analysis (EFA) and one for the confirmatory factor analysis (CFA). Therefore, considering losses to withdrawal, follow-up, or protocol violation, we set out to recruit more than 500 participants for each analysis (exploratory and confirmatory).
Reliability
Internal consistency reliability
The reliability of the DBQ was evaluated using internal consistency, which reflects the degree to which every test item measures the same construct. Internal consistency reliability was estimated using Cronbach’s alpha, where a value of α ≥ 0.70 was considered satisfactory based on the commonly used rule of thumb.
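For illustration, Cronbach's alpha can be computed from a respondents-by-items score matrix as α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items. The sketch below uses only the Python standard library; the response data are hypothetical and the analysis in this study was run in SPSS.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(scores[0])  # number of items
    item_variances = [variance([resp[i] for resp in scores]) for i in range(k)]
    total_variance = variance([sum(resp) for resp in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses of five drivers to four items on the 0-5 scale.
responses = [
    [0, 1, 0, 1],
    [2, 2, 3, 2],
    [1, 1, 1, 0],
    [4, 5, 4, 4],
    [3, 3, 2, 3],
]
alpha = cronbach_alpha(responses)
is_satisfactory = alpha >= 0.70  # threshold used in this study
```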
Test-retest reliability
Test-retest reliability measures the degree to which the results of a measurement obtained in one sitting are consistent and stable over time (usually over two time points, T1 and T2), through the correlation between scores from one administration of the DBQ to another. Test-retest reliability requires only a small sample; a minimal sample size of 22 drivers is considered sufficient, given that such a test is commonly conducted during the initial pilot study. Accordingly, 40 drivers were asked to fill out the questionnaire a second time after almost 3 weeks, which is a relatively short period, to mitigate against conclusions being due to memory bias. Test-retest reliability was evaluated using the Pearson correlation (Pearson’s r), where a value ≥ 0.70 was considered satisfactory for ruling on the correlation between the retest and the initial study. However, to ascertain the magnitude of agreement between the time points rather than the relationship, we calculated the differences between the two time points, their mean (mean difference), and their standard deviation. This allowed us to determine how well the measures agree based on how closely the data points lie to the line of equality; 95% of differences were expected to be less than two standard deviations away from the mean.
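The two complementary checks described above, correlation and agreement, can be sketched as follows. The function names and the T1/T2 scores are hypothetical; the agreement limits are taken as the mean difference plus or minus two standard deviations, following the description in the text.

```python
from math import sqrt
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    numerator = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denominator = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return numerator / denominator


def limits_of_agreement(t1, t2):
    """Mean difference (T2 - T1) and the band of two standard deviations around it."""
    diffs = [b - a for a, b in zip(t1, t2)]
    mean_diff = mean(diffs)
    sd = stdev(diffs)
    return mean_diff, mean_diff - 2 * sd, mean_diff + 2 * sd


# Hypothetical total scores of six drivers at the two time points.
t1_scores = [30, 42, 55, 18, 61, 37]
t2_scores = [32, 40, 57, 20, 59, 39]

r = pearson_r(t1_scores, t2_scores)
mean_diff, lower, upper = limits_of_agreement(t1_scores, t2_scores)
# Share of paired differences that fall inside the agreement band.
inside = sum(lower <= b - a <= upper for a, b in zip(t1_scores, t2_scores)) / len(t1_scores)
```

A high r with a mean difference far from zero would indicate a strong relationship but poor agreement, which is why both quantities are reported.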
Validity
Content validity:
Content validity is defined as “the extent to which elements of an assessment tool are relevant and fairly representative to the entire domain the tool seeks to measure” [29]. The content validity of the DBQ was determined using the viewpoints of a panel of experts including content experts as well as lay experts. Content experts are professionals who have research experience or work in the field, and lay experts are the potential research subjects (drivers). Of note, using subjects of the target group as experts ensures that the population for whom the instrument is being developed is represented.
Therefore, this panel was composed of eight members and included road safety specialists (two), epidemiologists (two), drivers (two), a psychologist/behavior specialist (one), and the principal investigator. In this study, content validity was tested using qualitative and quantitative approaches to confirm that each item in the construct was necessary. In the qualitative content validity method, content experts and target groups reviewed the items and made recommendations, which were adopted, on observing grammar, using appropriate and correct words, and applying the correct and proper order of words in items. In the quantitative content validity method, the experts were requested to specify whether or not an item is necessary for operating a construct in a set of items, using the Lawshe method [30].
To this end, they were requested to score each item from 1 to 3 on a three-point scale of “not necessary, useful but not essential, essential”, respectively. Confidence in selecting the most important and correct content in an instrument is quantified by the content validity ratio (CVR). The CVR for each item was calculated using the following formula: CVR = (Ne − N/2) / (N/2), where “Ne” refers to the number of experts who rated an item as “essential” and N refers to the total number of experts [31]. The CVR varies between −1 and 1. A higher score indicates greater agreement among members of the panel on the necessity of an item in an instrument. The acceptable value of the CVR depends on the number of experts on the panel, which in the present study was 8. According to the criterion values provided by Lawshe, a CVR equal to or larger than 0.49 for an item indicated an acceptable level of significance, and the item was therefore retained. The Content Validity Index (CVI) is the mean score of those retained items having CVR ≥ 0.49.
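As a worked illustration of Lawshe's ratio, CVR = (Ne − N/2) / (N/2), the sketch below computes the CVR for a panel of eight experts and the CVI as the mean CVR of the retained items. The per-item counts of “essential” ratings are hypothetical, and the function names are introduced here for illustration only.

```python
def content_validity_ratio(n_essential, n_experts):
    """Lawshe's CVR: (Ne - N/2) / (N/2)."""
    half = n_experts / 2
    return (n_essential - half) / half


def content_validity_index(cvrs, threshold=0.49):
    """Mean CVR of the items retained at the given threshold (None if no item is retained)."""
    retained = [c for c in cvrs if c >= threshold]
    return sum(retained) / len(retained) if retained else None


N_EXPERTS = 8  # panel size in this study
# Hypothetical number of experts rating each of four items as "essential".
essential_counts = [8, 7, 6, 5]

cvrs = [content_validity_ratio(ne, N_EXPERTS) for ne in essential_counts]
cvi = content_validity_index(cvrs)
```

With eight experts, an item rated essential by all eight has CVR = 1, by six has CVR = 0.5, and by four has CVR = 0, so at the 0.49 threshold an item needs at least six “essential” ratings to be retained.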
Floor and ceiling effects:
To determine the sensitivity of the DBQ, floor and ceiling effects were calculated. Terwee and colleagues considered that 50 participants could be used to adequately assess floor and ceiling effects [32]. Floor or ceiling effects of more than 15% were considered significant [33]. Floor and ceiling effects were defined by calculating the number of respondents who obtained the lowest or the highest possible score, respectively, on the Arabic-DBQ [34].
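A minimal sketch of this calculation is given below. It assumes that the floor and ceiling are the minimum and maximum possible total scores (0 and 225 for 45 items scored 0 to 5); the scores themselves are hypothetical.

```python
def floor_ceiling(total_scores, min_score, max_score):
    """Percentage of respondents at the lowest and highest possible score."""
    n = len(total_scores)
    floor_pct = 100 * sum(s == min_score for s in total_scores) / n
    ceiling_pct = 100 * sum(s == max_score for s in total_scores) / n
    return floor_pct, ceiling_pct


# Hypothetical total scores; the possible range of 0 to 225 is an assumption.
totals = [0, 12, 35, 48, 0, 90, 225, 60, 17, 33]
floor_pct, ceiling_pct = floor_ceiling(totals, min_score=0, max_score=225)

THRESHOLD = 15  # percent, as used in this study
floor_effect = floor_pct > THRESHOLD
ceiling_effect = ceiling_pct > THRESHOLD
```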
Construct and factorial validity:
The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy and Bartlett’s test of sphericity were performed to evaluate the adequacy of the data for factor analysis. To examine the appropriateness of the factor structure and to increase validity, we split our data into two random samples, one for the exploratory factor analysis (EFA) and one for the confirmatory factor analysis (CFA). A sample of 580 drivers was used to conduct a factor analysis of the retained DBQ items in SPSS, using the principal axis factoring (PAF) method with varimax rotation, to identify the DBQ-L dimensions. The number of factors was decided based on eigenvalues > 1 and the scree plot. A minimum factor loading of 0.40 was required for an item to be included in a factor. A parallel analysis (PA) was also performed to determine the number of factors to retain. To evaluate the internal consistency of the DBQ, Cronbach’s alpha reliability coefficients were calculated.
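To make the adequacy check more concrete, the sketch below computes the test statistic of Bartlett's test of sphericity, χ² = −(n − 1 − (2p + 5)/6) × ln|R| with p(p − 1)/2 degrees of freedom, where R is the p × p item correlation matrix and n the sample size. This is an illustration in plain Python with a hypothetical 3 × 3 correlation matrix; the analyses reported here were run in SPSS, and the p-value lookup is omitted.

```python
from math import log


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [[float(v) for v in row] for row in matrix]
    n = len(m)
    det = 1.0
    for col in range(n):
        # Pick the row with the largest absolute entry in this column as pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def bartlett_sphericity(corr, n_obs):
    """Chi-square statistic and degrees of freedom of Bartlett's test of sphericity."""
    p = len(corr)
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6) * log(determinant(corr))
    df = p * (p - 1) // 2
    return chi2, df


# Hypothetical correlation matrix of three items.
corr = [
    [1.0, 0.5, 0.3],
    [0.5, 1.0, 0.4],
    [0.3, 0.4, 1.0],
]
chi2, df = bartlett_sphericity(corr, n_obs=580)
```

A large χ² relative to its degrees of freedom indicates that the correlation matrix differs from an identity matrix, so the items are sufficiently intercorrelated for factor analysis.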
Confirmatory Factor Analyses (CFA) and psychometric analyses
The 45-item version of the DBQ-L was subjected to confirmatory factor analysis (CFA) using IBM SPSS Amos 24.0 to examine the fit of the model to the Lebanese drivers and to confirm its factorial structure. CFA was used to compare the item organization of two models: (a) the exploratory factor model and (b) the Lebanese adapted model. To identify the best-fitting model, goodness-of-fit statistics [35] were calculated. The structural models were considered a good fit to the data when the ratio of the chi-squared value to the degrees of freedom (χ2/df) was < 5 [36], the normed fit index (NFI) was > 0.9, the goodness-of-fit index (GFI) was > 0.9, the comparative fit index (CFI) was > 0.9, and the root mean square error of approximation (RMSEA) was < 0.08 [37]. In case of a poor fit, modification indices were examined to identify additional parameters that would improve the goodness of fit of the models. Covariances were permitted to be freely estimated, and items that loaded at 0.40 or above on two or more factors (cross-loading items) were eliminated in the modified models [38].
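The fit criteria listed above can be summarized as a simple checklist. The sketch below is illustrative only: the function name and the fit values passed to it are hypothetical and are not results of this study.

```python
def meets_fit_criteria(chi2, df, nfi, gfi, cfi, rmsea):
    """Evaluate each goodness-of-fit criterion used in this study."""
    return {
        "chi2/df < 5": chi2 / df < 5,
        "NFI > 0.9": nfi > 0.9,
        "GFI > 0.9": gfi > 0.9,
        "CFI > 0.9": cfi > 0.9,
        "RMSEA < 0.08": rmsea < 0.08,
    }


# Hypothetical fit statistics for one candidate model.
checks = meets_fit_criteria(chi2=300.0, df=100, nfi=0.95, gfi=0.92, cfi=0.96, rmsea=0.05)
good_fit = all(checks.values())
```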
Questionnaire development
An anonymous, self-administered questionnaire was developed in Arabic. It included a brief introduction to the study objectives and instructions on how to fill out the questionnaire.
It comprised mainly closed-ended questions and consisted of three main sections: (a) socio-demographic characteristics; (b) traffic-related variables; and (c) the DBQ scale.
The first section collected socio-demographic data of the participants, including gender, age, marital status, profile, education level, and residency.
The second section covered traffic variables: drivers were asked about their driving experience and their annual mileage. The third section of the questionnaire included the aforementioned adapted Arabic version of the DBQ to assess Lebanese drivers' aberrant behaviors. It consisted of 45 items categorized into four domains intended to measure the main forms of self-reported aberrant behaviors. Each item describes a particular aberrant driving behavior. Participants were requested to consider their driving behavior and to report the frequency of engagement in these behaviors on a six-point Likert scale (0 = never; 1 = hardly ever; 2 = occasionally; 3 = quite often; 4 = frequently; 5 = nearly all the time). Item scores were summed, with higher scores indicating more frequent aberrant behavior.
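The scoring rule can be sketched as follows. The assignment of items to domains shown here is a hypothetical placeholder (six items in two domains) used only to keep the example short; it does not reproduce the actual 45-item, four-domain structure.

```python
def subscale_scores(responses, subscales):
    """Sum the 0-5 item responses within each subscale.

    responses: list of item scores for one driver, indexed by item position.
    subscales: dict mapping a subscale name to the list of its item positions.
    """
    for value in responses:
        if not 0 <= value <= 5:
            raise ValueError("each item must be scored from 0 (never) to 5 (nearly all the time)")
    return {name: sum(responses[i] for i in items) for name, items in subscales.items()}


# Hypothetical item-to-domain mapping and one driver's responses.
example_subscales = {"domain_a": [0, 1], "domain_b": [2, 3, 4, 5]}
example_responses = [0, 5, 3, 2, 1, 4]

scores = subscale_scores(example_responses, example_subscales)
total_score = sum(scores.values())
```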
Data collection procedure:
Since no updated official list of drivers is available, potential respondents were recruited from public places such as shopping malls and parking stations via a face-to-face approach. Four well-trained data collectors were responsible for the dissemination of questionnaires. These data collectors were students in the traffic major at the Lebanese Higher Technical School. The procedure began with the administration of preliminary questions to assess the eligibility of the participant based on the preset inclusion criteria. Before participation, study objectives and general instructions were delivered orally by data collectors to the eligible participants. Written informed consent was obtained from eligible respondents who were willing to complete a questionnaire. No reward was given to the drivers for their involvement in the study, which was entirely voluntary. Drivers were also free to withdraw from the study at any time. Since the study was observational, participants' anonymity and confidentiality were respected. The completion of the questionnaire took around 10–15 minutes.
Statistical analysis
The collected data were entered and analyzed using the statistical software SPSS (Statistical Package for Social Sciences), version 24.0. Since missing data constituted < 10% of the total database, they were not imputed. Before analysis, the distribution of each DBQ item was checked for normality. Descriptive analyses were done using counts and percentages for categorical variables and mean and standard deviation for continuous measures. A bivariate analysis was conducted using the ANOVA test to compare the means of the DBQ subscales across the categories of the categorical variables. Pearson correlation was used to assess linear correlation between continuous variables. All variables that showed a p < 0.2 in the bivariate analysis were included in the model as independent variables. Four stepwise linear regressions were performed, each taking one DBQ subscale as the dependent variable and the sociodemographic and traffic variables as independent variables. To perform linear regression, the aforementioned variables had to be modified and the number of categories was reduced (categories were merged in cases where there was no significant difference). Significance was set at p < 0.05.