Reasons for disagreement between screening and standard echocardiography in primary care: data from the PROVAR+ study

We aimed to evaluate the reasons for disagreement between screening echocardiography (echo), acquired by nonexperts, and standard echo in the Brazilian primary care (PC). Over 20 months, 22 PC workers were trained on simplified handheld (GE VSCAN) echo protocols. Screening groups, consisting of patients aged 17-20, 35-40 and 60-65 years, and patients referred for clinical indications underwent focused echo. Studies were remotely interpreted in US and Brazil, and those diagnosed with major or severe HD were referred for standard echo performed by an expert. Major HD was defined as moderate to severe valve disease, ventricular dysfunction/hypertrophy, pericardial effusion or wall-motion abnormalities. A random sample of exams was selected for evaluation of variables accounting for disagreement. A sample of 768 patients was analyzed, 651 (85%) in the referred group. Quality issues were reported in 5.8%, and the random Kappa for major HD between screening and standard echo was 0.51. The most frequent reasons for disagreement were: overestimation of mitral regurgitation (MR) (17.9%, N=138), left ventricular (LV) dysfunction (15.7%, N=121), aortic regurgitation (AR) (15.2%, N=117), LV hypertrophy (13.5%, N=104) and tricuspid regurgitation (12.7%, N=98). Misdiagnosis of mitral and aortic morphological abnormalities was observed in 12.4% and 3.0%, and underestimation of AR and MR occurred in 4.6% and 11.1%. Among 257 patients with suspected mild/moderate MR, 129 were reclassified to normal. In conclusion, although screening echo with task-shifting in PC is a promising tool in low-income areas, estimation of valve regurgitation and LV function and size account for considerable disagreement with standard exams.


Introduction
Cardiovascular diseases are the leading causes of mortality in Brazil, accounting for roughly 27% of all deaths, with a close association between disease burden and socioeconomic index [1]. In this context, heart disease (HD) causes high morbidity and mortality, especially in low-income areas, where underserved populations have little or no access to specialized care, leading to unfavorable outcomes and greater financial costs to the health system. It was estimated that over 17 billion USD were spent on the management of HD in the country in 2015 [2]. Early diagnosis is of utmost importance to reduce the burden of disease, and risk stratification for prioritization is a key tool to optimize the allocation of health resources.
Echocardiography (echo) is the primary diagnostic tool for many structural heart diseases and it is being rapidly implemented in primary care (PC) [3][4][5]. Screening echo performed by non-experts with remote interpretation seems to be a useful strategy to increase access to specialized care in underprivileged locations, promoting earlier diagnosis and contributing to improve timely treatment and reduce costs [6]. Integrated in the Brazilian Public Health System, backed by the Family Health Strategy and with robust telemedicine support, this effort could potentially lead to more positive outcomes and, ultimately, to a healthier population.
Based on this, a group of healthcare workers was trained to perform screening echo with handheld ultrasound devices (HUD) in PC centers. However, the performance of simplified 7-view screening protocols aimed at fast-track diagnosis, compared to standard echocardiography, is yet to be defined. For specific diseases, such as rheumatic heart disease (RHD), this approach has been shown to be more sensitive than specific, considerably overcalling variables such as mitral regurgitation. In this study, conducted by the PROVAR+ (Rheumatic Valve Disease Screening Program and Other Cardiovascular Diseases) team, we aimed to evaluate the reasons for disagreement in results between screening echo, acquired by non-experts and interpreted by telemedicine, and standard echo, performed by specialists, in the Brazilian PC system.

Methods
Data analytic methods and study materials will be made available to other researchers for purposes of reproducing the results or replicating the procedure, from the corresponding author upon reasonable request. The PROVAR+ study is a continuation of the RHD screening program established in 2014, as a collaboration between the Universidade Federal de Minas Gerais, Telehealth Network of Minas Gerais [7] and the Children's National Health System, Washington, DC, USA. This substudy took place in three cities of the southeastern state of Minas Gerais, Brazil: Nova Lima (central area, 87.4 thousand inhabitants), Sabará (central area, X inhabitants) and Lagoa Santa (central area, 361.9 thousand inhabitants). Ethics approval was obtained from the institutional review boards of the institutions and from the local Boards of Health.
In the implementation phase, 22 healthcare workers (4 physicians, 6 nurses, and 12 technicians) at 20 PC centers, selected by health authorities based on socioeconomic variables and vulnerability, were trained on simplified screening echocardiography, using online educational modules (http://www.wiredhealthresources.net/EchoProject/index.html) and at least 32 h of hands-on training with handheld machines. Screening was also supported in the PC centers by 2 previously trained non-physicians (a research nurse and a technician) with over 3 years of expertise (Fig. 1 shows the operational flowchart of the PROVAR+ study, for imaging acquisition, file storage and remote reading). Any patient presenting to PC who met the inclusion criteria was also informed of the program, and informed consent was actively collected prior to any study procedures.
Fig. 1 Operational flowchart of the PROVAR+ study, with imaging acquisition by non-physicians, upload for storage and interpretation in dedicated cloud computing systems, for remote collaborative reading
Patients were enrolled according to the inclusion criteria: (a) screening (SC) population: all asymptomatic patients in 3 age groups (17-20, 35-40 and 60-65 years-old), without previously known significant HD or indication or referral for regular echocardiography; (b) referred (RF) population: patients in the public system's waiting list for an echo or a cardiology appointment. Consented patients underwent a standardized clinical and sociodemographic questionnaire, prior to the simplified 7-view screening echo focused on mitral, aortic and tricuspid valves, left and right ventricular morphology and function and pericardial effusion (Appendix Table 1) utilizing handheld devices (VScan® and VScan Extend®). The ASE diagnostic criteria [8] were applied, modified for handheld without spectral Doppler. Objective and subjective variables were recorded and delivered to the PC centers in simplified reports. Major HD was defined as moderate to severe valve disease (regurgitation or stenosis), ventricular dysfunction/hypertrophy, congenital heart disease, pericardial effusion or any wall-motion abnormalities, based on the ASE-REWARD study criteria [9] (Appendix Table 1). Severe HD was defined as the most severe presentations of the abnormalities considered as major HD.
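As an illustration only, the enrollment rules and the major HD definition described above can be written as simple predicates. The sketch below is not the study's software: the helper names (`population_group`, `is_major_hd`), the record fields and the four-level valve grading are assumptions introduced here, and the detailed criteria of Appendix Table 1 are not reproduced.

```python
from dataclasses import dataclass
from typing import Optional

# Age bands (years) of the screening (SC) population, as stated in the text.
SCREENING_AGE_BANDS = [(17, 20), (35, 40), (60, 65)]


def population_group(age: int, on_waiting_list: bool,
                     asymptomatic: bool, known_significant_hd: bool) -> Optional[str]:
    """Return 'RF', 'SC' or None (not eligible). Argument names are hypothetical."""
    # Referred (RF): on the public system's waiting list for echo or cardiology.
    if on_waiting_list:
        return "RF"
    # Screening (SC): asymptomatic, in one of the age bands, no known significant HD.
    in_band = any(low <= age <= high for low, high in SCREENING_AGE_BANDS)
    if in_band and asymptomatic and not known_significant_hd:
        return "SC"
    return None


@dataclass
class EchoFindings:
    """Hypothetical summary of one exam; not the study's actual report format."""
    worst_valve_grade: str = "none"  # assumed scale: none | mild | moderate | severe
    ventricular_dysfunction: bool = False
    ventricular_hypertrophy: bool = False
    congenital_hd: bool = False
    pericardial_effusion: bool = False
    wall_motion_abnormality: bool = False


def is_major_hd(f: EchoFindings) -> bool:
    """Major HD: any one of the abnormalities listed in the definition above."""
    return (
        f.worst_valve_grade in ("moderate", "severe")
        or f.ventricular_dysfunction
        or f.ventricular_hypertrophy
        or f.congenital_hd
        or f.pericardial_effusion
        or f.wall_motion_abnormality
    )
```

For example, under these assumptions a 62-year-old asymptomatic patient with no known HD falls in the SC group, and an exam with moderate valve regurgitation alone is flagged as major HD.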
A locally developed cloud system (SigTel®, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil) was utilized for file storage and reporting, and proprietary offline software for VScan® files was utilized for download and telemedicine interpretation in Brazil and the US [10]. Screen-positive cases were reviewed by two experts (Brazil: MN; US: CS and AT) and discrepancies were resolved by consensus. All reports were made available online for the PC centers. A standard echocardiogram was scheduled for all patients with confirmed significant abnormalities in the final conclusion and, at the discretion of the expert reader, for those not fulfilling these criteria. The exam was provided in the PC center within 60 days by experts from UFMG with portable machines (Vivid IQ® and Vivid Q®, GE Healthcare), together with referral to the university hospital or local cardiology facilities for follow-up. Continuing care was left to the discretion of the attending cardiologist.
A sample of exams extracted from the database, consisting of pairs of screening and standard echocardiograms with concordant and discordant conclusions (with or without major HD), was randomly selected for the assessment of variables accounting for disagreement. The aim was to build a balanced dataset, not to look at the degree of disagreement, but to assess the main reasons for disagreement, when present. The variables of interest for the analysis of agreement were: mitral regurgitation, signs of mitral stenosis (in the absence of spectral Doppler), mitral abnormal morphological findings, left ventricular (LV) dysfunction and hypertrophy, LV segmental wall motion abnormalities, aortic regurgitation, signs of aortic stenosis and tricuspid regurgitation.
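To make the paired comparison concrete, the following minimal sketch shows one way such an agreement analysis could be computed from pairs of screening and standard results: Cohen's kappa for a categorical conclusion, and the share of overestimation, underestimation and agreement for a graded finding, relative to the total sample. It is an illustration under stated assumptions, not the study's analysis (which was run in SPSS): the function names and the four-level `GRADES` scale are hypothetical.

```python
from collections import Counter

# Hypothetical ordinal scale for a graded finding (e.g. a valve regurgitation).
GRADES = ["normal", "mild", "moderate", "severe"]


def cohen_kappa(screening, standard):
    """Cohen's kappa for two paired sequences of categorical labels."""
    if len(screening) != len(standard) or len(screening) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(screening)
    # Observed agreement: fraction of pairs with identical labels.
    observed = sum(a == b for a, b in zip(screening, standard)) / n
    # Chance agreement from the marginal label frequencies of each method.
    count_a, count_b = Counter(screening), Counter(standard)
    labels = set(count_a) | set(count_b)
    expected = sum(count_a[c] * count_b[c] for c in labels) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when only one label is ever used
    return (observed - expected) / (1.0 - expected)


def compare_grades(screening_grade, standard_grade):
    """Classify one pair, taking the standard exam as the reference."""
    s, r = GRADES.index(screening_grade), GRADES.index(standard_grade)
    if s > r:
        return "overestimation"
    if s < r:
        return "underestimation"
    return "agreement"


def disagreement_rates(pairs):
    """Percentage of each category relative to the total number of pairs."""
    counts = Counter(compare_grades(s, r) for s, r in pairs)
    total = len(pairs)
    return {k: 100.0 * counts[k] / total
            for k in ("overestimation", "underestimation", "agreement")}
```

As a usage example, `cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])` gives 0.5: the two methods agree on 3 of 4 exams (0.75) against a chance agreement of 0.5.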

Statistical analysis
Data were entered into the SigTel® system and exported to the RedCap® database [11]. Statistical analysis was performed using SPSS® software version 23.0 for Mac OSX (SPSS Inc., Chicago, Illinois). As this was an exploratory study, no pre-specified sample size calculation was performed, and we considered a random sample of 750 screening exams acquired in the primary care setting in 20 months. As the objective of the study was to evaluate the main findings accounting for disagreement between screening and standard echocardiography, only cases in which standard echo was indicated (for clinical reasons or based on screening results) were randomly selected, regardless of the group (SC or RF). Categorical variables, expressed as numbers and percentages, were compared between groups (SC and RF groups) using Fisher's exact test, whereas continuous data, expressed as mean ± SD or median and Q1/Q3 (25%/75%), were compared using Student's unpaired t-test or the Mann-Whitney U test, as appropriate. Kappa (κ) statistics were utilized for the assessment of the overall agreement between screening and standard echo for each of the variables of interest. Disagreement was calculated as the percentage (N, %) of patients, in relation to the total sample, in which a certain abnormality was observed in standard echo, in comparison with screening: rates were separately reported for overestimation, underestimation and agreement, for each finding. For comparisons, a p-value < 0.05 was considered statistically significant.

Results
From 2,237 patients screened in the selected primary care centers in 20 months, a random sample of 768 patients with paired screening and standard echos was analyzed, 651 (85%) of them from the higher-risk RF group. Mean age was 58 ± 15 years, 62.5% were women; 68.4% had hypertension and 23% diabetes (Table 1). In the preliminary analysis of imaging quality by the expert readers, interpretation issues were reported in 5.8%. The random Kappa statistics for the presence of major and severe HD between screening and standard echo, as the final diagnosis, were 0.51 and 0.45, respectively, reflecting an ideal random selection. The prevalence of cardiovascular risk factors and symptoms was significantly higher in the RF group, whereas the presence of major HD, in both screening and standard echocardiography, was more frequent in the SC group (Table 1). The main variables accounting for disagreement between screening and standard echos are depicted in Table 2. Considering the total sample of exams, the most frequent reasons for disagreement were: overestimation of mitral regurgitation (MR) (17.9%, N = 138), left ventricular (LV) dysfunction (15.7%, N = 121), aortic regurgitation (AR) (15.2%, N = 117), LV hypertrophy (13.5%, N = 104) and tricuspid regurgitation (12.7%, N = 98) (Table 2; Fig. 2). Considering the individual echocardiographic variables, agreement in the grading of abnormal findings was especially low for severe mitral regurgitation (10%, in which most of the cases were reclassified to moderate), signs of aortic stenosis (86% of the cases recoded to normal, with spectral Doppler) and mild to moderate LV dysfunction (with a high rate of reclassification to mild/minimal or normal). Detailed agreement data for individual variables are presented in Table 2.
In screening echo, there was also a considerable overestimation of morphological findings, especially related to the mitral valve, in which only 30.6% of these findings were confirmed, and of abnormalities compatible with aortic stenosis, confirmed in 14% of the suspected cases, although with the utilization of spectral Doppler in standard fully functional devices. On the other hand, underdiagnosis of morphological abnormalities of the aortic and mitral valves was observed in only 12.4% and 3.0% of the cases (Table 2; Fig. 2), and underestimation of AR and MR occurred in 4.6% and 11.1% of the exams, respectively. For the presence of findings suggestive of RHD, among the 35 positive cases in screening, 14 (40%) were confirmed by standard echo, whereas only 6 RHD cases not reported in screening were identified exclusively in standard exams. Examples of cases included in this study, with agreement and disagreement between screening and standard echos, are depicted in Fig. 3 and Supplement videos 1 to 8.
Fig. 3 Examples of handheld (top) and standard (bottom) echocardiographic images with agreement between methods for: A: mild mitral regurgitation; B: severe mitral regurgitation; C: mitral stenosis, with morphological findings (leaflet thickening and calcification), and disagreement for the presence of D: moderate-to-severe mitral regurgitation, suggested by screening echo but reclassified to mild in the standard exam

Discussion
Our data, from a random sample of patients screened in the PC setting, showed that screening echo may be a valid tool to evaluate the presence of HD in low-income areas, especially to rule out significant disease, but several variables accounted for considerable disagreement between screening echo performed by non-expert examiners and expert-performed standard echo. Overestimation of valve disease (especially regurgitation and morphologic findings) and of the presence of LV dysfunction and hypertrophy were the most common causes of this variance, suggesting that the method may significantly overcall the presence of such abnormalities. However, its utility to rule out significant disease, as a point-of-care prioritization tool, as suggested by preliminary studies [6,12], is supported by our data. A possible explanation for the disagreement between the findings of the two groups might be the dependence on the operator's experience in acquiring the images. The training profile, duration and number of supervised training sessions may have contributed to suboptimal imaging and disagreement among readers, even though reports were collaboratively done by experienced cardiologists with similar backgrounds. A recent meta-analysis on HUD devices suggested similar disagreements when assessing LV parameters obtained from echo performed by non-experienced examiners using HUD compared to those performed by experienced examiners [13]. Pooled sensitivities in echo performed by experienced examiners, for LV function and structure criteria, were determined to be between 85% and 90%, while non-experienced examiners achieved sensitivities between 68% and 83%. Both groups had specificities of at least 87%, with less disagreement when compared to sensitivity results [13]. These findings align with our observations, showing LV function as a variable accounting for a considerable proportion of disagreement.
The results of the current study can also be explained by technology limitations related to HUD. Portable ultrasound machines are less powerful than standard cart-based systems, allowing less detailed and sometimes unsatisfactory imaging depending on the patient's anatomy, especially in adults and the elderly. This issue is even more relevant when complex presentations of HD are being assessed. Quality reports from our study's trained screening operators suggest that it is more challenging to perform the exams in overweight and obese patients, due to a thicker thoracic wall and difficulties with adequate positioning. This requires more physical strength to be applied on the probe, thus causing pain to the patient and muscle fatigue to the examiner. In the long run, it can account for a higher percentage of unsatisfactory images. Although good diagnostic accuracy has been shown when patients had significant valvular disease, overestimation of valve regurgitation is not uncommon in screening echo due to the limitations of these mobile devices, such as lower resolution and penetration, probe size and, notably, the lack of spectral Doppler [14].
A few other studies have been conducted to assess the feasibility of pocket-sized focused cardiac US by non-experienced examiners, most of them with limited sample size and in a hospital context [15][16][17][18][19][20]. It is worth noting that nearly all examiners were medical professionals (medical students, residents and physicians) and the average examination time was about 5 min. Also, training protocols were considerably briefer. Despite these examinations being performed by medical professionals, interpretation of complex findings was overall difficult. The common conclusion was that training needed to be more extensive in order to improve the quality of image capture by examiners [15][16][17][18][19][20]. Considering this, the disagreements in results between expert and non-expert examiners for certain variables should be interpreted with a broader view of the situation. Screening echo is a cost-effective and accessible tool to assess HD [21], but its optimal application requires several hours of training to achieve a level of proficiency. After theory, trainees should perform a sufficient number of supervised examinations before being allowed to perform solo. In the case of our study, the team of scanners had been performing screening in PC for over 2 years, and the included sample reflected the learning curve, which developed with heterogeneous populations, consisting mostly of children being screened for RHD in the early phases of the program [6,22]. Specifically for RHD, overestimation of findings, as shown by our data, is a common issue, although more relevant for subclinical disease [22].
To help data acquisition and analysis, new artificial intelligence techniques are being implemented in machine-learning systems to enhance the efficacy of interpretation of US-acquired images, and also to guide non-experts for optimal image acquisition [23][24][25][26]. These can be useful as a way to increase data quality and reduce variability. On the other hand, new telemedicine solutions are also evolving at a fast pace, allowing for faster and more accurate diagnostics and, also, for greater specialist coverage and access [27][28][29]. But, as HUD develop and screening echo becomes more popular (as the fifth pillar of clinical examination), the greater will be the number of images and exams the expert will have to review, possibly leading to an overload that could compromise patient care. In this sense, systems for automated flagging of abnormalities and high-risk features may be useful for a more practical risk stratification at the bedside, even prior to the final interpretation by experts [25].
In spite of the growing number of screening programs worldwide, mostly linked to research projects, doubts remain about the role of echo screening as a public health policy. The aforementioned disadvantage of using non-experts in terms of accuracy at the point of care has been highlighted in studies in low- and middle-income countries [30,31]. Moreover, there seems to be considerable heterogeneity between non-experts with different backgrounds (e.g., nurses vs. technicians), requiring more standardized and long-lasting supervised training, followed by strict quality-assurance procedures [13,32]. Depending on the complexity of included patients, the 32-hour hands-on training protocol applied in our study may be suboptimal. Furthermore, it is hypothesized that population screening may lead to overload of healthcare services as a result of overdiagnosis of HD, as suggested by our preliminary data [6,12,33]. Beyond the impacts on associated costs, this may also impact patients' lives, in terms of stigmatization and stress associated with a possible health condition [34]. Finally, logistics and technological infrastructure are additional challenges for low-resourced regions. Thus, a refinement of HUD-based screening strategies and investigation of optimal training approaches and ideal screening settings are of utmost importance.
Despite the aforementioned doubts related to the effectiveness of HUD for large-scale screening, handheld echo progressively emerges as a promising strategy to assess HD, especially in low-income areas [12,22,35,36]. There is still, however, plenty of room for technological development and research. The identification of the main reasons for disagreement with the gold-standard full echocardiogram may allow for more specific training focused on key features. As the Brazilian government recently approved a freeze in public expenditure in health until the mid 2030s, appropriate investments need to be made in order to lessen the impacts of this financial policy on the country's wellbeing. Therefore, the need for investments, development and improvements of portable echo training programs is evident in order to structure a standardized, replicable program that allows a non-specialist to acquire images with higher success rates and lower patient risk. Investments in technology-based solutions and ultimately artificial intelligence are warranted to optimize results.
Finally, the implementation of large-scale screening programs since 2014 [6,12,22] was a starting point to improve the availability of imaging diagnosis for underserved populations, but deep discussions are needed, especially related to federal regulations about imaging acquisition by non-physicians, which is still not allowed outside research by Brazilian medical councils.

Limitations
This study has some significant limitations. First, the HUD used by examiners were not state-of-the-art equipment for final diagnostic purposes. The lack of spectral Doppler, low image resolution and impossibility of adjusting color Doppler settings may impact interpretation of the images, causing over- or underestimation of structural HD. Also, less US power can lead to less penetration in the thorax, making it difficult to acquire satisfactory images in patients who have a thicker layer of fat in the thoracic wall. Despite these limitations, the ease of use, portability and lower price of these devices allow for their widespread use in underserved settings, which is the purpose of screening programs and the core of this analysis. Second, conducting examinations in some PC centers, most notably those in poorer neighborhoods, at a high screening pace, could be mentally and physically more demanding for the examiner, which may impact image acquisition and consequently interpretation. Third, no stratified sampling procedures were applied, and the program was conducted in a single Brazilian state, which limits the extrapolation of the findings. Lastly, the database of images was not balanced, leading to a low prevalence of some findings and impacting deeper assessments of disagreement factors. As patients were randomly enrolled based on the performance of the 2 echocardiograms, the reported prevalence may not reflect reality, and especially the higher rates of HD observed in the SC group possibly result from selection bias. Despite these limitations, this is, to the authors' knowledge, the most extensive study conducted to appraise factors of disagreement of screening echo performed by non-medical examiners in comparison to expert standard echo in Latin America, and the findings are potentially useful to guide training and quality-assurance programs on imaging acquisition for non-experts.

Conclusion
Even though screening echo is a useful tool for early diagnosis of HD in PC, overestimation of valve regurgitation and LV dysfunction and enlargement account for considerable disagreement with standard exams. Training and quality-assurance strategies must be implemented to improve quality and to achieve consistent high-standard results, not only for optimal image acquisition, but also for a preliminary interpretation at the point of care. In order to meet optimal results with the use of HUD, healthcare workers must be able to discern when the information provided is diagnostic or insufficient and hence requires a more accurate analysis with fully functional devices. Ultraportable machines with higher quality images and advanced features, such as artificial intelligence technology, are in the making and are expected to provide a wider range of use for screening, with better accuracy and less dependence on extensive training.
Funding The PROVAR+ investigators would like to thank Edwards Lifesciences Foundation® for supporting and funding the primary care screening program (PROVAR+) in Brazil, General Electric Healthcare® for providing echocardiography equipment and WiRed Health Resources for providing online curriculum on heart disease and echocardiography. The Telehealth Network of Minas Gerais was funded by the State Government of Minas Gerais, by its Health Department (Secretaria de Estado da Saúde de Minas Gerais) and FAPEMIG (Fundação de Amparo à Pesquisa de Minas Gerais), and by the Brazilian Government, including the Health Ministry and the Science and Technology Ministry and its research and innovation agencies, CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico) and FINEP (Financiadora de Estudos e Projetos). Dr Ribeiro is supported in part by CNPq (310790/2021-2 and 465518/2014-1) and by FAPEMIG (PPM-00428-17 and RED-00081-16). Dr Nascimento is partially financed by CNPq (Bolsa de produtividade em pesquisa, 312382/2019-7).