Missing data on ethnicity and socioeconomic status in high-impact medical journals

Reporting participant ethnicity and socioeconomic status (SES) in clinical research is needed for interpretation and to inform discussion around health inequalities. We assessed the frequency of reporting of ethnicity (or 'race') and SES indicators in a sample of 100 research articles reporting participant-level data, published in the 10 highest-impact general medical journals in Spring 2021. Of these, 35 reported ethnicity and 13 reported SES, whereas 99 reported age and 97 reported sex or gender. Among the articles not reporting ethnicity, only 3 (5%) highlighted this as a limitation, and among those not reporting SES, only 6 (7%) did so. The median number of articles reporting ethnicity per journal was 2.5 of 10 (range 0-9). In conclusion, reporting of research participant ethnicity and socioeconomic status in high-impact medical journals remains poor, and this omission is rarely acknowledged as a limitation. This situation persists despite the well-established importance of the issue. Standardised, explicit minimum standards are required.


Main Text
Information about the ethnicity and socioeconomic status of participants in clinical research is needed for the interpretation, generalisability and pooling of data, as well as to inform discussion around health inequalities. The relevance of ethnicity and socioeconomic status to health and biomedical research is well established and has been further emphasised by the COVID-19 pandemic, during which specific ethnic groups and poorer individuals have been disproportionately affected [1]. The causal pathways driving health disparities are complex and multifactorial; however, under-reporting of participant characteristics has been identified as a potential contributory factor [2-4].
The International Committee of Medical Journal Editors (ICMJE) recommendations [5] and some journals' instructions to authors [6,7] promote inclusion of these data. Previous studies have found that reporting is frequently incomplete, with limited progress over the last three decades [8-13]. Recent years have seen an increased focus on ethnicity and socioeconomic status in medicine; however, it is unclear whether this has translated into better reporting.
To evaluate the current situation, we assessed the frequency of reporting of ethnicity (or 'race') and socioeconomic status indicators in a sample of research articles published in high-impact general medical journals in Spring 2021. Translational Medicine. PNAS and PLOS One cover a wide range of subject areas, so the sections 'Biological Sciences, Medical Science' and 'Clinical Medicine' were used, respectively. From each of these 10 journals, we selected the 10 most recent articles reporting participant-level data. Laboratory studies using human-derived tissues or cells were included if donor information was provided. Journal reporting guidance and requirements were also assessed by reviewing author guidelines and journal websites, and by contacting the respective editorial and publishing teams. Data were collected on which participant-level characteristics were reported and how. Data collection and analysis were conducted by SCB, KEJP, SMA and PW, and all papers were reviewed independently by at least two researchers.
Ethnicity and race are related yet different constructs, and arguably the latter term should be abandoned [14]. However, given the frequent lack of standardisation in the literature, and because the terms are often used interchangeably in practice, we accepted either term. Similarly, because socioeconomic status is reported using various, often inconsistent, indicators, we assessed both direct measures, such as the Index of Multiple Deprivation, and measures from which socioeconomic status could be inferred, such as educational attainment and job role. Our focus was on whether, rather than how, such measures were reported.
Among the 24 papers describing clinical trials, 12 (50%) reported ethnicity, and none of the remaining 12 highlighted the absence of these data as a limitation. Three trials (12.5%) reported an indicator of socioeconomic status, and only one of the 21 that did not highlighted this absence as a limitation.
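Not every denominator behind these percentages is stated explicitly. As a hedged arithmetic check, the short Python sketch below recomputes them from the stated counts, assuming that the "acknowledged as a limitation" percentages in the abstract use as their denominator the articles that did not report the variable in question. The function and variable names are illustrative only.

```python
def pct(count: int, denominator: int) -> int:
    """Return count/denominator as a percentage rounded to a whole number."""
    return round(100 * count / denominator)

# Whole sample (counts from the abstract): 100 articles.
TOTAL = 100
ETHNICITY_REPORTED = 35
SES_REPORTED = 13

# Assumption: denominators for the "limitation" figures are the articles
# that did NOT report each variable.
ethnicity_missing = TOTAL - ETHNICITY_REPORTED  # 65 articles
ses_missing = TOTAL - SES_REPORTED              # 87 articles

ethnicity_limitation_pct = pct(3, ethnicity_missing)  # 3/65, about 4.6%
ses_limitation_pct = pct(6, ses_missing)              # 6/87, about 6.9%

# Clinical-trial subgroup: 24 trials.
TRIALS = 24
trials_ethnicity = TRIALS * 50 // 100     # 50% of 24 trials
trials_ses = TRIALS * 125 // 1000         # 12.5% of 24 trials
trials_ses_missing = TRIALS - trials_ses  # trials with no SES indicator

if __name__ == "__main__":
    print(f"Ethnicity not reported: {ethnicity_missing}; "
          f"limitation noted: {ethnicity_limitation_pct}%")
    print(f"SES not reported: {ses_missing}; "
          f"limitation noted: {ses_limitation_pct}%")
    print(f"Trials reporting ethnicity: {trials_ethnicity}; "
          f"reporting SES: {trials_ses}; not reporting SES: {trials_ses_missing}")
```

Under this assumption the rounded values (5% and 7%) match those reported, and 24 - 3 = 21 agrees with the number of trials stated as not reporting socioeconomic status, so the trial figures are internally consistent.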
Of note, two of the research articles in our sample identified ethnicity as relevant to their research topic, yet did not provide these data for their study participants or highlight their absence as a limitation. These articles noted, respectively, 'in the case of DNA-based mutation testing, poor sensitivity in detecting mutations in infants from ethnic and racial minority groups' [15], and that 'peripheral oxygen saturation can substantially differ from the SaO2 under certain conditions and may be less accurate in Black patients than in White patients' [16].
In our sample, the majority of research articles published in high-impact medical journals did not include data on the ethnicity and socioeconomic status of participants, and this omission was rarely acknowledged as a limitation. This finding echoes earlier research [8-13], but its persistence is concerning, and surprising given current awareness of such issues [17,18].
These findings have important implications for the interpretation and application of research, both within academia and beyond, and the ongoing omission can no longer be justified as simple oversight.
As highlighted by Baker et al. [19] in relation to data on LGBTQI+ communities, but equally relevant here, 'Data are fundamentally political: decisions about which data are collected and which are overlooked both reflect and shape policy and program priorities.' Several factors may contribute to our results. For some research, including secondary data analyses, ethnicity and socioeconomic status data may not have been available to the researchers; but given the lack of explanation, it remains unclear whether these data were unavailable, or available but not included in publications. The low level of reporting in controlled clinical trials suggests that the problem extends beyond data unavailability, because such data would be simple to collect in these studies. Moreover, given that other research successfully reports these data, the reasons for these omissions remain unexplained.
The higher frequency of reporting of ethnicity compared with socioeconomic status may reflect differences in the perceived relevance of these variables. This would be in keeping with journal author guidelines and ICMJE recommendations that encourage the inclusion of relevant demographic variables to ensure representative samples [5], which more often explicitly mention race and/or ethnicity than socioeconomic status. The relevance of these factors may not have been apparent to authors and editorial teams; however, the ICMJE Recommendations for the Conduct, Reporting, Editing, and Publication of Scholarly Work in Medical Journals [5] state: 'Because the relevance of such variables as age, sex, or ethnicity is not always known at the time of study design, researchers should aim for inclusion of representative populations into all study types and at a minimum provide descriptive data for these and other relevant demographic variables.' Of note, not all of the journals in our sample state that they follow the ICMJE recommendations [20]. However, whether or not a journal states that it follows this guidance has no bearing on the relevance of these data or the importance of reporting them. Additionally, Maduka et al. [21] found no difference in the frequency of reporting race and ethnicity between journals stating that they follow the ICMJE recommendations and those that do not, in a sample of surgical research publications from 2019.
Certain considerations require highlighting. Firstly, different approaches to selecting research papers may alter the findings. Secondly, we identified high-impact journals using the Google Scholar h5-index, but acknowledge that various other, equally valid, methods exist. Thirdly, our analysis focused on whether ethnicity and/or race was reported, although we acknowledge that these terms are not synonymous. Beyond whether these variables are reported, how they are reported is also an important area for discussion and research. The widespread omissions identified by this research suggest a structural problem. Indeed, we ourselves have published research that would have met the inclusion criteria but failed to report these specific characteristics. Our intention is to highlight an issue and suggest approaches to address it.
Given that inadequate reporting persists despite research highlighting the issue, journal author guidelines and ICMJE recommendations, and the current socio-political climate, there is a clear need for more explicit requirements that are adhered to in practice. This is likely best achieved if steps are integrated into each stage of the research process, from protocol to publication. For example, Fain et al. [22] compared reporting of race and ethnicity on ClinicalTrials.gov before and after the introduction of a requirement to report these data (if collected), finding that the requirement was associated with an increase in reporting from 42% to 92%. Similar explicit requirements could be incorporated into EQUATOR reporting guidelines [23] and research ethics applications. In our sample, JAMA had the most explicit guidance for reporting race and ethnicity, and this variable was reported in 9 of the 10 JAMA articles we reviewed. Of note, from 2022 the New England Journal of Medicine will require authors of research articles to provide data on the representativeness of the sample, including race or ethnic group [24], although it is unclear whether socioeconomic status indicators will also be required.
The reporting of ethnicity and socioeconomic status in high-impact medical research remains poor, despite a consensus on its importance. Omission of these participant characteristics limits the interpretation, generalisability and pooling of data that are required to facilitate informed discussion around health inequalities. Guidance and encouragement have so far proven insufficient to change practice in this area. Standardised, explicit minimum standards are required.

Declarations

Contributors:
SCB had the original idea for the study. SCB, KEJP, SMA and PW collected the data. All authors (SCB, KEJP, SMA, PW, JKQ and NSH) contributed to the design of the study. KEJP analysed the data initially, and the analysis was verified by SCB, SMA and PW. KEJP wrote the first draft of the manuscript. All authors (SCB, KEJP, SMA, PW, JKQ and NSH) critically appraised and approved the manuscript for submission, had full access to the data, and can take responsibility for the integrity of the data and the accuracy of the data analysis. The corresponding author attests that all listed authors (SCB, KEJP, SMA, PW, JKQ and NSH) meet authorship criteria and that no others meeting the criteria have been omitted.

Funding:
KEJP was supported by the Imperial College Clinician Investigator Scholarship (no specific grant number/code). The funders had no say in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or the decision to submit the manuscript for publication.

Competing interests:
None reported.
Data sharing: All data used in this study are publicly available. Percentages are not given for most results because most have a denominator of 100.

Figure 1
CONSORT-style flow diagram of study inclusion and exclusion