The randomised trials
We found 17 RCTs (Table 1, Figs. 1 and 2). Dr Rezk was the first author of all but three RCTs; the exceptions were SalamaRCT2019 and KandilRCT2018, in both of which he was the corresponding author, and HamzaRCT2016, an RCT reported only as an abstract. In RezkRCT2019b, Dr Rezk was the first author, but Dr Elsayed Elshamy was the corresponding author. RezkRCT2019a is described as a prospective cohort study in the methods but also includes a description of randomisation; we assume this was an RCT. Eight RCTs were prospectively registered and one was retrospectively registered, while RezkRCT2015b was registered retrospectively. Six of the 17 RCTs had a recruitment period that matched exactly a whole year or a multiple of a year.
Numbers randomised to each group
For twelve RCTs, the randomisation process resulted in exactly equal numbers between groups, and in four of these exactly equal numbers between three different groups (RezkRCT2015d, RezkRCT2016, SalamaRCT2019, RezkRCT2020). In two RCTs, the equal numbers could be explained by the randomisation method being shuffling of equal numbers of cards, albeit only if zero envelopes went missing and no participants withdrew or were lost to follow-up. In one of those RCTs (SalamaRCT2019), the cards had been allocated between two separate pharmacies. In the other 10 RCTs, the method of randomisation would not have inevitably led to equal-sized groups. In two further RCTs, the random allocation led to different-sized groups, but a differential loss to follow-up led to exactly equal-sized groups for analysis (RezkRCT2015a, RezkRCT2015c).
Probability of random sampling for baseline characteristics
For all RCTs authored by Rezk, the cumulative distribution from the Monte Carlo simulation was consistent with a uniform distribution (p = 0.1801) of baseline variables; that is, there was no evidence that these summaries of all baseline characteristics are not the result of a properly conducted randomisation process.
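As an illustration only, a uniformity check of this kind can be sketched as a one-sample Kolmogorov-Smirnov test of per-variable p-values against a uniform distribution. The text does not state which test produced p = 0.1801, so the choice of test, the function name `ks_uniform` and the input values below are all assumptions, not the authors' method.

```python
import math

def ks_uniform(pvalues):
    """Kolmogorov-Smirnov test of p-values against Uniform(0, 1).

    Returns (D, p) where D is the largest gap between the empirical CDF
    and the uniform CDF, and p is the asymptotic tail probability.
    Hypothetical helper, not taken from the paper.
    """
    xs = sorted(pvalues)
    n = len(xs)
    # Largest distance above and below the line F(x) = x.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    # Small-sample correction to the scaled statistic.
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.3:
        # The series below converges poorly here; the true value is ~1.
        return d, 1.0
    series = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                 for k in range(1, 101))
    return d, min(max(2 * series, 0.0), 1.0)

# Made-up inputs: evenly spread p-values versus p-values bunched near zero.
flat = [(i + 0.5) / 200 for i in range(200)]
bunched = [x ** 3 for x in flat]
d_flat, p_flat = ks_uniform(flat)
d_bunched, p_bunched = ks_uniform(bunched)
```

A large p-value from such a test, as reported above, means only that the baseline summaries give no signal of non-random allocation; it does not establish that randomisation was carried out properly.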
Data copying between RCTs
Salama2019 and Rezk 2020 (Figs. 2a-c, Table 3) were both three-arm RCTs. Two of the randomised arms involved identical treatments (Methyldopa and a control group, with Salama2019 reporting on Nifedipine and Rezk2020 on Labetalol). Although both RCTs recruited patients with identical characteristics from the same hospitals over the same time period and share two authors, neither refers to the other.
The baseline tables in the two papers (Fig. 2a) have (out of 36 reported sample means or counts, not including sample standard deviations) seven identical counts and three values of the sample mean within 0.1 or 0.01 of each other. Using a normal approximation to the individual sampling distribution of each of these summary statistics (see Appendix), the probability of this degree of similarity was less than 0.07 for the three means, and less than 0.06 for each of the seven counts. The maternal outcome tables in the two papers (Fig. 2b) have 11 out of 27 identical values. All eleven of the individual probabilities of similarity are less than 0.15, eight are less than 0.10 and six are less than 0.05. The fetal and neonatal outcome tables in the two papers (Fig. 2c) show 7/32 identical values. The values for “Admission to NICU” and “neonatal mortality” for the “nifedipine” and “control” columns in Salama and the “labetalol” and “control” columns in Rezk appear to have been transposed (yellow squares), so this is effectively four more identical values. For these eleven similarities, all of the individual probabilities of similarity are less than 0.15, eight are less than 0.10 and five are less than 0.05. Even allowing for dependency between these calculated probabilities of similarity due to correlation between baseline variables, all of the individual probabilities are low, with the majority (25 of 32) less than 0.10.
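The Appendix is not reproduced here, so the following is only one plausible reading of the normal approximation used for the sample means: if two independent trials sample the same population, the difference between their sample means is approximately normal with mean zero, and the probability that the two means fall within a given distance of each other follows from the error function. The function name and all numeric inputs are hypothetical.

```python
import math

def prob_means_within(delta, sd1, n1, sd2, n2):
    """P(|mean1 - mean2| <= delta) under a normal approximation.

    Assumes two independent samples drawn from populations with the
    same true mean; sd1/sd2 are sample standard deviations and n1/n2
    are group sizes. Illustrative helper, not the authors' code.
    """
    # Standard error of the difference between the two sample means.
    se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    # For D ~ N(0, se_diff^2): P(|D| <= delta) = erf(delta / (se_diff * sqrt(2))).
    return math.erf(delta / (se_diff * math.sqrt(2)))

# Hypothetical example: SD 5 and 100 participants per arm, means
# reported as agreeing to within 0.05.
p_close = prob_means_within(0.05, 5.0, 100, 5.0, 100)
```

With these made-up inputs the probability is a little under 0.06, the same order of magnitude as the individual probabilities quoted above.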
Rezk2015c and Rezk2015e (Figs. 3a-c) are two RCTs on iron therapy in pregnancy, which share two authors and both recruited in Menoufia in overlapping time periods. However, the latter was a multicentre study also recruiting in other centres in the Menoufia directorate. There appears to be data copying between the baseline tables (Fig. 3a), the haemoglobin parameters (Fig. 3b) and the side effects (Fig. 3c).
Trial of endometrial scratching reported in abstract only
One RCT of endometrial scratching in women with unexplained infertility prior to intrauterine insemination appears to have been reported only as an abstract (HamzaRCT2016). Dr M Rezk is the second author. The trial was registered on 12 September 2015, and participants were recruited from 20 September 2015 to 20 April 2016 (seven months). At least two other RCTs of endometrial scratching from Menoufia, albeit with different authors (13, 14), were published around the same time. We found no obvious signs of data copying with those papers. Shaheen (2016) targeted a different population of women, those with recurrent miscarriage, over a different time period, between September 2014 and August 2015. Abd-Elhamid Shaheen, the only author of the recurrent miscarriage paper, is listed as a collaborator on the registration of the Hamza et al. trial but not as an author of the abstract. The Helmy2017 RCT recruited similar participants, infertility of any sort, over an overlapping time period, the first participant on 26 January 2015 and the last participant delivered on 17 July 2016, in the same centre. That trial paper shared no authors with the Hamza abstract or the Shaheen paper and had been registered on clinicaltrials.gov as NCT02345837.
We found 10 unpublished trials registered on the Pan African Clinical Trials Registry (11) by Dr Rezk (Tables 1 and 2; Fig. 1). All these RCTs have been labelled as completed between 2014 and 2018, but we could not find results published or presented as abstracts.
We found 34 cohort studies (Table 4, Fig. 4). Eighteen cohort studies were reported to be “prospective”, while one study reported a five-year retrospective cohort followed immediately by a five-year prospective cohort (Rezk2015f). Twelve prospective recruitment periods began in 2012, albeit in different months of that year. In all these 12 studies, although recruitment periods varied, all were in whole years; that is, a study that started in January ended in January, a study that started in February ended in February, etc.
Insufficient time for follow-up between the end of recruitment and paper reception by journal
Eleven reportedly prospective cohorts had journal reception dates that were incompatible with the follow-up reported in the paper (Rezk2015a, Rezk2015b, Rezk2015c, Rezk2016a, Rezk2016d, Rezk2017c, Rezk2017f, Rezk2017g, Rezk2017h, Reda2018, Abdelhamid2019; Table 3). We considered the possibility that some recruitment periods included follow-up. However, since 10 of them reportedly recruited over exactly whole years, this would be incompatible with the vagaries of the recruitment during gestation, the duration of the pregnancy and the timing of delivery for the final participants. The 11th study (Rezk2015b) reportedly recruited 450 pregnant women between 36 and 40 weeks, over a 30-day period, followed them up till delivery, and had the publication received by the journal 21 days later. In total, 24 of 34 cohort studies reportedly recruited over exactly whole years.
Implausible recruitment rates among cohort studies
Rezk2015f reported 450 pregnant women with SLE during a ten-year period, about 4 recruits per month. For the prospective group, they claimed to have recruited 214 women with SLE out of 13,567 pregnant women, an incidence of 1.7%, compared with the usual reported rate of 1/1000 pregnancies (15). The mixed retrospective and prospective cohort design was similar to an earlier paper from Cairo (16), whose authors had identified 27 pregnant women with SLE in a retrospective cohort and 21 prospectively, albeit without reporting the time period. Rezk2015f did not cite Hendawy et al. 2011, and the two papers shared no authors, but the opening sentences of the discussion sections were uncannily close. Rezk2015f “SLE is mainly a disease of women in the childbearing period, and the coexistence of pregnancy is not a rare event. Disease flare during pregnancy leads to poor outcome (5)”. Hendawy et al. 2011 “Systemic Lupus Erythematosus (SLE) is mainly a disease of women in the childbearing period, and the coexistence of pregnancy is not a rare event. Disease flare during pregnancy consistently affects pregnancy outcome (7).” References 5 (Rezk2016a) and 7 (Hendawy et al. 2011, our reference 16) are the same. Finally, many of the odds ratios in Rezk2015f appear to have been miscalculated.
Rezk2017h reports a monthly recruitment of 159 pregnant women who had platelet assays at both 10–12 weeks and again at 18–20 weeks, maintained over five years in a single centre, Menoufia. Only 22 women out of 9,544 were not followed up till delivery.
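The monthly recruitment figures quoted in this section follow directly from the reported totals and study durations. A minimal sketch of that arithmetic, using only numbers stated in the text (the helper name `monthly_rate` is ours):

```python
def monthly_rate(participants, months):
    """Average number of participants recruited per month."""
    return participants / months

# Rezk2015f: 450 women with SLE over a ten-year period.
rate_rezk2015f = monthly_rate(450, 10 * 12)

# Rezk2017h: 9,544 women over five years in a single centre.
rate_rezk2017h = monthly_rate(9544, 5 * 12)

print(f"Rezk2015f: {rate_rezk2015f:.2f} per month")
print(f"Rezk2017h: {rate_rezk2017h:.1f} per month")
```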
Evidence of data copying between cohorts
Rezk2016a and Rezk2016b (Table 3; Fig. 5) reported different cohorts of pregnant women with different types of hypertension. Although the recruitment periods and the entry criteria differed, the baseline tables showed evidence of data copying between the cohorts (Fig. 5). Of the five variables reported in both tables (age, parity, BMI, gestational age at diagnosis and delivery), 7/10 sample means were identical in the “tens” column and the first and second decimal places, differing only by the same amount in the “ones” column, and 8/10 sample standard deviations were identical for all digits (“ones” column, first and second decimal places). The chance of this degree of identity, similarity or “exact difference” in one place of a three- or four-digit number in two separate cohorts is not easy to calculate due to the large number of possible “exact differences” but would surely be low based on any sensible statistical model given the precision with which these summary statistics have been reported. A more likely explanation for the common values in the two tables is data fabrication in one or both studies.
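The digit pattern described above (same “tens” digit and same two decimal places, with only the “ones” digit changed) can be checked mechanically. The sketch below is illustrative only: the function names are ours, and the example values are invented rather than taken from Fig. 5.

```python
def split_digits(value):
    """Split a value reported to two decimals into (tens, ones, decimals).

    '27.34' -> ('2', '7', '34'); values below 10 get a leading zero.
    """
    whole, decimals = f"{float(value):.2f}".split(".")
    whole = whole.zfill(2)
    return whole[:-1], whole[-1], decimals

def differs_only_in_ones(a, b):
    """True if a and b match in every digit except the 'ones' column."""
    tens_a, ones_a, dec_a = split_digits(a)
    tens_b, ones_b, dec_b = split_digits(b)
    return tens_a == tens_b and dec_a == dec_b and ones_a != ones_b

def ones_offset(a, b):
    """Signed difference between the 'ones' digits of a and b."""
    return int(split_digits(a)[1]) - int(split_digits(b)[1])

# Invented pairs of sample means, NOT the values in the two papers.
pairs = [("27.34", "25.34"), ("31.18", "33.18"), ("24.50", "24.50")]
flags = [differs_only_in_ones(a, b) for a, b in pairs]
```

Applying such a check to each pair of corresponding cells, and then comparing the `ones_offset` values across flagged pairs, would show whether the same shift recurs, which is the pattern reported here.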
Similar participants included in different cohort studies
Rezk2017c and Rezk2015a (Table 3 and Fig. 6) each reported a comparable number of patients (192 v 224) with rheumatic heart disease, recruited in the same clinic over overlapping periods of 3 and 5 years. There is no description as to how the two studies relate to each other. Given the essentially identical inclusion and exclusion criteria, it is difficult to explain the major differences in mean BMI (21 ± 1.2 and 21 ± 1.3 in one study compared with 24.1 ± 2.7 and 28.3 ± 3.6 in each group, Fig. 6) and in neonatal mortality rates (6/192 vs 32/204, Table 3).
Similar data and overlapping recruitment periods between RCTs and cohort studies
Four RCTs (RezkRCT2015b, KandilRCT2018, RezkRCT2018b, RezkRCT2019a) and two cohort studies (Kamal2017 and Rezk2016b) all recruited women with clomiphene-resistant PCOS, from the same single centre, during overlapping time periods (Figs. 7 and 8).
The recruitment of 109 participants over 3.5 months (59 per month) to Rezk2018b is difficult to credit given that participants had each failed to respond to clomiphene for at least three cycles (some as many as six) and that a normal uterine cavity assessment, a normal tubal patency test and the partner’s normal semen analysis were additional entry criteria.
The recruitment of 250 participants in Kandil2018 is even less plausible. The study ran for only eight months up to 18 October 2017, “which was the last day of follow up for the last included participant” and included six months follow-up, leaving only two months for recruitment. The paper was submitted on 21 November, 33 days after the last follow-up was completed. Not only was this a surgical trial, but the two treatment arms were laparoscopic or vaginal ultrasound-guided ovarian drilling. The former was done under general anaesthesia and the latter “without anaesthesia with only administration of ketorolac 50 mg by intramuscular injection 30 minutes before the procedure”. Both procedures were reportedly done immediately after menstruation.
The recruitment periods for KandilRCT2018, RezkRCT2018b, and RezkRCT2019a overlapped. From February to May 2017, all three RCTs were reportedly recruiting participants with identical characteristics in the same hospital. None describe how participants were allocated between the different trials.
The baseline FSH and LH values in RezkRCT2015b (UOD v BOD) and RezkRCT2018b (C + M v letrozole) are identical (Fig. 7). The basal FSH and LH levels in Table 1 of Rezk2019a contain one identical value, one value with transposed digits and five values with a single digit different from the shared values in Rezk 2015b and Rezk 2018b (Fig. 8).
Rezk2017d, Rezk2017i, RezkRCT2019b, RezkRCT2019d, Rezk2018a, and RezkRCT16_U3 all measured uterine Doppler parameters in women undergoing different types of contraception. They all recruited in Menoufia hospital over overlapping time periods. Three were RCTs and two cohort studies. Despite studying identical treatments (LNG-IUS) in identical patient groups, Rezk 2017 and Rezk 2019 observed major differences in baseline and 6-month Doppler indices on which the authors did not comment. Despite two shared authors and overlapping recruitment periods in the same centre, neither paper cited the other. The two observational studies (Rezk2017d, Rezk2017i) both had submission dates that were incompatible with the reported recruitment periods and follow-up duration.
Our findings are summarised in Table 5. For all tables with categorical variables there was an excess of even numbers. For 35 tables, the probability of the observed excess or greater for the single table was P < 0.05. For one individual table in which all 47 categorical variables were even, the probability was P = 0.000000000000007. Overall, with 925 even numbers and 92 odd numbers, the probability that this has happened by chance is p = 2.12e-150.
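For a single table, the probability of an excess of even numbers can be sketched as an upper binomial tail, assuming that each reported count is independently even or odd with probability 0.5. That assumption and the function name are ours; the sketch illustrates the single-table case with 47 even values and does not attempt to reproduce the authors' overall figure, whose method is not described in this section.

```python
from math import comb

def prob_at_least_even(n_even, n_total, p_even=0.5):
    """P(at least n_even even values among n_total) under a binomial model.

    Assumes each value is independently even with probability p_even.
    Illustrative helper, not the authors' code.
    """
    return sum(
        comb(n_total, k) * p_even ** k * (1 - p_even) ** (n_total - k)
        for k in range(n_even, n_total + 1)
    )

# One table in which all 47 categorical values were even.
p_all_even = prob_at_least_even(47, 47)
print(f"{p_all_even:.1e}")
```

Under these assumptions the result is 0.5 to the power 47, about 7 in 10^15, in line with the single-table probability quoted above.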
Overall, we found 35 problematic papers. These included 14 prospective cohort studies with submission dates that were incompatible with the stated follow-up period, one paper reporting without any comment a rate of SLE in pregnancy about 17 times higher than previous reports, and another cohort study with an implausible recruitment rate. We also found three pairs of RCTs with what appears to be data copying and ten studies with overlapping recruitment of the same participants in the same centre without explanation of how participants were allocated between studies. These are all summarised in a table of problematic studies. Note that this table does not include the 11 studies with exactly balanced randomisation. Finally, the probability of observing the excess of even-numbered categorical variables in Dr Rezk’s papers overall is infinitesimal.