Our search strategy yielded a total of 20,431 titles after removal of duplicates. Twelve studies [11, 22–32] incorporating 28 datasets on self-testing (27,506 samples) and 31 studies [12, 13, 33–61] incorporating 37 datasets on self-sampling (31,792 samples) were eligible for inclusion in the review (Fig. 1). One study was analyzed as self-sampling because it was unclear whether self-testing was performed [55].
Methodological quality of all included studies
Overall, the included studies were assessed as having high applicability and a variable risk of bias (Fig. 2A).
Low risk of bias was observed in 41 of 65 datasets (63.1%) for the timing of the index test, the inclusion of participants, and the use of the same reference standard throughout the study. However, in only 40.0% of the studies were the results of the reference standard (PCR) interpreted without knowledge of the index test results; this was unclear for the remaining 60.0%. For 67.7% of the studies, the conduct and interpretation of the index test was of low concern because the Ag-RDT results were interpreted without knowledge of the reference standard results. Only 33.8% of the studies had a representative study population, avoiding inappropriate exclusions or a case-control design, and were therefore at low risk of bias. For the remaining studies, the risk of bias for patient selection was unclear (16.9%), high (6.2%), or intermediate (43.1%). Applicability was deemed to be of low concern in 86.2% of the studies across all domains, since the methods (i.e., patient selection, index test conduct, reference standard choice) of the respective studies matched our research question (Fig. 2B; further details in Supplementary Fig. 1). A potential conflict of interest due to financial support from, or employment by, the test manufacturer was present in 17 studies (34.7%) [26, 28, 32, 37, 41, 46, 47, 49, 50, 52, 53, 55, 57, 58]. In studies focusing on self-sampling, 30 of 36 datasets reported conducting the test in conformity with the IFU, even though sampling was explicitly observed in only 22 datasets (61.1%). For studies evaluating self-testing, 26 datasets stated IFU conformity, while for the remaining two datasets it was unclear.
The Deeks test for all datasets with complete results (p = 0.31) provided no evidence of funnel plot asymmetry, suggesting that publication bias is unlikely (Supplement, S2 Figure Funnel Plot).
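Deeks' test for diagnostic accuracy reviews regresses the log diagnostic odds ratio (DOR) of each dataset on 1/√ESS, the inverse square root of the effective sample size, weighting each dataset by its ESS; a slope clearly different from zero suggests funnel plot asymmetry. The sketch below illustrates that regression in plain Python. It is a minimal illustration, not the code used for this review: the function name `deeks_regression`, the 0.5 continuity correction, and the toy 2×2 counts are assumptions. The p-value would come from a t distribution with k − 2 degrees of freedom (for example via SciPy's `scipy.stats.t.sf`), which is left out to keep the example dependency-free.

```python
import math

def deeks_regression(datasets):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS), weighted by ESS.

    Each dataset is a (tp, fp, fn, tn) tuple against the reference standard.
    Returns (slope, standard error of the slope, t statistic).
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in datasets:
        if 0 in (tp, fp, fn, tn):
            # assumed continuity correction: add 0.5 to every cell of a table with a zero
            tp, fp, fn, tn = [c + 0.5 for c in (tp, fp, fn, tn)]
        n_pos = tp + fn                              # reference-standard positives
        n_neg = fp + tn                              # reference-standard negatives
        ess = 4 * n_pos * n_neg / (n_pos + n_neg)    # effective sample size
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log((tp * tn) / (fp * fn)))   # log diagnostic odds ratio
        ws.append(ess)
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # weighted residual variance with k - 2 degrees of freedom
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se = math.sqrt(rss / (len(xs) - 2) / sxx)
    return slope, se, slope / se

# Hypothetical datasets in which smaller studies report larger DORs
toy = [(80, 10, 20, 890), (45, 3, 5, 147), (19, 1, 1, 79), (10, 0, 0, 40)]
slope, se, t = deeks_regression(toy)
print(f"slope = {slope:.2f}, SE = {se:.2f}, t = {t:.2f} on {len(toy) - 2} df")
```

In this toy input the small datasets have inflated DORs, so the slope comes out positive. A non-significant slope, as reported for this review, is consistent with a symmetric funnel.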
Study description
Most of the studies included in the review were conducted in high-income countries (HIC): the USA (n = 10), Germany (n = 7), the Netherlands (n = 6), the UK, Canada, and France (n = 2 each), as well as Greece, Denmark, Japan, Belgium, Austria, and Hong Kong (n = 1 each). In contrast, eight studies were conducted in middle-income countries (MIC): India (n = 3), and Brazil, Morocco, Malaysia, South Korea, and China (n = 1 each) [62]. No studies were performed in low-income countries. Regarding the participants' level of education, in two studies reporting on self-testing, the majority of participants (59.6% and 98.1%) had at least a high school degree [11, 22]. Of the 17 studies reporting on self-sampling, one stated that 52.5% of participants had a higher education degree [43]. Another study included only high school students (78.6%) and teachers (21.4%) [36], while two other studies included only college students [33, 54]. The remaining studies provided no information on the participants' educational backgrounds. Participants had prior medical training (i.e., were health care workers) in three self-sampling datasets (2,506 samples, 9.1%) [12, 43]. Participants were lay people without any medical training in six datasets totaling 5,023 samples; for the other datasets, this remained unclear. Information on the participants' professional backgrounds and prior testing experience was reported in only one self-testing study [10]. Of the 144 participants in this study, 12 (8.3%) had prior medical training, 66 (45.8%) had undergone SARS-CoV-2 testing in the past, and four (2.8%) had performed at-home COVID-19 testing.
Most of the self-sampling data (32 datasets; 88.9%) were collected at testing or clinical sites; for the others, no information was available. The sampling process was observed in 17 of the self-sampling studies (22 datasets; 19,280 samples, 60.6%) [12, 13, 33, 36, 38, 39, 41, 42, 45, 49, 52, 56–60], whereas sampling was not observed in four studies (4 datasets; 10.8%) [37, 43, 50, 54]. For the remaining ten studies (10 datasets; 27.0%), it was unclear whether sampling was observed [34, 35, 40, 44, 46–48, 51, 53, 61]. Overall, 78.6% of the self-testing studies were carried out at a testing site, and in three studies (1,083 samples; 2.9%) the study team observed the testing procedure without providing instructions [11, 28, 32].
A total of 27,506 samples were evaluated in the self-testing studies. Of these, 13,166 came from individuals presenting with symptoms suggestive of SARS-CoV-2 infection and 10,103 from persons without symptoms at the time of testing; for the remainder, the authors did not specify the participants' symptom status. A total of 31,069 individuals participated in the self-sampling studies, of whom 6,325 had symptoms, 20,569 were asymptomatic, and 4,175 had unclear symptom status.
The most frequently used Ag-RDTs across all studies were the BinaxNow nasal test by Abbott (USA; henceforth called BinaxNow) and the Standard Q nasal test by SD Biosensor (South Korea; distributed in Europe by Roche, Germany; henceforth called Standard Q nasal), with six datasets each. The BD Veritor lateral flow test for rapid detection of SARS-CoV-2 (Becton, Dickinson and Company, MD, USA; henceforth called BD Veritor), the CLINITEST Rapid COVID-19 Antigen Test (Siemens Healthineers, Germany; henceforth called CLINITEST), and the Rapid SARS-CoV-2 Antigen Test (MP Biomedicals, CA, USA; henceforth called MP Bio) were used in three datasets each.
Most self-samples for antigen testing were taken from the anterior nares (AN; 28 datasets, 77.8%). The remaining datasets used combined oropharyngeal/anterior nasal (OP/AN; 2 datasets, 5.6%), saliva (2 datasets, 5.6%), combined AN/saliva (1 dataset), or OP (3 datasets, 8.3%) samples. Similarly, most self-testing datasets used AN samples (20 datasets, 71.4%), whereas OP/AN and saliva samples accounted for 4 datasets (14.3%) each. The following samples were used for RT-PCR testing: AN (13 datasets, 20.0%), nasopharyngeal (NP; 21 datasets, 32.3%), NP/OP (13 datasets, 20.0%), OP (9 datasets, 13.8%), OP/AN (5 datasets, 7.7%), or saliva (3 datasets, 4.6%).
The RT-PCR and Ag-RDT analyses were conducted on the same sample type across 20 self-sampling datasets [31, 33–38, 41, 45, 46, 50, 54, 58–61]. Self-collected samples were used for RT-PCR in 14 of those datasets [33, 36–38, 41, 45, 46, 54, 59, 60]. In all self-testing studies, RT-PCR samples were collected by a professional.
Two self-testing studies and one self-sampling study provided additional instructional videos [22, 29, 35]. Four self-testing studies provided study-specific test instructions, since no manufacturer instructions for self-testing were available at the time [11, 22, 25, 29].
Tables 1a and 1b provide further information on each of the studies included in the review.
Table 1a. Clinical accuracy data for self-sampled Ag-RDTs.
Study | Test assessed | Country | Type of location | Study population | Screening criteria | Sample type | Sensitivity (95% CI) | Specificity (95% CI)
Harris, 2021 [12] | Sofia | USA | testing site | adults | sympt., HRC | AN | 82.3% (77.5# to 86.4#) | 98.8%# (97.5# to 99.5#)
Harris, 2021 [12] | Sofia | USA | testing site | adults | asympt. | AN | 31.6% (0.0# to 24.7#) | 100% (99.8# to 100)
Lindner, 2021 [13] | Standard Q | Germany | testing site | adults | sympt. | AN* | 74.4% (57.9# to 87.0#) | 99.2% (97.1 to 99.9#)
Tinker, 2021 [33] | BinaxNow | USA | testing site | adults | asympt. | AN* | 20.0% (9.1# to 35.6#) | 100% (99.8# to 100#)
Tanimoto, 2021 [34] | Lumipulse | Japan | unclear | unclear | unclear | saliva | 61.8% (47.7# to 74.6#) | 100% (94.1 to 100)
Mak, 2022 [35] | Standard Q | Hong Kong | testing site | unclear | HRC | OP/AN* | 100% (15.8# to 100) | 100% (90.7# to 100)
Blanchard, 2021 [36] | Panbio nasal | Canada | testing site | adults, children | sympt. | AN* | 78.6% (49.2# to 95.3#) | 100% (98.7# to 100)
Harmon, 2021 [37] | E25Bio | USA | testing site | adults | sympt., asympt. | AN* | 92.3% (64.0# to 99.8#) | 99.6% (97.7# to 100)
Ford, 2021 [38] | BinaxNow | USA | testing site | children | sympt., HRC, asympt. | AN* | 71.4% (53.7 to 85.4) | 100% (98.0 to 100)
Ford, 2021 [38] | BinaxNow | USA | testing site | adults | sympt., HRC, asympt. | AN* | 80.9% (75.9 to 85.3) | 99.9% (99.5 to 100)
Klein, 2021 [39] | Panbio nasal | Germany | testing site | adults | sympt., HRC | AN | 86.4% (72.6# to 94.8#) | 99.2% (97.0 to 99.9#)
Nikolai, 2021 [43] | Standard Q | Germany | clinical | adults | sympt. | AN | 91.2% (76.3# to 98.1#) | 98.4% (91.3# to 100#)
Okoye, 2021 [54] | BinaxNow | USA | testing site | adults | asympt. | AN* | 53.3% (37.9# to 68.3#) | 100% (99.9 to 100)
Krüger, 2021 [56] | LumiraDx | Germany | testing site | adults | sympt., HRC | AN | 82.2% (75.0# to 88.0#) | 99.3% (98.3 to 99.7)
Osmanodja, 2021 [57] | Dräger | Germany | both | adults | sympt., asympt. | AN | 88.6% (78.7 to 94.9) | 99.7% (98.2 to 100)
Chiu, 2021 [58] | Indicaid | USA | clinical | adults, children | sympt. | AN | 82.7% (72.2# to 90.4#) | 96.4% (93.4 to 98.2#)
García-Fiñana, 2021 [59] | Innova | UK | testing site | adults | asympt. | OP/AN | 40.0% (28.5 to 52.4) | 99.9% (99.8 to 99.9)
Shah, 2021 [60] | BinaxNow | USA | testing site | adults, children | sympt., HRC, asympt. | AN | 81.4% (76.8 to 85.5) | 99.6% (99.2 to 99.8)
Frediani, 2021 [61] | BinaxNow | USA | unclear | adults, children | unclear | AN | 56.2%# (29.9# to 80.2#) | 100% (87.7# to 100)
Ahmed, 2022 [40] | ProDetect | Malaysia | unclear | adults, children | sympt., HRC | AN | 96.1%# (86.5# to 99.5#) | 98.0% (89.1# to 99.9#)
Cardoso, 2022 [41] | Wondfo | Brazil | testing site | unclear | sympt. | AN* | 73.0% (64.7# to 80.2#) | 98.6% (95.2 to 99.8#)
Chen, 2022 [42] | Labnovation | China | clinical | adults | unclear | AN | 70.4%# (49.8# to 86.2#) | 100%# (29.2# to 100#)
Chen, 2022 [42] | Labnovation | China | clinical | adults | unclear | AN | 81.4%# (66.6# to 91.6#) | 64.0%# (42.5# to 82.0#)
Gagnaire, 2022 [44] | Biospeedia | France | testing site | adults, children | sympt., HRC, asympt. | AN/saliva | 59.4% (51.5 to 67.0) | 99.8% (99.7# to 99.9)
Goodall, 2022 [45] | Panbio | Canada | testing site | unclear | asympt. | AN* | 64.5% (51.3# to 76.3#) | 100% (99.5# to 100#)
Goodall, 2022 [45] | Panbio | Canada | testing site | unclear | asympt. | TN* | 64.5% (51.3# to 76.3#) | 100% (99.5# to 100#)
Goodall, 2022 [45] | Panbio | Canada | testing site | unclear | asympt. | AN* | 68.4% (51.3# to 82.5#) | 100% (99.2# to 100#)
Goodall, 2022 [45] | Panbio | Canada | testing site | unclear | asympt. | TN* | 81.6% (65.7# to 92.3#) | 100% (99.2# to 100#)
Igloi, 2021 [46] | Standard Q | Netherlands | testing site | adults | sympt., HRC | saliva* | 66.1% (52.9 to 77.6) | 99.6% (98.8 to 99.9)
Mane, 2022 [47] | Coviself | India | testing site | adults | sympt., HRC | OP | 54.2%# (39.2# to 68.6#) | 96.9%# (92.9# to 99.0#)
Rangaiah, 2022 [48] | Coviself | India | unclear | unclear | unclear | AN | 61.5% (50.7 to 71.5) | 100% (97.4 to 100)
Robinson, 2022 [49] | BD Veritor nasal | USA | testing site | unclear | sympt., HRC | AN | - | -
Savage, 2022 [50] | Covios | UK | testing site | adults | sympt. | AN | 90.5% (83.9 to 97.2) | 99.4% (98.3 to 100)
Shin, 2022 [51] | Standard Q | South Korea | clinical | unclear | sympt., asympt. | AN | 94.9% (87.5 to 98.6) | 100% (98.3 to 100)
Sukumaran, 2022 [52] | AG-Q | India | clinical | unclear | unclear | AN | 77.9% (67.7 to 86.1) | 100% (94.4 to 100)
Tsao, 2022 [55] | BinaxNow | USA | testing site | adults | sympt., asympt. | AN | 63.0% (50.9# to 74.0#) | 99.8% (99.1# to 100)
Wölfl-Duchek, 2022 [53] | Medomics | Austria | clinical | adults | sympt., asympt. | AN | 63.0% (47.5 to 76.8) | 100% (91.0# to 100)
Abbreviations: sympt. = symptomatic; asympt. = asymptomatic without known contact; HRC = high-risk contact; AN = anterior nasal; OP = oropharyngeal; TN = throat; * RT-PCR sample was self-sampled; # values have been recalculated due to missing or contradictory data.
Table 1b. Clinical accuracy data for self-testing Ag-RDTs.
Study | Test assessed | Country | Type of location | Study population | Screening criteria | Sample type | Sensitivity (95% CI) | Specificity (95% CI)
Lindner, 2021 [11] | Standard Q | Germany | clinical | adults | sympt. | AN | 82.5% (67.2# to 92.7#) | 100% (96.5 to 100)
Stohr, 2022 [22] | BD Veritor | Netherlands | testing site | adults | sympt., asympt. | AN | 48.9% (41.3# to 56.5#) | 99.9% (99.5 to 100)
Stohr, 2022 [22] | Standard Q | Netherlands | testing site | adults | sympt., asympt. | AN | 61.5% (54.2# to 68.4#) | 99.7% (99.3 to 99.9)
De Meyer, 2022 [25] | V-Chek | Belgium | testing site | adults, children | unclear | saliva | 7.7% (0.2# to 36.0#) | 100% (90.5# to 100#)
De Meyer, 2022 [25] | Whistling | Belgium | testing site | adults, children | unclear | saliva | 9.1% (3.0# to 20.0#) | 100% (92.5# to 100#)
Diawara, 2022 [26] | PCL | Morocco | unclear | adults, children | unclear | saliva | 90.1% (80.7 to 95.9) | 99.6% (97.9 to 99.9)
Diawara, 2022 [26] | PCL | Morocco | unclear | adults, children | unclear | AN | 91.4%# (82.3# to 96.8#) | 100% (98.5 to 100)
Iftner, 2022 [27] | Anbio | Germany | testing site | adults | asympt. | AN | - | 99.8%# (98.8# to 100#)
Iftner, 2022 [27] | Clungene | Germany | testing site | adults | asympt. | AN | - | 97.9%# (96.2# to 99.0#)
Iftner, 2022 [27] | Hotgen | Germany | testing site | adults | asympt. | AN | - | 99.8%# (98.8# to 100#)
Iftner, 2022 [27] | Mexacare | Germany | testing site | adults | asympt. | AN | - | 99.8%# (98.8# to 100#)
Leventopoulos, 2022 [28] | Boson | Greece | testing site | adults, children | sympt., asympt. | AN | 98.2% (96.7 to 99.6) | 100% (99.9 to 100)
Møller, 2022 [29] | DNA Diagnostics | Denmark | testing site | adults | sympt., HRC, asympt. | AN | 65.7% (49.2 to 79.2) | 100% (99.0 to 100)
Møller, 2022 [29] | Hangzhou | Denmark | testing site | adults | sympt., HRC, asympt. | AN | 62.1% (50.1 to 72.9) | 100% (98.9 to 100)
Schuit, 2022 [31] | Flowflex | Netherlands | testing site | adults | sympt., HRC, asympt. | AN | 79.0% (74.7 to 82.8) | 97.2% (93.9 to 98.9)
Schuit, 2022 [31] | MP Bio | Netherlands | testing site | adults | sympt., HRC, asympt. | AN | 69.9% (65.1 to 74.4) | 98.8% (97.3 to 99.6)
Schuit, 2022 [31] | CLINITEST | Netherlands | testing site | adults | sympt., HRC, asympt. | AN | 70.2% (65.6 to 74.5) | 99.3% (97.6 to 99.9)
Schuit, 2022 [31] | MP Bio | Netherlands | testing site | adults | sympt., HRC, asympt. | OP/AN | 83.0% (78.8 to 86.7) | 97.8% (94.3 to 99.4)
Schuit, 2022 [31] | CLINITEST | Netherlands | testing site | adults | sympt., HRC, asympt. | OP/AN | 77.3% (82.9 to 81.2) | 97.0% (93.9 to 98.8)
Schuit, 2022 [30] | SD Biosensor | Netherlands | testing site | adults | sympt., HRC, asympt. | NP/OP | 68.9% (61.6 to 75.6) | 99.5% (99.2 to 99.8)
Schuit, 2022 [30] | Hangzhou | Netherlands | testing site | adults | sympt., HRC, asympt. | NP/OP | 46.7% (39.3 to 54.2) | 99.0% (98.5 to 99.4)
Tonen-Wolyec, 2022 [32] | Biosynex | France | testing site | adults | sympt., HRC, asympt. | AN | 90.9% (70.8# to 98.9#) | 100% (95.7# to 100)
Venekamp, 2023 [23] | Flowflex | Netherlands | testing site | adults | sympt., HRC, asympt. | AN | 27.5% (21.3 to 34.3) | 99.8% (99.3 to 100)
Venekamp, 2023 [23] | MP Bio | Netherlands | testing site | adults | sympt., HRC, asympt. | AN | 20.9% (13.9 to 29.4) | 99.8% (99.2 to 100)
Venekamp, 2023 [23] | CLINITEST | Netherlands | testing site | adults | sympt., HRC, asympt. | AN | 25.6% (19.1 to 33.1) | 99.9% (99.5 to 100)
Zwart, 2022 [24] | BD Veritor | Netherlands | clinical | adults | sympt., asympt. | OP/AN | 61.5% (56.6 to 66.3) | 100% (99.8 to 100)
Zwart, 2022 [24] | BD Veritor | Netherlands | clinical | adults | sympt., asympt. | AN | 50.3% (43.0# to 57.6#) | 99.7% (99.3 to 99.8)
Zwart, 2022 [24] | Roche | Netherlands | clinical | adults | sympt., asympt. | OP/AN | 74.3%# (66.6# to 81.1#) | 99.7% (99.4# to 99.9)
Abbreviations: sympt. = symptomatic; asympt. = asymptomatic without known contact; HRC = high-risk contact; AN = anterior nasal; NP = nasopharyngeal; OP = oropharyngeal; TN = throat; * RT-PCR sample was self-sampled; # values have been recalculated due to missing or contradictory data.
Concordance with professional-use Ag-RDTs
Concordance between self-testing and professional testing was reported in only one study, which found high concordance (kappa 0.94) [11]. Concordance between self-sampling and professional testing was reported in six studies, with kappa values ranging from 0.86 to 0.93 [13, 39, 42, 43, 58]. The pooled Cohen's kappa for self-sampling studies was 0.91 (95% CI 0.88 to 0.94) (Fig. 3).
We also performed an exploratory analysis of concordance combining datasets from self-sampling and self-testing studies, assuming that sampling is a major driver of differences between self-testing and professional testing. We observed a pooled Cohen’s kappa of 0.92 (95% CI 0.89 to 0.95) (Supplemental Fig. 3).
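Cohen's kappa compares the observed agreement between two tests with the agreement expected by chance from their marginal positivity rates. The sketch below computes kappa from a 2×2 agreement table of self-test versus professional-test results, together with a simple large-sample standard error and a fixed-effect, inverse-variance pooled estimate. It is an illustration under stated assumptions: the function names, the approximate standard error, the fixed-effect pooling, and the toy counts are ours and are not necessarily the model behind the pooled kappas reported above.

```python
import math

def cohens_kappa(both_pos, self_only_pos, prof_only_pos, both_neg):
    """Return (kappa, approximate SE) for two binary tests run on the same people.

    Cells: both positive, positive on the self-test only,
    positive on the professional test only, both negative.
    """
    n = both_pos + self_only_pos + prof_only_pos + both_neg
    p_obs = (both_pos + both_neg) / n                      # observed agreement
    p_self = (both_pos + self_only_pos) / n                # self-test positivity
    p_prof = (both_pos + prof_only_pos) / n                # professional-test positivity
    p_exp = p_self * p_prof + (1 - p_self) * (1 - p_prof)  # chance agreement
    kappa = (p_obs - p_exp) / (1 - p_exp)
    # simple large-sample approximation of the standard error
    se = math.sqrt(p_obs * (1 - p_obs) / (n * (1 - p_exp) ** 2))
    return kappa, se

def pool_fixed_effect(results):
    """Inverse-variance weighted mean of (estimate, SE) pairs with a 95% CI."""
    weights = [1 / se ** 2 for _, se in results]
    pooled = sum(w * k for w, (k, _) in zip(weights, results)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    return pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled

# Hypothetical agreement tables from two studies
study_a = cohens_kappa(40, 2, 3, 155)
study_b = cohens_kappa(25, 1, 2, 72)
print(round(study_a[0], 3))  # 0.925
print(pool_fixed_effect([study_a, study_b]))
```

A random-effects model would widen the pooled interval when studies disagree; the fixed-effect version is used here only because it keeps the arithmetic short.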
Performance of self-testing and self-sampling in comparison to RT-PCR
Compared with the reference standard, the sensitivity of self-testing with Ag-RDTs ranged widely, from 7.7% [25] to 98.2% [28]. Specificity was high, at 97.0% or above in all datasets (Table 1b).
Across 36 datasets from 31 self-sampling studies, sensitivity again ranged widely, from 20.0% [33] to 100% [35], with wide CIs. Specificity in self-sampling studies ranged from 96.4% [58] to 100% [12], with narrow CIs. A sensitivity of ≥ 80% was achieved in 15 self-sampling studies [12, 35, 37–40, 42, 43, 45, 50, 51, 56–58, 60] and five self-testing studies [11, 26, 28, 31, 32].
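The wide confidence intervals for sensitivity reflect the small number of RT-PCR-positive participants in many datasets. Exact (Clopper–Pearson) binomial intervals behave this way, and the sketch below computes them by bisection on the binomial distribution function using only the Python standard library. This section does not state which interval method each study, or the recalculated values marked # in Table 1, used, so treat the method as an assumption; the function names are illustrative. For instance, 2 of 2 positives gives a lower bound of about 15.8%, the same interval shown for Mak 2022 [35] in Table 1a.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log(1 - p))
        total += math.exp(log_pmf)
    return total

def _bisect_decreasing(f, target, iterations=100):
    """Find p in [0, 1] with f(p) = target, for f decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies at a larger p
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for x successes in n trials."""
    # lower bound solves P(X >= x) = alpha/2, i.e. P(X <= x - 1) = 1 - alpha/2
    lower = 0.0 if x == 0 else _bisect_decreasing(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # upper bound solves P(X <= x) = alpha/2
    upper = 1.0 if x == n else _bisect_decreasing(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper

lo, hi = clopper_pearson(2, 2)
print(f"{100 * lo:.1f}% to {100 * hi:.1f}%")  # 15.8% to 100.0%
```

With only two RT-PCR positives, a perfect observed sensitivity is still compatible with a true sensitivity as low as about 16%, which is why such point estimates carry little weight on their own.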
A total of 54 datasets assessing 55,115 self-tested or self-sampled samples were eligible for meta-analysis. The meta-analysed summary estimates of sensitivity and specificity across both self-sampling and self-testing datasets were 70.5% (95% CI 64.3 to 76.0) and 99.4% (95% CI 99.1 to 99.6), respectively. The pooled sensitivities for self-tested (23 datasets) and self-sampled (31 datasets) samples were 66.1% (95% CI 53.5 to 76.7) and 73.5% (95% CI 67.4 to 78.7), respectively.
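Paired summary estimates of sensitivity and specificity like these are typically obtained with a bivariate random-effects model, which pools logit-transformed sensitivity and specificity jointly and allows them to be correlated across datasets. The block below states the common binomial–normal form of that model for reference. This section does not specify the exact model variant or software used for this review, so treat the formulation as an assumption. Here n₁ᵢ and n₀ᵢ are the numbers of RT-PCR-positive and RT-PCR-negative samples in dataset i, and TPᵢ and TNᵢ are the true positives and true negatives.

```latex
\begin{aligned}
\mathrm{TP}_i &\sim \operatorname{Bin}\!\left(n_{1i},\, \mathrm{Se}_i\right), \qquad
\mathrm{TN}_i \sim \operatorname{Bin}\!\left(n_{0i},\, \mathrm{Sp}_i\right), \\
\begin{pmatrix} \operatorname{logit}\mathrm{Se}_i \\ \operatorname{logit}\mathrm{Sp}_i \end{pmatrix}
&\sim \mathcal{N}\!\left(
\begin{pmatrix} \mu_{\mathrm{Se}} \\ \mu_{\mathrm{Sp}} \end{pmatrix},
\begin{pmatrix} \tau_{\mathrm{Se}}^2 & \rho\,\tau_{\mathrm{Se}}\tau_{\mathrm{Sp}} \\
\rho\,\tau_{\mathrm{Se}}\tau_{\mathrm{Sp}} & \tau_{\mathrm{Sp}}^2 \end{pmatrix}
\right), \\
\widehat{\mathrm{Se}} &= \operatorname{logit}^{-1}\!\left(\hat{\mu}_{\mathrm{Se}}\right), \qquad
\widehat{\mathrm{Sp}} = \operatorname{logit}^{-1}\!\left(\hat{\mu}_{\mathrm{Sp}}\right).
\end{aligned}
```

Because pooling takes place on the logit scale, the back-transformed confidence intervals are generally asymmetric around the summary estimate, as most intervals reported in this section are.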
When only AN samples (40 datasets, 74.1%) were considered, the pooled sensitivity increased marginally to 72.9% (95% CI 65.8 to 79.0). Test-specific summary estimates of sensitivity were possible for BinaxNow (6 datasets), Standard Q nasal (6 datasets), and Panbio (Abbott, Germany; henceforth called Panbio; 6 datasets), yielding sensitivities of 63.5% (95% CI 43.4 to 79.8), 79.8% (95% CI 66.0 to 88.9), and 67.7% (95% CI 60.8 to 73.8), respectively. Data were insufficient for a meta-analysis of other Ag-RDTs or sample types. Supplementary Table S1 provides the full ranges of clinical performance for each Ag-RDT.
IFU-Conformity
Across all self-sampling and self-testing datasets, the summary estimate of sensitivity for IFU-conforming studies was 71.3% (95% CI 64.5 to 77.3) (Fig. 4A), with marginal differences between self-testing and self-sampling studies (Supplement Figs. 4 and 5). In total, three datasets had unclear IFU conformity, with sensitivities ranging from 48.9% [22] to 78.6% [36].
In the one study in which participants were observed while self-testing, most deviations from the instructions occurred during sampling, with 41.8% of participants failing to rub the swab against the nasal walls [11]. Another common sampling error was too short a rotation time in the nose (24.1%) [11]. During the testing procedure, the steps most often deviated from were squeezing the tube while the swab was still inside (34.9%) and squeezing the tube while the swab was being removed (33.1%). These deviations, however, did not appear to affect test performance in this study, as sensitivity against RT-PCR (82.5%) was acceptable and concordance with professional testing was high (kappa 0.91).
Presence of Symptoms
Across all studies, the summary estimate of sensitivity was lower in the asymptomatic group than in the symptomatic group: 38.1% (95% CI 23.4 to 55.3) vs. 77.4% (95% CI 71.1 to 82.6) (Fig. 4B). Specificity was above 99.0% in both subgroups. Self-testing studies, which are included in the pooled analysis, reported sensitivities ranging from 51.0% [30] to 82.5% [11] in symptomatic persons.
Duration of Symptoms (DoS)
We were unable to perform a bivariate subgroup meta-analysis for a DoS of more than seven days (DoS > 7) because only one dataset was available. The reported sensitivity and specificity in this study were 53.8% and 100%, respectively [56]. The pooled sensitivity and specificity in studies reporting a DoS ≤ 7 days were 79.4% (95% CI 72.7 to 84.8) and 99.4% (95% CI 98.9 to 99.7), respectively.
Ct Values
For the subgroup analysis based on Ct value range, 22 datasets from nine self-sampling studies were available for univariate meta-analysis. For the Ct value groups < 25 and < 30, the pooled sensitivities were 93.6% (95% CI 90.4 to 96.8) and 76.6% (95% CI 57.6 to 95.6), respectively (Fig. 4C).
Self-sampled testing of patients whose samples had Ct values ≥ 25 and ≥ 30 yielded lower pooled sensitivities with wider confidence intervals: 35.9% (95% CI 9.8 to 62.0) and 10.2% (95% CI 0.0 to 28.1), respectively.
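Unlike most estimates in this section, the Ct-value intervals are symmetric around the point estimate (for example, 93.6 ± 3.2), and the lower bound for Ct ≥ 30 is truncated at 0.0. A univariate random-effects model on untransformed proportions typically produces exactly this pattern. The sketch below shows DerSimonian–Laird pooling of raw sensitivities under that assumption. This section does not state which univariate method the review used, and the continuity correction and the counts in the usage lines are hypothetical.

```python
import math

def dersimonian_laird(events, totals, z=1.96):
    """Random-effects (DerSimonian-Laird) pooled proportion on the raw scale.

    Returns (pooled, lower, upper), with the Wald interval truncated to [0, 1].
    """
    props, variances = [], []
    for x, n in zip(events, totals):
        props.append(x / n)
        # continuity-corrected proportion for the variance only, so that
        # datasets with 0% or 100% sensitivity do not get zero variance
        p_var = (x + 0.5) / (n + 1) if x in (0, n) else x / n
        variances.append(p_var * (1 - p_var) / n)
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * p for wi, p in zip(w, props)) / sw
    q = sum(wi * (p - fixed) ** 2 for wi, p in zip(w, props))       # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(props) - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * p for wi, p in zip(w_re, props)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return pooled, max(0.0, pooled - z * se), min(1.0, pooled + z * se)

# Hypothetical (Ag-RDT positive, RT-PCR positive) counts for three datasets
pooled, lower, upper = dersimonian_laird([18, 9, 30], [20, 10, 31])
print(f"{100 * pooled:.1f}% (95% CI {100 * lower:.1f} to {100 * upper:.1f})")
```

Truncating at 0 and 1 is what produces a reported bound of exactly 0.0 when the untruncated Wald interval would extend below zero. A logit-scale model avoids this, at the cost of asymmetric intervals.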
One self-testing study reported a sensitivity of 85.0% and a specificity of 99.1% when only samples with high viral load (≥ 7.0 log10 SARS-CoV-2 RNA copies/mL) were analyzed [11].
Age
Across all the studies included in the review, we had 32 datasets with samples from people aged 18 years and older (‘≥18 years’), achieving a pooled sensitivity of 65.5% (95% CI 57.8 to 72.4) (Fig. 4D). For the ‘<18 years’ group, a meta-analysis was not possible, as only three datasets were available for this age group. However, the reported sensitivity in these three datasets had a comparable range to that in the ‘≥18 years’ group (71.4% [38] to 92.3% [37]). The pooled specificity was 99.6% (95% CI 99.2 to 99.8) in the ‘≥18 years’ group and was above 99.6% in all datasets in the ‘<18 years’ group.
Virus variant
The predominant variant of concern (VoC) could be determined for 53 of the 54 datasets; wild type predominated in 21 datasets (39.6%). The pooled sensitivity across these 21 datasets was 69.8% (95% CI 62.5 to 76.3), and the pooled specificity was 99.7% (95% CI 99.5 to 99.8). The highest sensitivity, 78.5% (95% CI 60.8 to 89.6), was found in studies conducted while the Alpha VoC was predominant (8 datasets, 15.1%). In studies conducted during an Omicron wave (4 datasets, 7.5%), the pooled sensitivity was markedly lower at 32.8% (95% CI 17.8 to 52.3). When Delta was predominant (6 datasets, 11.3%), the pooled sensitivity was 57.8% (95% CI 28.0 to 82.8). However, studies conducted while both Delta and Omicron were predominant had a pooled sensitivity of 76.1% (95% CI 70.7 to 80.7) (Fig. 5).
Self-testing studies showed a similar pattern, with pooled sensitivities of 62.6% (95% CI 52.2 to 72.0) for wild type, 76.1% (95% CI 70.7 to 80.7) for combined Delta/Omicron, and 85.3% (95% CI 54.0 to 96.6) for the Alpha VoC.
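The asymmetric intervals in this subsection (for example, 32.8% with a 95% CI of 17.8 to 52.3) are what pooling on the logit scale produces, since the interval is symmetric around the summary estimate only after logit transformation. The short check below is our own illustration, not part of the review's analysis. It recovers the point estimate from a reported interval under that assumption, and the helper names are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def expit(x):
    return 1 / (1 + math.exp(-x))

def logit_midpoint(lower, upper):
    """Point estimate implied by a CI that is symmetric on the logit scale."""
    return expit((logit(lower) + logit(upper)) / 2)

def logit_se(lower, upper, z=1.96):
    """Logit-scale standard error implied by the same interval."""
    return (logit(upper) - logit(lower)) / (2 * z)

# Omicron subgroup reported above: 32.8% (95% CI 17.8 to 52.3)
print(round(100 * logit_midpoint(0.178, 0.523), 1))  # 32.8
print(round(logit_se(0.178, 0.523), 2))
```

Applying the same check to the Ct < 25 interval (90.4 to 96.8) gives about 94.4% rather than the reported 93.6%. This is consistent with the Ct-value estimates having intervals that are symmetric on the untransformed scale (93.6 ± 3.2).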
Middle-income countries (MIC) vs. high-income countries (HIC)
Studies conducted in HIC accounted for 44 datasets (53,090 samples), with a pooled sensitivity and specificity of 67.6% (95% CI 60.5 to 74.0) and 99.5% (95% CI 99.3 to 99.7), respectively. In contrast, studies from MIC (10 datasets; 2,025 samples) had a higher sensitivity and a comparable specificity: 81.0% (95% CI 70.4 to 88.4) and 98.1% (95% CI 93.9 to 99.4), respectively (Supplement Figs. 6 and 7).
Sensitivity Analysis
When case-control studies (5 datasets) were excluded, the pooled sensitivity remained comparable to the overall estimate, at 69.5% (95% CI 62.8 to 75.5) (Supplement Fig. 8).
Datasets from manufacturer-independent studies (40 datasets; 20 self-testing studies) achieved an accuracy comparable to the overall summary estimates with a pooled sensitivity of 66.5% (95% CI 59.2 to 73.1) and a pooled specificity of 99.5% (95% CI 99.1 to 99.7) (Supplement Fig. 9). Excluding preprints (5 datasets) resulted in no substantial change in sensitivity (69.9% [95% CI 63.2 to 75.8]) and specificity (99.4% [95% CI 99.0 to 99.6]) (Supplement Fig. 10).
Certainty of Evidence (CoE)
We rated the CoE as high for sensitivity and specificity and as low for concordance and user errors. For 'imprecision', we downgraded the CoE for concordance by one point because of the small number of studies and the small sample size. For concordance and user errors, 'inconsistency' was rated 'serious' and the CoE was therefore also downgraded by one point, since only one study was available (Table 2).
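The certainty ratings follow GRADE's stepwise logic: the body of evidence starts at a given level and loses one level for each domain rated 'serious' (a 'very serious' rating counts twice). The toy helper below makes that arithmetic explicit. It assumes, as Table 2 implies for these observational accuracy studies, a starting level of High. The level names are GRADE's, but the function is only an illustration.

```python
LEVELS = ["Very low", "Low", "Moderate", "High"]

def grade_certainty(serious_domains, start="High"):
    """Drop one certainty level per domain rated 'serious', never below 'Very low'."""
    index = LEVELS.index(start) - serious_domains
    return LEVELS[max(index, 0)]

print(grade_certainty(0))  # sensitivity/specificity, no serious domain: High
print(grade_certainty(2))  # concordance, inconsistency + imprecision: Low
```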
Table 2
GRADE table: Should COVID-19 self-testing, defined as self-sampling, processing of the sample and self-readout using Ag-RDTs, be offered as an additional approach to professionally administered testing services? The following table summarizes the certainty of evidence according to the GRADE approach.
Outcome | № of studies | Study design | Risk of bias | Inconsistency | Indirectness | Imprecision | Other considerations | Impact | Certainty | Importance
Accuracy – sensitivity (Ag-RDT self-testing vs. rRT-PCR) | 23 [11, 22–32] | observational studies | not serious (a) | not serious (b) | not serious (c) | not serious (d) | none | Normalized to a study population of 1,000 participants with 10% prevalence, 66 true-positive and 34 false-negative self-testing results would be expected. Pooled sensitivity was 66.1% (95% CI 53.5 to 76.7). | ⨁⨁⨁⨁ High | CRITICAL
Accuracy – specificity (Ag-RDT self-testing vs. rRT-PCR) | 23 [11, 22–32] | observational studies | not serious (a) | not serious (b) | not serious (c) | not serious (d) | none | Normalized to a study population of 1,000 participants with 10% prevalence, approximately 896 true-negative and 4 false-positive self-testing results would be expected. Pooled specificity was high at 99.5% (95% CI 99.1 to 99.7). | ⨁⨁⨁⨁ High | CRITICAL
Accuracy – concordance (Ag-RDT self-testing vs. Ag-RDT performed by professionals) | 1 [11] | observational studies | not serious (a) | serious (b) | not serious (c) | serious (d) | none | Kappa 0.92 (out of 1.00; 95% CI 0.85 to 1.00) | ⨁⨁◯◯ Low | CRITICAL
Accuracy – proportion of user errors | 1 [11] | observational studies | not serious (a) | serious (b) | not serious (c) | not serious (e) | none | Deviations by study participants were found in 15.5% of the sampling steps and 15.0% of the testing steps; however, these did not impede the self-test's performance. | ⨁⨁◯◯ Low | IMPORTANT
Explanation: a. We used QUADAS-2 to assess risk of bias. The studies enrolled patients consecutively and assessed the self-testing results (self-sampling and self-performing the Ag-RDT) blinded to the reference standard result (rRT-PCR or professional Ag-RDT testing). While for one study it was not clear whether all self-tests were performed according to the manufacturer's instructions, this was ensured in the other. Furthermore, we did not detect any potential bias resulting from study flow and timing. Therefore, we did not downgrade the certainty of evidence for this criterion.
b. The heterogeneity in findings, shown by widely ranging point estimates with only marginally overlapping confidence intervals, likely originates from differences in the study populations. This is supported by the fact that head-to-head comparisons of self-testing and professional testing in the same study population show similar Ag-RDT performance. However, as only a few studies were available for concordance and only one study for user errors, we downgraded these two outcomes by one point.
c. Following current GRADE guidance, we did not downgrade by one point for all studies, but acknowledge that the study populations are not fully representative of the populations of interest. Furthermore, the intervention did not differ from the one of interest and outcomes were reported directly; indirectness was therefore judged 'not serious'.
d. The number of studies and the sample size were small, and only one study reported on concordance between self-testing and professional testing using Ag-RDTs.
e. For this outcome, only qualitative data, or quantitative data from isolated studies in well-described but non-comparable settings, were available; the criterion 'imprecision' was therefore considered negligible and rated 'not serious'.
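Table 2 expresses the pooled self-testing accuracy as expected counts in a hypothetical population of 1,000 people with 10% prevalence. The helper below is a hypothetical function, not taken from the review, that makes this arithmetic explicit. In such a population, 100 people are infected and 900 are not. A sensitivity of 66.1% therefore gives about 66 true positives and 34 false negatives, and a specificity of 99.5% gives 895.5 true negatives and 4.5 false positives before rounding. Whatever rounding is applied, the true-negative and false-positive counts must sum to 900.

```python
def per_thousand(sensitivity, specificity, prevalence=0.10, population=1000):
    """Expected 2x2 counts when a test is applied to a hypothetical cohort."""
    infected = population * prevalence
    uninfected = population - infected
    tp = sensitivity * infected
    tn = specificity * uninfected
    return {"TP": tp, "FN": infected - tp, "TN": tn, "FP": uninfected - tn}

# Pooled self-testing estimates from Table 2
counts = per_thousand(sensitivity=0.661, specificity=0.995)
print({cell: round(value, 1) for cell, value in counts.items()})
```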