Phase 1. Forming the gold standard set
The process of forming the gold standard set from three different sources is shown in Figure 1.
Figure 1. Formation of the gold standard set
Characteristics of the gold standard set
The full gold standard set comprised 534 citations from 226 unique journal titles, spanning the years 1988 to 2017. The distribution of citations across this period is shown in Figure 2.
Figure 2. Gold standard set date coverage and year frequencies
The ten journals most represented in the gold standard set are shown in Table 2.
Table 2. The ten highest frequency gold standard set journal titles
| Journal title | Citations (n) |
|---|---|
| International Journal of Integrated Care | 45 |
| Health Affairs (Millwood) | 22 |
| BMJ | 16 |
| BMC Health Services Research | 13 |
| Cochrane Database of Systematic Reviews | 13 |
| Health Policy | 12 |
| Health & Social Care in the Community | 11 |
| American Journal of Managed Care | 9 |
| HealthcarePapers | 9 |
| Journal of the American Geriatrics Society | 9 |
The gold standard set was split into three subsets with the following proportions of citations:
- Term Identification Set (TIS) n=107 (20%)
- Filter Development Set (FDS) n=213 (40%)
- Filter Validation Set (FVS) n=214 (40%)
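The section reports the subset sizes but not how citations were allocated. As a minimal sketch, assuming a seeded simple random split (an assumption, not the documented procedure), the 20/40/40 proportions can be realised as follows; `split_gold_standard` is a hypothetical helper.

```python
import random

def split_gold_standard(citation_ids, seed=0):
    """Split citation IDs into TIS (~20%), FDS and FVS (remaining ~40% each).

    Hypothetical helper: assumes a simple seeded random split.
    """
    ids = list(citation_ids)
    random.Random(seed).shuffle(ids)        # reproducible shuffle
    n_tis = round(len(ids) * 0.20)          # 534 * 0.2 = 106.8 -> 107
    n_fds = (len(ids) - n_tis) // 2         # (534 - 107) // 2 = 213
    tis = ids[:n_tis]
    fds = ids[n_tis:n_tis + n_fds]
    fvs = ids[n_tis + n_fds:]               # remainder: 214
    return tis, fds, fvs

tis, fds, fvs = split_gold_standard(range(534))
print(len(tis), len(fds), len(fvs))  # 107 213 214
```

Giving the odd leftover citation to the FVS reproduces the 107/213/214 sizes reported above.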
Phase 2. Deriving candidate search terms
The MeSH terms and textwords that each retrieved the highest number of unique citations from the TIS (≥ 25% of the set) are shown in Table 3.
Table 3. Highest frequency MeSH terms and textwords in the TIS
| Terms | Unique citations retrieved from TIS, n (of 107) | % of TIS citations retrieved |
|---|---|---|
| MeSH | | |
| Organization & administration.xs. | 88 | 82.3 |
| Delivery of health care, Integrated/ | 55 | 51.4 |
| Economics.fs. | 31 | 29.0 |
| Therapy.xs. | 30 | 28.0 |
| Textwords/phrases | | |
| Health*.mp. | 104 | 97.2 |
| Health.mp. | 102 | 95.3 |
| Care.mp. | 102 | 95.3 |
| Care.tw. | 94 | 87.9 |
| Health*.tw. | 85 | 79.4 |
| Health care.mp. | 81 | 75.7 |
| Integrat*.mp. | 80 | 74.8 |
| Integrat*.tw. | 77 | 72.0 |
| Health.tw. | 75 | 70.1 |
| Integrated.mp. | 74 | 69.2 |
| Services.mp. | 68 | 63.6 |
| Delivery.mp. | 67 | 62.6 |
| Integrated.tw. | 58 | 54.2 |
| Support.mp. | 58 | 54.2 |
| Patient.mp. | 54 | 50.5 |
| Services.tw. | 49 | 45.8 |
| Systems.mp. | 43 | 40.2 |
| Management.mp. | 40 | 37.4 |
| Integration.mp. | 39 | 36.5 |
| Organi?ational.mp. | 39 | 36.5 |
| Systems.tw. | 37 | 34.6 |
| Community.mp. | 36 | 33.7 |
| Data.mp. | 36 | 33.7 |
| Model.mp. | 35 | 32.7 |
| Practice.mp. | 35 | 32.7 |
| Organizational.mp. | 35 | 32.7 |
| Quality.mp. | 34 | 31.8 |
| Health care.tw. | 34 | 31.8 |
| Service.mp. | 33 | 30.8 |
| Patient.tw. | 33 | 30.8 |
| Community.tw. | 33 | 30.8 |
| Models.mp. | 32 | 30.0 |
| Management.tw. | 32 | 30.0 |
| Healthcare.tw. | 32 | 30.0 |
| Delivery.tw. | 31 | 29.0 |
| System.mp. | 30 | 28.0 |
| Service.tw. | 30 | 28.0 |
| Data.tw. | 30 | 28.0 |
| Hospital.mp. | 30 | 28.0 |
| Primary care.mp. | 30 | 28.0 |
| Primary care.tw. | 29 | 27.1 |
| Clinical.mp. | 29 | 27.1 |
| Hospital.tw. | 28 | 26.2 |
| Disease.mp. | 28 | 26.2 |
| Design.mp. | 27 | 25.2 |
| ±Coordinat*.mp. | 21 | 19.6 |
Notes:
- .mp = search on title, abstract, keywords, and subject headings
- .tw = search on title and abstract
- .xs = search on exploded free-floating subheading
- .fs = search on free-floating subheading (not exploded)
- / = Medical Subject Heading (MeSH) search
- ? allows for single letter variants within a word (here: organisational OR organizational)
- ±Coordinat* was chosen from lower down the TIS-derived frequency list as a possible equivalent term for ‘integrated’.
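The ≥ 25% cut-off behind Table 3 amounts to dividing each term's count of unique TIS citations by 107 and keeping terms at or above the threshold. A minimal sketch using a few counts copied from Table 3; the dictionary and function names are illustrative only.

```python
TIS_SIZE = 107  # citations in the Term Identification Set

# Unique TIS citations retrieved per term (subset of Table 3)
counts = {
    "Delivery of health care, Integrated/": 55,
    "Integrat*.mp.": 80,
    "Design.mp.": 27,
    "Coordinat*.mp.": 21,
}

def candidate_terms(term_counts, n_total, threshold_pct=25.0):
    """Return (term, % retrieved) pairs at or above the threshold, highest first."""
    rows = [(term, round(100 * n / n_total, 1)) for term, n in term_counts.items()]
    kept = [(term, pct) for term, pct in rows if pct >= threshold_pct]
    return sorted(kept, key=lambda row: row[1], reverse=True)

for term, pct in candidate_terms(counts, TIS_SIZE):
    print(f"{term}: {pct}%")
```

Coordinat*.mp. (19.6%) falls below the cut-off, which is why the notes record it as a manual addition from lower down the list.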
Phase 3. Filter development
Individual term testing in the FDS
The highest frequency textwords from the TIS were searched again in the FDS to determine their recall, and their corresponding precision was estimated in Medline outside the FDS. Although recall was high for some terms (e.g. care.mp. at 98.1%), precision was very low (see Table 4). The term with the most face validity, integrated care, had low recall in the FDS (43/213; 20.2%), so it was not considered a candidate term at this stage. Similarly, the most relevant MeSH term, "Delivery of Health Care, Integrated", had low recall, retrieving only 95/213 (44.6%) of FDS citations.
Table 4. FDS recall and PubMed precision of highest-ranking candidate terms
| Searches | Recall in FDS, n (of 213) | Recall in FDS, % | Precision in PubMed, % (n=100) |
|---|---|---|---|
| Integrat*.mp. | 159 | 74.7 | 8 |
| Integrat*.tw. | 152 | 71.4 | 8 |
| Integrated.mp. | 143 | 67.1 | 8 |
| Integrated.tw. | 118 | 55.4 | 8 |
| Coordinat* | 46 | 21.6 | 8 |
| Care.mp. | 209 | 98.1 | 0 |
| Care.tw. | 180 | 84.5 | 0 |
| Health*.mp. | 199 | 93.4 | 0 |
| (Health OR healthcare).mp | 199 | 93.4 | 0 |
| Health.mp. | 197 | 92.5 | 0 |
| Health.tw | 143 | 67.1 | 0 |
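Recall in Table 4 is the share of the 213 FDS citations that a search retrieves, and precision is the share of a screened sample of 100 retrieved records judged relevant. A minimal sketch of both calculations; the citation IDs and relevance judgements are placeholders, not study data.

```python
def recall_pct(retrieved_ids, gold_ids):
    """Percentage of gold standard citations found among the retrieved records."""
    gold = set(gold_ids)
    return 100 * len(gold & set(retrieved_ids)) / len(gold)

def proxy_precision_pct(sample_judgements):
    """Percentage of a screened sample of retrieved records judged relevant (True)."""
    return 100 * sum(sample_judgements) / len(sample_judgements)

# Placeholder data shaped like the Care.mp. row: 209 of 213 FDS citations retrieved
fds = range(213)
care_hits = range(209)
print(round(recall_pct(care_hits, fds), 1))   # 98.1
print(proxy_precision_pct([False] * 100))     # 0.0
```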
Establishing concept groups
Concept groupings of high frequency candidate terms were hypothesised as: (1) integrated; (2) health care; and (3) organisation and administration. These groups and the terms that fall under each are shown in Figure 3.
Figure 3. Concept groups and their relevant terms
In og.xs., 'og' is the Medline abbreviation for the subheading 'organization & administration'. Its exploded form (indicated by .xs) also searches the related subheadings economics; legislation & jurisprudence; manpower; standards; supply & distribution; trends; and utilization.
Combining terms within and across concept groups
The FDS was then used to test the best performing combinations of terms from the first two concept groups, 'integrated' and 'health/care'. To determine the most meaningful way to combine them, each term was tested first with the other terms in its own group and then with terms in the other group. When the high frequency terms were combined within their concept groups, recall stayed at an acceptable level, but proxy precision remained very low, often 0%, with both the OR and the AND Boolean operators.
As expected, the OR operator outperformed the AND operator at maintaining recall, with no clear effect on precision. Table 5 shows the initial results of this process using the first two concept groups only.
Table 5. Sequential testing of terms within two concept groups in the FDS
| Searches | Recall in FDS, n (of 213) | Recall in FDS, % | Proxy precision in Medline, % (n=100) |
|---|---|---|---|
| OR'd combinations (within concept group) | | | |
| Health.mp. OR healthcare.mp. | 199 | 93.4 | 0 |
| Care.mp. OR health.mp. | 211 | 99.1 | 0 |
| Care.mp. OR health*.mp | 211 | 99.1 | 0 |
| #(Care.mp. OR health OR healthcare).mp. | 211 | 99.1 | 0 |
| Care.tw. OR health.tw. | 194 | 91.1 | 0 |
| Integrat*.mp OR coordinat*.mp. | 166 | 77.9 | 3 |
| AND'd combinations (within concept group) | | | |
| Health.mp AND healthcare.mp | 43 | 20.2 | 0 |
| Care.mp. AND health.mp. | 195 | 91.6 | 0 |
| Care.mp. AND health*.mp | 197 | 92.5 | 0 |
| #Care.mp. AND (health.mp. OR healthcare.mp) | 197 | 92.5 | 0 |
| Integrat*.mp AND coordinat*.mp. | 39 | 18.3 | 0 |
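The recall pattern in Table 5 follows from set semantics: OR takes the union of the records each term retrieves, so recall can only stay equal or rise, whereas AND takes the intersection, so recall can only stay equal or fall. The toy sets below illustrate this; they are invented and not drawn from the FDS.

```python
fds = set(range(10))          # toy gold standard of 10 citations
term_a = {0, 1, 2, 3, 4, 5}   # citations retrieved by a hypothetical term A
term_b = {4, 5, 6}            # citations retrieved by a hypothetical term B

def recall(hits, gold):
    """Fraction of gold citations present in the retrieved set."""
    return len(hits & gold) / len(gold)

or_hits = term_a | term_b     # Boolean OR = set union
and_hits = term_a & term_b    # Boolean AND = set intersection

print(recall(term_a, fds), recall(or_hits, fds), recall(and_hits, fds))  # 0.6 0.7 0.2
```

Precision has no such guarantee in either direction, which matches the "no clear effect on precision" observation.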
At this stage it was too early to choose between the OR and the AND combinations of ‘care’ with variants of ‘health’ (marked # in Table 5), as both achieved recall above 90% with similarly poor precision. However, the truncated form ‘health*’ was dropped at this point, based on two observations:
- Once the filter is translated for PubMed, retrieval on ‘health*’ would be capped at the first 600 word-ending variants, which could reduce recall equivalence between the Ovid Medline and PubMed versions of the filter.
- ‘health*’ had the same recall as ‘health OR healthcare’ when each was combined with 'care.mp.' (197/213; 92.5%).
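The first observation can be illustrated with a toy model of capped truncation: a prefix search expands to at most a fixed number of indexed word variants, so any variant beyond the cap silently drops out. The vocabulary, its ordering and the helper below are hypothetical and are not PubMed's actual index or expansion rules; only the cap of 600 comes from the text.

```python
def expand_truncation(prefix, vocabulary, cap=600):
    """Toy model: return at most `cap` vocabulary words starting with `prefix`, in index order."""
    matches = [word for word in vocabulary if word.startswith(prefix)]
    return matches[:cap]

# Hypothetical index: 700 made-up 'health...' variants listed before 'healthcare'
vocab = [f"health{i:03d}" for i in range(700)] + ["healthcare"]
expanded = expand_truncation("health", vocab)
print(len(expanded), "healthcare" in expanded)  # 600 False
```

Spelling out the needed variants (health OR healthcare) avoids depending on where a variant falls relative to the cap.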
When the two concept groups, 'integrated' and 'health/care', were combined with each other using AND, proxy precision rose markedly while recall dropped. Precision continued to rise as more terms were successively added to the 'health/care' group, reaching 56%. Table 6 shows the progressive improvement in precision as successive within-group terms were added to the basic two-concept search.
Table 6. Sequential testing of combined concepts ('integrated' and 'health/care') in the FDS
| Searches | Recall in FDS, n (of 213) | Recall in FDS, % | Proxy precision in Medline, % (n=100) |
|---|---|---|---|
| (Integrat* OR coordinat*).mp AND health.mp. | 156 | 73.2 | 28 |
| (Integrat* OR coordinat*).mp AND healthcare.mp | 36 | 16.9 | 35 |
| (Integrat* OR coordinat*).mp AND health*.mp | 157 | 73.7 | 25 |
| (Integrat* OR coordinat*).mp. AND care.mp. | 163 | 76.5 | 40 |
| OR'd combinations | | | |
| (Integrat* OR coordinat*).mp AND (health OR healthcare).mp | 157 | 73.7 | 33 |
| (Integrat* OR coordinat*).mp. AND (care OR health OR healthcare).mp. | 165 | 77.5 | 30 |
| AND'd combinations | | | |
| #(Integrat* OR coordinat*).mp. AND care.mp. AND (health OR healthcare).mp. | 155 | 72.8 | 56 |
| (Integrat* OR coordinat*).mp. AND ((care AND health) OR healthcare).mp. | 155 | 72.8 | 49 |
The best candidate combination was the search marked # in Table 6: (Integrat* OR coordinat*).mp. AND care.mp. AND (health OR healthcare).mp. This construct kept precision above 50% without substantially reducing recall.
Each of the remaining terms in the frequency table was then tested in combination with this construct in three ways:
- Combined with the construct using AND
- Combined with the construct using OR
- Combined with the terms inside the health/care group using OR, to test whether it behaved as a synonym for that concept.
Terms were eliminated from the developing search string if their addition reduced precision, or if they failed to maintain or increase recall while precision held steady. These included the MeSH term Delivery of health care, Integrated and the textwords support, patient(s), community, data, hospital, primary care, clinical, disease, and design.
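The elimination rule can be written as a small decision function. This is a sketch of one reading of the rule: a term is kept if precision rises, or if precision holds steady and recall does not fall. How "steady" was judged is not specified, so the `tolerance` band is an assumption, and the candidate figures in the usage lines are hypothetical; only the baseline comes from Table 6.

```python
def keep_term(base, candidate, tolerance=1.0):
    """Decide whether a term added to the construct should be kept.

    base, candidate: (recall_pct, precision_pct) before and after adding the term.
    tolerance: hypothetical band (percentage points) within which precision counts as steady.
    """
    base_recall, base_precision = base
    new_recall, new_precision = candidate
    if new_precision < base_precision - tolerance:
        return False                        # precision fell: eliminate
    if new_precision <= base_precision + tolerance:
        return new_recall >= base_recall    # precision steady: recall must not drop
    return True                             # precision rose: keep

baseline = (72.8, 56)                        # construct marked # in Table 6
print(keep_term(baseline, (74.0, 58)))       # True: precision rose
print(keep_term(baseline, (73.0, 50)))       # False: precision fell
print(keep_term(baseline, (70.0, 56)))       # False: steady precision, lower recall
```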
The final best performing search at the end of this process was:
((Integrat* OR coordinat*) AND care AND (health OR healthcare)).mp. AND (og.xs. OR services.mp. OR delivery.mp. OR management.mp. OR systems.mp. OR model.mp. OR organi?ational.mp. OR quality.mp.)
This search string, labelled Search Component 1, achieved 71.8% recall (153/213) in the FDS and 62% proxy precision. Its failure to retrieve 60 (28.2%) of the FDS citations suggested that other concepts and terms closely associated with integrated care remained unidentified. Although such terms were not frequent enough to pass the TIS threshold of ≥ 25%, they might still serve as highly discriminating search terms.
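Search Component 1 is an AND of concept groups, each an OR of synonyms, under one field suffix, followed by an AND with a qualifier group. A minimal sketch of assembling that Ovid syntax from lists of terms; the helper names are hypothetical, and the output is checked only against the string shown above.

```python
def or_group(terms):
    """Join synonyms with OR, parenthesising only when there is more than one."""
    return terms[0] if len(terms) == 1 else "(" + " OR ".join(terms) + ")"

def build_component(concepts, field, qualifier_terms):
    """AND the concept groups under one Ovid field suffix, then AND an OR'd qualifier group."""
    core = "(" + " AND ".join(or_group(group) for group in concepts) + ")" + field
    return core + " AND " + or_group(qualifier_terms)

component_1 = build_component(
    concepts=[["Integrat*", "coordinat*"], ["care"], ["health", "healthcare"]],
    field=".mp.",
    qualifier_terms=["og.xs.", "services.mp.", "delivery.mp.", "management.mp.",
                     "systems.mp.", "model.mp.", "organi?ational.mp.", "quality.mp."],
)
print(component_1)
```

Keeping the concept groups as data makes it easy to add or drop a single term when testing it against the construct.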
Statistical analysis of non-retrieved FDS citations
When the titles and abstracts of the 60 non-retrieved FDS citations were submitted to frequency analysis using WriteWords, two high frequency terms emerged: 'disease management.mp.' and 'case management.mp.'. These two terms were trialled using a process parallel to the one used to build Search Component 1, i.e. by successively adding concept groups to this new concept group to improve precision steadily while keeping recall close to an acceptable baseline. Details are provided in Additional File 1.
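The analysis above used WriteWords. As a stand-in (not the tool used), a two-word phrase count over titles and abstracts can be sketched with Python's `collections.Counter`; the records below are invented placeholders for the 60 non-retrieved citations.

```python
import re
from collections import Counter

def phrase_frequencies(texts, n=2):
    """Count n-word phrases (lower-cased, letters only) across the given texts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts

# Invented title/abstract snippets standing in for non-retrieved FDS records
records = [
    "Disease management programmes for chronic heart failure",
    "Case management for frail older people: a disease management perspective",
    "Nurse-led case management in primary settings",
]
counts = phrase_frequencies(records)
print(counts["disease management"], counts["case management"])  # 2 2
```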
Table 7 shows the final 'disease management' concept search (Search Component 2) and its effect on overall recall and precision when combined with Search Component 1.
Table 7. Search Components 1 and 2 within the FDS
| Search | Search string | Recall in FDS, n (of 213) | Recall in FDS, % | Proxy precision in Medline, % (n=100) |
|---|---|---|---|---|
| Component 1 | ((Integrat* OR coordinat*) AND care AND (health OR healthcare)).mp. AND (og.xs. OR services.mp. OR delivery.mp. OR management.mp. OR systems.mp. OR model.mp. OR organi?ational.mp. OR quality.mp.) | 153 | 71.8 | 62 |
| Component 2 | (Disease management OR Case management).mp. AND (care OR health OR healthcare).mp. AND (og.xs. OR services.mp. OR delivery.mp. OR model.mp. OR quality.mp.) | 55 | 25.8 | 69 |
| Component 1 OR Component 2 | (((Integrat* OR coordinat*) AND care AND (health OR healthcare)).mp. AND (og.xs. OR services.mp. OR delivery.mp. OR management.mp. OR systems.mp. OR model.mp. OR organi?ational.mp. OR quality.mp.)) OR (((Disease management OR Case management) AND (care OR health OR healthcare)).mp. AND (og.xs. OR services.mp. OR delivery.mp. OR model.mp. OR quality.mp.)) | 180 | 84.5 | 63 |
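The Table 7 counts also show how much the two components overlap. Writing C1 and C2 for the sets of FDS citations retrieved by Search Components 1 and 2, the inclusion-exclusion principle gives:

```latex
\begin{aligned}
|C_1 \cap C_2| &= |C_1| + |C_2| - |C_1 \cup C_2| = 153 + 55 - 180 = 28 \\
|C_2 \setminus C_1| &= |C_2| - |C_1 \cap C_2| = 55 - 28 = 27 \\
213 - |C_1 \cup C_2| &= 213 - 180 = 33
\end{aligned}
```

So Component 2 contributed 27 citations not already found by Component 1, leaving 33 FDS citations unretrieved.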
This left 33 citations not retrieved by the combined search. Of these, five contained the low frequency textword 'integrated care' and came from the International Journal of Integrated Care (IJIC), a key journal for researchers in the field of integrated care. These citations had been missed for one of two reasons: (1) they did not contain any of the other search terms from Search Component 1 (e.g. care, health or healthcare), or (2) they were not indexed with MeSH terms or lacked an abstract. Indeed, as of 5 October 2017, 26% of all IJIC citations (146/558) lacked an abstract, making them retrievable only via terms in the article or journal title. Based on this, we tested adding the phrase 'integrated care' to the search both as a journal title keyword (.jw.) and as a search on title, abstract and MeSH terms (.mp.):
Integrated care.mp,jw. OR (((Integrat* OR coordinat*) AND care AND (health OR healthcare)).mp. AND (og.xs. OR services.mp. OR delivery.mp. OR management.mp. OR systems.mp. OR model.mp. OR organi?ational.mp. OR quality.mp.)) OR (((Disease management OR Case management) AND (care OR health OR healthcare)).mp. AND (og.xs. OR services.mp. OR delivery.mp. OR model.mp. OR quality.mp.))
Adding 'integrated care'.mp,jw. retrieved all five IJIC citations and increased recall within the FDS to 88.3% (188/213), an increase of 3.8 percentage points. Although this increase is modest, we retained the .jw search element because the journal is uniquely identified with the integrated care concept, and the EAG agreed that including content from this journal would support comprehensive retrieval. At the time of writing, no other journals are retrieved by searching 'integrated care' in the Medline journal title field.
The final Ovid Medline search filter (above) therefore achieved 88.3% recall in the FDS (95% CI [83.3-91.9]), with a reduced final proxy precision of 53%. Because this combines high recall with precision very close to the minimum acceptable level, the filter was designated the Broad Integrated Care Search (Broad ICS). The overall conceptual model of Broad ICS is shown in Figure 4.
Figure 4. Conceptual diagram of Broad ICS
Creating filter variants
A narrower (more precise) integrated care search filter was created by returning to the TIS frequency table and testing less frequent terms with high face validity for their proxy precision. Terms with individual proxy precision ≥ 75% were then systematically and successively tested in combination with each other until maximum proxy precision was reached without letting recall in the FDS fall below 50%. The combination with the best precision was:
*Delivery of health care, integrated/ OR Integrated care.mp,jw. OR (integrated health*.mp. AND og.xs.)
This construct includes a 'focused' version of the MeSH term Delivery of health care, Integrated, indicated by the asterisk before the term, which restricts retrieval to articles that an indexer judged to have this concept as a major focus. This version of the integrated care search achieved only 55.9% recall (119/213) in the FDS (95% CI [49.2-62.4]) but an estimated precision of 95% outside the FDS. We designated it the Narrow Integrated Care Search (Narrow ICS).
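The narrow-filter procedure is a constrained search: maximise precision subject to recall of at least 50%. The section does not say exactly how combinations were enumerated, so the sketch below assumes an exhaustive search over OR-combinations of candidate terms; the term names, retrieved-record sets and relevance labels are toy placeholders.

```python
from itertools import combinations

def best_or_combination(term_hits, gold, relevant, min_recall=0.5):
    """Pick the OR-combination of terms with the highest precision whose recall >= min_recall.

    term_hits: dict term -> set of record IDs the term retrieves
    gold: set of gold standard IDs (stand-in for the FDS)
    relevant: set of record IDs judged relevant (stand-in for screening)
    Returns (terms, precision, recall) or None if no combination qualifies.
    """
    best = None
    terms = sorted(term_hits)
    for k in range(1, len(terms) + 1):
        for combo in combinations(terms, k):
            hits = set().union(*(term_hits[t] for t in combo))
            if not hits:
                continue                     # avoid dividing by zero
            recall = len(hits & gold) / len(gold)
            precision = len(hits & relevant) / len(hits)
            if recall >= min_recall and (best is None or precision > best[1]):
                best = (combo, precision, recall)
    return best

gold = set(range(1, 11))                     # 10 toy gold citations, all relevant
term_hits = {
    "A": {1, 2, 3, 4, 100},
    "B": {5, 6, 101, 102},
    "C": {3, 4, 5, 6, 7, 8, 103, 104, 105, 106},
}
print(best_or_combination(term_hits, gold, relevant=gold))
```

Exhaustive search is only practical for a handful of candidate terms; with many terms a greedy forward selection would be the usual substitute.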
Phase 4. Filter validation
When both versions of the filter were searched within the FVS (n=214), the results were:
- Broad ICS: 86.0% recall, 95% CI [80.7-90.0]
- Narrow ICS: 59.8% recall, 95% CI [53.1-66.2]
Between the FDS and FVS, recall differed by 2.3 percentage points for Broad ICS and 3.9 percentage points for Narrow ICS.
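The section does not name the confidence interval method. A Wilson score interval is one common choice for a binomial proportion, and, checked by arithmetic only, it reproduces the Narrow ICS FDS interval and both FVS intervals reported in this section. The FVS counts used below (184 and 128 of 214) are back-calculated from the reported percentages, and 119 of 213 is the FDS count consistent with the reported 55.9%; none of these three counts is stated in the FVS results themselves.

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion, returned as percentages."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return 100 * (centre - half_width), 100 * (centre + half_width)

for label, hits, total in [("Narrow ICS, FDS", 119, 213),
                           ("Broad ICS, FVS", 184, 214),
                           ("Narrow ICS, FVS", 128, 214)]:
    low, high = wilson_ci(hits, total)
    print(f"{label}: {100 * hits / total:.1f}% [{low:.1f}-{high:.1f}]")
```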
Phase 5. Filter translation for PubMed
The main difference between the Medline version and its PubMed translation is that Ovid's single-character wildcard ? in ‘organi?ational’ cannot be translated directly for PubMed, so the different forms of the term had to be spelled out (i.e. organizational OR organisational). The PubMed versions of both filters are shown in Table 8.
Table 8. Final PubMed translations of Ovid Medline ICS search filters
| Filter | Ovid Medline version | PubMed translation |
|---|---|---|
| Broad ICS | Integrated care.mp,jw. OR (((Integrat* OR coordinat*) and care and (health OR healthcare)).mp. and (og.xs. OR services.mp. OR delivery.mp. OR management.mp. OR systems.mp. OR model.mp. OR organi?ational.mp. OR quality.mp.)) OR (((Disease management OR Case management) and (care OR health OR healthcare)).mp. and (og.xs. OR services.mp. OR delivery.mp. OR model.mp. OR quality.mp.)) | Integrated care[tw] OR integrated care[ta] OR (((Integrat*[tw] OR coordinat*[tw]) AND care[tw] AND (health[tw] OR healthcare[tw])) AND (og[sh] OR services[tw] OR delivery[tw] OR management[tw] OR systems[tw] OR model[tw] OR organisational[tw] OR organizational[tw] OR quality[tw])) OR ((Disease management[tw] OR Case management[tw]) AND (care[tw] OR health[tw] OR healthcare[tw]) AND (og[sh] OR services[tw] OR delivery[tw] OR model[tw] OR quality[tw])) |
| Narrow ICS | *Delivery of health care, integrated/ OR Integrated care.mp,jw. OR (integrated health*.mp. and og.xs.) | (Delivery of health care, integrated[majr:noexp] OR Integrated care[tw] OR Integrated care[ta] OR (integrated health*[tw] AND og[sh])) |
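Because PubMed has no equivalent of Ovid's ? wildcard, each variant has to be written out. A minimal sketch of that expansion for the [tw] tag, assuming the caller lists the letters to substitute (here s and z); `expand_optional_wildcard` is a hypothetical helper, not part of any PubMed tool.

```python
def expand_optional_wildcard(term, letters, tag="[tw]"):
    """Replace the single Ovid '?' in `term` with each given letter and OR the PubMed variants."""
    if term.count("?") != 1:
        raise ValueError("expected exactly one '?' wildcard")
    variants = [term.replace("?", letter) + tag for letter in letters]
    return " OR ".join(variants)

print(expand_optional_wildcard("organi?ational", "sz"))
# organisational[tw] OR organizational[tw]
```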
Narrow ICS (PubMed version) retrieved 312/534 (58.4%) of the fully reconstructed gold standard set in PubMed, and Narrow ICS (Ovid Medline version) retrieved the same proportion within Ovid Medline. Similarly, both versions of Broad ICS retrieved 467/534 (87.5%) of the gold standard set in their respective databases. The sets of citations not retrieved by each pair of versions were identical, indicating that the PubMed Broad and Narrow ICS have both quantitative and qualitative equivalence with their Medline counterparts.
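The equivalence check compares not only how many gold standard citations each version retrieves but which ones it misses. A sketch using set comparison; the IDs are placeholders, and in practice the two retrieved sets would be exported from Ovid and PubMed.

```python
def equivalence_report(gold, ovid_hits, pubmed_hits):
    """Compare two filter versions on a gold standard set.

    Quantitative equivalence: the same number of gold citations retrieved.
    Qualitative equivalence: exactly the same gold citations missed.
    """
    gold = set(gold)
    ovid_missed = gold - set(ovid_hits)
    pubmed_missed = gold - set(pubmed_hits)
    return {
        "ovid_retrieved": len(gold) - len(ovid_missed),
        "pubmed_retrieved": len(gold) - len(pubmed_missed),
        "same_count": len(ovid_missed) == len(pubmed_missed),
        "same_citations": ovid_missed == pubmed_missed,
        "only_missed_by_pubmed": sorted(pubmed_missed - ovid_missed),
    }

# Placeholder IDs shaped like the Broad ICS result: 467 of 534 retrieved by both versions
report = equivalence_report(range(534), range(467), range(467))
print(report["ovid_retrieved"], report["same_citations"])  # 467 True
```

Equal counts alone could hide two versions that miss different citations; comparing the missed sets rules that out.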
Phase 6. Post hoc precision estimate
The results of the post hoc precision analysis of retrieved citations from PubMed are shown in Table 9. All final performances for both filters are provided in Table 10.
Table 9. Post hoc precision estimates for three variant sets of retrievals across PubMed
| Reviewer | Broad ICS, 2012-2016 sets (%) | Broad ICS + topic search terms, sorted by Best Match (%) | Narrow ICS, 2012-2016 sets (%) |
|---|---|---|---|
| Reviewer 1 | 37 | 83 (Community health) | 62 |
| Reviewer 2 | 55 | 52 (Mental health) | 68 |
| Reviewer 3 | 40 | 70 (Aged care) | 83 |
| Reviewer 4 | 48 | 78 (Rural health) | 71 |
| Reviewer 5 | 57 | 71 (Acute care) | 81 |
| Average post hoc precision across five variant sets, % (95% CI) | 47 (43 to 52) | 71 (67 to 75) | 73 (69 to 77) |
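The averages in the last row of Table 9 are the means of the five reviewer sets; because every set has n=100, each mean equals the pooled proportion over 500 screened records. A minimal sketch using the Table 9 values (the reported intervals are also consistent with a Wilson score interval on those pooled counts, although the method is not stated).

```python
from statistics import mean

# Post hoc precision (%) per reviewer set, copied from Table 9; each set has n=100
table_9 = {
    "Broad ICS 2012-2016": [37, 55, 40, 48, 57],
    "Broad ICS + topic terms": [83, 52, 70, 78, 71],
    "Narrow ICS 2012-2016": [62, 68, 83, 71, 81],
}
SET_SIZE = 100

for label, pcts in table_9.items():
    relevant = sum(pcts)                      # with n=100, each % is also a count
    screened = SET_SIZE * len(pcts)
    pooled_pct = 100 * relevant / screened
    print(f"{label}: mean {mean(pcts):.1f}%, pooled {relevant}/{screened} = {pooled_pct:.1f}%")
```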
Table 10. Final performance of filters
| Search filter version | Recall in FDS, % (95% CI) | Recall in FVS, % (95% CI) | Post hoc precision, % (single set of n=100 citations) | Averaged post hoc precision, % (five sets of n=100 citations) |
|---|---|---|---|---|
| Broad ICS | 88.3 (83.3-91.9) | 86.0 (80.7-90.0) | 53.0 | 47.0 |
| Narrow ICS | 55.9 (49.2-62.4) | 59.8 (53.1-66.2) | 95.0 | 73.0 |