Background: Literature searches are challenging when thousands of articles are potentially involved. To facilitate them, we created TEMAS, a Text Mining Algorithm-assisted Search tool, and compared it with a PubMed reference search (RS) in the context of etiological epidemiology.
Methods: TEMAS proceeds in 4 steps: 1) a classic PubMed global search; 2) a first sort removing articles without an abstract or containing off-topic terms; 3) a clustering step using descending hierarchical classification to group articles into independent classes; 4) a final sort extracting, from the targeted class, the abstracts that contain the terms of interest, each linked to the corresponding PubMed article. Validation was performed on risk factors for breast cancer. We estimated precision and recall relative to RS. Average precision and discounted cumulative gain (DCG) were also computed for a ranking-based evaluation. Finally, we compared the TEMAS results with the articles selected in two meta-analyses.
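To make steps 2 and 4 more concrete, here is a minimal Python sketch of the two filtering sorts. It assumes that the step 1 records have already been downloaded from PubMed. The `Article` type and the `first_sort`/`final_sort` names are hypothetical. Matching is a naive case-insensitive substring search over title and abstract in step 2 and over the abstract only in step 4; this is an assumption, not documented TEMAS behaviour. Step 3, the descending hierarchical classification, is not reproduced here: the sketch simply receives the targeted class as input.

```python
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass
class Article:
    """Minimal PubMed record (hypothetical structure)."""
    pmid: str
    title: str
    abstract: str  # empty string when PubMed has no abstract


def first_sort(articles: Iterable[Article], off_topic_terms: Iterable[str]) -> list[Article]:
    """Step 2: drop articles without an abstract or containing an off-topic term."""
    banned = [t.lower() for t in off_topic_terms]
    kept = []
    for a in articles:
        if not a.abstract.strip():
            continue  # no abstract, nothing to mine
        text = f"{a.title} {a.abstract}".lower()
        if any(term in text for term in banned):
            continue  # naive substring match on title + abstract (assumption)
        kept.append(a)
    return kept


def final_sort(target_class: Iterable[Article],
               terms_of_interest: Iterable[str]) -> list[tuple[Article, str]]:
    """Step 4: keep abstracts of the targeted class that contain a term of
    interest, each paired with a link to its PubMed record."""
    wanted = [t.lower() for t in terms_of_interest]
    hits = []
    for a in target_class:
        if any(term in a.abstract.lower() for term in wanted):
            hits.append((a, f"https://pubmed.ncbi.nlm.nih.gov/{a.pmid}/"))
    return hits


if __name__ == "__main__":
    # Toy records for illustration only, not study data.
    corpus = [
        Article("1", "Breastfeeding and breast cancer",
                "Breastfeeding was associated with a lower risk."),
        Article("2", "Letter to the editor", ""),
        Article("3", "Breast cancer in mice", "A mouse model of tumour growth."),
    ]
    kept = first_sort(corpus, ["mice", "mouse"])
    # Step 3 is not reproduced: treat every kept article as the targeted class.
    for article, url in final_sort(kept, ["breastfeeding"]):
        print(article.pmid, url)  # prints: 1 https://pubmed.ncbi.nlm.nih.gov/1/
```

Substring matching is deliberately simple. A real tool would likely need tokenisation or stemming to avoid accidental matches inside longer words.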
Results: Four risk factors for breast cancer were explored: breastfeeding, mammographic density, oral contraceptives, and menarche. TEMAS consistently increased precision relative to RS (from 23% to 32%), with recall ranging from 95% to 97%, and reduced the number of articles to read by a factor of 2.3 to 4.8. Mean average precision at 100 articles was 47.4% for TEMAS versus 20.9% for PubMed results ranked by best match, and DCG showed a consistent improvement for TEMAS over PubMed best match.
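The evaluation metrics used above can be sketched in Python as follows. These are common textbook definitions and not necessarily the exact variants used in the study. In particular, two choices here are assumptions: AP@k is normalised by the number of relevant articles found within the top k, and DCG uses the `rel_i / log2(i + 1)` discount. Relevance is given as flags in ranked order (1 = relevant). All function names are hypothetical.

```python
import math
from collections.abc import Sequence


def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Set-based precision and recall (e.g. PMIDs kept by a tool vs. PMIDs judged relevant)."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision_at_k(ranked_rel: Sequence[int], k: int) -> float:
    """AP@k for binary relevance flags in ranked order (1 = relevant).
    Assumption: normalised by the number of relevant items within the top k."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_rel[:k], start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this relevant rank
    return total / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[int]], k: int) -> float:
    """Mean of AP@k over several queries (e.g. one per risk factor)."""
    return sum(average_precision_at_k(r, k) for r in runs) / len(runs)


def dcg_at_k(ranked_rel: Sequence[float], k: int) -> float:
    """DCG@k with the common rel_i / log2(i + 1) discount, ranks starting at 1."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(ranked_rel[:k], start=1))


if __name__ == "__main__":
    ranking = [1, 0, 1, 0]  # toy relevance flags, not study data
    print(average_precision_at_k(ranking, 100))  # (1/1 + 2/3) / 2, about 0.833
    print(dcg_at_k(ranking, 100))                 # 1/log2(2) + 1/log2(4) = 1.5
```

Comparing TEMAS with PubMed best match then amounts to scoring the two ranked lists for the same query with the same `k` (100 in the abstract).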
Discussion: TEMAS divided the number of articles returned by a literature search by 3.2 and improved precision, average precision, and DCG compared with RS for epidemiological studies. Reducing the number of selected articles inevitably lowered recall. However, recall remained satisfactory and did not bias the corpus of information. Moreover, recall was 100% for the two meta-analyses we analyzed. This suggests that the articles missed by TEMAS were not relevant enough to be included in those meta-analyses.
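A short identity makes this precision/recall trade-off explicit. It relies on a hypothetical simplification: the TEMAS output $T$ is taken to be a subset of the RS output $S$, and relevance is judged against the same set $\mathrm{Rel}$. Here $k$ is the reduction factor and $r$ is recall relative to RS:

```latex
\begin{aligned}
P_S &= \frac{|S \cap \mathrm{Rel}|}{|S|}, \qquad
r = \frac{|T \cap \mathrm{Rel}|}{|S \cap \mathrm{Rel}|}, \qquad
k = \frac{|S|}{|T|},\\[4pt]
P_T &= \frac{|T \cap \mathrm{Rel}|}{|T|}
     = \frac{r\,|S \cap \mathrm{Rel}|}{|S|/k}
     = k\,r\,P_S .
\end{aligned}
```

Under these assumptions, precision improves whenever $k\,r > 1$, that is, whenever recall falls proportionally less than the retrieved set shrinks ($r > 1/k$).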
Conclusion: TEMAS provides a user-friendly interface for non-specialists in literature search who are confronted with thousands of articles, and it appears useful for meta-analyses.