On March 16th, 2020, the White House released a Call to Action to the Tech Community on New Machine Readable COVID-19 Dataset. Our Data Mining Group in CS@UIUC has created EvidenceMiner, a literature search interface for automatic textual evidence mining on the COVID-19 Open Research Dataset Challenge (CORD-19) corpus (2020-03-13).

EvidenceMiner: System Architecture

Our EvidenceMiner system on COVID-19 can be found here.

Below is the overall architecture of the EvidenceMiner system. It consists of two major components: an open information extraction pipeline and a textual evidence retrieval and analysis pipeline. The open information extraction pipeline includes two functional modules: (1) distantly supervised NER, and (2) meta-pattern-based open information extraction. The textual evidence retrieval and analysis pipeline includes three functional modules: (1) textual evidence search, (2) visualization of annotation results in the original document, and (3) summarization of the most frequent entities and relations.
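To make the first module more concrete, here is a minimal sketch of the dictionary-matching step that distant supervision commonly starts from: tokens are labeled by looking them up in an existing dictionary rather than by hand annotation. This is an illustration only, not EvidenceMiner's actual NER code. The dictionary entries, the CHEMICAL type, and the function name distant_labels are hypothetical; only the CORONAVIRUS and DISEASEORSYNDROME type names appear elsewhere on this page.

```python
import re

# Hypothetical toy dictionary mapping lowercase surface forms to entity types.
# A real system would draw on a large biomedical knowledge base instead.
TYPE_DICTIONARY = {
    "remdesivir": "CHEMICAL",
    "sars-cov-2": "CORONAVIRUS",
    "covid-19": "DISEASEORSYNDROME",
}


def distant_labels(sentence):
    """Label each token by dictionary lookup; unmatched tokens get "O"."""
    tokens = re.findall(r"[A-Za-z0-9\-]+", sentence)
    return [(tok, TYPE_DICTIONARY.get(tok.lower(), "O")) for tok in tokens]


if __name__ == "__main__":
    for token, label in distant_labels("Remdesivir inhibits SARS-CoV-2"):
        print(token, label)
```

Labels produced this way are noisy and incomplete, which is why they usually serve as training signal for a model rather than as final output.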


EvidenceMiner: Results on COVID-19

Textual Evidence Retrieval

Method          nDCG@1   nDCG@5   nDCG@10
BM25            0.714    0.720    0.746
LitSense        0.599    0.624    0.658
EvidenceMiner   0.855    0.861    0.889
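For readers unfamiliar with the metric, the following is a minimal sketch of how nDCG@k can be computed from the graded relevance judgments of a ranked result list. It assumes linear gain and a log2 rank discount, and it derives the ideal ranking from the same judged list; the page does not state which nDCG variant or judgment scale was used in the evaluation, so treat these choices and the function names as assumptions.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k graded relevance scores."""
    # Rank 1 has discount log2(2) = 1, rank 2 has log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG@k normalized by the DCG@k of the best possible ordering."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant results at all
    return dcg_at_k(relevances, k) / ideal_dcg


if __name__ == "__main__":
    # Hypothetical judgments for one query, listed in retrieved order.
    judged = [2, 0, 1, 2, 0]
    for k in (1, 5):
        print(k, round(ndcg_at_k(judged, k), 3))
```

A score of 1.0 means the system returned the judged results in the best possible order; the per-query scores are then averaged over all queries.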

Case Studies

Here are some case studies demonstrating how EvidenceMiner can support scientific discovery on COVID-19. In the example shown below, scientists want to find evidence for using ultraviolet (UV) light to kill the SARS-CoV-2 virus. The top-retrieved results contain many supporting sentences, such as the top one: “Ultraviolet-C (UV-C) radiation represents an alternative to chemical inactivation methods”. More interestingly, the fifth sentence, “Whole UV-inactivated SARS-CoV (UV-V), bearing multiple epitopes and proteins, is a candidate vaccine against this virus”, indicates that UV inactivation also has potential for vaccine development against the virus. Scientists were very interested in this result, which inspired them to pursue UV-related COVID-19 vaccine development.


Moreover, EvidenceMiner allows more flexible queries, such as relational patterns, for users who are not sure which specific entity to search for. In the example shown below, scientists want to find evidence related to “CORONAVIRUS cause DISEASEORSYNDROME”. The top-retrieved results contain many highly related evidence sentences, such as “HCoV-OC43, HCoV-229E, HCoV-HKU1, and HCoV-NL63 cause mild, self-limiting upper respiratory tract infections”. This function is supported by our meta-pattern discovery methods and has not been incorporated into any existing system.
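As a rough illustration of what such a relational query means, the sketch below replaces annotated entity mentions with their types and then checks whether the query tokens occur in order in the resulting typed sentence. It is not the meta-pattern discovery method itself, and it says nothing about how EvidenceMiner ranks results. The entity annotations and the function names to_typed_tokens and matches_pattern are hypothetical.

```python
import re


def to_typed_tokens(sentence, entities):
    """Replace each annotated mention with its entity type, then tokenize."""
    typed = sentence
    # Replace longer mentions first so a short mention cannot split a long one.
    for mention, etype in sorted(entities.items(), key=lambda kv: -len(kv[0])):
        typed = typed.replace(mention, etype)
    return re.findall(r"[A-Za-z0-9\-]+", typed)


def matches_pattern(pattern, sentence, entities):
    """True if the pattern tokens occur in order (gaps allowed) in the typed sentence."""
    tokens = iter(to_typed_tokens(sentence, entities))
    # "tok in tokens" advances the iterator, which enforces left-to-right order.
    return all(tok in tokens for tok in pattern.split())


if __name__ == "__main__":
    sentence = ("HCoV-OC43, HCoV-229E, HCoV-HKU1, and HCoV-NL63 cause mild, "
                "self-limiting upper respiratory tract infections")
    # Hypothetical annotations of the kind an NER module might produce.
    entities = {
        "HCoV-OC43": "CORONAVIRUS",
        "HCoV-229E": "CORONAVIRUS",
        "HCoV-HKU1": "CORONAVIRUS",
        "HCoV-NL63": "CORONAVIRUS",
        "upper respiratory tract infections": "DISEASEORSYNDROME",
    }
    print(matches_pattern("CORONAVIRUS cause DISEASEORSYNDROME", sentence, entities))
```

Matching on entity types rather than surface strings is what lets one query retrieve sentences about any coronavirus and any disease, without the user naming either.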


Here are some more examples. In the example shown below, doctors want to study whether remdesivir is a potential drug treatment for COVID-19. Remdesivir is currently a very actively studied drug with the potential to be repurposed for COVID-19 treatment. In the top-retrieved results shown below, we see many sentences about clinical trials of remdesivir against COVID-19. An additional example is shown for amodiaquine as a potential drug for COVID-19.



Last, we show that EvidenceMiner is also useful for finding evidence on controversial topics. In the example shown below, people want to know whether wearing masks can help prevent the spread of COVID-19. The top-retrieved results contain many related statements that clearly express two opposing opinions. Some statements support the use of masks to prevent the virus, such as “COVID-19 is transmitted by saliva droplets, …, which can be prevented by wearing masks”. Other statements question the effectiveness of wearing masks, such as “Although surgical masks are in widespread use …, there is no evidence that wearing these masks can prevent the acquisition of COVID-19 …”. An interesting direction for future work is to classify the opinions by their semantic polarity and even automatically generate summaries of the evidence retrieval results.



Our team for creating this EvidenceMiner for COVID-19:


If you find our EvidenceMiner for COVID-19 useful, please cite our paper. Thanks!