The coronavirus pandemic is challenging humanity in unprecedented ways. Large amounts of data on the virus, along with case reports, have been collected, and data scientists can contribute to the efforts to stop this pandemic by analyzing this data.

We would like to bring several such challenges and datasets to the attention of the university's data science community. The links are provided below:



1) The Israeli Ministry of Health Corona Research Database

The Israeli Ministry of Health has established a national infrastructure for research and policy on COVID-19 ("The Corona Research Database").

The Corona Research Database will gather information about COVID-19 collected from the Israeli health system, so that research can be conducted on de-identified, secured information.
Health Ministry staff are currently working to establish the repository and to collect and ingest data from the health maintenance organizations (HMOs) and hospitals so that research can start as soon as possible.

For more information see:


2) COVID-19 Open Research Dataset Challenge (CORD-19)

In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19). CORD-19 is a resource of over 44,000 scholarly articles, including over 29,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. This freely available dataset is provided to the global research community to apply recent advances in natural language processing and other AI techniques to generate new insights in support of the ongoing fight against this infectious disease.

Kaggle is issuing a call to action to the world's artificial intelligence experts to develop text and data mining tools that can help the medical community develop answers to high priority scientific questions. Kaggle is sponsoring a $1,000 per task award to the winner whose submission is identified as best meeting the evaluation criteria.
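As a rough illustration of the kind of text mining tool the challenge calls for, the sketch below ranks articles by how often the terms of a question appear in their title and abstract. It is a minimal starting point, not the method used by any challenge participant. The record keys `title` and `abstract`, the function names, and the sample records are all assumptions made up for this example; they are not taken from the CORD-19 files.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_articles(articles, query):
    """Return (score, title) pairs, highest score first.

    `articles` is a list of dicts with the hypothetical keys
    "title" and "abstract". The score is the number of times any
    query term occurs in the title and abstract combined.
    """
    query_terms = set(tokenize(query))
    scored = []
    for article in articles:
        counts = Counter(tokenize(article["title"] + " " + article["abstract"]))
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scored.append((score, article["title"]))
    # Highest score first; ties are broken alphabetically by title.
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Toy records invented for illustration; not real CORD-19 entries.
SAMPLE_ARTICLES = [
    {"title": "Incubation period estimates",
     "abstract": "We estimate the incubation period from case data."},
    {"title": "Protein structure of the spike",
     "abstract": "Structural analysis of the spike protein."},
    {"title": "Transmission in households",
     "abstract": "Household transmission and incubation of the virus."},
]

results = rank_articles(SAMPLE_ARTICLES, "incubation period")
for score, title in results:
    print(score, title)
```

Raw term counts favor long documents and ignore synonyms, so a real submission would likely replace the scoring step with TF-IDF weighting or a learned language model.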

For more information see:


2) The Dataset of Epidemiological Case Reports for COVID-19

COVID-19 is threatening the health of the entire human population. To control the spread of the disease, epidemiological investigations must be conducted to trace the infection source of each confirmed patient and to isolate their close contacts. However, analyzing large numbers of case reports in an epidemiological investigation is extremely time-consuming and labor-intensive.

Using the latest NLP technology to extract information from epidemiological case reports could greatly accelerate this process. IBM has created a dataset that contains case reports and has made it available for such analysis.
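To make the idea of extracting information from a case report concrete, here is a small rule-based sketch that pulls the patient's age, sex, and any ISO-formatted dates out of a free-text report. It uses regular expressions as a simple stand-in for the NLP models mentioned above. The report wording, the field names, and the `extract_case` function are assumptions invented for this example; the actual format of the IBM dataset is not described here.

```python
import re
from datetime import date

# Matches phrases such as "45-year-old male" (an assumed report style).
CASE_PATTERN = re.compile(
    r"(?P<age>\d{1,3})-year-old (?P<sex>male|female)", re.IGNORECASE
)
# Matches ISO dates such as 2020-02-03.
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_case(report):
    """Extract age, sex, and dates from one free-text case report.

    Fields that cannot be found are returned as None (or an empty
    list for dates) instead of raising an error.
    """
    match = CASE_PATTERN.search(report)
    dates = [
        date(int(year), int(month), int(day))
        for year, month, day in DATE_PATTERN.findall(report)
    ]
    return {
        "age": int(match.group("age")) if match else None,
        "sex": match.group("sex").lower() if match else None,
        "dates": dates,
    }


# Invented example text; not taken from any real case report.
SAMPLE_REPORT = (
    "Patient 12, a 45-year-old male, developed fever on 2020-02-03 "
    "and was confirmed positive on 2020-02-07."
)

info = extract_case(SAMPLE_REPORT)
print(info)
```

Hand-written patterns like these break as soon as the wording changes, which is the main reason trained NLP models are attractive for this task.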

For more information see:


3) AWS - A public data lake for analysis of COVID-19 data:

Amazon Web Services (AWS) is making a public AWS COVID-19 data lake available: a centralized repository of up-to-date and curated datasets on or related to the spread and characteristics of the novel coronavirus (SARS-CoV-2) and its associated illness, COVID-19. Several efforts are underway globally to gather this data, and AWS is working with partners to make this crucial data freely available and keep it up to date. The data lake is hosted on the AWS cloud and has been seeded with COVID-19 case tracking data from Johns Hopkins and The New York Times, hospital bed availability from Definitive Healthcare, and over 45,000 research articles about COVID-19 and related coronaviruses from the Allen Institute for AI. AWS will regularly add to the data lake as other reliable sources make their data publicly available.

For more information see:


4) New NIH Resource to Analyze COVID-19 Literature: The COVID-19 Portfolio Tool 
The National Library of Medicine at NIH joined the White House and key industry and university leaders to release the COVID-19 Open Research Dataset (CORD-19) and to call on the AI community to develop text mining tools that help analyze and summarize the more than 45,000 coronavirus articles. The CORD-19 dataset represents the most comprehensive, freely available library of machine-readable coronavirus scholarly literature to date, with hundreds of AI tools and technologies already created.

Building on this effort, the NIH Office of Portfolio Analysis (OPA) has assembled a comprehensive listing of COVID-19 publications and preprints that is freely available to the public and coupled with a user-friendly portfolio analysis interface for querying the full text and supplemental data. The COVID-19 portfolio is updated daily with new literature selected for inclusion by subject matter experts. It draws upon NLM’s PubMed resource for citations and abstracts of published biomedical literature.

5) EU data portal launches to support COVID-19 research

The European Commission has launched a data portal for scientists studying the SARS-CoV-2 virus to speed up access to data sets and tools in order to bolster research efforts by encouraging data reuse and open science.

The COVID-19 Data Portal is intended to accelerate regional efforts to combat the virus by creating a central repository for storing and sharing available research data, such as DNA sequences, protein structures, data from pre-clinical research and clinical trials and epidemiological data.

For more information see:

Important papers regarding epidemiology of Corona

a) Imperial College COVID-19 Response Team. Impact of non-pharmaceutical interventions to reduce COVID-19 mortality and healthcare demand

b) Li et al. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2). Science, 16 March 2020



© 2019 DSRC. All Rights Reserved.

Contact Us

University of Haifa

Address: 199 Aba Khoushy Ave.
                 Mount Carmel, Haifa, Israel.

TEL: +972-542688302