The coronavirus pandemic is challenging humanity in unprecedented ways. Large amounts of data on the virus, as well as case reports, have been collected, and data scientists can contribute to the efforts to stop this pandemic by analyzing this data.
We would like to bring several such challenges and datasets to the attention of the data science community at the university. The links are provided below:
1) The Corona Research Database (Israeli Ministry of Health): The Israeli Ministry of Health has established a national infrastructure for research and policy on the coronavirus disease ("The Corona Research Database"). The database will gather information about COVID-19 collected from the Israeli health system, to be used for research on de-identified and secured information. For more information see: https://drive.google.com/open?id=1XDjO1oLPOUw8wy3CdYGP9ILV0p6bYp5C
2) COVID-19 Open Research Dataset Challenge (CORD-19): In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19). CORD-19 is a resource of over 44,000 scholarly articles, including over 29,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. This freely available dataset is provided to the global research community so that recent advances in natural language processing and other AI techniques can be applied to generate new insights in support of the ongoing fight against this infectious disease. Kaggle is issuing a call to action to the world's artificial intelligence experts to develop text and data mining tools that can help the medical community answer high-priority scientific questions. For each task, Kaggle is sponsoring a $1,000 award to the winner whose submission is identified as best meeting the evaluation criteria. For more information see: https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge?utm_medium=email&utm_source=intercom&utm_campaign=CORD-19-research-chal-email
3) The Dataset of Epidemiological Case Reports for COVID-19: COVID-19 is threatening the health of the entire human population. To control the spread of the disease, epidemiological investigations must be conducted to trace the infection source of each confirmed patient and isolate their close contacts. However, analyzing a large volume of case reports in an epidemiological investigation is extremely time-consuming and labor-intensive. Using the latest NLP technology to extract information from epidemiological case reports could greatly accelerate this process. IBM has created a dataset of case reports and made it available for such analysis. For more information see: https://github.com/IBM/Dataset-Epidemiologic-Investigation-COVID19
4) AWS – A public data lake for analysis of COVID-19 data: Amazon Web Services (AWS) is making a public AWS COVID-19 data lake available: a centralized repository of up-to-date, curated datasets on or related to the spread and characteristics of the novel coronavirus (SARS-CoV-2) and its associated illness, COVID-19. Several efforts are underway globally to gather this data, and AWS is working with partners to make it freely available and keep it up to date. The data lake is hosted on the AWS cloud and has been seeded with COVID-19 case tracking data from Johns Hopkins and The New York Times, hospital bed availability from Definitive Healthcare, and over 45,000 research articles about COVID-19 and related coronaviruses from the Allen Institute for AI. AWS will regularly add to the data lake as other reliable sources make their data publicly available. For more information see: https://aws.amazon.com/blogs/big-data/a-public-data-lake-for-analysis-of-covid-19-data/
5) New NIH Resource to Analyze COVID-19 Literature: The COVID-19 Portfolio Tool
6) EU data portal launches to support COVID-19 research: The European Commission has launched a data portal for scientists studying the SARS-CoV-2 virus. The portal aims to speed up access to data sets and tools and to bolster research efforts by encouraging data reuse and open science. The COVID-19 Data Portal is intended to accelerate regional efforts to combat the virus by creating a central repository for storing and sharing available research data, such as DNA sequences, protein structures, data from pre-clinical research and clinical trials, and epidemiological data. For more information see: https://techcrunch.com/2020/04/20/eu-data-portal-launches-to-support-covid-19-research/
Important papers on the epidemiology of the coronavirus:
a) Imperial College COVID-19 Response Team. Impact of non-pharmaceutical interventions to reduce COVID19 mortality and healthcare demand
b) Li et al. Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2). Science, 16 March 20