Graphing the pandemic with open data
Visualize
A lot of COVID-19 data is available through online REST APIs. With a little ingenuity and some open source tools, you can extract and analyze the data yourself.
Travel is broadening. You experience different cultures, see issues from different angles, and meet fun and unique people. I love living abroad, but I have to admit that I have less access to the news from home. Unfortunately, news outlets today spend more time on opinion than facts, and sometimes I just want the unvarnished truth. During the pandemic era, I am especially anxious to learn about the challenges faced by my family back home.
The good news is that a lot of open data on COVID-19 is available on the Internet via REST API calls. This data might be too dry for some, but if you want to get your own impressions of the COVID-19 crisis, without the sometimes intrusive "analysis" of newscasters and commentators, this free Internet data is a valuable resource. This article describes how to access and display freely available COVID-19 data using open source tools. And, if you've already had your fill of COVID-19 information, the techniques I'll describe in this article will also help you with other kinds of government and academic data available through REST APIs.
CovidAPI
The CovidAPI project [1] provides COVID-19 data based on the well-respected Johns Hopkins University dataset [2]. The original Johns Hopkins data is available in CSV form. Argentine software developer Rodrigo Pomba converted the data to JSON time series format. According to the documentation [3], the goal of the project is to make the data "queryable in a manner in which it could be easily consumed to build public dashboards."
The CovidAPI data is organized by country using the three-letter ISO country codes [4]. Use the curl command in a terminal window to request the data for a specific country and date:
curl https://covidapi.info/api/v1/country/USA/2020-06-15
Calling this API (in this case with curl) returns the following JSON object:
{
  "count": 1,
  "result": {
    "2020-06-15": {
      "confirmed": 2114026,
      "deaths": 116127,
      "recovered": 576334
    }
  }
}
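Because the date is a JSON key, it contains dashes, and a key like that has to be quoted in a jq path (for example, .result["2020-06-15"]). The sketch below shows one way to pull the three values out of a single-day response by passing the date in as a jq variable. The function name day_values is hypothetical, and the sketch assumes jq is installed, as Listing 1 already does.

```shell
#!/usr/bin/env bash
# Sketch: print "confirmed deaths recovered" for one day of a
# covidapi.info-style response read from stdin.
# day_values is a hypothetical helper, not part of Listing 1. Requires jq.

day_values() {
    local day=$1
    # A key such as "2020-06-15" must be quoted in a jq path; passing it
    # in with --arg and looking it up with .result[$d] avoids that issue.
    jq -r --arg d "$day" \
        '.result[$d] | "\(.confirmed) \(.deaths) \(.recovered)"'
}

# The response shown above. In practice you might pipe
#   curl -s https://covidapi.info/api/v1/country/USA/2020-06-15
# into day_values instead.
response='{ "count": 1, "result": { "2020-06-15": { "confirmed": 2114026, "deaths": 116127, "recovered": 576334 } } }'
day_values 2020-06-15 <<< "$response"
```

With the sample response, this prints 2114026 116127 576334, the same column order Listing 1 writes to its country data files.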
This command is an easy way to get the daily report for a specific country and date, but if you want to visualize and analyze the data yourself, you might prefer to request the values for all dates. If you leave off the date, you'll get the data for all available dates:
curl https://covidapi.info/api/v1/country/USA
This command returns one giant JSON message containing the records for every day in the dataset. However, I ran into problems parsing out the individual days due to the dashes that were part of the date. As an alternative approach, I chose to write a small Bash script that fetches the number of day records and then iterates through the list of days to retrieve the COVID-19 information for each day (Listing 1). Most of the steps are self-explanatory if you are familiar with Bash scripts, but see the comment lines for additional information.
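If you would rather work with the giant all-dates message, jq's to_entries filter can sidestep the dash problem, because it turns each date key into a plain {key, value} pair. The sketch below assumes (not confirmed here) that the all-dates response has the same shape as the single-day one, with one date key per day under "result". The function name series_to_rows is hypothetical, and the sample numbers are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: convert a whole covidapi.info-style time series into
# "date confirm deaths recover" rows with a single jq call.
# ASSUMPTION: the all-dates response looks like
#   {"count": N, "result": {"YYYY-MM-DD": {...}, ...}}
# series_to_rows is a hypothetical helper. Requires jq.

series_to_rows() {
    # to_entries turns every date key into a {key, value} pair, so the
    # dashes never appear in a jq path; sort_by(.key) puts the ISO dates
    # in chronological order.
    jq -r '.result | to_entries | sort_by(.key) | .[]
           | "\(.key) \(.value.confirmed) \(.value.deaths) \(.value.recovered)"'
}

# Illustrative input in the assumed format (made-up numbers, not real data).
sample='{"count": 2, "result": {
  "2020-03-02": {"confirmed": 15, "deaths": 0, "recovered": 2},
  "2020-03-01": {"confirmed": 10, "deaths": 0, "recovered": 1}}}'

series_to_rows <<< "$sample"
```

If the assumption about the response format holds, something like curl -s https://covidapi.info/api/v1/country/USA | series_to_rows would produce all rows at once. Unlike Listing 1, this sketch does not skip dates that are already in the data file.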
Listing 1
covid19.sh
#!/bin/bash

get_count()
{
    GATHERCOUNTRY=$1

    # get the count of days since the start
    CNT=`curl https://covidapi.info/api/v1/country/$GATHERCOUNTRY 2>/dev/null | jq '.count'`
    echo $CNT
}

gather_state()
{
    GATHERSTATE=$1
    cnt=$2

    echo gather state $GATHERSTATE $cnt days
    DATAFILE=covid19_${GATHERSTATE}.Data

    # from beginning until yesterday
    IDX=$cnt

    # absolute values, followed by daily delta
    if [ ! -f "$DATAFILE" ]
    then
        echo "date positive hospitalized deaths " > "$DATAFILE"
    fi

    while [ "$IDX" -gt 0 ]
    do
        #DATE=`date --date="$IDX days ago" +%Y%m%d`
        DATE=`date --date="12:00 today -$IDX days" +%Y%m%d`
        FILEDATE=`date --date="12:00 today -$IDX days" +%Y-%m-%d`

        CMD="curl https://api.covidtracking.com/v1/states/${GATHERSTATE}/${DATE}.json"

        # only fetch dates that are not yet in the data file
        if ! grep -q "$FILEDATE" "$DATAFILE"
        then
            SINGLE=`$CMD 2>/dev/null`
            error=`echo "$SINGLE" | jq ".error"`
            if [ "$error" == "true" ]
            then
                # nothing to output
                # echo oops looks bad $DATE

                positive=0
                hospitalized=0
                deaths=0
            else
                positive=`echo "$SINGLE" | jq ".positive"`
                deaths=`echo "$SINGLE" | jq ".death"`
                hospitalized=`echo "$SINGLE" | jq ".hospitalizedCurrently"`

                if [ "$positive" == "null" ]; then positive=0; fi
                if [ "$deaths" == "null" ]; then deaths=0; fi
                if [ "$hospitalized" == "null" ]; then hospitalized=0; fi
                echo $DATE $IDX
            fi
            echo "$FILEDATE $positive $hospitalized $deaths " >> "$DATAFILE"

        #else
        #    echo not doing $FILEDATE
        fi

        IDX=$((IDX - 1))
    done
}

gather_data()
{
    GATHERCOUNTRY=$1
    cnt=$2

    echo gather $GATHERCOUNTRY
    DATAFILE=covid19_${GATHERCOUNTRY}.data

    # absolute values, followed by daily delta
    if [ ! -f "$DATAFILE" ]
    then
        echo initializing
        echo "date confirm deaths recover " > "$DATAFILE"
    fi

    # from beginning until yesterday
    IDX=$cnt

    deltadeaths=0
    deltaconfirm=0
    deltarecover=0

    while [ "$IDX" -gt 0 ]
    do
        #DATE=`date --date="$IDX days ago" +%Y-%m-%d`
        DATE=`date --date="12:00 today -$IDX days" +%Y-%m-%d`

        CMD="curl https://covidapi.info/api/v1/country/${GATHERCOUNTRY}/${DATE}"

        #
        # we only do this if this date hasn't been retrieved
        #
        if ! grep -q "$DATE" "$DATAFILE"
        then
            SINGLE=`$CMD 2>/dev/null`
            ERR=`echo "$SINGLE" | grep -c "404 Not Found"`

            #
            # only if date found
            #
            if [ "$ERR" -eq 0 ]
            then
                deaths=`echo "$SINGLE" | jq '.' | grep deaths | sed 's/.*: //' | sed 's/,//'`
                confirm=`echo "$SINGLE" | jq '.' | grep confirm | sed 's/.*: //' | sed 's/,//'`
                recover=`echo "$SINGLE" | jq '.' | grep recover | sed 's/.*: //' | sed 's/,//'`

                echo $DATE $IDX
                echo "$DATE $confirm $deaths $recover " >> "$DATAFILE"
            #else
            #    echo not doing $DATE
            fi
        fi

        IDX=$((IDX - 1))
    done
}

CNT=`get_count USA`
echo $CNT days
gather_data USA $CNT

# just use the two-letter state code (e.g., ny for New York)
gather_state mn $CNT
gather_state ca $CNT
gather_state ia $CNT
gather_state mo $CNT
gather_state mt $CNT

CNT=`get_count DEU`
gather_data DEU $CNT

CNT=`get_count ESP`
gather_data ESP $CNT

CNT=`get_count GBR`
gather_data GBR $CNT

gnuplot graphs.gp
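The gather_data function detects a missing day by searching the response body for the text "404 Not Found". A sturdier alternative is to let curl report HTTP errors through its exit status with the --fail option. The sketch below wraps that in a hypothetical helper named fetch_json; it is not part of the original script.

```shell
#!/usr/bin/env bash
# Sketch: fetch a URL and signal failure through the exit status rather
# than by grepping the body for "404 Not Found".
# fetch_json is a hypothetical helper, not part of Listing 1.

fetch_json() {
    local url=$1
    # --fail:        exit non-zero on HTTP errors (400 and above)
    # --silent:      hide the progress meter
    # --show-error:  still print real errors to stderr
    curl --fail --silent --show-error "$url"
}

# Possible use inside the loop of gather_data:
#   if SINGLE=$(fetch_json "https://covidapi.info/api/v1/country/USA/$DATE" 2>/dev/null)
#   then
#       ...parse $SINGLE and append it to the data file...
#   fi
```

Checking the exit status also catches network failures (DNS errors, timeouts), which the string search would treat as a successful response.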
One part of the script that might not be obvious is how I calculate the date.
DATE=`date --date="12:00 today -$IDX days" +%Y-%m-%d`
The date command subtracts a given number of days from the current date and formats the output as a YYYY-MM-DD string.
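The script needs the same date in two formats: YYYY-MM-DD for covidapi.info and YYYYMMDD for the COVID Tracking Project. The sketch below wraps the date expression in a hypothetical helper named days_back that takes the format as a parameter. Anchoring the calculation at "12:00 today" rather than at the current time is a common way to keep a one-hour daylight saving time shift from moving the result onto a neighboring day. The --date option is specific to GNU date.

```shell
#!/usr/bin/env bash
# Sketch: date N days ago in a caller-chosen format.
# days_back is a hypothetical helper. Requires GNU date (--date).

days_back() {
    local n=$1 fmt=$2
    # Start from noon today, then go back n days; noon leaves a margin of
    # several hours on either side of midnight for DST transitions.
    date --date="12:00 today -$n days" "+$fmt"
}

days_back 1 %Y-%m-%d   # yesterday in the covidapi.info style
days_back 1 %Y%m%d     # yesterday in the api.covidtracking.com style
```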
Of course, it would be inefficient to download hundreds of days' worth of data each time if I just want yesterday's data. Because of this, the script checks whether a date is already in the data file before making the REST API call to retrieve it. The first time you run the script, you get all the data; on subsequent runs, you only get the new data.
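The "is this date already here?" check can be isolated in a small function, as sketched below. The helper name have_date is hypothetical; it assumes data files in which every data line starts with the date followed by a space, as Listing 1 writes them. Anchoring the pattern at the start of the line keeps a date from matching inside some other column.

```shell
#!/usr/bin/env bash
# Sketch: succeed if DATE already has a row in FILE, fail otherwise
# (including when FILE does not exist yet, so the date gets fetched).
# have_date is a hypothetical helper, not part of Listing 1.

have_date() {
    local day=$1 file=$2
    # -q: report only through the exit status.
    # "^$day " matches the date only at the start of a line.
    [ -f "$file" ] && grep -q -- "^$day " "$file"
}

datafile=$(mktemp)
echo "date confirm deaths recover " > "$datafile"
echo "2020-06-15 2114026 116127 576334 " >> "$datafile"

have_date 2020-06-15 "$datafile" && echo "skip 2020-06-15"
have_date 2020-06-16 "$datafile" || echo "fetch 2020-06-16"
```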
Data by State
Retrieving COVID-19 figures for a whole country is useful for comparing one country against another, but it is less than helpful if you want to know what is really happening locally. The Covid Tracking Project [5] provides COVID-19 data by US state. (Similar projects track pandemic data for other countries – consult your local health resources.)
Just like at the national level, it is possible to retrieve all COVID-19 information by US state for a given date with REST API calls. For instance, to obtain data on the state of Minnesota for August 21, 2020:
curl https://api.covidtracking.com/v1/states/mn/20200821.json | jq "."
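This URL uses a lowercase state code and a date without dashes. If you already have an ISO date (as the national data uses), Bash parameter expansion can produce the state URL without a second call to date. The helper below, state_url, is hypothetical; it follows the URL pattern of the curl command above and needs Bash 4 or later for the ${var,,} lowercase expansion.

```shell
#!/usr/bin/env bash
# Sketch: build a COVID Tracking Project URL from a state code and an
# ISO date. state_url is a hypothetical helper. Requires bash 4+.

state_url() {
    local state=${1,,}   # lowercase the two-letter code: MN -> mn
    local day=${2//-/}   # remove all dashes: 2020-08-21 -> 20200821
    echo "https://api.covidtracking.com/v1/states/${state}/${day}.json"
}

state_url MN 2020-08-21
# https://api.covidtracking.com/v1/states/mn/20200821.json
```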
The state data, unlike the national data, contains an amazing number of statistics submitted by the health authorities. The sheer number of values provided can perhaps only truly be appreciated by an epidemiologist or a statistician.
You can see how many people were hospitalized on a given day or the total number of hospitalizations up until that day (Listing 2). Also included are the incremental changes in positive as well as negative test results. Using this information, I could have shown how quickly COVID-19 is spreading by graphing positiveCasesViral vs. totalTestsViral or by graphing hospitalizedCurrently over time.
Listing 2
Hospitalizations and Deaths by State
{
  "date": 20200415,
  "state": "MN",
  "positive": 2321,
  "negative": 41245,
  "pending": null,
  "hospitalizedCurrently": 197,
  "hospitalizedCumulative": 445,
  "inIcuCurrently": 93,
  "inIcuCumulative": 175,
  "onVentilatorCurrently": null,
  "onVentilatorCumulative": null,
  "recovered": 853,
  "dataQualityGrade": "A",
  "lastUpdateEt": "4/14/2020 17:00",
  "dateModified": "2020-04-14T17:00:00Z",
  "checkTimeEt": "04/14 13:00",
  "death": 87,
  "hospitalized": 445,
  "dateChecked": "2020-04-14T17:00:00Z",
  "totalTestsViral": 43566,
  "positiveTestsViral": null,
  "negativeTestsViral": null,
  "positiveCasesViral": null,
  "fips": "27",
  "positiveIncrease": 156,
  "negativeIncrease": 1540,
  "total": 43566,
  "totalTestResults": 43566,
  "totalTestResultsIncrease": 1696,
  "posNeg": 43566,
  "deathIncrease": 8,
  "hospitalizedIncrease": 40,
  "hash": "9521e0ce1f2b1ef5aaf1a81bec48961d85170d78",
  "commercialScore": 0,
  "negativeRegularScore": 0,
  "negativeScore": 0,
  "positiveScore": 0,
  "score": 0,
  "grade": ""
}
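In the record in Listing 2, positiveCasesViral is null, so a positive-vs.-total comparison would have to fall back on other fields. One option, sketched below, is the day's share of positive results computed from positiveIncrease and totalTestResultsIncrease. The choice of fields and the helper name positivity are assumptions for illustration. Days with no new test results print n/a instead of dividing by zero.

```shell
#!/usr/bin/env bash
# Sketch: daily test positivity in percent (one decimal place) for one
# state record read from stdin. Field choice is an assumption; the
# positivity helper is hypothetical. Requires jq (round is a jq builtin).

positivity() {
    jq -r 'if (.totalTestResultsIncrease // 0) > 0
           then ((.positiveIncrease // 0) / .totalTestResultsIncrease * 1000 | round) / 10
           else "n/a"
           end'
}

# Values taken from Listing 2: 156 new positives out of 1696 new results.
echo '{"positiveIncrease": 156, "totalTestResultsIncrease": 1696}' | positivity
```

For the Listing 2 values, 156 / 1696 is about 0.092, so the function prints 9.2. Writing one such value per day to a data file would give a curve that gnuplot can draw next to the hospitalization numbers.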
I settled on gathering positives, numbers of people hospitalized, and deaths at a state level. I didn't try to verify that all state totals added up at the national level, as I suspect there can be delays in the reporting chain from the local to the national level.
I can imagine the massive effort it took to agree on a common structure and to get all the participants to gather all of these types of data. Despite their efforts, the data returned sometimes contained fields that were blank, set to zero, or simply null.
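Listing 1 replaces null values with zero using three separate if statements. jq's alternative operator (//) can do the same thing inside a single jq call: a // b yields b when a is null or false (or missing, which jq reads as null). The helper state_row below is hypothetical, and the sample is a trimmed copy of Listing 2 with hospitalizedCurrently left out to show the fallback.

```shell
#!/usr/bin/env bash
# Sketch: print "positive hospitalized deaths" for one state record read
# from stdin, substituting 0 for null or missing fields.
# state_row is a hypothetical helper. Requires jq.

state_row() {
    # .field // 0 falls back to 0 when the field is null, false, or absent.
    jq -r '"\(.positive // 0) \(.hospitalizedCurrently // 0) \(.death // 0)"'
}

echo '{"positive": 2321, "death": 87}' | state_row
# 2321 0 87
```

Note that // does not catch blank strings (""); a field delivered that way would still need a separate check.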
Comparing Countries
My COVID-19 gathering script collects information for four countries (Great Britain, USA, Spain, and Germany), as well as statistics for a few US states. The data is temporarily stored in text files, one per country or state, but the information I am gathering looks essentially like Table 1.
Table 1
Sample of Downloaded Data
Country | Date | Confirmed | Deaths | Recovered |
---|---|---|---|---|
USA | 2/21/20 | 15 | 0 | 5 |
Germany | 2/21/20 | 16 | 0 | 14 |
England | 2/21/20 | 9 | 0 | 8 |
Spain | 2/21/20 | 2 | 0 | 2 |
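Cumulative totals like those in Table 1 only ever go up, so day-to-day changes are often easier to compare between countries. The gather_data function initializes deltaconfirm, deltadeaths, and deltarecover but never uses them. The sketch below computes new confirmed cases per day from an existing country data file instead, using awk. The helper name daily_new is hypothetical, and the sample rows are made-up numbers in the format gather_data writes (a header line, then rows oldest first).

```shell
#!/usr/bin/env bash
# Sketch: new confirmed cases per day from the cumulative "confirm"
# column of a country data file. daily_new is a hypothetical helper.

daily_new() {
    awk 'NR == 1 { next }                  # skip the header line
         NR > 2  { print $1, $2 - prev }   # date and change since the previous row
                 { prev = $2 }             # remember this row cumulative count
        ' "$1"
}

# Illustrative data file (made-up numbers, not real data).
datafile=$(mktemp)
printf '%s\n' "date confirm deaths recover " \
    "2020-03-01 10 0 1 " "2020-03-02 15 0 2 " "2020-03-03 25 1 2 " > "$datafile"

daily_new "$datafile"
# 2020-03-02 5
# 2020-03-03 10
```

The same approach works for the deaths column by using $3 in place of $2.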