The COVID Tracking Project’s data log contains structured notes on all the data we collected in our API: COVID-19 cases, tests, hospitalizations, recoveries, and deaths sourced from state and territorial health department websites between March 7, 2020, and March 7, 2021. You can read more about the log and the work it supported at The COVID Tracking Project in this post.
What’s in the data log
The log records both notable events in state data and our own historical revisions to COVID-19 data in our API. (For details on how we interpreted metrics and assigned them to fields in our API, you may find our Data Annotations useful.)
On the state side, we tracked the following occurrences:
- State-provided data notes about metrics we tracked, like changes in methodology or skipped updates. We made an effort to gather every state data note we could, but may have missed some.
- Unexplained data anomalies like drops in cumulative metrics or unexpected jumps we suspected were artifacts.
- New metrics provided by a state or disappearing metrics from a state.
- Missing data for a state, or a specific metric within a state, either because the state did not update its data or updated its data after our publish time. Whenever possible, we went back and patched missed updates.
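To make the "drops in cumulative metrics" category concrete, here is a minimal Python sketch of how such a drop could be flagged in a daily series. It is an illustration only, not the method The COVID Tracking Project used; the function name and the input format (date and value pairs, with `None` for days with no update) are assumptions made for this example.

```python
def find_cumulative_drops(series):
    """Return (day, previous_value, current_value) for each day on which a
    cumulative metric is lower than its most recent reported value.

    `series` is an iterable of (day, value) pairs in chronological order.
    A value of None stands for a day with no update and is skipped.
    """
    drops = []
    previous = None
    for day, value in series:
        if value is None:
            # Missing update: nothing to compare, keep the last known value.
            continue
        if previous is not None and value < previous:
            # A cumulative count should never decrease, so flag this day.
            drops.append((day, previous, value))
        previous = value
    return drops
```

A flagged day is only a candidate for review: the drop may reflect a state's own revision or a change in methodology rather than an error.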
We also tracked the following changes we made to our own data:
- Historical revisions, including both data entry error corrections and backfills of missing historical data.
- Changes to our interpretation of a data point from a state that resulted in moving a metric to another API field.
- Switches in API logic, such as changing a state’s totalTestResults source.
We started regularly updating the log on July 22, 2020. Before then, we did not track state data anomalies in a structured format; however, the “Data Sources and Notes” section for each state on our Data page contains basic notes on earlier anomalies, though they are less granular than what we include in the log. For changes on The COVID Tracking Project side, our GitHub Issues repository contains a thorough record of any changes that we made to the data. You can also access a versioned history of our data in GitHub.
As part of our data archiving process, we are porting data notes from our website, Slack, and GitHub into this log so that our data archive contains a centralized and thorough record of state data anomalies and our data decision-making. We are also making limited changes to the data to address any outstanding data issues. This log will update live as we add new entries.
How to use the log
Our data log is stored in Airtable, a simple database system that functions similarly to a spreadsheet. Below is an embedded version of the log. Descriptions of each field are accessible by hovering over the “i” button next to the column name. Using the “Filter” option, you can view log entries affecting a particular state, data field, or range of dates. If you’d like to download the log, you can click the “Download CSV” button in the lower right corner of the embed.
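If you download the CSV, the same kind of filtering can be done offline. The following is a minimal Python sketch, not an official tool: the column names `State`, `Date`, and `Field` and the ISO date format are assumptions made for illustration, so check the header row of your own export and adjust the constants to match.

```python
import csv
from datetime import date

# Hypothetical column names; replace with the headers in your CSV export.
STATE_COL = "State"
DATE_COL = "Date"
FIELD_COL = "Field"


def load_log(csv_lines):
    """Read log entries from an open CSV file (or any iterable of lines)."""
    return list(csv.DictReader(csv_lines))


def filter_log(rows, state=None, field=None, start=None, end=None):
    """Keep entries matching a state, a data field, and/or a date range.

    Dates in the file are assumed to be ISO formatted (YYYY-MM-DD).
    `start` and `end` are inclusive datetime.date values.
    """
    selected = []
    for row in rows:
        if state is not None and row[STATE_COL] != state:
            continue
        # Substring match, in case one entry lists several affected fields.
        if field is not None and field not in row[FIELD_COL]:
            continue
        entry_date = date.fromisoformat(row[DATE_COL])
        if start is not None and entry_date < start:
            continue
        if end is not None and entry_date > end:
            continue
        selected.append(row)
    return selected
```

For example, after opening the downloaded file with `open(path, newline="")` and passing it to `load_log`, a call such as `filter_log(rows, state="CA", field="totalTestResults")` would return the entries for one state that mention that API field, assuming the column names above match your file.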