The COVID Tracking Project exists because every person, newsroom, and government agency in the United States deserves access to the most complete picture of COVID-19 testing data that can be assembled.
In the early stages of the COVID-19 pandemic, some wealthy countries tested widely, which allowed them to pursue containment successfully. Others, including the US, have been much, much slower to implement mass testing. As has been extensively documented elsewhere, the US testing effort started very late, rolled out slowly and unevenly, and has yet to scale widely in most parts of the country.
In addition to this failure to test early or scale up quickly, central authorities have elected not to publish complete testing data. From March through mid-May, the CDC published a count of lab-confirmed COVID-19 cases, though it significantly lagged behind other sources of this data, such as the Johns Hopkins University tracker, which has been the gold standard for US case counts throughout the outbreak.
Note: On or about May 9, 2020, the CDC began publishing case counts, deaths, and basic testing data in a new dashboard. We are currently evaluating this data, which mostly matches official state public health reporting but also shows several significant gaps.
The problem with relying on case counts is that a simple count of identified COVID-19 cases doesn’t show the true location or comparative severity of outbreaks. Simple case counts show where people are being tested, not where people are sick. To illustrate the point, a state that reports 3 cases of COVID-19 after testing 2,000 people is probably in a very different stage of its outbreak than a state that reports 3 cases but has only tested 20 people—but if all you have is a case count, those states look exactly the same.
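The arithmetic behind that comparison can be sketched in a few lines. This is only an illustration using the two hypothetical states from the paragraph above; the function name `positivity_rate` and the choice to summarize the comparison as a share of positive results are our assumptions for the example, not a description of how the project processes its data.

```python
def positivity_rate(positive_cases: int, people_tested: int) -> float:
    """Return the share of tested people who tested positive (0.0 to 1.0)."""
    if people_tested <= 0:
        raise ValueError("people_tested must be greater than zero")
    return positive_cases / people_tested


# Two hypothetical states with the same case count but very different testing.
state_a = positivity_rate(positive_cases=3, people_tested=2000)
state_b = positivity_rate(positive_cases=3, people_tested=20)

print(f"State A: {state_a:.2%} of people tested were positive")  # 0.15%
print(f"State B: {state_b:.2%} of people tested were positive")  # 15.00%
```

Both states report 3 cases, but once the number of people tested is included, the second state's share of positive results is 100 times higher than the first's.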
Understanding the shape, speed, and location of regional outbreaks requires the entire testing picture: how many people have actually been tested in each state/territory, when they were tested, and what their results were. That’s where our data comes in.
Because we have not had a complete official account of COVID-19 testing data in the US, we collect the data directly from the public health authority in each US state and territory (and the District of Columbia). Each of these authorities reports its data in its own way, through some mix of online dashboards, data tables, PDFs, press conferences, tweets, and Facebook posts. And while many states and territories have slowly moved toward more standard ways of reporting, the actual categories of information are still in flux.
Our data team uses website scrapers and trackers to alert us to changes, but the actual updates to our dataset are done manually by careful humans who double-check each change and extensively annotate areas of ambiguity. The work of gathering data from official sources is also now supplemented by a fast-growing group of reporters who are constantly pushing authorities to release more comprehensive information.
When we started the project, building on two independently created reporting spreadsheets, we expected to be updating the data for a few days or maybe a week until complete federal data emerged. It never did, so we’re still here.
We also recognize that part of our work is the creation and maintenance of a historical record of the US government’s response. As the pandemic imprints itself on the country, we are building an accurate record of what actually happened, day by day, state by state. Until that gets done elsewhere, we’ll keep doing the counts.