
The COVID Tracking Project exists because every person, newsroom, and government agency in the United States deserves access to the most complete picture of COVID-19 testing data that can be assembled.

Minimal testing, inadequate reporting

In the early stages of the COVID-19 pandemic, some wealthy countries tested widely, which allowed them to successfully pursue a containment strategy. Others, including the US, have been much, much slower to implement mass testing. As has been extensively documented elsewhere, the US testing effort started very late, rolled out slowly and unevenly, and has yet to scale widely in most parts of the country.

In addition to this failure to test early or scale up quickly, central authorities have elected not to publish complete testing data. The CDC publishes a case count, which tracks COVID-19 cases confirmed by testing, but it lags significantly behind other sources, such as the Johns Hopkins University tracker, which has been the gold standard for US case counts throughout the outbreak. And although the CDC does offer a national-level count of “specimens tested,” this data is incomplete and lagging. It also uses a different unit for total tests (specimens) than for positive results (people). This makes it impossible to accurately match testing totals with positive results, and therefore to build a complete picture of COVID-19 testing, even at the national level.

Case counts alone don’t tell the story

The problem with relying on case counts is that a simple count of identified COVID-19 cases doesn’t show the true location or comparative severity of outbreaks. Simple case counts show where people are being tested, not where people are sick. To illustrate the point: a state that reports 3 cases of COVID-19 after testing 2,000 people is probably at a very different stage of its outbreak than a state that reports 3 cases after testing only 20 people. If all you have is a case count, though, those two states look exactly the same.
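The difference between those two states comes down to a simple ratio: the share of people tested who tested positive. The Python sketch below is illustrative only. It assumes both the positive count and the testing total are counted in people, which is exactly the consistency the CDC figures lack. The function name `percent_positive` is hypothetical and not part of any project tooling.

```python
def percent_positive(positive_cases: int, people_tested: int) -> float:
    """Return the share of tested people who tested positive, as a percentage.

    Assumes both arguments count people (not specimens), so the ratio is meaningful.
    """
    if people_tested <= 0:
        raise ValueError("people_tested must be positive")
    return 100 * positive_cases / people_tested


# The two hypothetical states from the example: same case count,
# very different amounts of testing.
state_a = percent_positive(3, 2_000)
state_b = percent_positive(3, 20)
print(f"State A: {state_a:.2f}% positive")  # State A: 0.15% positive
print(f"State B: {state_b:.2f}% positive")  # State B: 15.00% positive
```

Both states report 3 cases, but one has a positivity of 0.15% and the other 15%. That hundredfold gap is invisible in a case count alone.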

Understanding the shape, speed, and location of regional outbreaks requires the entire testing picture: how many people have actually been tested in each state/territory, when they were tested, and what their results were. That’s where our data comes in.

What it takes to get the data

Because we have no complete official account of COVID-19 testing data in the US, we have to get this data from the public health authority in each US state and territory (and the District of Columbia). Each of these authorities reports its data in its own way, including online dashboards, data tables, PDFs, press conferences, tweets, and Facebook posts. And while many states and territories have slowly moved toward more standard ways of reporting, the actual categories of information are still in flux.

Our data team uses website scrapers and trackers to alert us to changes, but the actual updates to our dataset are done manually by careful humans who double-check each change and extensively annotate areas of ambiguity. The work of gathering data from official sources is now also supplemented by a fast-growing group of reporters who are constantly pushing authorities to release more comprehensive information.
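The split between automated alerts and manual updates can be illustrated with a simple change detector. The Python sketch below is hypothetical and is not the project’s actual scraper. It assumes each source’s page text has already been fetched, fingerprints it with SHA-256, and reports only whether the page changed since the last check. Deciding what the change means is left to a person. The names `fingerprint`, `check_source`, and `example-state-dashboard` are illustrative.

```python
import hashlib
from typing import Dict, Optional


def fingerprint(page_text: str) -> str:
    """Return a SHA-256 hex digest of a page's text content."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def check_source(snapshots: Dict[str, str], source: str, page_text: str) -> bool:
    """Record the latest fingerprint for `source` and report whether it changed.

    A change only flags the source for human review; nothing here
    updates the dataset itself.
    """
    new = fingerprint(page_text)
    old: Optional[str] = snapshots.get(source)
    snapshots[source] = new
    return old != new


snapshots: Dict[str, str] = {}
print(check_source(snapshots, "example-state-dashboard", "Tested: 2,000"))  # True (first snapshot)
print(check_source(snapshots, "example-state-dashboard", "Tested: 2,000"))  # False
print(check_source(snapshots, "example-state-dashboard", "Tested: 2,150"))  # True
```

Hashing raw page text flags every edit, including trivial ones like an updated timestamp. That is one reason a flag should prompt a human check rather than an automatic data update.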

We’re in for the duration

When we started the project, building on two independently created reporting spreadsheets, we expected to be updating the data for a few days or maybe a week until complete federal data emerged. It never did, so we’re still here.

We also recognize that part of our work is the creation and maintenance of a historical record of the US government’s response. As the pandemic imprints itself on the country, we are building an accurate record of what actually happened, day by day, state by state. Until that gets done elsewhere, we’ll keep doing the counts.