Since its formation in 1946, the CDC has been the nation’s cornerstone for disease prevention and health promotion. As a federal agency within the Department of Health and Human Services (HHS), its primary role is to protect the United States from threats to public health. US public health professionals look to the CDC for scientific leadership, expertise, and guidance. For decades, the CDC has coordinated efforts across states and standardized epidemiological data and methods, giving us a nationwide snapshot of new diseases as they emerge.
In the case of COVID-19, it took more than 15 weeks from the first reported case in the US for the CDC to release its COVID-19 Data Tracker. The 56 different datasets produced by US states and territories demonstrate the problem of reporting this data without national guidance. Case counts, completed tests, and death tolls are reported inconsistently, in ways that make it very difficult to compile an accurate national picture of the pandemic.
The launch of the CDC’s new COVID Data Tracker is a major step. Ideally, disease modelers, researchers, and public health authorities would be working from the same data. The general public, too, should be able to trust that there is one set of reliable numbers. Unfortunately, the new data from the CDC doesn’t get us all the way there.
Five days after the launch of the CDC Data Tracker, The COVID Tracking Project at The Atlantic released a detailed evaluation of the new CDC data. In the paper, we compare the CDC’s COVID-19 data with the corresponding data publicly reported by the states and the District of Columbia. For many states, the testing numbers from the CDC and the testing numbers we compile from official state sources paint different pictures of the current state of testing.
We understand how complicated this data is—we’ve been gathering and analyzing it ourselves for months. Some differences in the state and CDC datasets are to be expected, and in over half the states, the testing numbers fall within 10% of one another. Other discrepancies are too large to ignore: in 13 states, the testing numbers differ by over 25%.
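To make the 10% and 25% thresholds concrete, here is a minimal sketch of how a gap between a state-reported testing total and the corresponding CDC total could be expressed as a percentage and sorted into those bands. It is an illustration only, not the method used in our report: the function names are hypothetical, the choice to measure the gap relative to the state-reported total is an assumption, and the numbers in the example are made up.

```python
def percent_difference(state_total: int, cdc_total: int) -> float:
    """Absolute gap between the two totals, as a percentage of the
    state-reported total (the choice of denominator is an assumption)."""
    if state_total <= 0:
        raise ValueError("state_total must be positive")
    return abs(cdc_total - state_total) * 100 / state_total


def classify(state_total: int, cdc_total: int) -> str:
    """Sort a pair of totals into the bands discussed in the text."""
    diff = percent_difference(state_total, cdc_total)
    if diff <= 10:
        return "within 10%"
    if diff > 25:
        return "over 25%"
    return "between 10% and 25%"


# Made-up numbers for illustration only; not real state or CDC figures.
examples = {
    "State A": (100_000, 95_000),
    "State B": (100_000, 130_000),
    "State C": (100_000, 82_000),
}

for name, (state_total, cdc_total) in examples.items():
    print(name, classify(state_total, cdc_total))
```

Measuring against the CDC total instead, or against the average of the two, would shift some states across a threshold, so any comparison of this kind should state which denominator it uses.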
The CDC is uniquely positioned to unify and reconcile the many inconsistent datasets from the states and territories. Dozens of our volunteers, most of whom have already been working on this data for weeks or months, worked through nights and weekends to do this analysis. We hope our work will help state and federal agencies understand and close the gaps we’ve identified. Once that happens, we can turn the greater part of our attention to other areas where we have insufficient data about this pandemic. Until then, we’ll be here every day to bring you the numbers.
___
Download: Assessment of the CDC’s New COVID-19 Data Reporting, v1.0 (May 18, 2020)
All the data we used for analysis is publicly available on our GitHub repo. This report is licensed CC-BY 4.0. Please attribute it to “The COVID Tracking Project at The Atlantic.” You can contact us anytime at https://covidtracking.com/contact.
Jessica Malaty Rivera has an MS in Emerging Infectious Diseases and is the Science Communication Lead at The COVID Tracking Project.
Amber Wojcek is a B2B marketing director in Sanford, Florida, and a contributor to The COVID Tracking Project.