Every day for almost a year, hundreds of COVID Tracking Project contributors from all walks of life have compiled, published, and interpreted vitally important COVID-19 data as a service to their fellow Americans. On March 7, the one-year anniversary of our founding, we will release our final daily update and our data compilation will stop. Documentation, analysis, and archival work will continue for another two months, and we will bring the project to a close in May.
The seeds of this choice have been with us from the beginning. From its inception, this project was both unlikely and unprecedented: No one expected a volunteer pop-up collective to publish and interpret public health data for the United States for the first year of a global pandemic. We began the work out of necessity and planned to do it for a couple of weeks at most, always in the expectation that the federal public health establishment would make our work obsolete. Every few months through the course of the project, we asked ourselves whether it was possible to wind down. Instead, we saw the federal government continue to publish patchy and often ill-defined data while our world-famous public health agencies remained sidelined and underfunded, their leadership seemingly inert.
That we were able to carry the data through a full year is a testament to the generosity of the foundations and firms that gave us the resources we needed, to the counsel of our advisory board, to The Atlantic’s support for our highly unusual organization, and above all to the devotion of our contributors. But the work itself (compiling, cleaning, standardizing, and making sense of COVID-19 data from 56 individual states and territories) is properly the work of federal public health agencies, not only because these efforts are a governmental responsibility, which they are, but because federal teams have access to far more comprehensive data than we do, and can mandate compliance with at least some standards and requirements. We were able to build good working relationships with public health departments in states governed by both Republicans and Democrats, and these relationships helped bring much more data into public view. But ultimately, the best we could hope to do with unstandardized state data was to build a bridge over the data gaps, and the good news is that we believe we can now see the other side.
Although substantial gaps and complexities remain, we have seen persuasive evidence that the CDC and HHS are now both able and willing to take on the country’s massive deficits in public health data infrastructure, and to offer the best available data and science communication in the interim. Good signs so far include:
December 7: Release of high-quality, facility-level data about COVID-19 hospitalizations.
December 18: Publication of community profile reports that were generated for the decision-makers on the Coronavirus Task Force.
December 20: CDC Vaccine Tracker posted, including substantial state-level vaccine data and topline figures for long-term care facilities.
January 27: Publication of state-level COVID-19 reports that had previously only been sent to governors.
January 27: Reinstatement of regularly scheduled federal COVID-19 briefings, including data-centric pandemic updates from the new director of the CDC.
By bringing increased scrutiny to federal sources as part of the final phase of our work, we think we can also encourage public health agencies to publish more comprehensive COVID-19 data. And that has always been our goal: to be a force for reality by assembling data that better reflects how many people are getting tested and how many are sick, hospitalized, and dying.
Our project contributors have poured thousands of hours of their lives into this crisis response—many of them for almost a year. They’ve borrowed time from other work, quit jobs, postponed graduate degrees, and missed time with their families. More than 400 people have contributed directly to our data compilation efforts, and some of our team members have done more than 300 shifts. It’s time to release these brilliant people back to their lives. We will never be able to thank them enough.
It’s hard to understand all the ways that our data—which started with a single spreadsheet—has been used in the world. Two different presidential administrations have cited it in strategic plans, academic and scientific researchers have used it in nearly 800 papers, and it has helped ground media coverage of the pandemic by national, international, and local news outlets. It’s been read into the Congressional record, mentioned in proposed legislation, used repeatedly by lawmakers to demand answers about the pandemic, and cited in numerous federal lawsuits. Amid so much institutional failure, it has been a sustaining force to see regular people all over the country patch this vital data together every day, united in their commitment to their fellow humans.
Even after we’ve posted our final daily update, our absence won’t leave Americans without outside checks on federal public health data. We have been one point in a constellation of extra-governmental projects to track and analyze data about the pandemic, including the work of the New York Times, Johns Hopkins University, USA Facts, APM Research Lab’s Color of Coronavirus team, the Marshall Project, the Los Angeles Times, Kaiser Family Foundation, The University of Minnesota’s COVID-19 Hospitalization Tracking Project, and Bloomberg, along with many academic research projects and local outlets.
We are extraordinarily grateful to everyone who has worked on the project or used the datasets, visualizations, and analysis we’ve assembled. While we know the pandemic will end one day, it is still far from over. We won’t forget the many people we’ve lost this year, or the people whose suffering has not been represented in the official record. And we’re so grateful to have had this meaningful work during a difficult moment in human history. The COVID Tracking Project might never have been “a paradise built in hell,” as Rebecca Solnit described the communities that can arise after a disaster, but it has been a source of light for many of us in an otherwise dark time.
We will spend the next five weeks publishing detailed guidance for data users who are looking for sources to replace our work, as well as gap analyses noting where replacement data still doesn’t exist. We’ll also be announcing our full archive plans, but the short version is that we’re in the process of finding a home for all of our work (including a lot of metadata and culture documentation that has never been public) and we’re making sure that it comes as close to lasting forever as anything can on the internet. Until March 7, we will still be posting daily updates and weekly analyses, along with our usual deep dives on data definitions and pipelines. Keep an eye on our feeds for much more over the coming weeks and months.