We started The COVID Tracking Project in March of 2020. Tens of thousands of data points, 115 blog posts, and more than 2,500 tweets later, we’re still trying to make sense of what happened in COVID data over the past year. We’re also trying to work out what happened inside of our project, and how a bunch of volunteers skilled up and stuck together for so long.
If there’s one thing we’ve learned about the many datasets we’ve wrestled with this year, it’s that all the data—every single point—is the result of human decision-making. Decisions about how to define metrics, what to collect, how to group and publish the data, and—on our side—how to label and interpret it. The same principle holds true for the way we worked together as a group of highly disparate mostly-strangers who self-organized to fill some of the gaps in the country’s public health data systems. Today is my last day at CTP, and for my final post as managing editor of the project, I wanted to look at the decisions that shaped both our visible outputs and the mostly invisible community of labor that made them possible.
We set limits
We founded the project to count tests, and soon found that we needed to capture other basic public health data that wasn’t being published by the federal government: most notably cases, hospitalizations, and deaths. These metrics quickly fragmented into categories like unique people tested, specimens tested, suspected COVID-19 hospitalizations, confirmed COVID-19 hospitalizations, and so on.
Several entities asked us to compile county-level data—and many of our volunteers were interested in doing that work. We elected not to, largely because it was clear that as a core group, we were overextended to just about the breaking point. This was especially true as we launched teams focused on compiling race and ethnicity data and data on COVID-19 in nursing homes and other long-term-care facilities. We even ran a metro-area data tracking team for a while in the summer, but the data was so inconsistent and our people so overextended that we wrapped it after several months.
Later on, many entities, including funders, assumed we would be tracking vaccine data. We chose not to do that, either—partly because we didn’t have the resources to take that on alongside everything else we were doing. But we also chose not to track vaccinations because we knew that at various points in the pandemic, state and federal agencies had used our data, and we didn’t want to provide any cover for federal agencies to be slow with vaccine data by compiling it ourselves.
And then this past winter, as more federal data began to emerge into public view, we decided to begin working toward a carefully calibrated shutdown of the project. Our organization was never built to replace federal public health data—or to become its own self-sustaining entity—and when the federal government began providing enough data to allow organizations to keep eyes on the pandemic, we immediately began the months-long process of bringing our many tracks of work to an orderly and well-documented conclusion.
We chose radical transparency
Going from building a single spreadsheet to compiling and analyzing national daily COVID numbers used by so many media outlets, government agencies, private organizations, and regular people was an enormous responsibility deeply felt across the project. It was clear from the very beginning of our work that despite our best efforts and constantly evolving processes, we were going to make mistakes: to misinterpret ill-described metrics, to make the wrong micro-decisions about which data source to prioritize on a given day as states slapped together data communication platforms, and to let mis-keyed figures slip through our layers of quality control. We had to be forthright about the mistakes we made, the un-fillable gaps in the data we could compile, and the compromises inherent in providing national estimates built from 56 differently shaped state and territorial datasets.
As we began doing more public-facing analysis of the data, we decided that we had to be extremely cautious with our interpretations. In keeping with our position as data-compilers—not epidemiologists, not modelers, and definitively not COVID pundits—we stuck to what could be known about and through the data. No matter how worried we were as individuals, we didn’t make predictions or try to spin the numbers to show something they couldn’t. We didn’t publish models and we didn’t predict surges (or reprieves) based on minor movements in the data. When we saw patterns that were likely to reappear, like data backlogs, holiday effects, or reporting lags, we explained what we’d seen before, showed our work, and noted where we might see those effects again.
Our research never uncovered evidence of widespread or routine manipulation of the data. Instead, we found that public data streams were limited by overtaxed infrastructure, exhausted public health workers, and politicians who were not eager to make their COVID data public to begin with. We reported on states’ failure to publish race and ethnicity data on COVID-19 cases and deaths. We flagged states that lumped antibody tests in with their PCR tests, inflating test counts and confusing even the CDC. Along with an army of local journalists, we pressed states to report more of the data they already had, and we tried to help those same states quickly publish more comprehensive statistics when our data was used by third parties to calculate test positivity rates that created unfair comparisons between jurisdictions.
Above all, we kept our focus on the knowable facts and told the truth about what we found.
We produced the most complete and comprehensively contextualized data we could
A lot of our research and context work was about trying to find out what the numbers states published actually represented. We discovered early on that many states weren’t actually explaining what numbers they were publishing: Some states would post “total tests” but not say whether they were counting people tested or all tests ever performed or something else. Others would post cases, but never say—even in response to direct and repeated questions—whether those cases included probable and confirmed cases, or just confirmed ones.
This lack of clarity was present in most of the metrics we collected, and meant that we spent hundreds, maybe thousands, of person-hours reading footnotes in obscure state PDFs and watching press conferences to try to catch any turns of phrase that would tell us what—and who—was really represented in a given figure. Definitional problems substantial enough to shape whole narratives about the pandemic haunted our work all year, and we tried to communicate both the answers we found and the uncertainty we encountered. For some wings of the project, like the COVID Racial Data Tracker, we had to say as much about the dramatically incomplete state of the data as about what the data itself showed.
A commitment to context also meant that we had to maintain a public log of known irregularities in the data, like backlogs and data adjustments by state public health departments. (Our commitment to trying to spot and understand these artifacts was a major factor in our decision to keep our processes manual and use scrapers and automation only for supporting work. This limited the breadth of our efforts, but substantially increased their depth.)
We also gradually developed a sense of the rhythms of day-of-week effects, lags between metrics, and holiday artifacts that often distorted the data we were able to compile from states. And for all these areas of specialized knowledge we were building internally, we tried to communicate them out to the media and to members of the public in clear, accessible language.
We put care at the center of our work
The COVID Tracking Project was never a traditional mutual aid organization, but mutual-aid principles and structures were close to the heart of our efforts. We attended to what people outside the project needed, to the point of running a full-on COVID data help desk for much of the first year of the project. We explained our methods and distributed the expertise we’d built up as quickly as we could—in our help desk emails, blog posts, tweet threads, training sessions, and media calls—instead of hoarding it. We are trying, for the next two months, to leave behind patterns and systems, both technical and cultural, that will allow other efforts to spring up in times of need and build on what we made.
We also built an internal culture of care. We relied on careful documentation, training, structure, and support to help keep our accuracy up, and explicitly nixed communication and processes that relied on shame. Instead of assigning blame and hoping that doing so would motivate people to try harder not to make mistakes, we assumed that everyone would make mistakes no matter what, because we’re all humans. We went so far as to ban apologies—which didn’t entirely stop them, but did maybe reduce the sting of having gotten something wrong (or missed a meeting or been called away mid-Zoom to comfort a crying kid). We also avoided anything that felt like an injunction to try harder or work faster. We tried to adopt habits of cutting everyone slack and giving each other grace.
Above all, we trusted that our people would, if anything, work too hard and sacrifice too many hours of family time or relaxation or sleep. Instead of trying to motivate people, we mandated breaks and time away—and especially in data entry shifts, we routinely reminded people to work more slowly, both to maintain accuracy and to maintain a deliberate and sustainable approach to the work.
As our organizational structure became less chaotic and began to require more hierarchy to function, we chose team and shift leads who clearly demonstrated not just dedication and expertise, but also deeply rooted care for other contributors as human beings. Whenever we could, we picked two people to lead each team, so that the work and the emotional weight were distributed. Not everything we tried worked, and despite our attempts we saw people burn out under the stress of the project. Other people never found their footing in the work, or never quite clicked with the way we operated culturally, or just got lost in the momentum of following the pandemic up and down three case surges. We failed to be the organization that some of our volunteers needed. But the people who stayed built intentional cultures of support within and between teams to help everyone keep going.
As the months wore on and the work got grimmer, we brought in a professional mental health counselor to talk through coping techniques and ways of framing the traumatic experiences our contributors were going through. We also spun up intentionally lighter social channels to help give people a break from pandemic data, and we held lots of virtual events, including over holidays normally filled with travel and family gatherings, to help fight back against feelings of isolation. For many of us, assembling a daily count of the sick and the very sick and the dead was an act of service that kept us going through a very difficult year—but it also came at a high cost, especially when we found ourselves working with numbers that included digits representing our own friends and family members. And for many of us, doing the work together, in community with each other, made all the difference.
What can be generalized?
Throughout the past year, we’ve heard from organizations and individuals who have hoped that our work might be a model for other efforts, both in crisis response and in more traditional organizations. Having been so enmeshed in the work, I find it very difficult to tell what could be extracted from our project to strengthen other work elsewhere. But I think it’s probably not the technical systems or process structures that people expect when they look for a model.
It’s important to be clear that we had an extraordinary amount of freedom to do the work that we thought would best serve our communities and our country, without the pressure to become financially sustainable or even to build a leading brand or own a topic. We just did the work, and a lot of great people decided to help. So maybe the choices we made can’t translate to other systems.
That said, I suspect that intentional constraints on scope and scale allow for deeper, more satisfying, and ultimately more useful work. I suspect that a disciplined commitment to messy truths over smooth narratives would also breathe life into technology, journalism, and public health efforts that too frequently paper over the complex, many-voiced nature of the world. And I suspect that treating people like humans who are intrinsically motivated to do useful work in the world, and who deserve genuine care, allows far more people to do their best work without destroying themselves in the process.
Maybe these things don’t translate outside the boundaries of a volunteer-powered crisis project like ours. But especially now, after the wrenching year we’ve all had, I hope they might.