To understand the COVID-19 outbreak in the United States, we need to know who has gotten tested, who has gotten sick, and who has died. Is this disease hitting all communities in similar ways? Or do we see health-outcome disparities in this outbreak as we do in so much other public health data? Political leaders, community advocates, and policy experts all want answers to those questions, and to get them, we need data about the race and ethnicity of those whose lives have been affected by the virus.

No one has written more powerfully on this issue than Ibram Kendi, a historian at American University who writes on the relationship of race and public policy. In a series of recent essays for The Atlantic (which also houses the COVID Tracking Project), he described the urgent need to gather demographic information about who contracts the virus and who dies from it if we’re to understand the outbreak and protect the most vulnerable communities. “Without racial data, we can’t see whether there are disparities between the races in coronavirus testing, infection, and death rates,” Kendi wrote. “If we can’t see racial disparities, then we can’t see the racist policies behind any disparities and deaths. If we can’t see racist policies, we can’t eliminate racist policies, or replace them with anti-racist policies that protect equity and life.”

After Kendi began writing about the need for racial data, the COVID Tracking Project team joined with his colleagues at American University’s Antiracist Research & Policy Center to create the COVID Racial Data Tracker, which is being developed to record, analyze, and regularly update racial data on the pandemic within the United States. The federal government’s failure to assemble and report this data has left it to journalists, researchers, and volunteers to produce this resource for the public.

Over the last 10 days, the COVID Racial Data Tracker team has sized up the scale of the task. When we began, fewer than 10 states and zero territories reported any race or ethnicity data related to COVID-19 testing or outcomes. As of Tuesday afternoon, 30 states and the District of Columbia are reporting something in this category, according to our data availability tracker. This is major progress, and it’s due to the combined efforts of many advocacy groups, local journalists, and public health officials. Major obstacles to creating a useful dataset remain—but more on that below.

Here, you can see our first effort at gathering this data beginning on Sunday, April 12. Over the next month we will test a twice-weekly update schedule, with the possibility of changing the cadence if the data calls for it. Over the next week or two, we’ll make this dataset available on our website and in our API, not just in this spreadsheet. But as with our very first version of the COVID Tracking Project data—which also went live as a spreadsheet—we believe this data needs to be published now. Later on, we’ll make it possible to do more with this data, and with adjacent datasets that can help contextualize it.

In the meantime, we want to identify the central problems we see with the data in its current form. State public health authorities can meaningfully change our understanding of possible racial and ethnic disparities in COVID-19 health outcomes, but they need to move quickly to improve the quantity and quality of data available.

Many states aren’t reporting racial data at all

This is the most obvious problem. Every state should report demographic data. Only 30 states and the District of Columbia report racial data of any kind. Ideally, we’d have regularly updated demographic data for all COVID-19 tests and deaths. Instead, with the exception of Illinois, we’re getting racial data for positive cases only. And just 16 states currently provide racial demographics for COVID-19 deaths.

While an improvement, racial data on positive cases and deaths is not enough. We really need this data for all tests. Only then can we begin to understand if there are problems with basic access to healthcare in the COVID-19 context. Given the known racial and ethnic inequities in the US healthcare system, we can expect to see similar patterns play out during the outbreak as well. Only comprehensive, high-quality demographic data on COVID-19 testing and outcomes can help us understand access problems. Then, those numbers can help guide public health responses that serve everyone in the country equally.

Where states do report racial data, “unknown” remains a huge category

For the states that do report race data about positive cases, not all cases have the patient’s race attached. In fact, 38 percent of the 194,000 cases captured in the initial data on April 12 were listed as “unknown” race. This may be because of how the test data was captured at the point of care, or the race information may have been lost somewhere else along the reporting chain.

This is a huge gap in our understanding. In some states, the “unknowns” are a majority of the patient data. Here are the states with greater than 50 percent unknowns: Arizona, Connecticut, Georgia, Massachusetts, and Washington.
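The “unknown” share is simple arithmetic once a state publishes case counts by race. As a rough sketch, here is one way it could be computed in Python, flagging any state above the 50 percent mark. The state names and counts are made-up placeholders, not figures from our dataset, and the function names are hypothetical.

```python
def unknown_share(counts_by_race):
    """Return the fraction of cases whose race is recorded as 'Unknown'."""
    total = sum(counts_by_race.values())
    if total == 0:
        return 0.0
    return counts_by_race.get("Unknown", 0) / total


def states_over_threshold(cases_by_state, threshold=0.5):
    """Return sorted names of states whose unknown share exceeds the threshold."""
    return sorted(
        state
        for state, counts in cases_by_state.items()
        if unknown_share(counts) > threshold
    )


# Hypothetical counts for illustration only; not real reported data.
example_cases = {
    "State A": {"White": 300, "Black": 250, "Unknown": 450},
    "State B": {"White": 200, "Black": 100, "Unknown": 700},
}

print(states_over_threshold(example_cases))  # ['State B']
```

In this invented example, State A has 45 percent unknowns and State B has 70 percent, so only State B is flagged.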

For deaths, the “unknown” problem still exists, but it is much smaller. Only 16 states currently report this information—but of those states, 89 percent of deaths include the racial demographic data needed to understand disparities.

States report racial data in non-standard ways

Few states report racial data in ways that align with the ways the government normally records data. This complicates apples-to-apples comparisons with existing demographic data. Racial data categories that aren’t consistent across states also prevent us from making accurate comparisons between states or offering summary national statistics.

Most states collapse ethnicity into race, obscuring uneven outcomes across distinct communities

Latinx (or “Hispanic” as the states put it) is an ethnic category that can map onto race in different ways: there are White Mexicans, Black Cubans, Asian Ecuadorians, Indigenous Colombians, etc. But only the state of Georgia reports race separately from ethnicity.

The rest of the states, including large-outbreak states New York, Massachusetts, and Washington, lump “Hispanic” into their racial data. This maps to many people’s common understanding of these cultural groups, but it is not the standard way of categorizing this data for official government or public health purposes.

There is a very good reason to keep race and ethnicity separate. There are well-known disparities between the social and economic outcomes of White and Black Hispanic groups. We don’t even know, for example, if Black Puerto Ricans in New York are being lumped into the state’s “Black” or “Hispanic” categories, or if they are even being consistently categorized one way or another. To accurately assess disparities in testing and outcomes, we need complete demographic data that treats race and ethnicity as separate but related data layers that can be used to pinpoint community needs.
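To make the idea of separate data layers concrete, here is a minimal Python sketch of a case record that stores race and ethnicity in two distinct fields and then counts every combination. The record type, field names, category labels, and sample records are all assumptions for illustration; they do not describe any state’s actual reporting format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseRecord:
    race: str       # e.g. "Black", "White", "Asian", "Unknown"
    ethnicity: str  # e.g. "Hispanic", "Non-Hispanic", "Unknown"


def cross_tabulate(records):
    """Count cases for each (race, ethnicity) pair."""
    return Counter((r.race, r.ethnicity) for r in records)


# Hypothetical records for illustration only; not real reported data.
example_records = [
    CaseRecord("Black", "Hispanic"),
    CaseRecord("Black", "Non-Hispanic"),
    CaseRecord("White", "Hispanic"),
    CaseRecord("Black", "Hispanic"),
]

table = cross_tabulate(example_records)
print(table[("Black", "Hispanic")])  # 2
```

Because the two fields are kept apart, a Black Hispanic patient is counted once under a distinct combination rather than being folded into either a single “Black” or a single “Hispanic” bucket.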

States are only one geographic unit for understanding racial data in the COVID-19 outbreak

Racial and ethnic groups have been geographically clustered by state policy for as long as there has been a United States. At the national level, Black, Asian, Native American, and “Hispanic” people are concentrated across the country in distinct and complex ways. Most of these groups have a larger presence in a relatively small number of states. At the state level, Black people in particular are overrepresented in urban areas thanks to policies like redlining and predatory inclusion that explicitly encouraged segregation and environmental injustice. And if you look within a city, different racial and ethnic groups are never evenly distributed among zip codes or census tracts.

These structural realities have a major effect on different communities’ health and will have significant repercussions for the COVID-19 outbreak. For example, if you look at Michigan, it is clear the state is an epicenter of the US outbreak. However, most of the cases are clustered in and around Detroit, which has a much higher concentration of African-Americans than other parts of the state. The same is true for New Orleans and Louisiana, Chicago and Illinois, and New York City and New York State.

In order to assess what’s actually happening and prevent bad statistical analyses, the states and other governments should release data about smaller geographical subunits. This data should provide as much spatial resolution as possible, which probably means zip codes. New York City has begun to do so.

Looking ahead

As the past 10 days have shown, data availability and quality can change a lot, and we expect them to keep changing. Our experience so far with collecting other kinds of COVID-19 data gives us a cautious optimism that these changes will trend in the direction of transparency. That said, optimism must be accompanied by mechanisms of accountability.

In the next week, we’ll begin grading states on a much wider variety of their reporting, including testing, hospitalization, and racial demographic data completeness. These grades will be available on our website and will be bundled inside the API as well.

Note: Beginning July 1, 2020, the Antiracist Research & Policy Center is the Boston University Center for Antiracist Research.


Alexis C. Madrigal is a staff writer at The Atlantic, a co-founder of the COVID Tracking Project, and the author of Powering the Dream: The History and Promise of Green Technology.


