We are happy to announce the release of the COVID Tracking Project City Dataset. This dataset was built by a special team focused on gathering, cleaning, analyzing, and reporting on COVID-19 data at the city and county level from May 29 to October 21, 2020. We began collecting data following the murder of George Floyd, when Black Lives Matter protests were held across the US, raising concerns about the spread of COVID-19 at large public gatherings. This project expanded from 30 cities and counties to 65 as the Sunbelt states’ cases and hospitalizations began to spike in June.
This dataset captures the virus’s transmission in 65 cities and counties across the country. Many of these metropolitan areas only report the current day’s totals and remove older data from their public health dashboards so that no historical archive is available. As a result, it’s often impossible to see the impact of the virus on a particular geography over time. Our dataset captures this historical information. It is the only available metropolitan dataset that includes race and ethnicity, which allows us to improve our understanding of how COVID-19 disproportionately affects communities of color.
We have completed our data collection on this project and want to share what we’ve learned from viewing COVID-19 at the local level. Five months in, we’ve seen that local data tells a vastly different story than state-level data. Not only do trends emerge in city and county data before appearing at the state level, but state-level data also obscures local patterns.
COVID-19 fatality rates vary by location
We used data for our 65 metropolitan areas to calculate the Case Fatality Rate (CFR) over time. CFR measures the proportion of confirmed cases that result in a fatality. This metric can help us understand and compare the severity of the disease across locations of varying population size. One limitation of CFR is that it only counts people who have a lab-confirmed COVID-19 case. While a more complete measure may be Infection Fatality Rate (which looks at the proportion of infections that result in fatality), it’s difficult to capture the true number of infected persons because of insufficient testing throughout the pandemic. As a result, the CFR presented here represents an approximation of the true fatality rate.
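The definition above amounts to a single ratio. The following is a minimal sketch of that arithmetic, not the project's actual analysis code; the function name and the sample counts are hypothetical and are not drawn from the dataset.

```python
def case_fatality_rate(deaths: int, confirmed_cases: int) -> float:
    """Return the CFR as a percentage: deaths divided by confirmed cases."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return 100.0 * deaths / confirmed_cases


# Hypothetical cumulative counts for one locality (illustrative only).
example_cfr = case_fatality_rate(deaths=50, confirmed_cases=1000)
print(f"CFR: {example_cfr:.2f} percent")
```

Because the denominator counts only lab-confirmed cases, any undercount of infections pushes this figure above the true infection fatality rate.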
What CFR can tell us
Our City Dataset runs from late May through late October, and over this five-month period the five metro areas below had the highest average CFR. Having the highest CFR during our data collection period doesn’t necessarily signal recent increases in those cities and counties. We also measured how CFR changed in our selected cities and counties from late spring to mid-fall.
Of the five locations with the highest CFR in the chart above, four (Detroit; Wayne County, Michigan; New York City; and Orleans Parish, Louisiana) had a decrease in CFR over time, while Hidalgo County, Texas, experienced a striking increase. Of the 13 locations with rising CFR, four are in Florida, four are in Texas, and all but one are in the geographic South. An increase in CFR later in the pandemic suggests the region lacks adequate protections to reduce the spread of the novel coronavirus.
When you look at state-level data, it’s impossible to see which communities within the state are being most deeply impacted by COVID-19. We compared CFR in metropolitan areas to state-level data and found that as of October 21, 13 cities and counties reported higher CFRs than the states in which they are located.
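A comparison like this one can be sketched in a few lines. This is an illustration under stated assumptions, not the project's method: the locality names, state names, and CFR values are placeholders, and the function name is hypothetical.

```python
# Hypothetical CFRs in percent; names and values are placeholders.
local_cfr = {
    ("City A", "State X"): 6.1,
    ("County B", "State X"): 2.4,
    ("City C", "State Y"): 3.0,
}
state_cfr = {"State X": 3.5, "State Y": 3.2}


def localities_above_state(local, state):
    """Return the localities whose CFR is higher than their state's CFR."""
    return sorted(
        name
        for (name, state_name), cfr in local.items()
        if cfr > state[state_name]
    )


print(localities_above_state(local_cfr, state_cfr))
```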
How COVID has impacted communities of color
One unique aspect of The COVID Tracking Project’s City Dataset is that it includes race and ethnicity when that information is provided by the city or county. Our data can fuel new insights into the disproportionate impact of COVID-19 on communities of color. Of the 47 cities we tracked that report Black cases and deaths, about half (24) have an average CFR that is higher for Black people than for white people. This disparity is particularly pronounced in the state of Michigan, where the CFR is over 9 percent for Black people but under 5 percent for white people. COVID-19 has been deadlier for both populations in the city of Detroit, where the CFR is 13.53 percent for Black people and 9.62 percent for white people.
Chicago is among the many cities where COVID-19 has a disproportionate impact on Black communities. In late May, the city reported a 9 percent CFR for Black people and about 8 percent for white people. At the close of our data collection in late October, the case fatality rate for both Black and white people in Chicago had declined, but the gap between the two racial groups had widened: The CFR was 6.7 percent for Black people and 4.2 percent for white people.
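The widening gap can be checked with simple subtraction. The sketch below uses the Chicago figures quoted above; the late-May value for white people is given only as "about 8 percent," so that gap is approximate. The function name is hypothetical.

```python
def cfr_gap(cfr_a: float, cfr_b: float) -> float:
    """Return the difference between two CFRs in percentage points."""
    return round(cfr_a - cfr_b, 1)


# Late May: 9 percent (Black) vs. about 8 percent (white), so approximate.
late_may_gap = cfr_gap(9.0, 8.0)
# Late October: 6.7 percent (Black) vs. 4.2 percent (white).
late_october_gap = cfr_gap(6.7, 4.2)

print(late_may_gap, late_october_gap)
```

Both rates fell, yet the difference grew from roughly 1 percentage point to 2.5.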
Why it’s so hard to track cities
Cities and counties decide individually how to report their COVID-19 data, just as states do. Localities determine what and how many metrics to release to the public, what demographic categorizations to use, and how to display their data. Our data-gathering processes took these variations into account.
We standardized our race and ethnicity categorizations based on the US Office of Management and Budget’s guidelines, which are used by the Census Bureau. In states that report race and ethnicity data in tandem, we were able to calculate the overall Hispanic and Non-Hispanic demographic makeup. Both our race and ethnicity data sections include the category "unknown." This category includes data from people who refused to answer the question and places where the information has not yet been included. Some of the counties and cities use categories like "undisclosed" or "pending" in place of "unknown." For localities that indicated a total count greater than the sum of their race or ethnicity categories, we calculated the number of unknowns.
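One way to picture this standardization is sketched below. It rests on two assumptions the text does not confirm: that the calculated unknowns are the total minus the sum of the reported categories, and that this remainder is added to any unknowns already reported. The function name, the label set, and the counts are all hypothetical.

```python
# Labels treated as equivalent to "unknown" (from the examples in the text).
UNKNOWN_LABELS = {"unknown", "undisclosed", "pending"}


def standardize_counts(counts: dict, total: int) -> dict:
    """Fold unknown-like labels together and add any unreported remainder."""
    out = {}
    for label, n in counts.items():
        key = "unknown" if label.strip().lower() in UNKNOWN_LABELS else label
        out[key] = out.get(key, 0) + n
    # Assumption: the remainder is the total minus all reported categories.
    remainder = total - sum(out.values())
    if remainder > 0:
        out["unknown"] = out.get("unknown", 0) + remainder
    return out


# Hypothetical locality: 100 total cases, 95 accounted for by category.
print(standardize_counts({"Black": 40, "White": 50, "Pending": 5}, total=100))
```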
Some states lump together race categories to produce a categorization outside of the OMB’s list. For example, some counties in Arizona and Nevada group Asian and Pacific Islander into one category. Boston combines Asian, Native American, and Pacific Islander into one category. This limits our ability to compare city data based on racial and ethnic groups.
When determining the sources to use for data collection, we started at the city level. If that data was unavailable, we gathered information at the county level. In some unique cases, states also lump counties together into one sum. Nevada did this with the entire southern third of the state, which includes Las Vegas.
Many cities and counties did not report their data daily, and some did not provide updates on weekends; as a result, reports on Mondays were often artificially inflated.
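A common way to dampen this weekday effect is a seven-day rolling average. This is a general technique offered as a sketch, not something the project states it applied; the function name and the daily counts are hypothetical.

```python
def rolling_mean(values, window=7):
    """Return the trailing mean for each full window of consecutive days."""
    out = []
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1 : i + 1]
        out.append(sum(chunk) / window)
    return out


# Hypothetical daily new cases for two weeks, Monday through Sunday.
# Nothing is reported on weekends, so each Monday carries the backlog.
daily_new_cases = [30, 10, 10, 10, 10, 0, 0,
                   30, 10, 10, 10, 10, 0, 0]

print(rolling_mean(daily_new_cases))
```

Each seven-day window contains exactly one inflated Monday and one empty weekend, so the average stays flat even though the raw daily series swings between 0 and 30.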
Advice to users of our data
Our list of cities and counties has evolved since we began tracking, so some datasets begin and end later than others. Also, due to technical errors or inconsistencies in city and county reporting, some days may be missing counts. The datasets’ public notes provide context, identify limitations in the data, and summarize what is provided.
We suggest that you pair our results with the 2018 ACS five-year estimates from the US Census Bureau to get a sense of the overall population of each county and city.
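Pairing counts with population makes it possible to compare localities of different sizes on a per-capita basis. The sketch below shows the arithmetic only; the function name, the case count, and the population figure are hypothetical, and a real analysis would use the ACS estimate for the locality.

```python
def per_100k(count: int, population: int) -> float:
    """Return a count expressed per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100_000 * count / population


# Hypothetical locality: 1,500 cumulative cases, 600,000 residents.
print(per_100k(count=1500, population=600_000))
```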
Graphics by Júlia Ledur, Emma Rubin, and Clarissa Wong
Data analysis: Catherine Pollack
Editors: Alice Goldfarb, Joanna Pearlstein, and Jessica Malaty Rivera
Acknowledgments: Many people at The COVID Tracking Project contribute to the research and data compilation efforts that make the City Data special project possible. Special thanks to the City Data team that compiled and cleaned this data every day. The team and spreadsheets were managed by Nicki Camberg. Shift leads were Nicki Camberg, Aarushi Sahejpal, and Sharon Wang. Team leads were Artis Curiskis and Kara Oehler. Contributors included Sonya Bahar, Eliot Brody, Nicki Camberg, Hannah Cummins, David Eik, Rebecca Glassman, Alice Goldfarb, Hannah Hoffman, Pat Kelly, Jeffrey Ndubisi, Judith Oppenheim, Rick Palmer, Noah Parker, Catherine Pollack, Aarushi Sahejpal, Isabel Sepúlveda, Eva Sher, Ganda Suthivarakom, Katharine Teigen, Sharon Wang, Allysa Warling, and Jessie Zhang.
Nicki Camberg is a student journalist studying Political Science and Statistics at Barnard College, and the City Data Manager at CTP.
Artis Curiskis is outreach & reporting co-lead at the COVID Tracking Project and collaboratively runs the CTP special projects Long-Term Care COVID Tracker and City Data.
Kara Oehler is outreach & reporting co-lead at the COVID Tracking Project and collaboratively runs the CTP special projects Long-Term Care COVID Tracker and City Data.
Judith Oppenheim works on Texas City Data and state outreach at The COVID Tracking Project.
Catherine Pollack is a third-year PhD candidate in the Quantitative Biomedical Sciences program at Dartmouth College. Her dissertation research combines data science, epidemiology, and public policy to combat online health misinformation.
Aarushi Sahejpal is a Shift Lead on Long-Term Care and City Data at the COVID Tracking Project. She also studies International Relations & Data Science at American University.
Sharon Wang is a volunteer on the COVID Tracking Project and has supported data collection, process design, and various other needs. Sharon is also the CEO and Founder of Pragmatic Strategy Co, a boutique consulting firm helping organizations create longevity in their business through operational and leadership effectiveness.