As we approach our final day of data compilation, we’ve been running a series of posts designed to help advanced data users make the switch from CTP-compiled data to federal datasets. We also want to point to some equivalent federal data sources for people who may follow us on Twitter or use our charts, but who don’t work directly with our API. The following recommendations are limited to federal data sources (we’ll post other useful datasets separately) and are aimed at people who want the simplest possible way to understand the overall situation as the country enters its second year of dealing with COVID-19.
Daily updates: Community Profile PDFs
If you’d like a single place to look every day to see how the United States is doing, your best bet may be the Community Profile Reports, which are now posted each day as a PDF on HealthData.gov. This report foregrounds trend charts of four familiar metrics: cases, deaths, new hospital admissions, and test positivity.
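This part of the report is compiled from three sources we’ve analyzed in detail:
the case and death data from the CDC, which closely matches the case and death data we’ve compiled at The COVID Tracking Project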
the Unified Hospital Dataset from HHS, which is also extremely well matched with the hospitalization data we’ve compiled at The COVID Tracking Project
the Unified Testing Dataset from HHS, which is less comparable to the testing data we have compiled from states, but which is probably useful for understanding trends, if not absolute numbers
The COVID Tracking Project’s topline charts have shown the number of people currently hospitalized—not the number of people newly admitted to the hospital—but both metrics are useful ways of understanding how many people are severely ill with COVID-19.
A note on test positivity data: The COVID Tracking Project has not calculated test positivity since the summer of 2020, when it became apparent that there were substantial mismatches between the numerator and denominator available from public data. Some of these problems remain in the federal dataset, largely because the available federal testing data appears to be incomplete—something you can read more about in our testing data post. Even so, test positivity can be useful at the national level, particularly when used to watch for trends over time using a seven-day average.
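As a rough illustration of the seven-day approach, the sketch below computes test positivity as positive tests summed over a trailing seven-day window divided by total tests summed over the same window. The function name and the sample counts are made up for illustration; they are not drawn from any federal dataset, and the federal reports may calculate positivity differently.

```python
def seven_day_positivity(positives, totals, window=7):
    """Trailing-window test positivity: sum(positives) / sum(totals).

    Returns one value per day. The value is None where the window is
    incomplete or where no tests were reported inside the window.
    """
    if len(positives) != len(totals):
        raise ValueError("positives and totals must have the same length")
    rates = []
    for i in range(len(positives)):
        if i + 1 < window:
            rates.append(None)  # not enough history yet
            continue
        pos = sum(positives[i + 1 - window : i + 1])
        tot = sum(totals[i + 1 - window : i + 1])
        rates.append(pos / tot if tot > 0 else None)
    return rates


# Hypothetical daily counts (illustrative only).
daily_positives = [50, 60, 55, 20, 10, 70, 65, 58]
daily_totals = [1000, 1100, 1050, 400, 200, 1300, 1200, 1150]
rates = seven_day_positivity(daily_positives, daily_totals)
```

Dividing the windowed sums, rather than averaging seven daily ratios, keeps low-volume days from carrying the same weight as high-volume days. It does not fix a numerator and denominator that count different things.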
The Community Profile Reports also include population-adjusted charts of the four topline metrics by US Census region—something that will be familiar to people who have relied on our regional charts and weekly updates.
Finally, the reports also include many county-level maps with color-coded representations of several metrics and the change in those metrics over the preceding week, including population-adjusted maps for cases and deaths.
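Population adjustment is simple arithmetic: divide the count by the population and scale to a common base, usually 100,000 people. A minimal sketch follows, with hypothetical region names and invented numbers; the reports' own population figures are not reproduced here.

```python
def per_100k(count, population):
    """Scale a raw count to a rate per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * 100_000 / population


# Hypothetical regions: (weekly cases, population). Illustrative only.
regions = {
    "Region A": (12_000, 4_000_000),
    "Region B": (12_000, 40_000_000),
}
rates = {name: per_100k(cases, pop) for name, (cases, pop) in regions.items()}
```

The two invented regions report the same raw count, but the adjusted rates differ by a factor of ten, which is why population-adjusted charts are the fairer way to compare places of different sizes.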
Daily updates: HHS Protect Public Data Hub
The HHS Protect Public Data Hub provides topline absolute numbers and percentage change for national cases, hospitalizations, and deaths, as well as the HHS-calculated test positivity rate and its change from the previous week.
It also includes an embedded map version of the Community Profiles data otherwise only available as a PDF, a trend view of the national testing dataset, and a facility-level hospital utilization map. The federal testing dataset diverges substantially from the one we compiled from states in several ways, and we've written both an introductory post and a deep analysis about the dataset and about these differences.
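For readers who want to reproduce a week-over-week percentage change from daily figures, here is one common way to do it: compare the total for the most recent seven days with the total for the seven days before. This is an assumption about method for illustration only; the hub's exact calculation is not described here, and the function name and sample values are hypothetical.

```python
def week_over_week_change(daily_values):
    """Percent change between the latest 7 days and the 7 days before them.

    Returns None when the earlier week sums to zero, because the
    percentage change is undefined in that case.
    """
    if len(daily_values) < 14:
        raise ValueError("need at least 14 days of data")
    current = sum(daily_values[-7:])
    previous = sum(daily_values[-14:-7])
    if previous == 0:
        return None
    return (current - previous) * 100 / previous


# Hypothetical series: 100 per day for a week, then 90 per day.
change = week_over_week_change([100] * 7 + [90] * 7)
```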
Daily updates: the CDC COVID Data Tracker
The CDC’s COVID Data Tracker includes three of the four elements of our national four-up chart, though they are distributed across two locations within the tracker. In the tracker’s Daily and Total Trends area under Cases & Deaths in the navigation bar, you can toggle between trend-lines for cases and deaths. This chart also includes a seven-day moving average line—which is your friend if you’re trying to extract meaning from data with big day-of-week effects.
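To see why the seven-day average helps, consider a minimal sketch of a trailing seven-day mean applied to a made-up series with a recurring weekend dip. The function name and numbers are hypothetical, and the tracker may compute its average line differently.

```python
def trailing_average(values, window=7):
    """Mean of each day and the (window - 1) days before it.

    Days without a full window of history get None.
    """
    averages = []
    for i in range(len(values)):
        if i + 1 < window:
            averages.append(None)
        else:
            averages.append(sum(values[i + 1 - window : i + 1]) / window)
    return averages


# Two hypothetical weeks with the same weekday pattern and weekend dip.
reported = [900, 1100, 1200, 1150, 1000, 500, 450] * 2
smoothed = trailing_average(reported)
```

Because every seven-day window contains exactly one of each weekday, the repeating dip cancels out and the smoothed line is flat for this invented series, even though the daily numbers swing by more than a factor of two.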
You can also view case and death trends by state using a dropdown menu at the top of the chart. The federal case and death data behind these charts is readily available for download.
If you look under Healthcare Systems in the navigation bar, you can see daily trends for New Hospital Admissions and Patients Hospitalized with Confirmed COVID-19. These trend charts are available only at the national level—there are no state/territory breakdowns available for hospitalizations in the tracker itself.
Testing data is available for individual counties in the County View area of the CDC COVID Data Tracker: click on a county and scroll down to see trend charts for all available metrics. This data is also publicly available for download as a CSV. If you intend to use this information, especially for calculating test positivity, we would encourage you to read our full analysis of the testing data.
Weekly updates: State Profile Reports
The federal government’s State Profile Reports, released once a week, begin with a dashboard-style page with topline metrics including cases, tests, and deaths per 100,000 people; updates on cases and deaths in skilled nursing facilities (SNFs), also known as nursing homes; and a variety of hospitalization metrics by state, by FEMA region, and for the US as a whole.
These reports also include the elusive testing trend chart, in this case at the state level and paired with test positivity, but not at the national level.
The State Profile Reports also include county-level updates and national summaries of many other indicators.
Weekly updates: COVID Data Tracker Weekly Review
The CDC also releases a new weekly interpretative summary of the data on the CDC COVID Data Tracker on Fridays. We are proponents of weekly views of bumpy COVID-19 data, and we are therefore very happy to see this interpretative offering from the CDC. This summary offers an overall interpretation of the movement of the pandemic in the US, with especially helpful breakdowns of vaccine data and information on known variants of SARS-CoV-2. Our recommendation of this analysis comes with a caution, however: in its first few weeks, the weekly review has not always provided sufficient context to allow a general readership to understand some of the trends being discussed.
Unlike most other COVID-19 data housed at the CDC, the COVID Data Tracker—and the Weekly Review—uses data arranged by date of report, rather than (for example) by date of symptom onset, date of test specimen collection, or date of death. Data arranged by date of report has several important advantages, but also many downsides. Most notably, this data displays very strong day-of-week effects and is also extremely vulnerable to predictable rise-and-drop artifacts after holidays or other major disruptions, like storms and natural disasters, that affect the ability of counties and states to report their data.
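A weekly view is one way to blunt these artifacts. The sketch below sums daily values into seven-day blocks, using an invented series in which one day reports nothing and the next day reports double. The start date, function name, and counts are all hypothetical.

```python
from datetime import date, timedelta


def weekly_totals(start, daily_values):
    """Sum daily values into complete seven-day blocks beginning at `start`.

    Trailing days that do not fill a whole block are left out.
    Returns a list of (block_start_date, total) pairs.
    """
    complete_days = len(daily_values) - len(daily_values) % 7
    totals = []
    for offset in range(0, complete_days, 7):
        block_start = start + timedelta(days=offset)
        totals.append((block_start, sum(daily_values[offset : offset + 7])))
    return totals


# Hypothetical data: a steady week, then a week with a reporting gap
# (0 reported) followed by a catch-up day (200 reported).
steady_week = [100] * 7
disrupted_week = [100, 100, 100, 0, 200, 100, 100]
totals = weekly_totals(date(2021, 2, 1), steady_week + disrupted_week)
```

Both weeks total 700 in this example, so the gap and the catch-up cancel. That only works when the drop and the rebound land in the same block; a disruption that straddles two weeks will still distort both.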
The CDC Weekly Reviews so far have had difficulty contextualizing this characteristic of their data for readers, suggesting, for instance, that “Nationally, the number of COVID-19 deaths continue to fluctuate.” Reported deaths do fluctuate—very predictably—by day of week, and are especially vulnerable to holiday/weather effects. Additionally, states are still reporting substantial backlogs of deaths that took place in previous months of the pandemic. But the underlying trends, which can easily be seen in a weekly view of the data, are nevertheless meaningful: Deaths are falling, and have been falling since January 13, using the seven-day average of deaths by date of report. (This chart uses COVID Tracking Project data, but cases and deaths in our data almost perfectly match those in the CDC's Data Tracker.)
We think it’s important to convey this kind of context to readers so that data artifacts don’t entirely obscure the underlying trends. Very helpfully, the most recent Weekly Review did acknowledge the likely effects of severe winter storms on vaccination rates—we would hope to see similar attention to confounding factors included in future analyses of other metrics as well.
Beyond the complexities of working with data arranged by date of report, other confounding factors are critical to interpreting the available data on the pandemic to date. The Weekly Review noted on February 26 that cases reported on February 24 remain “much higher than what was seen during the first peak in the pandemic.” This is true, but without historical context, this statement obscures more than it reveals: The United States was testing tiny numbers of people at the beginning of the first surge of the pandemic in the spring of 2020, so numbers of confirmed cases are extremely low for that period and not at all comparable to case numbers now, almost a year later. Hospitalization and death figures for the three case surges we’ve seen in the United States make it especially clear that a huge number of cases were missed in the first surge because of insufficient testing, and discussions of case numbers in the first surge should always include this caveat.
Things to keep in mind when you’re watching the data
In the process of compiling COVID-19 data for 56 jurisdictions for 365 days, we’ve developed a lot of data habits that help us interpret the data we collect in a responsible, cautious way. Most will be familiar to people who work with data for a living, but perhaps less so to people who normally don’t spend all their time immersed in data dictionaries and CSVs. Here, in compressed form, is our extremely informal cheat sheet for watching COVID-19 data in the United States.
Understand dating schemes. Data arranged by date of report is wiggly and vulnerable to reporting artifacts. Using a seven-day average or a weekly view helps with this. Data arranged by more epidemiologically meaningful dating schemes is always incomplete for the most recent days or weeks (or potentially for longer) and requires careful framing to avoid the impression that all metrics are always dropping to zero.
Study data definitions. Make sure you’re using the right metric to answer the questions you’re bringing to the data. Cumulative figures, for example, show the effect of the whole pandemic, while daily change metrics trace specific movements over time. Hospitalization metrics, in particular, are often confusing for new data users. Our federal data 101 series can help you find complete documentation for each of the federal datasets we’ve recommended in this post.
Look for confounding factors. Have holidays or natural disasters affected reporting processes or the availability of testing? If a state’s data looks weird, has that state issued any warning or explanation on its dashboard or in local news reports? If a state’s test positivity is being reported at an extremely high percentage, has that state had problems reporting total tests recently?
Use established relationships between metrics to help guide your interpretation. For example, we know that for the majority of the pandemic in the US, reported deaths have lagged behind reported cases by two to three weeks. So if we see cases and deaths jump simultaneously, it’s likely that we’re looking at a reporting artifact, rather than a change in the reality of the pandemic. Similarly, if cases appear to rise sharply while current hospitalizations remain steady, it would be wise to look for reporting or test-availability problems that might result in case backlogs being added to more recent numbers.
Be conservative about what can be known. It’s rare that topline COVID-19 data can show, by itself, the effect of specific events. Without contact tracing and case investigations, it’s usually very hard to make a direct causative claim that a given event produced a subsequent movement in the data. If your aim is to produce credible, sturdy analysis, resist the impulse to use public COVID-19 data to make causative claims when only correlative evidence is available.
Be faithful, not tactical. If something can’t be understood from the data, say so. The possible tactical advantage gained by overstating what the data can explain will be lost if the data later contradicts the claim. It’s better to be transparent every time, even when the result is sometimes muddled or inconclusive.
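Two of the habits above lend themselves to a quick sketch: turning a cumulative series into daily changes, and shifting one series in time so it can be compared with another that lags behind it. The helper names and sample values below are hypothetical, and the lag is a parameter you would choose yourself (the two-to-three-week figure mentioned above is a rough guide, not a constant).

```python
def daily_from_cumulative(cumulative):
    """Convert cumulative totals to day-over-day changes.

    The first day has no prior value, so its change is None. A negative
    change means the cumulative total went down between two days.
    """
    if not cumulative:
        return []
    return [None] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def shift_forward(values, lag_days):
    """Delay a series by lag_days so values[i] lands on day i + lag_days.

    The output has the same length as the input; the first lag_days
    entries are None and values pushed past the end are dropped.
    """
    if lag_days < 0:
        raise ValueError("lag_days must be non-negative")
    return ([None] * lag_days + list(values))[: len(values)]


# Hypothetical cumulative totals over four days.
daily = daily_from_cumulative([10, 15, 15, 30])

# Hypothetical case counts delayed by two days for side-by-side comparison.
lagged_cases = shift_forward([1, 2, 3, 4, 5], 2)
```

Lining up lagged cases against deaths makes it easier to spot a jump that shows up in both series on the same report date, which is the pattern that suggests a reporting artifact.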