Counting the lives we have lost from COVID-19 is an important measure of the pandemic’s impact. While deaths are far from the only negative consequence from the novel coronavirus, 145,447 people have died in the US thus far, according to official reports from the 56 states and territories we track.1
An increase in deaths generally follows an increase in cases of COVID-19, but this does not immediately show up in the data. The lag in reporting these deaths has caused confusion, making it more difficult to implement and sustain public health measures to combat COVID-19. As such, understanding the chronology of these deaths is vital, not only for national records but also for understanding what comes next.
In addition to the two ways states classify deaths due to COVID-19 (confirmed and probable), there are two primary methods of conveying when people with COVID-19 died.
The first is to date deaths by the day they are reported (hereafter “counts by date of report”). This data frequently lags behind the date the death actually occurred, as deaths are not instantly reported to state health departments. Even so, counts by date of report show up in public data before the second method of dating deaths, which assigns each reported death to the date when the person actually died (hereafter “counts by date of death”). This difference is at the heart of the confusion over when COVID-19 fatalities have happened.
We’ll dive into some of the complexities of using these two methods by looking at death data from Arizona, Florida, and Texas. As of July 31, all three states provide deaths by date of report and by date of death, making for an easy comparison.
Counts by date of report
Using the date of report to count deaths offers two distinct advantages: First, the data does not change over time. Once a date’s data is entered, it remains constant and does not require continual retroactive changes across the dataset. Second, recent values in the time series are immediately useful on the day the data is released. If a chart is created on June 6, the graph will have a clear, useful value for that day.
This method has downsides, too. You’ll notice a distinct day-of-week pattern to the numbers when using reported deaths. Typically, numbers on Sundays and Mondays are much lower than other days of the week. It’s not that deaths take weekends off—the humans who report on them do. The jagged nature of these daily values means that using a smoothing function (usually a 7-day rolling average) can be a better way to evaluate the trends in deaths by date of report.
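To make the smoothing step concrete, here is a minimal sketch of a trailing 7-day average in Python. The daily values are invented for illustration (they are not drawn from any state's data), and the function name `rolling_mean` is our own.

```python
def rolling_mean(values, window=7):
    """Trailing mean over `window` days; None until a full window exists."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out

# Hypothetical deaths by date of report, Monday through Sunday, two weeks.
# Sundays and Mondays are deliberately low to mimic the reporting pattern.
reported = [12, 40, 45, 43, 41, 30, 9, 14, 48, 52, 50, 47, 35, 11]
smoothed = rolling_mean(reported)

for day, (raw, avg) in enumerate(zip(reported, smoothed), start=1):
    label = "n/a" if avg is None else f"{avg:.1f}"
    print(f"day {day:2d}: reported={raw:3d}  7-day avg={label}")
```

Because every full window contains exactly one of each weekday, the weekend dip is spread evenly across the average rather than showing up as a sawtooth.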
Counts by date of death
Counting (and visualizing) deaths by the date of death is more precise, pinning each death to the date when someone actually died. This method also solves the day-of-week reporting problems we saw earlier. But it also presents drawbacks, some of which can be quite confusing for audiences not accustomed to epidemiological methods and conventions.
Since there is a delay between the time someone dies and the moment authorities become aware of their death, counting by date of death will always produce datasets and graphs that show incomplete—and therefore confusingly low—numbers for the most recent days. Often this incomplete period can be up to two weeks or even more (see the shaded area in the graph above).
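One way to handle that incomplete period in code is to flag the most recent dates before charting them. The sketch below assumes a 14-day window, based on the "up to two weeks" figure above. The real lag varies by state and can be longer, and the counts and the `mark_incomplete` name are hypothetical.

```python
from datetime import date, timedelta

def mark_incomplete(counts, as_of, lag_days=14):
    """Return (day, count, is_incomplete) tuples sorted by day.

    A day is flagged when it falls within `lag_days` of `as_of`.
    The 14-day default is an assumption, not an official threshold.
    """
    cutoff = as_of - timedelta(days=lag_days)
    return [(day, n, day > cutoff) for day, n in sorted(counts.items())]

# Hypothetical counts by date of death, as seen on July 24.
counts = {date(2020, 7, 1): 50, date(2020, 7, 12): 31, date(2020, 7, 22): 4}
flagged = mark_incomplete(counts, as_of=date(2020, 7, 24))

for day, n, incomplete in flagged:
    note = " (incomplete, expect revisions)" if incomplete else ""
    print(f"{day.isoformat()}: {n}{note}")
```

A chart built from this output could fade or shade the flagged dates so that readers do not mistake a reporting gap for a real decline.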
The other problem with relying solely on counts by date of death is that this method constantly changes historical values as more death certificates are matched to the correct date. A date that showed 3 deaths when viewed on June 1 might display 30 on June 14 after death certificates are reconciled.
Here’s how that looks in Arizona. If you visit the Arizona Department of Health Services’ COVID-19 site over a period of weeks, you’ll see vastly different numbers for deaths on every visit. Watch July 7 in the GIF below: check the site on the 10th, and you’d see only 14 deaths. Come back two weeks later, and you’d see 77.
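The mechanics behind those shifting numbers can be sketched in a few lines of Python. Each record pairs a date of death with the date the state learned of it, and a snapshot counts only the records reported by the day you look. The two totals for July 7 (14 and 77) come from the Arizona example above; how the later reports are spread across dates is invented for illustration, and the function name is our own.

```python
from collections import Counter
from datetime import date

def counts_by_date_of_death(records, as_of):
    """Count deaths per date of death, using only records reported by `as_of`."""
    return Counter(died for died, reported in records if reported <= as_of)

july7 = date(2020, 7, 7)

# (date_of_death, date_reported) pairs. Only the totals 14 and 77 are from
# the article; the individual reporting dates are hypothetical.
records = (
    [(july7, date(2020, 7, 9))] * 14
    + [(july7, date(2020, 7, 17))] * 40
    + [(july7, date(2020, 7, 23))] * 23
)

early = counts_by_date_of_death(records, as_of=date(2020, 7, 10))[july7]
late = counts_by_date_of_death(records, as_of=date(2020, 7, 24))[july7]
print(f"July 7 as seen on July 10: {early}")
print(f"July 7 as seen on July 24: {late}")
```

Nothing about July 7 itself changed between the two snapshots. Only the set of death certificates that had reached the state did.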
Unfortunately, these changing values are often poorly labeled. If an Arizonan visits the AZDHS site, the only indication they will see that the official graph by date of death is incomplete is a small text note below the chart. We believe that if states are using this method, they would better serve the public by clearly marking the incomplete period with faded dates as shown above.
To recap: Counting deaths by date of report allows for the closest thing to a “real-time” view into COVID-19 fatalities, though the seven-day average is more useful than the daily values. Counting deaths by date of death is more accurate in the long run, but lacks immediacy.
But how different are these values in the real world? Is it viable to track trends using date of report, or are the two methods wildly divergent? We took a look at combined deaths in three major hotspots, Arizona, Florida, and Texas, to find out.
When plotted as daily values, on days we can reasonably call “complete,” deaths by date of report were likely to be an undercount of the true number of deaths for that day. Deaths by date of death were higher on 85 of 121 days (70%) from March 15 to July 13. In order to compare trends more easily, let’s look again at the seven-day averages.
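For readers who want to reproduce this kind of comparison, a rough sketch follows. It counts the days on which the date-of-death figure exceeds the date-of-report figure across two series keyed by date. The helper name `share_higher` and the sample values are hypothetical. The final lines only check the arithmetic behind the 85-of-121 figure quoted above.

```python
from datetime import date

def share_higher(by_death, by_report):
    """Return (days_higher, days_compared, share) over dates present in both series."""
    days = sorted(set(by_death) & set(by_report))
    higher = sum(1 for d in days if by_death[d] > by_report[d])
    share = higher / len(days) if days else 0.0
    return higher, len(days), share

# Hypothetical three-day sample, not real state data.
by_death = {date(2020, 7, 1): 60, date(2020, 7, 2): 55, date(2020, 7, 3): 48}
by_report = {date(2020, 7, 1): 45, date(2020, 7, 2): 70, date(2020, 7, 3): 40}
higher, total, share = share_higher(by_death, by_report)
print(f"date of death higher on {higher} of {total} days ({share:.0%})")

# Arithmetic check on the published figure: March 15 to July 13 inclusive.
span_days = (date(2020, 7, 13) - date(2020, 3, 15)).days + 1
pct = round(100 * 85 / span_days)
print(f"{span_days} days in range, 85 of them is about {pct}%")
```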
The shape of these curves is strikingly similar. As deaths started to climb during these recent outbreaks, the curve of the reported deaths rose sharply. However, this was really a reflection of deaths that had started increasing over the previous two weeks. It is likely that the actual deaths curve will continue to track with the reported curve. Deaths counted by date of report may peak at an earlier date than deaths counted by date of death, but the trends are highly correlated.
The same pattern appears in each of these states individually. The time series below shows that if we look at deaths by date of death in Arizona, Florida, and Texas, they began rising before deaths by date of report. In retrospect, the disconnect between case rises and death increases was narrower than previously believed.
Our analysis suggests that tracking deaths by date of report is an effective, fast way of following the trends in deaths over time. While, ultimately, the date-of-death method will generate more precise statistics, public health officials and the public at large need the immediate and understandable view of trends that the date-of-report method provides. Officials should strive to include a seven-day average when presenting death data by date of report.
It’s also notable that an analysis of real-world data shows that providing death counts by date reported does not systematically overcount the number of recent deaths—in fact, it often undercounts them.
We recommend that all states provide both sets of numbers as Arizona, Florida, and Texas do. These are actually complementary types of data, each with its own strengths and drawbacks, and publishing both allows for easy comparisons while minimizing the risk of misinformation.
Finally, states should clearly mark date-of-death charts with the dates that are incomplete to avoid confusion among residents about what’s happened most recently.
1 In New York, the state reports approximately 5,700 fewer deaths than are reported by New York City. We use the state-reported figures, which means our New York numbers reflect a lower total than trackers that include the NYC data that the state omits.