At the federal level, two datasets provide information about COVID-19 fatalities: the Centers for Disease Control and Prevention’s (CDC) COVID Data Tracker, and Provisional Death Counts for COVID-19 from the National Center for Health Statistics (NCHS), which is also part of the CDC.
The CDC COVID Data Tracker is available on the agency’s website and reports a wide range of information regarding the pandemic: It includes figures on testing, cases, and deaths, vaccination delivery and administration, and more. The statistics about fatalities are reported on the Data Tracker’s Cases and Deaths by State page, and are available in a dataset titled “United States COVID-19 Cases and Deaths by State Over Time.” This data was first reported by the CDC on May 9, 2020, and contains information for dates going back to January 22, 2020.
The other major source of federal data on COVID-19 deaths comes from the NCHS, which is responsible for compiling statistics on the health of the United States population, including mortality data. The NCHS tracks COVID-19 deaths by reviewing death certificates submitted to the National Vital Statistics System (NVSS). The dataset has been available since May 1, 2020, and contains information for dates going back to January 4, 2020. The agency is also responsible for standardized procedures for certifying COVID-19 on death certificates.
The difficulty in using these two sources lies in the different ways they tabulate death counts. The CDC COVID Data Tracker closely tracks the current death counts compiled from states by The COVID Tracking Project, which are well-suited to supporting rapid pandemic response. NCHS’s death certificate data aligns more closely with the gold-standard definitions for cause of death used in epidemiological studies and is standardized across states, but it provides a less timely picture of COVID-19 deaths.
What’s in the CDC COVID Data Tracker deaths data
The primary difference between these two datasets is their source.
The data on the CDC COVID Data Tracker is sourced directly from states, which submit updated death counts to the CDC each day. Because the data comes from each state, it very closely matches the information reported on state COVID-19 dashboards, as well as the COVID Tracking Project’s data for each state. This makes the data a convenient replacement for COVID Tracking Project data.
However, the CDC COVID Data Tracker comes with an important caveat—one familiar to users of COVID Tracking Project data: It isn’t standardized across all states. States make many choices about what to submit to the CDC, from whether they submit confirmed and probable death breakdowns to how they define “confirmed” and “probable.” This means, just like state data, the counts won’t all be comparable across states—even though it’s coming from the federal government.
To date, 33 jurisdictions are reporting both confirmed and probable deaths to the CDC, 3 are reporting just confirmed deaths, and 24 are reporting only a figure for “total deaths,” which appears to include both probable and confirmed deaths. Even among states providing a breakdown of confirmed and probable deaths, different definitions exist. For example, Michigan appears to count a patient who dies with a positive result for COVID-19 via antigen test as a confirmed death, but Georgia would count this as a probable death.
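To make the three reporting patterns above concrete, here is a minimal sketch that classifies a jurisdiction by which death fields it fills in. The function name, the labels, and the sample rows are our own illustrations, not part of the CDC dataset.

```python
from typing import Optional

def reporting_mode(confirmed: Optional[int],
                   probable: Optional[int],
                   total: Optional[int]) -> str:
    """Classify a jurisdiction by which death fields it reports."""
    if confirmed is not None and probable is not None:
        return "confirmed_and_probable"
    if confirmed is not None:
        return "confirmed_only"
    if total is not None:
        return "total_only"
    return "no_data"

# Hypothetical rows: (jurisdiction, confirmed, probable, total)
rows = [
    ("A", 900, 100, 1000),    # reports the full breakdown
    ("B", 500, None, 500),    # reports confirmed deaths only
    ("C", None, None, 750),   # reports a single total
]
modes = {name: reporting_mode(c, p, t) for name, c, p, t in rows}
```

A check like this only shows which fields are present. It cannot tell you whether two jurisdictions mean the same thing by “confirmed” or “probable.”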
What’s in the NCHS deaths data
The NCHS dataset is based on death certificates from the National Vital Statistics System (NVSS) that states submit under the National Vital Statistics Cooperative Program and that have passed quality control procedures. As such, NCHS data aligns with how researchers define cause of death in epidemiological studies and is standardized across jurisdictions. The NCHS officially counts a death as attributed to COVID-19 when COVID-19 is listed either as an underlying cause of death (Part I of a death certificate) or as contributing to the cause of death (Part II of a death certificate). Because there can be a substantial lag between when a death occurs and when a certificate is submitted to, processed, and reviewed by the NCHS, the agency notes that its data is provisional: counts for more recent dates are likely undercounts and should be interpreted with caution, and the most recent two weeks of data are unstable.
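The counting rule and the two-week caution described above can be sketched roughly as follows. This is a simplification under stated assumptions: we represent each part of a certificate as a list of already-coded ICD-10 codes, we use `U07.1` (the ICD-10 code for COVID-19, which the text above does not name), and the function names are ours rather than anything NCHS publishes.

```python
from datetime import date, timedelta
from typing import Iterable

COVID_ICD10 = "U07.1"                 # assumption: certificates are already coded to ICD-10
UNSTABLE_WINDOW = timedelta(weeks=2)  # "most recent two weeks", per the NCHS caveat

def counts_as_covid_death(part_i: Iterable[str], part_ii: Iterable[str]) -> bool:
    """True if COVID-19 appears in Part I or Part II of the certificate."""
    return COVID_ICD10 in set(part_i) | set(part_ii)

def is_unstable(week_ending: date, as_of: date) -> bool:
    """True if the week ended less than two weeks before the as-of date."""
    return as_of - week_ending < UNSTABLE_WINDOW
```

For example, a week ending February 27, 2021 viewed on March 5, 2021 would be flagged as unstable, while a week ending February 6 would not.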
This difference in sources contributes to differences in how each agency’s data matches up with that of The COVID Tracking Project. In general, data from the CDC COVID Data Tracker almost perfectly matches state data, but is not standardized across jurisdictions. On the other hand, NCHS data does not exactly match state data, but definitions are comparable across state lines.
Federal COVID-19 Death Datasets
| | CDC COVID Data Tracker | NCHS |
|---|---|---|
| Source | States reporting daily updates on deaths directly to the CDC | Analysis of death certificates sent by states to the NVSS |
| Update frequency | Daily | Daily |
| Time interval | Daily | By MMWR week |
| Dating method | Date of report | Date of death |
| Location of death | Varies - residency, place of death | Place of death |
| Standardized definitions | No. Varies by jurisdiction | Yes |
It is important to note that neither dataset provides the definitive truth about the number of deaths associated with COVID-19. Each dataset is subject to delays and likely undercounts the true number of COVID-19 deaths. The CDC COVID Data Tracker reflects the problems state departments of health have been experiencing with their data collection pipelines. NCHS data, on the other hand, is likely affected by mistakes in the death certification process and by the overall delays resulting from manual review. Those delays can vary by state, which makes comparisons of recent data between states tricky even though definitions are standardized across state lines.
Because of these delays, both datasets are subject to revisions as states identify additional COVID-19 deaths and the federal government reviews and classifies death certificates.
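One practical consequence of these revisions is that a count downloaded today may not match the same count downloaded last week. A minimal sketch of how a data user might track this, assuming they keep dated snapshots keyed by (jurisdiction, date); the function name and sample values are hypothetical:

```python
from datetime import date
from typing import Dict, Tuple

Key = Tuple[str, date]  # (jurisdiction, date the count refers to)

def revised_counts(earlier: Dict[Key, int],
                   later: Dict[Key, int]) -> Dict[Key, Tuple[int, int]]:
    """Return {key: (old, new)} for counts that changed between two snapshots."""
    shared = earlier.keys() & later.keys()
    return {k: (earlier[k], later[k]) for k in shared if earlier[k] != later[k]}

# Hypothetical snapshots taken a week apart
snap_1 = {("A", date(2021, 2, 6)): 120, ("A", date(2021, 2, 13)): 80}
snap_2 = {("A", date(2021, 2, 6)): 120, ("A", date(2021, 2, 13)): 95,
          ("A", date(2021, 2, 20)): 60}
changes = revised_counts(snap_1, snap_2)
```

Here only the February 13 count is reported as revised; the February 20 count is new rather than revised, so it is left out.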
How to use the data
Both datasets share the same data apparatus operated by the CDC. This allows both datasets to be available through the agency’s website, as well as sharing a common API (application programming interface) infrastructure for automated querying. As such, accessing the datasets is a simple process for data users familiar with federal datasets.
CDC COVID Data Tracker
The CDC COVID Data Tracker’s central dataset on fatalities, United States COVID-19 Cases and Deaths by State Over Time, is updated daily and provides information from 60 jurisdictions: the 50 states, D.C., 8 territories (American Samoa, Federated States of Micronesia, Guam, Northern Mariana Islands, Puerto Rico, Palau, Republic of Marshall Islands, and the Virgin Islands), and information from New York City, which is reported separately from NY state. It includes six categories of information reported on deaths, though not all are reported for every jurisdiction:
the total number of confirmed deaths
the total number of probable deaths
the number of new deaths since last update
the number of new probable deaths since last update
the total number of deaths, which sums both new and historic deaths
and a field specifying whether this “total death” figure includes a state’s probable deaths in the count.
Here’s a look at how this dataset’s categories compare to CTP’s:
| CDC Description | CDC Column Name | CTP API Field Name |
|---|---|---|
| Date of counts | | |
| Date and time record was created | | |
| Total number of cases | | |
| Total confirmed cases | | |
| Total probable cases | | |
| Number of new cases | | |
| Number of new probable cases | | N/A |
| Total number of deaths | | |
| Total number of confirmed deaths | | |
| Total number of probable deaths | | |
| Number of new deaths | | |
| Number of new probable deaths | | N/A |
| Jurisdiction | | |
| If Agree, then confirmed and probable cases are included. If Not Agree, only total cases. | | N/A |
| If Agree, then confirmed and probable deaths are included. If Not Agree, only total deaths. | | N/A |
It is worth noting that several states have data reported for “new probable deaths” but no information given for “probable deaths” over time: Arkansas, Florida, Iowa, Kansas, Pennsylvania, New Hampshire, and West Virginia. Based on correspondence with state health officials, we believe this problem originates from crossed wires about sharing permissions between states and the federal government. (See our Federal Case Data 101 post for additional information.)
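A gap like this can be spotted with a simple scan of the rows. The sketch below is illustrative only: the dictionary keys `jurisdiction`, `probable_deaths`, and `new_probable_deaths` are placeholder names of our own, not the dataset’s actual column names, and the sample rows are made up.

```python
from typing import Dict, Iterable, List, Optional

Row = Dict[str, Optional[object]]

def missing_probable_totals(rows: Iterable[Row]) -> List[str]:
    """Jurisdictions with some new-probable-death values but no cumulative ones."""
    has_new = set()
    has_total = set()
    for row in rows:
        if row.get("new_probable_deaths") is not None:
            has_new.add(row["jurisdiction"])
        if row.get("probable_deaths") is not None:
            has_total.add(row["jurisdiction"])
    return sorted(has_new - has_total)

# Hypothetical rows using the placeholder keys described above
sample = [
    {"jurisdiction": "A", "probable_deaths": None, "new_probable_deaths": 2},
    {"jurisdiction": "A", "probable_deaths": None, "new_probable_deaths": 0},
    {"jurisdiction": "B", "probable_deaths": 40, "new_probable_deaths": 1},
    {"jurisdiction": "C", "probable_deaths": None, "new_probable_deaths": None},
]
flagged = missing_probable_totals(sample)
```

In this sample only jurisdiction A is flagged, since B reports both fields and C reports neither.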
In addition to its main dataset, the CDC COVID Data Tracker provides data tables with demographic breakdowns of fatality information by race and ethnicity, age, and sex.
The NCHS data comprises multiple associated data tables, listed on an index page. These tables include fatality counts broken down by demographic factors such as age, sex, and race and ethnicity. Additionally, the NCHS provides other specialized measures of mortality, such as total deaths from all causes (also known as all-cause mortality), excess death estimates, and fatalities coded as due to pneumonia or influenza.
Most data tables provided by NCHS are updated weekly. The exception is the overall data table, titled provisional COVID-19 deaths by week, which is updated daily even though its counts are still aggregated by week. Each data table can be accessed either through direct download from the CDC data portal or with the Socrata Open Data API.
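For readers who have not used the Socrata Open Data API before, a query is an HTTPS request to a dataset’s resource endpoint with SoQL parameters such as `$where` and `$limit`. The sketch below only builds such a URL and makes no network call. The dataset identifier `xxxx-xxxx` is a placeholder to be replaced with the real one from the data portal, the column name `state` in the filter is an assumption, and we assume the portal is hosted at `data.cdc.gov`.

```python
from urllib.parse import urlencode

def soda_url(dataset_id: str, where: str = "", limit: int = 1000,
             domain: str = "data.cdc.gov") -> str:
    """Build a Socrata (SODA) JSON query URL for one dataset."""
    params = {"$limit": limit}
    if where:
        params["$where"] = where  # SoQL filter expression
    # Keep "$" readable in parameter names; other characters are percent-encoded.
    query = urlencode(params, safe="$")
    return f"https://{domain}/resource/{dataset_id}.json?{query}"

# Placeholder dataset ID and an assumed column name, for illustration only
url = soda_url("xxxx-xxxx", where="state = 'NY'", limit=5)
```

The resulting URL can be fetched with any HTTP client. Larger pulls would need paging with the `$offset` parameter, which this sketch leaves out.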
The NCHS data comes with robust documentation on the data tabulation procedure. Each data table mentioned has its own page, with detailed caveats and explainers. Sections on “Understanding the Numbers” and “Understanding Death Certificate Data” give users information about how provisional counts are tabulated, as well as the different quality issues with regard to the death certificate pipeline. More information can be found in the “Technical Notes” section as well as the FAQ.
Due to differences in sourcing, users of data from The COVID Tracking Project should be aware of the following:
NCHS does not require laboratory confirmation to count a death as due to COVID-19; factors such as clinical presentation can bear on the classification. Likewise, an individual may die with a positive COVID-19 test but not be counted by NCHS if COVID-19 did not contribute to their death.1
The terms “confirmed” and “probable” used in NCHS communications refer to how COVID-19 is listed on the death certificates, not to the COVID-19 case definition provided by the Council of State and Territorial Epidemiologists.
Data from NCHS are labeled as “provisional” due to delays in the process of submitting a death certificate. More details can be found in the “Technical Notes” section on the NCHS website. As such, recent data should be considered incomplete, and previously published counts will be revised as new certificates are processed. The agency suggests that this lag may be on the order of one to two weeks.
NCHS data is organized by date of death, while data from The COVID Tracking Project is organized by date of report. As a result, unlike COVID Tracking Project data, historical NCHS data does not exhibit a “death lag” due to reporting.
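When lining up daily, report-dated counts against NCHS’s weekly figures, it helps to roll the daily numbers up into MMWR weeks first. The sketch below assumes the standard MMWR convention (weeks run Sunday through Saturday, and week 1 is the first week with at least four days in the calendar year); the function names and sample counts are ours. Note that this only matches the time interval: the rolled-up counts are still dated by report, not by death.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Tuple

def _week1_start(year: int) -> date:
    """Sunday that begins MMWR week 1 (the week containing January 4)."""
    jan4 = date(year, 1, 4)
    return jan4 - timedelta(days=(jan4.weekday() + 1) % 7)

def mmwr_week(d: date) -> Tuple[int, int]:
    """Return (MMWR year, MMWR week number) for a calendar date."""
    year = d.year
    if d >= _week1_start(year + 1):   # late December can fall in next year's week 1
        year += 1
    elif d < _week1_start(year):      # early January can fall in the prior year's last week
        year -= 1
    return year, (d - _week1_start(year)).days // 7 + 1

def weekly_totals(daily: Dict[date, int]) -> Dict[Tuple[int, int], int]:
    """Sum daily counts into MMWR weeks."""
    totals: Dict[Tuple[int, int], int] = defaultdict(int)
    for day, count in daily.items():
        totals[mmwr_week(day)] += count
    return dict(totals)

# Hypothetical daily counts around the 2020/2021 year boundary
weekly = weekly_totals({date(2020, 12, 27): 3, date(2021, 1, 2): 4, date(2021, 1, 3): 5})
```

In this example the first two dates fall in MMWR week 53 of 2020 and the third begins week 1 of 2021.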
How to make federal death data better
These two datasets provide a multi-faceted look at US deaths associated with COVID-19. Even in their current state, these datasets are valuable, but we think they can be even better and more helpful to a wider range of data users.
First, we recommend that documentation of both datasets be complete and centralized. The CDC COVID Data Tracker includes a note stating that definitions of confirmed and probable deaths are “standardized,” but we believe they can’t be, since they so closely match the very unstandardized confirmed and probable totals that states report. The CDC Tracker team should document how each state defines the confirmed and probable COVID-19 deaths it counts. Documentation explaining the deaths reported in the CDC COVID Data Tracker is also spread across multiple parts of the agency’s website, including footnotes on the bottom of its Cases and Deaths by State page, on its FAQ page, and in its daily Community Profile Report. We recommend that this documentation be compiled in a central location.
NCHS deaths data is more clearly explained—for this dataset, we recommend only that documentation be more centralized and clearly labeled.
Second, we would like to suggest that the CDC publish a brief introduction to the two datasets, including the differences between them, and the contexts in which one is more useful or appropriate than the other. This would help non-specialists find information more quickly and help prevent confusion when users encounter both datasets at the same time.
Our suggestions focus on delivering helpful, detailed documentation on the nuances of these datasets because we believe that thorough explanations enhance trust in this vital information.
Additional research and contributions from Dave Luo and Peter Walker.
1 Even though counting such deaths is theoretically possible in state reports, states regularly revise and re-examine deaths data. While revisions can occur due to accidentally counting non-COVID deaths, the number of mistakes is usually small.
Joseph Bensimon is a member of the COVID Tracking Project’s Data Quality team.
Quang P. Nguyen is a PhD candidate in the Department of Epidemiology at Dartmouth College.