Despite making up less than 1% of the nation's population, people living in nursing homes and other long-term-care (LTC) facilities account for at least 35% of the nation’s COVID-19 deaths. But even gathering enough data to support that statement has required a laborious process of stitching together unstandardized datasets published by the 56 states and territories we track. With our data collection ending on March 7, 2021, we’d like to introduce you to the Centers for Medicare and Medicaid Services (CMS) Nursing Home dataset: the only federal dataset on COVID-19’s effects on the staff and residents of long-term-care facilities.

What’s in the CMS data

The CMS Nursing Home dataset includes 120 metrics spanning confirmed COVID-19 cases and deaths, suspected COVID-19 cases and deaths, PPE availability, staff shortages, COVID-19 testing, number of occupied beds, and more, all of which are listed in the data dictionary for the dataset. The dataset is accompanied by summary notes, an FAQ, detailed descriptions of CDC/CMS quality assurance processes, and documentation that includes mapping tools, data export options, API documentation, and email addresses for user support.

Nursing homes have been required to report COVID-19 data to CMS since May 17, 2020. They report it through the long-term-care module of the CDC’s National Healthcare Safety Network (NHSN). Facilities may submit data several times a week but must submit at least once a week, and CMS has published the data weekly since May 24, 2020. CMS warns that the first week of data (for the week ending May 24, 2020) should not be used for trend or longitudinal analysis. Nursing homes, assisted living facilities, and care facilities for people with developmental disabilities are all eligible to report data to the NHSN’s COVID-19 Module, but only nursing homes are required to report, and CMS’s Nursing Home dataset, as its name suggests, includes data only from nursing homes. (The CMS reporting week runs from Monday to Sunday, so a dataset row labeled 5/31/2020 covers the reporting week of May 25 to May 31.)
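Because each row carries only a week-ending label, it helps to convert that label into the Monday-to-Sunday span it covers and to set aside the first published week. The sketch below assumes, as described above, that every label falls on a Sunday; the function names are our own, not part of any CMS tool.

```python
from datetime import date, timedelta

# CMS warns against using the first published week for trend analysis.
FIRST_WEEK_ENDING = date(2020, 5, 24)

def reporting_week(week_ending: date) -> tuple:
    """Return the (Monday, Sunday) span covered by a CMS week-ending label.

    Assumes reporting weeks run Monday through Sunday, so every
    week-ending label should fall on a Sunday.
    """
    if week_ending.weekday() != 6:  # Monday is 0, Sunday is 6
        raise ValueError(f"{week_ending} is not a Sunday")
    return week_ending - timedelta(days=6), week_ending

def usable_for_trends(week_ending: date) -> bool:
    """Exclude the first week, which CMS says should not be used for trends."""
    return week_ending > FIRST_WEEK_ENDING

start, end = reporting_week(date(2020, 5, 31))
print(start, end)  # 2020-05-25 2020-05-31
print(usable_for_trends(date(2020, 5, 24)))  # False
```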

Notably, the CMS data includes a resident census, the number of residents living in a given nursing home, which is a crucial reference point for turning raw counts into rates. The documentation provided by CMS includes several important cautions, including warnings that erroneous reports may be suppressed and that facilities can correct data after submission. It also reminds users that access to COVID-19 testing has been limited throughout the pandemic, which necessarily shapes the data itself.
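As a rough illustration of using the census as a denominator, the sketch below pools deaths and census across facilities and scales the result to 1,000 residents. The `FacilityWeek` record and its field names are hypothetical stand-ins; real column names should be taken from the CMS data dictionary, and the sample numbers are made up.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class FacilityWeek:
    # Hypothetical record; real column names come from the CMS data dictionary.
    facility_id: str
    resident_deaths: int   # COVID-19 resident deaths reported that week
    resident_census: int   # residents living in the facility that week

def deaths_per_1000(rows: List[FacilityWeek]) -> Optional[float]:
    """Pooled rate: total deaths divided by total census, per 1,000 residents."""
    census = sum(r.resident_census for r in rows)
    if census == 0:
        return None  # no residents reported, so no meaningful rate
    return 1000 * sum(r.resident_deaths for r in rows) / census

# Made-up example: 2 deaths among 250 residents across two facilities.
rows = [FacilityWeek("A", 2, 100), FacilityWeek("B", 0, 150)]
print(deaths_per_1000(rows))  # 8.0
```

Pooling totals before dividing, rather than averaging each facility's rate, keeps small facilities from having outsized influence on a state-level figure.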

What’s missing in the CMS data—and what’s better

The CMS Nursing Home dataset was first published on May 24, 2020, nearly three months after the first COVID-19 death among LTC residents was confirmed at Life Care Center of Kirkland, Washington. Facilities were not required to backfill historical COVID-19 cases or deaths, and it appears that many did not.1 As a result, CMS’s dataset, like our own long-term-care COVID-19 data, is a known undercount of cases and deaths, and it should not be used as a cumulative measure of COVID-19’s impact on nursing home residents.

Additionally, the CMS data does not attempt to cover all long-term-care facilities in the country. It includes only nursing homes, which are federally regulated and house an estimated 1,347,600 residents, according to data released by the National Center for Health Statistics in 2016. Assisted living, residential care, and other facilities not counted as nursing homes house an estimated 811,500 residents.

These two factors, the May start without mandatory backfill and the restriction to federally regulated facilities, appear to explain why the CMS dataset reports about 43,000 fewer COVID-19 deaths than the states report for their long-term-care facilities, according to our compiled data.

A note for researchers: In a brief review on March 3, 2021, we found that 23 states currently distinguish between nursing homes and other types of facilities in the facility-level data on their own websites. Where states publish data on assisted living facilities and other long-term-care facilities separately from nursing home data, it may be possible to combine federal data on nursing homes with state-reported data on other facilities to arrive at a more complete estimate of a state’s experience with COVID-19 in long-term-care facilities. (Researchers undertaking such a comparison should carefully study the data definitions and time periods available in both state and federal databases, and should note that different datasets are being used to estimate the pandemic’s effects in the various facility types.)

Unlike the long-term-care COVID-19 data reported by the states, the CMS Nursing Home dataset appears to be largely standardized. The data we compiled from states varied widely in data definitions, categorizations, date ranges, and levels of granularity. States used varying definitions of COVID-19 cases, deaths, and outbreaks, as well as of facility types, and some states didn’t break out resident and staff cases and deaths, making cross-state comparisons nearly impossible. The federal dataset’s standardization of these and other metrics is a substantial advantage for data users.

How CMS Nursing Home data compares to COVID Tracking Project data

As noted above, the CMS Nursing Home dataset covers only a subset of long-term-care facilities: federally regulated nursing homes. This difference complicates direct comparison, but it’s still useful to roughly measure the gap between the two sets of figures. For example, deaths in the CMS data account for 27 percent of the nation’s COVID-19 deaths, while deaths in the COVID Tracking Project LTC dataset account for at least 35 percent as of our most recent update.

In many states, the trend lines for the two datasets are very similar, even when the absolute numbers vary substantially. Because the CMS dataset covers only nursing homes, we would expect its figures to be lower than state-reported figures in states that report on a wider range of long-term-care facilities. This is not always the case, however: The CMS data for 12 states currently includes more deaths for nursing homes alone than those states report for all long-term-care facilities on their own websites.2 These unexpected findings could result from definitional differences, unstandardized state reporting, or artifacts of our compilation process, and the data definitions and time periods represented in the state-reported and federal data for those states merit further inquiry by researchers.
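A check like the one behind our 12-state count can be sketched as a simple comparison of two per-state totals. The function name and the state codes and figures below are hypothetical placeholders, not real counts; states missing from either source (as Arizona is from state reporting) are skipped rather than compared.

```python
from typing import Dict, List

def states_where_cms_exceeds(cms_deaths: Dict[str, int],
                             state_ltc_deaths: Dict[str, int]) -> List[str]:
    """Return states whose CMS nursing-home death count exceeds the
    state-reported death count for all long-term-care facilities.

    Only states present in both sources are compared.
    """
    shared = cms_deaths.keys() & state_ltc_deaths.keys()
    return sorted(s for s in shared if cms_deaths[s] > state_ltc_deaths[s])

# Hypothetical placeholder figures for illustration only.
cms = {"XA": 500, "XB": 1200, "XC": 90}
states = {"XA": 650, "XB": 1100}
print(states_where_cms_exceeds(cms, states))  # ['XB']
```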

For the state of Arizona, where a court ruled that the state could keep data on COVID-19 in nursing homes and other long-term-care facilities confidential, the CMS dataset provides the only direct measure of the pandemic’s effects on LTC residents in the state. (Because Arizona’s state health officials do not report this data, The COVID Tracking Project has used public information from Maricopa County and Pima County, the state’s largest counties, to provide a partial estimate of cases and deaths in LTCs.)

How to use the CMS Nursing Home dataset

Working with the CMS Nursing Home dataset can be a challenge because the file contains nearly 600,000 rows of data, each with 120 columns, and grows larger each week. We’ve created a practical user's guide to help data users learn how to acquire the data through CMS’s data hub or its API. Most other questions can be answered by the federal documentation or other documents linked from the dataset’s landing page.
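Because the full file is large, one workable approach is to stream a downloaded CSV row by row instead of loading it all into memory. The sketch below totals weekly resident deaths for one state and skips rows flagged as failing quality assurance. The column names are assumptions based on the CMS data dictionary and should be verified against the current release; the function names and the sample rows are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable

# Assumed column names; verify against the CMS data dictionary before use.
STATE_COL = "Provider State"
WEEK_COL = "Week Ending"
DEATHS_COL = "Residents Weekly COVID-19 Deaths"
QA_COL = "Passed Quality Assurance Check"

def weekly_resident_deaths(lines: Iterable[str], state: str) -> Dict[str, int]:
    """Stream CSV rows and total weekly resident deaths for one state."""
    totals: Dict[str, int] = defaultdict(int)
    for row in csv.DictReader(lines):
        if row[STATE_COL] != state:
            continue
        if row.get(QA_COL, "Y") == "N":  # skip rows flagged by CMS QA checks
            continue
        value = row[DEATHS_COL].strip()
        if value:  # a blank cell means nothing was reported
            totals[row[WEEK_COL]] += int(float(value))
    return dict(totals)

def weekly_resident_deaths_from_file(path: str, state: str) -> Dict[str, int]:
    """Open a downloaded CSV (hypothetical path) and stream it."""
    with open(path, newline="", encoding="utf-8") as f:
        return weekly_resident_deaths(f, state)

# Made-up sample rows for illustration only.
SAMPLE_CSV = (
    "Week Ending,Provider State,Residents Weekly COVID-19 Deaths,"
    "Passed Quality Assurance Check\n"
    "05/31/2020,CT,2,Y\n"
    "05/31/2020,CT,1,Y\n"
    "05/31/2020,CT,5,N\n"
    "05/31/2020,NY,3,Y\n"
    "06/07/2020,CT,,Y\n"
)
print(weekly_resident_deaths(io.StringIO(SAMPLE_CSV), "CT"))  # {'05/31/2020': 3}
```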

Bonus data: Vaccination Data for LTCs

Although The COVID Tracking Project has never compiled vaccine data, we have been monitoring available sources for this data since vaccinations began in the US. Long-term-care vaccine data is available from four different sources, each at a different level of detail. 

In every state except West Virginia, residents of long-term-care facilities are mostly vaccinated through the CDC’s Federal Pharmacy Partnership for Long-Term Care Program. (West Virginia developed its own program and offers no public data on long-term-care vaccinations.) In this program, most doses are administered by CVS and Walgreens, with other local pharmacies participating in some areas. Federal vaccine data from the Federal Pharmacy Partnership can be found on the CDC’s COVID Data Tracker.

The CDC provides state-level vaccine data with total first doses and total second doses administered, but this state-level data is not broken out by staff and resident doses. The CDC breaks out staff and resident doses only for the national total of doses administered in the program. The state-level data is updated daily and can be downloaded from the tracker as a spreadsheet.

Data collected by the Federal Pharmacy Partnership program is recorded in the federal vaccine reporting system called Tiberius. Tiberius data provides a view of how many doses residents and staff have received at individual facilities, but the Tiberius data is currently made public on a daily basis by only one state (that we know of): South Carolina. The Tiberius data includes indications of which pharmacy held vaccine clinics (CVS or Walgreens), the city, the facility name, the total number of beds at the facility, first and second dose by residents, and first and second dose by staff. Our researchers have been able to obtain unpublished Tiberius data directly from several states, many of which provide it upon request.

The final source of vaccine data for LTCs is CVS and Walgreens themselves. CVS updates its data on weekdays around 4pm ET, and Walgreens updates its data daily around 6pm ET. Both pharmacies’ data is collected from the clinics they hold at facilities.

CVS provides state-level data by facility type (Nursing Homes and Assisted Living/Other LTC Facilities), total doses administered, and number of facilities assigned per state. Walgreens provides state-level data by facility type, date when the program started per state, number of assigned facilities per state, and total vaccines administered. Neither company breaks down data by staff and residents.

Additional federal datasets for cross referencing

CMS also offers several related datasets that can be useful for researchers and reporters looking at COVID-19 in nursing homes. We’ve included a brief list below and created an example of how to link multiple datasets together with CMS data using the CMS Facility ID.

| Dataset name | What it has |
|---|---|
| Provider Information | Star ratings, staffing hours per resident, profit status |
| Penalties | Fine amount, date of penalty |
| Ownership | Ownership type, owner title |
| Quality Measure | Quality of resident care |
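Linking these datasets comes down to a join on the facility ID. The sketch below is a minimal left join in plain Python, assuming both files have already been read into lists of dictionaries and share a hypothetical key column name. It normalizes IDs to six-character strings, since reading them as numbers (as spreadsheets often do) can strip leading zeros and break matches.

```python
from typing import Dict, List

def normalize_id(value: object) -> str:
    """Treat facility IDs as text, restoring leading zeros lost to numeric parsing."""
    return str(value).strip().zfill(6)

def left_join(covid_rows: List[Dict[str, str]],
              provider_rows: List[Dict[str, str]],
              key: str) -> List[Dict[str, str]]:
    """Attach provider attributes to every COVID-19 row with a matching ID.

    Rows without a match are kept unchanged; on a column-name collision,
    the COVID-19 row's value wins.
    """
    lookup = {normalize_id(p[key]): p for p in provider_rows}
    joined = []
    for row in covid_rows:
        match = lookup.get(normalize_id(row[key]), {})
        joined.append({**match, **row})
    return joined

# Hypothetical column names and values for illustration only.
KEY = "Federal Provider Number"
covid = [{KEY: "015009", "Week Ending": "05/31/2020"},
         {KEY: "999999", "Week Ending": "05/31/2020"}]
provider = [{KEY: "15009", "Overall Rating": "3"}]  # leading zero lost upstream
result = left_join(covid, provider, KEY)
print(result[0]["Overall Rating"])  # 3
```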

How federal data can be better

The CMS Nursing Home dataset is already excellent in many respects, but we have a few recommendations for making it even stronger. 

  • Get the historical data. The federal government should require nursing homes to backfill their historical COVID-19 data to include the earliest data they have. The pandemic’s deadly first surge in these facilities is still only patchily documented and ill-understood, and compiling a complete record of known cases and deaths will be an invaluable resource for researchers and policymakers. 

  • Develop a comprehensive reporting system for all long-term-care facilities. State and federal authorities should work together to ensure that all long-term-care facilities—not just federally regulated skilled nursing facilities—are required to report pandemic data to a single authority. Without such a change, it will be impossible to track the true effects of COVID-19—and future pandemics—on this highly vulnerable population.

  • Provide current facility and population counts. The federal government should provide public estimates of the current number of LTC facilities in the United States by facility type, along with estimates of resident and staff populations. Without this information, it’s very difficult to contextualize the effects of the pandemic—or of vaccination programs—in these facilities. 

  • Release the facility-level vaccination data. The federal government should release the vaccination data gathered in the Tiberius system at the state and facility level, along with any available information on quality assurance and anomaly detection. Even if this data is imperfect, detailed information about the vaccination rollout in LTC facilities is too important to remain secret.

Although they are both far from comprehensive, the long-term-care dataset we’ve compiled from states and the CMS Nursing Home dataset tell the same story: US long-term-care facilities were not prepared for a pandemic like COVID-19, and the losses in these facilities have been devastating. With more than 2.5 million US LTC staff and residents now fully vaccinated, we have seen major reductions in COVID-19 deaths in these facilities, and we expect to see these numbers continue to drop. After working closely with this data for the better part of a year, we believe that both better data and better—and fully funded—regulations will be required at every level of American government to ensure that the unthinkable losses we saw in these facilities this year can never happen again.

Research from: Zach Lipton, Jonathan Gilmour, Julia Kodysh, Conor Kelly

1 For example, Connecticut began publishing data on COVID-19 deaths in nursing homes on its official state website on April 16, 2020, but our researchers suspect that some of those early deaths and cases are not included in the CMS dataset: In early June, Connecticut reported 809 more nursing home resident and staff deaths on its own site than appeared in the CMS data. Our researchers also compared early state-reported data for individual facilities with the corresponding CMS data to see whether the differences persisted, and they did.

2 The 12 states are Alaska, Alabama, Arkansas, Hawaii, Iowa, Kentucky, Missouri, Ohio, Oklahoma, Tennessee, West Virginia, and Wyoming.


