Earlier this month, the Trump administration ordered a sweeping change in how hospitals should transmit some COVID-19 data to the federal government.
Starting almost immediately, it said, most hospitals should stop reporting the number of patients with COVID-19 to the Centers for Disease Control and Prevention. Instead, hospitals should report those and other figures to the Department of Health and Human Services, the Cabinet-level agency that oversees the CDC. The new guidelines—which immediately became a national and highly political story—took effect on Wednesday, July 15.
Hospital data is important. Since May, the number of people in the hospital with COVID-19 has been one of the best measurements of the size and severity of outbreaks across the United States—and one of the few indicators of the strain on local health-care systems. Because we collect data directly from state and territorial governments, and not the federal government, we initially did not think that the new federal rules would substantially affect our process.
But two weeks after the rules began, it’s clear that technical requirements associated with the new guidelines have caused major problems. Some of the states facing the largest COVID-19 outbreaks—such as California, Texas, and South Carolina—have warned that they are not reporting accurate hospital information due to the switchover.
These problems mean that our hospitalization data—a crucial metric of the COVID-19 pandemic—is, for now, unreliable, and likely an undercount. We do not think that either the state-level hospitalization data or the new federal data is reliable in isolation. (As we describe below, the new federal hospitalization figures are substantially higher than the same data as reported by most states.)
That said, these problems only affect hospitalization data. Our case, testing, and death data from states continues to show the effects of testing and reporting backlogs and day-of-the-week reporting differences, but is as reliable as it was before the new directive. And we find no evidence to support a popular online conspiracy theory that the switchover from the CDC system to the Health and Human Services system explains a national plateau in new coronavirus cases. The theory is unsupportable: As we explain below, hospitals do not report case count or testing data to the federal government, and those trends in the data are reflected across too many different independent sources to be subject to centralized tampering.
What we’ve seen in the data
Because The COVID Tracking Project collects testing and outcomes data directly from state and territorial public health authorities, we were initially optimistic about the changeover’s effects on the data we compile. Starting on July 15, however, when the new requirements came into effect, we began to see problems with the data on current COVID-19 hospitalizations and related metrics on how many of those patients were admitted to the ICU or were using mechanical ventilators. Idaho, Missouri, South Carolina, and Wyoming were unable to publish all of the hospital data that we normally compile, and publicly documented that the problem was due to the changeover.
In at least three states, the problem seemed to be that the states had been reporting hospitalization data obtained from the CDC’s systems, rather than directly from hospitals, which meant they had to make the switch to the new Health and Human Services (HHS) system. (We also noticed that reported current COVID-19 hospitalizations were dropping in several other states, while cases and cumulative hospitalization numbers were either flat or rising, but those states did not disclose a reason.)
Those problems appeared to have been short-lived, as all the states that posted early warnings got their hospitalization data back online. But by July 22, the problems had spread. California, then Texas, posted notices on their dashboards stating that some hospitals were not reporting COVID-19 data due to complications related to the changeover to HHS systems. This week, South Carolina’s COVID-19 hospitalization data went missing again, and we’ve continued to see declines in other states that do not appear to match up with local trends in case counts.
A note about our methodology: When states posted quality warnings that listed the HHS changeover as a cause for the instability of their hospitalization data, we “froze” current hospitalization data in our dataset for those states, carrying forward the numbers from the last day when they did not post a data-quality warning, and making notes in the public annotations in the dataset.
Freezing data points is a choice of last resort for our Data Entry team, to be used when they encounter unusual, transient problems, like dashboards that crash, or states that miss reporting deadlines. Now that it has become clear that these data-quality problems may continue for some time, we’ve unfrozen all hospitalization data and are now reporting the data states report, despite their warnings that the data is incomplete. Ultimately, our data collection methodology is to report what states report, on the day when they report it, and if this changeover period lasts for weeks, it would become increasingly misleading to keep states’ hospitalization data frozen.
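The carry-forward rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual tooling: the function name `freeze_series`, its inputs, and the sample numbers are all hypothetical.

```python
def freeze_series(values, warned):
    """Carry forward the last value from a day without a data-quality warning.

    values: daily current-hospitalization counts as a state reported them.
    warned: parallel list of booleans, True when the state posted a warning.
    Both inputs are hypothetical stand-ins for a real dataset.
    """
    frozen = []
    last_good = None  # most recent value from a day with no warning
    for value, has_warning in zip(values, warned):
        if has_warning and last_good is not None:
            # Freeze: repeat the last value reported without a warning.
            frozen.append(last_good)
        else:
            # No warning, or no earlier clean day to fall back on.
            frozen.append(value)
            if not has_warning:
                last_good = value
    return frozen


# Hypothetical five-day series with warnings on days three and four.
reported = [100, 110, 60, 55, 120]
warnings = [False, False, True, True, False]
print(freeze_series(reported, warnings))  # [100, 110, 110, 110, 120]
```

Unfreezing, in this sketch, simply means publishing `reported` as-is. The longer a warning persists, the further the frozen series can drift from what the state is actually reporting.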
These problems raise other questions about the relationship between state and federal data. If these states are getting their COVID-19 hospitalization data directly from hospitals, as we believe most of them are, why has the change in federal reporting requirements caused these gaps in the data? There may be several reasons, according to in-depth reporting by trade publication Healthcare IT News:
The change in requirements included very little time for hospitals to transition from one method to another—only two days for many healthcare systems. By contrast, HHS reporting changes issued on June 4 to laboratories are scheduled to take effect on August 1. The new HHS requirements also mandated that hospitals report more data points than many states had required, forcing hospitals to make extremely rapid changes in both IT systems and administrative processes.
The time required to enter all the data into both the HHS and state systems can take four to six hours per hospital each day—an overwhelming burden for many hospitals with large numbers of COVID-19 patients or staffing shortages.
In areas with low population density, rural hospitals with tiny administrative staff numbers must report dozens of data points to the HHS each day—including weekends—whether or not any COVID-19 patients have been admitted.
Some states that intend to report data to the HHS on behalf of their hospitals may not yet have received federal approval to do so, leaving states scrambling to report all their data twice on an interim basis, as Healthcare IT News reports has been the case in New Mexico.
Whatever the causes—which likely vary across states—the result has been that the current COVID-19 hospitalization data from states that was highly stable a few weeks ago is currently fragmented, and appears to be a significant undercount. In an unexpected twist, however, the hospital data made available by the HHS from the dataset they collect from hospitals differs from the states’ own reporting: for most states, the HHS hospitalization data is substantially higher than the same data as reported by the states.
HHS hospitalization data vs. state-reported hospitalization data
We compared hospital data published by the HHS with the same numbers as reported by states, and found substantial discrepancies. On average, the HHS reported 24 percent more patients hospitalized with COVID-19 than did the states.
In some states with major outbreaks, HHS hospitalization data shows a high degree of daily fluctuation compared to the characteristically steady data reporting from states. In Florida and Texas, HHS is reporting many more patients hospitalized with COVID-19 than the states report, with significant day-to-day fluctuation. In other states, like Alabama and Georgia, HHS reports a consistently higher number than the states do, but without marked fluctuation. In California and Arizona, the HHS and state numbers are a close match, with the states’ numbers outpacing the federal count on some days in our seven-day comparison.
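For readers who want to run a similar comparison, here is a minimal Python sketch of how a percent difference between two hospitalization counts can be computed. The state names and numbers are invented for illustration. The two averaging methods shown are assumptions: the text above does not say which method produced the 24 percent figure.

```python
def percent_difference(hhs_count, state_count):
    """Percent by which the HHS count exceeds (or trails) the state count."""
    return (hhs_count - state_count) * 100.0 / state_count


def compare(counts):
    """counts maps a state name to a (hhs_count, state_count) pair.

    Returns the per-state differences, their simple mean, and the
    difference between the summed totals. All inputs are hypothetical.
    """
    per_state = {name: percent_difference(h, s) for name, (h, s) in counts.items()}
    mean_of_states = sum(per_state.values()) / len(per_state)
    total_hhs = sum(h for h, _ in counts.values())
    total_state = sum(s for _, s in counts.values())
    aggregate = percent_difference(total_hhs, total_state)
    return per_state, mean_of_states, aggregate


# Invented numbers, not real HHS or state data.
sample = {"State A": (1200, 1000), "State B": (450, 500), "State C": (900, 600)}
per_state, mean_of_states, aggregate = compare(sample)
print(per_state)
print(round(mean_of_states, 1), round(aggregate, 1))
```

The two summary numbers differ because the simple mean weights every state equally, while the aggregate is dominated by states with the most patients. A negative per-state value means the state reported more patients than HHS did.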
Once again, there are several possible reasons for the discrepancies between state and federal datasets.
In some states, hospitals may be reporting data to the HHS but not to their state public health authorities; we know that in California and Texas, the states are not receiving complete data from some percentage of hospitals, because the states have posted warnings on their COVID-19 dashboards to that effect. (We don’t know if those same hospitals are reporting complete data to HHS.)
States may be posting lower numbers because their definitions of COVID-19 hospitalizations are more restrictive than the federal definitions. The HHS reports data on all COVID-19 hospitalizations, including suspected cases. But some states may omit suspect or probable cases from their figures. Other states may, like Florida, only report patients with a primary diagnosis of COVID-19, which potentially excludes patients who entered the hospital for another condition, then tested positive after admission and became seriously ill with COVID-19.
States that get current hospitalization data from their state hospital associations may not be reporting any hospitalization data from Veterans Affairs and other federal hospitals. We are currently conducting outreach to states to determine how widespread this practice is.
We may be seeing some combination of hospitals double-reporting in error as they get up to speed with the new reporting requirements, and data-entry errors in other cases, but this probably does not explain the national, unidirectional discrepancy between HHS and state reporting.
So what does this all mean for people trying to interpret the data? Until we see the data stabilize at the state level and understand more about the reasons why the state and federal datasets for current COVID-19 hospitalizations don’t match up, we would urge caution in using either state-reported or HHS hospital data in isolation to understand local outbreaks or the burden on healthcare systems.
Case counts are not changing because of hospital data reporting
In the past two weeks, we have seen considerable confusion arise about what changes and gaps in hospitalization data may mean for other COVID-19 data points. Most notably, the idea that declining COVID-19 case counts at the national level are the result of the changeover to HHS systems is based on a misunderstanding of where case data comes from.
The data hospitals used to report (directly or through states) to the CDC and now report (directly or through states) to HHS is about how many people have been admitted to hospitals with COVID-19, and about each hospital’s stock of medication, supplies, and PPE. The directive lists all the data points hospitals are required to report.
This data from hospitals does not include case counts or total tests. Testing and case data originates in laboratories (which are also being required to report richer data to HHS by August 1) and passes through state and territorial public health authorities. Each state’s government decides which data to make public on their official dashboard and daily reports—and this is the information we collect at The COVID Tracking Project.
Because the two data streams—hospital-based and laboratory/health authority-based—handle different data points, it is not possible that any change in federal reporting requirements for hospitals has a causative role in the change in the direction of COVID-19 case counts at the state or national level. Further, the HHS website directs visitors to the CDC’s official case count for information on cases and testing. And again, the specific data hospitals are reporting to the HHS—not cases, but hospitalizations—shows higher counts in the federal data source than in the official state data sources.
What we’re left with
COVID-19 data in the United States emerges from an intricate patchwork of sources across states and territories. Each data point is produced and transmitted by individual humans in labs, hospitals, and health departments, passing through many systems on its way to the public and the federal government. Sometimes the human factors are extremely apparent: case and testing data shows a dramatic day-of-week effect, dipping low on weekends and Mondays and rising throughout the week.
Until the last two weeks, the data that comes from hospitals has been an exception—it also originates with people making notes about individual patients in IT systems, but it is highly consistent from day to day, because hospital reporting never lets up, even on holidays and weekends. Data on current hospitalizations is also not subject to the kind of backlogs and lags we see in case, testing, and death data reporting, which makes it an indispensable metric for understanding outbreaks in near real-time.
Right now, this critically important data is less stable than it has ever been while overextended hospital staff try to change reporting systems midstream, but based on accounts from states and hospitals about their ongoing efforts to make the directed changes, we continue to hope that the current instabilities will be temporary.
The hospitalization data that the HHS is publishing to date also suggests that we may soon have multiple, differently useful datasets available to help us understand the reality of COVID-19 hospitalizations. For each of these datasets—federal and state-reported—to be maximally useful, we need to get through the current period of instability, and understand why the two hospitalization counts differ, where they do. In the interim, we will continue to watch both sets of numbers closely and report out what we find here on the blog and in our daily release notes on Twitter.
Erin Kissane is a co-founder of The COVID Tracking Project and the project’s managing editor.
Robinson Meyer is a co-founder of The COVID Tracking Project and a staff writer at The Atlantic.
Peter Walker is Head of Marketing & Growth at PublicRelay and Data Viz Co-Lead at The COVID Tracking Project.