When SARS-CoV-2 (the virus that causes COVID-19) first emerged, public health officials across the US had some quick work to do. First, they needed to learn enough about the virus to understand how and when to test for its presence. Next came manufacturing needs, and the distribution of tests. Data pipelines had to be built, to funnel diagnostic test results up from labs, and all of this needed to happen as fast as humanly possible.
What health officials also knew was that they needed a way to tell if someone probably had COVID-19—a way to account for the situations of virus spread among individuals who weren’t being tested. So, in April 2020, the CDC adopted a definition for probable COVID-19 cases. The definition was based on exposure and symptoms, and it was an attempt to signal to state health officials that there could be one standardized way to interpret a probable case.
In those early months, probable cases comprised only a small fraction of the national case count, likely because the probable case definition relied on contact tracing and symptom tracking, which were difficult for states to perform at scale. Then, as antigen tests became more widely available and able to quickly identify COVID-19 with a high degree of certainty, the CDC expanded the probable case definition to make room for antigen test results, even in the absence of exposure and symptoms. It was only with this expanded definition, adopted in August 2020, that probable cases made a dent in the national case volume.
As we’ve seen across the range of COVID-19 metrics, though, just because there was (some) federal guidance didn’t mean states were following it. With probable cases, how and when states chose to incorporate the CDC’s guidance varied. By now, most states do use a probable case definition that includes antigen positive individuals, but there are still a few outlier states using different or unclear definitions. It is this veneer of uniformity, with its quiet definitional discrepancies and the varying journeys states took toward similar conclusions, that this piece will explore.
Current definitions: confirmed, probable, and suspect cases of COVID-19
To track any infectious disease, you need what’s called a “surveillance definition,” a set of criteria used to measure the spread of a disease on a large scale (rather than making individual clinical diagnoses). The Council of State and Territorial Epidemiologists (CSTE) sets the surveillance definitions used by CDC for many infectious diseases, and COVID-19 was no exception: The organization has issued two iterations of definitional guidelines for COVID-19 cases since the pandemic began, in April 2020 and August 2020, that were adopted by the CDC shortly thereafter. The current guidelines call for states and the CDC to track three categories of COVID-19 cases—confirmed cases, probable cases, and suspect cases. The CSTE recommends that states track confirmed and probable cases publicly, while suspect cases are meant as an internal category for disease surveillance.
Confirmed cases of COVID-19 are the most straightforward: When a PCR test comes back positive, the individual is declared a confirmed case. PCR tests are often referred to as “gold standard tests” for COVID-19 because of their high sensitivity, which means they are able to pick up even very small amounts of viral RNA, even early in an infection. There is also a low chance of false positives.
Probable cases can be identified in three main ways.
The primary way currently uses antigen testing, which is generally known to have lower sensitivity than PCR or other viral RNA detection tests. A person with a low level of viral infection might get a false negative result. However, antigen tests are highly specific tests, meaning that false positive results are unlikely. Because these tests are cheaper than PCR tests and produce results much more quickly, they can be used to rapidly identify individuals with high levels of virus who are likely to infect others.
The second way to identify a probable case is through symptoms (clinical criteria) and likely exposure (epidemiological linkage). In other words, if an individual is experiencing certain symptoms known to be associated with COVID-19, and they were in a situation where they had known exposure to a probable or confirmed case, they could be classified as a probable case without a test result or with a test result pending.
Finally, deceased individuals with COVID-19 listed on a death certificate as the presumed cause of death or significant condition contributing to death and without a positive laboratory result can be classified as a probable case. This is different from being classified as a probable COVID-19 death—though the COVID-19 case definitions are often used by jurisdictions to determine COVID-19 deaths as well.
The diagram below illustrates the specific combinations of criteria that the CSTE says should be used to classify a case as probable in the absence of a laboratory test result:
In the hierarchy of case classification, suspect cases represent individuals with the lowest level of evidence that they are presently infected with COVID-19. These individuals may have a positive antibody test—indicating past infection, but probably not a current infection. The CSTE recommends that states and the CDC keep track of these individuals internally but not include them in public COVID-19 case counts.
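The classification hierarchy described above can be sketched in a few lines of Python. This is a simplified illustration based only on the criteria as summarized in this piece, not the CSTE's full position statement; the `CaseEvidence` fields and the `classify_case` function are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class CaseEvidence:
    """Evidence available for one individual (field names are illustrative)."""
    pcr_positive: bool = False
    antigen_positive: bool = False
    antibody_positive: bool = False
    meets_clinical_criteria: bool = False
    has_epi_linkage: bool = False
    covid_on_death_certificate: bool = False


def classify_case(e: CaseEvidence) -> str:
    """Return 'confirmed', 'probable', 'suspect', or 'not a case'."""
    # Confirmed: a positive PCR result is sufficient on its own.
    if e.pcr_positive:
        return "confirmed"
    # Probable, route 1: a positive antigen test, even without symptoms or exposure.
    if e.antigen_positive:
        return "probable"
    # Probable, route 2: symptoms plus likely exposure, with no positive PCR result.
    if e.meets_clinical_criteria and e.has_epi_linkage:
        return "probable"
    # Probable, route 3: COVID-19 on the death certificate, no positive lab result.
    if e.covid_on_death_certificate:
        return "probable"
    # Suspect: antibody evidence only (tracked internally, not counted publicly).
    if e.antibody_positive:
        return "suspect"
    return "not a case"


print(classify_case(CaseEvidence(antigen_positive=True)))   # probable
print(classify_case(CaseEvidence(antibody_positive=True)))  # suspect
```

The order of the checks mirrors the hierarchy: stronger evidence wins, so a person with both a positive PCR and a positive antigen test is counted once, as a confirmed case.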
What changed? The evolution of case definitions
As mentioned, the current CSTE case definition is the second iteration, an update on the original one from April 2020. While the definition of a confirmed case did not change with the updated version, the definition of a probable case was adjusted to reflect the growing understanding of COVID-19’s clinical presentation, patterns of spread, and changing availability of lab tests.
To understand the changes, it’s helpful to know how CSTE case classifications work. Each case classification is made up of a set of criteria: laboratory criteria, clinical criteria, epidemiological criteria, and vital records criteria. These can be combined to standardize what it means to identify a confirmed or probable case of COVID-19.
Laboratory criterion changes
Between April and August 2020, the most significant changes were to the laboratory criterion:
| | April 5, 2020 (CSTE-01) | August 5, 2020 (CSTE-02) |
| --- | --- | --- |
| Positive antigen | Detection of specific antigen in a clinical specimen AND accompanied by clinical or epidemiologic criteria | Detection of SARS-CoV-2 by antigen test in a respiratory specimen |
| Positive antibody | Detection of specific antibody in serum, plasma, or whole blood indicative of a new or recent infection | No longer considered a probable case |
Changes in how the criterion can be used: Beginning in August, laboratory evidence on its own—absent any other criteria—indicated a probable case. (The original April 2020 definition required clinical or epidemiological criteria to be satisfied, as well, in order to be considered a probable case.)
Changes to the criterion itself: Antibody tests were removed as laboratory evidence, leaving only antigen tests. The internal (not public) suspect cases category was added as a new category for patients who tested positive on antibody tests.
As antigen tests became more prevalent, they proved to be highly reliable at indicating active COVID-19 infections. Meanwhile, unlike both antigen and PCR tests, antibody tests identify prior rather than current infections. They detect the presence of specific antibodies, and according to the CDC, it can take one to three weeks after the infection for your body to produce detectable antibodies. For that reason, antibody tests are not appropriate for diagnosing probable or confirmed cases of COVID-19.
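To make the two laboratory rules concrete, here is a minimal sketch of how the April and August versions treat the same test results. It encodes only the differences described in this section; the function name and the version labels `"CSTE-01"` and `"CSTE-02"` used as arguments are assumptions made for illustration.

```python
def lab_evidence_indicates_probable(
    version: str,
    antigen_positive: bool,
    antibody_positive: bool,
    clinical_or_epi_met: bool,
) -> bool:
    """Does laboratory evidence make this a probable case under the given version?"""
    if version == "CSTE-01":
        # April 2020: antigen or antibody evidence counted, but only
        # when clinical or epidemiologic criteria were also satisfied.
        return (antigen_positive or antibody_positive) and clinical_or_epi_met
    if version == "CSTE-02":
        # August 2020: a positive antigen test is enough on its own;
        # antibody results no longer indicate a probable case.
        return antigen_positive
    raise ValueError(f"unknown definition version: {version}")


# An asymptomatic person with a positive antigen test and no known exposure:
print(lab_evidence_indicates_probable("CSTE-01", True, False, False))  # False
print(lab_evidence_indicates_probable("CSTE-02", True, False, False))  # True
```

The second call shows why the August update mattered for case volume: the same antigen result that was not enough in April became sufficient by itself.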
Epidemiological and clinical criteria
The epidemiological linkage criteria were adjusted to reflect a better understanding of COVID-19 transmission:
The updated definition does not require close contact with someone who is sick with a COVID-19-like illness and linked to a confirmed COVID-19 case, reflecting an understanding that asymptomatic transmission of COVID-19 is common.
In addition, travel to an area outside of the US with sustained, ongoing community transmission of COVID-19 was removed as a necessary epidemiological linkage factor, as most of the United States itself could be described as having sustained ongoing community transmission.
| | April 5, 2020 (CSTE-01) | August 5, 2020 (CSTE-02) |
| --- | --- | --- |
| Contact with case | Close contact (within six feet for 15 consecutive minutes or more) with a confirmed or probable case of COVID-19 disease; | Close contact with a confirmed or probable case of COVID-19 disease; |
| Contact with ill person linked to case | Close contact with a person with: clinically compatible illness AND linkage to a confirmed case of COVID-19 disease. | Removed |
| Travel to high-transmission area | Travel to or residence in an area (outside of the US) with sustained, ongoing community transmission of SARS-CoV-2. | Removed |
| Risk cohort | Member of a risk cohort as defined by public health authorities during an outbreak. | Member of a risk cohort as defined by public health authorities during an outbreak. |
Definitions of clinical criteria were also modified as COVID-19 symptoms became better understood. Most notably, in the original probable case definition, the loss of taste and smell needed to be present with other symptoms, whereas the updated definition counted the loss of taste and smell (olfactory disorder) as a COVID-19 symptom on its own.
In addition, the symptoms listed in the original definition were almost exclusively respiratory. However, as more and more people presented with gastrointestinal symptoms and extreme fatigue, these additional symptoms were also added to the list of possible COVID-19 symptoms.
| April 5, 2020 (CSTE-01) | August 5, 2020 (CSTE-02) |
| --- | --- |
| At least two of the following symptoms: fever, chills, rigors, myalgia, headache, sore throat, new olfactory and taste disorder(s) | At least two of the following symptoms: fever, chills, rigors, myalgia, headache, sore throat, nausea or vomiting, diarrhea, fatigue, congestion or runny nose |
| OR at least one of the following symptoms: cough, shortness of breath, or difficulty breathing | OR any one of the following symptoms: cough, shortness of breath, difficulty breathing, new olfactory disorder, new taste disorder |
| OR severe respiratory illness with at least one of the following: clinical or radiographic evidence of pneumonia, OR acute respiratory distress syndrome (ARDS). | OR severe respiratory illness with at least one of the following: clinical or radiographic evidence of pneumonia, OR acute respiratory distress syndrome (ARDS). |
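The clinical criteria in the table above amount to a small rule: two symptoms from one list, or one from another, or severe respiratory illness. A rough sketch follows. It simplifies the symptom vocabulary (loss of taste and smell is collapsed into one hypothetical label, `"new olfactory or taste disorder"`), and the names `CRITERIA` and `meets_clinical_criteria` are illustrative rather than part of any official tool.

```python
SMELL_TASTE = "new olfactory or taste disorder"  # simplified single label

BASE_TWO_OF = {"fever", "chills", "rigors", "myalgia", "headache", "sore throat"}
BASE_ONE_OF = {"cough", "shortness of breath", "difficulty breathing"}

# For each version: (symptoms needing at least two, symptoms sufficient alone)
CRITERIA = {
    "CSTE-01": (BASE_TWO_OF | {SMELL_TASTE}, BASE_ONE_OF),
    "CSTE-02": (
        BASE_TWO_OF
        | {"nausea or vomiting", "diarrhea", "fatigue", "congestion or runny nose"},
        BASE_ONE_OF | {SMELL_TASTE},
    ),
}


def meets_clinical_criteria(symptoms, version, severe_respiratory_illness=False):
    """Check a set of symptom labels against one version of the definition."""
    reported = set(symptoms)
    two_of, one_of = CRITERIA[version]
    return (
        len(reported & two_of) >= 2      # at least two from the first list
        or bool(reported & one_of)       # or any one from the second list
        or severe_respiratory_illness    # or pneumonia evidence / ARDS
    )


# Loss of smell on its own qualifies only under the updated definition.
print(meets_clinical_criteria({SMELL_TASTE}, "CSTE-01"))  # False
print(meets_clinical_criteria({SMELL_TASTE}, "CSTE-02"))  # True
```

The same check shows the effect of the added symptoms: a person reporting only fatigue and diarrhea fails the April criteria but meets the August ones.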
How did these changes affect the data?
The COVID Tracking Project’s work has focused on understanding and contextualizing COVID-19 data, so we wanted to take a closer look at the probable cases states reported. Namely, we wanted to see what impact, if any, the CSTE’s definitional changes had on the data that was reported. Our research was driven by these questions:
We knew when CSTE updated the case definition for probable cases, but when did states actually adopt the updated definition?
Did the updated definition have an impact on the number of probable cases that were being reported?
Were those effects the same across states?
Our analysis looked at data gathered during our year of daily data collection, which stopped on March 7, 2021. As of that date, the reporting of probable cases was still very uneven across the country. Six states and territories (California, Iowa, Maryland, Missouri¹, Nevada, and the Northern Mariana Islands) did not report probable cases. Among states that did report probable cases, most converged on using the updated case definitions in the fall of 2020, but there’s regional variation in how the definitions have been implemented and interpreted.
States’ adoption of CSTE-02 definitions
Of the 55 states and territories we tracked, 36 had been providing separate counts of confirmed and probable cases as of March 7, 2021.² (When we previously wrote on this topic, in June 2020, we identified only 25 states providing separate counts.)
Of the remaining states, seven reported combined confirmed and probable cases, six states reported confirmed cases only, and six states reported cases with an unclear definition.
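As a quick sanity check, the four reporting categories above should add up to the 55 jurisdictions tracked. The short sketch below uses only the counts given in the text; the dictionary keys are informal labels chosen for this example.

```python
reporting_categories = {
    "separate confirmed and probable counts": 36,
    "combined confirmed and probable count": 7,
    "confirmed cases only": 6,
    "unclear definition": 6,
}

total = sum(reporting_categories.values())

# Share of all tracked jurisdictions in each category, as a percentage.
shares = {
    label: round(100 * count / total, 1)
    for label, count in reporting_categories.items()
}

print(total)  # 55
for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

Roughly two-thirds of jurisdictions fall into the first category, which is the group the rest of the analysis relies on.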
Of the 36 states that report confirmed and probable cases separately, 35 used the updated (CSTE-02) definition as adopted by the CDC in August 2020. One state, Vermont, uses a custom probable case definition. Most states that switched to the updated definition moved in September and October 2020, though two of those states, Delaware and Mississippi, still display the original case definition on their dashboards. (We reached out to officials at Delaware’s Department of Health and Social Services, who confirmed that they are using the updated definition and said they will be updating the language. A spokesperson from Mississippi’s Department of Health said they weren’t aware the previous case definition was on the state’s website but confirmed use of CSTE-02.)
Trends in probable case reporting
It’s important to remember that the number of COVID-19 cases will never be a measurement of the number of infections, in part due to the fact that many people across the country, and particularly people in communities hit hard by COVID-19, might not have had easy access to PCR or antigen tests, or might have decided against seeking one. Still, probable case numbers aren’t insignificant. Given the trends we’re seeing from states that do separately report probable and confirmed cases, we can assume that in the six states reporting confirmed cases only, large numbers of would-be probable cases aren’t making it into any public-facing official count. Indeed, our analysis shows that starting in the fall of 2020, an overall increase in the number of probable cases appears in almost every state.
Our analysis also looked at whether probable cases changed as a share of total reported cases over the course of the pandemic. An increase in that share would indicate that states were tracking probable cases more effectively than in the summer of 2020. If probable cases did not grow as a share of total cases, the increase in the number of probable cases was simply an artifact of rising cases as a whole.
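The distinction between a rising count and a rising share can be sketched with a small calculation. The numbers below are invented for illustration and are not drawn from any state's data; the function names are hypothetical, and the sketch assumes cumulative confirmed and probable counts at two dates.

```python
def probable_share(confirmed: int, probable: int) -> float:
    """Probable cases as a fraction of all reported (confirmed + probable) cases."""
    total = confirmed + probable
    return probable / total if total else 0.0


def share_growth(start: tuple, end: tuple) -> float:
    """Factor by which the probable share grew between two (confirmed, probable) snapshots."""
    start_share = probable_share(*start)
    if start_share == 0:
        raise ValueError("growth factor is undefined when the starting share is zero")
    return probable_share(*end) / start_share


# Hypothetical state A: probable cases grow much faster than total cases.
state_a = share_growth(start=(99_000, 1_000), end=(400_000, 100_000))

# Hypothetical state B: probable cases quintuple, but so does everything else.
state_b = share_growth(start=(90_000, 10_000), end=(450_000, 50_000))

print(round(state_a, 2), round(state_b, 2))
```

In state A the share rises from 1% to 20% of all cases, while in state B the share stays at 10% even though the probable count itself grew fivefold, which is the "artifact of rising cases" scenario.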
We looked at the 21 states and territories that reported probable and confirmed cases separately and for which we’ve collected continuous data from August 5, 2020—the day the updated CSTE definition was issued—until March 7, 2021. We found a growing share of probable cases, by as much as 20x in a few states:
Differences in interpretation and use between states
There’s no denying that the decision to count antigen positive test results as probable cases contributed to the growth of probable numbers, but the extent to which antigen tests alone drove that growth isn’t easy to parse. Given the varying definitions across states, it’s especially difficult to interpret this data. In some states, a probable case of COVID-19 translates to an antigen positive test result unequivocally; in others, broader symptomatic and epidemiological criteria have played a role in defining the probable count.
(A quick word of caution, too: It’s important to remember that with many antigen tests happening outside of traditional healthcare settings, the data reporting pipelines might not be set up to actually send all antigen test results to state officials.)
There are nine states that report both probable cases and either antigen positive individuals or antigen positive tests, and that report them all smoothly enough and for a long enough period that we can spot meaningful trends.³ In five of those nine states (Arkansas, Connecticut, Tennessee, Maine, and Virginia), probable cases very closely track trends in antigen positive test results.
Our suspicion is that in these states, officials are identifying probable cases only (or almost entirely) through antigen tests, and we suspect these decisions were made because the scale of the pandemic grew too large to track symptomatic and epidemiological criteria in a meaningful enough way.
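One way to put a number on whether two series "closely track" each other is to correlate their daily increases. The piece does not say how the comparison was actually made, so the sketch below is only one plausible approach: it uses a hand-written Pearson correlation, and the data and function names are made up for the example.

```python
from math import sqrt


def daily_increase(cumulative):
    """Convert a cumulative series into day-over-day increases."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no trend to compare
    return cov / sqrt(var_x * var_y)


# Invented cumulative counts for one state over five days.
probable_cases = [100, 120, 150, 200, 230]
antigen_positives = [90, 110, 141, 190, 221]

r = pearson(daily_increase(probable_cases), daily_increase(antigen_positives))
print(round(r, 3))
```

A coefficient near 1 would be consistent with probable cases being identified almost entirely through antigen tests, though any cutoff for "closely" is a judgment call, and data dumps or reporting lags can distort daily series.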
Four additional states—Florida, Georgia, Indiana, and Louisiana—have notes on their websites about using only antigen tests to identify probable cases. And one more state, Massachusetts, decided to use only antigen tests and death certificates on September 2, 2020, cutting cases that only met the clinical and epidemiological criteria out of its count. A note from the state about the definitional shift provided some insight into their decision: “In order to provide a single set of consistent data for tracking COVID-19 in Massachusetts, today’s data includes only probable cases identified through antigen testing or death certificates. These criteria are the most objective and able to be applied over time.”
That day, we noticed a sharp decrease in several data points for Massachusetts; the state’s overall case count dropped by more than 7,000.
There are four states (Kentucky, Ohio, South Carolina, and Texas) where trends in antigen positives do not closely track probable case counts. In three of those four states, all except South Carolina, the antigen test numbers have largely been lower than probable cases.⁴ There are a handful of possible explanations for these discrepancies. Some of the difference could be due to sourcing differences between test and case metrics: for example, states could be reporting only electronically transmitted antigen tests while also including faxed reports in probable case counts. But most likely, this difference reflects use of the symptomatic and epidemiological criteria, as Texas officials confirmed it did in their state.
You might think that the decision to track only antigen-identified probable cases would result in an undercount of cases overall—especially given that removing the symptomatic and epidemiological criteria initially led to such a large drop in Massachusetts. But probable cases grew as a share of total cases by a comparable amount in these two groups of states between November 1, 2020 and March 7, 2021—approximately doubling in both the states where probable cases don’t track with antigen positives and in the states where they do.
The near identical growth in both groups leads us to believe that state testing strategies played an important role in shaping probable case counts: The states that dropped the symptomatic and epidemiological criteria likely have strong enough antigen testing programs to pick up on those individuals through testing, anyway. Identifying probable cases from just contact tracing and symptom exposure was a laborious task for state health departments, especially at scale, so with the rise in the availability of antigen tests, health departments had a more efficient method.
This research serves as a reminder that even if all states were to be using standardized definitions, comparisons would still be fraught due to differences in infrastructure and testing strategies.
Probable cases are real COVID-19 cases
Like so many COVID-19 data points, the probable cases metric is complex, and to interpret it accurately requires an understanding of its definition, practical application, and evolution. This is all especially important in a context where the difference between the number of cases and the number of positive PCR tests has become a common source of confusion and of disinformation—with some alleging that counting probable cases inflates case numbers.
The reasons for tracking probable cases aren’t obvious, nor are they consistent across states, but the differences stem from legitimate decision-making rooted in epidemiology, not from anything nefarious or negligent. Given the high specificity of COVID-19 antigen tests, it’s entirely reasonable for states to be counting these positive test results as probable cases. In fact, in many countries, an antigen positive constitutes a confirmed COVID-19 case.
The share of probable cases in the US’s overall COVID-19 case count didn’t grow because state health officials were itching to count just anybody. It grew because as the country’s understanding of COVID-19 and capabilities around testing for it evolved, so too did the ability to paint a fuller picture. There is no reason to be concerned that probable case counts are inflated; if anything, both confirmed and probable case counts, largely at the mercy of inadequate testing, underestimate the true scale of COVID-19 infections in the United States.
Additional research: Michal Mart, Jessica Malaty Rivera
Graphics: Dave Luo, Kara Schechtman, Lauran Hazan
On June 3, 2021, we updated language to reflect our confirmation of Vermont’s custom probable case definition with the state health department. We had been unable to reach them to confirm before publication.
¹ After our data collection stopped on March 7, 2021, Missouri began publishing a separate count of probable cases.
² Our daily data collection also looked at American Samoa, which we did not include in this analysis for the very best reason: The COVID Tracking Project never reported a single COVID-19 case for American Samoa.
³ The boundary between probable cases and antigen positive individuals has been porous since the updated probable case definition, or CSTE-02. There have been a few states that have shared “antigen positives” without clarifying whether they were counting probable cases, which will get revised down upon confirmation, or people testing positive via antigen, which will not get revised down.
⁴ Though daily trends differ, South Carolina has a very similar topline number of probable cases and antigen tests, suggesting the difference between trends in antigen positive tests and probable cases could be due to data dumps in one or the other that distorted trends.
Madhavi Bharadwaj is a data driven creative thinker and lifelong learner who is passionate about healthcare and public health.
Lauran Hazan is a product management leader with a minor data obsession. Currently in the Boston area on a new startup adventure.
Kara Schechtman is Data Quality Co-Lead for The COVID Tracking Project.
Sara Simon works on The COVID Tracking Project’s data quality team and is also a contributing writer. She most recently worked as an investigative data reporter at Spotlight PA and software engineer at The New York Times.