Data Annotations

What are annotations?

Annotations are per-state, per-metric structured notes on state reporting practices that have supported COVID Tracking Project decision making and research. Some examples of annotations we maintain are test type annotations (details about what kinds of tests states include in their main testing totals) and case definition annotations (what states mean when they say individuals have a “confirmed” or a “probable” case of COVID-19).

Since we are winding down our data collection, we hope this archive of metadata we’ve collected from states will be useful to researchers and data users aiming to understand state-level COVID-19 data and the methodologies we have used to collect and analyze it. You can read about our motivations for collecting these annotations and the work they’ve supported at CTP in our post on the subject.

Our data annotations are stored in Airtable, a simple database system that looks a lot like a very complicated spreadsheet.

Current Hospitalizations Annotations

You can read about what our current hospitalizations annotations mean and the work they’ve supported at CTP.

Our current hospitalizations annotations are stored in the Airtable below.

To find the current hospitalization metrics on a given state page, find the row for the state you are interested in and click on the caret in the “Source” column.

“Source Notes” are a set of instructions for finding the data on state pages and dashboards. States may have changed their reporting since we last looked at them and older annotations may not be reliable. All annotations have a “Last Checked” date that will let you see when we last looked at the metric in question.

You can find out how a state labels its current COVID-19 hospitalizations in the “State Subgroup Labels” column of the hospitalization annotations. This field lists what the state calls the groups it includes in its current hospitalizations metrics. If the annotation is listed as “unclear,” a data definition was either unavailable or missing information. “Not reported” indicates that a state does not report current COVID-19 hospitalizations.

The “Evidence” column includes the exact language of material on the state website that leads us to our conclusions. Definitions frequently change, so keep in mind that a metric may not mean the same thing now that it did previously. These annotations reflect only the most recent version of the metric or definition as of the last time we checked.

To determine whether a state lumps its hospitalization metrics together, look at the “Cases Reporting” column in Airtable. The cell will include the word “lumped” if the state-reported metric is lumped together. If the cell lists “unclear,” the state either did not provide a data definition or provided an unclear one.

Another major source of variation in hospitalizations definitions is whether states track adult patients, pediatric patients, or both. The vast majority of states are unclear about the populations for which they are tracking current hospitalizations. The “Population” column of this annotation set lists the population included in a state’s currently hospitalized COVID metric.
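If you export the hospitalization annotations from Airtable (for example, as a CSV file), the lumping and population checks described above can be sketched in Python. This is a minimal sketch under stated assumptions: the “Cases Reporting” and “Population” column names come from the table, but the “State” column, the sample rows (including the “separate” value), and the `hospitalization_flags` helper are hypothetical.

```python
import csv
import io

# Hypothetical CSV export of the hospitalization annotations table. The
# "Cases Reporting" and "Population" column names come from the table itself;
# the "State" column and every row below are made up for illustration.
SAMPLE_CSV = """State,Cases Reporting,Population
State A,lumped,Adult and pediatric
State B,unclear,unclear
State C,separate,Adult
"""


def hospitalization_flags(rows):
    """Summarize lumping and population for each state's current hospitalizations."""
    flags = {}
    for row in rows:
        reporting = row["Cases Reporting"].strip().lower()
        flags[row["State"]] = {
            # The cell includes the word "lumped" when the metric is lumped together.
            "lumped": "lumped" in reporting,
            # "unclear" means no definition was provided, or it was unclear.
            "definition_unclear": reporting == "unclear",
            "population": row["Population"].strip(),
        }
    return flags


flags = hospitalization_flags(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for state, info in flags.items():
    print(state, info)
```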

Case, Test and Death Annotations

To use our annotations, you’ll need to reference four public Airtable tables: Annotation Sets, Definitions, Annotations, and State Links. All of them are embedded below with a few details about what each contains.

Annotation sets

Annotations belong to different annotation sets, which are categories of labels we apply to a given group of metrics in our API. For example, “test types” is an annotation set that we use to track the types of tests—antigen, viral RNA, or antibody—that are included in all our total test metrics.

These are the different annotation sets we maintain and what metrics they apply to:

Definitions

Each annotation set contains labels, which we use to categorize different kinds of state reporting. Each label has a definition attached to it and conventions for how we affix it to metrics, which you can find in the following table:

Annotations

We create annotations for the different metrics that we track using the labels from our annotation sets. Metrics can have multiple annotations, for two reasons:

They can have annotations from multiple different annotation sets. For example, we may annotate a case metric both for whether it includes non-residents of a given state and for which CSTE case definition it follows.
They can have conflicting sources of evidence. We track information from state health department webpages, outreach, and external media reporting. Sometimes we may get different information from outreach to a state health department than what it posts on its website, or even two conflicting pieces of information from the same kind of source.
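The difference between those two reasons can be sketched as a small data structure. In this hypothetical Python example, annotations on the same metric from different annotation sets sit side by side, while differing labels within a single annotation set are flagged as conflicts. The `Annotation` fields, set names, labels, and the `find_conflicts` helper are illustrative assumptions, not the actual Airtable schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    # Hypothetical field names mirroring the concepts described above.
    state: str
    metric: str
    annotation_set: str
    label: str
    source_type: str


def find_conflicts(annotations):
    """Group labels by (state, metric, annotation set); return groups with more than one label."""
    labels = defaultdict(set)
    for a in annotations:
        labels[(a.state, a.metric, a.annotation_set)].add(a.label)
    return {key: sorted(found) for key, found in labels.items() if len(found) > 1}


annotations = [
    # Two annotation sets on one metric: not a conflict.
    Annotation("State A", "total cases", "case definitions", "confirmed and probable", "website"),
    Annotation("State A", "total cases", "non-residents", "includes non-residents", "website"),
    # Website and outreach disagree within one annotation set: a conflict.
    Annotation("State A", "total tests", "test types", "viral RNA only", "website"),
    Annotation("State A", "total tests", "test types", "viral RNA and antigen", "outreach"),
]
print(find_conflicts(annotations))
```

Keying on the annotation set keeps the first reason (multiple sets) from being mistaken for the second (conflicting evidence).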

At a minimum, we always annotate a metric with labels for any annotation sets that apply to it reflecting what the state says about the subject in its documentation on its website, for a simple reason: we want to know how clear the state is being in its public presentation of the data.

To be labeled with the “website” source type, we expect the information the state provides to be accessible from its data pages and presented as an evergreen resource with the clear intent of defining a metric. Examples include data definition documents, dashboard footnotes, or definitions appearing daily in press releases. If the state is not providing that kind of information, we label the metric as “unclear” according to the state health department website before searching for other sources of information: external reporting, our own outreach to state health department officials, or buried resources on the state website (categorized as “sleuthing”).
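The website-first rule can be expressed as a short function. This is a sketch under assumptions: each annotation for a single metric and annotation set is a dict with hypothetical `label` and `source_type` keys, the `public_clarity` helper is hypothetical, and a missing website annotation is treated as “unclear.”

```python
def public_clarity(annotations):
    """Return the website label for one metric/annotation set, plus fallback labels.

    Each annotation is a dict with hypothetical "label" and "source_type" keys.
    """
    website = [a["label"] for a in annotations if a["source_type"] == "website"]
    # Assumption: a metric with no website annotation is treated as "unclear".
    website_label = website[0] if website else "unclear"
    fallback = []
    if website_label == "unclear":
        # Other sources are consulted once the website label is "unclear".
        fallback = [(a["source_type"], a["label"])
                    for a in annotations if a["source_type"] != "website"]
    return website_label, fallback


example = [
    {"label": "unclear", "source_type": "website"},
    {"label": "confirmed and probable", "source_type": "outreach"},
    {"label": "confirmed and probable", "source_type": "sleuthing"},
]
print(public_clarity(example))
```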

State Links

To find the metrics we are discussing, use the State Link corresponding to the link name in the first step of the “Metric” column of Annotations, which provides wayfinding for the data points we are annotating on state health department webpages. The link you should reference is the one in the State Links column named after the first stage of the instructions (the text before the first caret).
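A minimal sketch of that lookup in Python, assuming the steps of a Metric note are separated by a `>` character and that one state's links are available as a mapping from link name to URL. The `state_link_for` helper, the note text, and the example.org URLs are hypothetical.

```python
def state_link_for(metric_note, state_links, separator=">"):
    """Return (link name, URL) for the first step of a Metric note.

    The URL is None when no State Link matches that name.
    """
    # The link name is the text before the first separator.
    first_step = metric_note.split(separator, 1)[0].strip()
    return first_step, state_links.get(first_step)


# Hypothetical State Links for one state.
links = {
    "Dashboard": "https://example.org/dashboard",
    "Data definitions": "https://example.org/definitions",
}
print(state_link_for("Dashboard > Testing tab > Total tests", links))
# ('Dashboard', 'https://example.org/dashboard')
```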

Please note: These Metric notes are directly taken from our internal data entry instructions, so they may contain warnings or other internal instructions. You can visit our data source notes if you would like a version of the metric notes geared toward public consumption.

Warning for users

Since our project ends March 7, 2021, we are releasing the annotations as a one-time snapshot of our research into state and territorial definitions, rather than a constantly updating source of information. State COVID-19 information changes quickly, so be aware that the annotations may not be comprehensive, for a few reasons:

States may have changed their reporting since we last looked at them. We maintain two of our annotation sets, for test types and case definitions, on a weekly basis. However, others we only revisit every few months. All annotations have a “Last Checked” date that lets you see when we last looked at the metric in question. Older annotations may not be reliable.
Not everything is double-checked. Experienced contributors double-check all annotations before we use them in a formal analysis. But because of the speed at which we’ve needed to move to keep up with states, we don’t always double-check annotations used for rough internal decisions about research directions. Some of these annotations could be incorrectly classified. You can tell whether an annotation has gone through a double-checker’s review by looking for a “Yes” in the “Doublechecked” field or a green bar on the left side of the annotation (annotations that have not been double-checked have a yellow bar).
Our annotations have covered different topics in the past. Over the course of the pandemic, some reporting practices have solidified to the point that we don’t need to maintain annotations on them anymore. This new resource only provides our current annotations; we’ll be thinking about how to release the old ones as we archive our data over the next few months.
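If you want to work only with annotations that have been double-checked and recently reviewed, a filter like the following may help. It is a sketch that assumes exported rows carry the “Doublechecked” and “Last Checked” fields, with dates in YYYY-MM-DD format; the `reliable_annotations` helper, the 30-day threshold, and the sample rows are arbitrary choices for illustration.

```python
from datetime import date, timedelta


def reliable_annotations(rows, as_of, max_age_days=30):
    """Keep rows that were double-checked and last checked within max_age_days of as_of.

    Assumes "Last Checked" is an ISO date string (YYYY-MM-DD); the 30-day
    default is an arbitrary threshold, not a project rule.
    """
    cutoff = as_of - timedelta(days=max_age_days)
    kept = []
    for row in rows:
        checked = date.fromisoformat(row["Last Checked"])
        # "Yes" in the Doublechecked field marks a reviewed annotation.
        if row["Doublechecked"] == "Yes" and checked >= cutoff:
            kept.append(row)
    return kept


rows = [  # Hypothetical rows.
    {"Metric": "total tests", "Doublechecked": "Yes", "Last Checked": "2021-03-01"},
    {"Metric": "total cases", "Doublechecked": "Yes", "Last Checked": "2020-12-01"},
    {"Metric": "deaths", "Doublechecked": "", "Last Checked": "2021-03-05"},
]
print(reliable_annotations(rows, as_of=date(2021, 3, 7)))
```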

Please also note that the case, test, and death annotation tables described above do not contain our annotations on hospitalizations metrics. Those annotations are stored in a separate Airtable, described in the Current Hospitalizations Annotations section.