This page lists and describes all the data, metadata, and related information we’ve released publicly since The COVID Tracking Project began. It will soon be joined by a complete list of all the documentation and posts we’ve published about the data.

Testing and outcomes

National testing and outcomes data

Cumulative daily totals of national-level metrics for cases, tests, hospitalizations, and outcomes.

How to use it

This dataset aggregates all the state-level testing and outcomes data on the national level and measures the movement of the COVID-19 pandemic in the US over time.
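The aggregation described above can be pictured as a per-date sum over state rows. The following is a minimal sketch, not the project's actual pipeline: the field names (`date`, `state`, `positive`, `death`), the placeholder state codes, and the choice to skip missing values are all assumptions made for illustration.

```python
from collections import defaultdict

def national_totals(state_rows):
    """Sum state-level cumulative values into one national row per date.

    Each row is a dict with "date" and "state" keys plus numeric metrics.
    Missing values (None) are skipped; this is an assumption of the sketch.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for row in state_rows:
        for metric, value in row.items():
            if metric in ("date", "state") or value is None:
                continue
            totals[row["date"]][metric] += value
    # Return plain dicts, ordered by date string (ISO dates sort correctly).
    return {day: dict(metrics) for day, metrics in sorted(totals.items())}

# Illustrative rows with made-up numbers and placeholder state codes.
rows = [
    {"date": "2020-07-01", "state": "AA", "positive": 100, "death": 5},
    {"date": "2020-07-01", "state": "BB", "positive": 50, "death": None},
    {"date": "2020-07-02", "state": "AA", "positive": 120, "death": 6},
]
national = national_totals(rows)
```

Because a state that does not report a metric contributes nothing to the sum, a national total built this way can understate the true figure on dates when some states are missing.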

State testing and outcomes

State-level metrics for cases, tests, hospitalizations, and outcomes.

How to use it

This data can provide a “snapshot” of different COVID-19 metrics between states, while the linked state-level historical data can show how important measures have evolved over time.

Related federal data

State testing and outcomes data source notes

Exact sources and state-level instructions that describe how the state-level testing and outcome metrics are found or calculated.

How to use it

Source notes show the provenance of each data point in our state testing and outcomes dataset.

State screenshots

Screenshots of every state health department webpage from which we collect COVID-19 data, updated four times a day.

How to use it

These screenshots can be used to validate the numbers that are reported in the State Testing and Outcomes Data. They also provide a frequently-updated historical archive of state COVID-19 dashboards that are difficult for other internet archiving tools to record (for example, many internet archiving tools cannot capture ArcGIS).

State recovery definitions

State-level terminology and definitions for how a COVID-19 recovery is defined.

How to use it

This provides important context when interpreting the outcomes data in the State Testing and Outcomes dataset. Recoveries are defined in highly inconsistent ways, and you can read more in our post on the subject.

State antigen lumping

State cards label each jurisdiction’s reporting practices for antigen total testing according to the information available on that jurisdiction’s official website.

How to use it

These annotations are useful as an evaluation of states’ transparency about the test types they include in their total test figures. If a state is labeled as “Unclear,” its documentation needs work. They’re also useful for contrasting official definitions with what it appears that states are actually doing—please see our post The State of State Antigen Test Reporting for specific examples.

Data quality GitHub issues

Public log of every change made to the state-level data. Each “issue” contains a description of the problem and a link to the issue, and each “patch” provides a description of the issue, the date and state affected, and how each number was changed.

How to use it

This GitHub repository provides transparency into why values in our datasets may have been altered. It is a historical record of corrections, patches, and backfills in our data.

Race & ethnicity

COVID Racial Data Tracker

State-level metrics for tests, cases, hospitalizations, and deaths, broken down by race and ethnicity (where available). For most jurisdictions, we have data for cases and deaths only.

How to use it

This dataset can be used to examine the disproportionate effects of the COVID-19 pandemic on racial and ethnic communities within US states and territories, see how disparities have changed over time, and understand what is happening nationwide.

Related federal data

Long-term care

Long-term care tracker

CSV files of every long-term-care facility we collect data for, and every state’s total cumulative and outbreak numbers.

How to use it

This is the same as the data that appears in our LTC facility map and the individual state LTC pages. We currently link to these files from every state’s LTC page.

Related federal data

State-level aggregate long-term care dataset

This dataset is the most representative of the total impact of COVID-19 in long-term-care facilities. Some states report all cases and deaths ever (cumulative) and some report only recent cases and deaths (outbreak). For states that provide only recent cases and deaths (outbreak), the aggregate dataset records the highest cases and deaths ever reported on a single day and carries this number forward unless more cases and deaths are reported on a subsequent single day. CTP’s aggregated data for these states drastically under-reports actual cumulative totals because it reflects only a single-day high.
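The carry-forward rule for outbreak-only states amounts to a running maximum over the daily reported values. Here is a minimal sketch under that reading; the function name and the sample numbers are hypothetical and are not taken from the dataset.

```python
from itertools import accumulate

def carry_forward_high(daily_reported):
    """Return the running single-day high for a series of outbreak counts.

    Each output value is the largest count reported on any single day so far,
    so the series only changes when a new, higher count is reported.
    """
    return list(accumulate(daily_reported, max))

# Made-up outbreak case counts for one state over six reporting dates.
outbreak_cases = [12, 30, 18, 18, 41, 7]
aggregate_cases = carry_forward_high(outbreak_cases)
```

In this example the aggregate series holds at 30 while reported counts dip to 18, then rises to 41 and stays there. It never sums cases across separate outbreaks, which is why the result understates a true cumulative total.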

How to use it

Use this dataset for most analysis, paying special attention to which states report only outbreak data when analyzing trends. It can be used to examine COVID-19 in long-term-care facilities by state and the disproportionate impact on these facilities relative to the general population.

Individual state facility-level long-term care dataset

Time series dataset of facilities that states report as having cases, deaths, or both. The data is categorized by state, county, facility name, facility type (when available), and state or federal regulator. It provides cumulative and current outbreak cases and deaths.

How to use it

This dataset allows for the most granular analysis of COVID-19 in long-term-care facilities. It provides insight into how individual facilities fared throughout the pandemic. It is a comprehensive list that can be used to identify when and what types of facilities experienced outbreaks and to what magnitude. States that do not provide facility-level data are not included in this dataset.

State-level cumulative long-term care dataset

States’ reported cumulative totals for cases and deaths of residents and staff in nursing homes, assisted living facilities, and other long-term-care facilities, as well as the number of facilities tracked.

How to use it

Use this dataset to compare and analyze states that report cumulative data. States that report only recent cases and deaths (outbreak) will not have data in certain categories in this dataset.

State-level current outbreak long-term care dataset

A COVID-19 outbreak is reported when a COVID-19 case (or cases) is identified in a facility. This outbreak is considered open/active until a specified time period (28 days, 14 days, etc.) has passed without the discovery of a new case.
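The open/active rule above can be sketched as a check on the time since the most recent case. This is an illustration only: the function name, the default 14-day window, and the boundary choice (an outbreak closes once exactly `window_days` have elapsed since the last case) are assumptions, and each state sets its own period.

```python
from datetime import date, timedelta

def outbreak_is_active(case_dates, as_of, window_days=14):
    """Return True if a facility's outbreak is still open on `as_of`.

    The outbreak is treated as open while fewer than `window_days` days
    have passed since the most recent case identified on or before `as_of`.
    """
    known = [d for d in case_dates if d <= as_of]
    if not known:
        return False  # no case identified yet, so no outbreak to report
    return as_of - max(known) < timedelta(days=window_days)

# Made-up case dates for a single hypothetical facility.
cases = [date(2020, 11, 1), date(2020, 11, 10)]
open_after_10_days = outbreak_is_active(cases, date(2020, 11, 20))
open_after_14_days = outbreak_is_active(cases, date(2020, 11, 24))
open_with_28_day_rule = outbreak_is_active(cases, date(2020, 11, 24), window_days=28)
```

The same facility on the same date can be counted as open in a state that uses a 28-day window and closed in one that uses 14 days, which is one reason outbreak figures are hard to compare across states.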

How to use it

Outbreak data shows where COVID-19 is present at a given point in time, and case counts rise and fall from week to week. This dataset can be used to track current cases and deaths from week to week, but it does not provide a comprehensive, cumulative picture. States that report only cumulative data, and not current cases and deaths, will not have data in this dataset.

Vaccine metadata

State vaccination metrics

A dataset that provides information for each state on what vaccination data is available, any breakdowns the states provide (e.g. demographic breakdowns, manufacturers), definitions provided by the state of each metric, and where it can be found on state dashboards.

How to use it

This is a guide for those interested in vaccination data, including where that information can be found and how states differ in what data they make available.

Related federal data

Demographic vaccine annotations

State-level race and ethnicity categorization for vaccination data. This also includes how each state defines “vaccines.”

How to use it

This provides important context when interpreting the vaccine data at the race and ethnicity level from the State Testing and Outcomes page.

Related federal data

City data

City data

Metropolitan-level case and death data broken down by race and ethnicity (where available) for 65 cities and counties from May 29 to October 21 (note: not all locations were tracked for this entire time series).

How to use it

This data can be used to examine COVID-19 at a granular local level, expose racial disparities in terms of case fatality rates or overrepresentation in case numbers, examine the impact of holidays, gatherings, or local legislation, and identify “hotspots” within a state that may be experiencing an outbreak.
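Two of the measures mentioned above, case fatality rate and overrepresentation in case numbers, reduce to simple ratios. The sketch below uses hypothetical function names and made-up numbers. The population figures needed for the second ratio are assumed to come from an outside source such as census estimates, since they are not described as part of this dataset.

```python
def case_fatality_rate(cases, deaths):
    """Deaths divided by cases for one group; None when there are no cases."""
    return deaths / cases if cases else None

def representation_ratio(group_cases, total_cases, group_pop, total_pop):
    """Group's share of cases divided by its share of the population.

    A value above 1 means the group makes up a larger share of cases than
    of the population; a value below 1 means a smaller share.
    """
    if not total_cases or not group_pop or not total_pop:
        return None
    return (group_cases / total_cases) / (group_pop / total_pop)

# Made-up numbers for one hypothetical city and one group.
cfr = case_fatality_rate(cases=200, deaths=10)
ratio = representation_ratio(group_cases=300, total_cases=1000,
                             group_pop=150, total_pop=1000)
```

Both ratios are sensitive to missing race and ethnicity data: when a large share of cases is reported with unknown race or ethnicity, the group shares computed from the known records may not reflect the full case count.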

Miscellaneous repositories

Website data repository

A collection of data that appears on the website but is not included in the comprehensive API. Examples include long-term care data, race and ethnicity data, and other annotations.

How to use it

This can be used to view miscellaneous content that is not available in the traditional API. It is probably most helpful to individuals with a very specific, unique area of interest from the website that they want to learn more about.

Archived federal data

GitHub repository with backups from the Covid Tracking API and archived HHS, CDC, and FDA government data. The README.md file contains a complete description of what is included.

How to use it

This repository provides an archive of some government COVID data, with CSV and JSON files downloaded regularly from government sites during the pandemic. This allows us to produce a history of federal point-in-time data sources. COVID Tracking Project data should be collected from the Covid Tracking API instead.