Data Summary

This page lists and describes all the data, metadata, and related information we’ve released publicly since The COVID Tracking Project began. It will soon be joined by a complete list of all the documentation and posts we’ve published about the data.

Testing and outcomes
Race & ethnicity
- COVID Racial Data Tracker
Long-term care
Vaccine metadata
- State vaccination metrics
- Demographic vaccine annotations
City data
- City data
Miscellaneous repositories
- Website data repository
- Archived federal data

Testing and outcomes

National testing and outcomes data

Cumulative daily totals of national-level metrics for cases, tests, hospitalizations, and outcomes.

How to use it

This dataset aggregates all the state-level testing and outcomes data on the national level and measures the movement of the COVID-19 pandemic in the US over time.
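As a rough illustration of what aggregating state-level data to the national level involves, the sketch below sums a cumulative metric across states for each date. The row layout and the key names (`date`, `state`, `positive`) are assumptions made for this example, not the project's documented schema, and the state codes and counts are invented.

```python
from collections import defaultdict

def national_totals(state_rows):
    """Sum a state-level cumulative count into one national row per date.

    Each row is a dict with hypothetical keys: 'date', 'state', 'positive'.
    A missing or None value is treated as 0 for the sum (an assumption).
    """
    totals = defaultdict(int)
    for row in state_rows:
        totals[row["date"]] += row.get("positive") or 0
    # ISO-formatted date strings sort chronologically.
    return [{"date": d, "positive": totals[d]} for d in sorted(totals)]

# Invented rows for two placeholder states.
rows = [
    {"date": "2021-03-01", "state": "AA", "positive": 100},
    {"date": "2021-03-01", "state": "BB", "positive": 50},
    {"date": "2021-03-02", "state": "AA", "positive": 110},
    {"date": "2021-03-02", "state": "BB", "positive": None},
]
print(national_totals(rows))
```

Treating a missing state value as zero makes the national figure dip on days when a state does not report, so a real analysis would need to decide how to handle such gaps.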

State testing and outcomes

State-level metrics for cases, tests, hospitalizations, and outcomes.

How to use it

This data can provide a “snapshot” of different COVID-19 metrics between states, while the linked state-level historical data can show how important measures have evolved over time.

Related federal data

State testing and outcomes data source notes

Exact sources and state-level instructions that describe how the state-level testing and outcome metrics are found or calculated.

Data source

How to use it

Source notes show the provenance of each data point in our state testing and outcomes dataset.

State screenshots

Screenshots of every state health department webpage from which we collect COVID-19 data, updated four times a day.

Data source

How to use it

These screenshots can be used to validate the numbers that are reported in the State Testing and Outcomes Data. They also provide a frequently updated historical archive of state COVID-19 dashboards that are difficult for other internet archiving tools to record (for example, many internet archiving tools cannot capture ArcGIS).

State recovery definitions

State-level terminology and definitions of a COVID-19 recovery.

Data source

How to use it

This provides important context when interpreting the outcomes data in the State Testing and Outcomes dataset. Recoveries are defined in highly inconsistent ways, and you can read more in our post on the subject.

State antigen lumping

State cards that label how each jurisdiction reports antigen tests in its total test figures, according to the information available on that jurisdiction’s official website.

Data source

How to use it

These annotations are useful as an evaluation of states’ transparency about the test types they include in their total test figures. If a state is labeled as “Unclear,” its documentation needs work. They’re also useful for contrasting official definitions with what states actually appear to be doing; please see our post The State of State Antigen Test Reporting for specific examples.

Data quality GitHub issues

Public log of every change made to the state-level data. Each “issue” contains a description of the problem and a link to the issue, and each “patch” provides a description of the issue, the date and state affected, and how each number was changed.

Data source

How to use it

This GitHub repository provides transparency about why values in our datasets may have been altered. It is a historical record of corrections, patches, and backfills in our data.

Race & ethnicity

COVID Racial Data Tracker

State-level metrics for tests, cases, hospitalizations, and deaths, broken down by race and ethnicity (where available). For most jurisdictions, we have data for cases and deaths only.

How to use it

This dataset can be used to examine the disproportionate effects of the COVID-19 pandemic on racial and ethnic communities within US states and territories, see how disparities have changed over time, and understand what is happening nationwide.

Related federal data

CDC COVID-19 Case Surveillance Public Use Data

About

Agency: CDC
Start date: January 1, 2020
Timeseries unit: Day
Geographic units: N/A in public dataset, Counties and States/Territories in restricted dataset
Update frequency: Monthly

Description

The CDC COVID-19 Case Surveillance Public Use Data dataset is line-level data, not aggregate data, which means that it includes a de-identified line for each person reported as a case of COVID-19. Each line includes detailed demographic data for that person. This dataset is updated only once per month because of the complexity of working with such detailed data, and the data itself is unwieldy: the downloadable version currently has nearly 21 million rows. The CDC’s COVID Data Tracker uses this dataset to provide a daily snapshot of current trends in COVID-19 cases and deaths by race/ethnicity, but these trends are not available as a time series; only the most current national counts and percentages are provided.
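To make the difference between line-level and aggregate data concrete, the sketch below collapses per-person records into percentages by group, counting only records where the group is known. The field name `race_ethnicity_combined` comes from the public-use dataset described on this page; the placeholder labels for unknown values and the sample records are assumptions for illustration only.

```python
from collections import Counter

# Assumed placeholders for records with no usable value.
UNKNOWN_LABELS = {"Unknown", "Missing", "", None}

def shares_by_group(records, field="race_ethnicity_combined"):
    """Return (percent share per group among known records, known fraction)."""
    known = [r.get(field) for r in records if r.get(field) not in UNKNOWN_LABELS]
    counts = Counter(known)
    total = len(known)
    coverage = total / len(records) if records else 0.0
    shares = {g: 100 * n / total for g, n in counts.items()} if total else {}
    return shares, coverage

# Invented line-level records: one dict per reported case.
records = (
    [{"race_ethnicity_combined": "Group A"}] * 3
    + [{"race_ethnicity_combined": "Group B"}]
    + [{"race_ethnicity_combined": "Unknown"}] * 4
)
shares, coverage = shares_by_group(records)
print(shares, coverage)
```

Reporting the known fraction alongside the shares matters because percentages computed from a small known subset can differ from the true distribution.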

Because this data contains identifying and sensitive information, the CDC provides both a 12-data-element public-use dataset of the line list and a 32-data-element restricted-access dataset of the same data that includes potentially identifying information. The public-use dataset includes fields for sex, age_group, and race_ethnicity_combined, information about whether the person was hospitalized or admitted to the ICU, and four separate fields that can help assign a date to the case: the date the person said their symptoms began, the date the person’s diagnostic test had a positive result, the date the case was reported to the CDC, and the earliest available of these three dates. The restricted-access dataset includes the state and county of residence of each person reported to have contracted COVID-19, as well as information on whether the person is a healthcare worker. If you want to use the COVID-19 Case Surveillance Restricted Access Detailed Data, you must apply to the CDC for permission. Data elements (fields) for both the public and the restricted versions of this dataset can be found on the COVID-19 case report form.
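The fourth date field described above, the earliest available of the three case dates, can be sketched as follows. The function and parameter names are hypothetical and are not the dataset's actual column names; a missing date is represented as `None`.

```python
from datetime import date
from typing import Optional

def earliest_available(
    symptom_onset: Optional[date],
    positive_test: Optional[date],
    reported_to_cdc: Optional[date],
) -> Optional[date]:
    """Return the earliest of the dates that are present, or None if all are missing."""
    known = [d for d in (symptom_onset, positive_test, reported_to_cdc) if d is not None]
    return min(known) if known else None

# Invented example: onset date missing, so the positive-test date is the earliest.
print(earliest_available(None, date(2020, 7, 3), date(2020, 7, 8)))
```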

Last updated March 23, 2021

Bar graph showing the percentage of COVID-19 cases by race/ethnicity for 52% of known cases. White, non-Hispanic people are 55.9% of cases, Hispanic/Latino people are 20.7% of cases, Black, non-Hispanic people are 12.2% of cases, Asian, non-Hispanic people are 3.6% of cases, American Indian / Alaska Native people are 1.2% of cases, Native Hawaiian people are 0.4% of cases, and multiple/other non-Hispanic people are 6% of cases.

Bar graph showing the percentage of COVID-19 deaths by race/ethnicity for 74% of known deaths. White, non-Hispanic people are 63.1% of deaths, Hispanic/Latino people are 12.2% of deaths, Black, non-Hispanic people are 14.7% of deaths, Asian, non-Hispanic people are 4.3% of deaths, American Indian / Alaska Native people are 1% of deaths, Native Hawaiian people are 0.2% of deaths, and multiple/other non-Hispanic people are 4.4% of deaths.

Our related posts

Long-term care

Long-term care tracker

CSV files of every long-term-care facility we collect data for, and every state’s total cumulative and outbreak numbers.

Data source

How to use it

This is the same as the data that appears in our LTC facility map and the individual state LTC pages. We currently link to these files from every state’s LTC page.

Related federal data

State-level aggregate long-term care dataset

This dataset is the most representative of the total impact of COVID-19 in long-term-care facilities. Some states report all cases and deaths ever (cumulative) and some report only recent cases and deaths (outbreak). For states that provide only recent cases and deaths (outbreak), the aggregate dataset uses the highest cases and deaths ever reported on a single day and carries that number forward unless more cases and deaths are reported on a subsequent single day. CTP’s aggregated data for these states drastically under-reports actual cumulative totals because it is only a single-day high.
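The carry-forward rule described above for outbreak-only states amounts to a running maximum. The following is a minimal sketch under that reading, not the project's actual processing code; the function name and the sample counts are invented, and `None` stands for a day with no report.

```python
def carry_forward_peak(daily_counts):
    """Replace each day's value with the highest value reported so far.

    Days with no report (None) keep the previous peak.
    """
    peak = None
    carried = []
    for count in daily_counts:
        if count is not None and (peak is None or count > peak):
            peak = count
        carried.append(peak)
    return carried

# Invented outbreak counts for one state over six reporting days.
print(carry_forward_peak([5, 12, 8, None, 15, 3]))  # [5, 12, 12, 12, 15, 15]
```

Because the result never falls below an earlier single-day high but also never adds days together, it is a floor on the true cumulative total, which is why these states are under-reported in the aggregate.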

How to use it

Use this dataset for most analysis, paying special attention to which states report only outbreak data when examining trends. It can be used to examine COVID-19 in long-term-care facilities by state and the disproportionate impact on those facilities relative to the general population.

Individual state facility-level long-term care dataset

Time series dataset of facilities reported by states to have cases, deaths, or both. The data is categorized by state, county, facility name, facility type (when available), and state or federal regulator. It provides cumulative and current outbreak cases and deaths.

How to use it

This dataset allows for the most granular analysis of COVID-19 in long-term-care facilities. It provides insight into how individual facilities fared throughout the pandemic. It is a comprehensive list that can be used to identify when and what types of facilities experienced outbreaks and to what magnitude. States that do not provide facility-level data are not included in this dataset.

State-level cumulative long-term care dataset

States’ reported cumulative totals for cases and deaths of residents and staff in nursing homes, assisted living facilities, and other long-term-care facilities, as well as the number of facilities tracked.

How to use it

Use this dataset to compare and analyze states that report cumulative data. States that report only recent cases and deaths (outbreak) will not have data in certain categories in this dataset.

State-level current outbreak long-term care dataset

A COVID-19 outbreak is reported when a COVID-19 case (or cases) is identified in a facility. This outbreak is considered open/active until a specified time period (28 days, 14 days, etc.) has passed without the discovery of a new case.
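The open/active rule above can be sketched as a simple date check. This is an illustration only: the window length varies by state, and whether the outbreak closes on exactly the last day of the window is an assumption made here (the sketch treats it as closed once the full window has elapsed).

```python
from datetime import date, timedelta

def outbreak_is_open(case_dates, as_of, window_days=14):
    """Return True if a case was identified within the last `window_days` days.

    `case_dates` lists the dates on which new cases were identified at a
    facility. The outbreak counts as closed once `window_days` full days have
    passed since the most recent case (boundary handling is an assumption).
    """
    prior = [d for d in case_dates if d <= as_of]
    if not prior:
        return False
    return as_of - max(prior) < timedelta(days=window_days)

# Invented facility history: cases identified on January 1 and January 10.
cases = [date(2021, 1, 1), date(2021, 1, 10)]
print(outbreak_is_open(cases, date(2021, 1, 20)))
print(outbreak_is_open(cases, date(2021, 1, 24)))
print(outbreak_is_open(cases, date(2021, 1, 24), window_days=28))
```

The same facility history can be open under a 28-day rule and closed under a 14-day rule, which is one reason outbreak figures are hard to compare across states.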

How to use it

Outbreak data tells us where COVID-19 is at a certain point in time, and cases go up and down from week to week. This dataset can be used to track current cases and deaths from week to week but does not provide a comprehensive, cumulative picture. States that report only cumulative data, and not current cases and deaths, will not have data in this dataset.

Vaccine metadata

State vaccination metrics

A dataset that provides information for each state on what vaccination data is available, any breakdowns the states provide (e.g. demographic breakdowns, manufacturers), definitions provided by the state of each metric, and where it can be found on state dashboards.

Data source

How to use it

This is a guide for those interested in vaccination data, including where that information can be found and how states differ in what data they make available.

Related federal data

Demographic vaccine annotations

State-level race and ethnicity categorization for vaccination data. This also includes how each state defines “vaccines.”

Data source

How to use it

This provides important context when interpreting the vaccine data at the race and ethnicity level from the State Testing and Outcomes page.

Related federal data

City data

Metropolitan-level case and death data broken down by race and ethnicity (where available) for 65 cities and counties from May 29 to October 21 (note: not all locations were tracked for this entire time series).

Data source

How to use it

This data can be used to examine COVID-19 at a granular local level, expose racial disparities in terms of case fatality rates or overrepresentation in case numbers, examine the impact of holidays, gatherings, or local legislation, and identify “hotspots” within a state that may be experiencing an outbreak.
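Two of the measures mentioned above, case fatality rate and overrepresentation in case numbers, can be computed as in the sketch below. The function names are hypothetical and the population inputs would have to come from another source, since this page does not describe population figures in the city dataset.

```python
def case_fatality_rate(deaths, cases):
    """Deaths divided by cases, or None when there are no cases."""
    return deaths / cases if cases else None

def representation_ratio(group_cases, total_cases, group_pop, total_pop):
    """Group's share of cases divided by its share of the population.

    A value above 1 means the group is overrepresented among cases.
    Returns None if any denominator is zero or missing.
    """
    if not total_cases or not group_pop or not total_pop:
        return None
    return (group_cases / total_cases) / (group_pop / total_pop)

# Invented numbers: a group with 50% of cases but 25% of the population.
print(case_fatality_rate(5, 200))
print(representation_ratio(500, 1000, 250, 1000))
```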

Miscellaneous repositories

Website data repository

A collection of data that appears on the website but is not included in the comprehensive API. Some examples include long-term care and race and ethnicity data as well as other annotations.

Data source

How to use it

This can be used to view miscellaneous content that is not available in the traditional API. It is probably most helpful to individuals with a very specific, unique area of interest from the website that they want to learn more about.

Archived federal data

GitHub repository with backups from the Covid Tracking API and archived HHS, CDC, and FDA government data. The README.md file contains a complete description of what is included.

Data source

How to use it

This repository provides an archive of some government COVID data, with CSV and JSON files downloaded regularly from government sites during the pandemic. This allows us to produce a history of federal point-in-time data sources. COVID Tracking Project Data should be collected from the Covid Tracking API instead.

Related federal data by section

Testing and outcomes
- CDC United States COVID-19 Cases and Deaths by State over Time
- HHS COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries
- HHS COVID-19 Diagnostic Laboratory Testing (PCR Testing) Time Series
- CDC NCHS Provisional Death Counts for Coronavirus Disease Index of Files
Race & ethnicity
- CDC COVID-19 Case Surveillance Public Use Data
- CDC COVID-NET Rates of COVID-19-Associated Hospitalization per 100,000 Population
- CDC NCHS Provisional Death Counts for Coronavirus Disease (COVID-19): Distribution of Deaths by Race and Hispanic Origin
Long-term care
- CMS COVID-19 Nursing Home Dataset
Vaccine metadata
- CDC COVID-19 Vaccinations in the United States
- CDC Federal Pharmacy Partnership for Long-Term Care (LTC) Program