This page lists and describes all the data, metadata, and related information we’ve released in public since The COVID Tracking Project began. It will soon be joined by a complete list of all the documentation and posts we’ve published about the data.
- Testing and outcomes
- Race & ethnicity
- Long-term care
- Vaccine metadata
- City data
- Miscellaneous repositories
Testing and outcomes
National testing and outcomes data
Cumulative daily totals of national level metrics for cases, tests, hospitalizations, and outcomes.
How to use it
This dataset aggregates all the state-level testing and outcomes data on the national level and measures the movement of the COVID-19 pandemic in the US over time.
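As a rough illustration of that aggregation, the sketch below sums state-level cumulative values into one national row per date. The field names (date, state, positive, death) and the sample values are assumptions made up for the example, not the dataset's documented schema.

```python
from collections import defaultdict


def national_totals(state_rows):
    """Sum state-level cumulative values into one national row per date."""
    totals = defaultdict(lambda: defaultdict(int))
    for row in state_rows:
        for metric, value in row.items():
            if metric in ("date", "state"):
                continue
            if value is not None:  # a state may not report every metric
                totals[row["date"]][metric] += value
    return {day: dict(metrics) for day, metrics in sorted(totals.items())}


# Hypothetical sample rows; the numbers are invented for illustration.
rows = [
    {"date": "2021-03-01", "state": "AK", "positive": 10, "death": 1},
    {"date": "2021-03-01", "state": "AL", "positive": 20, "death": None},
    {"date": "2021-03-02", "state": "AK", "positive": 12, "death": 1},
]
national = national_totals(rows)
```

Missing values are skipped rather than treated as zero, so a national figure for a metric covers only the states that reported it on that date.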
State testing and outcomes
State-level metrics for cases, tests, hospitalizations, and outcomes.
How to use it
This data can provide a “snapshot” of different COVID-19 metrics between states, while the linked state-level historical data can show how important measures have evolved over time.
Related federal data
CDC United States COVID-19 Cases and Deaths by State over Time
About
- Agency: CDC
- Start date: January 22, 2020
- Timeseries unit: Day
- Geographic units: State/Territory
- Update frequency: Twice daily
- Data page link for CDC United States COVID-19 Cases and Deaths by State over Time
- Download link for CDC United States COVID-19 Cases and Deaths by State over Time
- Query link for CDC United States COVID-19 Cases and Deaths by State over Time
- Chart link for CDC United States COVID-19 Cases and Deaths by State over Time
Description
COVID-19 cases and deaths by state/territory. This aggregate cases and deaths dataset is used in the COVID Data Tracker, COVID Data Tracker Weekly Review, Community Profile Reports and State Profile Reports. It is available for download both from the CDC and on HealthData.gov, in a variety of formats including CSV and XML. The CSVs are straightforward and easy to work with. It is also possible to filter, sort, and visualize the data on the CDC website without downloading it. You can also query the data online via the Socrata Open Data API after consulting the excellent and comprehensive documentation provided.
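For readers who want to try the query route, here is a minimal sketch of building a Socrata Open Data API request URL. The $where, $order, and $limit parameters are standard Socrata query parameters, but the dataset identifier "xxxx-xxxx" is a placeholder and the column names (state, submission_date) are assumptions; check the dataset's own API documentation for the real values.

```python
from urllib.parse import urlencode


def build_soda_url(domain, dataset_id, where=None, order=None, limit=1000):
    """Build a Socrata resource URL that returns CSV for a filtered query."""
    params = {}
    if where:
        params["$where"] = where
    if order:
        params["$order"] = order
    params["$limit"] = limit
    # safe="$" keeps the leading "$" of each parameter name readable.
    query = urlencode(params, safe="$")
    return f"https://{domain}/resource/{dataset_id}.csv?{query}"


# "xxxx-xxxx" is a placeholder, not a real dataset identifier.
url = build_soda_url(
    "data.cdc.gov",
    "xxxx-xxxx",
    where="state = 'NY'",          # assumed column name
    order="submission_date",       # assumed column name
    limit=5,
)
print(url)
```

The resulting URL can be pasted into a browser or passed to any HTTP client once the placeholder is replaced with the identifier shown on the dataset's page.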
Last updated March 15, 2021
Our related posts
HHS COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries
About
- Agency: HHS
- Start date: January 1, 2020
- Timeseries unit: Day
- Geographic units: States/Territories
- Update frequency: Weekly
- Data page link for HHS COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries
- Download link for HHS COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries
- Query link for HHS COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries
- Chart link for HHS COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries
Description
This dataset reports hospital metrics by state and date, including current COVID-19 hospitalizations and many other metrics. It includes many hospital capacity and usage metrics, including, for example, the current number of adult and pediatric patients who are suspected or confirmed to have COVID-19 hospitalized in inpatient and intensive care unit (ICU) beds. Each of these metrics is reported by every US hospital every day, with the exception of psychiatric and rehabilitation hospitals, which report weekly. More than 6,000 hospitals report to HHS either directly or via their state or state hospital associations, and the underlying dataset is publicly accessible and used across federal and state agencies: The CDC uses this hospitalization data in their COVID-19 Data Tracker, and it is included in the publicly available Community Profile Reports and State Profile Reports.
Different versions or “slices” of this dataset are also available on HealthData.gov. The very large facility-level dataset “COVID-19 Reported Patient Impact and Hospital Capacity by Facility,” with more than 92,000 rows, includes the same hospital capacity and usage metrics for each reporting hospital by name. The summary, non-timeseries dataset “COVID-19 Reported Patient Impact and Hospital Capacity by State” provides only the most recent (within the last four days) total value for each hospital metric by state.
Last updated March 15, 2021
Our related posts
HHS COVID-19 Diagnostic Laboratory Testing (PCR Testing) Time Series
About
- Agency: HHS
- Start date: March 1, 2020
- Timeseries unit: Day
- Geographic units: States/Territories
- Update frequency: Daily
- Data page link for HHS COVID-19 Diagnostic Laboratory Testing (PCR Testing) Time Series
- Download link for HHS COVID-19 Diagnostic Laboratory Testing (PCR Testing) Time Series
- Query link for HHS COVID-19 Diagnostic Laboratory Testing (PCR Testing) Time Series
- Chart link for HHS COVID-19 Diagnostic Laboratory Testing (PCR Testing) Time Series
Description
This testing dataset is a time series of PCR (polymerase chain reaction) tests and test results by state by day that begins March 1, 2020. It includes only PCR tests, not antibody (serology) tests or antigen tests, and it includes only results for test specimens, not numbers of unique people tested. Tests in this dataset are organized by the date the test was administered or the date of the test result, not by date of report. The data comes from the COVID Electronic Laboratory Reporting Program (CELR), a system launched in spring of 2020 specifically for the purpose of collecting COVID-19 data from laboratories. The dataset powers charts and visualizations on the HHS Protect Public Data Hub as well as weekly state-level charts of testing in the HHS State Profile Reports. CMS uses this dataset to compile and publish a weekly spreadsheet of the 14-day average of test positivity rates by county that is meant to help nursing homes estimate viral prevalence in their area.
This dataset has several significant differences from the state-aggregated data compiled and published by The COVID Tracking Project. Please read our analysis of federal testing data to be sure you understand these differences.
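To make the idea of a 14-day test positivity figure concrete, the sketch below pools positive and total specimen counts over a trailing window. This is one possible calculation under stated assumptions, not necessarily the method CMS uses (an average of daily rates would give a different number), and the input layout is hypothetical.

```python
def trailing_positivity(daily_counts, window=14):
    """Pooled specimen positivity over the most recent `window` days.

    daily_counts: list of (positive_specimens, total_specimens) tuples,
    one per day, ordered oldest to newest. Counts are specimens, not
    unique people, matching how this dataset reports tests.
    """
    recent = daily_counts[-window:]
    positives = sum(p for p, _ in recent)
    total = sum(t for _, t in recent)
    return positives / total if total else None


# Invented sample: one older day outside the window, then 14 days of data.
counts = [(100, 100)] + [(5, 100)] * 14
rate = trailing_positivity(counts)
```

Pooling the counts weights each day by its testing volume, so a low-volume day with an unusual rate moves the result less than it would in a simple average of daily rates.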
Last updated March 15, 2021
Our related posts
CDC NCHS Provisional Death Counts for Coronavirus Disease Index of Files
About
- Agency: CDC / NCHS
- Data page link for CDC NCHS Provisional Death Counts for Coronavirus Disease Index of Files
Description
The CDC’s National Center for Health Statistics regularly publishes various datasets about COVID-19 deaths based on death certificate data submitted to the National Vital Statistics System. Because death certificates take several weeks to be received and entered into the NVSS system, this death data significantly lags other sources of reported death counts from COVID-19. Death certificates, however, contain a great deal of information about the person who died, so these datasets are particularly useful for demographic research on factors such as age, race/ethnicity, and geographic location. Death certificates can contain errors and omissions that NCHS takes time to correct, and the data in these files has not yet been fully investigated, so NCHS emphasizes that this data is to be considered “provisional” and “ad hoc.”
Last updated March 11, 2021
Our related posts
State testing and outcomes data source notes
Exact sources and state-level instructions that describe how the state-level testing and outcome metrics are found or calculated.
How to use it
Source notes show the provenance of each data point in our state testing and outcomes dataset.
State screenshots
Screenshots of every state health department webpage from which we collect COVID-19 data, updated four times a day.
How to use it
These screenshots can be used to validate the numbers that are reported in the State Testing and Outcomes Data. They also provide a frequently updated historical archive of state COVID-19 dashboards that are difficult for other internet archiving tools to record (for example, many internet archiving tools cannot capture ArcGIS).
State recovery definitions
State-level terminology and definitions for how a COVID-19 recovery is defined.
How to use it
This provides important context when interpreting the outcomes data in the State Testing and Outcomes dataset. Recoveries are defined in highly inconsistent ways, and you can read more in our post on the subject.
State antigen lumping
State cards label each jurisdiction’s reporting practices for antigen total testing according to the information available on that jurisdiction’s official website.
How to use it
These annotations are useful as an evaluation of states’ transparency about the test types they include in their total test figures. If a state is labeled as “Unclear,” its documentation needs work. They’re also useful for contrasting official definitions with what it appears that states are actually doing—please see our post The State of State Antigen Test Reporting for specific examples.
Data quality GitHub issues
Public log of every change made to the state-level data. Each “issue” contains a description of the problem and a link to the issue, and each “patch” provides a description of the issue, the date and state affected, and how each number was changed.
How to use it
This GitHub repository provides transparency about why values in our datasets may have been altered. It is a historical record of corrections, patches, and backfills in our data.
Race & ethnicity
COVID Racial Data Tracker
State-level metrics for tests, cases, hospitalizations, and deaths, broken down by race and ethnicity (where available). For most jurisdictions, we have data for cases and deaths only.
How to use it
This dataset can be used to examine the disproportionate effects of the COVID-19 pandemic on racial and ethnic communities within US states and territories, see how disparities have changed over time, and understand what is happening nationwide.
Related federal data
CDC COVID-19 Case Surveillance Public Use Data
About
- Agency: CDC
- Start date: January 1, 2020
- Timeseries unit: Day
- Geographic units: N/A in public dataset, Counties and States/Territories in restricted dataset
- Update frequency: Monthly
- Data page link for CDC COVID-19 Case Surveillance Public Use Data
- Download link for CDC COVID-19 Case Surveillance Public Use Data
- Query link for CDC COVID-19 Case Surveillance Public Use Data
- Chart link for CDC COVID-19 Case Surveillance Public Use Data
Description
The CDC COVID-19 Case Surveillance Public Use Data dataset is line-level data, not aggregate data, which means that it includes a de-identified line for each person reported as a case of COVID-19. Each line includes detailed demographic data for that person. This dataset is updated only once per month because of the complexity of working with such extremely detailed data, and the data itself is very unwieldy to work with: the downloadable version currently has nearly 21 million rows. The CDC’s COVID Data Tracker uses this dataset to provide a daily snapshot of current trends in COVID-19 cases and deaths by race/ethnicity, but these trends are not available in a timeseries; only the most current national counts and percentages are provided.
Because this data contains identifying and sensitive information, the CDC provides both a 12-data-element public-use dataset of the line list and a 32-data-element restricted-access dataset of the same data that includes potentially identifying information. The public use dataset includes fields for sex, age_group, and race_ethnicity_combined, information about whether the person was hospitalized or sent to the ICU, as well as four separate fields that can help assign a date to the case: the date the person said their symptoms began, the date the person’s diagnostic test had a positive result, the date the case was reported to the CDC, and the earliest available of these three dates. The restricted use dataset includes the state and county of residence of each person reported to have contracted COVID-19, as well as information on whether the person is a healthcare worker. If you want to use the COVID-19 Case Surveillance Restricted Access Detailed Data, you must apply to the CDC for permission. Data elements (fields) for both the public and the restricted versions of this dataset can be found on the COVID-19 case report form.
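The "earliest available of these three dates" logic can be sketched as follows. The function and parameter names are hypothetical and do not correspond to the dataset's actual field names; the sketch only illustrates taking the minimum over whichever dates are present.

```python
from datetime import date
from typing import Optional


def earliest_case_date(
    onset: Optional[date],
    positive_test: Optional[date],
    reported: Optional[date],
) -> Optional[date]:
    """Return the earliest of the dates that are actually known."""
    known = [d for d in (onset, positive_test, reported) if d is not None]
    return min(known) if known else None


# Example: symptom onset is missing, so the positive test date wins.
example = earliest_case_date(None, date(2020, 7, 3), date(2020, 7, 6))
```

Handling missing values explicitly matters with line-level data like this, since any one of the three dates may be absent for a given case.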
Last updated March 23, 2021
Our related posts
CDC COVID-NET Rates of COVID-19-Associated Hospitalization per 100,000 Population
About
- Agency: CDC, COVID-NET
- Start date: March 1, 2020
- Timeseries unit: Week
- Geographic units: State (14 states only)
- Update frequency: Weekly
- Data page link for CDC COVID-NET Rates of COVID-19-Associated Hospitalization per 100,000 Population
- Chart link for CDC COVID-NET Rates of COVID-19-Associated Hospitalization per 100,000 Population
Description
COVID-NET, the COVID-19-Associated Hospitalization Surveillance Network, is a group of more than 250 US hospitals in ninety-nine counties in fourteen states that gathers in-depth data about people who are hospitalized with confirmed cases of COVID-19. The fourteen states with selected participating hospitals are California, Colorado, Connecticut, Georgia, Iowa, Maryland, Michigan, Minnesota, New Mexico, New York, Ohio, Oregon, Tennessee, and Utah. The data from COVID-NET is obviously representative, not comprehensive, since only a small number of US hospitals, counties, and states are part of COVID-NET. COVID-NET data represents about 10% of the US population.
COVID-NET data can be downloaded by clicking the “Download data” button at the top right of the COVID-NET data page. COVID-NET reports data weekly by the number of the week in a particular year, so, for instance, the data begins with week 10 of the year 2020, which was the week ending March 7, 2020. The elements tracked are Age Category, Sex, and Race, and the data is standardized within each metric. Unfortunately, the Race category as tracked by COVID-NET is not perfectly comparable to the race_ethnicity_combined category tracked in the CDC COVID-19 Case Surveillance Data. Please read our analyses of federal race and ethnicity data to be sure you understand these differences.
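When joining COVID-NET data to daily datasets, it can help to convert a week number into a week-ending date. The sketch below assumes weeks run Sunday through Saturday and that week 1 is the week containing January 4 (the usual epidemiological-week convention); that convention is an assumption here, so confirm it against the COVID-NET documentation. It does reproduce the example above, where week 10 of 2020 ends on March 7, 2020.

```python
from datetime import date, timedelta


def week_ending(year: int, week: int) -> date:
    """Saturday that ends the given Sunday-to-Saturday week of a year."""
    jan4 = date(year, 1, 4)
    # Python's weekday(): Monday=0 ... Sunday=6, so shift to Sunday=0.
    days_since_sunday = (jan4.weekday() + 1) % 7
    week1_start = jan4 - timedelta(days=days_since_sunday)
    return week1_start + timedelta(weeks=week - 1, days=6)


first_week_end = week_ending(2020, 10)
print(first_week_end.isoformat())
```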
Last updated March 23, 2021
Our related posts
CDC NCHS Provisional Death Counts for Coronavirus Disease (COVID-19): Distribution of Deaths by Race and Hispanic Origin
About
- Agency: CDC, NCHS
- Start date: January 1, 2020
- Geographic units: State, Country
- Update frequency: Weekly
- Data page link for CDC NCHS Provisional Death Counts for Coronavirus Disease (COVID-19): Distribution of Deaths by Race and Hispanic Origin
- Download link for CDC NCHS Provisional Death Counts for Coronavirus Disease (COVID-19): Distribution of Deaths by Race and Hispanic Origin
- Query link for CDC NCHS Provisional Death Counts for Coronavirus Disease (COVID-19): Distribution of Deaths by Race and Hispanic Origin
- Chart link for CDC NCHS Provisional Death Counts for Coronavirus Disease (COVID-19): Distribution of Deaths by Race and Hispanic Origin
Description
The CDC’s National Center for Health Statistics regularly publishes various datasets about COVID-19 deaths based on death certificate data submitted to the National Vital Statistics System. Because death certificates take several weeks to be received and entered into the NVSS system, this death data significantly lags other sources of reported death counts from COVID-19. Death certificates, however, contain a great deal of information about the person who died, so these datasets are particularly useful for demographic research on factors such as age, race/ethnicity, and geographic location. Death certificates can contain errors and omissions that NCHS takes time to correct, and the data in these files has not yet been fully investigated, so NCHS emphasizes that this data is to be considered “provisional” and “ad hoc.” A full list of these datasets about deaths from COVID-19 is available at https://www.cdc.gov/nchs/covid19/covid-19-mortality-data-files.htm.
This particular dataset concerning the distribution of COVID-19 deaths by race and Hispanic origin gives aggregate proportions by state and for the US of all those who have died since January 1, 2020 according to the ethnicity and race information on that person’s death certificate. It is updated weekly, and the dataset also gives the “as of” date so that a user can judge the recency of the information. Proportions are given both as unweighted (raw) percentage of all COVID-19 deaths and as a weighted distribution of population.
Last updated March 23, 2021
Our related posts
Long-term care
Long-term care tracker
CSV files of every long-term-care facility we collect data for, and every state’s total cumulative and outbreak numbers.
How to use it
This is the same as the data that appears in our LTC facility map and the individual state LTC pages. We currently link to these files from every state’s LTC page.
Related federal data
CMS COVID-19 Nursing Home Dataset
About
- Agency: CMS
- Start date: May 17, 2020
- Timeseries unit: Week
- Geographic units: Individual Nursing Home, Address, City, State, Zip Code
- Update frequency: Weekly
- Data page link for CMS COVID-19 Nursing Home Dataset
- Download link for CMS COVID-19 Nursing Home Dataset
- Query link for CMS COVID-19 Nursing Home Dataset
- Chart link for CMS COVID-19 Nursing Home Dataset
Description
This very large dataset includes facility-level data for Skilled Nursing Facilities that report COVID-19 information to the CDC’s National Healthcare Safety Network (NHSN). It does not include data before May 17, 2020, and it does not include data from state-regulated assisted living facilities and other resident care homes. Case data (both confirmed and suspected cases) and death data for both residents and staff are included, as is the total number of residents in each facility. Outbreaks are not reported by that term, but fields such as “Three or More Confirmed Cases of COVID-19 This Week” serve the same purpose. Charts and topline figures from this data appear on the CMS COVID-19 Nursing Home Data page.
This data can be difficult to work with, in part because of the sheer size of the dataset: it is currently nearly 600,000 rows and grows weekly. However, data definitions for all metrics are much more standardized in the CMS nursing home data than in the state aggregate data compiled by CTP. Please read our analyses of federal long-term-care data to be sure you understand these differences.
Last updated March 8, 2021
Our related posts
State-level aggregate long-term care dataset
This dataset is the most representative of the total impact of COVID-19 in long-term-care facilities. Some states report all cases and deaths ever (cumulative) and some report only recent cases and deaths (outbreak). For states that provide only recent cases and deaths (outbreak), the aggregate dataset provides the highest cases and deaths ever reported on a single day and carries this number forward unless more cases and deaths are reported on a subsequent single day. CTP’s aggregated data for these states drastically under-reports actual cumulative totals because it is only a single-day high.
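The carry-forward rule for outbreak-only states amounts to a running maximum over the reported values. A minimal sketch, using invented numbers:

```python
from itertools import accumulate

# Hypothetical outbreak case counts reported by one state on successive dates.
reported = [3, 5, 2, 7, 4]

# Carry the highest value seen so far until a later report exceeds it.
carried = list(accumulate(reported, max))
print(carried)  # [3, 5, 5, 7, 7]
```

Because the carried value is a peak rather than a sum, it cannot capture cases from outbreaks that opened and closed between peaks, which is why these figures understate true cumulative totals.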
How to use it
Use this dataset for most analysis, paying special attention to which states report only outbreak data when examining trends. It can be used to examine COVID-19 in long-term-care facilities by state and the disproportionate impact experienced there relative to the general population.
Individual state facility-level long-term care dataset
Time series dataset of facilities reported by states to have cases, deaths, or both. The data is categorized by state, county, facility name, facility type (when available), and state or federal regulator. It provides cumulative and current outbreak cases and deaths.
How to use it
This dataset allows for the most granular analysis of COVID-19 in long-term-care facilities. It provides insight into how individual facilities fared throughout the pandemic. It is a comprehensive list that can be used to identify when and what types of facilities experienced outbreaks and to what magnitude. States that do not provide facility-level data are not included in this dataset.
State-level cumulative long-term care dataset
States’ reported cumulative totals for cases and deaths of residents and staff in nursing homes, assisted living facilities, and other long-term-care facilities, as well as the number of facilities tracked.
How to use it
Use this dataset to compare and analyze states that report cumulative data. States that report only recent cases and deaths (outbreak) will not have data in certain categories in this dataset.
State-level current outbreak long-term care dataset
A COVID-19 outbreak is reported when a COVID-19 case (or cases) is identified in a facility. This outbreak is considered open/active until a specified time period (28 days, 14 days, etc.) has passed without the discovery of a new case.
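The open/active rule can be expressed as a simple date comparison. In the sketch below, the window length is a parameter because states use different periods, and treating the outbreak as closed once exactly `window_days` have elapsed is an assumption about the boundary, not a documented rule.

```python
from datetime import date


def outbreak_is_open(last_new_case: date, as_of: date, window_days: int = 28) -> bool:
    """True while fewer than `window_days` days have passed since the last new case."""
    return (as_of - last_new_case).days < window_days


# Example with a 14-day window: 9 days after the last case, still open.
still_open = outbreak_is_open(date(2021, 3, 1), date(2021, 3, 10), window_days=14)
# Exactly 14 days after the last case, considered closed in this sketch.
closed = outbreak_is_open(date(2021, 3, 1), date(2021, 3, 15), window_days=14)
```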
How to use it
Outbreak data shows where COVID-19 is at a certain point in time, and cases go up and down from week to week. This dataset can be used to track current cases and deaths from week to week, but it does not provide a comprehensive, cumulative picture. States that report only cumulative data, and not current cases and deaths, will not have data in this dataset.
Vaccine metadata
State vaccination metrics
A dataset that provides information for each state on what vaccination data is available, any breakdowns the states provide (e.g. demographic breakdowns, manufacturers), definitions provided by the state of each metric, and where it can be found on state dashboards.
How to use it
This is a guide for those interested in vaccination data including where that information can be found and how states differ in what data they make available.
Related federal data
CDC COVID-19 Vaccinations in the United States
About
- Agency: CDC
- Geographic units: States/Territories
- Update frequency: Daily
- Data page link for CDC COVID-19 Vaccinations in the United States
- Chart link for CDC COVID-19 Vaccinations in the United States
Description
The CDC’s COVID Data Tracker gives a daily snapshot of COVID-19 vaccinations in the US by state. Metrics include doses delivered, doses administered, people vaccinated who have received one or more doses, and people vaccinated who have received two or more doses. Currently, only people who are at least eighteen years old are included. The underlying data is available for download on the page, but it is not a timeseries: it includes only the most recent totals by state.
Timeseries datasets for vaccine allocation by manufacturer are also available on Data.CDC.gov. The Pfizer dataset begins 12/14/2020, the Moderna dataset begins 12/21/20, and the Janssen dataset begins 3/1/21. These three datasets are updated weekly and provide information about both first and second dose allocations for each state/territory.
Last updated March 11, 2021
Our related posts
CDC Federal Pharmacy Partnership for Long-Term Care (LTC) Program
About
- Agency: CDC
- Geographic units: States/Territories
- Update frequency: Daily
- Data page link for CDC Federal Pharmacy Partnership for Long-Term Care (LTC) Program
- Chart link for CDC Federal Pharmacy Partnership for Long-Term Care (LTC) Program
Description
The CDC’s COVID Data Tracker gives a daily snapshot of COVID-19 vaccinations in US long-term-care facilities by state/territory. Metrics include total doses administered in long-term-care facilities, people in long-term-care facilities vaccinated who have received one or more doses, and people in long-term-care facilities vaccinated who have received two or more doses. The data does not differentiate between residents and staff of long-term-care facilities. The underlying data is available for download on the COVID Data Tracker page, but it is not a timeseries: it includes only the most recent totals by state.
The Federal Pharmacy Partnership for Long-Term Care (LTC) Program is a public/private partnership between the federal government and commercial pharmacies, notably Walgreens and CVS, with the purpose of distributing and administering vaccinations to one of the most at-risk populations in the US for death from COVID-19. West Virginia and most US territories chose not to participate in the program and so are not included in the data; Puerto Rico is a participant and is included. Facility-level data about vaccine administration in long-term-care facilities is stored in the Tiberius system but is generally not public. As far as we can tell, only South Carolina currently publishes facility-level data from Tiberius about vaccine administration in long-term-care settings. Walgreens and CVS also publish data from this program about vaccine administration.
Last updated March 12, 2021
Our related posts
Demographic vaccine annotations
State-level race and ethnicity categorization for vaccination data. It also includes how each state defines “vaccines.”
How to use it
This provides important context when interpreting the vaccine data at the race and ethnicity level from the State Testing and Outcomes page.
Related federal data
CDC COVID-19 Vaccinations in the United States
About
- Agency: CDC
- Geographic units: States/Territories
- Update frequency: Daily
- Data page link for CDC COVID-19 Vaccinations in the United States
- Chart link for CDC COVID-19 Vaccinations in the United States
Description
The CDC’s COVID Data Tracker gives a daily snapshot of COVID-19 vaccinations in the US by state. Metrics include doses delivered, doses administered, people vaccinated who have received one or more doses, and people vaccinated who have received two or more doses. Currently, only people who are at least eighteen years old are included. The underlying data is available for download on the page, but it is not a timeseries: it includes only the most recent totals by state.
Timeseries datasets for vaccine allocation by manufacturer are also available on Data.CDC.gov. The Pfizer dataset begins 12/14/2020, the Moderna dataset begins 12/21/20, and the Janssen dataset begins 3/1/21. These three datasets are updated weekly and provide information about both first and second dose allocations for each state/territory.
Last updated March 11, 2021
Our related posts
City data
City data
Metropolitan-level case and death data broken down by race and ethnicity (where available) for 65 cities and counties from May 29 to October 21 (note: not all locations were tracked for this entire time series).
How to use it
This data can be used to examine COVID-19 at a granular local level, expose racial disparities in terms of case fatality rates or overrepresentation in case numbers, examine the impact of holidays, gatherings, or local legislation, and identify “hotspots” within a state that may be experiencing an outbreak.
Miscellaneous repositories
Website data repository
A collection of data that appears on the website but is not included in the comprehensive API. Some examples include long-term care and race and ethnicity data as well as other annotations.
How to use it
This can be used to view miscellaneous content that is not available in the traditional API. It is probably most helpful to individuals with a very specific, unique area of interest from the website that they want to learn more about.
Archived federal data
GitHub repository with backups from the Covid Tracking API and archived HHS, CDC, and FDA government data. The README.md file contains a complete description of what is included.
How to use it
This repository provides an archive of some government COVID data, with CSV and JSON files downloaded regularly from government sites during the pandemic. This allows us to produce a history of federal point-in-time data sources. COVID Tracking Project data should be collected from the Covid Tracking API instead.