Table of contents
- Where do you get your data?
- Why are there so many spikes in the data?
- Why doesn’t your data match what I see on the official COVID-19 page for my state?
- Why doesn’t your data match the data from the CDC, or Worldometer, or Johns Hopkins, or USAFacts.org, or The New York Times, or another site?
- Why are your “new” cases, tests, or deaths counts for my state different from the “new” counts my state reports?
- How does the data deal with people who have had multiple tests for COVID-19?
- What are test encounters?
- Why do you list 56 states?
- Why have you stopped reporting data about the number of people who have recovered from COVID-19?
- Why are you removing values from the API field negative from various states starting on January 27, 2021?
- Why have you started including HHS hospitalization data on your data pages?
- Why have you stopped reporting national recoveries?
- Why have you stopped reporting national cumulative hospitalizations, ICU, and ventilation numbers on your website?
- Where has the “spreadsheet” option gone on the data page?
- Why don’t you report historical data on the state pages any more?
- Why did your national “total test results” numbers change on September 17?
- Why have your “Total test results” numbers changed for a particular state?
- Why don’t you report test positivity rates?
- Are you planning to track vaccination data?
- Why don’t you report county-level data? Will you be doing so in the future?
- Why aren’t you tracking age and sex?
- Can you report COVID-19 data related to schools and/or colleges?
- Why do you give X state such a good/bad grade?
- What states are included in the regions you display on your charts? What population figures do you use for per capita charts?
- Why don’t you harvest data automatically?
Where do you get your data?
Almost all of the data we compile is taken directly from the websites of state/territory public health authorities. See our data sources page for more information, or go to any individual state’s data page, such as Alabama’s data page, to see “Where this data comes from” at the top of the page. To see where any specific data point comes from, refer to our public spreadsheet of source notes. Read more about our data sources in our article “How We Source Our Data and Why It Matters.”
Why are there so many spikes in the data?
There are very strong day of the week effects in this dataset. Testing and reporting activity slows down on weekends, and health care staff and public health officials tend to “catch up” with their data reporting on Mondays and Tuesdays, causing spikes in the numbers later in the week. Some states report some metrics once per week, which will cause a spike on the day of the week they report that metric. Holidays can create backlogs of data, causing large apparent decreases followed by large apparent increases: read more about that here.
There are also other occasions when a lab or a county “dumps” a great deal of data all at once on a particular day, which makes the state’s numbers for that day unusually large. We try to report all such unusual data spikes in the public notes on each state’s data page and on our Twitter feed.
In general, any date in our data should be understood to be defined as “the date on which data was collected by The COVID Tracking Project,” which is generally the date the state reported the data point to the public in its cumulative totals. We recommend analyzing our data with 7-day or 14-day averages instead of with single day values to help mitigate the effect of these reporting spikes.
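A trailing 7-day average of the kind recommended above can be sketched in plain Python. This is only an illustration: the function name `trailing_average` and the sample daily counts are hypothetical and are not taken from the dataset.

```python
from collections import deque

def trailing_average(daily_values, window=7):
    """Return the trailing mean for each day once a full window is available.

    Days before the first full window yield None, so the output lines up
    with the input day for day.
    """
    recent = deque(maxlen=window)
    running_sum = 0
    averages = []
    for value in daily_values:
        if len(recent) == window:
            running_sum -= recent[0]  # value about to fall out of the window
        recent.append(value)
        running_sum += value
        averages.append(running_sum / window if len(recent) == window else None)
    return averages

# Hypothetical daily new-case counts with a weekend dip and a catch-up spike.
daily_new_cases = [120, 80, 60, 150, 210, 140, 130, 125, 75, 65]
smoothed = trailing_average(daily_new_cases)
```

Passing `window=14` gives the 14-day version. Either window spans whole weeks, so each average contains every day of the week equally often and the day-of-week effects largely cancel out.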
Why doesn’t your data match what I see on the official COVID-19 page for my state?
There are several reasons why our tracker might show different data than your state’s COVID-19 page, even when we use that same page as a source:
- Date lag. We update the dataset by hand once a day and release the data between about 5:30pm and 7pm Eastern Time. If a state updates its data after our daily compilation, we won’t pick up the new information until the next day.
- Hidden data. In some cases, we retrieve data that states do not display on their public dashboards from data files that the state provides. This data is still public and still official, but might only be visible “behind the scenes” of a data dashboard or in an obscure corner of the state’s COVID-19 site. Missouri, for example, does not display the value for Total PCR tests (in specimens) on its dashboard, but the data is there, though not displayed, and we retrieve it with a machine query.
- Different data definitions. In the absence of national data standards, we might use the same name for a metric as your state but use a different definition. For instance, our case, death, and hospitalization metrics all include “probable” and “suspected” cases for states that report them, whereas your state might include only lab-confirmed cases in its official case count while reporting probable cases separately.
- Different ways of reporting “new” cases, tests, or deaths. Please note that our “new” values for cases, tests, deaths and other metrics are calculated as the increase in the total cumulative value reported by the state since yesterday. States themselves, however, frequently define “new” cases, tests, and deaths differently. See our FAQ on “new” data points below for more information.
- Backfilled / backdated data. As explained above, we report data once each day on the date the state adds that data to its systems, whereas states themselves frequently “backfill” data, meaning that they enter data for previous days. By doing this, states can connect data points to pertinent dates such as the date a death occurred or the date a laboratory completed its analysis of a test. For instance, Florida’s state report includes a graph titled “COVID-19: cases and laboratory testing over time” whose numbers by date change frequently: on 9/9/2020 the graph reported 2352 cases for 9/8/2020, and on 9/10/2020 the graph reported 2337 cases for 9/8/2020. Similarly, Rhode Island continually revises its historic values for “Cumulative people who tested positive” as it receives more results from laboratories, so our time series falls out of sync with the state’s time series. We do sometimes backfill our own historic data when states provide us with a time series for a metric in a structured format. This work is tracked in one of our GitHub repositories.
See the notes associated with each state and territory for more information about the data for that state.
Why doesn’t your data match the data from the CDC, or Worldometer, or Johns Hopkins, or USAFacts.org, or The New York Times, or another site?
There are several reasons why different data trackers show different data:
- Manual capture vs. automatic capture. Our volunteers manually update our numbers by visiting state/territory public health websites once a day, annotating any changes to data sources or data anomalies as they go. Our volunteers are often retrieving data from sources such as PDFs and livestreams of press conferences that automated tools have not been engineered to capture.
- Time lag. When other trackers rely on automated tools to collect data from state/territory public health authorities, their counts tend to be updated more frequently than ours. We currently spend about three hours every afternoon collecting data, and we publish it only once each day.
- Data sources other than states/territories. Other trackers retrieve data directly from sources other than the state/territory public health authorities we use for our dataset.
Many other trackers, including Johns Hopkins, USAFacts.org, and The New York Times, rely on county data rather than state data. While counties do report their data to the state, in practice the sum total of county data points can often differ from the totals the state reports, probably because the state normalizes county data to its own standards.
The CDC has direct access to other sources of data in addition to state public health authorities. For instance, as of August 25, 2020, the CDC reports on its COVID-19 testing tracker that “The data for each state are sourced from either data submitted directly by the state health department via COVID-19 electronic laboratory reporting (CELR), or a combination of commercial, public health, and in-house hospital laboratories.”
- Different data definitions. States/territories define data points in inconsistent ways, and the various trackers deal with those inconsistent definitions differently. For example, “deaths” is treated very differently by various states and trackers, especially when it comes to “probable deaths,” which are not reported by all states or trackers.
The state of New York, for instance, has not been reporting “probable deaths” from COVID-19, whereas New York City reports thousands of probable deaths. Worldometer includes the NYC probables in its death counts, whereas The COVID Tracking Project does not. (Johns Hopkins also does not include the NYC probable deaths on its US map but does on its Global map.) When the state of New York includes these probable deaths in its reporting, we will include them in ours.
Why are your “new” cases, tests, or deaths counts for my state different from the “new” counts my state reports?
Our “new” values for cases, tests, deaths and other metrics are calculated as the increase in the cumulative total number of cases, tests, or deaths reported by the state since yesterday. This way of calculating “new” data points is a function of how we collect data. For the most part, we enter data manually once each day by visiting the state’s official COVID-19 data sites, and we capture, record, and report the cumulative totals reported by the state to the public on that day.
States themselves, however, frequently enter data into their systems for previous dates. Cases might be recorded by the state with dates such as “date of symptom onset,” which is not usually the same as the first date the state reports that case to the public. Tests might be recorded by the state with dates such as “specimen collection date,” which might not be the same as the first date the state reported that test to the public. Deaths might be recorded by the state with dates such as “date of death,” which is not usually the same as the first date the state reports that death to the public.
On a Friday, for example, a state might enter five tests into its public reporting system, one whose specimen was collected on Wednesday, two whose specimens were collected on Thursday, and two whose specimens were collected on Friday. In that example, the state might report “2 new tests” for that Friday and we might report “5 new tests” for that Friday. We recommend analyzing our data with 7-day or 14-day averages instead of with single day values to help mitigate the effect of these reporting spikes.
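The Friday example above can be worked through in a short Python sketch. The function names and the numbers are hypothetical, chosen only to mirror that example. One function takes the day-over-day increase in the publicly reported cumulative total, and the other counts records by the date attached to each record.

```python
from collections import Counter

def new_by_report_date(cumulative_totals):
    """Day-over-day increase in publicly reported cumulative totals.

    `cumulative_totals` is a list of (date, total) pairs in date order.
    The first day has no previous total, so it is skipped.
    """
    return {
        date: total - prev_total
        for (_, prev_total), (date, total)
        in zip(cumulative_totals, cumulative_totals[1:])
    }

def new_by_event_date(event_dates):
    """Count records by the date attached to each one (e.g. specimen collection)."""
    return dict(Counter(event_dates))

# Hypothetical week: the state's cumulative total grows by five on Friday...
cumulative = [("Wed", 100), ("Thu", 100), ("Fri", 105)]
# ...because it entered five tests that day, dated by specimen collection.
specimen_dates = ["Wed", "Thu", "Thu", "Fri", "Fri"]

ours = new_by_report_date(cumulative)       # increase in the cumulative total
theirs = new_by_event_date(specimen_dates)  # counts by specimen collection date
```

Here `ours["Fri"]` is 5 while `theirs["Fri"]` is 2. Both describe the same five tests, and the two totals agree once the whole week is summed.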
How does the data deal with people who have had multiple tests for COVID-19?
COVID-19 “cases” refer to individual people, and even if a person tests positive for COVID-19 more than once, that person should in general only be counted once in the case counts. The same is true for death data, recovery data, and hospitalization data: those values should (barring mistakes) represent unique individuals.
Testing figures, however, might or might not include multiple tests administered to the same person: it depends on how the state “deduplicates” that data, meaning how it identifies and removes (or chooses not to remove) redundant / repeated information. It also depends to an extent on what units the state reports COVID-19 tests in. Some states report test results in units of “specimens tested,” some states report test results in units of “people tested,” some states report in units of “testing encounters” (meaning the number of times one person was tested), and some states report test results in more than one of these ways. On our data page and on each individual state’s data page, we list all three of these ways a state might be reporting the total number of tests conducted in its jurisdiction.
We have written about these issues in depth in our articles “Test Positivity in the US is a Mess” and “Counting COVID-19 Tests: How States Do It, How We Do It, and What’s Changing.” The best source of information about your own county or state’s method of deduplicating and reporting tests is your own county or state public health department.
What are test encounters?
“Test encounters” or “testing encounters” measures the number of people who have been tested in a single day. Though the phrase is probably unfamiliar, its definition just describes the way we talk about how many times people have been “tested for COVID-19” in everyday life. If a person was tested once every week for a month, she would likely say she had been tested four times. Those four occasions on which she was tested are four “testing encounters.” For more information, see our full data definition for testing encounters and our blog post on testing encounters.
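To make the three test units concrete, the sketch below counts the same set of test records as specimens, as unique people, and as testing encounters (one per person per day). The record layout and the function name are assumptions made for illustration. Real state deduplication rules vary and are not captured here.

```python
def count_test_units(records):
    """Count the same test records in three different units.

    Each record is a (person_id, test_date, specimen_id) tuple.
    """
    specimens = len({specimen_id for _, _, specimen_id in records})
    people = len({person_id for person_id, _, _ in records})
    # One encounter per person per day, however many specimens were collected.
    encounters = len({(person_id, test_date) for person_id, test_date, _ in records})
    return {"specimens": specimens, "people": people, "encounters": encounters}

# Hypothetical records: person A is swabbed twice on one day and once a week later.
records = [
    ("A", "2020-09-01", "s1"),
    ("A", "2020-09-01", "s2"),
    ("A", "2020-09-08", "s3"),
    ("B", "2020-09-01", "s4"),
]
units = count_test_units(records)
```

The same four records yield four specimens, three encounters, and two people, which is why totals reported in different units cannot be compared directly.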
Why do you list 56 states?
We track data for the 50 states, the District of Columbia, and five US territories: American Samoa, Guam, the Northern Mariana Islands, Puerto Rico, and the US Virgin Islands. We try to say “states and territories” everywhere that it’s appropriate, but sometimes we might use the short term “states” when we mean “states, territories, and the District of Columbia.”
Why have you stopped reporting data about the number of people who have recovered from COVID-19?
On January 13, 2021, we removed all state-level “recovered” metrics from our website and shifted to reporting only “hospital discharges” for the eight states that report it. After a comprehensive review of the 56 states and territories we track, we determined that most “recovered” metrics are estimates rather than precise figures, that no two states define and calculate “recovered” in the same way, and that “recovered” metrics are often based on guidelines about whether a person is infectious rather than follow-up investigation with the patient to confirm that they have returned to health. Read more about this decision in our article on the difficulty of counting recoveries.
Why are you removing values from the API field negative from various states starting on January 27, 2021?
As part of our larger project of moving to reporting explicit totals for all states, we are also removing negatives that were created from mixed units (specimens minus cases or test encounters minus cases) for states that are using explicit totals in our main total test results field, called totalTestResults in the API. (Check out the above FAQ entry and blog post for more information about changes in our totalTestResults).
Before these states provided full histories of explicit totals, we were using positive plus negative (following early reporting practices of many states) to produce total test counts in order to get a full time-series. When states stopped reporting negatives directly, we computed them by subtracting the cases from the totals, so that positive+negative would equal the new explicitly reported values. In some cases, this led to mixing units in the negative field. Now that these states have provided full histories of their total tests, we have switched them away from positive plus negative for total test results and can remove these mixed unit values.
We are starting with AK, CA, DC, GA, KY, NY, OH, OR, TX, VA and WA on January 27, 2021, and we will continue to remove any negatives mixing units as we switch states over to explicit total test figures.
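The mixed-unit problem described above comes down to one subtraction. In this hypothetical sketch (the names and figures are illustrative only), the total is reported in specimens and the positives are cases, that is, people.

```python
def derived_negative(total_tests, positive_cases):
    """Back out a 'negative' value so that positive + negative equals the total."""
    return total_tests - positive_cases

# Hypothetical state: totals are reported in specimens, cases in people.
total_specimens = 10_000   # unit: specimens
positive_people = 1_200    # unit: people (cases)

negative = derived_negative(total_specimens, positive_people)
# `negative` is specimens minus people. It is neither a count of negative
# specimens nor a count of people who tested negative.
```

The derived value makes the sum reproduce the explicit total, but its unit is mixed, which is the kind of value being removed from the negative field.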
Why have you started including HHS hospitalization data on your data pages?
We have been keeping a close eye on HHS hospitalization data throughout the pandemic. We believe that comparing the data we collect from states and territories to the federal data the HHS collects from hospitals and states should increase public faith in both datasets, since the values these datasets report are very similar when different data definitions and reporting times are taken into account.
To facilitate this comparison, on December 8th, 2020, we introduced a new “card” on our national and state data pages that includes figures for Now hospitalized (confirmed + suspected), Now hospitalized (confirmed only), and Now in ICU (confirmed + suspected) from the HHS dataset “COVID-19 Reported Patient Impact and Hospital Capacity by State.”
Note that the hospitalization data we collect from states and territories includes suspected cases and pediatric cases of COVID-19 when available, but in many instances the jurisdiction either does not report this information or does not make clear whether its hospitalization data includes suspected and pediatric cases. Both Now hospitalized HHS metrics include both adult and pediatric COVID-19 patients, while the Now in ICU HHS metric includes only adult COVID-19 patients.
The HHS hospital data on our site is updated when a new version of the HHS dataset is published. The data is not available in our API nor in our downloadable CSVs, but it is available in both forms from healthdata.gov along with other COVID-19 datasets. Read more about this data from HHS in our article “What We’ve Learned About the HHS’s Hospitalization Data.”
Why have you stopped reporting national recoveries?
Not all states and territories report the number of Recovered COVID-19 patients or the number of COVID-19 patients discharged from the hospital, and large states like Florida, California, and Washington are among those who do not report Recovered. Therefore, adding only the available state and territory figures together to get a national Recovered total results in a significant undercount of the true national number of people who have survived COVID-19.
Furthermore, Recovered is a particularly non-standardized metric at the state level, since there is no official definition of clinical recovery from COVID-19: the CDC only gives guidance about when COVID-19 patients are no longer infectious and can therefore be released from isolation. Some states use the CDC definition for COVID-19 patients released from isolation to estimate recoveries, while some states define recovery in their own way, often considering COVID-19 patients who have not died within a certain interval of time after infection as “recovered,” no matter what their state of health: these Recovered figures therefore include people who suffer medium-term or long-term disability caused by COVID-19. Some states do not give any indication how they define or calculate recoveries.
To avoid confusion, as of November 16, 2020 we have stopped displaying the national value for Recovered COVID-19 patients on our website, and the metric will be deprecated in and then removed from the API in December. When a more standard definition of Recovered is adopted nationally by a critical mass of states, we will restore the figure.
Why have you stopped reporting national cumulative hospitalizations, ICU, and ventilation numbers on your website?
Only about two-thirds of states and territories report data for Cumulative hospitalized/Ever hospitalized, and even fewer states report data for Cumulative in ICU/Ever in ICU and Cumulative on ventilator/Ever on ventilator. Therefore, adding these state and territory figures together to get a national count (as we do for other COVID-19 metrics with complete reporting such as cases and tests) drastically undercounts the true cumulative national number of COVID-19 patients who have ever been hospitalized, admitted to the ICU, or placed on a ventilator.
This incomplete reporting can lead to a misleading national picture. For example, since more states report the number of people currently in the ICU or on a ventilator than report them cumulatively, the national numbers for individuals currently in the ICU or on a ventilator sometimes exceed the cumulative values.
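A small sketch shows how this can happen. The three states and their figures are hypothetical, and `None` marks a metric that a state does not report.

```python
def national_sum(state_values):
    """Sum a metric across states, skipping states that do not report it (None)."""
    return sum(v for v in state_values.values() if v is not None)

# Hypothetical figures: state "C" reports current ICU patients but no cumulative count.
currently_in_icu = {"A": 40, "B": 25, "C": 300}
cumulative_in_icu = {"A": 150, "B": 90, "C": None}

current_total = national_sum(currently_in_icu)
cumulative_total = national_sum(cumulative_in_icu)
```

With these inputs the national "current" sum (365) is larger than the national "cumulative" sum (240), even though every reporting state's cumulative figure exceeds its own current figure.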
To avoid confusion, as of November 16, 2020 we have therefore decided to stop displaying these national sums of cumulative hospitalization, ICU, and ventilated values on our website, although the fields remain available in our API. We will continue to ask states to report cumulative hospitalization figures and hope to restore the national sums to our website when a critical mass of states report them.
Where has the “spreadsheet” option gone on the data page?
Since The COVID Tracking Project started in March, we have been collating and publishing our data in the form of a single Google Sheet. Our API and website both used that sheet to publish our core dataset. As our data collection effort has matured, however, we have built new tools to improve our publishing process. All of our API and website data are now based on an improved publishing system that no longer uses Google Sheets.
However, people have been using our public sheets to import our data in ways that were never intended. We only support pulling data through our API. Supporting users whose applications broke because we changed the public sheet has had a significant impact on our support teams.
We encourage anyone who is using the public sheet for importing data to switch to our API, or import the CSV files available from our download page. As of October 26, 2020, we have removed the “Spreadsheet” button on the data page. As of November 28, 2020, the sheet will be static and no longer get new rows or columns, and on December 24, 2020, it will be taken offline.
Why don’t you report historical data on the state pages any more?
On Thursday, September 10, 2020, we removed the table of historical data on our state data pages (for example, the data page for Alaska). This table included screenshots, new tests, cases, negative test results, pending test results, hospitalized, deaths, and total test results. We removed this table because reporting “Negative” and “Total” test results so simply was misleading, given the multiple ways that states report test results, and given our legacy practice of calculating a state’s total tests by adding its positive and negative test results.
In the original historical data table, the “Negative” test results and “Total” test results could sometimes refer to different data (people tested, specimens tested, or testing encounters). We did a great deal of work to make sure that we were reporting different test units accurately for our website redesign of August 25, and we have moved the old “Total” figure on the state’s data page to the state’s history page for the category of Viral (PCR) Tests, where it appears as “Total test results - legacy (positive + negative).”
We realize that the full history page was a convenient way to get a time series of data elements for a single state. As the datasets of individual states change and our knowledge of those datasets improves, however, it has become clear that COVID-19 test and outcome reporting is getting even more complex than it was to begin with. States (and we) have also begun reporting new kinds of tests. To present a complete and accurate historical time series for a state’s data on our website would require more columns than a single web page could comfortably contain and would ultimately be a disservice to our users.
All historical data for every metric for every state is still available, and we are in fact providing more historical data on the web than we did previously.
We encourage you to get a state’s historical data in any or all of the following ways:
- Use our full-history pages for each data category, viewable by clicking “Historical data” in any data category on a state page. The list of screenshots for each state’s data sources is also available from that state’s page.
- Download the full CSV data for a state to build your own charts or do your own analysis.
- Use our API if you have automated, daily tasks that need to process our data.
Why did your national “total test results” numbers change on September 17?
Since August 13, we have been preparing the API field totalTestResults on a state-by-state level to prefer units of testing encounters and specimens over our legacy calculation of summing states’ positive and negative figures. (You can read more about the motivation behind this policy change at the question directly above or here). As of September 17, we had been able to make that switch for four states: Colorado, Massachusetts, North Dakota, and Rhode Island. The current list of included states is available in the “API Changes” section of our total tests documentation page, and each switched state is also annotated on its data page. Future changes will immediately affect the US totalTestResults API field.
However, we did not immediately change the national totalTestResults field of our API, which continued to use positive+negative until September 17. This changeover resulted in a cumulative increase of 2,136,206 US tests. These tests are distributed over the entire time series back to March, so the daily difference is smaller, comprising 57,400 additional tests (about an 8% increase) on September 17.
These upticks are expected when we prioritize counting total tests in units of specimens and test encounters, which include repeat testing, over positive+negative, which usually does not. For all four states we have switched, totalTestResults previously reflected unique people, which explains the large cumulative difference.
Please do not use the posNeg field on the national level. It has been deprecated and zeroed out since we are switching away from using positive + negative to calculate totalTestResults.
Why have your “Total test results” numbers changed for a particular state?
As of August 13, 2020, we have made and will continue to make a number of changes to our state-level “total test results” metric to clarify it and to make it more useful for gauging state and national testing capacity.
Lacking federal data standards, states and territories have been reporting test results in different ways, using different units, and often with unclear definitions and documentation. Most commonly, states chose to report “total tests” either in units of “specimens” (e.g., number of nasal swabs processed by a laboratory, even if a single person provided more than one swab) or in units of “people” (individuals tested for COVID-19). In many cases states have not made clear exactly how they are counting “tests” at all.
Given the substantial lack of clarity and consistency in total test results definitions between states, The COVID Tracking Project created the “Total Test Results” field (totalTestResults in our API) to assemble a national number, and we filled it according to a simple principle: we took whatever we could get. In the early months of our work, since we preferred to report in units of “people” rather than specimens, this usually meant summing a state’s figures for individuals receiving positive and negative results, because even when states directly provided a figure for total tests, that figure was often unclear or in units of “specimens.”
While we are still far from having a national data standard on how to count tests, most states have clarified their definitions enough that we can start switching states from using calculated positive+negative totals to using explicitly reported “total tests” figures in our main totalTestResults API field and our Total Test Results figures on our website. To support this change, we are launching a new policy about which units of total tests we prioritize in that column and are making it more evident which ones we are using in each state. The current list of included states is available in the “API Changes” section of our total tests documentation page, and each state page is also annotated on its data page.
Because we are rolling out these modifications gradually, you will see some movement in state/territorial totals for the totalTestResults API field and the Total Test Results figures on each state page. We will keep you posted whenever we make a change to the way we count a state’s test figures. Each state and territory has its own page on our site, linked to from the main Our Data page, and our notes for these changes will appear on that page, on each state page, in our notes for the state, in our API and relevant CSVs, as well as in a forthcoming central repository for everything we know about state and territorial test units.
Read more about the context of these changes here.
Why don’t you report test positivity rates?
Test positivity, also called the “percent positive rate” or “positivity rate,” can change dramatically depending on which total test metric is used as the denominator. Until every state reports their most basic COVID-19 data in the same way, direct test positivity comparisons across states remain an intractable problem: read more about these issues in our blog post on the subject. The COVID Tracking Project does not currently calculate test positivity rates and will not do so until we are confident in our ability to communicate precisely about these complex issues. We urge caution when relying on any (governmental or non-governmental) test positivity calculation that does not transparently and prominently address the question of inconsistencies across jurisdictions. We particularly and emphatically recommend against an over-reliance on test positivity calculations to justify changes in public health responses or policies.
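The sensitivity to the denominator is easy to see with made-up numbers. The sketch below is not a recommended calculation. It only shows how one hypothetical state's positivity shifts depending on which total is used. All names and figures are assumptions.

```python
def positivity(positives, total_tests):
    """Positive results divided by total tests, as a percentage."""
    return 100 * positives / total_tests

# Hypothetical state that publishes three different test totals for the same period.
positives = 500
totals_by_unit = {"people": 5_000, "encounters": 8_000, "specimens": 10_000}

rates = {unit: positivity(positives, total) for unit, total in totals_by_unit.items()}
```

The same 500 positives produce a rate of 10% against people tested, 6.25% against testing encounters, and 5% against specimens, so two states with identical outbreaks could appear very different simply because they report different units.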
Are you planning to track vaccination data?
We do not plan to report COVID-19 vaccination data, but we will be keeping a close eye on how states define and report vaccination metrics and will be maintaining internal logs and annotations about interesting features of this data. The CDC is reporting vaccination distribution and administration data at https://covid.cdc.gov/covid-data-tracker/#vaccinations and Bloomberg has launched a COVID-19 Vaccine Tracker at https://www.bloomberg.com/graphics/covid-vaccine-tracker-global-distribution/.
Why don’t you report county-level data? Will you be doing so in the future?
We do not currently have plans to collect data at the county level, both because we do not have the resources to do so manually and because Johns Hopkins, The New York Times, and USAFacts.org are collecting county-level data automatically for cases and deaths.
Why aren’t you tracking age and sex?
We had planned to track COVID-19 data by sex, but before we could muster the effort, the GenderSci Lab at Harvard published the US Gender/Sex COVID-19 Data Tracker, which “reports up-to-date and historical gender/sex-disaggregated data on COVID-19 cases and fatalities for 50 US States and 2 US Territories.”
Unfortunately, age is a complicated problem for us, because the states group ages in incompatible ranges: one state might report ages 29-39 as a group, while another reports 25-35, and a third reports 30-45. Because of this non-standardized reporting, age data is very difficult to provide as a national set of metrics. Age data for COVID-19 hospitalizations can be found in the CDC’s weekly COVID-NET summary.
Can you report COVID-19 data related to schools and/or colleges?
We do not currently have plans to track COVID-19 data related to either K-12 schools or colleges. Some states have begun to report COVID-19 data by K-12 school district, including South Carolina and New York. The New York Times has launched a college COVID cases tracker at https://www.nytimes.com/interactive/2020/us/covid-college-cases-tracker.html. For additional information on COVID-19 in your area schools or colleges, check your local news sources, your city or county public health department, or your state public health authority.
Why do you give X state such a good/bad grade?
Our State Grades currently rate only the completeness of the state’s data, not the accuracy of the state’s data. They also do not rate the state’s success in managing the pandemic. More information about how exactly we determine state grades can be found here and here. This is, however, a common complaint about our state grades, and we are working on a new rating system that will take other factors into account.
What states are included in the regions you display on your charts? What population figures do you use for per capita charts?
Regions displayed in our charts are defined by the US Census. Population estimates used in per capita charts are also from the US Census: we use the American Community Survey 5-Year Estimates for 2019.
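As an illustration of a regional per capita figure, the sketch below sums counts and populations within each region before dividing. The region mapping is a small excerpt of the Census regions. The counts and populations are hypothetical round numbers, not Census or tracker figures, and the scale of "per 100,000 residents" is an assumption, since the charts' exact scale is not stated here.

```python
from collections import defaultdict

# Small excerpt of Census region membership, for illustration only.
REGION_OF = {"CT": "Northeast", "ME": "Northeast", "IL": "Midwest", "OH": "Midwest"}

def regional_rate(counts, populations, per=100_000):
    """Sum counts and populations within each region, then scale the ratio."""
    region_counts = defaultdict(int)
    region_pops = defaultdict(int)
    for state, count in counts.items():
        region = REGION_OF[state]
        region_counts[region] += count
        region_pops[region] += populations[state]
    return {r: region_counts[r] / region_pops[r] * per for r in region_counts}

# Hypothetical counts and populations, not real figures.
counts = {"CT": 300, "ME": 100, "IL": 900, "OH": 600}
populations = {"CT": 3_000_000, "ME": 1_000_000, "IL": 12_000_000, "OH": 13_000_000}
rates = regional_rate(counts, populations)
```

Summing before dividing weights each state by its population. Averaging the individual state rates instead would give small states the same influence as large ones.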
Why don’t you harvest data automatically?
We do have tools that monitor, scrape, harvest, fetch, query, and otherwise capture data automatically, but because the 56 states and territories provide the dozens of data points we collect in so many different ways, and because they change and move and revise their systems and definitions for this data so continually, we rely on human intelligence first and technology second.
If you are a developer who is interested in volunteering with us, we ask that you learn how we collect data manually first (“dogfooding”) before working on our data tools.