Table of contents
- Where do you get your data?
- Why doesn’t your data match what I see on the official COVID-19 page for my state?
- Why doesn’t your data match the data from the CDC, or Worldometer, or Johns Hopkins, or USAFacts.org, or The New York Times, or another site?
- Why don’t you harvest data automatically?
- Why don’t you report county-level data? Will you be doing so in the future?
- Why are there so many spikes in the data?
- Why have your “Total test results” numbers changed for a particular state?
- Why did your national “total test results” numbers change on September 17?
- What are test encounters?
- Why don’t you report historical data on the state pages any more?
- Will you still be able to report hospital data given the new HHS reporting requirements of July 15, 2020?
- Can you report COVID-19 data related to schools and/or colleges?
- Why aren’t you tracking age and sex?
- Why do you list 56 states?
- Why do you give X state such a good/bad grade?
- What states are included in the regions you display on your charts?
Have a question about our data? Ask us. Our small team of mostly volunteers can’t always reply to all questions, but we will do our best to get back to you and/or post answers to frequently asked questions on this page.
Almost all of the data we compile is taken directly from the websites of state/territory public health authorities. See our data sources page or flip to the “States” tab of our public spreadsheet, which includes constantly updated annotations about data-source changes.
There are several reasons why our tracker might show different data than your state’s COVID-19 page, even when we use that same page as a source:
- Date lag. We update the dataset by hand once a day and release the data between about 5:30pm and 7pm Eastern Time. If a state updates its data after our daily compilation, we won’t pick up the new information until the next day.
- Hidden data. In some cases, we retrieve data that states do not display on their public dashboards from data files that the state provides. This data is still public and still official, but might only be visible “behind the scenes” of a data dashboard or in an obscure corner of the state’s COVID-19 site. Missouri, for example, does not display the value for Total PCR tests (in specimens) on its dashboard, but the value is present in the underlying data, and we retrieve it with a machine query.
- Different data definitions. In the absence of national data standards, we might use the same name for a metric as your state but use a different definition. For instance, our case, death, and hospitalization metrics all include “probable” and “suspected” cases for states that report them, whereas your state might include only lab-confirmed cases in its official case count while reporting probable cases separately.
- Different ways of reporting “new” cases, tests, or deaths. Please note that our “new” values for cases, tests, deaths and other metrics are calculated as the increase in the total value reported by the state since yesterday. This way of calculating “new” data points is a function of how we collect data. For the most part, we enter data manually once each day by visiting the state’s official COVID-19 data sites, so we capture the data reported on that day. States themselves, however, frequently enter data into their systems for previous days. On a Friday, for example, a state might enter five cases into its system, one whose test result came back positive from the lab on Wednesday, two that came back on Thursday, and two that came back on Friday. In that example, the state might report “2 new cases” for that Friday and we might report “5 new cases” for that Friday.
- Backfilled / backdated data. As explained above, we report data once each day on the date the state adds that data to its systems, whereas states themselves frequently “backfill” data, meaning that they enter data for previous days. By doing this, states can connect data points to pertinent dates such as the date a death occurred or the date a laboratory completed its analysis of a test. For instance, Florida’s state report includes a graph titled “COVID-19: cases and laboratory testing over time” whose numbers by date change frequently: on 9/9/2020 the graph reported 2,352 cases for 9/8/2020, and on 9/10/2020 it reported 2,337 cases for 9/8/2020. Similarly, Rhode Island continually revises its historic values for “Cumulative people who tested positive” as it receives more results from laboratories, so our time series falls out of sync with the state’s time series.
We do sometimes backfill our own historic data when states provide us with a time series for a metric in a structured format. This work is tracked in one of our GitHub repositories.
See the notes associated with each state and territory for more information about the data for that state.
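The “new” value calculation described above can be sketched in a few lines of Python. This is only an illustration, not our production tooling: the function name `daily_increase` and the cumulative totals are hypothetical, chosen to mirror the Friday example in which a state enters five cases on one day.

```python
from datetime import date

def daily_increase(cumulative):
    """Return {date: increase since the previous captured date}.

    `cumulative` maps a capture date to the cumulative total seen that day.
    The first date has no previous value, so it is omitted from the result.
    """
    days = sorted(cumulative)
    return {
        today: cumulative[today] - cumulative[yesterday]
        for yesterday, today in zip(days, days[1:])
    }

# Hypothetical cumulative case totals captured on three consecutive days.
captured = {
    date(2020, 9, 9): 100,   # Wednesday
    date(2020, 9, 10): 100,  # Thursday
    date(2020, 9, 11): 105,  # Friday: the state entered five cases today
}

new_cases = daily_increase(captured)
print(new_cases[date(2020, 9, 11)])  # 5, even if only 2 results are dated Friday
```

A state that assigns each case to its lab-result date would instead spread those five cases across Wednesday, Thursday, and Friday, which is why the two series can disagree on any given day while agreeing on the cumulative total.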
Why doesn’t your data match the data from the CDC, or Worldometer, or Johns Hopkins, or USAFacts.org, or The New York Times, or another site?
There are several reasons why different data trackers show different data:
- Manual capture vs. automatic capture. Our volunteers manually update our numbers by visiting state/territory public health websites once a day, annotating any changes to data sources or data anomalies as they go. Our volunteers are often retrieving data from sources such as PDFs and livestreams of press conferences that automated tools have not been engineered to capture.
- Time lag. When other trackers rely on automated tools to collect data from state/territory public health authorities, their counts tend to be updated more frequently than ours. We currently spend about three hours every afternoon collecting data, and we publish it only once each day.
- Data sources other than states/territories. Other trackers retrieve data directly from sources other than the state/territory public health authorities we use for our dataset.
Many other trackers, including Johns Hopkins, USAFacts.org, and The New York Times, rely on county data rather than state data. While counties do report their data to the state, in practice the sum total of county data points can often differ from the totals the state reports, probably because the state normalizes county data to its own standards.
The CDC has direct access to other sources of data in addition to state public health authorities. For instance, as of August 25, 2020, the CDC reports on its COVID-19 testing tracker that “The data for each state are sourced from either data submitted directly by the state health department via COVID-19 electronic laboratory reporting (CELR), or a combination of commercial, public health, and in-house hospital laboratories.”
- Different data definitions. States/territories define data points in inconsistent ways, and the various trackers deal with those inconsistent definitions differently. For example, “deaths” is treated very differently by various states and trackers, especially when it comes to “probable deaths,” which are not reported by all states or trackers.
The state of New York, for instance, has not been reporting “probable deaths” from COVID-19, whereas New York City reports thousands of probable deaths. Worldometer includes the NYC probables in its death counts, whereas The COVID Tracking Project does not. (Johns Hopkins also does not include the NYC probable deaths on its US map but does on its Global map.) When the state of New York includes these probable deaths in its reporting, we will include them in ours.
We do have tools that monitor, scrape, harvest, fetch, query, and otherwise capture data automatically, but because the 56 states and territories provide the dozens of data points we collect in so many different ways, and because they change and move and revise their systems and definitions for this data so continually, we rely on human intelligence first and technology second.
If you are a developer who is interested in volunteering with us, we ask that you learn how we collect data manually first (“dogfooding”) before working on our data tools.
We do not currently have plans to collect data at the county level, both because we do not have the resources to do so manually and because Johns Hopkins, The New York Times, and USAFacts.org are collecting county-level data automatically for cases and deaths.
There are very strong day-of-the-week effects in this dataset. Testing and reporting activity slows down on weekends, and health care staff and public health officials tend to “catch up” with their data reporting on Mondays and Tuesdays, causing spikes in the numbers early in the week. Some states report some metrics only once per week, which causes a spike on the day of the week they report that metric.
There are also other occasions when a lab or a county “dumps” a great deal of data all at once on a particular day, which makes the state’s numbers for that day unusually large. We try to report all such unusual data spikes in the public notes on each state’s data page and on our Twitter feed.
In general, any date in our data should be understood to be defined as “the date on which data was collected by The COVID Tracking Project,” which is generally the date the state reported the data point. Analyzing this data with 7-day averages can help mitigate the effect of these reporting spikes.
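One way to compute such a 7-day average is sketched below in Python. The helper name `trailing_average` and the daily counts are hypothetical; the counts were chosen so that every week sums to the same total, which makes the smoothing effect easy to see.

```python
def trailing_average(values, window=7):
    """Trailing mean over the most recent `window` values, one result per day.

    The first few days use a shorter window because fewer values exist yet.
    """
    averages = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages

# Hypothetical daily new-test counts for two weeks, Monday through Sunday:
# a Monday/Tuesday "catch-up" spike followed by a weekend dip.
week = [160, 120, 100, 100, 100, 60, 60]
daily = week + week

smoothed = trailing_average(daily)
print(smoothed[6:])  # every full 7-day window averages 100.0
```

Because each full window covers exactly one of every weekday, the weekend dip and the early-week spike cancel out, leaving the underlying level.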
As of August 13, 2020, we have made and will continue to make a number of changes to our state-level “total test results” metric to clarify it and to make it more useful for gauging state and national testing capacity.
Lacking federal data standards, states and territories have been reporting test results in different ways, using different units, and often with unclear definitions and documentation. Most commonly, states chose to report “total tests” either in units of “specimens” (e.g., number of nasal swabs processed by a laboratory, even if a single person provided more than one swab) or in units of “people” (individuals tested for COVID-19). In many cases states have not made clear exactly how they are counting “tests” at all.
Given the substantial lack of clarity and consistency in total test results definitions between states, The COVID Tracking Project created the “Total Test Results” metric (the totalTestResults field in our API) to assemble a national number, and we filled it by a simple principle: we took whatever we could get. In the early months of our work, since we preferred to report in units of “people” rather than specimens, this usually meant summing a state’s figures for individuals receiving positive and negative results, because even when states directly provided a figure for total tests, that figure was often unclear or in units of “specimens.”
While we are still far from having a national data standard on how to count tests, most states have clarified their definitions enough that we can start switching states from using calculated positive+negative totals to using explicitly reported “total tests” figures in our main totalTestResults API field and our Total Test Results figures on our website. To support this change, we are launching a new policy about which units of total tests we prioritize in that column and are making it more evident which ones we are using in each state.
Because we are rolling out these modifications gradually, you will see some movement in state/territorial totals for the totalTestResults API field and the Total Test Results figures on each state page. We will keep you posted whenever we change the way we count a state’s test figures. Each state and territory has its own page on our site, linked from the main Our Data page, and our notes on these changes will appear on that page, on each state page, in our notes for the state, in our API and relevant CSVs, and in a forthcoming central repository for everything we know about state and territorial test units.
Read more about the context of these changes here.
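The switch from a calculated positive+negative total to an explicitly reported total can be pictured with the Python sketch below. It is an illustration under stated assumptions, not our actual pipeline: the function name, the unit labels, and in particular the priority order in `UNIT_PRIORITY` are hypothetical.

```python
from typing import Optional

# Assumed priority order among explicitly reported totals (illustrative only).
UNIT_PRIORITY = ("test_encounters", "specimens", "people")

def total_test_results(reported: dict, positive: Optional[int], negative: Optional[int]):
    """Pick an explicitly reported total if one exists, else positive + negative.

    `reported` maps a unit label to the state's reported total for that unit.
    Returns a (value, unit_label) pair so the unit in use stays visible.
    """
    for unit in UNIT_PRIORITY:
        value = reported.get(unit)
        if value is not None:
            return value, unit
    if positive is not None and negative is not None:
        return positive + negative, "positive+negative (legacy)"
    return None, "unavailable"

# Hypothetical state that reports explicit totals in two units.
state_a = total_test_results({"specimens": 1200, "people": 900}, positive=100, negative=800)
# Hypothetical state that reports only positive and negative results.
state_b = total_test_results({}, positive=100, negative=800)

print(state_a)  # (1200, 'specimens')
print(state_b)  # (900, 'positive+negative (legacy)')
```

Returning the unit label alongside the number reflects the goal of making it evident which unit is in use for each state.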
Since August 13, we have been preparing the API field totalTestResults on a state-by-state level to prefer units of testing encounters and specimens over our legacy calculation of summing states’ positive and negative figures. (You can read more about the motivation behind this policy change in the answer directly above or here.) As of September 17, we had been able to make that switch for four states: Colorado, Massachusetts, North Dakota, and Rhode Island. The current list of included states is available in the “API Changes” section of our total tests documentation page, and each state page is also annotated. Future changes will immediately affect the US totalTestResults API field.
However, we did not immediately change the national totalTestResults field of our API, which continued to use positive+negative until September 17. This changeover resulted in a cumulative increase of 2,136,206 US tests. These tests are distributed over the entire time series back to March, so the daily difference is smaller, comprising 57,400 additional tests (about an 8% increase) on September 17.
These upticks are expected when we prioritize counting total tests in units of specimens and test encounters, which include repeat testing, over positive+negative, which usually does not. In all four states we have switched, totalTestResults previously reflected unique people, which explains the large cumulative difference.
Please do not use the posNeg field on the national level. It has been deprecated and zeroed out since we are switching away from using positive + negative to calculate totalTestResults.
“Test encounters” or “testing encounters” measures the number of people who have been tested in a single day. Though the phrase is probably unfamiliar, its definition just describes the way we talk about how many times people have been “tested for COVID-19” in everyday life. If a person was tested once every week for a month, she would likely say she had been tested four times. Those four occasions on which she was tested are four “testing encounters.” For more information, see our full data definition for testing encounters and our blog post on testing encounters.
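The difference between counting specimens, people, and test encounters can be made concrete with a small Python sketch. The record layout (person, collection date, specimen) and the function name `count_units` are hypothetical, and real state systems may apply additional rules.

```python
from datetime import date

# Hypothetical test records: (person_id, collection_date, specimen_id)
records = [
    ("p1", date(2020, 9, 1), "s1"),
    ("p1", date(2020, 9, 1), "s2"),  # second swab, same person, same day
    ("p1", date(2020, 9, 8), "s3"),  # same person tested again a week later
    ("p2", date(2020, 9, 8), "s4"),
]

def count_units(records):
    """Count the same records in three different test units."""
    specimens = len({specimen for _, _, specimen in records})
    people = len({person for person, _, _ in records})
    # One encounter per person per day, however many swabs were taken that day.
    encounters = len({(person, day) for person, day, _ in records})
    return {"specimens": specimens, "people": people, "test_encounters": encounters}

print(count_units(records))
# {'specimens': 4, 'people': 2, 'test_encounters': 3}
```

The same four records yield three different totals, which is why the unit matters when comparing states.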
On Thursday, September 10, 2020, we removed the table of historical data on our state data pages (for example, the data page for Alaska). This table included screenshots, new tests, cases, negative test results, pending test results, hospitalized, deaths, and total test results. We removed this table because reporting “Negative” and “Total” test results so simply was misleading, given the multiple ways that states report test results, and given our legacy practice of calculating a state’s total tests by adding its positive and negative test results.
In the original historical data table, the “Negative” test results and “Total” test results could sometimes refer to different data (people tested, specimens tested, or testing encounters). We did a great deal of work to make sure that we were reporting different test units accurately for our website redesign of August 25, and we have moved the old “Total” figure on the state’s data page to the state’s history page for the category of Viral (PCR) Tests, where it appears as “Total test results - legacy (positive + negative).”
We realize that the full history page was a convenient way to get a time series of data elements for a single state. As the datasets of individual states change and our knowledge of those datasets improves, however, it has become clear that COVID-19 test and outcome reporting is getting even more complex than it was to begin with. States (and we) have also begun reporting new kinds of tests. To present a complete and accurate historical time series for a state’s data on our website would require more columns than a single web page could comfortably contain and would ultimately be a disservice to our users.
All historical data for every metric for every state is still available, and we are in fact providing more historical data on the web than we did previously.
We encourage you to get a state’s historical data in any or all of the following ways:
- Use our full-history pages for each data category, viewable by clicking “Historical data” in any data category on a state page. The list of screenshots for each state’s data sources is also available from that state’s page.
- Download the full CSV data for a state to build your own charts or do your own analysis.
- Use our API if you have automated, daily tasks that need to process our data.
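As a rough sketch of the CSV route, the Python below parses CSV text into a date-ordered series for one metric. The column names `date` and `state`, the `YYYYMMDD` date format, and the sample values are assumptions made for illustration; check the header row of the file you download and adjust accordingly.

```python
import csv
import io

def load_history(csv_text, metric="totalTestResults"):
    """Return [(date_string, value)] sorted by date, skipping blank values."""
    rows = csv.DictReader(io.StringIO(csv_text))
    series = [(row["date"], int(row[metric])) for row in rows if row.get(metric)]
    # Sorting strings works here because the dates are assumed to be YYYYMMDD.
    return sorted(series)

# Hypothetical sample in the assumed layout, newest row first.
sample = """date,state,totalTestResults
20200912,AK,402000
20200911,AK,398500
20200910,AK,
"""

history = load_history(sample)
print(history)  # [('20200911', 398500), ('20200912', 402000)]
```

With a downloaded file, read its contents into a string and pass that to `load_history` in place of `sample`.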
Will you still be able to report hospital data given the new HHS reporting requirements of July 15, 2020?
We are following this issue closely. Hospitals are often required to report to states as well as to federal agencies, and we compile data from state public health agencies, so in most cases we believe we will still be able to compile current and cumulative hospitalization, ICU, and ventilator data. However, the state sources we rely on may be affected for states that get their hospitalization data from the CDC’s National Healthcare Safety Network.
As of July 17, three states (Idaho, Missouri, and Wyoming) have made statements about interruptions in their hospital data at the state level. The Missouri Hospital Association writes:
Please note, due to the abrupt change in data measures and the reporting platform issued by the White House on Monday, July 13, and effective Wednesday, July 15, MHA and the State of Missouri will be unable to access critical hospitalization data during the transition. While we are working to collect interim data, situational awareness will be limited. It is uncertain whether we will be able to produce all data included in this regional dashboard on Wednesday, July 22. We will resume producing the daily hospitalization snapshot and weekly regional dashboards as soon as data feeds are fully restored.
And a spokesperson for Idaho’s Department of Health and Welfare told the Idaho Statesman that the HHS directive “was issued abruptly and presents some significant challenges for Idaho to continue to monitor the number of hospitalizations in the state.” Idaho appears not to have updated its hospital data since July 16, when it posted data from July 13 (a standard delay for Idaho’s hospital information).
Wyoming has appended a note to its hospitalization data: “As part of the transition to a new HHS data system, we are experiencing data quality / reporting problems with this indicator and are working with hospitals to resolve.”
We anticipate seeing changes in the data throughout July as hospitals adjust to the new rules.
We do not currently have plans to track COVID data related to either K-12 schools or colleges. Some states have begun to report COVID data by K-12 school district, including South Carolina and New York. The New York Times has launched a college COVID cases tracker (https://www.nytimes.com/interactive/2020/us/covid-college-cases-tracker.html). For additional information on COVID-19 in your area schools or colleges, check your local news sources, your city or county public health department, or your state public health authority.
We had planned to track COVID-19 data by sex, but before we could muster the effort, the GenderSci Lab at Harvard published the US Gender/Sex COVID-19 Data Tracker, which “reports up-to-date and historical gender/sex-disaggregated data on COVID-19 cases and fatalities for 50 US States and 2 US Territories.”
Unfortunately, age is a complicated problem for us, because the states group ages in incompatible ranges: one state might report ages 29-39 as a group, while another reports 25-35, and a third reports 30-45. Because the ranges overlap without sharing boundaries, age data cannot simply be summed across states, which makes it very difficult to provide as a national set of metrics.
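The incompatibility can be checked mechanically. The Python sketch below, with a hypothetical helper `shared_boundaries`, looks for age-bin edges that every state's grouping has in common; a national breakdown would need at least two shared edges to form even one common age group.

```python
def shared_boundaries(*state_bins):
    """Return the age-bin edges common to every state's grouping.

    Each grouping is a list of inclusive (low, high) age ranges.
    """
    edge_sets = []
    for bins in state_bins:
        edges = set()
        for low, high in bins:
            edges.add(low)
            edges.add(high + 1)  # inclusive range, so the next bin starts at high + 1
        edge_sets.append(edges)
    return sorted(set.intersection(*edge_sets))

# The three example groupings from the text above.
print(shared_boundaries([(29, 39)], [(25, 35)], [(30, 45)]))  # []

# Hypothetical groupings that do line up: the finer bins nest inside the coarser ones.
coarse = [(0, 19), (20, 39)]
fine = [(0, 9), (10, 19), (20, 39)]
print(shared_boundaries(coarse, fine))  # [0, 20, 40]
```

With no shared edges, any national age group would have to split at least one state's reported bin, which requires estimation rather than simple addition.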
We track data for the 50 states, the District of Columbia, and five US territories (American Samoa, Guam, the Northern Mariana Islands, Puerto Rico, and the US Virgin Islands), for a total of 56. We try to say “states and territories” everywhere that it’s appropriate, but sometimes we might use the short term “states” when we mean “states, territories, and the District of Columbia.”
Our State Grades currently rate only the completeness of the state’s data, not its accuracy. They also do not rate the state’s success in managing the pandemic. More information about how exactly we determine state grades can be found here and here. We know this is a common complaint about our state grades, and we are working on a new rating system that will take other factors into account.
When our charts display United States regions, they are using the region definitions set by the United States Census.