
This past fall, the Centers for Disease Control and Prevention asked every US state and territory to sign a contract, agreeing to share wide-ranging vaccine data and records with the federal government. Several states raised concerns, but every one signed. 

But it turns out that most states ended up withholding the names, addresses, ZIP codes, and dates of birth of those vaccinated, and at least seven states go a step further and redact race and ethnicity data from the federal government or don’t collect it in the first place.

As part of a collaboration with the Documenting COVID-19 project at Columbia University’s Brown Institute for Media Innovation and The COVID Tracking Project at The Atlantic, we sent written requests to all 50 states, the District of Columbia, and five territories for the data-sharing contracts they’ve signed with the federal government, along with the data fields they withhold. 

So far, we’ve obtained data-use agreements from 36 states, the District of Columbia, the U.S. Virgin Islands, and Guam. Based on the agreements we’ve seen, most states and territories understandably withhold select pieces of information from the CDC, including the names of those vaccinated, their addresses, and their dates of birth. Personally identifiable information, like names and ZIP codes, is routinely “de-identified,” or scrubbed, from many kinds of government data before it is shared.

But a handful of states, including California, North Dakota, Texas, and Vermont, redact race and ethnicity information from the already de-identified vaccine data they share with the federal government, citing state laws designed to protect patient privacy. California offers an opt-out clause at the point of vaccination for those who don’t want their information shared with the CDC, though it remains unclear how much information is still passed on to the CDC when someone opts out. The California Department of Public Health has not responded to our requests for clarification on this point or on its vaccine data policies.

Several other states, including Alabama, Minnesota, and New York, did not respond to our questions or our open-records requests. Others, like Montana and Texas, require patients to opt in before their health information can be disclosed, creating a significant data gap for those states. Idaho doesn’t collect the race and ethnicity of people vaccinated in the first place and, as a result, doesn’t have much data to provide to the CDC.

In Connecticut, some hospitals don’t fill out vaccine intake forms in order to process billing forms more quickly. (In response to our questions, the Connecticut Department of Public Health released a statement and data file last week, acknowledging gaps and under-reporting of race and ethnicity information but noting that “data quality issues are not unique to Connecticut.”) 

The effects of these obstacles to complete reporting, and likely others, are substantial: The CDC’s COVID Data Tracker reports that just 53.4 percent of vaccination records provided to the federal government to date include race and ethnicity information. (The CDC has released one report on the demographics of vaccine recipients and noted gaps in demographic data; another report is forthcoming.)

The rationale for withholding data

“There is no precedent for the federal government capturing this information,” said Rebecca Coyle, executive director of the American Immunization Registry Association, which represents state immunization registries. “But the more barriers you have, the harder it is to get data on what’s happening and you can miss a chunk of people that are important to making decisions about the vaccine.”

A major barrier to the CDC’s data collection efforts is a lack of regulatory power and enforcement needed to make race and ethnicity data sharing a requirement for states, said Alyssa Hundrup, acting director of health care for the US Government Accountability Office. That power would have to come through congressional legislation.

“Unfortunately, [the CDC] stated that they feel like they don't have the authority to require that states and jurisdictions report this information,” Hundrup said. “I think we all agree that there is a need for more data and the states and jurisdictions are inconsistent in terms of what they are reporting back.” But the real-world impact is clear, the CDC stated in response to our request for comment, noting that more detailed, “patient-level” data “could lead to bigger, and more impactful insights.”

And it’s not just a lack of enforcement that makes good data collection difficult. Many states have a long-simmering distrust of federal mandates and data repositories, which have been the subject of several well-publicized hacks. And collecting accurate data isn’t easy or cheap, especially as states struggle with limited resources and shrinking budgets. “Electronic health records, at the end of the day, have to be collected by a human being and it can be deceptively burdensome,” said Dr. Shaun Grannis, a professor of family medicine at the Indiana University School of Medicine and a data scientist who advises the CDC on vaccine and COVID-19 data-sharing.

Missing data in high-population states

The problem is especially acute in the country’s two largest states, California and Texas, where vaccine-related race and ethnicity data isn’t being shared with the federal government, and where even the incomplete demographic information tied to other COVID-19 metrics points to racial and ethnic disparities.

Data from the COVID Racial Data Tracker shows that, in California, Latinx people account for 55 percent of COVID-19 cases and 46 percent of COVID-19 deaths where race or ethnicity is known—while making up only 39 percent of the state’s population. Likewise, in Texas, Latinx people account for 47 percent of COVID-19 deaths where race or ethnicity is known, while making up only 39 percent of the state’s population. (Texas has published race and ethnicity data for just 3 percent of all COVID-19 cases in the state.)

Texas is publishing race and ethnicity data for vaccinations on its own state health department’s dashboard but isn’t sharing it with the federal government. Experts say it is unsurprising that Texas chooses not to share this data. The state has the “spirit of protecting privacy and liberty,” said Dr. Lauren Ancel Meyers, director of the UT Austin COVID-19 Modeling Consortium, who has worked with the state health department.

When asked about its data gaps, the Texas Department of State Health Services said in a statement, without elaborating, that “Texas law prevents us from sharing individual records, so we share aggregate data.”

Aside from personally identifiable data, ZIP codes, and race and ethnicity, California even redacts the county where a vaccine recipient lives when the county’s population is less than 20,000. Compared with other states, California is one of the most restrictive in redacting personal and health information from public records, relying on a 1977 section of its state civil code governing the disclosure of personal information. In its contract with the CDC, California cites its 68-page Health and Human Services Agency Data De-Identification Guidelines, which include a detailed scoring rubric for the release of personal data and note that “record level data inherently has higher risk than summarized data, even after personal identifiers are removed.”

If someone had access to a person’s entire health care record, even with personal identifiers removed, it could still lead to re-identification, said Dr. Rita Hamad, a family physician and director of the University of California, San Francisco’s Social Policies for Health Equity Research Program. But with vaccine data, there are enough people in each racial and ethnic category that disclosing limited demographic information would not pose a risk of re-identification. And the state is not sharing entire medical records.

“It’s not justifiable” to withhold limited demographic data, Hamad said. 

California released statewide vaccine demographic data last week, but the state’s top epidemiologist, Dr. Erica Pan, said during a vaccine committee meeting in the first week of February that “there is a lot of missing data.”

For federal agencies, restricted data sharing by states leaves gaps in knowing how many Black, Latinx, and Indigenous people, whose communities have been disproportionately harmed by the pandemic, are actually receiving vaccines. A recent analysis by The COVID Tracking Project found that public availability of state-published data on the race and ethnicity of vaccine recipients is spotty at best.

Demographic data is often omitted before it ever reaches states

Pharmacies face different legal requirements for data sharing than doctors’ offices do, making them more likely to opt out of reporting demographic data on vaccine recipients, even in states with a robust immunization registry capable of documenting race and ethnicity.

“There needs to be a real push for hospitals and pharmacies to compile the information because we’re hearing about data collection issues from New York City to Florida to Texas to Colorado,” said Claire Hannan, executive director of the Association of Immunization Managers and a Capitol Hill veteran. “The challenge is you have to require more from the sites and encourage them to fill out the required fields.”

In Illinois, the state health department sent an internal alert on Feb. 5, encouraging vaccine providers to report race and ethnicity as it is “critical to not only inform vaccine administration but also to ensure the COVID-19 vaccine is getting into the arms of Illinois’ most vulnerable populations.” In New Jersey, some vaccine providers say state health officials have at times threatened to withhold doses unless the providers fully fill out patient data.

The CDC’s goal of collecting that information nationwide is a good thing, UCSF’s Dr. Rita Hamad said: “The first step in understanding the roots of health disparities and health inequities and how to fix them is having enough data to do so.”

For this project, we’ve filed 34 open-records requests across the country and several are still pending. If you have any questions or tips, email us at [email protected].


Caitlin Antonios is a California-based investigative journalist who graduated from Columbia University’s Stabile Program in 2020. She has been working with the Documenting COVID-19 project since September 2020 and is currently reporting on California’s vaccine distribution as a fellow at USC Annenberg’s Center for Health Journalism.

Mohar Chatterjee joined the Documenting COVID-19 project in February 2021 and is a dual-degree Journalism and Computer Science graduate student at Columbia University.

Georgia Gee is a London-based investigative journalist who graduated from Columbia Journalism School’s Stabile program in 2020 and has worked with the Documenting COVID-19 project since June 2020.

Derek Kravitz is a New York-based investigative journalist who started the Documenting COVID-19 project as part of a grant through the Brown Institute for Media Innovation in April 2020 and teaches at Columbia Journalism School’s Stabile Center for Investigative Journalism.

Kyra Senese is a Chicago-based investigative journalist who graduated from Columbia Journalism School’s Stabile program in 2020 and has worked with the Documenting COVID-19 project since June 2020.

