
In the past few months, multiple concerning variants of SARS-CoV-2, the virus that causes COVID-19, have emerged around the world. However, information about variants, viral mutations, and their implications for public health has led to a lot of confusion and panic. In this post we seek to provide a back-to-basics explanation of SARS-CoV-2 variants, from naming conventions to possible vaccine implications.

What are the basics?

DNA is the blueprint for the proteins that make up all cells. Viruses, though they are not cells, carry either DNA or RNA as their blueprint, and that blueprint likewise determines the structure of their proteins. The RNA blueprint for SARS-CoV-2, its genome, is made up of about 30,000 bases, represented by the letters A, U, G, and C. Each group of three bases corresponds to one of 20 amino acids, and the sequence of amino acids determines the structure of a protein, and consequently its function.
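To make the "three bases, one amino acid" idea concrete, here is a minimal Python sketch that reads an RNA string three bases at a time using the standard genetic code. The function name `translate` and the short example sequence are illustrative assumptions; the sequence is made up and is not taken from the SARS-CoV-2 genome.

```python
from itertools import product

# Standard genetic code, written as one amino-acid letter per codon.
# Codons are enumerated in the order UUU, UUC, UUA, UUG, UCU, ...
# "*" marks a stop codon, which ends the protein.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna):
    """Translate an RNA string into one-letter amino acids, stopping at a stop codon."""
    protein = []
    usable_length = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    for start in range(0, usable_length, 3):
        amino_acid = CODON_TABLE[rna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Illustrative (made-up) sequences: changing one base turns GAU into GGU,
# which swaps aspartic acid (D) for glycine (G) in the resulting protein.
before = translate("AUGGAUUAA")
after = translate("AUGGGUUAA")
```

In this sketch, a single changed base is enough to change one amino acid in the protein, which is the kind of change the mutation names discussed later in this post describe.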

How did the virus get its name?

The first genetic sequence of the virus that causes COVID-19 (which is shorthand for coronavirus disease first documented in 2019) was published in early January of 2020. That sequence was most similar to a coronavirus (CoV) that caused an outbreak of severe acute respiratory syndrome, or SARS, in 2002. Thus the new strain of coronavirus was dubbed SARS-CoV-2. 

What is a mutation?

When a virus replicates, it makes copies of its genome. Sometimes it makes mistakes that may insert, delete, or change one or more bases. This happens relatively frequently, especially with RNA viruses, although coronaviruses make fewer mistakes than other RNA viruses because they have the ability to proofread. A proofreading protein follows the enzyme that copies the viral RNA and replaces any bases that are incorrectly incorporated, like an editor reading over your first draft. 

Are all mutations bad news?

Not necessarily. Some mutations partially or completely inactivate a virus. Others don’t make any difference in how the virus functions. Those mutations can help scientists trace the spread of the virus, much like genetic analysis services can tell you where your ancestors came from. Mutations that can be bad news may make the virus more able to infect, multiply, spread, or evade the immune system. These viruses can become more widespread, outcompeting their predecessors. 

Are these the first important mutations we have seen?

No. In February 2020, a mutation in the spike gene called D614G was identified in Europe. This mutation became the dominant form of the gene worldwide by spring 2020. 

What do those letters and numbers in a mutation mean?

They refer to the position of the mutation within the protein, and which amino acids are involved. For example, in mutation S:D614G, there is a mutation in position 614 of the spike protein (S), where the amino acid aspartic acid (abbreviated as D) is replaced with glycine (abbreviated as G).
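As a rough illustration, the naming pattern can be unpacked with a few lines of Python. This is only a sketch: the names `Substitution` and `parse_substitution` are hypothetical, it handles simple one-for-one amino acid swaps only, and it assumes that a label without a protein prefix (such as N501Y) refers to the spike.

```python
import re
from typing import NamedTuple


class Substitution(NamedTuple):
    protein: str      # e.g. "S" for spike
    original: str     # one-letter code of the amino acid being replaced
    position: int     # position within the protein
    replacement: str  # one-letter code of the new amino acid


# Optional "PROTEIN:" prefix, then original letter, position, new letter.
PATTERN = re.compile(
    r"^(?:(?P<protein>\w+):)?(?P<original>[A-Z])(?P<position>\d+)(?P<replacement>[A-Z])$"
)


def parse_substitution(label, default_protein="S"):
    """Split a label like 'S:D614G' into its parts.

    default_protein is an assumption made for this sketch: labels with no
    prefix are treated as spike mutations.
    """
    match = PATTERN.match(label)
    if match is None:
        raise ValueError(f"not a simple substitution label: {label!r}")
    return Substitution(
        protein=match["protein"] or default_protein,
        original=match["original"],
        position=int(match["position"]),
        replacement=match["replacement"],
    )


d614g = parse_substitution("S:D614G")
n501y = parse_substitution("N501Y")
```

Here `d614g` holds the protein `"S"`, the original amino acid `"D"`, the position `614`, and the replacement `"G"`, matching the breakdown described above.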

Why is the spike so important?

The spike is the club-shaped structure on the outside of the virus. It gives coronaviruses their characteristic “crown” appearance. It is the part that binds with the receptor on a host (i.e., human) cell and allows the virus to infect it. Antibodies to the spike can block infection, in a process called neutralization, and most vaccines are designed to elicit these neutralizing antibodies by training your immune system to recognize this particular part of the virus. 

What is a variant?

A variant is a version of the virus that possesses a set of mutations that differ from its parent virus in a way that makes it clinically or epidemiologically important, by causing more severe disease or being more easily transmitted, for example. Variants are identified by sequencing a virus’s genome, which means determining the exact order of all of its roughly 30,000 bases. Through active genomic surveillance, scientists can identify variants that are becoming more prevalent or more dangerous, requiring further study and precautions. These variants are classified as “variants of concern.” 

What are the variants of most concern?

There are three important variants that possess the mutation N501Y, in which the amino acid asparagine in position 501 of the spike is replaced with tyrosine. It is very unusual for the exact same mutation to occur in three different variants in different parts of the world. Virologists consider this a sign that the mutation may be helping the virus evolve to have an advantage over other variants. This particular mutation causes a change in the part of the spike that interacts with the cell to initiate infection. It is believed to make that interaction stronger.

The CDC is currently tracking these three variants:

  • The variant B.1.1.7 is also called 501Y.V1 and was first identified in the UK. It has 23 mutations, eight of which occur in the spike. There is evidence that it has an enhanced ability to spread between people. 

  • Variant B.1.351 is also called 501Y.V2 and was first identified in South Africa. It has 21 mutations, nine of which are present in the spike. 

  • Variant P.1 is also called 501Y.V3 and was first identified in four people arriving in Japan from Brazil. It has 17 mutations, 10 of which are in the spike. 

Both B.1.351 and P.1 have mutations at positions 484 and 417 of the spike. Scientists continue to monitor these and other variants for differences in their ability to cause severe disease, evade the immune response, or spread in a population.

How do we name variants?

There are several ways of naming a variant, all of which aim to avoid naming the variant after the place where it was first discovered. The WHO recommends avoiding place-based names because they can be stigmatizing and can give the incorrect impression that where a variant is first identified is where it first emerged. But given that scientific names can be less descriptive and harder to remember, variants are often referred to as “[scientific name] variant, first discovered in [country].” For example, “the B.1.1.7 variant, first identified in the UK.”

Early in its study, B.1.1.7 was given the name VOC 202012/01, which breaks down to “Variant of Concern, year 2020, month 12, variant 01,” that is, the first variant of concern designated in December 2020. You may also see VUI in names of this kind, which stands for “Variant Under Investigation.”

Alternative names like 501Y.V1 are based on a particular mutation in the variant (N501Y) and the order in which variants carrying it were found, so .V1 is the first notable variant with this mutation.

Names like B.1.1.7, B.1.351, and P.1 are based on the variant’s lineage, or where the variant is found in its family tree. These names become more common once it is clear that a variant is going to be important on a scientific or epidemiological basis. 
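The family-tree idea can be sketched in Python by treating each dot-separated part of a lineage name as one step down the tree. The function names `ancestors` and `shared_lineage` are hypothetical, and the sketch works on the written name only: some short names are aliases for longer lineage names, and resolving those aliases is outside its scope.

```python
def ancestors(lineage):
    """List the lineages above this one, from the root downward."""
    parts = lineage.split(".")
    return [".".join(parts[:depth]) for depth in range(1, len(parts))]


def shared_lineage(first, second):
    """Return the deepest lineage name that both names start from, or '' if none."""
    shared = []
    for a, b in zip(first.split("."), second.split(".")):
        if a != b:
            break
        shared.append(a)
    return ".".join(shared)


uk_variant_ancestors = ancestors("B.1.1.7")
common = shared_lineage("B.1.1.7", "B.1.351")
```

Read this way, B.1.1.7 sits below B.1.1, which sits below B.1, and the names B.1.1.7 and B.1.351 share the branch B.1.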

What is the danger of variants?

Aside from possible increases in transmissibility and virulence, scientists are monitoring the variants for other new challenges. Variations in viral proteins, especially the spike, could in theory cause false negatives in diagnostic tests. However, most PCR tests detect areas of the genome that remain constant, and some detect multiple regions of the genome, which helps make PCR the gold standard for diagnostic testing. Variants can theoretically reduce the efficacy of vaccines and treatments or increase the possibility of reinfection. However, it is important to note that so far none of the variants can completely escape our own immune defenses or those produced by current treatments and vaccines. Immunity isn’t binary: a slightly less effective vaccine that still protects very well against severe disease and death is still an excellent vaccine. Vigilant monitoring of the variants will allow scientists to stay one step ahead of the virus as it continues to evolve, just as it is expected to do. 

What can be done to avoid these problems?

  • The B.1.1.7 variant coincidentally had a mutation that gave some PCR-based tests a means of following its spread. This will likely not be the case with other variants. Increased genomic surveillance by sequencing viral genomes instead of traditional detection by PCR or antigen testing will be needed to detect and track the spread of variant viruses in populations. 

  • The antibodies used in treatment need to be evaluated for their ability to neutralize variants, which may indicate how effectively they will work in patients. 

  • Vaccines could also be evaluated and adjusted so that they are effective against new variants that may emerge. Vaccines based on mRNA, like the Moderna and Pfizer vaccines in current use, are relatively easy to tweak, and both companies are exploring whether booster shots for variants can be effective. Booster shots are like “software updates” to the vaccine that could allow for longer term protective immunity against these new variants.

  • The most effective way to reduce the emergence and spread of variants is to reduce transmission. A virus can’t mutate if it is not replicating, and it can’t replicate without a host. Doubling down on masking (with more consistent use and better masks), distancing, and avoidance of travel and gatherings is the best way to reduce transmission.

Where do we stand?

Antibodies against the spike are not the only weapon in our immune system’s arsenal. Even if antibody levels wane, or if they are less effective against variants, other elements of the innate and adaptive immune systems can fight the virus. T-cells are an important antiviral defense, and they are not affected as much by viral mutations. Memory cells are also formed after infection or vaccination, and they can act quickly to ramp up defenses if the virus is encountered again. These other elements of the immune system may help explain how vaccines seem to continue to protect against severe disease in trials even when they are less effective against mild to moderate disease.

The vaccines in use in the US elicit very strong immune responses, and even if the antibodies are less effective, they are still strong. Vaccine producers are already exploring ways to make their products more effective against variants—and if the virus continues to evolve in a meaningful way, so can the vaccines. 

We’ll be providing updates as the situation and data evolve. Stay tuned for more analysis on what the variants mean in daily COVID-19 data.


Jacqueline Houtman has a PhD in Medical Microbiology and Immunology and works as a freelance biomedical writer.


Lindsey Shultz is a physician and public health analyst from Pittsburgh, PA.


Tong Wang is an MD-PhD candidate at the University of Pennsylvania.

