Science’s COVID-19 reporting is supported by the Heising-Simons Foundation.
In a world starved for any fresh data to help clarify the origin of the COVID-19 pandemic, a study claiming to have unearthed early sequences of SARS-CoV-2 that were deliberately hidden was bound to ignite a sizzling debate. The unreviewed paper, by evolutionary biologist Jesse Bloom of the Fred Hutchinson Cancer Research Center, asserts that a team of Chinese researchers sampled viruses from some of the earliest COVID-19 patients in Wuhan, China, posted the viral sequences to a widely used U.S. database, and then a few months later had the genetic information removed to “obscure their existence.”
To some scientists, the claims reinforce suspicions that China has something to hide about the origins of the pandemic. But critics of the preprint, posted yesterday on bioRxiv, say Bloom’s detective work is much ado about nothing, because the Chinese scientists later published the viral information in a different form, and the recovered sequences add little to what’s known about SARS-CoV-2’s origins.
The sequences, Bloom says, do support other evidence that the pandemic did not originate in Wuhan’s Huanan Seafood Market, where SARS-CoV-2 initially came to light. Chinese health officials on 31 December 2019 tied the market to an outbreak of an “unexplained pneumonia,” but a month later, it had become clear that many of the earliest cases had no link to the location. The paper highlights three mutations found in SARS-CoV-2 collected from patients linked to the market that are not in the unearthed sequences of the coronavirus or its closest relative, which researchers from the Wuhan Institute of Virology discovered in bats in 2013.
Bloom’s more explosive assertion, that the Chinese researchers deleted data, is bound to intensify the debate about whether the virus originally jumped to humans from an unknown animal or somehow leaked from a laboratory. Bloom says he has no bias toward a particular origin hypothesis for SARS-CoV-2, and he agrees that the viral sequences he highlighted are a small piece of a large unfinished puzzle. “I don’t think this bolsters either the lab origin or zoonosis hypothesis,” he says. “I think it provides additional evidence that this virus was probably circulating in Wuhan before December, certainly, and that probably, we have a less than complete picture of the sequences of the early viruses.”
Bloom, who studies viral evolution, launched his study after a controversial report on the pandemic’s origin was issued in March by a joint commission of Chinese and foreign researchers organized by the World Health Organization (WHO). Bloom helped organize a much-discussed letter, co-signed by 17 other scientists, that criticized the WHO report for deeming it “extremely unlikely” that SARS-CoV-2 escaped from a laboratory. In the letter, published on 14 May in Science, the authors argued for “a dispassionate science-based discourse on this difficult but important issue.”
The WHO report relied heavily on sequences of SARS-CoV-2 found in COVID-19 patients tied to the market, Bloom notes. “I was just going through and trying to repeat a number of the analyses in the joint WHO-China report,” Bloom says. This led him to a study that listed all SARS-CoV-2 sequences submitted before 31 March 2020 to the Sequence Read Archive (SRA), a database overseen by the National Center for Biotechnology Information, a division of the U.S. National Institutes of Health (NIH). But when he checked SRA for one of the listed projects, he couldn’t find its sequences.
Googling some of the project’s information, he found another study, led by Ming Wang from Wuhan University’s Renmin Hospital, that was posted as a preprint on 6 March on medRxiv, and later published, on 24 June, in Small, a journal more focused on materials and chemistry than virology. That paper lists some of the earliest Wuhan COVID-19 patients and the specific mutations in their viruses, but doesn’t give the full sequence data. Further internet sleuthing led Bloom to discover that SRA backs up its information in Google’s Cloud platform, and a search there turned up files containing some of the Wang team’s earlier data submissions.
The paper in Small makes no mention of any corrections to viral sequences that might explain why they were removed from SRA, which led Bloom to conclude in his preprint that “the trusting structures of science have been abused to obscure sequences relevant to the early spread of SARS-CoV-2 in Wuhan.” Bloom asserts that because the deleted sequences lack the three mutations seen in the SARS-CoV-2 from the seafood market, the viruses Wang’s team found more likely represent a progenitor.
But the sequence of that bat virus found in 2013 differs from SARS-CoV-2 by about 1100 nucleotides, which means decades must have passed before it evolved into the pandemic coronavirus, and other species may well have been infected with the bat virus before it made the final jump into people. This great difference in sequences, says evolutionary biologist Andrew Rambaut at the University of Edinburgh, means researchers cannot use a few mutations like the ones Bloom highlights to look back in time to see the “roots” of the SARS-CoV-2 family tree.
Bloom says he contacted the Chinese researchers to ask why they removed the SRA data, but they did not reply. (Science also received no reply after emailing the lead authors.) NIH issued a statement today saying it removed the sequences at the request of the submitting investigator, who the agency says holds the rights to the data. The scientist “indicated the sequence information had been updated, was being submitted to another database, and wanted the data removed from SRA to avoid version control issues,” NIH said. (Bloom says he cannot find the sequences in any other virology database he knows of.)
Researchers are sharply divided about the value of Bloom’s resurrection of the SRA data. “This is a creative and rigorous approach to investigating the provenance of SARS-CoV-2,” says Ian Lipkin, a microbiologist at Columbia University’s Mailman School of Public Health. “The two take-home points are that the virus was circulating before the outbreak linked to the Wuhan seafood market and that there may have been active suppression of epidemiological and sequence data needed to track its origin.”
Leaving aside the meaning of the sequences Bloom found, the demonstration that researchers can potentially find “new” data in the cloud is an exciting advance, adds Sudhir Kumar, who does genomics research at Temple University and has published his own analysis of early SARS-CoV-2 sequences. “Many people feel that there is a lot more Chinese data out there, and they don’t have access to it,” he says.
Others are underwhelmed. “Jesse is resurfacing info that’s been online for over a year and claiming it proves a cover-up,” says Stephen Goldstein, an evolutionary virologist at the University of Utah. “I don’t understand [his reasoning].” The Small paper is simply a good study that “unfortunately flew below the radar,” he adds.
Rambaut notes that the Chinese researchers submitted their Small paper before requesting SRA remove the data. “The idea that the group was trying to hide something is farcical,” Rambaut says. “If they were covering something [up] they surely would have not submitted the paper. … I don’t like the insinuations about malfeasance where [Bloom] has zero knowledge of the reasons the authors of the paper had for removing their data.”
A member of the WHO origin commission, Marion Koopmans from the Erasmus University Medical Center, notes that its report stresses the need to find more data about the earliest viruses in circulation. “It’s good to see additional data, but I’m not sure what point this makes,” Koopmans says, adding that the preprint’s accusations could harm future collaborations on origin studies with Chinese researchers. “The tone of the intro is in my view rather suggestive and I wish science would stay away from this.”
Bloom acknowledges that researchers can piece together the coronavirus sequences from the data found in the Small paper, but he says that’s not the way most in the field conduct evolutionary analyses of SARS-CoV-2. “No one knew about these sequences because the way that people find sequences is to go to the sequence databases and download the sequences and look at them,” Bloom says.
Stepping into the divisive discussion of SARS-CoV-2’s origin comes at a price, he acknowledges. “So many people have agendas and preconceived notions on this topic that if you open your mouth on the topic, someone’s going to take what you’ve said to support or reject some particular narrative,” he says. “So the choices are either not to say anything at all, which I don’t think is useful or productive, or just to try to draw the conclusions you can and make it as transparent as possible. No matter how much people like [my new study] or don’t like it, or agree with the interpretation or disagree with the interpretation, they can at least go download it and repeat it themselves.”