At a Chinese researcher's request, the United States National Institutes of Health removed from its database the sequences of coronavirus samples collected in January and February 2020 in the Chinese city of Wuhan from people hospitalised with or suspected of having Covid-19.
The agency said that the scientist who submitted the sequences asked for the data to be deleted in June 2020.
The United States National Institutes of Health (NIH) has revealed that it deleted gene sequences from early novel coronavirus cases from a key scientific database after the Chinese researcher who submitted the data asked for their removal. The revelation has raised concern that scientists studying the Covid-19 pandemic may lack access to key pieces of information.
Jesse Bloom, a virologist at the Fred Hutchinson Cancer Research Center in Seattle, described the removal of the sequencing data in a new preprint posted on bioRxiv on 22 June. The missing data include sequences from virus samples collected in January and February 2020 in Wuhan from people hospitalised with or suspected of having Covid-19.
According to The Wall Street Journal, Bloom said that some of the deleted data are still available in a paper published in a specialised journal, but that scientists normally look for gene sequences in large databases such as the NIH's Sequence Read Archive. The virologist said he was able to recover the deleted information from Google Cloud. According to Bloom, the removal of the sequences raises questions about China's transparency in the ongoing investigation into the pandemic's origins. Even so, the missing sequences are unlikely to change researchers' existing understanding of the early weeks of the Covid-19 outbreak in Wuhan.
After facing obstruction from Chinese authorities, a team of experts representing the World Health Organization (WHO) travelled to China to investigate the origin of the virus. Although the team judged a laboratory leak highly unlikely, it failed to provide a conclusive answer. This led many experts to argue that the restrictions imposed by the Chinese Communist Party during the investigation may have jeopardised the entire effort to find the source of the pandemic.
Because many questions about the origin of Covid-19 remain unanswered, scientists need information that will help them understand how the virus entered the human population and spread. Limited access to studies and the removal of content from such an important database can make that information harder to find, potentially slowing research. Vaughn S. Cooper, an evolutionary biologist at the University of Pittsburgh who was not involved in the new paper and has not yet studied the deleted sequences, said: "It makes us wonder if there are other sequences like these that have been purged."
The Crucial Covid Data
According to The Wall Street Journal, the NIH said in a statement: "Submitting investigators hold the rights to their data and can request withdrawal of the data."
The agency said that the scientist who submitted the sequences asked for them to be deleted in June 2020 because they had been updated and were to be submitted to a different, undisclosed database. The NIH also said that the sequences were first submitted to its database in March 2020 and that information about them was published in a report on a preprint server. That paper described how SARS-CoV-2 was detected using modern sequencing technology.
Bloom said that the scarcity of data from early cases in Wuhan is a barrier for experts investigating the virus's origins. According to him, the available sequences from December 2019 come from a dozen patients linked to the Huanan Seafood Market, which has been suggested as the site of the Covid-19 outbreak, and only a small number of sequences were gathered before late January 2020.
Sergei Pond, a biology professor at Temple University with expertise in the evolution of viral pathogens, said: "If more sequences came to light, especially from early time points, or archival samples elsewhere, everything could change once again. I think this is likely to happen." Joel Wertheim, an evolutionary biologist at the University of California, San Diego, said that the deleted sequences are fragments, whereas full genome sequences have traditionally been the most informative. Stephen Goldstein, an evolutionary virologist at the University of Utah who had not analysed the sequences himself, said that from a scientific perspective, "I don't think they point to anything nefarious."