
The recency and geographical origins of the bat viruses ancestral to SARS-CoV and SARS-CoV-2

The emergence of SARS-CoV in 2002 and SARS-CoV-2 in 2019 led to increased sampling of sarbecoviruses circulating in horseshoe bats. Employing phylogenetic inference while accounting for recombination of bat sarbecoviruses, we find that the closest-inferred bat virus ancestors of SARS-CoV and SARS-CoV-2 existed less than a decade prior to their emergence in humans. Phylogeographic analyses show that bat sarbecoviruses traveled at rates approximating those of their horseshoe bat hosts and have circulated in Asia for millennia. We find that the direct ancestors of SARS-CoV and SARS-CoV-2 are unlikely to have reached their respective sites of emergence via dispersal in the bat reservoir alone, which supports a role for interactions with intermediate hosts through the wildlife trade in zoonotic spillover. These results can guide future sampling efforts and demonstrate that viral genomic regions extremely closely related to SARS-CoV and SARS-CoV-2 were circulating in horseshoe bats, confirming their importance as the reservoir species for SARS viruses.


Introduction

Horseshoe bats (Rhinolophus spp.) are the main hosts of the Sarbecovirus subgenus [1] (genus Betacoronavirus, family Coronaviridae) and, likely via virus transmission through transient intermediate species, were the source of SARS-CoV (referred to hereafter as SARS-CoV-1, now extinct) and SARS-CoV-2 [2,3,4]. Since the first emergence of SARS-CoV-1 in Guangzhou, Guangdong Province, in 2002 and of SARS-CoV-2 in Wuhan, Hubei Province, in 2019 (the two viruses are jointly referred to as the SARS-CoVs), the sampling and analysis of sarbecoviruses in horseshoe bats have contributed to our understanding of sarbecovirus diversity and geographical spread. Although sarbecoviruses exhibit substantial geographic structuring linked to the ranges of their host horseshoe bat species [2,3,4,5,6], where and when their ancestors circulated, both in recent and ancient history [7], is poorly understood.

Genome-wide sequence identity is typically used to compare bat sarbecoviruses with the SARS-CoVs, but because coronaviruses frequently recombine [5,6], whole-genome identity between these bat viruses and the SARS-CoVs does not adequately reflect their complex evolutionary histories (Figure S1A). Previous attempts to date the most recent common ancestor (MRCA) of SARS-CoV-2 and a closely related bat virus genome using long genomic fragments resulted in estimates of 30–40 years before 2019 [4,6,8]. However, when all non-recombinant regions (NRRs) are analyzed, some will inevitably have older MRCAs and some younger MRCAs than either this long-fragment estimate or an estimate based on the entire genome, which would effectively be a weighted average of the times of the MRCA (tMRCAs) of the individual NRRs. To find a bat virus genome that has not undergone recombination relative to either of the SARS-CoVs, it would likely have been necessary to identify the sarbecovirus involved in each cross-species transmission event leading to the SARS-CoVs (which we refer to as the “direct ancestor”; Figure S1B) or a very closely related virus from a bat in the same ecological community infected around the time of the transmission event. What we have, instead, is a relatively small sample of bat sarbecovirus genome sequences, gathered across different times and places. Hence, it is necessary to identify all the NRRs of sarbecoviruses: genomic regions within which there is little to no detectable signal of recombination among the analyzed genomes and which represent, to the extent possible, a single evolutionary history. These genomic regions provide the clearest insights into the evolutionary history of SARS-CoV-1 and SARS-CoV-2 and the origins of their respective outbreaks.
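The point that a whole-genome estimate behaves like a weighted average of per-region tMRCAs can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the region names, lengths, and tMRCA values are invented and are not results from the study, and weighting by alignment length is only one plausible choice of weights, since the text does not specify them.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str     # hypothetical NRR label
    length: int   # alignment length in nucleotides (invented)
    tmrca: float  # years before emergence (invented)


def weighted_mean_tmrca(regions):
    """Length-weighted mean of the per-region tMRCAs."""
    total_length = sum(r.length for r in regions)
    return sum(r.length * r.tmrca for r in regions) / total_length


# Invented values: one young region, one old region, one in between.
regions = [
    Region("NRR_a", 3000, 8.0),
    Region("NRR_b", 12000, 45.0),
    Region("NRR_c", 5000, 20.0),
]

genome_wide = weighted_mean_tmrca(regions)
youngest = min(regions, key=lambda r: r.tmrca)
oldest = max(regions, key=lambda r: r.tmrca)

print(f"length-weighted tMRCA: {genome_wide:.1f} years")
print(f"youngest region: {youngest.name} ({youngest.tmrca} years)")
print(f"oldest region:   {oldest.name} ({oldest.tmrca} years)")
```

With these invented numbers the averaged value sits between the youngest and oldest regions, so a single genome-wide date hides the region whose ancestor is most recent.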

Here, we separately analyze the recombination patterns and respective evolutionary histories of SARS-CoV-1 and its closely related viruses (referred to as SARS-CoV-1-like viruses) and SARS-CoV-2 and its closely related viruses (referred to as SARS-CoV-2-like viruses). We estimate separate sets of NRRs for the SARS-CoV-1-like and SARS-CoV-2-like viruses. For each NRR, we use a molecular clock rate specific to that portion of the genome and determine the MRCA of the respective SARS-CoV and the most closely related bat virus, or set of viruses, among currently sampled sarbecoviruses for that NRR. We refer to this MRCA as the “closest-inferred ancestor” for that NRR because it is the parent node of the SARS-CoV in the phylogeny (Figure S1B) and is limited by the current sample set of bat sarbecoviruses. Additionally, the dating of deeper nodes in the phylogeny is complicated by substitution saturation: the repeated occurrence of nucleotide substitutions at the same site, which biases the estimation of older divergences in viral phylogenies [9,10]. We therefore account for substitution saturation in our NRR-specific inferences when examining the deeper evolutionary history of sarbecoviruses.
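Substitution saturation can be made concrete with a toy calculation. The sketch below uses the Jukes-Cantor (JC69) substitution model purely as a stand-in; it is not the model or the correction used in the study, and the function names are hypothetical. Under JC69, the expected proportion of sites that differ between two sequences levels off at 0.75 as the true number of substitutions per site grows, so raw sequence differences increasingly underestimate older divergences.

```python
import math


def jc69_expected_p(d):
    """Expected proportion of differing sites for d substitutions per site (JC69)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def jc69_corrected_distance(p):
    """Invert the JC69 relationship to recover d from an observed proportion p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# As true divergence grows, the observed proportion of differences flattens out.
for d in (0.01, 0.1, 0.5, 1.0, 2.0, 5.0):
    p = jc69_expected_p(d)
    print(f"true d = {d:4.2f}   expected observed p = {p:.3f}   p/d = {p / d:.2f}")
```

At small distances the observed differences track the true amount of change closely, while at large distances many substitutions overwrite earlier ones at the same site and leave no additional signal. The correction function also becomes numerically unstable as p approaches 0.75, which is one way to see why deep nodes are hard to date.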





 
 
 


©2020 by Mostly Microbes and Infectious Diseases. Proudly created with Wix.com
