Phylogenetic analysis of 48 early SARS-CoV-2 genomes

Maggie Xiao
Dylan Whitney



Background: First isolated in December 2019, SARS-CoV-2 is the causative agent of the ongoing COVID-19 outbreak.

Method: We curated an assembly of the first 48 full-length SARS-CoV-2 genomes isolated and sequenced across the world and performed a phylogenetic network analysis to monitor the emergence of genomic divergence in the global SARS-CoV-2 population.

Results: We identified regions of the genome that have accumulated mutations producing non-synonymous changes at the protein level, suggesting ongoing adaptation of SARS-CoV-2 to its novel human host. A strong L84S mutational signal in ORF8 (present in 29.16% of genomes), together with 12 variable sites in the region encoding non-structural protein Nsp3, represents the strongest putative signal of selection in our dataset. We did not detect mutations in the coronavirus spike protein, which is reassuring for vaccines that are currently available or in ongoing large-scale clinical trials.

Conclusion: Our analysis provides a snapshot in time of a rapidly evolving pandemic based on available data. Our results are in line with previous findings that point to a common ancestor isolated in Wuhan that is likely to have circulated and spread worldwide.