Comparative analysis of alignment-free genome clustering and whole genome alignment-based phylogenomic relationship of coronaviruses

Description

Publication type: journal article

Year of publication: 2022

DOI: 10.1371/journal.pone.0264640

Abstract: The SARS-CoV-2 is the third coronavirus in addition to SARS-CoV and MERS-CoV that causes severe respiratory syndrome in humans. All of them likely crossed the interspecific barrier between animals and humans and are of zoonotic origin, respectively. The origin and evolution of viruses and their phylogenetic relationships are of great importance for study of their pathogenicity and development of antiviral drugs and vaccines. The main objective of the presented study was to compare two methods for identifying relationships between coronavirus genomes: phylogenetic one based on the whole genome alignment followed by molecular phylogenetic tree inference and alignment-free clustering of triplet frequencies, respectively, using 69 coronavirus genomes selected from two public databases. Both approaches resulted in well-resolved robust classifications. In general, the clusters identified by the first approach were in good agreement with the classes identified by the second using K-means and the elastic map method, but not always, which still needs to be explained. Both approaches demonstrated also a significant divergence of genomes on a taxonomic level, but there was less correspondence between genomes regarding the types of diseases they caused, which may be due to the individual characteristics of the host. This research showed that alignment-free methods are efficient in combination with alignment-based methods. They have a significant advantage in computational complexity and provide valuable additional alternative information on the genomes relationships. © 2022 Kirichenko et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
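The alignment-free approach named in the abstract (clustering genomes by their triplet frequencies with K-means) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the article: the function names, the choice of overlapping triplets read with step 1, the plain Lloyd-style K-means, and the toy sequences. The elastic map method mentioned in the abstract is not shown.

```python
from collections import Counter
from itertools import product
import random

# All 64 nucleotide triplets in a fixed order (assumed alphabet: A, C, G, T).
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def triplet_frequencies(seq):
    """Map a sequence to a 64-dimensional vector of triplet frequencies.

    Assumption: triplets are read as overlapping windows with step 1.
    Windows containing symbols other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(len(seq) - 2))
    total = sum(counts[t] for t in TRIPLETS)
    if total == 0:
        raise ValueError("sequence contains no valid triplets")
    return [counts[t] / total for t in TRIPLETS]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd-style K-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers: k distinct input points
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical toy sequences, not real coronavirus genomes.
    toy = ["ACGT" * 30, "ACGT" * 25 + "ACG", "GGGCCC" * 20, "GGGCCC" * 18]
    vectors = [triplet_frequencies(s) for s in toy]
    print(kmeans(vectors, k=2))
```

Each genome is reduced to a fixed-length vector regardless of its length, which is why no alignment is needed before clustering; the cost of building the vector grows linearly with sequence length.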

Full-text links

Publication

Journal: PLoS ONE

Journal issue: Vol. 17, Is. 3, March

Page numbers: 0264640

Journal ISSN: 1932-6203

Publisher: Public Library of Science

Persons

  • Kirichenko A.D. (Department of Genomics and Bioinformatics, Institute of Fundamental Biology and Biotechnology, Siberian Federal University, Krasnoyarsk, Russian Federation)
  • Poroshina A.A. (Laboratory of Molecular Systematics, Limnological Institute, Siberian Branch of Russian Academy of Sciences, Irkutsk, Russian Federation)
  • Sherbakov D.Yu. (Laboratory of Molecular Systematics, Limnological Institute, Siberian Branch of Russian Academy of Sciences, Irkutsk, Russian Federation; Faculty of Biology and Soil Studies, Irkutsk State University, Irkutsk, Russian Federation; Novosibirsk State University, Faculty of Natural Sciences, Novosibirsk, Russian Federation)
  • Sadovsky M.G. (Institute of Computational Modelling, Siberian Branch of Russian Academy of Sciences, Krasnoyarsk, Russian Federation; V.F. Voino-Yasenetsky Krasnoyarsk State Medical University, Krasnoyarsk, Russian Federation; Federal Research and Clinical Center, Federal Medical-Biological Agency, Krasnoyarsk, Russian Federation)
  • Krutovsky K.V. (Department of Genomics and Bioinformatics, Institute of Fundamental Biology and Biotechnology, Siberian Federal University, Krasnoyarsk, Russian Federation; Department of Forest Genetics and Forest Tree Breeding, Georg-August University of Göttingen, Göttingen, Germany; Center for Integrated Breeding Research, Georg-August University of Göttingen, Göttingen, Germany; Laboratory of Forest Genomics, Genome Research and Education Center, Institute of Fundamental Biology and Biotechnology, Siberian Federal University, Krasnoyarsk, Russian Federation; Laboratory of Population Genetics, N.I. Vavilov Institute of General Genetics, Russian Academy of Sciences, Moscow, Russian Federation; Scientific and Methodological Center, G. F. Morozov Voronezh State University of Forestry and Technologies, Voronezh, Russian Federation)

Database indexing