Machine learning enables scalable and systematic hierarchical virus taxonomy

