Viruses are among the most enigmatic microorganisms in the world, closely linked to human health. They play a crucial role in ecosystems, representing a vast diversity that is still being explored. Recent advances in metatranscriptomics have provided robust support for assessing the global diversity of RNA viruses. Initially focused on human and animal viruses, RNA virus research has now expanded to include long-overlooked invertebrates and environmental samples from various habitats.
Despite significant progress in broadening sampling ranges and refining sequencing technologies, the academic community still relies heavily on sequence homology to known viruses for RNA virus identification. This has left us with limited knowledge about highly divergent viruses, often referred to as "dark matter."
On October 9, 2024, a team led by Professor Shi Mang from Sun Yat-sen University and Li Zhaorong from Alibaba Cloud published a groundbreaking study titled "Using Artificial Intelligence to Document the Hidden RNA Virosphere" in the prestigious journal Cell. Utilizing artificial intelligence (AI), the research team identified 180 virus supergroups and over 160,000 novel RNA viruses, nearly tripling the known virus diversity.
Significantly, this study revealed viral "dark matter" that traditional research methods had overlooked, substantially expanding our understanding of RNA virus diversity. This breakthrough marks a milestone in the application of deep learning algorithms for virus discovery, creating a new paradigm in virology research.
The study uniquely integrates sequence and structural information to mine 10,487 metatranscriptomic datasets for viruses, uncovering 513,134 viral genomes representing 161,979 potential virus species and 180 RNA virus supergroups, a ninefold increase in supergroup numbers. Notably, 23 of these supergroups could not be identified through sequence homology, emphasizing the presence of viral dark matter in diverse environments such as air, Antarctic sediment, deep-sea hydrothermal vents, activated sludge, and saline-alkali flats.
Remarkably, the research uncovered the longest RNA virus genome documented to date, at 47,250 nucleotides, reshaping our understanding of the RNA "virosphere." The success of this AI-driven exploration signals the beginning of a new era in virus discovery. It not only enhances our comprehension of global RNA virus diversity but also offers fresh insights into the ecological roles of these microorganisms.
Additionally, the findings hold significant implications for public health, biosecurity, and vaccine development, reinforcing humanity's ability to tackle future pandemic risks. With the advent of more efficient sequence recognition technologies and rapid protein structure prediction models, the virology community anticipates utilizing these deep learning frameworks to achieve even broader and more precise virus discovery.