Extending VQDNA/HRQ to Generative and Broader Genomic Applications

Investigate and develop broader applications of VQDNA and HRQ in genomics, in particular generative modeling tasks, and rigorously assess their effectiveness and biological significance beyond the discriminative tasks evaluated in this work.

Background

Empirical analyses show that the HRQ vocabulary has biological significance (e.g., pattern-awareness for SARS-CoV-2 lineages), but the framework is evaluated primarily on discriminative downstream tasks. The authors explicitly identify generation and broader genomic applications as areas deserving further study.

They state these avenues remain open, indicating the need for systematic development and evaluation of VQDNA/HRQ in generative settings and other genomics applications.

References

There are several limitations in this work: (1) The superiority of VQDNA stems from its genome vocabulary learning, which is an additional training stage with extra costs compared to other models. Thus, there is still room for reducing its computational overhead to boost its applicability. (2) Due to the computational constraints, the model scale of VQDNA has not reached its maximum. How to scale up VQDNA while maintaining the gained merits is worth exploring. (3) As the HRQ vocabulary has shown great biological significance in SARS-CoV-2 mutations, broader applications in genomics with VQDNA, such as generation tasks, deserve to be studied. Overall, all these avenues remain open for our future research.

VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence Modeling (2405.10812 - Li et al., 13 May 2024) in Section 6 (Conclusion and Discussion), Limitations and Future Works