
Adapting generative sequence model training to exploit positive-only measurement (q=1)

Develop adaptations of sequence-generative modeling algorithms that directly refine the generative distribution p(x) (e.g., through conditioning or reward-based fine-tuning) so that, in large-scale screening experiments, they can exploit the measurement allocation q=1, in which all sequencing is allocated to active sequences, and thereby benefit from the information gains associated with positive-only data collection.


Background

The paper proposes LeaVS, a method for co-designing experiments and inference for large-scale biological screens, and shows theoretically and empirically that allocating all measurements to positive examples (q=1) can maximize information gain when activity is sparse. LeaVS learns p(y|x) using a generative model of the library p(x) to correct for missing negatives.
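
As a rough illustration of the correction idea (a sketch of the general principle, not the paper's LeaVS algorithm): if all reads come from active sequences, they are samples from p(x|y=1), so a library model p(x) together with an estimate of the hit rate p(y=1) recovers p(y=1|x) by Bayes' rule. The function names and numbers below are hypothetical.

```python
import numpy as np

# Illustrative only: with q = 1, sequencing reads are samples from p(x | y = 1).
# Given a library generative model p(x) and an estimated hit rate pi = p(y = 1),
# Bayes' rule recovers p(y = 1 | x). All names and values here are hypothetical.

def posterior_active(log_p_positives, log_p_library, hit_rate):
    """p(y=1 | x) = pi * p(x | y=1) / p(x), computed in log space.

    log_p_positives : log density under a model fit to the sequenced (active) reads,
                      i.e. an estimate of log p(x | y=1)
    log_p_library   : log density under the library generative model p(x)
    hit_rate        : prior probability of activity, pi = p(y=1)
    """
    log_post = np.log(hit_rate) + log_p_positives - log_p_library
    return np.minimum(np.exp(log_post), 1.0)  # clip small numerical overshoots

# Toy usage: made-up log densities for three candidate sequences.
print(posterior_active(np.array([-10.0, -12.0, -9.0]),
                       np.array([-11.0, -11.5, -12.0]),
                       hit_rate=0.01))
```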

Beyond modeling p(y|x), many sequence design algorithms refine the generative model p(x) directly (e.g., by conditioning or reward-based fine-tuning). The authors highlight that while LeaVS clarifies how to learn from positive-only measurements for predictive models, it remains unresolved how to adapt these p(x)-refinement algorithms to similarly leverage the q=1 regime for improved data efficiency and performance.
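
To concretize what refining p(x) directly typically involves, the sketch below shows a generic reward-weighted (REINFORCE-style) fine-tuning loop for a toy independent-site sequence model in PyTorch; the model, reward function, and hyperparameters are placeholders rather than anything from the paper. The unresolved question is how such an update should be constructed when the screen returns only positive (q=1) measurements.

```python
import torch

# Generic toy sketch (not the paper's method): reward-weighted likelihood
# fine-tuning of an independent-site categorical sequence model p(x).
# The reward and model are placeholders; the open question is how to form
# this update when only positive (active) sequences are measured (q = 1).

torch.manual_seed(0)
seq_len, alphabet = 8, 4
logits = torch.zeros(seq_len, alphabet, requires_grad=True)  # site-wise logits of p(x)
optimizer = torch.optim.Adam([logits], lr=0.1)

def reward(x):
    """Placeholder reward: fraction of positions equal to symbol 0."""
    return (x == 0).float().mean(dim=-1)

for step in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    x = dist.sample((64,))                     # (batch, seq_len) sequences drawn from p(x)
    log_px = dist.log_prob(x).sum(dim=-1)      # log p(x) for each sampled sequence
    loss = -(reward(x) * log_px).mean()        # reward-weighted log-likelihood (REINFORCE)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(torch.softmax(logits, dim=-1)[0])  # probability mass shifts toward the rewarded symbol
```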

References

It is unclear how best to adapt these algorithms to reap the information gains of setting q=1.

Accelerated Learning on Large Scale Screens using Generative Library Models (2510.16612 - Weinstein et al., 18 Oct 2025) in Discussion, Future directions