
LLMs are Highly-Constrained Biophysical Sequence Optimizers (2410.22296v3)

Published 29 Oct 2024 in cs.LG and q-bio.QM

Abstract: LLMs have recently shown significant potential in various biological tasks such as protein engineering and molecule design. These tasks typically involve black-box discrete sequence optimization, where the challenge lies in generating sequences that are not only biologically feasible but also adhere to hard fine-grained constraints. However, LLMs often struggle with such constraints, especially in biological contexts where verifying candidate solutions is costly and time-consuming. In this study, we explore the possibility of employing LLMs as highly-constrained bilevel optimizers through a methodology we refer to as LLM Optimization with Margin Expectation (LLOME). This approach combines both offline and online optimization, utilizing limited oracle evaluations to iteratively enhance the sequences generated by the LLM. We additionally propose a novel training objective -- Margin-Aligned Expectation (MargE) -- that trains the LLM to smoothly interpolate between the reward and reference distributions. Lastly, we introduce a synthetic test suite that bears strong geometric similarity to real biophysical problems and enables rapid evaluation of LLM optimizers without time-consuming lab validation. Our findings reveal that, in comparison to genetic algorithm baselines, LLMs achieve significantly lower regret solutions while requiring fewer test function evaluations. However, we also observe that LLMs exhibit moderate miscalibration, are susceptible to generator collapse, and have difficulty finding the optimal solution when no explicit ground truth rewards are available.

Overview of "LLMs are Highly-Constrained Biophysical Sequence Optimizers"

This paper investigates LLMs as optimizers for biophysical sequence design tasks such as protein engineering and molecule design. These tasks are typically black-box discrete sequence optimization problems, where the central difficulty is generating biologically plausible sequences that also satisfy hard, fine-grained constraints. To address this, the paper introduces LLM Optimization with Margin Expectation (LLOME), a methodology that embeds LLMs in a bilevel optimization framework.

Methodology and Contributions

The core approach, LLOME, places the LLM in a bilevel optimization setting: an outer loop of offline optimization that consumes a limited budget of oracle evaluations, and an inner loop of online optimization that proceeds without direct oracle feedback. The paper makes three main contributions to constrained sequence optimization in this domain:

  1. Synthetic Test Suite: The authors designed a synthetic test suite mirroring the geometrical complexity of real biophysical problems, which enables rapid evaluation of LLM optimizers without requiring lab validation. These synthetic Ehrlich functions provide a benchmark for assessing the ability of optimization algorithms to handle non-trivial biophysical sequence constraints.
  2. Exploring LLMs as Constrained Bilevel Optimizers: LLOME integrates LLMs into a bilevel optimization loop and outperforms genetic algorithm baselines, finding lower-regret solutions while requiring fewer oracle evaluations. This highlights the efficiency of LLMs in the data-sparse settings common to biological research.
  3. Novel Training Objective (MargE): The authors propose Margin-Aligned Expectation (MargE), a training objective that teaches the LLM to smoothly interpolate between the reward and reference distributions. MargE is presented as an improvement over supervised finetuning (SFT) and direct preference optimization (DPO), particularly for navigating constrained optimization spaces.

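The bilevel loop described above can be sketched with toy stand-ins. This is a minimal illustration, not the paper's implementation: the oracle, the candidate generator, and the `finetune` step below are hypothetical placeholders for an expensive test function, LLM sampling, and offline training (e.g. SFT/MargE), respectively.

```python
import random

random.seed(0)

ALPHABET = "ACDE"  # toy amino-acid-like alphabet
LENGTH = 8

def oracle(seq):
    """Stand-in for an expensive lab/test-function evaluation:
    reward = fraction of positions matching a hidden target motif."""
    target = "ACDEACDE"
    return sum(a == b for a, b in zip(seq, target)) / LENGTH

def generate_candidates(pool, n=16):
    """Stand-in for the inner online loop (LLM sampling):
    propose mutations of known sequences without calling the oracle."""
    out = []
    for _ in range(n):
        parent = list(random.choice(pool))
        parent[random.randrange(LENGTH)] = random.choice(ALPHABET)
        out.append("".join(parent))
    return out

def finetune(labelled):
    """Stand-in for the outer offline step (e.g. SFT/MargE on
    oracle-labelled data); here we simply keep the top sequences."""
    return [s for s, _ in sorted(labelled, key=lambda x: -x[1])[:4]]

pool = ["".join(random.choice(ALPHABET) for _ in range(LENGTH))]
labelled = [(pool[0], oracle(pool[0]))]
for _round in range(10):                        # outer loop: limited oracle budget
    candidates = generate_candidates(pool)      # inner loop: no oracle calls
    scored = [(s, oracle(s)) for s in candidates]  # spend oracle budget
    labelled.extend(scored)
    pool = finetune(labelled)

best = max(labelled, key=lambda x: x[1])
print(f"best reward: {best[1]:.2f}, regret: {1 - best[1]:.2f}")
```

The key structural point is the separation of loops: candidate generation never touches the oracle, and the oracle is only queried on small batches between offline updates.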
Strong Numerical Results and Observations

The paper reports that, compared to genetic algorithm baselines, LLMs using the LLOME framework discover solutions with notably lower regret while requiring fewer test-function evaluations. However, LLMs also face challenges: moderate miscalibration, susceptibility to generator collapse, and difficulty reaching optimal solutions when no explicit ground-truth reward signal is available.

Implications and Future Directions

The findings underscore the potential of LLMs to influence biophysical optimization. They make a case for viewing LLMs as more than language processors, extending their utility to complex, constraint-laden optimization tasks in biotechnology. Practically, this research suggests a path in which LLMs are integrated into workflows that require rapid, data-efficient iteration cycles, with applications in drug discovery and synthetic biology.

Theoretically, this work lays a foundation for further exploration into optimizing constrained systems using machine learning models by investigating alternative loss functions like MargE, which align generation capabilities with real-world constraints more effectively.
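The overview states only that MargE interpolates between the reward and reference distributions. As a loose, hypothetical illustration of a margin-weighted objective of that flavor, the following sketch upweights sequences by their reward margin over a reference while anchoring the model to a reference policy; the function name, arguments, and exact form are assumptions for illustration, not the paper's definition.

```python
import math

def marge_like_loss(logp_model, logp_ref, reward, reward_ref, beta=1.0):
    """Illustrative margin-weighted objective (NOT the paper's exact
    MargE formulation). Inputs are per-sequence log-probabilities under
    the trained model and the reference model, plus scalar rewards for
    the sequence and a reference sequence."""
    margin = reward - reward_ref
    # importance-style weight pushing probability mass toward
    # high-margin (high-reward) sequences
    weight = math.exp(beta * margin)
    # KL-like anchor term penalizing drift from the reference policy
    kl_term = logp_model - logp_ref
    return -(weight * logp_model) + kl_term

# example: a sequence with reward 0.8 vs. a reference reward of 0.5
loss = marge_like_loss(logp_model=-1.0, logp_ref=-1.5,
                       reward=0.8, reward_ref=0.5)
print(f"{loss:.4f}")
```

The design intent being illustrated: as `beta` grows the objective leans toward pure reward maximization, and as it shrinks the KL-like term dominates and the model stays near the reference distribution.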

Conclusion

"LLMs are Highly-Constrained Biophysical Sequence Optimizers" offers a thoughtful and quantitatively robust exploration of how advanced LLMs can be adapted for intricate optimization tasks within the biophysical domain. This research not only broadens the application scope of LLMs but also sets a direction for future studies aiming to harness the full potential of machine learning in constrained optimization settings, which is crucial for advancing scientific discovery and technological innovation.

Authors (8)
  1. Angelica Chen (22 papers)
  2. Samuel D. Stanton (1 paper)
  3. Robert G. Alberstein (1 paper)
  4. Andrew M. Watkins (3 papers)
  5. Richard Bonneau (13 papers)
  6. Kyunghyun Cho (292 papers)
  7. Nathan C. Frey (19 papers)
  8. Vladimir Gligorijević (5 papers)