
Predicting Entire Experiment Outcomes from Raw Code Using Regression Language Models

Determine whether code-based Regression Language Models (RLMs), encoder–decoder language models that read source code or computation-graph text and autoregressively output numeric metrics, can predict the numeric outcomes of entire experiments directly from raw code. This would extend their applicability from program-level metrics (such as memory usage, kernel latency, and architecture accuracy/latency) to complete experimental pipelines.


Background

The paper introduces Regression LLMs (RLMs) as unified, text-driven regressors that treat numeric prediction as next-token decoding over a specially tokenized number representation. Using a pretrained T5Gemma encoder, the authors demonstrate that a single RLM can simultaneously handle diverse inputs—including high-level code and ONNX graphs—and predict multiple metrics such as memory consumption, kernel latency, and neural network accuracy and latencies.
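The idea of treating numeric prediction as next-token decoding over a tokenized number representation can be sketched as follows. This is a minimal illustration, not the paper's actual vocabulary: the sign/exponent/mantissa-digit scheme and the `MANTISSA_DIGITS` setting below are assumptions chosen for clarity.

```python
import math

MANTISSA_DIGITS = 4  # hypothetical precision; the paper's scheme may differ


def tokenize_number(x: float) -> list[str]:
    """Encode a float as sign, exponent, and mantissa-digit tokens,
    so a decoder can emit the value one token at a time."""
    if x == 0.0:
        return ["+", "E0"] + ["0"] * MANTISSA_DIGITS
    sign = "+" if x > 0 else "-"
    # Normalize so that |x| = 0.dddd... * 10**exp with mantissa in [0.1, 1)
    exp = math.floor(math.log10(abs(x))) + 1
    mantissa = abs(x) / 10 ** exp
    digits_int = round(mantissa * 10 ** MANTISSA_DIGITS)
    if digits_int == 10 ** MANTISSA_DIGITS:  # rounding carried over, e.g. 0.99996
        digits_int //= 10
        exp += 1
    digits = str(digits_int).zfill(MANTISSA_DIGITS)
    return [sign, f"E{exp}"] + list(digits)


def detokenize_number(tokens: list[str]) -> float:
    """Invert tokenize_number: decode a token sequence back to a float."""
    sign = 1.0 if tokens[0] == "+" else -1.0
    exp = int(tokens[1][1:])
    mantissa = int("".join(tokens[2:])) / 10 ** MANTISSA_DIGITS
    return sign * mantissa * 10 ** exp
```

Under this kind of scheme, a regression head is unnecessary: the model's decoder simply generates the sign, exponent, and digit tokens in order, and the decoded float is the prediction.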

Despite these successes, the authors explicitly identify a broader ambition: moving beyond program-level metrics to predicting full experimental outcomes directly from raw code. Doing so would generalize code-to-metric regression to entire experimental pipelines, potentially reducing the feature engineering and instrumentation required across scientific computing and machine learning workflows.

References

A key open question is whether such code-based RLMs can be more broadly used to predict the numeric outcome of entire experiments from raw code, but we leave this to future work and hope this paper will be a valuable reference for multiple scientific communities in automated machine learning, programming languages, and computer architecture.

Regression Language Models for Code (2509.26476 - Akhauri et al., 30 Sep 2025) in Section 7. Conclusion