Verification of Faithfulness in Automated Idea Execution Agents
Develop rigorous methods to verify the faithfulness and correctness of code implementations produced by an LLM-based idea execution agent, ensuring that baseline and proposed methods are implemented as specified and that evaluation metrics are computed correctly, rather than relying solely on final experiment outcomes.
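One concrete direction, sketched below under stated assumptions, is to re-verify the agent's generated code independently instead of trusting its reported numbers. The Python sketch assumes a hypothetical agent-produced module exposing a metric function and a baseline predictor (the names, signatures, and test cases are illustrative, not from the paper); it cross-checks the agent's metric against an independently written reference implementation on randomized inputs, and probes the baseline on hand-written cases with expected outputs derived from the written specification.

```python
"""Minimal sketch of a faithfulness check for agent-generated code.

The agent-facing names (`agent_metric`, `agent_baseline`, the probe cases)
are hypothetical placeholders; in practice they would be imported from the
execution agent's generated project and derived from the method spec.
"""

import random
from typing import Callable, List, Sequence


def reference_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Trusted, independently written metric used as ground truth."""
    if len(preds) != len(labels):
        raise ValueError("predictions and labels must be the same length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def check_metric_faithfulness(
    agent_metric: Callable[[Sequence[int], Sequence[int]], float],
    n_trials: int = 100,
    tol: float = 1e-9,
) -> List[str]:
    """Differential test: compare the agent's metric to the reference on random inputs."""
    failures = []
    rng = random.Random(0)
    for trial in range(n_trials):
        n = rng.randint(1, 50)
        labels = [rng.randint(0, 1) for _ in range(n)]
        preds = [rng.randint(0, 1) for _ in range(n)]
        expected = reference_accuracy(preds, labels)
        got = agent_metric(preds, labels)
        if abs(got - expected) > tol:
            failures.append(
                f"trial {trial}: agent metric {got:.6f} != reference {expected:.6f}"
            )
    return failures


def check_baseline_spec(
    baseline_predict: Callable[[Sequence[str]], List[int]],
) -> List[str]:
    """Probe the baseline on hand-written cases with known expected outputs."""
    # Illustrative probe cases only; a real check would derive them from the
    # written baseline specification, never from the agent's own outputs.
    spec_cases = [
        (["good movie", "terrible plot"], [1, 0]),
    ]
    failures = []
    for inputs, expected in spec_cases:
        got = baseline_predict(inputs)
        if got != expected:
            failures.append(f"baseline({inputs!r}) = {got!r}, expected {expected!r}")
    return failures


if __name__ == "__main__":
    # Stand-ins for agent-produced code, used here only to make the sketch runnable.
    def agent_metric(preds: Sequence[int], labels: Sequence[int]) -> float:
        return sum(p == y for p, y in zip(preds, labels)) / len(labels)

    def agent_baseline(texts: Sequence[str]) -> List[int]:
        return [0 if "terrible" in t else 1 for t in texts]

    problems = check_metric_faithfulness(agent_metric) + check_baseline_spec(agent_baseline)
    print("\n".join(problems) if problems else "all faithfulness checks passed")
```

The design choice here is differential testing: a reference metric catches miscomputed scores even when end-to-end results look plausible, while spec-derived probe cases catch baselines that silently deviate from the method as written.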
References
Given these errors, we believe more work is needed to carefully verify the code implementations produced by the execution agent rather than blindly trusting their executed results, and we leave such attempts to future work.
— Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers
(arXiv:2409.04109, Si et al., 6 Sep 2024), Appendix: Attempt on Idea Execution Agent