Minimizing the description length of symbolic expressions

Determine a method to minimize the description length (Kolmogorov complexity) of the estimated function in the symbolic regression framework defined by the operator sets U (unary), B (binary), Π (projections), and constants, while still satisfying the loss-minimization objective on the training data, as formalized by Goal (3) in the problem statement.

Background

The paper formalizes symbolic regression with three goals: construct expressions from specified operators, minimize loss on training data, and among all such estimators select the one with the smallest description length (Goal 3, referencing Kolmogorov complexity). The authors focus their algorithmic development and analysis on the loss minimization objective, explicitly deferring the description-length minimization objective.

This open problem seeks a principled approach to minimizing expression length while retaining accuracy within the operator-composition framework established in the paper.
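To make the interplay between the loss and description-length objectives concrete, here is a minimal sketch in Python. It is not the paper's method. It assumes a tiny hypothetical operator set standing in for U and B, and it uses the node count of the expression tree as a computable proxy for description length, since Kolmogorov complexity itself is uncomputable. The names `Expr`, `size`, `mse`, and `select` are illustrative only. Selection is lexicographic: keep the candidates whose training loss is within a tolerance of the best loss, then return the smallest of them.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

# Hypothetical stand-ins for the unary set U and the binary set B.
UNARY = {"sin": math.sin, "exp": math.exp}
BINARY = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


@dataclass(frozen=True)
class Expr:
    op: str                         # "const", "proj", or a key of UNARY / BINARY
    args: Tuple["Expr", ...] = ()   # child expressions
    value: float = 0.0              # constant value, or projection index for "proj"


def evaluate(e: Expr, x: Sequence[float]) -> float:
    """Evaluate the expression tree at the input point x."""
    if e.op == "const":
        return e.value
    if e.op == "proj":
        return x[int(e.value)]
    if e.op in UNARY:
        return UNARY[e.op](evaluate(e.args[0], x))
    return BINARY[e.op](evaluate(e.args[0], x), evaluate(e.args[1], x))


def size(e: Expr) -> int:
    """Node count: an assumed proxy for description length."""
    return 1 + sum(size(a) for a in e.args)


def mse(e: Expr, X, y) -> float:
    """Mean squared error of the expression on the training data."""
    return sum((evaluate(e, x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def select(candidates, X, y, tol: float = 1e-9) -> Expr:
    """Among candidates with near-minimal loss, return the one with the fewest nodes."""
    losses = [mse(c, X, y) for c in candidates]
    best = min(losses)
    near_best = [c for c, loss in zip(candidates, losses) if loss <= best + tol]
    return min(near_best, key=size)


# Usage: the target is y = 2 * x0; two candidates fit exactly, one is shorter.
x0 = Expr("proj", value=0)
short = Expr("add", (x0, x0))                         # x0 + x0, 3 nodes
long = Expr("add", (short, Expr("const", value=0.0)))  # (x0 + x0) + 0, 5 nodes
X = [(1.0,), (2.0,), (3.0,)]
y = [2.0, 4.0, 6.0]
chosen = select([x0, long, short], X, y)
```

The tolerance `tol` controls how much accuracy may be traded for brevity. With `tol` near zero this mirrors the "minimize loss first, then length" ordering described above, while a larger value would admit shorter but less accurate expressions. Enumerating candidates is left out of the sketch, and that search is where the open problem actually lies.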

References

However, we do not directly address the goal (3) defined before and mostly focus only on the accuracy of the obtained functions. Minimizing the length of the produced expression is left for future works.

A Functional Analysis Approach to Symbolic Regression (2402.06299 - Antonov et al., 9 Feb 2024) in Section 3.2 (Problem Reformulation)