Synthetic Functions: Methods & Applications
- Synthetic functions are mathematically constructed functions that enable controlled experiments in PDE operator learning, logic synthesis, and functional data analysis.
- They utilize analytical inversion, eigenbasis expansion, and SRVF-based methods to generate error-free, scalable datasets for model training and benchmarking.
- They support neural operator training, hardware verification, and privacy-aware analytics, although normal-form compilation can be exponentially expensive and basis selection remains a practical challenge.
Synthetic functions are mathematically or computationally constructed functions designed for controlled experimentation, benchmarking, model training, or privacy-preserving data analysis. Their construction eschews direct measurement or simulation from a physical or empirical system, instead employing explicit rules or randomized procedures tailored to task-specific requirements, such as learning solution maps for partial differential equations (PDEs), synthesizing logic circuits from formal specifications, or generating privacy-aware representations in functional data spaces.
1. Construction of Synthetic Functions for PDE-Based Learning
In PDE operator learning, synthetic functions provide scalable training pairs free of solver error by inverting the conventional numerical workflow. Rather than drawing right-hand sides $f$ from a distribution and solving numerically for $u$, the "backward data generation" method samples candidate solutions $u$ directly from the solution function space, then computes the corresponding right-hand side $f$ analytically or spectrally. For example, on a domain $\Omega$ with zero-Dirichlet boundaries, one expresses $u$ as a finite linear combination of normalized Laplacian eigenfunctions, $u = \sum_k c_k \phi_k$, with the frequencies $k$ and the variances of the random coefficients $c_k$ selected to control smoothness and coverage. The corresponding $f$ is obtained by applying the differential operator to $u$, which ensures that every pair $(u, f)$ exactly satisfies the target PDE in the continuum. Thus, synthetic functions enable rapid, large-scale construction of consistent datasets for operator learning architectures, with no forward solve and no discretization error except at the grid-representation phase (Hasani et al., 2024).
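The backward construction can be sketched in a few lines. This is a minimal illustration rather than the procedure of Hasani et al.: it assumes the one-dimensional problem $-u'' = f$ on $[0, 1]$ with zero-Dirichlet boundaries, and the names `sample_pair`, `on_grid`, `num_modes`, and `decay` are hypothetical.

```python
import math
import random


def sample_pair(num_modes=8, decay=2.0, rng=None):
    """Sample (u, f) with -u'' = f on [0, 1] and u(0) = u(1) = 0.

    Assumption for illustration: u is a random combination of the normalized
    Dirichlet eigenfunctions phi_k(x) = sqrt(2) * sin(k*pi*x), whose
    eigenvalues under -d^2/dx^2 are (k*pi)^2.
    """
    rng = rng or random.Random()
    # Coefficient standard deviation k**(-decay): faster decay gives smoother u.
    coeffs = [rng.gauss(0.0, k ** -decay) for k in range(1, num_modes + 1)]

    def u(x):
        return sum(c * math.sqrt(2) * math.sin(k * math.pi * x)
                   for k, c in enumerate(coeffs, start=1))

    def f(x):
        # Apply -d^2/dx^2 term by term: each mode is scaled by its eigenvalue.
        return sum(c * (k * math.pi) ** 2 * math.sqrt(2) * math.sin(k * math.pi * x)
                   for k, c in enumerate(coeffs, start=1))

    return u, f


def on_grid(func, n=65):
    """Evaluate a function on a uniform grid of n points in [0, 1]."""
    return [func(i / (n - 1)) for i in range(n)]


# One training pair; no linear system is solved at any point.
u, f = sample_pair(rng=random.Random(0))
u_grid, f_grid = on_grid(u), on_grid(f)
```

Because `f` is obtained by differentiating `u` in closed form, the only approximation in the stored pair is the sampling of both functions on the grid.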
2. Boolean and Presburger Functional Synthesis
In Boolean and Presburger logics, synthetic functions arise from the process known as "functional synthesis": extracting explicit Skolem functions from declarative, relational input-output specifications.
- For Boolean specifications in CNF, synthesis derives a Skolem function guaranteed to satisfy the specification on all feasible inputs (Chakraborty et al., 2018, Akshay et al., 2019, Raja et al., 2022). Approaches include input-output decomposition, decision-list construction from maximal falsifiable/satisfiable clause subsets, and logic-circuit compilation into synthesis-friendly normal forms (e.g., SynNNF).
- In Presburger arithmetic, one seeks Skolem functions representable as Presburger circuits, whose gates implement affine, maximum, conditional, and division operations, consistent with syntactic or semantic normal forms (PSyNF, PSySyNF) (Akshay et al., 2025).
The synthesis processes yield synthetic functions that serve not only as correct solutions to logical constraints, but also as compact, explicit programs or circuits usable in hardware, software, verification, and controller design.
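To make the notion of a Skolem function concrete, the sketch below tabulates one by brute force for a tiny Boolean specification and checks a hand-written one for a small Presburger specification. It is not any of the cited algorithms: both specifications and all names (`synthesize_skolem`, `is_skolem`, `bool_spec`, `pa_spec`, `pa_skolem`) are hypothetical, and the enumeration over all inputs is exponential in the number of input variables.

```python
from itertools import product


def synthesize_skolem(spec, num_inputs):
    """Tabulate a Skolem function for a single Boolean output by brute force.

    For every input x, pick an output y with spec(x, y) true when one exists.
    Enumerates all 2**num_inputs inputs, so this only suits tiny examples.
    """
    table = {}
    for x in product([False, True], repeat=num_inputs):
        # Infeasible inputs (no satisfying y) get an arbitrary default.
        table[x] = next((y for y in (False, True) if spec(x, y)), False)
    return table


def is_skolem(spec, table):
    """Check that table[x] satisfies spec whenever some output does."""
    return all(
        spec(x, y_star) or not any(spec(x, y) for y in (False, True))
        for x, y_star in table.items()
    )


def bool_spec(x, y):
    """Hypothetical CNF specification: (x1 OR y) AND (NOT x2 OR NOT y)."""
    x1, x2 = x
    return (x1 or y) and ((not x2) or (not y))


def pa_spec(x, y):
    """Hypothetical Presburger specification: 2*y <= x <= 2*y + 1."""
    return 2 * y <= x <= 2 * y + 1


def pa_skolem(x):
    """Candidate Skolem function using a single division gate."""
    return x // 2  # floor division


table = synthesize_skolem(bool_spec, num_inputs=2)
print(is_skolem(bool_spec, table))
print(all(pa_spec(x, pa_skolem(x)) for x in range(-50, 51)))
```

The input `x1 = False, x2 = True` admits no satisfying output, which is why correctness is only required on feasible inputs. The Presburger check covers a finite range of integers and is a sanity test, not a proof.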
3. Methods for Functional Data Synthesis
In functional data analysis (FDA), synthetic functions generate representative, privacy-preserving curves from original high-dimensional function datasets. A prominent application is the synthesis of GPS trajectory functions (normalized coordinates and time). Here, synthetic functions are generated by:
- Mapping all original curves to their square-root velocity functions (SRVFs),
- Identifying the $K$ nearest neighbors for a given trajectory in SRVF metric space,
- Averaging the SRVFs and reconstructing the synthetic function via the inverse transform $\tilde f(t) = \tilde f(0) + \int_0^t \bar q(s)\,|\bar q(s)|\,ds$, where $\bar q$ is the mean SRVF over the neighborhood.
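The three steps above can be sketched for planar curves sampled on a common time grid. This is a simplified illustration under stated assumptions, not the pipeline of Burzacchi et al.: it uses the standard SRVF definition $q = f' / \sqrt{|f'|}$ with forward differences, omits any elastic re-parameterization or alignment, excludes the query curve from its own neighborhood, and starts the synthetic curve at the mean starting point of the neighbors. All function names are hypothetical.

```python
import math


def srvf(curve, dt):
    """Discrete SRVF q = f' / sqrt(|f'|) from forward differences of a 2-D curve."""
    q = []
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
        speed = math.hypot(vx, vy)
        scale = math.sqrt(speed) if speed > 0 else 1.0  # zero velocity maps to q = 0
        q.append((vx / scale, vy / scale))
    return q


def srvf_distance(q1, q2, dt):
    """Discrete L2 distance between two SRVFs on the same grid."""
    return math.sqrt(sum(((a - c) ** 2 + (b - d) ** 2) * dt
                         for (a, b), (c, d) in zip(q1, q2)))


def inverse_srvf(q, start, dt):
    """Reconstruct f(t) = f(0) + integral of q*|q| by cumulative summation."""
    x, y = start
    curve = [(x, y)]
    for qx, qy in q:
        norm = math.hypot(qx, qy)
        x += qx * norm * dt
        y += qy * norm * dt
        curve.append((x, y))
    return curve


def synthesize(curves, index, k, dt):
    """Average the SRVFs of the k nearest neighbors of curves[index] and invert."""
    qs = [srvf(c, dt) for c in curves]
    others = [j for j in range(len(curves)) if j != index]
    neighbors = sorted(others, key=lambda j: srvf_distance(qs[index], qs[j], dt))[:k]
    m = len(neighbors)
    mean_q = [(sum(qs[j][i][0] for j in neighbors) / m,
               sum(qs[j][i][1] for j in neighbors) / m)
              for i in range(len(qs[index]))]
    # Assumption: the synthetic curve starts at the neighbors' mean start point.
    start = (sum(curves[j][0][0] for j in neighbors) / m,
             sum(curves[j][0][1] for j in neighbors) / m)
    return inverse_srvf(mean_q, start, dt)
```

With forward differences the transform and its inverse round-trip exactly up to floating-point error, since $q\,|q|$ recovers the discrete velocity.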
Privacy is quantitatively maintained by ensuring that each synthetic curve is not too close to any curve in the original dataset, and utility is assessed by metrics such as mean squared error or Hausdorff distance relative to the originals (Burzacchi et al., 2024).
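A minimal sketch of such checks for discretized planar curves follows. Using the Hausdorff distance for the privacy test and the threshold parameter `min_distance` are assumptions made for illustration; the source does not specify which distance or threshold defines "too close", and the function names are hypothetical.

```python
import math


def hausdorff(curve_a, curve_b):
    """Symmetric Hausdorff distance between two curves given as point lists."""
    def directed(p_set, q_set):
        return max(min(math.dist(p, q) for q in q_set) for p in p_set)
    return max(directed(curve_a, curve_b), directed(curve_b, curve_a))


def mean_squared_error(curve_a, curve_b):
    """Pointwise mean squared error; assumes both curves share one time grid."""
    return sum(math.dist(p, q) ** 2 for p, q in zip(curve_a, curve_b)) / len(curve_a)


def passes_privacy_check(synthetic, originals, min_distance):
    """True if the synthetic curve stays at least min_distance from every original."""
    return all(hausdorff(synthetic, original) >= min_distance for original in originals)


# Two parallel unit segments, one unit apart.
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(0.0, 1.0), (1.0, 1.0)]
print(hausdorff(a, b), mean_squared_error(a, b))
print(passes_privacy_check(a, [b], min_distance=0.5))
```

Raising `min_distance` strengthens the privacy guarantee at the cost of utility, since accepted synthetic curves must lie farther from every original.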
4. Complexity, Normal Forms, and Expressive Power
Synthetic function generation in logic-based synthesis is fundamentally constrained by computational complexity:
- Boolean functional synthesis is NP-hard in the general case, but becomes tractable under favorable clause-graph structures or input/output decompositions (Chakraborty et al., 2018).
- Presburger functional synthesis has tight EXPTIME bounds, with the minimal Skolem circuit size for some specifications necessarily exponential (Akshay et al., 2025).
Specialized normal forms such as SynNNF (Boolean), PSyNF (Presburger, semantic), and PSySyNF (Presburger, syntactic) serve two roles: they are succinct representations of specifications, and a specification already in such a form admits polynomial-time synthesis of Skolem circuits (Akshay et al., 2019, Akshay et al., 2025). Conversion to such normal forms, though sometimes exponentially expensive, is crucial for efficient synthesis and practical deployment.
| Method | Domain | Normal Form | Synthesis Complexity | Explicit Representation |
|---|---|---|---|---|
| Boolean CNF | $\{0,1\}^n$ | SynNNF, Decision List | Poly/Exp (structure-dependent) | DAG circuit, logic formula |
| Presburger (PA) | $\mathbb{Z}^n$ | PSyNF, PSySyNF | Exp/Poly (normal form) | Presburger circuit |
| PDE Backward Data | Function spaces | Eigenbasis expansion | Linear in grid size | Spectral or analytic function |
| FDA (GPS) | Curves | SRVF neighborhood | Poly (K-NN, SRVF ops) | Discrete curve (grid rep.) |
5. Applications and Practical Implications
Synthetic functions underpin a diverse array of practical applications:
- Neural operator training: Massive, consistent PDE datasets for operator learning (e.g., Fourier Neural Operator, DeepONet) (Hasani et al., 2024).
- Logic circuit and controller synthesis: Hardware/software solutions correct by construction, minimizing resource use via bounded-size constraints (Raja et al., 2022).
- Functional data privacy: Privacy-centric surrogate datasets for mobility analysis, behavioral science, and beyond (Burzacchi et al., 2024).
- Mathematical benchmarking: Controlled settings for algorithm evaluation, error analysis, and uncertainty quantification.
In all these domains, the ability to tailor the distribution, coverage, and complexity of synthetic functions to the application context enables both theoretical investigation and empirically grounded development.
6. Limitations and Extensions
Despite their advantages, synthetic functions exhibit limitations:
- For logic-based synthesis, normal-form compilation may be exponentially expensive for certain specifications, and no normal form is universally optimal; PSyNF is exponentially more succinct than PSySyNF in some cases (Akshay et al., 2025).
- In neural operator or functional data scenarios, the fidelity of synthetic data relies on the completeness of the function space basis and the representational adequacy of the coefficient or SRVF distributions. Synthetic datasets derived from overly constrained or non-representative priors may hinder generalization or miss critical out-of-distribution phenomena (Hasani et al., 2024, Burzacchi et al., 2024).
Future research directions involve extending synthesis methodologies to richer function spaces, accommodating parameterized families and uncertainty, and optimizing utility/privacy tradeoffs in high-dimensional function settings.
7. Summary and Context
Synthetic functions, as instantiated in PDE learning, logic synthesis, and FDA privacy, are central to modern computational science and engineering. Their generation leverages harmonic, combinatorial, and optimization-theoretic constructions, and is governed by complexity-theoretic as well as practical desiderata. Developing minimax-optimal normal forms, scalable basis-generation, and privacy-preserving function synthesis remains an active and foundational area (Hasani et al., 2024, Chakraborty et al., 2018, Akshay et al., 2025, Akshay et al., 2019, Burzacchi et al., 2024, Raja et al., 2022).