
SciToolEval Benchmark Overview

Updated 2 November 2025
  • The benchmark framework systematically compares 12 tools across 7 programming environments for structural identifiability and observability analysis.
  • It employs 25 case studies based on varied dynamic models to assess tool correctness, computational performance, and practical feasibility.
  • Findings indicate modern tools in environments like Julia and Maple outperform legacy options for both local and global identifiability challenges.

The SciToolEval Benchmark is a systematic framework for evaluating software tools performing a priori (structural) identifiability and observability analysis of dynamic models. These analyses are crucial in fields such as systems biology, where the inferability of parameters and states from observed model outputs must be established prior to model calibration. Structural identifiability delineates the conditions under which model parameters and states can be uniquely determined from input–output measurements over finite time, while observability addresses the ability to infer system states similarly. SciToolEval provides a rigorous, reproducible approach to compare tool capabilities, correctness, computational efficiency, and usability across a diverse spectrum of model types and complexities (Barreiro et al., 2022).

1. Methodological Framework

SciToolEval employs a comprehensive benchmarking methodology:

  • Tool Selection: 12 tools implemented across 7 programming environments (MATLAB, Maple, Mathematica, Julia, Python, Reduce, Maxima) are systematically evaluated. These include STRIKE-GOLDD (FISPO/ProbObsTest), RORC-DF, GenSSI2 (MATLAB); ObservabilityTest, SIAN (Maple); EAR (Mathematica); SIAN, StructuralIdentifiability.jl (Julia); StrikePy (Python); DAISY (Reduce); COMBOS (Maxima, web).
  • Benchmark Set: 25 case studies derived from 21 models featuring a range of states (2–25), parameters (3–29), and output structures. The set covers rational models (polynomial and rational ODEs), non-rational models, and models with unknown inputs.
  • Evaluation Criteria: Tools are compared on correctness of identifiability analysis, computational performance (runtime, resource usage), feasibility on varied model sizes, accessibility (installation, documentation), and feature support (symmetries, reparameterizations, unknown inputs); a sketch of a timed correctness check follows this list.
  • Technical Reproducibility: All code and case studies are openly available, ensuring transparency and reproducibility (https://github.com/Xabo-RB/Benchmarking_files).
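
As a concrete illustration of the evaluation loop, the following is a minimal sketch, in Julia, of how a single case study could be run, timed, and checked against a known ground-truth classification. It is not the benchmark's actual harness: run_case, expected, and budget_s are hypothetical names, while assess_identifiability is the query function of StructuralIdentifiability.jl, one of the benchmarked tools.

```julia
using StructuralIdentifiability

# Hypothetical harness (not the benchmark's actual code): run one tool on
# one case study, record wall-clock time, and compare the verdict with a
# known ground-truth classification. The 48-hour budget is checked post
# hoc here rather than enforced as a hard timeout.
function run_case(ode, expected; budget_s = 48 * 3600)
    t0 = time()
    result = assess_identifiability(ode)   # global identifiability query
    elapsed = time() - t0
    (; result,
       correct = result == expected,
       elapsed,
       within_budget = elapsed <= budget_s)
end
```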

2. Mathematical Foundations

Structural identifiability and observability are rigorously defined:

  • Structural Identifiability: For a parameter vector θ, identifiability holds if the output y(t), given the inputs u(t) over a finite time horizon, uniquely determines θ.
  • Structural Local Identifiability (SLI):

\text{for almost all } \theta^*,\ \exists\, \mathcal{N}(\theta^*):\quad y(t, \tilde{\theta}) = y(t, \theta^*) \implies \tilde{\theta}_i = \theta^*_i \quad \forall\, \tilde{\theta} \in \mathcal{N}(\theta^*)

  • Structural Global Identifiability (SGI): Uniqueness is required globally across the parameter space (formalized below).
  • Observability: A state x_i is observable if its trajectory can be inferred from the outputs.
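
For contrast with the local definition, the global condition drops the neighborhood restriction; a standard formalization, consistent with the SLI statement above, is

\text{for almost all } \theta^*:\quad y(t, \tilde{\theta}) = y(t, \theta^*) \implies \tilde{\theta} = \theta^* \quad \forall\, \tilde{\theta} \in \Theta

where Θ denotes the entire parameter space, replacing the neighborhood N(θ*) of the local definition.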

Methodologically, two major classes of mathematical techniques are employed:

  • Differential Geometry Approach (for SLI): Utilizes the Observability Rank Condition (ORC). At a point x_0,

\operatorname{rank}(O(x_0)) = n

where O(x_0) is the observability matrix, built from successive Lie derivatives of the output along the dynamics (a worked sketch follows below).

  • Differential Algebra Methods (for SGI): Rely on computing implicit input–output relations via characteristic-set or Gröbner-basis algorithms, and check injectivity with respect to the parameters:

\Phi = \phi(\theta, u, \dot{u}, \ldots, y, \dot{y}, \ldots) = 0, \quad \det \frac{\partial \Phi}{\partial \theta} \neq 0

These fundamental principles underpin tool implementations and inform their applicability to various model classes.
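
To make the ORC concrete, the following is a minimal sketch using Julia's Symbolics.jl (not one of the benchmarked tools). The two-state model, parameter values, and evaluation point are hypothetical illustrations; the code builds the observability matrix from Lie derivatives and tests its rank at a generic point.

```julia
using Symbolics, LinearAlgebra

# Hypothetical two-state model: ẋ1 = -k1*x1, ẋ2 = k1*x1 - k2*x2, y = x2
@variables x1 x2 k1 k2
xs = [x1, x2]
f  = [-k1 * x1, k1 * x1 - k2 * x2]
h  = x2

# Lie derivative of a scalar g along the vector field f: L_f g = ∇g ⋅ f
lie(g) = (Symbolics.jacobian([g], xs) * f)[1]

# Observability matrix: Jacobian of [h, L_f h, ..., L_f^{n-1} h] w.r.t. x
hs = [h, lie(h)]                 # n = 2 states, so two rows suffice
O  = Symbolics.jacobian(hs, xs)

# ORC: evaluate at a random generic point and check rank(O(x0)) == n
vals = Dict(x1 => 0.7, x2 => 0.3, k1 => 1.1, k2 => 0.5)
O0   = Float64.(Symbolics.value.(substitute.(O, Ref(vals))))
@show rank(O0) == length(xs)     # true here: locally observable
```

The benchmarked tools automate this construction symbolically and extend it to parameters and unknown inputs; the sketch only illustrates the rank test itself.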

3. Comparative Performance Analysis

SciToolEval's results reveal nuanced tool strengths and limitations:

| Model Class | Applicable Tools | Notable Limitations |
| --- | --- | --- |
| Rational, no unknown inputs | All | Varying runtimes, size limitations |
| Non-rational | STRIKE-GOLDD (FISPO), GenSSI | Most tools inapplicable |
| Unknown inputs | RORC-DF, STRIKE-GOLDD (FISPO/ProbObsTest), StrikePy | Other tools unable to analyze |
  • Feasibility: All tools can analyze small, rational models; computational bottlenecks and errors appear in larger or non-rational cases.
  • Correctness: SIAN (Maple/Julia), ObservabilityTest (Maple), RORC-DF (MATLAB), StructuralIdentifiability.jl (Julia), and STRIKE-GOLDD (FISPO) consistently yield accurate diagnoses. GenSSI and DAISY (legacy tools) are less robust for large or complex models.
  • Efficiency: ObservabilityTest (Maple) is the fastest for local analysis, followed by EAR (Mathematica) and STRIKE-GOLDD (ProbObsTest, MATLAB). For global analysis, SIAN and StructuralIdentifiability.jl (Julia) excel, especially on large models.
  • Resource Demand: Several older tools (COMBOS, DAISY, StrikePy) fail or exceed the 48-hour computation budget on medium and large cases.

4. Tool Selection Guidelines

The SciToolEval framework proposes principled guidelines:

  1. Model Type
    • Rational, no unknown input: Most tools suitable.
    • Non-rational: requires STRIKE-GOLDD (FISPO, MATLAB) or GenSSI (MATLAB).
    • Unknown inputs: RORC-DF, STRIKE-GOLDD (FISPO/ProbObsTest), StrikePy.
  2. Analysis Objective
    • Global identifiability: SIAN (Maple/Julia), StructuralIdentifiability.jl (Julia); see the sketch after this list.
    • Local identifiability (rational models): ObservabilityTest (Maple), EAR (Mathematica), STRIKE-GOLDD (ProbObsTest, MATLAB).
  3. Programming Environment
    • Julia: SIAN (Julia), StructuralIdentifiability.jl.
    • Maple: ObservabilityTest, SIAN (Maple).
    • MATLAB: STRIKE-GOLDD (FISPO/ProbObsTest), GenSSI2.
    • Others: Check for feasibility and correctness.
  4. Feature Requirements
    • Symmetries, identifiable reparameterizations: STRIKE-GOLDD (MATLAB), EAR (Mathematica), GenSSI (MATLAB), COMBOS (web/Maxima).
  5. Computational Constraints
    • For large models, select newer, more efficient tools (SIAN, StructuralIdentifiability.jl).
    • Avoid web/legacy tools for large problem sizes.
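
As an illustration of guideline 2, the following is a minimal sketch of a global identifiability query with StructuralIdentifiability.jl, using its @ODEmodel and assess_identifiability entry points. The two-compartment model is a hypothetical stand-in, not one of the 25 benchmark cases.

```julia
using StructuralIdentifiability

# Hypothetical two-compartment model with output on the second state
ode = @ODEmodel(
    x1'(t) = -k1 * x1(t),
    x2'(t) = k1 * x1(t) - k2 * x2(t),
    y(t) = x2(t)
)

# Global assessment: classifies each parameter and state as globally,
# only locally, or non-identifiable
assess_identifiability(ode)

# Cheaper local-only assessment, useful for large models
assess_local_identifiability(ode)
```

For this model the two rate constants can be swapped without changing the output, so one would expect them to be reported as locally but not globally identifiable, which is exactly the SLI/SGI distinction of Section 2.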

5. Limitations and Future Directions

Analysis of SciToolEval benchmarking highlights several open challenges and developmental opportunities:

  • Scalability: Handling models with hundreds or thousands of parameters remains an unsolved problem across all surveyed tools.
  • Model Generality: There is a need for identifiability tools supporting PDEs, stochastic systems, and complex model archetypes.
  • Programming Language Evolution: Julia and Python are prioritized for new development to enhance computational performance and usability.
  • Feature Enhancement: Unified support for non-rational models, unknown inputs, initial conditions, and model-type conversions is still needed.
  • User Experience: Improved error diagnostics, comprehensive documentation, and development of GUI/web-based applications are ongoing needs.
  • Integration: Deeper integration with model construction and calibration pipelines in open-source ecosystems is recommended.

6. Technical Reference: Benchmark Model and Algorithmic Formulation

The general model structure employed is

\Sigma = \left\{ \begin{array}{l} \dot{x} = f(t, x, u, \theta, w) \\ y(t) = h(x, u, \theta, w) \\ x(0) = x^0(\theta) \end{array} \right.

where x denotes the states, u the known inputs, w the unknown inputs, θ the parameters, and y the measured outputs.

Observability Rank Condition (ORC) for local analysis:

\text{If } \operatorname{rank}(O(x_0)) = n, \text{ then the system is locally weakly observable at } x_0

Differential-algebraic injectivity (for global analysis):

\Phi = \phi(\theta, u, \dot{u}, \ldots, y, \dot{y}, \ldots) = 0, \quad \det \frac{\partial \Phi}{\partial \theta} \neq 0

These formulations serve as the basis for the tool implementations, spanning both differential-geometric and algebraic approaches; a small worked example of the injectivity test follows.
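
The following is a minimal sketch of the injectivity test in Julia with Symbolics.jl, applied to the hypothetical two-compartment model used earlier (ẋ1 = -k1·x1, ẋ2 = k1·x1 - k2·x2, y = x2). Eliminating x1 yields the input–output relation ÿ + (k1 + k2)ẏ + k1·k2·y = 0, so the test reduces to checking whether the map from (k1, k2) to the coefficients (k1 + k2, k1·k2) is injective.

```julia
using Symbolics

@variables k1 k2
θ = [k1, k2]
c = [k1 + k2, k1 * k2]   # coefficients of the input-output relation

# Jacobian of the coefficient map and its determinant (2×2, by hand)
J = Symbolics.jacobian(c, θ)
d = Symbolics.simplify(J[1, 1] * J[2, 2] - J[1, 2] * J[2, 1])
@show d                  # k1 - k2

# d ≠ 0 whenever k1 ≠ k2, so the map is locally injective: SLI holds.
# But swapping k1 and k2 leaves c unchanged, so injectivity fails
# globally: the parameters are locally, not globally, identifiable.
```

This is the same phenomenon a global tool such as SIAN or StructuralIdentifiability.jl would be expected to detect automatically.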

7. Summary and Impact

SciToolEval provides a robust empirical baseline for tool selection and methodological assessment in structural identifiability and observability analysis. Benchmark findings demonstrate that contemporary tools (SIAN, StructuralIdentifiability.jl, ObservabilityTest) offer superior correctness and computational performance relative to legacy packages, particularly for global identifiability tasks. There is no universally optimal tool; suitability is contingent on model features, the intended analysis, and the programming environment. The benchmark underscores the need for automated, scalable, and generalizable solutions, with Julia emerging as a promising development platform. All codebases and test cases are openly available for direct access and reproduction, facilitating further research and software development in the field (Barreiro et al., 2022).
