Over++: High-Performance C++ Frameworks
- Over++ is a suite of high-performance C++ frameworks designed for real-time robotics, parallel computing, and multiphysics simulation through extensible object-oriented design.
- The frameworks leverage advanced C++ techniques and automated Python API generation to enable efficient runtime performance and rapid prototyping across diverse engineering applications.
- Demonstrated through case studies, Over++ achieves low-latency control, near-linear parallel scaling, and scalable multiphysics simulations, making it ideal for scientific and engineering computation.
Over++ is a name shared by multiple high-performance C++ frameworks, each associated with a distinct target domain: (1) the Over++ (“o80”) middleware for real-time robotics and hybrid control, (2) Over++ as a parallel C++ object-oriented programming model and runtime, and (3) Over++/BROOMStyx for multiphysics numerical simulation. Despite divergent application contexts, these frameworks share a design philosophy of leveraging advanced C++ object orientation to deliver extensible, high-productivity tools for scientific and engineering computation.
1. Architecture and Core Design Patterns
1.1 Over++ (o80) for Real-Time Robotic Control
Over++/o80 is a C++ middleware for real-time critical systems, built around a shared-memory model and a flexible command framework. Its architecture splits into a backend (real-time C++ process) and a frontend (user-level API with auto-generated Python bindings). The backend is specialized by templated driver classes for hardware or simulation; at each control-loop iteration, it reads commands from shared memory, interpolates actuator targets (with respect to duration, speed, or iteration count), and writes out the system state. The frontend exposes a high-level Python API for sending commands (blocking or nonblocking) and reading states (past, current, or future), supporting asynchronous access by multiple consumers (Berenz et al., 2023).
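As a minimal sketch of this backend control-loop pattern, the following illustrates a queued command being interpolated toward its target over a requested number of iterations; the class names and the in-process queue (standing in for shared memory) are illustrative assumptions, not the actual o80 API:

```cpp
// Sketch of the backend control loop: pop a command, interpolate toward
// its target over the requested number of iterations, publish the state.
#include <cstdio>
#include <deque>

struct Command {
    double target;      // desired actuator value
    int    iterations;  // interpolate over this many control steps
};

class Backend {
    std::deque<Command> queue_;  // stands in for the shared-memory queue
    double state_ = 0.0;         // last published actuator state
public:
    void push(const Command& c) { queue_.push_back(c); }

    // One control-loop iteration (e.g., one tick at 500 Hz).
    void iterate() {
        if (queue_.empty()) return;            // hold state if no command
        Command& c = queue_.front();
        state_ += (c.target - state_) / c.iterations;  // linear interpolation
        if (--c.iterations == 0) queue_.pop_front();   // command completed
        std::printf("state = %.2f\n", state_);         // publish state
    }
};

int main() {
    Backend b;
    b.push({1.0, 5});                         // reach 1.0 over 5 iterations
    for (int i = 0; i < 5; ++i) b.iterate();  // prints 0.20 ... 1.00
}
```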
1.2 Over++ as a Parallel C++ Framework
Over++ defines an object-oriented model where every C++ object is treated as an autonomous parallel computing unit ("virtual host") potentially mapped to separate nodes. All pointers are generalized to include host ID and object ID, and remote method calls are transparently managed: arguments are serialized and sent as IR instructions, computation proceeds asynchronously, and a lightweight “future” mechanism enforces causal dependency. Compound or barrier statements control collective synchronization. A compiler parses ordinary C++ (with minimal extensions), lowering methods and pointers into an intermediate representation for the runtime, which launches one agent per virtual host using MPI and provides a threading model for dispatch and execution (Givelberg, 2018).
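A minimal sketch of the generalized-pointer idea follows: a pointer carries a host ID and an object ID, and a remote invocation returns a future that enforces the causal dependency. Here std::async stands in for the runtime's serialized IR-instruction dispatch; all types are illustrative assumptions:

```cpp
// Sketch of the generalized-pointer and future mechanism: every pointer
// carries a (host ID, object ID) pair, and a "remote" call returns a
// future that enforces the causal dependency.
#include <cstdint>
#include <cstdio>
#include <future>

struct GlobalPtr {
    std::uint32_t host_id;    // which virtual host owns the object
    std::uint64_t object_id;  // object index within that host
};

// "Remote" invocation: in the real runtime, arguments are serialized and
// shipped to the owning agent; here the work simply runs asynchronously.
std::future<double> remote_call(GlobalPtr p, double arg) {
    return std::async(std::launch::async, [p, arg] {
        std::printf("executing on host %u, object %llu\n",
                    p.host_id, static_cast<unsigned long long>(p.object_id));
        return arg * 2.0;  // the "method body"
    });
}

int main() {
    GlobalPtr p{3, 42};
    auto f = remote_call(p, 21.0);            // caller continues immediately
    std::printf("result = %.1f\n", f.get());  // future enforces the dependency
}
```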
1.3 Over++/BROOMStyx for Multiphysics Simulation
Designed to support monolithic multiphysics coupling, Over++/BROOMStyx is organized as both a library and a framework. The architecture adheres to classical object-oriented principles, using singletons for all major managers (Domain, DOF, Material, Numerics, Solution, Output). Discretization strategies (FE, FV, phase-field, etc.) are encapsulated as derived Numerics classes; Materials, SolutionMethods, SparseMatrices, LinearSolvers, OutputWriters, and OutputQuantities are similarly abstracted. Runtime control and configuration are delivered via an ASCII input file interpreted at startup, minimizing the need for boilerplate inheritance or recompilation (Sargado, 2019).
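A minimal sketch of the singleton-manager pattern, assuming a DomainManager with illustrative members (the actual class internals differ):

```cpp
// Sketch of a singleton manager: a single global instance, constructed
// on first use (Meyers singleton), owns the domain data.
#include <cstdio>
#include <string>
#include <vector>

class DomainManager {
public:
    static DomainManager& instance() {
        static DomainManager mgr;        // constructed on first use
        return mgr;
    }
    void readMeshFile(const std::string& path) {
        std::printf("reading mesh from %s\n", path.c_str());
        cells_.resize(1000);             // placeholder mesh data
    }
    std::size_t numCells() const { return cells_.size(); }
private:
    DomainManager() = default;           // no public construction
    DomainManager(const DomainManager&) = delete;
    std::vector<int> cells_;
};

int main() {
    DomainManager::instance().readMeshFile("specimen.msh");
    std::printf("%zu cells\n", DomainManager::instance().numCells());
}
```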
2. Synchronization, Parallelism, and Messaging
2.1 o80/Over++ Blocking, Nonblocking, and Bursting Modes
The o80 system implements both traditional blocking command mode and a novel bursting mode ("user-driven loop"). In blocking mode, the backend runs continuously at defined control frequency (e.g., 500 Hz); commands issued by the frontend are processed sequentially, with interpolation over the requested duration anchoring real-time determinism. The frontend blocks until the completion acknowledgment is set in shared memory.
Bursting mode disables continuous execution; instead, the backend waits for explicit burst requests (e.g., frontend.burst(B)), executing a batch of B iterations as fast as the hardware permits. This enables precise external control of the backend's real-time loop from Python, which is critical for synchronizing physical and simulated hardware, especially in reinforcement learning and hybrid settings. Choosing the burst size B appropriately aligns simulator steps with the hardware control frequency (Berenz et al., 2023).
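The following sketch captures the bursting handshake: the backend idles until the frontend requests a burst of B iterations, then executes them at full speed before idling again. The condition-variable coordination shown here is an assumption; o80 itself synchronizes through shared memory:

```cpp
// Sketch of the bursting handshake between a backend thread and a
// frontend caller; -1 is used here as a shutdown request.
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

std::mutex m;
std::condition_variable cv;
int requested = 0;   // iterations the frontend asked for (-1 = shut down)
bool busy = false;   // true while the backend is executing a burst

void backend() {
    for (;;) {
        int b;
        {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [] { return requested != 0; });  // idle until asked
            b = requested; requested = 0; busy = true;
        }
        if (b > 0)
            for (int i = 0; i < b; ++i) { /* one control-loop iteration */ }
        { std::lock_guard<std::mutex> lk(m); busy = false; }
        cv.notify_all();                                 // burst finished
        if (b < 0) return;                               // shutdown request
    }
}

// Frontend side, mirroring frontend.burst(B): request B iterations and
// block until the backend has executed them.
void burst(int b) {
    { std::lock_guard<std::mutex> lk(m); requested = b; }
    cv.notify_all();
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [] { return requested == 0 && !busy; });
}

int main() {
    std::thread t(backend);
    burst(10);                        // run exactly 10 backend iterations
    std::printf("burst complete\n");
    burst(-1);                        // shut the backend down
    t.join();
}
```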
2.2 Over++ Parallel C++ Remote Execution
In the Over++ parallel C++ model, method invocation on remote objects is transparently managed: calls are packed into IR messages, executed by the target object's agent, and results flow back via futures/guards, enabling computation/communication overlap. Compound blocks and loop bodies act as synchronization barriers, maximizing asynchronous execution. The runtime, implemented atop MPI (prototype), uses dedicated threads for dispatch and a thread pool for workload execution, with minimal scheduling overhead (Givelberg, 2018).
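As a sketch of the barrier semantics, the end of a compound block conceptually waits on every future issued inside it; std::async again stands in for the runtime's dispatch threads and thread pool:

```cpp
// Sketch of compound-block barrier semantics: calls issued inside the
// block overlap asynchronously, and the block's end waits on every
// outstanding future.
#include <cstdio>
#include <future>
#include <vector>

int main() {
    std::vector<std::future<int>> pending;
    {   // "compound block": issue remote work without waiting
        for (int i = 0; i < 8; ++i)
            pending.push_back(std::async(std::launch::async,
                                         [i] { return i * i; }));
    }
    int sum = 0;
    for (auto& f : pending) sum += f.get();  // barrier: drain all futures
    std::printf("sum = %d\n", sum);          // 140
}
```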
3. Numerical and Physical Modeling Paradigms
3.1 Multiphysics Coupling with Over++/BROOMStyx
Over++/BROOMStyx enables the monolithic assembly of systems with fundamentally distinct discretizations and PDE types. All problem components contribute to a single global system of equations of the form $\mathbf{K}\mathbf{u} = \mathbf{F}$, where the coefficient matrix $\mathbf{K}$ and right-hand side $\mathbf{F}$ aggregate contributions from every participating discretization. FE-, FV-, and phase-field contributions are encapsulated in Numerics classes, and cell-wise assembly is parallelized via OpenMP. For example, the Biot poroelasticity implementation combines FE for displacement and FV for pressure, with the global system assembled by aggregating cell-wise local stencils in COO format.
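A minimal sketch of this cell-wise COO assembly pattern with OpenMP, where each thread accumulates thread-private triplets that are merged afterward (the mesh, the toy stencil, and the merge strategy are illustrative assumptions, not BROOMStyx internals):

```cpp
// Sketch of OpenMP-parallel cell-wise assembly into a COO buffer; each
// thread builds a private triplet list and merges it once at the end.
#include <cstdio>
#include <vector>

struct Triplet { int row, col; double val; };

int main() {
    const int n_cells = 1000;
    std::vector<Triplet> coo;            // global COO buffer

    #pragma omp parallel
    {
        std::vector<Triplet> local;      // thread-private triplets
        #pragma omp for nowait
        for (int c = 0; c < n_cells; ++c) {
            // Toy 2x2 "local stencil" for cell c (e.g., an FE/FV kernel).
            int dof[2] = {c, c + 1};
            double k[2][2] = {{1.0, -1.0}, {-1.0, 1.0}};
            for (int i = 0; i < 2; ++i)
                for (int j = 0; j < 2; ++j)
                    local.push_back({dof[i], dof[j], k[i][j]});
        }
        #pragma omp critical             // merge into the global buffer
        coo.insert(coo.end(), local.begin(), local.end());
    }
    std::printf("%zu COO entries assembled\n", coo.size());
}
```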
3.2 Plugin Extension and Material Modeling
Physics plugins are created by subclassing the abstract Numerics or Material classes and registering them through macros (e.g., registerBroomstyxObject). New physics, constitutive models, and output functionalities are thus incorporated with a minimal coding footprint, enabling extensibility and rapid prototyping (Sargado, 2019).
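A minimal sketch of how such a registration macro can work: a namespace-scope static registers a factory for each subclass in a global map at load time. The macro name mirrors registerBroomstyxObject from the text, but the implementation shown is an assumption:

```cpp
// Sketch of a registration-macro pattern: each subclass contributes a
// factory to a global registry keyed by class name.
#include <functional>
#include <map>
#include <memory>
#include <string>

struct Numerics {                          // abstract discretization base
    virtual ~Numerics() = default;
    virtual void assembleCell(int cell) = 0;
};

using Factory = std::function<std::unique_ptr<Numerics>()>;
std::map<std::string, Factory>& registry() {
    static std::map<std::string, Factory> r;  // constructed on first use
    return r;
}

#define registerBroomstyxObject(cls) \
    static const bool cls##_registered = \
        (registry()[#cls] = [] { return std::make_unique<cls>(); }, true);

// A new physics module: one subclass plus one macro line, no core changes.
struct BiotPoroelasticity : Numerics {
    void assembleCell(int cell) override { /* FE + FV local stencil */ }
};
registerBroomstyxObject(BiotPoroelasticity)

int main() {
    auto numerics = registry().at("BiotPoroelasticity")();  // lookup by name
    numerics->assembleCell(0);
}
```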
4. Configurability, User Interface, and Frontend Automation
4.1 o80/Over++ Python Frontends and Shared Memory
o80 generates Python modules from C++ driver definitions. Users interact with high-level APIs (send, send_nonblocking, burst, get, etc.), abstracting away C++ details and shared memory management. Multiple frontends can concurrently log, visualize, and control, illustrating the system's experiment-centric, multi-consumer design. Real-time experiments combining hardware and simulators (e.g., MuJoCo) along with RL loops can be expressed as single Python scripts, facilitating seamless synchronization and dataflow (Berenz et al., 2023).
4.2 Over++/BROOMStyx Input File Configuration
In Over++/BROOMStyx, all core simulation parameters and topology—mesh reading, discretizations, material models, DOF allocations, solver tolerances—are specified in a single ASCII input file. The system initializes all required objects and managers based on these specifications, without recompilation. This approach supports dynamic runtime composability and rapid experimentation with new physics modules or domain configurations (Sargado, 2019).
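As a purely hypothetical illustration of this declarative style (the keywords below are invented for exposition and do not reproduce the actual BROOMStyx input syntax):

```
# Hypothetical input-file sketch; keywords are illustrative only.
Mesh            gmsh  specimen.msh
Numerics   1    BiotPoroelasticity     # FE displacement + FV pressure
Material   1    IsotropicElasticity    E 10.0e9   nu 0.25
SolutionMethod  AlternateMinimization  tol 1.0e-8
LinearSolver    PARDISO
Output          VTK   results/
```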
5. Performance Characteristics and Representative Applications
5.1 Quantitative Outcomes and Resource Efficiency
- o80 achieves real-time control at 500 Hz (µs-scale jitter, Linux RT-Preempt) with sub-millisecond Python frontend overhead and overall CPU usage below 30% for combined real+sim+RL workflows on quad-core hosts. Simulator bursts (10 iterations) complete in ≈0.02 s wall time (Berenz et al., 2023).
- Over++ parallel C++ delivers dramatic code-size reductions (e.g., 3D 64 TB FFT: 15,000 lines in C++/MPI vs. 500 in Over++; distributed BFS: 2,000 vs. 200) and empirical development-time reductions by factors of 10–20. Scalability studies show near-linear strong scaling to 16 hosts, with negligible framework scheduling cost (Givelberg, 2018).
- Over++/BROOMStyx validates against analytical benchmarks (composite torsion, Mandel’s problem in Biot poroelasticity) and demonstrates scalable assembly/solve performance via OpenMP and PARDISO GPU/CUDA backends. Example: 112,806 nodes, ≈295,000 DOFs in phase-field fracture solved by staggered minimization; single outer iteration in parallel takes ≈0.9 s on a 6-core/12-thread CPU (Sargado, 2019).
5.2 Application Case Studies
- o80/Over++ orchestrated a hybrid HYSR table-tennis experiment involving a pneumatic arm (hardware and simulator backends), RL control, and parallel logging/visualization, enabling efficient dual-modality data collection and RL training (Berenz et al., 2023).
- Over++/BROOMStyx monolithic runs include composite torsion analysis, hybrid FE–FV poroelasticity without spurious oscillations, and complex phase-field crack propagation with coupled damage models (Sargado, 2019).
6. Extensibility and Future Directions
Over++ frameworks across domains provide mechanisms for seamless extension: C++ class inheritance with registration macros (BROOMStyx), Python API generation from drivers (o80), and automatic or annotation-driven parallelization of C++ code (parallel Over++). Planned advancements include migration to dedicated NoC backends, hardware-supported IR scheduling, and further automation of load balancing and dynamic domain decomposition. For Over++/BROOMStyx, a distributed-memory MPI implementation is in development, with current core classes already isolating the necessary data-exchange points for mesh partitioning and ghost-cell management (Givelberg, 2018; Sargado, 2019).
7. Context in Broader Software and Computational Ecosystem
Over++ frameworks address flexibility, extensibility, and productivity deficiencies observed in legacy toolchains (e.g., ROS2, operator-splitting multiphysics platforms, or verbose parallel C++/MPI). By unifying object-oriented design, meta-programming, and automation (frontend generation or code parallelization), they provide model architectures for future parallel languages, hybrid real/sim environments, and composable, physics-agnostic simulation labs. A plausible implication is that such design strategies can inform emerging parallel OS kernels, where processes are objects and global scheduling/dataflow are managed at the IR or network level (Givelberg, 2018).