Multilevel Process Mining
- Multilevel Process Mining is a set of methods that model and analyze event-driven processes across multiple abstraction layers, enabling comprehensible insights from complex logs.
- It employs hierarchical abstraction, clustering techniques, and object-centric approaches to derive precise, reversible process models from fine-grained data.
- Empirical evaluations demonstrate that MLPM maintains high fitness and precision while reducing model complexity, facilitating scalable, interactive process analysis.
Multilevel process mining (MLPM) encompasses a spectrum of methods for modeling, analyzing, and visualizing event-driven processes at multiple levels of abstraction, structural granularity, or organizational entity. Its principal aim is to derive comprehensible semantics from complex, fine-grained event logs by synthesizing process models that retain fitness and precision while offering compact, interpretable representations. Research in this field spans formal Petri net frameworks, hierarchical abstraction via activity clustering, flexible tree-based log abstraction, integrated conceptual-model architectures, aggregation techniques for cross-domain analysis, object-centric event algebra, and comparative approaches such as IBM's industrial-scale MLPM. The field is unified by the challenge of scaling process discovery, conformance checking, and exploration to real-world logs with high dimensionality, concurrency, and rich object interdependencies.
1. Formal Models for Multilevel Process Discovery
Multilevel process models generalize classical Petri net and workflow net (WF-net) formalisms into nested or compositional structures. The two-level hierarchical WF-net introduced by Begicheva et al. defines a high-level WF-net $\tilde{N}$ whose transitions correspond bijectively to subprocess abstractions, each refined by a lower-level net over partitioned activity sets (Begicheva et al., 2023).
Formally, a hierarchical workflow net (HWF-net) is a triple $\mathcal{N} = (\tilde{N}, \{N_1, \dots, N_k\}, \ell)$, where:
- $\tilde{N}$ is the high-level net over subprocess names $\tilde{A} = \{\alpha_1, \dots, \alpha_k\}$,
- each $N_i$ is a WF-net for $\alpha_i$ over activities $A_i$,
- $\ell$ maps each high-level activity name $\alpha_i$ to its refinement net $N_i$.
A run of $\mathcal{N}$ is constructed by interleaving firings of high-level transitions with executions of their corresponding subnets. The observable semantics of the flattened net $\mathrm{fl}(\mathcal{N})$ equals that of $\mathcal{N}$, so flattening preserves behavioral traces.
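The structural side of this definition can be sketched as a small data structure. The following is a minimal illustration, not the authors' implementation: the class names `WFNet` and `HWFNet`, the method names, and the example activities are all hypothetical, and each net is reduced to its set of transition labels (places and arcs are omitted). It checks the two structural conditions stated above: every high-level transition has exactly one refinement net, and the low-level activity sets are pairwise disjoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WFNet:
    """Minimal stand-in for a workflow net: only its transition labels are kept."""
    name: str
    activities: frozenset


@dataclass
class HWFNet:
    """Two-level hierarchical WF-net: a high-level net plus one refinement per transition."""
    high_level: WFNet   # its labels are subprocess names
    refinement: dict    # subprocess name -> low-level WFNet

    def validate(self) -> None:
        # Every high-level transition needs exactly one refinement net, and vice versa.
        if set(self.refinement) != set(self.high_level.activities):
            raise ValueError("refinement map must cover exactly the high-level transitions")
        # Low-level activity sets must be pairwise disjoint (they partition the activities).
        seen = set()
        for net in self.refinement.values():
            overlap = seen & net.activities
            if overlap:
                raise ValueError(f"activities shared between subprocesses: {sorted(overlap)}")
            seen |= net.activities

    def owner_of(self, activity: str) -> str:
        """Return the high-level transition whose subnet contains the activity."""
        for name, net in self.refinement.items():
            if activity in net.activities:
                return name
        raise KeyError(activity)


# Hypothetical example: two subprocesses refining a two-transition high-level net.
hwf = HWFNet(
    high_level=WFNet("top", frozenset({"Register", "Ship"})),
    refinement={
        "Register": WFNet("register_net", frozenset({"create order", "check credit"})),
        "Ship": WFNet("ship_net", frozenset({"pick", "pack", "dispatch"})),
    },
)
hwf.validate()
```

Because the activity sets are disjoint, `owner_of` is well defined, which is what allows a low-level event to be attributed to a single high-level transition during replay.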
The FlexHMiner approach generalizes the hierarchy from two to arbitrarily deep trees by constructing a Flexible Activity Tree (FAT) and recursively assigning each node its sublog and discovered subprocess model (Lu et al., 2020).
Object-centric approaches for multilevel mining formalize event logs as OCELs (Object-Centric Event Logs), in which events relate to multiple object instances and types. Relevant process models—object-centric Petri nets or object-centric directly follows graphs (OC-DFGs)—are constructed by transforming the OCEL with operations like drill-down and unfold (Khayatbashi et al., 30 Nov 2024, Benzin et al., 27 Mar 2024).
IBM's Multilevel Process Mining formalism dynamically composes cases from relational event logs by following an ordered traversal of entity types (e.g., Order → Receipt → Invoice), bridging across levels, and normalizing overlapping events to avoid duplication (Ronzoni et al., 3 Dec 2025).
2. Event Abstraction and Hierarchical Structuring Techniques
Methods for abstraction are diverse:
- Clustering-based abstraction: Activity labels are partitioned using clustering algorithms (e.g., k-means, hierarchical agglomerative, DBSCAN) on feature vectors of activity context, or by manual rules reflecting domain semantics (Begicheva et al., 2023).
- Flexible Activity Tree (FAT): Used in FlexHMiner to structure activities hierarchically, via domain knowledge, random clustering, or flat (trivial) hierarchies. Each internal node defines an aggregation level, and upon abstraction, its behavioral sublog is isolated and modeled (Lu et al., 2020).
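To make the clustering-based option more concrete, the sketch below builds one possible kind of context feature vector and groups activities with a simple single-linkage rule. Everything here is an assumption chosen for illustration: the feature definition (counts of direct predecessors and successors), the cosine threshold of 0.9, and the function names are not taken from the cited work, and a production pipeline would more likely use an established clustering library.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(log):
    """Count, per activity, which activities directly precede and follow it."""
    vectors = defaultdict(Counter)
    for trace in log:
        for prev, nxt in zip(trace, trace[1:]):
            vectors[prev][("succ", nxt)] += 1
            vectors[nxt][("pred", prev)] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def cluster_activities(log, threshold=0.9):
    """Single-linkage grouping: merge activities whose contexts are similar enough."""
    vectors = context_vectors(log)
    activities = sorted(vectors)
    parent = {a: a for a in activities}   # union-find forest

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    for i, a in enumerate(activities):
        for b in activities[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(b)] = find(a)

    clusters = defaultdict(set)
    for a in activities:
        clusters[find(a)].add(a)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical log: "a1" and "a2" are alternatives with identical context.
log = [["s", "a1", "m", "e"], ["s", "a2", "m", "e"]]
clusters = cluster_activities(log)
```

In this toy log only `a1` and `a2` share a context, so they end up in one cluster while every other activity stays on its own. Whether such context similarity matches a meaningful subprocess boundary depends on the domain, which is why manual rules remain an alternative.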
INEXA introduces an interactive aggregation technique that records each abstraction operation as an object in the log. Analysts may abstract a SESE (single-entry, single-exit) fragment, entire artifacts, or lifecycles, maintaining an object-centric trace of applied abstractions for explainability and enabling roll-back or drill-down on demand (Benzin et al., 27 Mar 2024).
Object-centric granularity operations—drill-down (split object types by attributes), roll-up (merge child types back), unfold (split event types by object context), and fold (reverse of unfold)—form the basis of navigable multilevel object-centric analysis (Khayatbashi et al., 30 Nov 2024).
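A drill-down and its inverse can be illustrated on a toy object table. This is a simplified sketch under stated assumptions: the dictionary layout, the `parent_type` bookkeeping field, and the naming scheme for split types are invented for the example and do not reflect the OCEL standard or the API of any existing library.

```python
import copy


def drill_down(objects, obj_type, attribute):
    """Split `obj_type` into one subtype per value of `attribute`."""
    result = copy.deepcopy(objects)
    for obj in result.values():
        if obj["type"] == obj_type and attribute in obj["attrs"]:
            obj["parent_type"] = obj_type   # remember the origin so roll-up can undo it
            obj["type"] = f"{obj_type}[{attribute}={obj['attrs'][attribute]}]"
    return result


def roll_up(objects, obj_type):
    """Merge all subtypes that were split from `obj_type` back into it."""
    result = copy.deepcopy(objects)
    for obj in result.values():
        if obj.get("parent_type") == obj_type:
            obj["type"] = obj.pop("parent_type")
    return result


# Hypothetical object table: object id -> type and attributes.
objects = {
    "o1": {"type": "Order", "attrs": {"channel": "web"}},
    "o2": {"type": "Order", "attrs": {"channel": "store"}},
    "i1": {"type": "Item", "attrs": {}},
}
fine = drill_down(objects, "Order", "channel")   # two Order subtypes
coarse = roll_up(fine, "Order")                  # back to a single Order type
```

Both functions return a modified copy and leave their input untouched, so `coarse` equals the original table. That round trip is the reversibility property that makes the operations navigable.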
IBM MLPM relies on bridge events in multivalued logs, recursively composing multilevel cases by traversing object chains and normalizing duplicate events, thus abstracting redundant, cross-linked events into a unified behavioral model (Ronzoni et al., 3 Dec 2025).
3. Algorithmic Frameworks and Practical Pipelines
Multilevel discovery algorithms must address both abstraction and refinement, handling concurrency, loops, and event generalization.
In Begicheva et al., the discovery algorithm takes three inputs: a low-level log, a partition of the activities into clusters, and a flat discovery algorithm with perfect fitness. It iteratively detects and folds loops, maps clusters to high-level transitions, and constructs a two-level HWF-net, guaranteeing that the flattened variant fits the input log exactly (conformance by construction) (Begicheva et al., 2023).
FlexHMiner abstracts logs via recursive upward mapping in the FAT; each subprocess node's traces are projected, start/end-labeled, and modeled independently. Model discovery leverages any available off-the-shelf miner—most commonly Inductive Miner or Split Miner—and each discovered fragment is validated at its own abstraction level (Lu et al., 2020).
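The projection step can be sketched as follows. This is a simplified reading of the description above rather than the FlexHMiner code: the tree encoding, the `START`/`END` marker names, and the rule that an internal child collapses consecutive events into one are assumptions made for illustration.

```python
from itertools import groupby

# Hypothetical activity tree: internal node -> children; leaves are activities.
TREE = {
    "Root": ["Register", "Ship"],
    "Register": ["create order", "check credit"],
    "Ship": ["pick", "pack", "dispatch"],
}


def leaves(node, tree):
    """All low-level activities below `node` (a leaf is its own only leaf)."""
    if node not in tree:
        return {node}
    return set().union(*(leaves(child, tree) for child in tree[node]))


def sublog(node, log, tree):
    """Project `log` onto the children of `node`, adding start/end markers."""
    child_of = {leaf: child for child in tree[node] for leaf in leaves(child, tree)}
    projected = []
    for trace in log:
        labels = [child_of[a] for a in trace if a in child_of]
        if not labels:
            continue  # this trace never enters the subprocess
        merged = []
        for label, group in groupby(labels):
            # An internal child collapses to one event; a leaf keeps every occurrence.
            merged.extend([label] if label in tree else list(group))
        projected.append(["START"] + merged + ["END"])
    return projected


def all_sublogs(log, tree):
    """One sublog per internal node; each could be handed to any flat miner."""
    return {node: sublog(node, log, tree) for node in tree}


log = [["create order", "check credit", "pick", "pack", "pack", "dispatch"]]
sublogs = all_sublogs(log, TREE)
```

Each entry of `sublogs` is an ordinary flat log, which is why any off-the-shelf discovery algorithm can be applied per node without knowing about the hierarchy.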
INEXA formalizes all abstraction steps as reversible log modifications: every aggregation (sequence, XOR, artifact) is logged as an abstraction object. The abstracted model is produced via overlaying these histories onto the original discovered net, enabling interactive model exploration with perfect traceability (Benzin et al., 27 Mar 2024).
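The idea of keeping abstractions as a replayable history, instead of mutating the model, can be sketched with a small session object. The record layout, class names, and the reduction of a model to an ordered list of node labels are hypothetical simplifications, not INEXA's data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Abstraction:
    """One recorded aggregation step (hypothetical record layout)."""
    kind: str            # e.g. "sequence", "xor", "artifact"
    members: frozenset   # model elements hidden by this step
    label: str           # name of the aggregate node shown instead


@dataclass
class AbstractionSession:
    original: tuple                               # node labels of the discovered model
    history: list = field(default_factory=list)   # applied abstractions, oldest first

    def aggregate(self, kind, members, label):
        self.history.append(Abstraction(kind, frozenset(members), label))

    def roll_back(self, steps=1):
        """Drop the most recent abstractions; the original model is never mutated."""
        if steps > 0:
            del self.history[-steps:]

    def view(self):
        """Replay the history on top of the original nodes."""
        nodes = list(self.original)
        for step in self.history:
            kept = [n for n in nodes if n not in step.members]
            if len(kept) < len(nodes):   # at least one member was visible
                first = min(i for i, n in enumerate(nodes) if n in step.members)
                kept.insert(first, step.label)   # aggregate takes the first member's place
            nodes = kept
        return nodes


session = AbstractionSession(("receive", "check", "approve", "reject", "archive"))
session.aggregate("xor", {"approve", "reject"}, "decide")
session.aggregate("sequence", {"check", "decide"}, "review")
abstract_view = session.view()     # coarse view after two aggregations
session.roll_back()
detailed_view = session.view()     # one level of detail restored
```

Since every view is recomputed from `original` and `history`, rolling back only shortens a list, and the history itself documents why each aggregate node exists.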
The object-centric operations of Khayatbashi et al. (30 Nov 2024) are implemented in an open-source Python library that operates on in-memory OCELs by relabeling object and event types. Drill-down and unfold are succinct set-theoretic operations that map objects and events to split or joined types, with precise algebraic reversibility.
IBM MLPM case composition proceeds with reverse-ordered traversal of entity types, sequential expansion of cases by bridge events, duplicate event normalization, and case-centric process discovery, optimizing both case extraction and statistical summarization for the multilevel log (Ronzoni et al., 3 Dec 2025).
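Case composition via bridge events can be illustrated on a toy multi-ID log. This sketch is not IBM's implementation: the event layout, the field names, and the function `compose_case` are hypothetical, and the traversal is simplified to a single forward pass over the assumed order Order, Receipt, Invoice. An event that carries IDs of two adjacent entity types acts as a bridge, and keying the case by event ID ensures that an event reached through two entities is kept once.

```python
ENTITY_ORDER = ["order", "receipt", "invoice"]   # assumed traversal order

# Hypothetical multi-ID event log; e2 and e4 are bridge events.
events = [
    {"id": "e1", "activity": "Create Order", "ts": 1, "order": "O1"},
    {"id": "e2", "activity": "Receive Goods", "ts": 2, "order": "O1", "receipt": "R1"},
    {"id": "e3", "activity": "Inspect Goods", "ts": 3, "receipt": "R1"},
    {"id": "e4", "activity": "Book Invoice", "ts": 4, "receipt": "R1", "invoice": "I1"},
    {"id": "e5", "activity": "Pay Invoice", "ts": 5, "invoice": "I1"},
    {"id": "e6", "activity": "Create Order", "ts": 6, "order": "O2"},
]


def compose_case(root_id, events, entity_order=ENTITY_ORDER):
    """Collect all events reachable from one root entity via bridge events."""
    case = {}               # event id -> event, so duplicates collapse
    frontier = {root_id}    # entity ids to match at the current level
    for level, entity in enumerate(entity_order):
        has_next = level + 1 < len(entity_order)
        next_entity = entity_order[level + 1] if has_next else None
        next_frontier = set()
        for ev in events:
            if ev.get(entity) in frontier:
                case[ev["id"]] = ev
                if next_entity and next_entity in ev:   # bridge to the next level
                    next_frontier.add(ev[next_entity])
        frontier = next_frontier
    return sorted(case.values(), key=lambda ev: ev["ts"])


case_o1 = [ev["activity"] for ev in compose_case("O1", events)]
```

For root `O1` the composed case contains the five events `e1` to `e5` in timestamp order, with the bridge events `e2` and `e4` appearing once each, while `e6` belongs to a separate case.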
4. Evaluation, Metrics, and Empirical Benchmarks
Rigorous quantitative evaluation underpins the effectiveness of multilevel mining.
- Fitness and Precision: Standard measures such as alignment-based fitness (how much of the log's behavior the model can replay) and ETC-align precision (how much of the behavior the model allows is actually observed in the log) are computed at each level. Hierarchical approaches frequently report both flat and projected (abstracted) metrics (Begicheva et al., 2023, Lu et al., 2020, Khayatbashi et al., 30 Nov 2024).
- Complexity: Model size (the number of places plus the number of transitions) and Control-Flow Complexity (CFC) are commonly used proxies for comprehensibility; CFC quantifies the amount of branching introduced by the model's split constructs. Hierarchical models substantially reduce CFC per level compared with flat models (Lu et al., 2020).
- Empirical Results: On public BPI Challenge logs and real industrial logs, hierarchically mined models consistently show high fitness (0.97–0.98) and substantially reduced model complexity. On object-centric logs, judicious application of drill-down and unfold increases both fitness and precision by 10–20 percentage points, revealing fine-grained behavioral patterns hidden in coarse models (Khayatbashi et al., 30 Nov 2024).
- Explainability: Interactive abstraction logging in INEXA enables full justification for all model aggregations. Roll-back to more detailed model views is instantaneous, as overlays merely replay abstraction history on the original net (Benzin et al., 27 Mar 2024).
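The intuition behind fitness, precision, and model size can be shown with trace-level simplifications. The functions below are deliberately coarse stand-ins, not the alignment-based or ETC-align measures themselves: they assume the model's behavior is given as a finite set of allowed traces, and all names are hypothetical.

```python
def trace_fitness(log, model_language):
    """Share of log traces (with multiplicity) that the model can replay exactly."""
    if not log:
        return 1.0
    return sum(tuple(t) in model_language for t in log) / len(log)


def trace_precision(log, model_language):
    """Share of traces allowed by the model that are actually observed in the log."""
    if not model_language:
        return 1.0
    observed = {tuple(t) for t in log}
    return len(model_language & observed) / len(model_language)


def model_size(places, transitions):
    """Size of a Petri net as the number of places plus transitions."""
    return len(places) + len(transitions)


# Hypothetical log and model language.
log = [("a", "b"), ("a", "b"), ("a", "c"), ("a", "d")]
model_language = {("a", "b"), ("a", "c"), ("a", "e"), ("a", "f")}

fitness = trace_fitness(log, model_language)       # 3 of 4 log traces are allowed
precision = trace_precision(log, model_language)   # 2 of 4 allowed traces are observed
size = model_size({"p0", "p1", "p2"}, {"a", "b", "c"})
```

Alignment-based measures refine this picture by giving partial credit to traces that almost fit, which matters for long traces where exact replay is rare.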
5. Comparative Analysis: Object-Centric vs. Multilevel Approaches
Object-centric process mining (OCPM) and MLPM are distinct yet complementary paradigms.
| Aspect | Multilevel PM (e.g., IBM) | Object-Centric PM |
|---|---|---|
| Case Notion | Dynamic, multilevel case composition | No fixed case; multi-object |
| Data Structure | Flat, multi-ID logs + mapping order | Relational (OCEL) |
| Model Type | Unified, colored graphs | Multi-object Petri nets/DFGs |
| Conformance Checking | Case-level; entire case flagged | Fine-grained per event-object |
| Scalability | Moderate; case size grows with cross-links | Variable; high for large relations |
| Subprocess Insight | End-to-end, object-chained | Strong for cross-object flows |
| Industrial Adoption | IBM, high | OCEL, growing academic presence |
MLPM excels with clearly ordered object levels and strong bridge semantics (e.g., end-to-end order-to-cash (O2C) or procure-to-pay (P2P) analytics), allowing business users to visualize a unified process flow. OCPM is more effective for ad hoc analysis in many-to-many or loosely coupled domains, supporting multiple simultaneous perspectives and granular relationships (Ronzoni et al., 3 Dec 2025).
Recent industrial tools (IBM's Organizational Mining) seek to integrate both paradigms by combining MLPM's case logic with OCEL's relational back end, supporting path-finding across multi-table event data.
6. Applications, Best Practices, and Limitations
Multilevel techniques are applicable wherever process logs exhibit high event-activity dimensionality or complex object interactions, or where analysis requires expert-guided abstraction:
- Large-scale healthcare (ED, inpatient care) (Al-Fedaghi, 2021)
- Manufacturing process orchestration (Benzin et al., 27 Mar 2024)
- Learning management systems, cross-course analysis (Hildebrandt et al., 4 Sep 2024, Khayatbashi et al., 30 Nov 2024)
- Service management and supply-chain analytics (Ronzoni et al., 3 Dec 2025)
Best practices include:
- For clustering-based abstraction, leverage embedded domain semantics or robust unsupervised clustering; avoid excessively large or semantically empty clusters.
- Subprocess decomposition (e.g., in FlexHMiner) should align with natural concurrency and interleaving in the domain (Lu et al., 2020).
- When modeling via object-centric frameworks, minimal drill-down should be applied to prevent log fragmentation; metrics should be validated after each operation.
- For model-first approaches (conceptual modeling), the entire system—static structure, event regions, dynamic behavior, and monitoring—should be integrated for full traceability and self-consistency (Al-Fedaghi, 2021).
Limiting factors include the reliance of many abstraction techniques on perfect-fitness discovery, possible computational bottlenecks in scenarios with many cross-links, the difficulty of aligning multi-instance subprocesses, and an immature tooling ecosystem in which many advanced features are available only in recent academic libraries (Begicheva et al., 2023, Benzin et al., 27 Mar 2024, Khayatbashi et al., 30 Nov 2024). Integrated, data-driven clustering and hyperparameter optimization in event abstraction remain open research directions.
7. Future Directions and Open Research Challenges
Key open problems in multilevel process mining include:
- Endogenous, data-driven optimization of clustering that balances conformance, precision, and model simplicity (Begicheva et al., 2023).
- Extending two-level hierarchical formalisms such as HWF-nets to arbitrarily nested structures.
- Incorporation of data attributes and performance criteria into abstraction algorithms (e.g., expected durations, outcome-based merges) (Benzin et al., 27 Mar 2024).
- Efficient, scalable handling of noisy or imperfect real-world logs, complemented by robust object-centric alignments.
- Development of hyperparameter tuning strategies to optimize the abstraction-refinement tradeoff.
The field is rapidly advancing towards comprehensive, explainable, and scalable methodologies able to handle enterprise-wide process analytics, with increasing convergence between multilevel, object-centric, and organizational mining frameworks (Ronzoni et al., 3 Dec 2025).