Multilevel Process Mining

Updated 10 December 2025

Multilevel Process Mining (MLPM) is a set of methods that model and analyze event-driven processes across multiple abstraction layers, yielding comprehensible insights from complex logs.
It employs hierarchical abstraction, clustering techniques, and object-centric approaches to derive precise, reversible process models from fine-grained data.
Empirical evaluations demonstrate that MLPM maintains high fitness and precision while reducing model complexity, facilitating scalable, interactive process analysis.

Multilevel process mining (MLPM) encompasses a spectrum of methods for modeling, analyzing, and visualizing event-driven processes at multiple levels of abstraction, structural granularity, or organizational entity. Its principal aim is to extract comprehensible semantics from complex, fine-grained event logs by synthesizing process models that retain fitness and precision while offering compact, interpretable representations. Research in this field spans formal Petri net frameworks, hierarchical abstraction via activity clustering, flexible tree-based log abstraction, integrated conceptual-model architectures, aggregation techniques for cross-domain analysis, object-centric event algebra, and comparisons with industrial-scale approaches such as IBM's MLPM. The field is unified by the challenge of scaling process discovery, conformance checking, and exploration to real-world logs with high dimensionality, concurrency, and rich object interdependencies.

1. Formal Models for Multilevel Process Discovery

Multilevel process models generalize classical Petri net and workflow net (WF-net) formalisms into nested or compositional structures. The two-level hierarchical WF-net introduced by Begicheva et al. defines a high-level WF-net $\tilde{N}$, whose transitions correspond bijectively to subprocess abstractions, each refined by a lower-level net $N_i$ over partitioned activity sets $A_i$ (Begicheva et al., 2023).

Formally, a hierarchical workflow net (HWF-net) is $\mathbb{N} = (\tilde{N}, N_1, \ldots, N_k, \ell)$, where:

$\tilde{N}$ is the high-level net over the subprocess names $\alpha_1, \ldots, \alpha_k$,
each $N_i$ is a WF-net refining the high-level activity $\alpha_i$ over activities $A_i$,
$\ell$ maps each high-level activity name $\alpha_i$ to its refinement net $N_i$.

A run of $\mathbb{N}$ is constructed by interleaving firings of high-level transitions with executions of their corresponding subnets. The observable semantics of the flattened net equals that of $\mathbb{N}$, so flattening preserves behavioral traces.
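The trace-preservation property can be illustrated with a deliberately simplified sketch. It assumes each net is represented only by its finite set of accepted traces and that the high-level net is purely sequential; the activity names and the `flatten` helper are hypothetical and not taken from Begicheva et al.

```python
from itertools import product

# Assumption: nets are given by their finite trace sets, and the high-level
# net has no concurrent subprocesses (so subnet runs do not interleave).
high_level_traces = {("prepare", "ship")}      # traces of the high-level net
refinement = {                                 # plays the role of the map l
    "prepare": {("pick", "pack"), ("pack", "pick")},
    "ship": {("label", "dispatch")},
}

def flatten(high_traces, refinement):
    """Replace every high-level activity by each trace of its refinement net."""
    flat = set()
    for trace in high_traces:
        choices = [refinement[name] for name in trace]
        for combo in product(*choices):
            # Concatenate the chosen low-level traces in high-level order.
            flat.add(tuple(act for sub in combo for act in sub))
    return flat

flat_language = flatten(high_level_traces, refinement)
```

With concurrency in the high-level net, the concatenation step would have to be replaced by an interleaving of the subnet runs, which this sketch does not cover.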

The FlexHMiner approach generalizes the hierarchy from two to arbitrarily deep trees by constructing a Flexible Activity Tree (FAT) and recursively assigning each node its sublog and discovered subprocess model (Lu et al., 2020).

Object-centric approaches for multilevel mining formalize event logs as OCELs (Object-Centric Event Logs), in which events relate to multiple object instances and types. Relevant process models—object-centric Petri nets or object-centric directly follows graphs (OC-DFGs)—are constructed by transforming the OCEL with operations like drill-down and unfold (Khayatbashi et al., 2024, Benzin et al., 2024).

IBM's Multilevel Process Mining formalism dynamically composes cases from relational event logs by following an ordered traversal of entity types (e.g., Order $\to$ Receipt $\to$ Invoice), bridging across levels, and normalizing overlapping events to avoid duplication (Ronzoni et al., 3 Dec 2025).

2. Event Abstraction and Hierarchical Structuring Techniques

Methods for abstraction are diverse:

Clustering-based abstraction: Activity labels are partitioned using clustering algorithms (e.g., k-means, hierarchical agglomerative, DBSCAN) on feature vectors of activity context, or by manual rules reflecting domain semantics (Begicheva et al., 2023).
Flexible Activity Tree (FAT): Used in FlexHMiner to structure activities hierarchically, via domain knowledge, random clustering, or flat (trivial) hierarchies. Each internal node defines an aggregation level, and upon abstraction, its behavioral sublog is isolated and modeled (Lu et al., 2020).
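
As a minimal sketch of how an activity tree can drive abstraction, the following code stores a hypothetical tree as a parent map and relabels a trace by the ancestor at a chosen depth. The tree contents, the function names, and the merging of consecutive repeats are illustrative assumptions, not the FlexHMiner implementation.

```python
# Hypothetical activity tree: each node maps to its parent; the root has None.
PARENT = {
    "root": None,
    "Admission": "root",
    "Treatment": "root",
    "register": "Admission",
    "triage": "Admission",
    "x-ray": "Treatment",
    "surgery": "Treatment",
}

def path_to_root(node):
    """Return the chain node -> ... -> root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain

def label_at_depth(activity, depth):
    """Ancestor of `activity` at `depth` (root = 0); shallower nodes stay as-is."""
    chain = path_to_root(activity)[::-1]   # root first
    return chain[min(depth, len(chain) - 1)]

def abstract_trace(trace, depth):
    """Relabel events by their ancestor at `depth`, merging consecutive repeats."""
    out = []
    for act in trace:
        label = label_at_depth(act, depth)
        if not out or out[-1] != label:
            out.append(label)
    return out
```

Calling `abstract_trace(["register", "triage", "x-ray", "surgery"], 1)` yields the two-step trace `["Admission", "Treatment"]`, while depth 2 returns the original leaf-level trace.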

INEXA introduces an interactive aggregation technique that records each abstraction operation as an object in the log. Analysts may abstract a SESE (single-entry, single-exit) fragment, entire artifacts, or lifecycles, maintaining an object-centric trace of applied abstractions for explainability and enabling roll-back or drill-down on demand (Benzin et al., 2024).

Object-centric granularity operations—drill-down (split object types by attributes), roll-up (merge child types back), unfold (split event types by object context), and fold (reverse of unfold)—form the basis of navigable multilevel object-centric analysis (Khayatbashi et al., 2024).
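
The four operations can be sketched on a toy in-memory log. This is a simplified reading of the operations described above, using plain dictionaries; the log layout, the `::` separator, and the function signatures are assumptions and do not reflect the API of the library by Khayatbashi et al.

```python
import copy

ocel = {
    "objects": {
        "o1": {"type": "order", "priority": "high"},
        "o2": {"type": "order", "priority": "low"},
        "i1": {"type": "item"},
    },
    "events": [
        {"id": "e1", "activity": "create", "objects": ["o1", "i1"]},
        {"id": "e2", "activity": "create", "objects": ["o2"]},
    ],
}

SEP = "::"  # hypothetical separator, assumed absent from type and activity names

def drill_down(log, obj_type, attr):
    """Split an object type into subtypes by the value of one attribute."""
    out = copy.deepcopy(log)
    for obj in out["objects"].values():
        if obj["type"] == obj_type and attr in obj:
            obj["type"] = f"{obj_type}{SEP}{obj[attr]}"
    return out

def roll_up(log, obj_type):
    """Merge the subtypes created by drill_down back into the parent type."""
    out = copy.deepcopy(log)
    for obj in out["objects"].values():
        if obj["type"].startswith(obj_type + SEP):
            obj["type"] = obj_type
    return out

def unfold(log, activity, obj_type):
    """Relabel events of `activity` that involve an object of `obj_type`."""
    out = copy.deepcopy(log)
    for ev in out["events"]:
        types = {out["objects"][o]["type"] for o in ev["objects"]}
        if ev["activity"] == activity and obj_type in types:
            ev["activity"] = f"{activity}{SEP}{obj_type}"
    return out

def fold(log, activity, obj_type):
    """Undo unfold by restoring the original activity label."""
    out = copy.deepcopy(log)
    for ev in out["events"]:
        if ev["activity"] == f"{activity}{SEP}{obj_type}":
            ev["activity"] = activity
    return out
```

Because each operation only relabels types, applying `roll_up` after `drill_down` (or `fold` after `unfold`) returns a log equal to the original in this sketch.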

IBM MLPM relies on bridge events in multivalued logs, recursively composing multilevel cases by traversing object chains and normalizing duplicate events, thus abstracting redundant, cross-linked events into a unified behavioral model (Ronzoni et al., 3 Dec 2025).

3. Algorithmic Frameworks and Practical Pipelines

Multilevel discovery algorithms must address both abstraction and refinement, handling concurrency, loops, and event generalization.

In Begicheva et al., Algorithm $\mathfrak{A}(\mathfrak{D})$ takes as input: a low-level log, partitioned activity clusters, and a flat discovery algorithm with perfect fitness. It iteratively detects and folds loops, maps clusters to high-level transitions, and constructs a two-level HWF-net, guaranteeing that the flattened variant fits the input log exactly (conformance by construction) (Begicheva et al., 2023).
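
The abstraction step of such a pipeline can be sketched as follows. The code splits each trace into maximal runs of same-cluster events, derives a high-level log and per-cluster sublogs, and flags cluster re-entry as a hint that a loop needs folding. The partition `cluster_of` and all function names are hypothetical, and the actual loop-folding and net-construction steps of the published algorithm are not reproduced here.

```python
from itertools import groupby

# Hypothetical partition of low-level activities into clusters P and Q.
cluster_of = {"a": "P", "b": "P", "c": "Q", "d": "Q"}

def split_runs(trace):
    """Split a trace into maximal runs of events from the same cluster."""
    return [(c, tuple(g)) for c, g in groupby(trace, key=cluster_of.__getitem__)]

def abstract(log):
    """Return the high-level log and one sublog per cluster."""
    high_log, sublogs = [], {}
    for trace in log:
        runs = split_runs(trace)
        high_log.append(tuple(c for c, _ in runs))
        for c, run in runs:
            sublogs.setdefault(c, []).append(run)
    return high_log, sublogs

def has_reentry(high_trace):
    """A cluster occurring more than once hints at a loop to fold."""
    return len(set(high_trace)) < len(high_trace)

def reconstruct(trace):
    """Concatenating the runs gives back the original trace (lossless split)."""
    return tuple(act for _, run in split_runs(trace) for act in run)
```

A flat discovery algorithm would then be applied to `high_log` and to each entry of `sublogs`; the lossless split checked by `reconstruct` is what makes a fit-by-construction argument plausible in this simplified setting.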

FlexHMiner abstracts logs via recursive upward mapping in the FAT; each subprocess node's traces are projected, start/end-labeled, and modeled independently. Model discovery leverages any available off-the-shelf miner—most commonly Inductive Miner or Split Miner—and each discovered fragment is validated at its own abstraction level (Lu et al., 2020).

INEXA formalizes all abstraction steps as reversible log modifications: every aggregation (sequence, XOR, artifact) is logged as an abstraction object. The abstracted model is produced via overlaying these histories onto the original discovered net, enabling interactive model exploration with perfect traceability (Benzin et al., 2024).
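
The idea of keeping abstractions as replayable history can be sketched on a single activity sequence. The classes below are hypothetical illustrations: INEXA operates on discovered nets and object-centric logs, whereas this sketch only replaces contiguous fragments in a list.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Abstraction:
    fragment: tuple   # contiguous activities to aggregate
    label: str        # name of the aggregated activity

def apply_one(seq, op):
    """Replace every occurrence of op.fragment in seq by op.label."""
    out, i, n = [], 0, len(op.fragment)
    while i < len(seq):
        if tuple(seq[i:i + n]) == op.fragment:
            out.append(op.label)
            i += n
        else:
            out.append(seq[i])
            i += 1
    return out

class AbstractionSession:
    """Keeps the original sequence untouched and replays a history on demand."""

    def __init__(self, original):
        self.original = list(original)
        self.history = []

    def abstract(self, fragment, label):
        if not fragment:
            raise ValueError("fragment must not be empty")
        self.history.append(Abstraction(tuple(fragment), label))

    def roll_back(self):
        """Drop the most recent abstraction and return it."""
        return self.history.pop()

    def view(self):
        """Current abstracted view, computed by replaying the history."""
        seq = self.original
        for op in self.history:
            seq = apply_one(seq, op)
        return list(seq)
```

Since the original is never modified, rolling back is just a matter of shortening the history and replaying it, which mirrors the reversibility described above.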

The object-centric operations of Khayatbashi et al. (2024) are implemented in an open-source Python library that operates on in-memory OCELs by relabeling object and event types. Drill-down and unfold are succinct set-theoretic operations that map objects and events to split or joined types, with precise algebraic reversibility.

IBM MLPM case composition proceeds with reverse-ordered traversal of entity types, sequential expansion of cases by bridge events, duplicate event normalization, and case-centric process discovery, optimizing both case extraction and statistical summarization for the multilevel log (Ronzoni et al., 3 Dec 2025).
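
A toy version of case composition over a multi-ID log might look like the sketch below. The column names, the level order, and the traversal direction are assumptions chosen for illustration; the code is not IBM's implementation and ignores the statistical summarization step.

```python
# Hypothetical level order for traversal (the direction is an assumption).
ORDER = ["order", "receipt", "invoice"]

events = [
    {"eid": 1, "act": "Create Order",  "ts": 1, "order": "O1", "receipt": None, "invoice": None},
    {"eid": 2, "act": "Goods Receipt", "ts": 2, "order": "O1", "receipt": "R1", "invoice": None},  # bridge
    {"eid": 3, "act": "Check Receipt", "ts": 3, "order": None, "receipt": "R1", "invoice": None},
    {"eid": 4, "act": "Book Invoice",  "ts": 4, "order": None, "receipt": "R1", "invoice": "I1"},  # bridge
    {"eid": 5, "act": "Pay Invoice",   "ts": 5, "order": None, "receipt": None, "invoice": "I1"},
    {"eid": 6, "act": "Create Order",  "ts": 6, "order": "O2", "receipt": None, "invoice": None},
]

def compose_case(root_id, events, order=ORDER):
    """Collect all events reachable from one root object via bridge events."""
    ids = {order[0]: {root_id}}      # known object ids per level
    case = {}                        # keyed by event id, so duplicates collapse
    for pos, level in enumerate(order):
        current = ids.get(level, set())
        nxt = order[pos + 1] if pos + 1 < len(order) else None
        for ev in events:
            if ev[level] in current:
                case[ev["eid"]] = ev
                # A bridge event also carries an id of the next level.
                if nxt and ev[nxt] is not None:
                    ids.setdefault(nxt, set()).add(ev[nxt])
    return sorted(case.values(), key=lambda e: e["ts"])
```

Here the bridge events 2 and 4 are matched at two levels each but appear once in the composed case, which is the duplicate normalization the text refers to.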

4. Evaluation, Metrics, and Empirical Benchmarks

Rigorous quantitative evaluation underpins the effectiveness of multilevel mining.

Fitness and Precision: Standard measures such as alignment-based fitness (fraction of log traces explained) and ETC-align precision (fraction of allowed traces present in the log) are computed at each level. Hierarchical approaches frequently report both flat and projected (abstracted) metrics (Begicheva et al., 2023, Lu et al., 2020, Khayatbashi et al., 2024).
Complexity: Model size (the sum of places and transitions) and Control-Flow Complexity (CFC) are the preferred measures of comprehensibility; CFC quantifies the branching introduced by split constructs (XOR, OR, AND) in the model. Hierarchical models dramatically reduce CFC per level versus flat models (Lu et al., 2020).
Empirical Results: On public BPI Challenge logs and real industrial logs, hierarchically mined models consistently show high fitness (approximately 0.97–0.98) and substantially reduced model complexity. On object-centric logs, judicious application of drill-down and unfold increases both fitness and precision by 10–20 percentage points, revealing fine-grained behavioral patterns hidden in coarse models (Khayatbashi et al., 2024).
Explainability: Interactive abstraction logging in INEXA enables full justification for all model aggregations. Roll-back to more detailed model views is instantaneous, as overlays merely replay abstraction history on the original net (Benzin et al., 2024).
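
The metrics above can be approximated with a few lines of code. The sketch below uses trace-level stand-ins rather than true alignment-based fitness or ETC-align precision: it assumes the model is given as a finite set of allowed traces, and it computes CFC from a list of split constructs using the commonly cited weighting (fan-out for XOR, 2^n - 1 for OR, 1 for AND). All function names are hypothetical.

```python
def trace_fitness(log, model_language):
    """Fraction of log traces that the model can replay exactly."""
    return sum(t in model_language for t in log) / len(log)

def trace_precision(log, model_language):
    """Fraction of traces allowed by the model that were observed in the log."""
    seen = set(log)
    return sum(t in seen for t in model_language) / len(model_language)

def model_size(places, transitions):
    """Size as the number of places plus the number of transitions."""
    return len(places) + len(transitions)

def cfc(splits):
    """Control-flow complexity from (kind, fan_out) pairs of split nodes."""
    total = 0
    for kind, fan_out in splits:
        if kind == "XOR":
            total += fan_out
        elif kind == "OR":
            total += 2 ** fan_out - 1
        elif kind == "AND":
            total += 1
        else:
            raise ValueError(f"unknown split kind: {kind}")
    return total
```

Computing these values once for the flat model and once per level of a hierarchical model gives the kind of per-level comparison reported in the evaluations above.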

5. Comparative Analysis: Object-Centric vs. Multilevel Approaches

Object-centric process mining (OCPM) and MLPM are distinct yet complementary paradigms.

Aspect	Multilevel PM (e.g., IBM)	Object-Centric PM
Case Notion	Dynamic, multilevel case composition	No fixed case; multi-object
Data Structure	Flat, multi-ID logs + mapping order	Relational (OCEL)
Model Type	Unified, colored graphs	Multi-object Petri nets/DFGs
Conformance Checking	Case-level; entire case flagged	Fine-grained per event-object
Scalability	Moderate; case size grows with cross-links	Variable; high for large relations
Subprocess Insight	End-to-end, object-chained	Strong for cross-object flows
Industrial Adoption	IBM, high	OCEL, growing academic presence

MLPM excels with clearly ordered object levels and strong bridge semantics (e.g., end-to-end O2C or P2P analytics), allowing business users to visualize unified process flow. OCPM is more effective for ad hoc analysis in many-to-many or loosely coupled domains, supporting multiple simultaneous perspectives and granular relationships (Ronzoni et al., 3 Dec 2025).

Recent industrial tools (IBM's Organizational Mining) seek to integrate both paradigms by combining MLPM's case logic with OCEL's relational back end, supporting path-finding across multi-table event data.

6. Applications, Best Practices, and Limitations

Multilevel techniques are applicable wherever process logs exhibit high event-activity dimensionality or complex object interactions, or require expert-guided abstraction for analysis:

Large-scale healthcare (ED, inpatient care) (Al-Fedaghi, 2021)
Manufacturing process orchestration (Benzin et al., 2024)
Learning management systems, cross-course analysis (Hildebrandt et al., 2024, Khayatbashi et al., 2024)
Service management and supply-chain analytics (Ronzoni et al., 3 Dec 2025)

Best practices include:

For clustering-based abstraction, leverage embedded domain semantics or robust unsupervised clustering; avoid excessively large or semantically empty clusters.
Subprocess decomposition (e.g., in FlexHMiner) should align with natural concurrency and interleaving in the domain (Lu et al., 2020).
When modeling via object-centric frameworks, minimal drill-down should be applied to prevent log fragmentation; metrics should be validated after each operation.
For model-first approaches (conceptual modeling), the entire system—static structure, event regions, dynamic behavior, and monitoring—should be integrated for full traceability and self-consistency (Al-Fedaghi, 2021).

Limiting factors include the reliance of several abstraction techniques on discovery algorithms that guarantee perfect fitness, possible computational bottlenecks in scenarios with many cross-links, the difficulty of computing alignments for multi-instance subprocesses, and an immature tooling ecosystem in which many advanced features exist only in recent academic libraries (Begicheva et al., 2023, Benzin et al., 2024, Khayatbashi et al., 2024). Integrated, data-driven clustering and hyperparameter optimization in event abstraction remain open research directions.

7. Future Directions and Open Research Challenges

Key open problems in multilevel process mining include:

Endogenous, data-driven optimization of clustering that balances conformance, precision, and model simplicity (Begicheva et al., 2023).
Extending two-level hierarchical discovery, such as HWF-nets, to support arbitrarily nested structures.
Incorporation of data attributes and performance criteria into abstraction algorithms (e.g., expected durations, outcome-based merges) (Benzin et al., 2024).
Efficient, scalable handling of noisy or imperfect real-world logs, complemented by robust object-centric alignments.
Development of hyperparameter tuning strategies to optimize the abstraction-refinement tradeoff.

The field is rapidly advancing towards comprehensive, explainable, and scalable methodologies able to handle enterprise-wide process analytics, with increasing convergence between multilevel, object-centric, and organizational mining frameworks (Ronzoni et al., 3 Dec 2025).


References (7)

Begicheva et al. (2023). Discovering Hierarchical Process Models: an Approach Based on Events Clustering.

Lu et al. (2020). Discovering Hierarchical Processes Using Flexible Activity Trees for Event Abstraction.

Khayatbashi et al. (2024). Advancing Object-Centric Process Mining with Multi-Dimensional Data Operations.

Benzin et al. (2024). INEXA: Interactive and Explainable Process Model Abstraction Through Object-Centric Process Mining.

Ronzoni et al. (2025). IBM Multilevel Process Mining vs de facto Object-Centric Process Mining approaches.

Al-Fedaghi (2021). Conceptual Model with Built-in Process Mining.

Hildebrandt et al. (2024). Cross-course Process Mining of Student Clickstream Data: Aggregation and Group Comparison.
