Object-Centric Process Mining

Updated 10 December 2025
  • Object-centric process mining is an advanced paradigm that models and analyzes multiple interacting business objects using OCEL 2.0-based event data.
  • It employs techniques like Object-Centric Directly-Follows Graphs and Petri Nets to capture inter-object relationships and complex process dynamics.
  • Innovative algorithms for discovery, conformance, and performance analysis enhance scalability and interpretability for multi-dimensional process logs.

Object-centric process mining (OCPM) is an advanced process mining paradigm that models, discovers, and analyzes the intertwined dynamics of multiple interacting business objects within event data. Unlike traditional case-centric approaches, object-centric process mining operates directly on object-centric event data (OCED), which reflects the true multi-object nature of operational processes as captured by formats such as OCEL 2.0. By capturing event-object and object-object relationships natively, OCPM enables the analysis of complex, interrelated behaviors that span multiple functional units and value chain segments, and supports both fine-grained and aggregated process insights (Koren et al., 4 Mar 2024, Berti et al., 2023, Khayatbashi et al., 26 Aug 2025).

1. Formal Foundations: Object-Centric Event Data and Log Structures

The foundational data structure in OCPM is the object-centric event log, formalized as a tuple

$$L = (E,\,O,\,EA,\,OA,\,evtype,\,evid,\,evtime,\,objtype,\,objid,\,eatype,\,oatype,\,eaval,\,oaval,\,E2O,\,O2O)$$

where:

  • $E$ is a finite set of events and $O$ is a finite set of objects (with $E \cap O = \emptyset$).
  • $EA \subseteq att$ and $OA \subseteq att$ are the sets of event- and object-attribute names.
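  • The remaining functions in the tuple ($evtype$, $evid$, $evtime$, $objtype$, $objid$, $eatype$, $oatype$, $eaval$, $oaval$) specify event and object types, identifiers, timestamps, and attribute values.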
  • $E2O \subseteq E \times (qual \times O)$ encodes event-to-object links with qualifiers (e.g., "created", "used").
  • $O2O \subseteq O \times (qual \times O)$ describes explicit object-to-object relations (e.g., "contains").

The dominant logging standard, OCEL 2.0, supports dynamic object attributes, qualified object relations, and temporally evolving object states (Koren et al., 4 Mar 2024, Goossens et al., 21 May 2024).

Events in OCED can reference multiple objects of varying types with arbitrary cardinality. Cardinality constraints and qualifiers support traceability: for example, every event relates to at least one object, while an object may take part in multiple events and be related to multiple other objects.
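As a rough illustration of the tuple above, the following sketch holds events, objects, and the qualified $E2O$ and $O2O$ relations in memory and checks the "at least one object per event" constraint. It is not the OCEL 2.0 schema: the class and method names (`OCLog`, `objects_of`, `unlinked_events`) are hypothetical, attributes ($EA$, $OA$ and their value functions) are omitted, and the toy data is invented for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Event:
    evid: str
    evtype: str
    evtime: datetime


@dataclass(frozen=True)
class Obj:
    objid: str
    objtype: str


@dataclass
class OCLog:
    """Hypothetical in-memory log; attribute functions are left out."""
    events: dict[str, Event] = field(default_factory=dict)
    objects: dict[str, Obj] = field(default_factory=dict)
    # E2O entries: (event id, qualifier, object id)
    e2o: set[tuple[str, str, str]] = field(default_factory=set)
    # O2O entries: (source object id, qualifier, target object id)
    o2o: set[tuple[str, str, str]] = field(default_factory=set)

    def objects_of(self, evid: str) -> set[str]:
        """Object ids linked to one event, regardless of qualifier."""
        return {o for (e, _, o) in self.e2o if e == evid}

    def unlinked_events(self) -> set[str]:
        """Events that violate 'every event relates to at least one object'."""
        linked = {e for (e, _, _) in self.e2o}
        return set(self.events) - linked


# Invented toy data: one order with two items; event e2 is left unlinked on purpose.
log = OCLog()
log.events["e1"] = Event("e1", "place order", datetime(2024, 1, 1, 9, 0))
log.events["e2"] = Event("e2", "pick item", datetime(2024, 1, 1, 10, 0))
log.objects["o1"] = Obj("o1", "order")
log.objects["i1"] = Obj("i1", "item")
log.objects["i2"] = Obj("i2", "item")
log.e2o |= {("e1", "created", "o1"), ("e1", "created", "i1"), ("e1", "created", "i2")}
log.o2o |= {("o1", "contains", "i1"), ("o1", "contains", "i2")}
```

Here `log.objects_of("e1")` returns all three object ids, showing one event referencing objects of two types, while `log.unlinked_events()` flags `e2` as violating the cardinality constraint.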

2. Fundamental Modeling Approaches and Expressive Power

OCPM extends classical process modeling by embracing several high-expressivity paradigms for handling multi-object interactions:

  • Object-Centric Directly-Follows Graphs (OC-DFGs): Typed multigraphs $G = (A, OT, F)$ where each arc is colored by the object type witnessing a directly-follows relation. This structure supports the identification of multi-type control flows unrealizable in single-case models (Berti et al., 2022, Berti et al., 2023).
  • Object-Centric Petri Nets (OCPNs): These feature typed places holding object-identifying tokens and transitions that synchronize multiple objects. Variable arcs model cardinality flexibility (e.g., one-to-many object participation). The semantics ensures correct firing conditions over bindings of typed objects (Seidel et al., 18 Aug 2025, Berti et al., 2022).
  • Petri Nets with Identifiers (OPIDs): Tokens are tuples of object identifiers, allowing representation of explicit inter-object relationships and supporting synchronization constraints such as stable many-to-one rigidities (Seidel et al., 18 Aug 2025).
  • Declarative Artifact and Behavioral Constraint Models: Declarative rules over object-activity pairs and relations (e.g., object-centric behavioral constraints) formalize temporal or relational requirements between different object classes (Berti et al., 2023).

The ability to represent explicit object-to-object synchronization, relationship constraints, and evolving object states is central for process discovery and conformance analytics that capture the bounding behavior and inter-object dependencies of real processes.

3. Scope Definition, Aggregation, and Multi-Level Analysis

Object-centric event data often encompasses multiple, interrelated processes without explicit process boundaries. Existing event log formats lack direct representation of process "scopes," impeding multi-level analytics. Analysts can address this by defining process scopes as first-class objects of type "process" within the OCEL. A scope is formally specified as $S_p = (E_p,\, O_p,\, E2O|_{E_p \times \{p\}},\, O2O|_{\{p\} \times O_p})$, where $E_p$ (events in scope) and $O_p$ (objects in scope) are attached via dedicated qualifiers (Khayatbashi et al., 26 Aug 2025).

Scoping is achieved by an analyst-authored enrichment ruleset $R$ (using a domain-specific language) specifying inclusion and exclusion conditions over event/object attributes and types. The embedding function

$$f: \text{OCEL} \times \mathcal{R} \to \text{POCEL}$$

maps an OCEL and a set of scope rulesets to a scope-enriched OCEL, in which each new scope is an object linked to its constituent events and objects (Khayatbashi et al., 26 Aug 2025).

This mechanism enables:

  • Intra-scope analysis: Scope-specific sublogs support traditional OCPM operations (e.g., process discovery, compliance checking) confined to the scope.
  • Inter-scope analysis: Construction of a directed process interaction graph $G = (P, A)$, where edges represent shared-object handovers across process scopes (edge $(p_i, p_j, t)$ means a shared object of type $t$ links $p_i$ and $p_j$).
  • Multi-level drill-down/roll-up: Analysts may define nested or hierarchical scopes, enabling aggregation to higher-level processes or detailed drill-down into subprocesses (Khayatbashi et al., 26 Aug 2025, Khayatbashi et al., 30 Nov 2024).

This multi-level structuring aligns analysis with real organizational perspectives (e.g., business-unit vs. operational roles) and supports agile "what-if" rescoping without re-exporting raw data.
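The scoping idea can be sketched in a few lines. The example below is only an approximation under stated assumptions: the rulesets are plain Python predicates over activity names rather than the domain-specific language of the source, the function names (`enrich`, `interaction_edges`) and the toy data are hypothetical, and edges are emitted in both directions because the sketch does not model which scope hands an object over to which.

```python
# Invented toy data: event id -> (activity, set of (object id, object type))
events = {
    "e1": ("create order", {("o1", "order")}),
    "e2": ("pick item", {("o1", "order"), ("i1", "item")}),
    "e3": ("ship item", {("i1", "item"), ("d1", "delivery")}),
}

# Hypothetical rulesets: scope name -> inclusion predicate over the activity
rules = {
    "sales": lambda activity: activity in {"create order", "pick item"},
    "logistics": lambda activity: activity in {"ship item"},
}


def enrich(events, rules):
    """Return scope -> (E_p, O_p): the events and objects attached to each scope."""
    scopes = {}
    for name, include in rules.items():
        e_p = {eid for eid, (activity, _) in events.items() if include(activity)}
        o_p = {obj for eid in e_p for obj in events[eid][1]}
        scopes[name] = (e_p, o_p)
    return scopes


def interaction_edges(scopes):
    """Edges (p_i, p_j, t): some object of type t belongs to both scopes.

    Simplification: edges are symmetric; handover direction is not derived.
    """
    edges = set()
    for p_i, (_, objs_i) in scopes.items():
        for p_j, (_, objs_j) in scopes.items():
            if p_i != p_j:
                edges |= {(p_i, p_j, otype) for _, otype in objs_i & objs_j}
    return edges


scopes = enrich(events, rules)
edges = interaction_edges(scopes)
```

In this toy log the item `i1` appears in both scopes, so the two scopes are connected by an edge labeled `item`. Changing a predicate in `rules` and calling `enrich` again rescopes the analysis without touching `events`.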

4. Key Algorithms: Discovery, Conformance, and Performance Analysis

4.1. Process Discovery

The dominant workflow proceeds as follows:

  • Flattening per object type: For each object type, extract the sublog of events referencing at least one object of that type, producing $L_{ot}$ for $ot \in OT$.
  • Sub-discovery: Apply standard process discovery algorithms (Inductive Miner, α-miner) to each $L_{ot}$.
  • Collation: Merge the resulting per-type models into an OC-DFG or OCPN, decorating arcs or places by object type and synchronizing transitions as appropriate (Berti et al., 2023, Berti et al., 2022).
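The three steps above can be sketched as follows. This is a simplified stand-in, not the cited tools' implementation: plain directly-follows counting replaces the Inductive Miner or α-miner in the sub-discovery step, the event tuples and function names (`flatten`, `oc_dfg`) are hypothetical, and timestamps are plain integers.

```python
from collections import defaultdict

# Invented toy log: (timestamp, activity, {object id: object type})
events = [
    (1, "place order", {"o1": "order", "i1": "item", "i2": "item"}),
    (2, "pick item", {"i1": "item"}),
    (3, "pick item", {"i2": "item"}),
    (4, "ship order", {"o1": "order", "i1": "item", "i2": "item"}),
]


def flatten(events, object_type):
    """Flattening: one time-ordered activity trace per object of the given type."""
    traces = defaultdict(list)
    for _, activity, objs in sorted(events, key=lambda ev: ev[0]):
        for oid, otype in objs.items():
            if otype == object_type:
                traces[oid].append(activity)
    return dict(traces)


def oc_dfg(events):
    """Sub-discovery and collation: directly-follows counts keyed by object type."""
    object_types = {t for _, _, objs in events for t in objs.values()}
    arcs = defaultdict(int)
    for ot in object_types:
        for trace in flatten(events, ot).values():
            for a, b in zip(trace, trace[1:]):
                arcs[(ot, a, b)] += 1
    return dict(arcs)


graph = oc_dfg(events)
```

Each key of `graph` is an arc colored by its object type: the order goes directly from "place order" to "ship order", while both items pass through "pick item" first. Note that events shared by several objects of a type are replicated into each object's trace during flattening.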

4.2. Conformance Checking

OCPM requires new definitions of fitness and precision to account for multi-object synchronization. Following (Adams et al., 2021):

$$\text{fitness}(L, OCPN_A) = \frac{1}{|E|} \sum_{e \in E} \frac{|en_L(e) \cap en_{OCPN}(e)|}{|en_L(e)|}$$

$$\text{precision}(L, OCPN_A) = \frac{1}{|E_f|} \sum_{e \in E_f} \frac{|en_L(e) \cap en_{OCPN}(e)|}{|en_{OCPN}(e)|}$$

where $en_L(e)$ and $en_{OCPN}(e)$ are the sets of activities enabled in the log and in the model, respectively, after the context of $e$.
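A small numeric sketch may help read the two formulas. It assumes the enabled-activity sets are already available as dictionaries (computing contexts and replaying them on the model is out of scope here), that every $en_L(e)$ is non-empty, and that $E_f$ consists of the events whose model-enabled set is non-empty; that last reading is an assumption of this sketch, since the text above does not define $E_f$. All names and data are hypothetical.

```python
def fitness(en_log, en_model):
    """Average share of log-enabled activities that the model also enables."""
    total = 0.0
    for e, log_acts in en_log.items():
        total += len(log_acts & en_model[e]) / len(log_acts)
    return total / len(en_log)


def precision(en_log, en_model):
    """Average share of model-enabled activities that the log also enables.

    Assumption: E_f is taken as the events with a non-empty model-enabled set,
    which also avoids division by zero.
    """
    e_f = [e for e in en_log if en_model[e]]
    total = sum(len(en_log[e] & en_model[e]) / len(en_model[e]) for e in e_f)
    return total / len(e_f)


# Invented enabled sets for two events
en_log = {"e1": {"a", "b"}, "e2": {"c"}}
en_model = {"e1": {"a"}, "e2": {"c", "d", "f"}}

fit = fitness(en_log, en_model)      # (1/2 + 1/1) / 2
prec = precision(en_log, en_model)   # (1/1 + 1/3) / 2
```

For `e1` the model misses activity `b` seen in the log, which lowers fitness; for `e2` the model allows `d` and `f` that the log never shows, which lowers precision.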

4.3. Performance Analysis

Distinct OCPM-specific time metrics can be computed via token-based replay on OCPNs:

  • Synchronization time: $\mathit{sync}(eo, V) = \max T - \min T$ for the set of related token visits $T$.
  • Pooling time and lagging time capture gathering delays for object groups, exposing interaction inefficiencies (e.g., waiting for all items before an order is shipped) (Park et al., 2022).

5. Granularity Operations, Clustering, and Local Modeling

Large-scale or heterogeneous OCELs often lead to complex or "spaghetti" models. Several techniques have been developed to enhance interpretability and adapt granularity:

  • Granularity Operations: Four reversible operations—drill-down, roll-up, unfold, fold—on object and event types support dynamic adjustment between detailed and coarse process views, aiding zoom-in/zoom-out during discovery and supporting segmentation by object-attribute slices (Khayatbashi et al., 30 Nov 2024).
  • Clustering Techniques: Object-centric clustering groups similar objects (e.g., via profile vectors incorporating traces, graph metrics, and attributes) using distance measures (edit, Euclidean, categorical) and standard clustering algorithms (k-means, hierarchical). Results demonstrate drastic reductions in OC-DFG model complexity while maintaining or improving discovery fitness (Ghahfarokhi et al., 2022, Jalali, 2022).
  • Object-Centric Local Process Models (OCLPMs): Algorithmic discovery of frequently recurring multi-object behavioral patterns, realized as OCPN fragments, facilitates focused analysis and pattern mining across highly entangled logs (Peeva et al., 4 Nov 2024).
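The clustering idea can be illustrated with a deliberately small sketch. It is not the procedure of the cited papers: the profile vectors contain only two invented numeric features (number of events, number of related items), the distance is Euclidean, and the k-means loop uses fixed initial centroids so that the result is deterministic. All names and data are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two profile vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def kmeans(vectors, initial_centroids, iterations=10):
    """Plain k-means; returns the cluster index assigned to each vector."""
    centroids = [list(c) for c in initial_centroids]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid per vector
        assignment = [
            min(range(len(centroids)), key=lambda k: euclidean(v, centroids[k]))
            for v in vectors
        ]
        # Update step: mean of the members of each non-empty cluster
        for k in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Invented order profiles: [number of events, number of related items]
profiles = {"o1": [3, 1], "o2": [4, 1], "o3": [10, 6], "o4": [11, 7]}
vectors = list(profiles.values())
labels = kmeans(vectors, initial_centroids=[vectors[0], vectors[2]])
clusters = dict(zip(profiles, labels))
```

The small orders `o1` and `o2` end up in one cluster and the large orders `o3` and `o4` in the other; a separate OC-DFG could then be discovered per cluster to obtain simpler models.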

6. Tooling, Storage, and Data Engineering

Object-centric process mining is supported by a growing ecosystem of open-source tools:

  • OCEL 2.0 Resources: Formal specification, example logs, and library support for OCEL 2.0 are consolidated at (Koren et al., 4 Mar 2024).
  • Analysis Frameworks: Major platforms include OC-PM (web/ProM), ocpa (Python), PM4Py-MDL, and application-specific tools (e.g., local model discovery plugins for ProM) (Berti et al., 2022, Peeva et al., 4 Nov 2024, Khayatbashi et al., 30 Nov 2024).
  • Storage Architectures: Scalable storage is enabled by mapping OCEL to document-oriented databases (e.g., MongoDB), supporting aggregation pipelines for directly-follows discovery and lifecycle extraction on logs with tens of millions of events (Berti et al., 2022). More recently, relational hub-and-spoke architectures with process-agnostic 3NF schemas have been advocated for high-frequency, streaming, and schema-evolving environments (Bosmans et al., 1 Oct 2024).
  • Data extraction methodologies: OCPM² extends the PM² methodology for systematic OCED extraction, emphasizing conceptual modeling, extraction matrices, automated verification, and iterative improvement—crucial for reproducible analysis (Miri et al., 13 Mar 2025).

7. Challenges, Limitations, and Research Directions

Despite rapid methodological and tooling advances, several challenges persist:

  • Process Scope Definition: Automated detection of meaningful scopes remains unsolved; current solutions depend on manual rulesets, and clustering over object-object motifs is only proposed as future work (Khayatbashi et al., 26 Aug 2025).
  • Relationship Semantics and Synchronization: Standard OCPNs do not enforce intended object relationships, necessitating mappings to OPIDs for explicit synchronization (e.g., stable many-to-one bindings) to avoid underspecification (Seidel et al., 18 Aug 2025).
  • Scalability, Streaming, and Data Evolution: Handling massive, fast-evolving logs with unstructured data (e.g., email, IoT) or high schema change rates is an open area for data engineering and query optimization research (Bosmans et al., 1 Oct 2024).
  • Model Quality and Benchmarks: Generalized quality metrics (fitness/precision) exist, but large-scale, multi-object benchmarks and gold standards are still required for empirical comparison (Adams et al., 2021, Goossens et al., 21 May 2024).
  • Tool Interoperability and Standardization: The coexistence of multiple OCED/OCEL variants and limitations in schema evolution/type inheritance hinder cross-tool compatibility; specification convergence is recommended (Goossens et al., 21 May 2024, Koren et al., 4 Mar 2024).

Emerging trends include: extension of enrichment languages for scopes (temporal/pattern constraints), semi-automated scope suggestion, streaming-ready object-centric repositories, and the integration of knowledge-graph or machine learning techniques for variant mining and predictive analytics.

