
Organizational Mining: Unified Process Analysis

Updated 10 December 2025
  • The Organizational Mining feature is a unified framework that integrates multilevel and object-centric paradigms to analyze workflows and interactions across diverse organizational entities.
  • It employs recursive case construction and bridge relations to consolidate multi-level event logs into distortion-free, color-coded process graphs that reveal causal relationships.
  • Leveraging SQL-accelerated data processing and automated log parsing, it enhances scalability and operational efficiency in ERP, compliance, and cross-department analyses.

An organizational mining feature enables process mining tools to discover, analyze, and visualize workflows and interactions spanning multiple organizational entities—such as people, roles, business objects, or cross-functional teams—across various levels of abstraction. Recent academic and industrial advances have focused on unifying multilevel and object-centric paradigms, with IBM’s Organizational Mining serving as a distinct instantiation that synthesizes the best aspects of both Multilevel Process Mining (MLPM) and Object-Centric Process Mining (OCPM) (Ronzoni et al., 3 Dec 2025).

1. Formal Definitions and Core Artifacts

Organizational mining relies on the integration of multi-entity, multi-level event data. In the MLPM formalism, the data foundation is the multilevel event log:

$$L_{ML} = \bigl(E,\,A,\,P,\,ID,\,\pi_{act},\,\pi_{time},\,\{\pi_{pid}^p\}_{p\in P},\,B\bigr)$$

where:

  • $E$: Finite set of events.
  • $A$: Set of activity labels.
  • $P = \langle P_1,\dots,P_k \rangle$: Ordered sequence of entity types (ProcessID types).
  • $ID = \bigcup_{i=1}^k ID_i$: Union of pairwise disjoint identifier domains, one per level.
  • $\pi_{act}$, $\pi_{time}$: Activity and timestamp functions.
  • $\pi_{pid}^p: E \to ID_p \cup \{\bot\}$: Maps each event to its ProcessID at level $p$, or to $\bot$ if the event has none at that level.
  • $B \subseteq E \times E$: "Bridge" relation linking events across entity levels, which enables correct assignment of events to multilevel cases.

This formalism allows an event to carry at most two non-$\bot$ ProcessID values: one for its "native" entity and, optionally, one bridging to another entity level.
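As a concrete reference point, the tuple $L_{ML}$ can be encoded as plain Python data. This is a minimal sketch, not IBM's schema: the class and field names (`Event`, `MultilevelLog`, `pids`, `bridges`) are hypothetical, $\bot$ is modeled as an absent key, and $B$ is stored as pairs of event identifiers. The constructor checks the constraints stated above: known levels only, at most two non-$\bot$ ProcessIDs per event, disjoint identifier domains, and bridges that reference existing events.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Event:
    """One event e in E (hypothetical field names)."""
    eid: str
    activity: str                      # pi_act(e), a label from A
    time: float                        # pi_time(e)
    pids: Dict[str, str] = field(default_factory=dict)  # level p -> pi_pid^p(e); absent = bottom

    def pid(self, level: str) -> Optional[str]:
        """pi_pid^level(e), with None standing in for bottom."""
        return self.pids.get(level)


@dataclass
class MultilevelLog:
    """L_ML = (E, A, P, ID, pi_act, pi_time, {pi_pid^p}, B) as plain Python data."""
    levels: List[str]                  # P = <P_1, ..., P_k>
    events: List[Event]                # E
    bridges: Set[Tuple[str, str]] = field(default_factory=set)  # B, as pairs of event ids

    def __post_init__(self) -> None:
        known = set(self.levels)
        owner: Dict[str, str] = {}     # ProcessID -> level, keeps the ID_i disjoint
        for e in self.events:
            if not set(e.pids) <= known:
                raise ValueError(f"{e.eid}: unknown level in {sorted(e.pids)}")
            if len(e.pids) > 2:
                raise ValueError(f"{e.eid}: more than two non-bottom ProcessIDs")
            for level, pid in e.pids.items():
                if owner.setdefault(pid, level) != level:
                    raise ValueError(f"ProcessID {pid} used at two levels")
        eids = {e.eid for e in self.events}
        for a, b in self.bridges:
            if a not in eids or b not in eids:
                raise ValueError(f"bridge ({a}, {b}) references an unknown event")

    @property
    def activities(self) -> Set[str]:
        """A, derived from the events."""
        return {e.activity for e in self.events}
```

For example, an event carrying both an Order and a Receipt identifier, such as `Event("e2", "Goods Receipt Confirmed", 2.0, {"Receipt": "R1", "Order": "O1"})`, is the kind of event that $B$ links across levels.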

2. Case Construction and Assignment Mechanism

Case discovery in multilevel event logs is recursive. For each identifier $id_k$ of the highest-level entity type $P_k$, a case is formed by chaining all related events across the lower entity levels via bridge relations. Formally:

  1. Base case: $C_k(id_k) = \{ e \in E \mid \pi_{pid}^k(e) = id_k \}$.
  2. Recursive chaining: for $i = k-1,\ldots,1$,

$$C_i(id_i) = \left\{ e \in E \mid \pi_{pid}^i(e) = id_i \wedge \exists e' \in C_{i+1}(\cdot): (e,e') \in B \right\}$$

  3. Full case: the union $c = \bigcup_{i=1}^k C_i(id_i)$.
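The recursion can be sketched in a few lines of Python. The formula leaves the identifier in $C_{i+1}(\cdot)$ implicit; this sketch assumes the condition decides which level-$i$ identifiers are linked into the case, and that all events of a linked identifier then belong to $C_i$. It also assumes bridge pairs are stored as `(e, e_up)` with `e_up` on the higher level, matching $(e,e') \in B$. The function name `build_case` and the dictionary layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical layout: event id -> {level: ProcessID}; an absent level means bottom.
Events = Dict[str, Dict[str, str]]


def build_case(events: Events, levels: List[str],
               bridges: Set[Tuple[str, str]], top_id: str) -> Set[str]:
    """Event ids of the multilevel case rooted at top_id.

    `levels` is ordered P_1, ..., P_k, so the last entry is the highest level.
    Each bridge (e, e_up) is read as (e, e') in B, with e_up on the higher level.
    """
    # Index events by (level, ProcessID) once.
    by_pid: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for eid, pids in events.items():
        for level, pid in pids.items():
            by_pid[(level, pid)].add(eid)

    # Base case: C_k(id_k), all events carrying the top-level identifier.
    upper = set(by_pid[(levels[-1], top_id)])
    case = set(upper)

    # Recursive chaining for i = k-1, ..., 1.
    for level in reversed(levels[:-1]):
        # Level-i identifiers bridged to an event already in C_{i+1}.
        linked_ids = {
            events[e][level]
            for e, e_up in bridges
            if e_up in upper and level in events.get(e, {})
        }
        # Assumption: every event of a linked identifier joins C_i.
        current: Set[str] = set()
        for pid in linked_ids:
            current |= by_pid[(level, pid)]
        case |= current          # accumulate the union over all levels
        upper = current
    return case
```

With bridges `("r1", "i1")`, `("r2", "i1")` and `("o2", "r1")`, the case rooted at invoice `I1` collects both receipts and all events of order `O1`, while events of an unrelated order `O2` stay out.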

The case-mapping function $\mathrm{case}(e) = \{ c \in C \mid e \in c \}$ assigns events to cases, allowing proper deduplication of bridge events for downstream statistics.
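Inverting the case sets gives $\mathrm{case}(e)$ directly, and deduplication amounts to counting distinct event identifiers rather than raw rows. The sketch below assumes a hypothetical flat row format `(eid, level, ProcessID)` in which a bridge event appears once per level it belongs to; the function names are illustrative and do not describe the product's statistics engine.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Row = Tuple[str, str, str]  # (event id, level, ProcessID); a bridge event yields one row per level


def case_mapping(cases: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """case(e) = {c in C | e in c}: invert case -> events into event -> cases."""
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for cid, eids in cases.items():
        for eid in eids:
            mapping[eid].add(cid)
    return dict(mapping)


def unique_event_count(rows: List[Row], case_events: Set[str]) -> int:
    """Count distinct events of one case, so duplicated bridge rows count once."""
    return len({eid for eid, _, _ in rows if eid in case_events})


def entity_counts(rows: List[Row], case_events: Set[str]) -> Dict[str, int]:
    """Distinct ProcessIDs per level within one case, e.g. #{Order}."""
    ids: Dict[str, Set[str]] = defaultdict(set)
    for eid, level, pid in rows:
        if eid in case_events:
            ids[level].add(pid)
    return {level: len(pids) for level, pids in ids.items()}
```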

3. Architecture, Workflow, and Algorithms

Organizational mining in the IBM implementation comprises three main phases:

  1. Data preparation and log parsing: Assign $\pi_{pid}^p$ and construct $B$.
  2. Case composition: Chain ProcessIDs via bridges recursively.
  3. Model discovery: Extended α-style mining generates a unified process graph $G_{ML}$, with vertices representing activities and edges computed by a causal strength metric:

$$\sigma(a,b) = \left|\left\{ c \in C \mid \exists e_1, e_2 \in c: \pi_{act}(e_1) = a,\, \pi_{act}(e_2) = b,\, \pi_{time}(e_1) < \pi_{time}(e_2) \right\}\right|$$

Edges whose strength $\sigma(a,b)$ exceeds a threshold are retained, yielding a single, color-coded, end-to-end process graph.
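A direct, unoptimized reading of $\sigma$ and the threshold step is sketched below. It assumes each case is already deduplicated into `(activity, timestamp)` pairs and reads "above a threshold" as a strict `> threshold` test; the function names are hypothetical, and the remaining steps of the extended α-style miner are not reproduced. Because each case contributes at most once per pair $(a,b)$, $\sigma$ counts cases, not event pairs.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Case = List[Tuple[str, float]]  # deduplicated (activity, timestamp) pairs


def causal_strength(cases: Iterable[Case]) -> Counter:
    """sigma(a, b): number of cases with some a-event strictly before some b-event."""
    sigma: Counter = Counter()
    for case in cases:
        pairs: Set[Tuple[str, str]] = set()
        # O(n^2) per case; fine for a sketch, not for production logs.
        for (a, ta), (b, tb) in combinations(case, 2):
            if ta < tb:
                pairs.add((a, b))
            elif tb < ta:
                pairs.add((b, a))
            # equal timestamps satisfy neither strict ordering
        sigma.update(pairs)  # a set, so each case adds at most 1 per pair
    return sigma


def discover_edges(cases: Iterable[Case], threshold: int) -> Dict[Tuple[str, str], int]:
    """Keep edges whose sigma is above the threshold (strict comparison assumed)."""
    return {edge: n for edge, n in causal_strength(cases).items() if n > threshold}
```

For three cases `A→B→C`, `A→C` and `B→A`, $\sigma(A,C) = 2$ while every other observed pair has $\sigma = 1$, so a threshold of 1 keeps only the edge $(A, C)$.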

The workload is SQL-accelerated; event log storage, parsing, and computation utilize window functions and bitmap indexes in a relational schema.
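The sketch below shows the kind of window-function query such a relational schema enables, using Python's built-in `sqlite3` module (window functions require SQLite 3.25 or later). The table and column names are assumptions, SQLite offers B-tree rather than bitmap indexes, and the query computes directly-follows counts, a common SQL-friendly relation, rather than the eventually-follows $\sigma$ defined above. `ROW_NUMBER()` drops the duplicate rows a bridge event produces, and `LEAD()` pairs each event with its successor in the same case.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical schema: one row per (event, level), so bridge events appear twice.
SCHEMA = """
CREATE TABLE event_rows (
    case_id TEXT, eid TEXT, level TEXT, pid TEXT, activity TEXT, ts REAL
);
CREATE INDEX idx_rows_case ON event_rows(case_id, ts);
"""

DF_QUERY = """
WITH unique_events AS (          -- keep one row per event within a case
    SELECT case_id, eid, activity, ts,
           ROW_NUMBER() OVER (PARTITION BY case_id, eid ORDER BY level) AS rn
    FROM event_rows
),
ordered AS (                     -- successor of each event in its case
    SELECT case_id, activity,
           LEAD(activity) OVER (PARTITION BY case_id ORDER BY ts, eid) AS next_activity
    FROM unique_events
    WHERE rn = 1
)
SELECT activity, next_activity, COUNT(DISTINCT case_id) AS n_cases
FROM ordered
WHERE next_activity IS NOT NULL
GROUP BY activity, next_activity
ORDER BY activity, next_activity;
"""


def directly_follows(rows: List[Tuple[str, str, str, str, str, float]]) -> List[tuple]:
    """Rows are (case_id, eid, level, pid, activity, ts); returns (a, b, n_cases)."""
    con = sqlite3.connect(":memory:")
    try:
        con.executescript(SCHEMA)
        con.executemany("INSERT INTO event_rows VALUES (?, ?, ?, ?, ?, ?)", rows)
        return con.execute(DF_QUERY).fetchall()
    finally:
        con.close()
```

Without the `ROW_NUMBER()` step, a bridge event stored under both a Receipt and an Invoice row would appear to follow itself, producing a spurious self-loop in the graph.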

4. Comparative Analysis: Multilevel vs. Object-Centric Paradigms

Organizational mining was designed to unify the strengths of both MLPM and OCPM:

| Aspect | Multilevel PM | Object-Centric PM |
|---|---|---|
| Case notion | Recursive chaining of entity levels | No “case”; events may involve multiple objects |
| Data model | Flat event table with ProcessID columns/bridges | Event log + object tables per type (OCEL) |
| Model output | Unified process graph, colored by entity | Object-centric Petri nets, BPMN, causal nets |
| Statistics | Deduplication by merged events in cases | Cross-object metrics via multisets of object links |
| Conformance | Deviation in any entity fails entire case | Checks per object or link |
| Scalability | Scales via SQL, challenged by deep/horizontal chains | Heavy for many-to-many; optimized in object-centric libs |
| Use cases | End-to-end, cross-entity workflows | Multi-process, ad hoc, cross-object analytics |
| Limitation | May obscure intra-entity subprocesses | Lacks unified cross-object KPIs |

The objective of organizational mining is to retain MLPM's unified, distortion-free end-to-end graphs and accurate statistics while leveraging OCPM's relational, flexible data modeling and preparation pipelines.

5. Illustrative Example and Empirical Characteristics

Consider a process with hierarchical entities: Order ($P_1$), Receipt ($P_2$), Invoice ($P_3$). In the provided example:

  • The raw log shows bridge events linking Receipts to a single Invoice.
  • The case chaining algorithm collapses 11 event rows into 10 unique events in the process model, correctly merging duplicate events from cross-links.
  • Entity statistics per case are deduplicated: $\#\{\text{Order}\}=2,\ \#\{\text{Receipt}\}=3,\ \#\{\text{Invoice}\}=2$.
  • Throughput time between activities such as "Order Creation" and "Goods Receipt Confirmed" is computed by path enumeration within chained cases.

This prevents artificial inflation of counts and duplication of events across levels, a requirement that prior non-unified approaches did not meet.
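Repeated events make throughput time ambiguous, so any implementation needs a pairing rule. This sketch assumes one simple rule: each target event is paired with the latest earlier source event in the same chained case. It is a stand-in for, not a reproduction of, the path enumeration described above, and the function names and example timestamps are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

Case = List[Tuple[str, float]]  # deduplicated (activity, timestamp) pairs of one chained case


def throughput_times(case: Case, source: str, target: str) -> List[float]:
    """Pair each `target` event with the latest earlier `source` event in the case."""
    durations: List[float] = []
    for act_b, t_b in case:
        if act_b != target:
            continue
        earlier = [t_a for act_a, t_a in case if act_a == source and t_a < t_b]
        if earlier:  # target events with no preceding source are skipped
            durations.append(t_b - max(earlier))
    return durations


def mean_throughput(cases: Dict[str, Case], source: str, target: str) -> Optional[float]:
    """Average over all paired occurrences in all cases; None if nothing pairs."""
    all_durations = [d for c in cases.values() for d in throughput_times(c, source, target)]
    return mean(all_durations) if all_durations else None
```

If a case contains one "Order Creation" at time 0 and two "Goods Receipt Confirmed" events at times 3 and 5, this rule yields durations 3 and 5; a rule that pairs only first occurrences would yield just 3, which is why the configuration matters.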

6. Evolution into Organizational Mining: Productization and Improvements

IBM evolved the framework into the Organizational Mining feature in response to limitations identified in MLPM (such as limited scalability as entity levels and fan-out grow, laborious log preparation, and overly rigid conformance criteria):

  • Retains: mathematically rigorous, unified model and correct statistics from MLPM.
  • Incorporates: object-centric, relational storage and event log architecture from OCPM, avoiding manual flattening.
  • Automates: log parsing, path discovery, and cross-log correlation using SQL metadata.
  • Enhances: performance on large, distributed datasets (via the NextGen engine), and simplifies data preparation and cross-process analytics.

7. Practical Implications, Use Cases, and Best Practices

Organizational mining is best suited for processes that require seamless, end-to-end visibility across multiple organizational units or business object types, such as Procure-to-Pay, Order-to-Cash in ERP scenarios, and compliance monitoring spanning departments. Key operational guidelines include:

  • Define clear entity ordering from highest to lowest level.
  • Ensure each event captures only its direct and (optionally) bridge entity IDs.
  • Explicitly tag bridge events during ETL or log extraction.
  • Leverage relational indexes and SQL-based filtering to optimize scalability.
  • Configure throughput-time calculations carefully in the presence of repeated events for the same entity.
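Several of these guidelines can be checked mechanically before mining. The validator below is a hypothetical ETL-side sketch: `RawEvent`, the `is_bridge` flag, and the message texts are assumptions, and it only checks membership in the entity ordering, not whether the ordering itself is semantically right. It flags events with more than a direct and a bridge ID, untagged or mis-tagged bridge events, and repeated activities for the same entity, which are the cases where throughput-time settings matter.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RawEvent:
    eid: str
    activity: str
    ids: Dict[str, str] = field(default_factory=dict)  # level -> ProcessID
    is_bridge: bool = False  # hypothetical explicit tag set during ETL


def check_log(levels: List[str], events: List[RawEvent]) -> List[str]:
    """Return human-readable problems; `levels` is the entity ordering, highest first."""
    problems: List[str] = []
    if len(set(levels)) != len(levels):
        problems.append("entity ordering contains duplicate levels")
    known = set(levels)
    for e in events:
        unknown = set(e.ids) - known
        if unknown:
            problems.append(f"{e.eid}: unknown level(s) {sorted(unknown)}")
        if len(e.ids) > 2:
            problems.append(f"{e.eid}: carries more than a direct and a bridge ID")
        if len(e.ids) == 2 and not e.is_bridge:
            problems.append(f"{e.eid}: two IDs but not tagged as a bridge event")
        if e.is_bridge and len(e.ids) != 2:
            problems.append(f"{e.eid}: tagged as bridge but carries {len(e.ids)} ID(s)")
    # Repeated (activity, entity) pairs need an explicit throughput-time pairing rule.
    repeats = Counter((e.activity, lvl, pid) for e in events for lvl, pid in e.ids.items())
    for (act, lvl, pid), n in sorted(repeats.items()):
        if n > 1:
            problems.append(f"note: '{act}' occurs {n} times for {lvl} {pid}")
    return problems
```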

This unification effectively delivers a distortion-free, explainable, and scalable view of organizational workflows, supporting both operational efficiency and regulatory compliance (Ronzoni et al., 3 Dec 2025).
