BlazingAML: High-Throughput Anti-Money Laundering (AML) via Multi-Stage Graph Mining

Published 14 Apr 2026 in cs.DC | (2604.12241v1)

Abstract: Money laundering detection faces challenges due to excessive false positives and inadequate adaptation to sophisticated multi-stage schemes that exploit modern financial networks. Graph analytics and AI are promising tools, but they struggle with the fuzziness of laundering patterns, which exhibit structural and temporal variations. Conventional data mining techniques require the detailed enumeration of pattern variants, which not only complicates the analyst's task of specifying them, but also leads to large run-time overheads and makes it difficult to train accurate AI models. The paper presents BlazingAML, a scalable AML system design that introduces:
1. A novel multi-stage framework for expressing fuzzy money laundering patterns
2. A domain-specific compiler that transforms high-level pattern descriptions into high-performance code for CPU and GPU back-ends
The multi-stage abstraction decomposes complex laundering schemes into logical stages connected by graph operations, enabling diverse patterns to be expressed using unified primitives while capturing structural and temporal fuzziness. The compiler applies sophisticated optimizations, eliminating manual parallel programming requirements for financial analysts. Evaluation on IBM AML datasets shows BlazingAML achieves the same F1 score as state-of-the-art approaches while delivering up to 210x and 333x speedups on CPU and GPU, respectively, with superior scalability.


Authors (5)

Summary

The paper introduces a multi-stage graph mining framework that declaratively expresses fuzzy, layered laundering patterns, achieving up to 333× throughput improvement.
It leverages a domain-specific compiler to optimize code generation for CPU and GPU, efficiently managing structural and temporal fuzziness in complex networks.
Empirical results demonstrate real-time detection capabilities with near-linear scalability on up to 256 CPU threads and multiple GPUs for large-scale transaction graphs.

BlazingAML: A Multi-Stage, High-Throughput Framework for Anti-Money Laundering Graph Mining

Introduction and Motivation

The detection of money laundering within large-scale financial networks presents acute challenges due to the evolving sophistication of laundering schemes and the immense volume of transactional data. Traditional rule-based approaches produce excessive false positives and often lack the adaptability required to counteract modern, multi-stage laundering tactics. Recent work has turned to graph analytics and hybrid graph+AI systems, but these too are hampered by the structural and temporal "fuzziness" inherent in money laundering patterns. Rigid subgraph mining and exact pattern matching approaches scale poorly as pattern complexity and fuzziness increase.

BlazingAML addresses these deficiencies by introducing a declarative multi-stage graph mining framework and an optimizing domain-specific compiler, enabling analysts to concisely express fuzzy, layered laundering patterns. This system produces highly efficient mining code for both CPU and GPU, going beyond the constraints of motif-based query languages and existing graph mining engines. Empirically, BlazingAML achieves up to 333× throughput improvement over leading baselines, demonstrating both scalability and real-world deployability without loss of detection accuracy.

Figure 1: Overview and core contributions of BlazingAML, depicted within a standard AML analytics pipeline centered on multi-stage graph mining and downstream AI classification.

Fuzzy Pattern Expression via Multi-Stage Decomposition

A key contribution of BlazingAML is its unified stage-based abstraction for describing illicit transaction patterns. Unlike earlier motif-centric languages or systems with limited support for structural or temporal variation, the multi-stage model decomposes complex laundering strategies into logically sequential graph operations. Each stage represents a propagation or transformation of money flow with explicit support for:

Structural fuzziness: Accommodating variable numbers of intermediate accounts and uncertain motif sizes.
Temporal fuzziness: Enabling partial orderings and relaxed or windowed temporal constraints rather than strict edge orderings.

For instance, the scatter-gather pattern—common in layered laundering—can be described succinctly by combining neighbor expansion and intersection primitives across sequential stages. This approach generalizes naturally to other patterns, such as cycles and deep fan-in/fan-out, while supporting interchangeable stages and dynamic constraint configuration.
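A minimal Python sketch of how a scatter-gather pattern might be decomposed into expansion and intersection stages is given here for illustration. It is not BlazingAML's pattern language or generated code: the function names, the `(src, dst, timestamp)` edge format, and the `min_middle` and `window` parameters are all assumptions. `min_middle` stands in for structural fuzziness (a variable number of intermediate accounts) and `window` for temporal fuzziness (a relaxed, windowed ordering between the incoming and outgoing transfer of each intermediate).

```python
from collections import defaultdict


def build_adjacency(transactions):
    """Index (src, dst, timestamp) edges by source and by destination."""
    out_edges = defaultdict(list)  # src -> [(dst, t), ...]
    in_edges = defaultdict(list)   # dst -> [(src, t), ...]
    for src, dst, t in transactions:
        out_edges[src].append((dst, t))
        in_edges[dst].append((src, t))
    return out_edges, in_edges


def scatter_gather(transactions, min_middle=2, window=10):
    """Return (source, intermediates, sink) triples (hypothetical helper).

    An intermediate qualifies when it receives money from the source and
    forwards money to the sink no more than `window` time units later.
    """
    out_edges, in_edges = build_adjacency(transactions)
    matches = []
    for source in list(out_edges):
        # Stage 1 (expand): accounts the source scatters money to.
        scattered = defaultdict(list)
        for mid, t in out_edges[source]:
            scattered[mid].append(t)

        # Stage 2 (expand): candidate sinks one hop past the scattered set.
        candidates = {
            dst for mid in scattered for dst, _ in out_edges.get(mid, [])
        }

        for sink in candidates:
            if sink == source:
                continue
            # Stage 3 (intersect): out-neighbors of the source that are
            # also in-neighbors of the sink, filtered by the time window.
            gathered = defaultdict(list)
            for mid, t in in_edges[sink]:
                gathered[mid].append(t)
            middle = set()
            for mid in scattered.keys() & gathered.keys():
                if mid in (source, sink):
                    continue
                if any(
                    0 <= t_in - t_out <= window
                    for t_out in scattered[mid]
                    for t_in in gathered[mid]
                ):
                    middle.add(mid)
            if len(middle) >= min_middle:
                matches.append((source, frozenset(middle), sink))
    return matches
```

Raising `min_middle` or shrinking `window` tightens the pattern without changing the stage structure, which is the kind of per-stage constraint configuration the text describes.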

Figure 2: Representative laundering patterns—cycles, scatter-gather, and fan-out structures—illustrate the diversity and complexity of layering schemes in financial graphs.

Figure 3: Visualizes structural and temporal fuzziness, underscoring the variability and partial ordering that must be captured in realistic AML scenarios.

Figure 4: Stage-based expressions of scatter-gather and 4-cycle patterns demonstrate the unifying power and clarity of the multi-stage framework.

Compiler Architecture and Optimization Strategies

BlazingAML decouples logical pattern specification from low-level data processing through a domain-specific compiler. Analysts declare AML patterns in a high-level format, specifying per-stage inputs, operations, constraints, and outputs. The compiler parses these specifications and generates architecture-specific, cache-optimized C++ or CUDA code, implementing several hardware-aware optimizations:

Power-law-aware memory layouts: Addressing high-degree vertices typical in transactional graphs.
Workload balancing: Degree-based partitioning for heterogeneous parallel hardware.
Hybrid CPU-GPU scheduling: Assigning shallow traversals to GPUs for maximal parallelism, while offloading deep traversals to CPUs for memory efficiency.
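The summary does not say which partitioning algorithm the compiler uses for degree-based workload balancing. As one plausible illustration, the sketch below applies a standard greedy heuristic (largest degree first, always onto the least-loaded worker) so that a single high-degree hub does not dominate one worker. The function name and the use of degree as a cost proxy are assumptions.

```python
import heapq


def partition_by_degree(degrees, num_workers):
    """Greedily assign vertices to workers to even out summed degree.

    `degrees` maps vertex -> degree. Returns (assignment, loads), where
    assignment[w] lists the vertices of worker w and loads[w] is their
    total degree, used here as a rough proxy for traversal work.
    """
    heap = [(0, worker) for worker in range(num_workers)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_workers)]

    # Place heavy vertices first so light ones can fill the remaining gaps.
    ordered = sorted(degrees.items(), key=lambda kv: (-kv[1], kv[0]))
    for vertex, degree in ordered:
        load, worker = heapq.heappop(heap)  # currently least-loaded worker
        assignment[worker].append(vertex)
        heapq.heappush(heap, (load + degree, worker))

    loads = [sum(degrees[v] for v in part) for part in assignment]
    return assignment, loads
```

With a power-law degree distribution, a split by vertex count alone would give the worker holding the hub far more edges to traverse than the others; balancing on degree is one simple way to avoid that.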

The system supports seamless integration with upstream (streaming) and downstream (ML-based classification) analytics, facilitating real-time deployment. The modular pipeline ensures both rapid prototyping and flexible deployment across heterogeneous cloud or on-premise environments.

Figure 5: Pseudo-code generated by the compiler for scatter-gather and 4-cycle pattern mining, exemplifying the automated translation of logical pattern descriptions into efficient execution kernels.

Empirical Performance and Scalability

Comprehensive evaluation on standardized IBM AML datasets compares BlazingAML with state-of-the-art baselines, notably GFP (IBM) and FraudGT (graph transformer-based models).

Throughput: BlazingAML achieves up to 210× speedup on CPU and 333× on GPU over GFP for core patterns such as scatter-gather and cycle mining.
Scalability: Near-linear parallel scaling up to 256 CPU threads or multiple GPUs, with consistent performance advantages across five orders of magnitude in graph size.
Classification Efficacy: When used as feature extraction for a gradient boosting classifier, BlazingAML maintains identical F1 scores to GFP; the combination of structural count features and statistical learning yields peak detection accuracy, especially as more topological descriptors are incorporated.
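The "structural count features" mentioned above can be pictured as a per-account vector of pattern participation counts. The sketch below is a hypothetical illustration of that idea, not the feature schema used in the evaluation: the function name, the per-account granularity, and the input format (each match given as a tuple of account IDs) are assumptions.

```python
from collections import Counter


def pattern_count_features(matches_by_pattern, accounts):
    """Build one count-feature row per account.

    `matches_by_pattern` maps a pattern name to a list of matches, each
    match being an iterable of the account IDs it involves. Returns the
    ordered pattern names and a dict of account -> list of counts.
    """
    pattern_names = sorted(matches_by_pattern)
    counts = {name: Counter() for name in pattern_names}
    for name in pattern_names:
        for match in matches_by_pattern[name]:
            for account in set(match):  # count each match once per account
                counts[name][account] += 1

    rows = {
        account: [counts[name][account] for name in pattern_names]
        for account in accounts
    }
    return pattern_names, rows
```

Rows of this shape, joined with labels, could then be passed to a gradient boosting classifier. Mining more pattern types adds more columns, in line with the observation that accuracy improves as more topological descriptors are incorporated.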

Notably, while FraudGT achieves somewhat higher F1 scores due to more complex ML pipelines, BlazingAML delivers 4.9× higher throughput, indicating a more favorable tradeoff for real-time, production-scale AML monitoring.

Figure 6: End-to-end normalized throughput for scatter-gather pattern mining, demonstrating substantial speedups relative to GFP.

Figure 7: Cycle pattern mining throughput, confirming scalability and performance improvements on both CPU and GPU.

Figure 8: Combined throughput for fan-in and fan-out pattern mining, highlighting efficient code generation and parallel load balancing.

Figure 9: Stack pattern throughput normalized to single-threaded baselines, underscoring strong scale-up even for complex recursive patterns.

Figure 10: Scalability on synthetic transaction graphs up to 100M edges, showing robust multi-threaded and GPU scaling.

Figure 11: BlazingAML achieves 4.9× higher mining throughput than FraudGT, validating superiority for high-volume, time-sensitive financial crime detection.

Theoretical Implications and Future Directions

The introduction of a unified, high-level pattern language and optimizing compiler marks an important step in systematizing AML detection in large transaction graphs. The stage-based abstraction reconciles the need for expressive, domain-relevant queries with the realities of high-volume graph analytics. This paradigm encourages the generation, sharing, and rapid iteration of pattern libraries, catalyzing both algorithmic and regulatory collaboration across institutions.

The approach offers extensibility towards:

Adaptive ML integration: Integration with advanced GNNs or attention-based models for incremental or semi-supervised learning on pattern-derived features.
Streaming analytics: Native support for continuous graph updates and real-time risk scoring.
Cross-domain applications: Transferability of the multi-stage compiler paradigm to other anomaly detection contexts, such as cyber-physical networks and supply chain provenance.

Further enhancement could come from automated pattern discovery or reward-driven pattern evolution, potentially coupled with user-in-the-loop exploration interfaces and formal security auditing of pattern libraries.

Conclusion

BlazingAML advances anti-money laundering analytics by formalizing fuzzy AML graph pattern expression and bridging the expertise gap between financial analysts and high-performance systems developers. Its combination of a multi-stage abstraction and an optimizing compiler delivers substantial gains in throughput and scalability, empowering rapid response to emergent laundering behaviors without sacrificing detection accuracy. The work underlines the efficacy of hybrid analytics pipelines and establishes a reference point for future graph-based anomaly detection frameworks.
