
LDBC Graphalytics Benchmarks

Updated 17 January 2026
  • LDBC Graphalytics Benchmarks is an industrial-grade benchmark suite that evaluates graph analysis platforms with extensive, reproducible performance metrics.
  • It features a diverse workload including full-graph algorithms like BFS, PageRank, WCC, CDLP, LCC, and SSSP to stress different hardware and software architectures.
  • Its open specification and detailed reporting protocols enable fairness, extensibility, and precise measurement of processing time, throughput, scalability, and robustness.

The LDBC Graphalytics benchmarks constitute an industrial-grade standard for evaluating and comparing graph analysis platforms. Developed under the auspices of the Linked Data Benchmark Council (LDBC), Graphalytics is designed to exercise diverse system bottlenecks, provide robust algorithmic validation, and facilitate rigorous, reproducible performance measurements across a wide ecosystem of software and hardware architectures. The suite’s open specification, extensive dataset variety, and detailed performance reporting protocol have positioned it as a central reference point in graph systems research and engineering (Iosup et al., 2020).

1. Design Objectives and Benchmark Philosophy

LDBC Graphalytics was architected around explicit requirements for fairness, coverage, and comprehensive evaluation. The suite must:

  • Support any graph-processing platform and hardware configuration, avoiding bias toward particular execution or data models.
  • Expose all critical performance bottlenecks encountered in practical graph analytics: irregular memory access, degree skew, computation-communication tradeoff, and robustness against performance variability and failure.
  • Achieve a balance of comprehensiveness and feasible runtime, allowing standard trials (six algorithms × five graphs × three repetitions) to complete on commodity hardware within practical wall-clock bounds.
  • Embrace open, modular, and continuously renewable engineering, including extensibility for new algorithms, datasets, and system drivers (Iosup et al., 2020).

2. Workload Suite: Algorithms and Computational Patterns

The test suite encompasses six deterministic, full-graph kernels that have been selected both for their practical prevalence and for stressing distinct hardware and architectural properties. These are:

  • Breadth-First Search (BFS): Single-source distance labeling via layer-wise expansion; characterized by memory-bound execution, low arithmetic intensity, and highly irregular memory access.
  • PageRank (PR): Iterative fixpoint computation modeling Markovian random walks, with high communication-to-computation ratio and global synchronization requirements.
  • Weakly Connected Components (WCC): Iterative label propagation treating edges as undirected, driven by integer-only, bulk-synchronous operations and global convergence properties.
  • Community Detection via Label Propagation (CDLP): Iterative histogram-based propagation, with deterministic tie-breaking, exercising data-local communication versus fragmentation.
  • Local Clustering Coefficient (LCC): Per-vertex triangle counting; highly memory- and compute-bound, emphasizing cache and fine-grain parallelism.
  • Single-Source Shortest Paths (SSSP): Dijkstra-style distance labeling under positive weights, invoking priority scheduling and dynamic computational frontiers (Iosup et al., 2020).
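To make the first kernel concrete, a minimal BFS distance labeling by layer-wise expansion can be sketched in Python (the adjacency-list representation and function name here are illustrative, not taken from the specification):

```python
from collections import deque

def bfs_distances(adj, source):
    """Layer-wise BFS: assigns each reachable vertex its hop distance
    from `source`; unreachable vertices are absent from the result."""
    dist = {source: 0}
    frontier = deque([source])
    while frontier:
        u = frontier.popleft()
        for v in adj.get(u, ()):
            if v not in dist:          # first visit fixes the distance
                dist[v] = dist[u] + 1
                frontier.append(v)
    return dist

# Example: a small directed graph
adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
print(bfs_distances(adj, 0))  # {0: 0, 1: 1, 2: 1, 3: 2}
```

The `if v not in dist` test is what makes the traversal layer-wise: a vertex's distance is fixed the first time it enters the frontier, so each edge is examined at most once.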

The mathematical formalizations appear verbatim in the specification (e.g., the PageRank update formula), ensuring unambiguous workload semantics.
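For reference, the PageRank update commonly takes the following form (this is the standard formulation with damping factor $d$, typically $0.85$; the specification's exact handling of dangling vertices may differ):

```latex
\[
\mathrm{PR}^{(t+1)}(v) \;=\; \frac{1-d}{|V|} \;+\; d \sum_{u \in N_{\mathrm{in}}(v)} \frac{\mathrm{PR}^{(t)}(u)}{\deg_{\mathrm{out}}(u)}
\]
```

Here $N_{\mathrm{in}}(v)$ is the set of in-neighbors of $v$ and $\deg_{\mathrm{out}}(u)$ the out-degree of $u$; the iteration runs to a fixed number of rounds or to convergence.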

3. Datasets: Real-World Graphs and Synthetic Generators

Graphalytics mandates a pluralistic dataset roster comprising real-world graphs and carefully parameterized synthetic generators:

  • Real-World Graphs: Representative datasets span knowledge graphs (e.g., wiki-talk, cit-patents), social networks (com-friendster, twitter_mpi), and domain-specific graphs (dota-league, kgs), with vertices ranging from 61K to over 65M and edges up to 1.97B.
  • Synthetic Generators:
    • Graph500 (RMAT): Power-law degree distribution, tunable scale (e.g., $n = 2^{22 \dots 30}$), emphasizing high-degree skew.
    • LDBC Datagen (“social network” model): Parameterized by scale and clustering, generating realistic structural motifs and distributions.

All datasets are formatted via the EVLP standard, supporting streamlined ingestion and validation (Iosup et al., 2020).
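Assuming the common Graphalytics convention of separate vertex-list and edge-list files (one vertex id per line in the `.v` file; one whitespace-separated `src dst [weight]` record per line in the `.e` file; this layout is an assumption here, not quoted from the specification), ingestion can be sketched as:

```python
def load_evl(vertex_path, edge_path, weighted=False):
    """Read a graph from vertex-list (.v) and edge-list (.e) files.

    Returns (vertices, edges): a set of vertex ids and a list of
    (src, dst) or (src, dst, weight) tuples.
    """
    with open(vertex_path) as vf:
        vertices = {int(line.split()[0]) for line in vf if line.strip()}
    edges = []
    with open(edge_path) as ef:
        for line in ef:
            parts = line.split()
            if not parts:
                continue
            src, dst = int(parts[0]), int(parts[1])
            if weighted:
                edges.append((src, dst, float(parts[2])))
            else:
                edges.append((src, dst))
    return vertices, edges
```

Keeping vertices and edges in separate files lets the harness validate vertex membership independently of edge parsing, and lets weighted kernels (SSSP) and unweighted kernels (BFS, WCC) share one dataset.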

A representative table of datasets evaluated in recent system studies is as follows:

| Abbr. | Dataset                  | \|V\|  | \|E\|  |
|-------|--------------------------|:------:|:------:|
| FB0   | datagen-9_0-fb (LDBC)    | 12.8M  | 1.05B  |
| G500  | graph500-26 (LDBC)       | 32M    | 1.05B  |
| WB    | webbase-2001 (real)      | 118M   | 1.71B  |
| UK    | uk-2005 (real)           | 39.5M  | 1.57B  |
| CF    | com-friendster (real)    | 65.6M  | 1.81B  |
| TW    | twitter-2010 (real)      | 41.7M  | 1.47B  |

All values are reproduced from (He et al., 2023).

4. Execution Methodology and Architecture

The core harness of Graphalytics orchestrates the full experiment lifecycle: data formatting, system loading, algorithmic execution, output validation, and reporting. The platform driver API abstracts platform-specific integration, while a validator ensures that all outputs match deterministic reference results within epsilon tolerances (for real-valued outputs) or permutational equivalence (for labelings).
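The epsilon-tolerant check for real-valued outputs (e.g., PageRank scores) can be sketched as follows; the relative-tolerance rule and the function name are illustrative assumptions, not the specification's exact criterion:

```python
def validate_real_output(output, reference, eps=1e-4):
    """Compare per-vertex real-valued results against a reference.

    Accepts a run only if every vertex is present and each value
    matches the reference within a relative tolerance of `eps`.
    """
    if output.keys() != reference.keys():
        return False
    for v, ref in reference.items():
        got = output[v]
        # relative tolerance, with an absolute fallback near zero
        if abs(got - ref) > eps * max(abs(ref), 1e-12):
            return False
    return True

ref = {0: 0.25, 1: 0.25, 2: 0.5}
assert validate_real_output({0: 0.250001, 1: 0.25, 2: 0.499999}, ref)
assert not validate_real_output({0: 0.3, 1: 0.25, 2: 0.45}, ref)
```

Integer-valued outputs (distances, component ids) would instead be compared exactly, or up to a consistent relabeling in the case of component/community labelings.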

A standard evaluation trial is defined by:

  • Explicit exclusion of graph loading time from reported “processing time” metrics (only kernel runtime is evaluated).
  • Multiple repetitions per workload and dataset, enabling the computation of variability, scaling, and robustness statistics (Iosup et al., 2020).

The workflow pseudo-code directly from the specification:

loadBenchmark(config.json)
for each dataset D in config.datasets:
  DataManager.formatGraph(D)
  Driver.loadGraph(D)
  for each (alg, params) in config.algorithms:
    for rep in 1..config.repetitions:
      startTimer()
      Driver.runAlgorithm(alg, params, D, timeout)
      T = stopTimer()
      output = Driver.fetchOutput()
      Validator.check(output, Reference[D][alg])
      Reporter.recordRun(D,alg,rep,T,success)
  Driver.unloadGraph(D)
Reporter.generateReport()

(Iosup et al., 2020)

5. Metrics: Performance, Scalability, and Robustness

Graphalytics mandates a detailed set of quantitative metrics, measured and reported by the test harness:

  • Processing time $T_p$: Wall-clock duration for kernel execution, excluding data loading and I/O.
  • Throughput metrics: Edges per second (EPS: $m/T_p$); edges + vertices per second (EVPS: $(n+m)/T_p$).
  • Cost metrics: Three-year total cost of ownership (TCO) and price-per-performance ($\mathrm{PPP}=\mathrm{TCO}/\mathrm{EVPS}$) as required by the LDBC Byelaws.
  • Scalability metrics:
    • Strong scaling: $\mathrm{Speedup}(p)=T_p(1)/T_p(p)$.
    • Weak scaling efficiency: $\mathrm{Efficiency}(p)=T_p(1)/(p\cdot T_p(p))$.
  • Robustness metrics: Failure rate, coefficient of variation (CV) of $T_p$, SLA compliance rate (proportion of runs meeting timeout constraints).
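Given measured kernel runtimes, the throughput, scaling, and robustness metrics above reduce to simple arithmetic; a sketch (function names are illustrative):

```python
from statistics import mean, stdev

def throughput(n_vertices, n_edges, t_proc):
    """EPS = m / T_p and EVPS = (n + m) / T_p."""
    return n_edges / t_proc, (n_vertices + n_edges) / t_proc

def strong_scaling_speedup(t1, tp):
    """Speedup(p) = T_p(1) / T_p(p)."""
    return t1 / tp

def weak_scaling_efficiency(t1, tp, p):
    """Efficiency(p) = T_p(1) / (p * T_p(p))."""
    return t1 / (p * tp)

def coefficient_of_variation(times):
    """CV of T_p across repetitions (robustness indicator)."""
    return stdev(times) / mean(times)

# Example: 100k-edge, 10k-vertex graph processed in 2 seconds
eps, evps = throughput(10_000, 100_000, 2.0)
# eps == 50_000.0 edges/s; evps == 55_000.0 (edges + vertices)/s
```

Note that $T_p$ here is the kernel-only runtime: because loading is excluded by the protocol, these metrics isolate algorithmic and architectural efficiency rather than I/O speed.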

Performance is evaluated via a standard protocol on defined datasets and kernel/dataset combinations, across varying hardware scales and configurations (Iosup et al., 2020).

6. Empirical Evaluations and System Comparisons

Recent papers have leveraged Graphalytics to benchmark both graph analytics platforms and graph database systems:

  • GraphScope Flex (He et al., 2023):
    • Ran PageRank and BFS on a suite of synthetic (LDBC “datagen,” Graph500) and real-world (webbase, uk-2005, com-friendster, twitter-2010, arabic-2005, ogbn-products, ogbn-papers100M) graphs, mostly with $|E| \sim 1$B.
    • On CPUs (Xeon Platinum 8269CY, 96 GB RAM), achieved an average speedup of $25.1\times$ (max $55.7\times$) over PowerGraph and $2.3\times$ (max $3.4\times$) over Gemini.
    • On GPUs (8 × V100, NVLink), achieved an average $3.3\times$ (max $9.9\times$) speedup over Gunrock/Groute.
    • Performance is attributed to advanced inter-GPU communication, dynamic work stealing, and data-local execution strategies.
  • GraphAlg/AvantGraph (Graaf et al., 10 Jan 2026):
    • Ran BFS, PageRank, CDLP, WCC, SSSP on all “S”-scale Graphalytics graphs (up to 13M vertices, $\sim$50M edges).
    • Demonstrated shortest code complexity for algorithm implementations (2–9× more concise than SQL/Python; up to 13× vs. Pregel/Java).
    • On key workloads, achieved best-in-class runtimes for PageRank, SSSP, and WCC; competitive on BFS; CDLP performance lagged due to hash-aggregation bottlenecks.
    • Showed that relational algebra–based approaches with in-place aggregation and loop-invariant code motion are performant and robust for graph analytics in database environments.

These comparative results validate both the stringency of the benchmark and its ability to distinguish algorithmic, architectural, and engineering strengths across platforms (He et al., 2023, Graaf et al., 10 Jan 2026).

7. Impact, Validation, and Ongoing Renewal

Graphalytics remains extensible and “living”: its component drivers, reference outputs, and datasets can be conveniently updated via a public process. Its rigor in output validation (exact match, equivalence, or $\varepsilon$-match as appropriate), robust modular software engineering, and deep metric reporting have led to broad adoption and citation in both academic and industrial research. By balancing diversity, reproducibility, and objectivity, LDBC Graphalytics continues to facilitate fair, repeatable cross-system comparisons, driving technological advancement and standardization within the graph systems community (Iosup et al., 2020).

A plausible implication is that future revisions will further extend the benchmark suite as both hardware platforms and graph analytics methodologies evolve.
