ASP-Bench: Memory & Logic Benchmarks
- ASP-Bench is a name shared by two distinct benchmarks: a memory-system performance framework and a natural-language to ASP translation benchmark.
- The memory benchmark uses polyhedral code generation and modular driver templates to isolate memory bottlenecks and guide kernel optimization.
- The neurosymbolic benchmark evaluates the translation of natural-language problem statements into Answer Set Programs, annotating instances with reasoning aspects to characterize modeling difficulty.
ASP-Bench denotes two distinct, technically rigorous benchmarks in computational research: (1) a memory-system benchmarking framework for application-specific access pattern analysis (Lakshminarasimhan et al., 2018), and (2) a neurosymbolic benchmark for translating natural language problem descriptions to Answer Set Programs, encapsulating a spectrum of logic programming and reasoning complexities (Szeider, 1 Feb 2026). Both are explicitly named “ASP-Bench” in their respective sources, but serve fundamentally different research communities. This entry provides a comprehensive treatment of each benchmark, its methodology, core constructs, and empirical insights.
1. Application-Specific Memory Subsystem Benchmark (ASP-Bench / AdaptMemBench)
ASP-Bench (alternatively “AdaptMemBench”) is a modular, application-specific memory-subsystem benchmark framework designed for systematic exploration of memory-related optimizations beyond canonical streaming or stride access patterns (Lakshminarasimhan et al., 2018). Its principal objective is to enable domain experts to isolate, parameterize, and empirically profile computational kernels representative of production scientific codes, thereby elucidating memory-system bottlenecks and guiding optimization strategies.
The framework is structured around four pipeline stages (a wiring sketch follows the list):
- Pattern specification: The user defines data layouts, access macros, and iteration domains via header files (.h) and ISCC input (.in) files expressing Presburger sets and transformations.
- Polyhedral code generation (optional): ISCC/ISL emits C loops with applied domain transformations (loop interchange, tiling, etc.).
- Driver assembly: A selection of kernel-independent driver templates combine with the generated kernel code to create an executable benchmark.
- Compilation and execution: The system compiles the driver, executes over a sweep of working-set sizes (spanning L1 to DRAM), collects timing and PAPI-derived hardware counters, and outputs standardized metadata and results.
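A minimal orchestration of these stages might look like the following Python sketch. The file names, the `/*KERNEL_BODY*/` placeholder, and the assumption that the `iscc` calculator and `gcc` are on the PATH are illustrative choices, not part of the framework itself.

```python
import subprocess
from pathlib import Path

# Stages 1-2: feed the ISCC specification (Presburger sets, transformations,
# codegen(...)) to the iscc calculator, which prints C loop nests.
kernel_c = subprocess.run(
    ["iscc"], input=Path("triad.in").read_text(),
    capture_output=True, text=True, check=True,
).stdout

# Stage 3: splice the generated loops into a kernel-independent driver
# template; the /*KERNEL_BODY*/ placeholder is a made-up convention here.
driver = Path("driver_template.c").read_text().replace("/*KERNEL_BODY*/", kernel_c)
Path("benchmark.c").write_text(driver)

# Stage 4: compile with OpenMP; the framework then runs this executable over
# a sweep of working-set sizes and collects timing and PAPI counters.
subprocess.run(["gcc", "-O3", "-fopenmp", "benchmark.c", "-o", "benchmark"], check=True)
```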
2. Driver Templates and Execution Methodology
ASP-Bench offers three interchangeable measurement templates:
- Unified Data Spaces: Threads share arrays, using OpenMP “parallel for” (default schedule: static, user-configurable via macro). Facilitates easy instantiation for kernels where false sharing is not critical or working sets are small.
- Independent Data Spaces: Each thread is allocated private arrays, eliminating false sharing and thread contention. Implemented with a single OpenMP parallel region in which each thread performs its own ntimes repetitions.
- PAPI Measurement: Augments either template to collect hardware performance counters, with events (e.g., L1_DCM, CA_SHR) configurable at runtime.
Command-line control is uniform:
```
./benchmark --size <bytes> --threads <T> --ntimes <repetitions> [--papi-events ...]
```
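Building on that interface, a sweep over working-set sizes and thread counts could be scripted as below. The size range, thread counts, and log file name are illustrative assumptions; only the command-line flags are taken from the invocation above.

```python
import itertools
import subprocess

# Illustrative sweep: working-set sizes from well inside L1 up to DRAM-resident
# (4 KiB .. 1 GiB), crossed with a few thread counts. The benchmark itself
# records timing and PAPI counters; here we only archive its console output.
sizes = [1 << p for p in range(12, 31, 2)]
threads = [1, 4, 8, 16]

with open("sweep.log", "w") as log:
    for size, t in itertools.product(sizes, threads):
        run = subprocess.run(
            ["./benchmark", "--size", str(size), "--threads", str(t), "--ntimes", "100"],
            capture_output=True, text=True, check=True,
        )
        log.write(f"size={size} threads={t}\n{run.stdout}\n")
```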
3. Configuration, Code Generation, and Performance Modeling
Benchmarks are configured through pattern specification and code generation:
- Header (.h) and input (.in) files: Headers contain allocation and access-macro definitions; `.in` files, in ISCC syntax, describe iteration spaces and loop transformations.
- Polyhedral transformations: Domains and schedule mappings, such as the loop interchange below, are compiled to C loops encapsulating complex access schedules.

```
Domain_run := [n] -> { S[i,j] : 0 <= i < n and 0 <= j < n };
T_int := { [i,j] -> [j,i] };
codegen(T_int * Domain_run);
```
Performance is quantified using the canonical bandwidth formula

$$\mathrm{BW} = \frac{(r + w)\, s\, N}{t},$$

where $r$, $w$, $s$, $N$, and $t$ denote read operations, write operations, element size, number of elements, and execution time, respectively. Arithmetic intensity is also optionally characterized as

$$\mathrm{AI} = \frac{\text{floating-point operations}}{(r + w)\, s\, N},$$

but ASP-Bench prioritizes memory bandwidth quantification (Lakshminarasimhan et al., 2018).
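For concreteness, the two expressions above translate directly into a small helper; this is only a sketch whose variable names mirror the symbols in the formulas.

```python
def bandwidth_gb_s(reads: int, writes: int, elem_bytes: int, n: int, seconds: float) -> float:
    """Memory bandwidth in GB/s: (r + w) * s * N / t, scaled to gigabytes."""
    return (reads + writes) * elem_bytes * n / seconds / 1e9

def arithmetic_intensity(flops: float, reads: int, writes: int, elem_bytes: int, n: int) -> float:
    """FLOPs per byte moved: F / ((r + w) * s * N)."""
    return flops / ((reads + writes) * elem_bytes * n)

# Example: STREAM triad a[i] = b[i] + q * c[i] performs 2 reads, 1 write,
# and 2 flops per element; 10 million doubles in 4 ms gives 60 GB/s.
print(bandwidth_gb_s(reads=2, writes=1, elem_bytes=8, n=10_000_000, seconds=0.004))
print(arithmetic_intensity(flops=2 * 10_000_000, reads=2, writes=1, elem_bytes=8, n=10_000_000))
```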
4. Empirical Case Studies: STREAM Triad and Jacobi Stencils
Two detailed empirical studies illustrate the framework’s flexibility:
- STREAM Triad: Implemented in both the unified and independent data space templates. The independent configuration nearly doubled L1 bandwidth (≈80 GB/s) relative to the unified one by eliminating false sharing; reordering multiple streams and employing “interleaved” variants yielded further bandwidth gains (up to 1.4×).
- Multidimensional Jacobi Stencils: 1D, 2D, and 3D Jacobi kernels were generated in polyhedral form. Explicit padding to cache-line granularity eliminated false-sharing effects, yielding ≈2× improvements in L1 bandwidth. Spatial tiling (full 3D or 2D blocking) did not improve bandwidth on large-L3, many-core CPUs, indicating that more advanced temporal tiling or wavefront approaches are needed for further optimization (Lakshminarasimhan et al., 2018).
5. Benchmark for Natural Language to ASP Translation (ASP-Bench)
A distinct benchmark—also titled “ASP-Bench”—focuses on the evaluation of end-to-end systems translating natural-language (NL) problem specifications into executable Answer Set Programs (Szeider, 1 Feb 2026). It is designed for neurosymbolic engineering and automated modeling research.
- Scope: 128 instances (64 base problems, each with an easy and a hard variant), spanning diverse domains: logic puzzles, graph-theoretic tasks, scheduling, allocation, spatial and temporal reasoning, optimization, and planning.
- Input/Output: Each instance provides an NL problem statement and a JSON specification of required output, targeting solution objects such as assignments, orderings, solution costs, or move sequences.
- Solving and Verification: NL is mapped to ASP code (using the clingo Python API), answer sets are computed, solution atoms are extracted into JSON, and all solutions are validated semantically by Python-based validators (a minimal sketch of this flow follows).
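The end-to-end flow can be approximated with the clingo Python API as shown below. The toy graph-coloring program, the atom names, and the validator are illustrative assumptions, not material from the benchmark itself.

```python
import json
import clingo

# Toy stand-in for the ASP code an NL-to-ASP system would produce:
# a graph 2-coloring with a choice rule and an integrity constraint.
PROGRAM = """
node(1..3). edge(1,2). edge(2,3). color(red). color(blue).
1 { assign(N, C) : color(C) } 1 :- node(N).
:- edge(X, Y), assign(X, C), assign(Y, C).
#show assign/2.
"""

def solve_to_json(program: str) -> str:
    """Compute one answer set and serialize its shown atoms as JSON."""
    atoms = []
    ctl = clingo.Control()
    ctl.add("base", [], program)
    ctl.ground([("base", [])])
    ctl.solve(on_model=lambda m: atoms.extend(str(s) for s in m.symbols(shown=True)))
    return json.dumps({"assignment": atoms})

def validate(solution_json: str) -> bool:
    """Illustrative semantic check: adjacent nodes must receive different colors."""
    atoms = json.loads(solution_json)["assignment"]        # e.g. "assign(1,red)"
    colors = dict(a[len("assign("):-1].split(",") for a in atoms)
    return all(colors[x] != colors[y] for x, y in [("1", "2"), ("2", "3")])

print(validate(solve_to_json(PROGRAM)))
```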
6. Language Features and Reasoning Aspects in ASP-Bench
This benchmark systematically exercises the breadth of clingo-style ASP constructs (a combined example follows the list):
- Normal rules
- Choice rules
- Integrity constraints
- Aggregates and conditional literals
- Optimization (minimize/maximize)
- Frame axioms for temporal reasoning
- Recursion for constructs such as reachability
- Spatial neighborhood encodings
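The following clingo fragment, written as a Python string so it could be handed to clingo.Control exactly as in the sketch above, combines several of these constructs; the predicates and bounds are illustrative assumptions rather than material from any benchmark instance.

```python
# Illustrative clingo fragment touching several constructs from the list above;
# instance facts (item/2, edge/2, start/1, time/1, changed/2) would be supplied separately.
FEATURES = """
{ pick(I) : item(I, _) }.                          % choice rule with conditional literal
:- #sum { W, I : pick(I), item(I, W) } > 10.       % aggregate as a resource limit
#maximize { 1, I : pick(I) }.                      % optimization directive
reach(X) :- start(X).                              % recursion: reachability
reach(Y) :- reach(X), edge(X, Y).
holds(F, T+1) :- holds(F, T), not changed(F, T), time(T).   % frame axiom
"""
```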
Each hard variant instance is annotated with up to seven independent “reasoning aspects”:
| Aspect | Criterion (Π(P)) |
|---|---|
| OPT | Use of #minimize / #maximize |
| TEMP | Explicit time or ordered rules |
| DEFAULT | Soft constraints or preference rules |
| RESOURCE | Aggregates or resource limits |
| RECUR | Recursive definitions |
| SPATIAL | Grid or neighborhood modeling |
| QUANT | ≥7 integrity constraints |
This explicit factoring enables systematic analysis of modeling hardness as a function of ASP language features and problem structure (Szeider, 1 Feb 2026).
7. Automated Agentic Solution Methodology and Empirical Analysis
An autonomous modeling baseline is established using a Reason and Act (ReAct) agent framework (summarized schematically after the results below):
- Iterative loop: Reason (NL and modeling step planning) → Act (invoke clingo, test or full solve) → Observe (analyze outputs/errors) → Refine.
- Metric: Number of python_exec calls until the reference validator reports “PASS” (i.e., correct semantics).
- Results:
- Average calls: ≈4.8 for easy, ≈7.7 for hard variants.
- Hardest instances: DNA Sequence Assembly (26.0 calls), Metroidvania Generation (21.0), demonstrating that domain intricacy, not just reasoning aspect count, dominates modeling difficulty.
- Per-call time: Decreases from ≈22s (easy) to ≈15s (hard), with I/O context ratio rising from 24:1 to 45:1.
- Full saturation: Achieved for all 128 instances via multiple independent agent runs (Szeider, 1 Feb 2026).
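For clarity, the loop and its call-count metric can be summarized in the schematic Python below; `propose_asp_program`, `run_clingo`, and `reference_validator` are placeholder callables standing in for the agent's LLM step, the python_exec-style execution, and the benchmark's validator, and do not correspond to actual benchmark APIs.

```python
def solve_instance(nl_statement: str, propose_asp_program, run_clingo,
                   reference_validator, max_calls: int = 50) -> int:
    """Return the number of execution calls until the validator reports PASS."""
    feedback = ""
    for calls in range(1, max_calls + 1):
        # Reason + Act: draft or refine an ASP encoding from the NL statement
        # and the previous iteration's feedback, then execute it with clingo.
        program = propose_asp_program(nl_statement, feedback)
        solution_json, errors = run_clingo(program)   # counted like a python_exec call
        # Observe: semantic validation of the extracted JSON solution.
        if not errors and reference_validator(solution_json) == "PASS":
            return calls                              # Refine loop ends on PASS
        feedback = errors or "validator rejected the solution"
    raise RuntimeError("call budget exhausted without a PASS")
```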
Key insight: There is minimal correlation between the number of reasoning aspects and observed hardness—problems such as Nonogram, though annotated with fewer aspects, are considerably more demanding due to grid and constraint complexity.
8. Significance and Future Research Directions
ASP-Bench (as both memory and logic benchmarks) provides comprehensive, reproducible methodologies for probing system bottlenecks and end-to-end neurosymbolic modeling challenges. In memory-system studies, it enables rapid code generation, kernel evaluation, and optimization hypothesis testing for real application motifs. In natural-language logic modeling, it sets a rigorous standard for semantic as opposed to purely syntactic correctness, supporting nuanced performance diagnostics and comparative evaluation.
Proposed future directions include the deployment of prompt-optimization techniques to reduce agentic effort in NL-to-ASP translation (e.g., via DSPy), progressive prompt disclosure, benchmark extensions that increase computational and reasoning challenge, and investigations into encoding variants that optimize solver performance, bridging natural language understanding, symbolic modeling, and efficient reasoning (Lakshminarasimhan et al., 2018; Szeider, 1 Feb 2026).