Tensor Computing Interface (TCI)
- Tensor Computing Interface (TCI) is a standardized API that unifies tensor operations across heterogeneous platforms, enabling efficient tensor-network applications.
- It uses a zero-overhead design with compile-time templates, ensuring performance parity with native framework implementations on CPUs, GPUs, and supercomputers.
- The interface offers over 50 free functions and a BLAS-like xGETT for binary tensor contraction, promoting portability and seamless backend integration.
The Tensor Computing Interface (TCI) is a standardized, application-oriented API designed to abstract and unify tensor operations across heterogeneous software and hardware platforms. TCI enables portable, high-performance tensor-network (TN) applications in domains such as quantum science and machine learning. It addresses widespread deficiencies in interoperability and code portability by decoupling application logic from underlying tensor-computing frameworks (TCFs), exposing a lightweight, expressive set of core primitives for tensor manipulation, linear algebra, and network contraction. Complementing this, the xGETT interface delivers BLAS-like semantics for binary tensor contraction, providing a rigorous foundation for future TCI standardization (Sun et al., 30 Dec 2025, Hörnblad, 2024).
1. Motivation and Architectural Principles
Traditional tensor-network workflows are tightly coupled to framework-specific APIs—e.g. ITensor, Cytnx, cuTENSOR—leading to prohibitive costs in migration, maintenance, and hardware targeting. TCI mitigates these issues via three strategies:
- API Abstraction: Presents a minimal, expressive interface in standard C++ (C++17), supporting both object- and free-function paradigms for tensor manipulation.
- Zero-Overhead Design: Utilizes compile-time template resolution for all tensor traits and types, avoiding runtime virtual dispatch and aligning performance with native TCF backends.
- Cross-Platform Portability: Allows migration between CPUs, GPUs, supercomputer clusters, and emerging tensor-acceleration hardware (e.g. TPUs) by changing only typedefs and header includes (Sun et al., 30 Dec 2025).
This paradigm enables TN codes, such as those for quantum-circuit simulation or variational algorithms, to be developed independently of the computational back end.
2. Type System and Object Model
TCI introduces a unified type system that defines tensor objects and their metadata abstractly. The central element is the abstract tensor type TenT, together with a traits class tensor_traits<TenT>. The traits interface exposes:
| Member Type | Description | Example Type |
|---|---|---|
| ten_t | Concrete tensor type | Cytnx::UniTensor |
| shape_t | Bond-dimension tuple | std::vector<int> |
| elem_t | Scalar data type | float, double |
| context_handle_t | Backend resource handle | CPU/GPU structs |
All tensor coordinates, shapes, and accessors are indexed according to bond order. Tensor objects can represent dense or (future extension) block-sparse, symmetry-aware forms (Sun et al., 30 Dec 2025). Concrete types and aliases are provided by TCFs at compile time via the tci/tci.h header.
For a tensor $T$ of order $r$ with shape $(d_1, \dots, d_r)$ (bond dimensions listed in bond order), the following are defined:
- Elements accessed by index tuples $(i_1, \dots, i_r)$ with $0 \le i_k < d_k$, e.g. via get_elem.
3. Core API and Functional Categories
TCI supplies approximately fifty free functions, grouped as follows (Sun et al., 30 Dec 2025):
- Read-only Queries: order, shape, size, size_bytes, get_elem.
- Construction/Destruction: allocate, zeros, fill, random, eye, copy, move, clear.
- I/O Operations: load, save.
- Manipulation: reshape, transpose (bond permutations), expand/shrink, extract_sub, replace_sub, concatenate, stack, for_each (element traversal).
- Linear Algebra Primitives: diag, norm, normalize, scale, trace (partial), exp/inverse (on matricized blocks), contract (Einstein summation), linear_combine, svd, trunc_svd, qr/lq, eigvals/eigh.
- Miscellaneous/Debug: to_range, show, close (elementwise equality checks), convert (cross-backend), version.
The principal contraction routine contract(ctx, A, labels_A, B, labels_B, C, labels_C) supports Einstein-index labeling, enabling flexible contractions of arbitrary TN diagrams. All functions accept a context handle carrying backend-specific resources.
4. BLAS-like Interface for Binary Tensor Contraction
The xGETT interface, proposed as the nucleus of a future TCI standard, formalizes binary tensor contraction with strict semantics for mode extents, strides, contractions, and output permutations (Hörnblad, 2024).
The contraction formula is:
$$C_{\pi(\alpha_1,\dots,\alpha_p,\,\beta_1,\dots,\beta_q)} \;=\; \sum_{\gamma_1,\dots,\gamma_c} A_{\alpha_1,\dots,\alpha_p,\,\gamma_1,\dots,\gamma_c}\, B_{\gamma_1,\dots,\gamma_c,\,\beta_1,\dots,\beta_q},$$
where the $\alpha_i$ and $\beta_j$ are the free modes of A and B, the $\gamma_k$ are the contracted modes, and $\pi$ is the output permutation specified by PERM.
xGETT call signature: each argument configures the contraction:
- RANKA, RANKB: tensor orders.
- EXTA, EXTB: mode sizes.
- INCA, INCB, INCC: memory strides per mode.
- CONTS: number of contracted modes.
- CONTA, CONTB: arrays specifying which modes to contract.
- PERM: permutation for free indices in output.
- C: pointer to result tensor.
The PERM array reorders free output modes, supporting arbitrary output layouts. (Example: for $C = \sum_{\gamma} A_{\alpha_1\alpha_2\gamma} B_{\gamma\beta_1\beta_2}$ with default free-mode order $[\alpha_1, \alpha_2, \beta_1, \beta_2]$, PERM = [2, 0, 3, 1] produces output mode ordering $[\beta_1, \alpha_1, \beta_2, \alpha_2]$.)
Reference implementations use stride-driven coordinate loops and avoid packing, while high-performance variants block and pack sub-tensors for cache efficiency and vectorization (Hörnblad, 2024).
5. Back-End Integration and Portability
Each TCF provides a C++ header (tci/tci.h) and concrete tensor type mapping to TenT. TCI API calls are compiled to these types, yielding direct calls to the underlying TCF implementations (e.g. Cytnx::contract, cuTENSOR’s cuTensorNet). Context handles encapsulate resources for CPU or GPU computation, supporting hardware-specific scheduling.
TCI is designed to have negligible abstraction overhead (confirmed empirically: <2% in benchmarks), matching native TCF performance whether on Intel Xeon, NVIDIA H100, Apple M2 Ultra, or large-node supercomputing clusters (Sun et al., 30 Dec 2025).
Migrating TN code between back ends requires only a change of typedef and include path.
6. Performance Benchmarks and Implementation
Representative benchmarks include iTEBD ground-state simulations and 2dTNS-based belief propagation (Sun et al., 30 Dec 2025):
- iTEBD code for the transverse-field Ising model exhibits CPU performance scaling as $O(\chi^3)$ in the bond dimension $\chi$, the cost of its dominant SVD and contraction steps; TCI abstraction overhead is negligible compared to native APIs.
- For large bond dimensions χ, GPU implementations yield order-of-magnitude speedups under TCI compared to CPU.
- On supercomputers (Fugaku, A64FX), TCI-based codes perform identically to native implementations for sufficiently large bond dimensions, with only minimal divergence at very small sizes.
- Reference xGETT (single-threaded, stride-based) achieves 5 Gflop/s (memory-bound); optimized blocked code (TBLIS, 8 threads) reaches 100 Gflop/s (Hörnblad, 2024).
This demonstrates that abstraction via TCI (and xGETT) does not incur meaningful performance penalties.
7. Roadmap Toward a Full TCI Standard
The foundational xGETT work and the application-oriented TCI API are converging toward a robust, BLAS-like standard for multilinear algebra (Hörnblad, 2024, Sun et al., 30 Dec 2025). Key elements of this vision include:
- Naming conventions and kernel suite: e.g. SGETT/DGETT/CGETT/ZGETT for contractions, xTRANSP for bond permutation, xREDUCE for reduction, and higher-level decompositions (SVD, HOSVD, EIG).
- Blocking and packing hints: explicit or autotunable parameters guide backend implementations.
- Compatibility layers: link-time aliasing and wrapper classes for seamless legacy BLAS/Tensor migration.
- Bindings for high-level languages: Python, Julia, MATLAB, supporting JAX, PyTorch, ITensors.jl, TensorKit.jl integration.
- Advanced features: Asynchronous APIs, automatic differentiation, symmetry-aware tensors, and support for block-sparse layouts.
- Extensibility: Targeting emerging architectures and specialty hardware.
This suggests that widespread adoption of TCI will catalyze development of portable, efficient tensor computations across quantum, AI/ML, and scientific domains, paralleling the historical impact of BLAS in matrix-centric workflows.