Tensor Computing Interface (TCI)

Updated 4 January 2026
  • Tensor Computing Interface (TCI) is a standardized API that unifies tensor operations across heterogeneous platforms, enabling efficient tensor-network applications.
  • It uses a zero-overhead design with compile-time templates, ensuring performance parity with native framework implementations on CPUs, GPUs, and supercomputers.
  • The interface offers over 50 free functions and a BLAS-like xGETT for binary tensor contraction, promoting portability and seamless backend integration.

The Tensor Computing Interface (TCI) is a standardized, application-oriented API designed to abstract and unify tensor operations across heterogeneous software and hardware platforms. TCI enables portable, high-performance tensor-network (TN) applications in domains such as quantum science and machine learning. It addresses widespread deficiencies in interoperability and code portability by decoupling application logic from underlying tensor-computing frameworks (TCFs), exposing a lightweight, expressive set of core primitives for tensor manipulation, linear algebra, and network contraction. Complementing this, the xGETT interface delivers BLAS-like semantics for binary tensor contraction, providing a rigorous foundation for future TCI standardization (Sun et al., 30 Dec 2025, Hörnblad, 2024).

1. Motivation and Architectural Principles

Traditional tensor-network workflows are tightly coupled to framework-specific APIs—e.g. ITensor, Cytnx, cuTENSOR—leading to prohibitive costs in migration, maintenance, and hardware targeting. TCI mitigates these issues via three strategies:

  • API Abstraction: Presents a minimal, expressive interface in standard C++ (C++17), supporting both object- and free-function paradigms for tensor manipulation.
  • Zero-Overhead Design: Utilizes compile-time template resolution for all tensor traits and types, avoiding runtime virtual dispatch and aligning performance with native TCF backends.
  • Cross-Platform Portability: Allows migration between CPUs, GPUs, supercomputer clusters, and emerging tensor-acceleration hardware (e.g. TPUs) by changing only typedefs and header includes (Sun et al., 30 Dec 2025).

This paradigm enables TN codes, such as those for quantum-circuit simulation or variational algorithms, to be developed independently of the computational back end.

2. Type System and Object Model

TCI introduces a unified type system that defines tensor objects and their metadata abstractly. The central element is the abstract tensor type TenT, together with a traits class tensor_traits<TenT>. The traits interface exposes:

| Member Type | Description | Example Type |
|---|---|---|
| `ten_t` | Concrete tensor type | `Cytnx::UniTensor` |
| `shape_t` | Bond-dimension tuple | `std::vector<int>` |
| `elem_t` | Scalar data type | `float`, `double` |
| `context_handle_t` | Backend resource handle | CPU/GPU structs |

All tensor coordinates, shapes, and accessors are indexed according to bond order. Tensor objects can represent dense or (future extension) block-sparse, symmetry-aware forms (Sun et al., 30 Dec 2025). Concrete types and aliases are provided by TCFs at compile time via the tci/tci.h header.
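As a concrete illustration of the traits mechanism, the sketch below shows how a backend might specialize `tensor_traits` for its tensor type. The `DenseTensor` and `CpuContext` types are hypothetical stand-ins invented here for illustration; only the member-type names (`ten_t`, `shape_t`, `elem_t`, `context_handle_t`) come from the TCI design.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical dense tensor type a TCF might expose (illustrative only).
struct DenseTensor {
    std::vector<std::size_t> shape;  // bond dimensions, in bond order
    std::vector<double>      data;   // row-major element storage
};

// Backend resource handle (e.g. stream or device queue); empty on a CPU backend.
struct CpuContext {};

// Primary template left undefined: only TCF-provided specializations compile.
template <typename TenT> struct tensor_traits;

// Specialization mapping the abstract member types onto the toy backend.
template <> struct tensor_traits<DenseTensor> {
    using ten_t            = DenseTensor;
    using shape_t          = std::vector<std::size_t>;
    using elem_t           = double;
    using context_handle_t = CpuContext;
};

// Everything resolves at compile time -- no runtime virtual dispatch.
static_assert(std::is_same_v<tensor_traits<DenseTensor>::elem_t, double>);
```

Because the traits are resolved by the compiler, code templated on `TenT` incurs no dispatch cost, which is the basis of the zero-overhead claim.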

For a tensor $A \in \mathbb{K}^{d_0 \times \cdots \times d_{r-1}}$, the following are defined:

  • $\mathrm{order}(A) = r$
  • $\mathrm{shape}(A) = (d_0, \dots, d_{r-1})$
  • Elements are accessed by a multi-index $\mathbf{i} = (i_0, \dots, i_{r-1})$

3. Core API and Functional Categories

TCI supplies approximately fifty free functions, grouped as follows (Sun et al., 30 Dec 2025):

  • Read-only Queries: order, shape, size, size_bytes, get_elem.
  • Construction/Destruction: allocate, zeros, fill, random, eye, copy, move, clear.
  • I/O Operations: load, save.
  • Manipulation: reshape, transpose (bond permutations), expand/shrink, extract_sub, replace_sub, concatenate, stack, for_each (element traversal).
  • Linear Algebra Primitives: diag, norm, normalize, scale, trace (partial), exp/inverse (on matricized blocks), contract (Einstein summation), linear_combine, svd, trunc_svd, qr/lq, eigvals/eigh.
  • Miscellaneous/Debug: to_range, show, close (elementwise equality checks), convert (cross-backend), version.

The principal contraction routine contract(ctx, A, labels_A, B, labels_B, C, labels_C) supports Einstein-index labeling, enabling flexible contractions of arbitrary TN diagrams. All functions accept a context handle carrying backend-specific resources.
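To make the label semantics concrete, here is a minimal sketch of Einstein-index contraction restricted to order-2 operands. The `Ten2` type, the `contract` overload without a context handle or explicit output labels, and the single-character labels are all simplifications invented for this example; the real TCI `contract` takes a context handle and output labels and works on arbitrary orders.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy order-2 tensor: shape (d0, d1), row-major storage.
struct Ten2 {
    std::size_t d0 = 0, d1 = 0;
    std::vector<double> data;
    double& at(std::size_t i, std::size_t j)       { return data[i * d1 + j]; }
    double  at(std::size_t i, std::size_t j) const { return data[i * d1 + j]; }
};

// Sketch of label-driven contraction: the one label shared by labsA and
// labsB is summed over; the remaining free labels become the output bonds
// in (free-of-A, free-of-B) order.
Ten2 contract(const Ten2& A, std::array<char, 2> labsA,
              const Ten2& B, std::array<char, 2> labsB) {
    // Locate the contracted label in each operand.
    std::size_t ka = (labsA[0] == labsB[0] || labsA[0] == labsB[1]) ? 0 : 1;
    std::size_t kb = (labsB[0] == labsA[ka]) ? 0 : 1;
    assert(labsA[ka] == labsB[kb] && "operands must share exactly one label");

    std::size_t K  = (ka == 0) ? A.d0 : A.d1;  // contracted extent
    std::size_t Fa = (ka == 0) ? A.d1 : A.d0;  // free extent of A
    std::size_t Fb = (kb == 0) ? B.d1 : B.d0;  // free extent of B

    Ten2 C{Fa, Fb, std::vector<double>(Fa * Fb, 0.0)};
    for (std::size_t i = 0; i < Fa; ++i)
        for (std::size_t j = 0; j < Fb; ++j)
            for (std::size_t k = 0; k < K; ++k)
                C.at(i, j) += (ka == 0 ? A.at(k, i) : A.at(i, k)) *
                              (kb == 0 ? B.at(k, j) : B.at(j, k));
    return C;
}
```

With these conventions, `contract(A, {'i','k'}, B, {'k','j'})` computes $C_{ij} = \sum_k A_{ik} B_{kj}$, i.e. an ordinary matrix product, while other label placements yield the corresponding transposed contractions without any explicit permutation by the caller.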

4. BLAS-like Interface for Binary Tensor Contraction

The xGETT interface, proposed as the nucleus of a future TCI standard, formalizes binary tensor contraction with strict semantics for mode extents, strides, contractions, and output permutations (Hörnblad, 2024).

The contraction formula is:

$$C_{i_1 \cdots i_r\, j_1 \cdots j_s} = \sum_{k_1=1}^{K_1} \cdots \sum_{k_t=1}^{K_t} A_{i_1 \cdots i_r\, k_1 \cdots k_t} \times B_{k_1 \cdots k_t\, j_1 \cdots j_s}$$

The xGETT call signature is:

`xGETT(RANKA, EXTA, INCA, A, RANKB, EXTB, INCB, B, CONTS, CONTA, CONTB, PERM, INCC, C)`

Each argument configures the contraction:

  • RANKA, RANKB: tensor orders.
  • EXTA, EXTB: mode sizes.
  • INCA, INCB, INCC: memory strides per mode.
  • CONTS: number of contracted modes.
  • CONTA, CONTB: arrays specifying which modes to contract.
  • PERM: permutation for free indices in output.
  • C: pointer to result tensor.

The PERM array reorders free output modes, supporting arbitrary output layouts. (Example: for $r = 2$, $s = 2$, PERM = [2, 0, 3, 1] produces output mode ordering [β₁, α₁, β₂, α₂].)

Reference implementations use stride-driven coordinate loops and avoid packing, while high-performance variants block and pack sub-tensors for cache efficiency and vectorization (Hörnblad, 2024).
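The stride-driven reference approach can be sketched as follows. This is not the xGETT specification itself but a simplified double-precision kernel in its spirit: `dgett_ref` and `next_coord` are names invented here, PERM is fixed to the identity, and extents/strides are passed as vectors rather than raw Fortran-style arrays.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Odometer-style increment of a multi-index over the given extents;
// returns false after the last coordinate has been visited.
bool next_coord(std::vector<std::size_t>& idx,
                const std::vector<std::size_t>& ext) {
    for (std::size_t m = 0; m < ext.size(); ++m) {
        if (++idx[m] < ext[m]) return true;
        idx[m] = 0;
    }
    return false;
}

// Simplified stride-driven reference kernel in the spirit of DGETT.
// Mode m of A has extent extA[m] and stride incA[m]; contA/contB pair up
// the contracted modes; the free modes of A, then of B, index C with
// strides incC (identity PERM).
void dgett_ref(const std::vector<std::size_t>& extA,
               const std::vector<std::size_t>& incA, const double* A,
               const std::vector<std::size_t>& extB,
               const std::vector<std::size_t>& incB, const double* B,
               const std::vector<std::size_t>& contA,
               const std::vector<std::size_t>& contB,
               const std::vector<std::size_t>& incC, double* C) {
    auto contains = [](const std::vector<std::size_t>& v, std::size_t m) {
        for (std::size_t x : v) if (x == m) return true;
        return false;
    };
    // Split each operand's modes into free and contracted ones.
    std::vector<std::size_t> freeA, freeB, extFA, extFB, extK;
    for (std::size_t m = 0; m < extA.size(); ++m)
        if (!contains(contA, m)) { freeA.push_back(m); extFA.push_back(extA[m]); }
    for (std::size_t m = 0; m < extB.size(); ++m)
        if (!contains(contB, m)) { freeB.push_back(m); extFB.push_back(extB[m]); }
    for (std::size_t t : contA) extK.push_back(extA[t]);

    // Pure coordinate loops, no packing: offsets are computed from strides.
    std::vector<std::size_t> ia(freeA.size(), 0);
    do {
        std::vector<std::size_t> ib(freeB.size(), 0);
        do {
            double acc = 0.0;
            std::vector<std::size_t> ik(extK.size(), 0);
            do {
                std::size_t offA = 0, offB = 0;
                for (std::size_t m = 0; m < freeA.size(); ++m) offA += ia[m] * incA[freeA[m]];
                for (std::size_t m = 0; m < freeB.size(); ++m) offB += ib[m] * incB[freeB[m]];
                for (std::size_t t = 0; t < extK.size(); ++t) {
                    offA += ik[t] * incA[contA[t]];
                    offB += ik[t] * incB[contB[t]];
                }
                acc += A[offA] * B[offB];
            } while (next_coord(ik, extK));
            std::size_t offC = 0;
            for (std::size_t m = 0; m < freeA.size(); ++m) offC += ia[m] * incC[m];
            for (std::size_t m = 0; m < freeB.size(); ++m) offC += ib[m] * incC[freeA.size() + m];
            C[offC] = acc;
        } while (next_coord(ib, extFB));
    } while (next_coord(ia, extFA));
}
```

Because offsets are derived purely from strides, the same loop nest handles any layout (row-major, column-major, or strided sub-tensors); the cost, as noted above, is that an untiled kernel like this is memory-bound.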

5. Back-End Integration and Portability

Each TCF provides a C++ header (tci/tci.h) and concrete tensor type mapping to TenT. TCI API calls are compiled to these types, yielding direct calls to the underlying TCF implementations (e.g. Cytnx::contract, cuTENSOR’s cuTensorNet). Context handles encapsulate resources for CPU or GPU computation, supporting hardware-specific scheduling.

TCI is designed to have negligible abstraction overhead (confirmed empirically: <2% in benchmarks), matching native TCF performance whether on Intel Xeon, NVIDIA H100, Apple M2 Ultra, or large-node supercomputing clusters (Sun et al., 30 Dec 2025).

Migrating TN code between back ends requires only a change of typedef and include path.
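A minimal sketch of this migration mechanism, with two toy backends standing in for real TCF headers (the namespace names, `Tensor` types, and `total_elems` helper are invented for illustration; only the `TenT` alias and free-function style come from TCI):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two toy backends exposing the same compile-time interface. In real TCI
// these would come from different tci/tci.h headers provided by each TCF.
namespace backend_cpu {
    struct Tensor { std::vector<double> data; };
    inline std::size_t size(const Tensor& t) { return t.data.size(); }
}
namespace backend_gpu {  // stand-in; a real backend would manage device memory
    struct Tensor { std::vector<double> data; };
    inline std::size_t size(const Tensor& t) { return t.data.size(); }
}

// The only line that changes when migrating back ends:
using TenT = backend_cpu::Tensor;

// Application code is written once against TenT; argument-dependent lookup
// binds the backend's free functions at compile time, with no virtual dispatch.
std::size_t total_elems(const std::vector<TenT>& tensors) {
    std::size_t n = 0;
    for (const auto& t : tensors) n += size(t);
    return n;
}
```

Switching the alias to `backend_gpu::Tensor` retargets `total_elems` (and every other call site) without touching application logic, which is the portability model the section describes.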

6. Performance Benchmarks and Implementation

Representative benchmarks include iTEBD ground-state simulations and 2dTNS-based belief propagation (Sun et al., 30 Dec 2025):

  • iTEBD code for the transverse-field Ising model exhibits CPU performance scaling as $\chi^3$; TCI abstraction overhead is negligible compared to native APIs.
  • For large bond dimensions ($\chi > 1000$), GPU implementations under TCI yield order-of-magnitude speedups over CPU.
  • On supercomputers (Fugaku, with A64FX CPUs), TCI-based codes perform identically to native implementations for $\chi > 64$, with only minimal divergence at very small sizes.
  • The reference xGETT (single-threaded, stride-based) achieves 5 Gflop/s (memory-bound); an optimized blocked implementation (TBLIS, 8 threads) reaches over 100 Gflop/s (Hörnblad, 2024).

This demonstrates that abstraction via TCI (and xGETT) does not incur meaningful performance penalties.

7. Roadmap Toward a Full TCI Standard

The foundational xGETT work and the application-oriented TCI API are converging toward a robust, BLAS-like standard for multilinear algebra (Hörnblad, 2024, Sun et al., 30 Dec 2025). Key elements of this vision include:

  • Naming conventions and kernel suite: e.g. SGETT/DGETT/CGETT/ZGETT for contractions, xTRANSP for bond permutation, xREDUCE for reduction, and higher-level decompositions (SVD, HOSVD, EIG).
  • Blocking and packing hints: explicit or autotunable parameters guide backend implementations.
  • Compatibility layers: link-time aliasing and wrapper classes for seamless legacy BLAS/Tensor migration.
  • Bindings for high-level languages: Python, Julia, MATLAB, supporting JAX, PyTorch, ITensors.jl, TensorKit.jl integration.
  • Advanced features: Asynchronous APIs, automatic differentiation, symmetry-aware tensors, and support for block-sparse layouts.
  • Extensibility: Targeting emerging architectures and specialty hardware.

This suggests that widespread adoption of TCI will catalyze development of portable, efficient tensor computations across quantum, AI/ML, and scientific domains, paralleling the historical impact of BLAS in matrix-centric workflows.
