CrossPoint: Distributed Switch Architecture

Updated 5 February 2026
  • CrossPoint is a networking architecture that uses per-crosspoint buffering and decentralized scheduling to overcome scalability and latency challenges.
  • It employs the DISQUO algorithm, which leverages local queue data and implicit message passing to maintain 100% throughput with O(1) complexity.
  • The design eliminates head-of-line blocking and output contention, making it ideal for large-scale, high-speed network fabrics.

CrossPoint refers, in high-performance networking contexts, to architectures and algorithms centered around switches with per-crosspoint buffering and distributed control. The term is anchored by the crosspoint-buffered switch, wherein each intersection of the input–output crossbar contains a small FIFO or single-cell buffer, and efficient scheduling is achieved through decentralized, message-limited algorithms. CrossPoint designs resolve the scalability, speed, and complexity barriers of earlier crossbar, input-queued, or output-queued switch fabrics by localizing both buffer resources and scheduling computations, making them especially suited for high-throughput, low-latency, large-scale switches (Ye et al., 2014).

1. Crosspoint-Buffered Switch Architecture

An $N \times N$ crosspoint-buffered switch introduces a FIFO queue or single-cell buffer at each crosspoint $(i,j)$ of the input–output fabric. Each input $i$ maintains $N$ virtual output queues $VOQ_{ij}$, one per output. Cells arriving at input $i$ for output $j$ are admitted to $VOQ_{ij}$; subject to a local schedule, such cells are moved (if possible) to the crosspoint buffer $CB_{ij}$. In each time slot, at most one cell departs or arrives per port. The per-crosspoint buffer (often $K=1$ cell) decouples input and output arbitration, enabling inputs and outputs to operate with limited coordination.
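A minimal sketch of this state organization in Python (all names are illustrative; the paper prescribes no particular implementation):

```python
from collections import deque

class CrosspointSwitch:
    """Minimal N x N crosspoint-buffered switch state (illustrative sketch).

    voq[i][j] : virtual output queue at input i for output j
    cb[i][j]  : crosspoint buffer CB_ij, holding at most K cells (K=1 here)
    """
    def __init__(self, n, k=1):
        self.n = n
        self.k = k
        self.voq = [[deque() for _ in range(n)] for _ in range(n)]
        self.cb = [[deque() for _ in range(n)] for _ in range(n)]

    def arrive(self, i, j, cell):
        # Admit an arriving cell at input i destined for output j.
        self.voq[i][j].append(cell)

    def move_to_crosspoint(self, i, j):
        # Input-side action S^I_ij: move the head-of-line cell of VOQ_ij
        # into CB_ij if the crosspoint buffer has room.
        if self.voq[i][j] and len(self.cb[i][j]) < self.k:
            self.cb[i][j].append(self.voq[i][j].popleft())
            return True
        return False

    def depart(self, j):
        # Output j serves at most one non-empty crosspoint buffer per slot;
        # this sketch simply picks the first occupied one.
        for i in range(self.n):
            if self.cb[i][j]:
                return self.cb[i][j].popleft()
        return None
```

Because each crosspoint has its own buffer, the input-side move and the output-side departure touch disjoint state and need no shared arbiter.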

Key relationships:

  • Admissibility: $\sum_j \lambda_{ij} < 1$ for every input $i$ and $\sum_i \lambda_{ij} < 1$ for every output $j$, where $\lambda_{ij}$ is the arrival rate from input $i$ to output $j$.
  • Queue evolution for each $(i,j)$:

$$Q_{ij}(n+1) = \max\left\{ Q_{ij}(n) - S^I_{ij}(n),\; 0 \right\} + A_{ij}(n)$$

where $S^I_{ij}(n)$ indicates the decision to move a cell from $VOQ_{ij}$ to $CB_{ij}$, and $A_{ij}(n)$ counts arrivals at $VOQ_{ij}$ in slot $n$.

  • Crosspoint occupancy $B_{ij}(n)$ tracks the buffer state at the end of each slot (Ye et al., 2014).
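The queue recursion above can be sketched as a one-slot update, assuming Bernoulli arrivals with rate $\lambda_{ij}$ (the function name is hypothetical):

```python
import random

def evolve_queue(q, serve, lam, rng=random):
    """One slot of the VOQ recursion:
    Q(n+1) = max(Q(n) - S, 0) + A, with A ~ Bernoulli(lam).

    q     : current queue length Q_ij(n)
    serve : S^I_ij(n) in {0, 1}, the decision to move a cell to CB_ij
    lam   : arrival rate lambda_ij
    Returns the new queue length Q_ij(n+1).
    """
    a = 1 if rng.random() < lam else 0
    return max(q - serve, 0) + a
```

Note the order of operations mirrors the equation: the departure is applied (floored at zero) before the slot's arrival is added.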

2. Scheduling Problem and DISQUO Algorithm

The main challenge is to design scheduling algorithms that guarantee:

  • 100% throughput (i.e., stability for any admissible traffic),
  • Low per-port computational complexity (ideally $O(1)$),
  • Minimal or no centralized coordination.

The DISQUO (Distributed Scheduling with Quasi-Ordered Updates) algorithm achieves these by maintaining a partial matching $X(n)$, a schedule with at most one active crosspoint per row (input) and per column (output) in every slot $n$. The schedule is updated in three phases:

Algorithmic Steps:

  1. Global Coordination via Local Randomness: All ports compute the same pseudo-random permutation $H(n)$, identifying which crosspoints may be updated.
  2. Local Updates: Inputs and outputs update their selection of crosspoints in $X(n)$ using local queue lengths $Q_{ij}(n)$, previous buffer occupancies $B_{ij}(n-1)$, and activation probabilities $p_{ij} = \exp(W_{ij}) / (1 + \exp(W_{ij}))$ with $W_{ij} = f(Q_{ij}(n))$.
  3. Implicit Message Passing: No direct message exchange is required. The buffer state (occupied or vacant) acts as a handshake, revealing the peer's intentions via the physical buffer itself.

Pseudocode (input side, simplified):

```
if (i,j) in X(n-1):                        # crosspoint was scheduled last slot
    X_ij(n) = 1 with prob p_ij, else 0
elif no other X_ij'(n-1) = 1 at input i:   # input i was otherwise idle
    if CB_ij was empty:
        X_ij(n) = 1 with prob p_ij
    else:
        X_ij(n) = 0
```

The output side mirrors this logic, using observed arrivals to infer the input's actions. The weight function $f(x) = \log(1+x)$ is recommended (Ye et al., 2014).
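The input-side update can be rendered as a short, illustrative Python sketch; the function names and boolean arguments are hypothetical, chosen only to mirror the pseudocode:

```python
import math
import random

def weight(q):
    # Recommended weight function f(x) = log(1 + x).
    return math.log(1 + q)

def activation_prob(q):
    # p_ij = exp(W_ij) / (1 + exp(W_ij)), a logistic function of the weight.
    w = weight(q)
    return math.exp(w) / (1 + math.exp(w))

def input_side_update(was_scheduled, input_idle, cb_was_empty, q, rng=random):
    """One DISQUO input-side decision for crosspoint (i, j), simplified.

    was_scheduled : (i, j) was in the matching X(n-1)
    input_idle    : no other crosspoint of input i was active in X(n-1)
    cb_was_empty  : observed 1-bit state of CB_ij (the implicit handshake)
    q             : local queue length Q_ij(n)
    Returns X_ij(n) in {0, 1}.
    """
    p = activation_prob(q)
    if was_scheduled:
        return 1 if rng.random() < p else 0
    if input_idle and cb_was_empty:
        return 1 if rng.random() < p else 0
    return 0
```

Note that the only non-local information consulted is the 1-bit crosspoint-buffer state, consistent with the implicit-handshake design.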

3. Throughput and Stability Analysis

The scheduling process $X(n)$ constitutes a Markov chain on the set of partial matchings. Its instantaneous stationary distribution takes the product form

$$\pi_n(X) = \frac{1}{Z_n} \exp\left( \sum_{(i,j)\in X} W_{ij}(n) \right)$$

where $Z_n$ is a normalizing constant. Mixing-time arguments (Glauber dynamics) and Lyapunov drift show that, even though the weights $W_{ij}(n)$ evolve, when queues are large the empirical distribution concentrates around high-weight matchings.
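For a small switch, the product-form distribution can be enumerated directly. The following sketch (illustrative names; brute-force enumeration is only practical for tiny $N$) computes $\pi(X)$ over all partial matchings of a $2 \times 2$ crossbar:

```python
import math
from itertools import product

def partial_matchings(n):
    """Enumerate all partial matchings of an n x n crossbar:
    sets of crosspoints with at most one per input and per output."""
    crosspoints = [(i, j) for i in range(n) for j in range(n)]
    for bits in product([0, 1], repeat=n * n):
        m = [cp for cp, b in zip(crosspoints, bits) if b]
        rows = [i for i, _ in m]
        cols = [j for _, j in m]
        if len(set(rows)) == len(rows) and len(set(cols)) == len(cols):
            yield frozenset(m)

def product_form(weights, n):
    """pi(X) proportional to exp(sum of W_ij over (i,j) in X),
    normalized by the partition function Z."""
    terms = {m: math.exp(sum(weights[ij] for ij in m))
             for m in partial_matchings(n)}
    z = sum(terms.values())
    return {m: v / z for m, v in terms.items()}
```

With large weights on a diagonal, the distribution concentrates on the full matching containing those crosspoints, which is exactly the concentration behavior the mixing-time argument exploits.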

A sufficient condition for 100% throughput is that, in most slots, the selected matching $X(n)$ satisfies

$$\sum_{(i,j)\in X(n)} W_{ij}(n) \geq (1-\epsilon)\, W^*(n)$$

where $W^*(n)$ is the maximum weight over all feasible matchings. Under this condition, classic arguments (Tassiulas–Ephremides) guarantee stability: $\sup_n \mathbb{E}\left[\sqrt{\sum_{i,j} Q_{ij}(n)^2}\right] < \infty$ (Ye et al., 2014).
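For the classic linear-weight case $W_{ij}(n) = Q_{ij}(n)$, the stability claim rests on a standard quadratic-Lyapunov drift calculation, sketched here (this is the textbook argument, not a derivation specific to the paper):

```latex
% Quadratic Lyapunov function over all VOQs
V(n) = \sum_{i,j} Q_{ij}(n)^{2}

% One-step expected drift; C collects bounded second-moment terms
% of the arrival and service processes
\mathbb{E}\!\left[ V(n+1) - V(n) \mid Q(n) \right]
  \;\le\; C \;+\; 2 \sum_{i,j} Q_{ij}(n)
          \left( \lambda_{ij} - \mathbb{E}\!\left[ S_{ij}(n) \mid Q(n) \right] \right)

% If X(n) achieves weight at least (1 - \epsilon) W^*(n) and the rates
% \lambda_{ij} are admissible, the second term is strictly negative
% whenever queues are large, so the drift is negative outside a bounded
% set and the Foster--Lyapunov criterion yields stability.
```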

4. Complexity and Message-Passing

DISQUO realizes per-port $O(1)$ complexity: each port updates just one crosspoint, computes a local activation probability, and checks a 1-bit buffer state per slot. No global matching computation, with its $O(N^3)$ complexity, is required. All communication is implicit: inputs and outputs interpret the buffer state (empty/full) as a handshake, entirely avoiding explicit message exchange. This is enabled by all ports sharing a synchronized random seed to generate $H(n)$. Optionally, ports may broadcast a coarse quantization of $Q_{\max}$ infrequently for rigorous threshold setting; in practice, this can be omitted without causing instability (Ye et al., 2014).
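The synchronized-seed mechanism can be illustrated with a small sketch: every port independently derives the same slot permutation $H(n)$ from a shared seed, so no coordination message is needed. The seeding scheme below is a hypothetical example, not the paper's construction:

```python
import random

def shared_permutation(seed, slot, n):
    """Derive the slot's pseudo-random permutation H(n).

    Every port computes this independently; because all ports hold the
    same `seed` and agree on the slot index, they obtain identical
    permutations with zero message exchange (illustrative sketch).
    """
    rng = random.Random(seed * 1_000_003 + slot)  # mix seed and slot into one int
    perm = list(range(n))
    rng.shuffle(perm)
    return perm
```

Only the seed distribution happens once, at configuration time; afterwards each slot's $H(n)$ is pure local computation.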

5. Comparison with Prior Scheduling and Switch Designs

The crosspoint-buffered switch with DISQUO contrasts sharply with previous approaches:

| Approach | Complexity | Centralization | Throughput | Buffer/Speedup |
|---|---|---|---|---|
| Centralized MWM | $O(N^3)$ | Yes | 100% | No special buffer, no speedup |
| CIOQ w/ speedup | $O(N)$ | Yes | 100% | Needs speedup ($>1$), more buffers |
| Maximal MWM via VOQs | High | Yes | 100% | Centralized, high per-slot cost |
| CSMA-like wireless | $O(1)$ | Distributed | 100% | Relies on carrier sensing |
| RR-RR, LQF-RR, RR-SBF, DRRM | $O(1)$ | Distributed | <100% | Needs speedup under nonuniform load |
| CrossPoint (DISQUO) | $O(1)$ | Distributed | 100% | $K=1$ crosspoint buffer, no speedup |

DISQUO is the first distributed, message-passing-free scheduler to provide 100% throughput with a single buffer per crosspoint and no increase in switch speed (Ye et al., 2014).

6. Significance and Impact

By shifting all buffering and scheduling to crosspoint-local resources, CrossPoint architectures eliminate head-of-line blocking and output contention without the overhead and bottlenecks of global matching or speedup. They admit full spatial parallelism and lend themselves to efficient VLSI or high-speed hardware realization, supporting strict throughput and stability even in the presence of dynamic, nonuniform traffic.

This approach directly addresses the core bottleneck in switch scaling—namely, how to achieve maximal utilization with minimal state, localized control, and no centralized bottleneck or per-flow locking—in large, multi-terabit switch fabrics. Research subsequent to (Ye et al., 2014) and related technical reports has extended these concepts to more generalized crosspoint architectures, deeper buffering hierarchies, advanced scheduling policies (e.g., deflection, in-order delivery), and hybrid memory/storage integration.

7. Future Directions and Generalizations

Current directions include:

  • Crosspoint architectures for photonic and optical switching matrices, where distributed spatial and wavelength-selective switches exploit similar topologies (see space-wavelength crosspoint networks).
  • Crosspoint-queued switches for in-network compute fabrics, which integrate deep-buffered, distributed logic for packet processing and flexible function execution.
  • Integration with machine learning accelerators, where crosspoint-based in-memory compute schemes exploit the same localized, parallel architecture for linear algebra operations.
  • Extending robustness to non-Bernoulli, bursty, or adversarial traffic models, and exploring minimal buffer sizing for strict stochastic guarantees.

Research continues to address device-level scaling, including ultra-dense memristor-based crosspoint cells, 3D integration, and hybrid analog-digital schemes, directly inheriting the architectural principles established in crosspoint-buffered switch scheduling (Ye et al., 2014).

References (1)
