CrossPoint: Distributed Switch Architecture
- CrossPoint is a networking architecture that uses per-crosspoint buffering and decentralized scheduling to overcome scalability and latency challenges.
- It employs the DISQUO algorithm, which leverages local queue lengths and implicit signaling through crosspoint-buffer occupancy to sustain 100% throughput with O(1) per-port complexity.
- The design eliminates head-of-line blocking and output contention, making it ideal for large-scale, high-speed network fabrics.
CrossPoint refers, in high-performance networking contexts, to architectures and algorithms centered around switches with per-crosspoint buffering and distributed control. The term is anchored by the crosspoint-buffered switch, wherein each intersection of the input–output crossbar contains a small FIFO or single-cell buffer, and efficient scheduling is achieved through decentralized, message-limited algorithms. CrossPoint designs resolve the scalability, speed, and complexity barriers of earlier crossbar, input-queued, or output-queued switch fabrics by localizing both buffer resources and scheduling computations, making them especially suited for high-throughput, low-latency, large-scale switches (Ye et al., 2014).
1. Crosspoint-Buffered Switch Architecture
A crosspoint-buffered switch introduces a FIFO queue or single-cell buffer at each crosspoint of the input–output fabric. Each input i maintains virtual output queues VOQ_ij, one per output j. Cells arriving at input i for output j are admitted to VOQ_ij; subject to a local schedule, such cells are moved (if possible) to the crosspoint buffer CB_ij. In each time slot, at most one cell departs or arrives per port. The per-crosspoint buffer (often a single cell) decouples the input and output arbitration, enabling inputs and outputs to operate with limited coordination.
Key relationships:
- Admissibility: Σ_j λ_ij < 1 for all i, and Σ_i λ_ij < 1 for all j (arrival rates λ_ij).
- Queue evolution for each VOQ_ij: Q_ij(n) = Q_ij(n−1) − S_ij(n) + A_ij(n), where S_ij(n) ∈ {0, 1} is the decision to move a cell from VOQ_ij to CB_ij and A_ij(n) counts arrivals.
- Crosspoint occupancy CB_ij(n) ∈ {0, 1} tracks the buffer state at the end of each slot (Ye et al., 2014).
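The queue-evolution recursion above can be exercised with a short sketch. The names Q, CB, A, and S mirror the text; the code is illustrative, not the paper's implementation, and departures from CB_ij to the output line are not modeled:

```python
def step(Q, CB, arrival, schedule):
    """One slot of VOQ/crosspoint dynamics for a single (i, j) pair.

    Q        -- VOQ_ij backlog at the start of the slot
    CB       -- crosspoint buffer occupancy (0 or 1)
    arrival  -- A_ij(n): 1 if a cell arrives at input i for output j
    schedule -- S_ij(n): 1 if the schedule asks to move a cell to CB_ij
    """
    # A move needs a waiting cell and a vacant crosspoint buffer.
    moved = 1 if (schedule and Q > 0 and CB == 0) else 0
    Q = Q - moved + arrival        # Q_ij(n) = Q_ij(n-1) - S_ij(n) + A_ij(n)
    CB = CB + moved                # the moved cell now waits at the crosspoint
    return Q, CB

Q, CB = step(Q=3, CB=0, arrival=1, schedule=1)
print(Q, CB)   # 3 1: one cell moved to the crosspoint, one new cell arrived
```

Note how an occupied buffer (CB = 1) silently blocks further moves; the DISQUO scheduler later exploits exactly this 1-bit state as its handshake.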
2. Scheduling Problem and DISQUO Algorithm
The main challenge is to design scheduling algorithms that guarantee:
- 100% throughput (i.e., stability for any admissible traffic),
- Low per-port computational complexity (ideally O(1)),
- Minimal or no centralized coordination.
The DISQUO (Distributed Scheduling with Quasi-Ordered Updates) algorithm achieves these by maintaining a partial matching X(n): a schedule with at most one active crosspoint per row (input) and per column (output) at every slot n. The schedule is updated in three phases:
Algorithmic Steps:
- Global Coordination via Local Randomness: All ports compute the same pseudo-random permutation from a shared seed, identifying which crosspoints may be updated in slot n.
- Local Updates: Inputs and outputs update their selection of crosspoints in X(n) using local queue lengths Q_ij(n), previous buffer occupancies CB_ij(n−1), and activation probabilities p_ij with 0 < p_ij < 1.
- Implicit Message Passing: No direct message passing is required. The buffer state (occupied or vacant) acts as a handshake, revealing the peer's intentions via the physical buffer itself.
Pseudocode (input side, simplified):

```
if (i, j) in X(n-1):
    X_ij(n) = 1 with prob p_ij, else 0
elif no other X_ij'(n-1) = 1 at input i:
    if CB_ij was empty in the previous slot:
        X_ij(n) = 1 with prob p_ij
    else:
        X_ij(n) = 0
```
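A runnable version of the input-side rule, under the assumed representation of the matching X as a dict mapping each input to its active output; the probability p_ij and the buffer flag are illustrative stand-ins:

```python
import random

def input_update(i, j, X_prev, cb_empty_prev, p_ij, rng):
    """Decide X_ij(n) for the crosspoint (i, j) selected this slot.

    X_prev        -- previous matching X(n-1) as {input: output}
    cb_empty_prev -- True if CB_ij was empty at the end of the last slot
    p_ij          -- activation probability, 0 < p_ij < 1 (here a free parameter)
    """
    if X_prev.get(i) == j:              # (i, j) was active in X(n-1)
        return 1 if rng.random() < p_ij else 0
    if i not in X_prev:                 # no other crosspoint of input i is active
        if cb_empty_prev:               # vacant buffer: the output side concurs
            return 1 if rng.random() < p_ij else 0
        return 0
    return 0                            # input i already busy with another output

rng = random.Random(7)
X_prev = {0: 2}                         # input 0 currently matched to output 2
print(input_update(1, 3, X_prev, False, 0.8, rng))  # 0: occupied buffer vetoes
print(input_update(0, 1, X_prev, True, 0.8, rng))   # 0: input 0 already matched
```

Only the first branch and the empty-buffer branch ever consult the random source, matching the pseudocode's "with prob p_ij" steps.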
3. Throughput and Stability Analysis
The scheduling process constitutes a Markov chain on the set of partial matchings. Its instantaneous stationary distribution takes the product form π(X) ∝ exp( Σ_{(i,j)∈X} W_ij ), where the weight W_ij is an increasing function of the queue length Q_ij. Mixing-time arguments (Glauber dynamics) and Lyapunov drift show that even though the queue state evolves, when queues are large the empirical distribution concentrates around high-weight matchings.
A sufficient condition for 100% throughput is that, in most slots, the selected matching satisfies W(X(n)) ≥ (1 − ε) W*(n), where W*(n) is the maximum of Σ_{(i,j)∈X'} W_ij(n) over all feasible matchings X'. Under this condition, classic arguments (Tassiulas–Ephremides) guarantee stability of all queues, i.e., lim sup_{n→∞} E[ Σ_{i,j} Q_ij(n) ] < ∞ (Ye et al., 2014).
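The (1 − ε) condition can be illustrated on a toy switch by brute-forcing the maximum-weight matching; the weights below are hypothetical stand-ins for queue-derived W_ij, and a real distributed scheduler never performs this global computation:

```python
from itertools import permutations

def weight(matching, W):
    """Total weight of a set of (input, output) pairs."""
    return sum(W[i][j] for i, j in matching)

def max_weight(W):
    """W*: maximum weight over all full matchings of an N x N switch."""
    n = len(W)
    return max(weight(list(enumerate(p)), W) for p in permutations(range(n)))

W = [[3, 1, 0],
     [0, 4, 2],
     [1, 0, 5]]                   # hypothetical weights, e.g. f(Q_ij)
X = [(0, 0), (1, 1), (2, 2)]      # the schedule the scheduler happened to pick
eps = 0.1
print(weight(X, W), max_weight(W))                 # 12 12
print(weight(X, W) >= (1 - eps) * max_weight(W))   # True
```

Brute force over all N! matchings is only feasible here because N = 3; the point of DISQUO is to approach W* without ever computing it.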
4. Complexity and Message-Passing
DISQUO realizes O(1) per-port complexity: each port updates just one crosspoint, computes a local activation probability, and checks a 1-bit buffer state per slot. No global matching computation (e.g., O(N^3) for maximum-weight matching) occurs. All communication is implicit: the inputs and outputs interpret the buffer state (empty/full) as a handshake, entirely avoiding explicit message exchange. This is achieved by all ports sharing a synchronized random seed to generate the per-slot permutation. Optionally, ports may broadcast a coarse quantization of their queue lengths infrequently for rigorous threshold setting; in practice, this can be omitted without causing instability (Ye et al., 2014).
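The shared-seed mechanism can be sketched as follows; the seeding scheme (mixing a base seed with the slot number) is an assumption for illustration:

```python
import random

def slot_permutation(slot, n, base_seed=1234):
    """Permutation every port derives independently for a given slot."""
    rng = random.Random(base_seed * 100003 + slot)  # identical seed at all ports
    perm = list(range(n))
    rng.shuffle(perm)                                # deterministic given the seed
    return perm

# Two "ports" computing independently agree, with no message exchanged:
print(slot_permutation(41, 8) == slot_permutation(41, 8))   # True
print(sorted(slot_permutation(41, 8)) == list(range(8)))    # True: valid permutation
```

Because the permutation is a pure function of the slot number and a pre-distributed seed, "global coordination" costs zero bits on the wire.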
5. Comparison with Prior Scheduling and Switch Designs
The crosspoint-buffered switch with DISQUO contrasts sharply with previous approaches:
| Approach | Complexity | Centralization | Throughput | Buffer/Speedup |
|---|---|---|---|---|
| Centralized MWM | O(N^3) | Centralized | 100% | No special buffer, no speedup |
| CIOQ w/ speedup | High | Centralized | 100% | Needs speedup (≥2), more buffers |
| Maximal matching via VOQs | High | Centralized | 100% | High per-slot cost |
| CSMA-like wireless | O(1) | Distributed | 100% | Relies on carrier sensing |
| RR-RR, LQF-RR, RR-SBF, DRRM | Low | Distributed | <100% | Needs speedup under nonuniform load |
| CrossPoint (DISQUO) | O(1) | Distributed | 100% | 1-cell crosspoint buffer, no speedup |
DISQUO is the first distributed, message-passing-free scheduler to provide 100% throughput with a single buffer per crosspoint and no increase in switch speed (Ye et al., 2014).
6. Significance and Impact
By shifting all buffering and scheduling to crosspoint-local resources, CrossPoint architectures eliminate head-of-line blocking and output contention without the overhead and bottlenecks of global matching or speedup. They admit full spatial parallelism and lend themselves to efficient VLSI or high-speed hardware realization, supporting strict throughput and stability even in the presence of dynamic, nonuniform traffic.
This approach directly addresses the core bottleneck in switch scaling—namely, how to achieve maximal utilization with minimal state, localized control, and no centralized bottleneck or per-flow locking—in large, multi-terabit switch fabrics. Research subsequent to (Ye et al., 2014) and related technical reports has extended these concepts to more generalized crosspoint architectures, deeper buffering hierarchies, advanced scheduling policies (e.g., deflection, in-order delivery), and hybrid memory/storage integration.
7. Future Directions and Generalizations
Current directions include:
- Crosspoint architectures for photonic and optical switching matrices, where distributed spatial and wavelength-selective switches exploit similar topologies (see space-wavelength crosspoint networks).
- Crosspoint-queued switches for in-network compute fabrics, which integrate deep-buffered, distributed logic for packet processing and flexible function execution.
- Integration with machine learning accelerators, where crosspoint-based in-memory compute schemes exploit the same localized, parallel architecture for linear algebra operations.
- Extending robustness to non-Bernoulli, bursty, or adversarial traffic models, and exploring minimal buffer sizing for strict stochastic guarantees.
Research continues to address device-level scaling, including ultra-dense memristor-based crosspoint cells, 3D integration, and hybrid analog-digital schemes, directly inheriting the architectural principles established in crosspoint-buffered switch scheduling (Ye et al., 2014).