Global Control Store (GCS) in Ray

Updated 13 April 2026
  • Global Control Store (GCS) is a distributed, fault-tolerant, and horizontally scalable key-value store designed for managing Ray's control metadata.
  • It decouples the control state from schedulers and workers, supporting sub-millisecond latencies and robust lineage-based fault tolerance.
  • GCS uses sharding and chain-replication techniques to ensure consistent, reactive metadata updates under massive distributed loads.

The Global Control Store (GCS) is the distributed, fault-tolerant, and horizontally scalable control-plane storage subsystem designed for the Ray execution engine. Ray targets emerging AI workloads that produce extremely high task and object throughput, such as reinforcement learning applications spawning millions of tasks per second on large-scale clusters. GCS abstracts and manages the lineage, object locations, actor metadata, and function registration, serving as the authoritative store for all control state in the Ray system. By decoupling control state from schedulers and workers, GCS eliminates scheduling bottlenecks and enables robust, fine-grained, and scalable management of distributed tasks, actors, and objects (Moritz et al., 2017).

1. Motivation, Requirements, and Design Objectives

Ray's unified task-parallel and actor-based execution model places unprecedented requirements on the control plane. Fine-grained scheduling, rapid metadata dissemination, and lineage-based system recovery drive the need for a global state store capable of sub-millisecond latencies and millions of operations per second. Centralized masters become bottlenecks under these loads, particularly in scheduling decisions and metadata lookups.

GCS was designed to address:

  • High throughput and low latency: Support millions of key-value operations per second with single- to sub-millisecond latencies.
  • Horizontal scalability: Achieve scalable control-plane storage through sharding, decoupling state storage from both workers and schedulers.
  • Fault tolerance with strong consistency: Ensure deterministic replay and exactly-once semantics for lineage entries, while surviving shard failures with minimal client-observed delay.

A key motivation is robust support for lineage-based fault tolerance: tasks in Ray are stateless and idempotent, while actors are stateful and require checkpointing. Comprehensive, durable lineage and metadata management in GCS is essential for transparent task re-execution and actor recovery upon node or component failure, without application code changes (Moritz et al., 2017).

2. Data Model and Metadata Schema

GCS is fundamentally a sharded and replicated key-value store, augmented with publish-subscribe channels for reactive metadata updates. The schema comprises four principal tables:

| Table | Key | Value / Purpose |
|---|---|---|
| FunctionTable | FunctionID | Serialized remote-function or actor-constructor code and resource requirements |
| TaskTable | TaskID | {dependencies: List&lt;ObjectID&gt;, return_ids: List&lt;ObjectID&gt;, spec: TaskSpec, state, attempt_count} |
| ObjectTable | ObjectID | {locations: Set&lt;NodeID&gt;, size_bytes: Int, creation_task: TaskID}; pub-sub notification |
| ActorTable | ActorID | {home_node: NodeID, checkpoint_location: Optional&lt;ObjectID&gt;, last_method_seq: Int} |

The formal mappings are:

  • ObjectTable: $O \rightarrow 2^{\text{NodeID}} \times \mathbb{N}$
  • TaskTable: $T \rightarrow (\text{List}\ O) \times (\text{List}\ O) \times \text{TaskSpec} \times \text{TaskState} \times \mathbb{N}$
  • ActorTable: $A \rightarrow \text{NodeID} \times \text{Option}\ O \times \mathbb{N}$
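The table values above can be mirrored as plain Python records. This is a minimal sketch: the class and field names follow the schema table in this section, not Ray's actual source code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class TaskEntry:
    """TaskTable value: TaskID -> (List O) x (List O) x TaskSpec x TaskState x N."""
    dependencies: List[str]            # ObjectIDs this task consumes
    return_ids: List[str]              # ObjectIDs this task produces
    spec: dict                         # serialized TaskSpec (illustrative type)
    state: str = "PENDING"             # TaskState
    attempt_count: int = 0             # number of execution attempts

@dataclass
class ObjectEntry:
    """ObjectTable value: ObjectID -> 2^NodeID x N."""
    locations: Set[str] = field(default_factory=set)   # NodeIDs holding the object
    size_bytes: int = 0
    creation_task: Optional[str] = None                # lineage pointer into TaskTable

@dataclass
class ActorEntry:
    """ActorTable value: ActorID -> NodeID x Option O x N."""
    home_node: str
    checkpoint_location: Optional[str] = None          # Option O
    last_method_seq: int = 0                           # last applied method number
```

The `creation_task` pointer in `ObjectEntry` is what links the ObjectTable back into the TaskTable lineage used for reconstruction.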

Each table is logically partitioned into $S$ shards by $\text{shard\_index}(key) = \text{hash}(key) \bmod S$.
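The sharding function can be sketched in a few lines. The choice of SHA-1 here is illustrative; the only requirement is a hash that is stable across processes, so that every client maps a given key to the same shard.

```python
import hashlib

def shard_index(key: bytes, num_shards: int) -> int:
    """shard_index(key) = hash(key) mod S, using a process-stable hash.

    Python's built-in hash() is randomized per process, so a fixed
    digest (SHA-1 here, an arbitrary illustrative choice) is used
    instead to make clients agree on shard placement.
    """
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest, "big") % num_shards
```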

3. System Architecture and Sharding

Ray deploys a GCS cluster alongside its schedulers, object stores, and worker processes. The GCS is organized into $S$ logical shards, each implemented as a chain of Redis instances $(r_1 \rightarrow r_2 \rightarrow \ldots \rightarrow r_m)$. This architecture follows a lightweight chain-replication protocol, ensuring strong consistency for each key.

Client Access Pattern:

  • Each Ray process (driver, worker, or scheduler) interacts with GCS via a client library.
  • The client:

    1. Computes $\text{shard\_idx} = \text{hash}(key) \bmod S$
    2. Locates the primary (head of chain $r_1$) for the shard
    3. Issues a unary RPC (Get or Put) to $r_1$
    4. For writes (Put), $r_1$ forwards the request down the chain; the tail ($r_m$) acknowledges up the chain on completion
  • Subscriptions are handled through Redis pub-sub on the corresponding shard's channel.

Scalability is achieved by increasing $S$, redistributing key ranges as load demands. Stateless schedulers and object stores read or cache metadata as needed, without central bottlenecks (Moritz et al., 2017).
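The client access pattern above can be sketched as follows. Shard chains are modeled as in-memory dicts standing in for the head replica of each chain; in Ray these would be Redis instances reached over RPC, and the class and method names here are illustrative, not Ray's client API.

```python
import hashlib

class GCSClient:
    """Toy GCS client: hash the key to a shard, then route the
    Get/Put to that shard's chain head."""

    def __init__(self, num_shards: int):
        # chains[s] stands in for the head replica r_1 of shard s
        self.chains = [dict() for _ in range(num_shards)]
        self.num_shards = num_shards

    def _shard(self, key: str) -> int:
        # shard_idx = hash(key) mod S, with a process-stable hash
        digest = hashlib.sha1(key.encode()).digest()
        return int.from_bytes(digest, "big") % self.num_shards

    def put(self, key: str, value) -> None:
        # In the real protocol the head forwards down the chain and the
        # tail acknowledges; here the whole chain is a single dict.
        self.chains[self._shard(key)][key] = value

    def get(self, key: str):
        # Reads go to the primary (chain head) for up-to-date state
        return self.chains[self._shard(key)].get(key)
```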

4. Distributed Algorithms: Fault Tolerance and Consistency

GCS enforces per-shard chain replication for durability and strong per-key consistency. Write and read protocols are explicitly defined:

  • Write protocol (put):

    The client sends put(key, value) to the shard's head replica $r_1$; each replica applies the update and forwards it to its successor; when the tail $r_m$ has applied the update, it acknowledges back up the chain. The write is considered committed only once acknowledged, at which point it is durable on all $m$ replicas.

  • Read protocol: Clients typically issue reads to the primary, benefiting from up-to-date, consistent state.
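The per-shard chain replication described above can be sketched as a linked list of replicas: the head applies the update, forwards it to its successor, and only the tail's acknowledgment completes the write, so an acknowledged write is present on all $m$ replicas. This is a minimal in-memory model, not an implementation of the Redis-backed chains.

```python
class Replica:
    """One link r_i in a chain r_1 -> r_2 -> ... -> r_m."""

    def __init__(self, successor=None):
        self.store = {}
        self.successor = successor

    def put(self, key, value) -> bool:
        self.store[key] = value                    # apply locally
        if self.successor is not None:
            return self.successor.put(key, value)  # forward down the chain
        return True                                # tail: acknowledge

def make_chain(m: int):
    """Build a chain of m replicas and return (head, all_replicas)."""
    replicas = [Replica() for _ in range(m)]
    for r, nxt in zip(replicas, replicas[1:]):
        r.successor = nxt
    return replicas[0], replicas
```

A write issued at the head returns `True` only after the tail has applied it, which is why an acknowledged write survives any single replica failure.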

This scheme enables:

  • Consistent and deterministic lineage replay
  • Exactly-once semantics for metadata entries
  • Transparent shard/replica recovery with minimal delay

Pub-sub notification allows clients (e.g., workers expecting a particular object) to react immediately when objects become available or their locations change.
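The notification flow can be illustrated with a toy in-memory stand-in for the Redis pub-sub channels: a worker subscribes to an ObjectID's channel and is called back when a new location is registered. The class and callback shape are assumptions for illustration.

```python
from collections import defaultdict

class ObjectTableWithPubSub:
    """ObjectTable with per-object notification channels (toy model)."""

    def __init__(self):
        self.locations = defaultdict(set)     # ObjectID -> set of NodeIDs
        self.subscribers = defaultdict(list)  # ObjectID -> list of callbacks

    def subscribe(self, object_id, callback):
        # e.g. a worker waiting on object_id registers interest here
        self.subscribers[object_id].append(callback)

    def add_location(self, object_id, node_id):
        # writing a new location triggers the pub-sub notification
        self.locations[object_id].add(node_id)
        for cb in self.subscribers[object_id]:
            cb(object_id, node_id)
```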

5. Interaction with Ray Components and Execution Engine

Schedulers in Ray are stateless and interact with GCS to fetch live metadata required for scheduling decisions. This decoupling removes the “critical path” dependency on schedulers for task dispatch and object management, allowing rapid and distributed scaling. Upon failures, all restarted components (drivers, workers, schedulers) can read necessary lineage and object/actor metadata from the durable GCS for reconstruction and recovery.

Tasks—being stateless and idempotent—are tracked via lineage stored in the TaskTable, enabling exact replay. Actors—being stateful—are recovered via ActorTable entries, which record last checkpoint locations and execution sequence numbers. ObjectTable subscription channels notify interested parties when objects appear or disappear from the cluster, enabling event-driven distributed coordination (Moritz et al., 2017).
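Lineage-based reconstruction can be sketched as a recursive walk over the TaskTable: to recover a lost object, first recover its missing inputs, then deterministically replay the task that produced it. The table shapes and the `execute` callback are illustrative, not Ray's internal API.

```python
def reconstruct(object_id, object_table, task_table, execute):
    """Recursively re-execute lineage until object_id is available again.

    object_table: ObjectID -> {"locations": set, "creation_task": TaskID}
    task_table:   TaskID   -> {"dependencies": [ObjectID], "return_ids": [ObjectID]}
    execute(task_id): re-runs the task, repopulating its return objects.
    """
    entry = object_table[object_id]
    if entry["locations"]:
        return                                   # object still available somewhere
    task = task_table[entry["creation_task"]]
    for dep in task["dependencies"]:             # recover missing inputs first
        reconstruct(dep, object_table, task_table, execute)
    execute(entry["creation_task"])              # deterministic, idempotent replay
```

Because tasks are stateless and idempotent, replaying a task that was already partially re-executed elsewhere is safe, which is what makes this simple recursive scheme correct.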

6. Scalability and Performance Characteristics

GCS supports scaling beyond 1.8 million tasks per second with sub-millisecond to low single-digit millisecond control state operation latencies, as required for contemporary reinforcement learning workloads. Horizontal scalability is realized through the increase in shard count and chain reconfiguration. Strong consistency, fine-grained parallelism, and rapid pub-sub updates ensure the GCS does not become a limiting factor under massive distributed load.

A plausible implication is that GCS’s architectural patterns could generalize to other high-throughput distributed control-store scenarios, especially those requiring strong metadata consistency, lineage-based recovery, and fine-grained reactivity (Moritz et al., 2017).

References

  • Moritz, P., Nishihara, R., Wang, S., Tumanov, A., Liaw, R., Liang, E., Elibol, M., Yang, Z., Paul, W., Jordan, M. I., & Stoica, I. (2017). Ray: A Distributed Framework for Emerging AI Applications. arXiv:1712.05889.
