Central Repository & Selection Function (CRSF)
- CRSF is a dual-domain architecture that integrates a centralized repository with a selection function for both conversational AI evaluation and 6G service optimization.
- It standardizes benchmarking and resource allocation using systematic data pipelines or binary linear programming to address constraints like QoS and capacity.
- Real-world implementations show improved metrics in ML (accuracy in response selection) and network systems (enhanced QoS and resource utilization).
The Central Repository and Selection Function (CRSF) is a term with precise technical meaning in two distinct domains: conversational response selection in large-scale machine learning evaluation pipelines, and beyond-connectivity service discovery and selection in 6G network systems. In both contexts, CRSF denotes an architecture in which a centralized repository of entities (either conversational examples or networked services) is paired with a selection function (either a scoring model or an assignment optimizer) to enable reproducible benchmarking, optimal allocation, or automated discovery.
1. Definition and Foundational Concepts
The concept of CRSF encompasses two main implementations:
- Conversational ML Evaluation (Henderson et al., 2019): Here, the Central Repository is a large, standardized data store of context–response pairs, and the Selection Function is a formal task for response selection, evaluated under strict metrics.
- 6G Network Service-Oriented Architecture (Sharma et al., 5 Nov 2025): In advanced network infrastructures, the CRSF is a 6G Network Function (NF) responsible for maintaining a global service registry and solving, in real time, an assignment problem for requests to service functions (SFs) subject to multi-dimensional Quality of Service (QoS) constraints.
Both systems use the central repository for storage and aggregation, while the selection function enacts decision procedures against this repository, implemented either as a machine learning model or as a Binary Linear Program (BLP) optimizer.
2. CRSF in Machine Learning: Conversational Modeling
The Central Repository in (Henderson et al., 2019) is materialized as a GitHub-hosted project that provides preprocessing scripts, dataset schemas, and benchmarking infrastructure. The repository standardizes three large-scale conversational corpora (Reddit, OpenSubtitles, AmazonQA), exposed in sharded TFRecord files with unified schema:
- Each example is encoded with UTF-8 text features: context, response, and optional metadata.
- The repository exposes deterministic data splits, consistent filtering (length, deletion markers), and modular pipelines implemented in Apache Beam and Google Cloud Dataflow.
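A minimal sketch of such a pipeline, assuming a hypothetical JSONL input with context/response fields and illustrative filter thresholds (the repository's own scripts expose these choices as command-line flags):

```python
import json
import apache_beam as beam
import tensorflow as tf

def to_tf_example(pair):
    """Serialize a (context, response) pair into the unified TFRecord schema."""
    context, response = pair
    feature = {
        "context": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[context.encode("utf-8")])),
        "response": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[response.encode("utf-8")])),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()

with beam.Pipeline() as p:  # add DataflowRunner options to run on Google Cloud Dataflow
    (p
     | "Read" >> beam.io.ReadFromText("pairs.jsonl")  # hypothetical input path
     | "Parse" >> beam.Map(lambda line: (json.loads(line)["context"],
                                         json.loads(line)["response"]))
     | "DropDeleted" >> beam.Filter(lambda pr: pr[1] != "[deleted]")  # deletion markers
     | "FilterLength" >> beam.Filter(lambda pr: 0 < len(pr[1].split()) <= 128)
     | "Encode" >> beam.Map(to_tf_example)
     | "Write" >> beam.io.WriteToTFRecord("train", file_name_suffix=".tfrecord"))
```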
The Selection Function is formalized as a response selection task: for each test pair $(c_i, r_i)$, generate a candidate set $\mathcal{R}_i$ of 100 responses (1 positive, 99 negatives), and select $\hat{r}_i = \arg\max_{r \in \mathcal{R}_i} S(c_i, r)$ using a scoring model $S$. The evaluation metric is $1$-of-$100$ accuracy:

$$\text{Acc}_{1\text{-of-}100} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[ \arg\max_{r \in \mathcal{R}_i} S(c_i, r) = r_i \right]$$
Baseline implementations include tf-idf, BM25, and embedding-based dot product models, with a competitive neural dual-encoder architecture that uses self-attention, layer normalization, and in-batch negatives for training.
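A minimal sketch of the in-batch-negatives, margin-softmax objective used by such dual encoders; the margin value and L2 normalization here are illustrative assumptions rather than the paper's exact configuration:

```python
import tensorflow as tf

def in_batch_negatives_loss(context_enc, response_enc, margin=0.3):
    """Margin-softmax loss over a batch of encoded (context, response) pairs.

    context_enc, response_enc: [batch, dim] tensors; row i of each encodes a
    matching pair, so every off-diagonal response serves as a negative.
    """
    context_enc = tf.math.l2_normalize(context_enc, axis=-1)
    response_enc = tf.math.l2_normalize(response_enc, axis=-1)
    logits = tf.matmul(context_enc, response_enc, transpose_b=True)  # [B, B] similarities
    batch = tf.shape(logits)[0]
    logits -= margin * tf.eye(batch)   # subtract margin from the positive pairs
    labels = tf.range(batch)           # diagonal entries are the true responses
    return tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
```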
This architecture standardizes reproducibility and benchmarking by coupling openly versioned data pipelines with a reference selection function and evaluation metric.
3. CRSF in 6G Networks: Service Discovery and QoS-Aware Selection
In the context of 6G networks (Sharma et al., 5 Nov 2025), the CRSF is deployed in the 6G core as a logical Network Function which aggregates beyond-connectivity service advertisements from multiple subnetworks and exposes a unified discovery and selection API over service-based architecture protocols. The CRSF consists of two tightly coupled components:
- Repository Database: Stores a catalog of all active SFs in each subnetwork, including administrative domain, service categories, vectorized QoS parameters $\mathbf{q}_j$, and available capacity $C_j$.
- Selection Engine: On each scheduling interval, solves a batch assignment problem in which each incoming request $i$ is assigned to at most one SF $j$, maximizing a priority-weighted QoS objective under latency and capacity constraints.
APIs for service registration (SF→CRSF), discovery & selection (AF→CRSF via NRF), and standard subscription (AF→SF) are all specified using HTTP/2 and existing 3GPP service endpoints, extended to support the enhanced registry and selection workflow.
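As a hedged illustration only, an SF→CRSF registration might resemble the following TS 29.510-style NF-instance call; the host name, `nfType` value, and `customInfo` layout are assumptions, not the paper's normative schema:

```python
import uuid
import requests

sf_id = str(uuid.uuid4())
profile = {
    "nfInstanceId": sf_id,
    "nfType": "CUSTOM_SF",            # assumed type for a beyond-connectivity SF
    "nfStatus": "REGISTERED",
    "customInfo": {                   # assumed extension carrying QoS and capacity
        "serviceCategory": "localization",
        "qos": {"latencyMs": 8.0, "reliability": 0.999},
        "capacity": 50,
    },
}

# PUT .../nf-instances/{id} follows the Nnrf_NFManagement registration pattern;
# the CRSF host below is a placeholder.
resp = requests.put(
    f"https://crsf.6gc.example/nnrf-nfm/v1/nf-instances/{sf_id}",
    json=profile,
    timeout=5,
)
resp.raise_for_status()
```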
The optimization problem solved by the CRSF is:

$$\max_{x_{ij}} \sum_{i} \sum_{j} p_i \, q_{ij} \, x_{ij}$$

subject to:
- Latency: $\ell_{ij} \, x_{ij} \le L_i^{\max}$ for all $i, j$,
- Capacity: $\sum_{i} x_{ij} \le C_j$ for all $j$,
- Assignment: $\sum_{j} x_{ij} \le 1$ for all $i$,
- Binary assignment: $x_{ij} \in \{0, 1\}$,

where $x_{ij} = 1$ if request $i$ is assigned to SF $j$, $p_i$ is the priority of request $i$, $q_{ij}$ the achievable QoS score, $\ell_{ij}$ the expected latency, $L_i^{\max}$ the latency budget of request $i$, and $C_j$ the capacity of SF $j$.
This joint optimization over batch assignments replaces ad hoc selection and enables global resource-aware service sharing.
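A minimal sketch of this batch-assignment BLP in `gurobipy` (Gurobi is the paper's reported solver); the synthetic data and variable names below mirror the formulation above but are not taken from the paper:

```python
import random
import gurobipy as gp
from gurobipy import GRB

R, S = 50, 10                                                  # 50 requests, 10 SFs
p = [random.randint(1, 5) for _ in range(R)]                   # request priorities
q = [[random.random() for _ in range(S)] for _ in range(R)]    # QoS scores q_ij
lat = [[random.uniform(1, 20) for _ in range(S)] for _ in range(R)]  # latencies l_ij
Lmax = [random.uniform(5, 15) for _ in range(R)]               # latency budgets
C = [random.randint(3, 8) for _ in range(S)]                   # per-SF capacities

m = gp.Model("crsf_selection")
x = m.addVars(R, S, vtype=GRB.BINARY, name="x")

# Objective: priority-weighted aggregate QoS over all assignments.
m.setObjective(gp.quicksum(p[i] * q[i][j] * x[i, j]
                           for i in range(R) for j in range(S)), GRB.MAXIMIZE)

# Latency: an assignment is only allowed if it meets the request's budget.
m.addConstrs(lat[i][j] * x[i, j] <= Lmax[i] for i in range(R) for j in range(S))
# Capacity: each SF serves at most C_j requests per scheduling interval.
m.addConstrs(gp.quicksum(x[i, j] for i in range(R)) <= C[j] for j in range(S))
# Assignment: each request is served by at most one SF.
m.addConstrs(gp.quicksum(x[i, j] for j in range(S)) <= 1 for i in range(R))

m.optimize()
assigned = [(i, j) for i in range(R) for j in range(S) if x[i, j].X > 0.5]
```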
4. Algorithmic Details and System Implementations
Machine Learning CRSF
- Data Generation: Apache Beam/Google Cloud Dataflow pipelines, command-line customizable via flags for filtering, splitting, etc.
- Evaluation: Reference scripts (`evaluate_baselines.py`) compute the $1$-of-$100$ accuracy metric by arranging batches of 100 candidate responses and checking whether the model assigns its maximal score to the true response.
- Model Training: Dual-encoder neural networks leveraging tokenized unigrams/bigrams, hash-based ID mapping, embedding, attention, and dense layers, trained with margin-softmax cross-entropy and in-batch negatives.
Example code for TensorFlow evaluation:
```python
import tensorflow as tf

# Glob pattern for the sharded TFRecord files (placeholder path).
pattern = "data/test-*.tfrecord"

feature_spec = {
    "context": tf.io.FixedLenFeature([], tf.string),
    "response": tf.io.FixedLenFeature([], tf.string),
}

def parse_example(ex):
    feats = tf.io.parse_single_example(ex, feature_spec)
    return feats["context"], feats["response"]

# Batches of 100 align with the 1-of-100 candidate-set evaluation.
ds = (tf.data.TFRecordDataset(tf.io.gfile.glob(pattern))
      .map(parse_example)
      .shuffle(10000)
      .batch(100))

for batch in ds:
    contexts, responses = batch
    scores = model.score(contexts, responses)  # model: user-supplied scoring model
    # compute top-1 hits and aggregate...
```
6G Network CRSF
- Repository Operations: CRUD via northbound API endpoints for registering, updating, or querying SFs.
- Optimization Engine: Exact BLP solved by Gurobi at the scheduling interval granularity, feasible for problem sizes up to several hundred requests and SFs per batch, typically solved in tens of milliseconds to seconds.
- Workflows: Network Functions register their updated QoS and capacity; Application Functions (AFs) invoke cross-domain discovery via standardized protocol calls; decisions are relayed back for direct subscription.
The paper does not implement heuristic or large-scale approximations but notes this as a future direction for ultra-large deployments.
5. Performance Metrics and Comparative Evaluation
Conversational ML CRSF
- Primary metric: $1$-of-$100$ accuracy (Recall@1 for 100 candidates).
- Baselines: tf-idf, BM25, fixed embeddings (USE, ELMo, BERT), trainable mapping variant.
- Neural encoder: Outperforms the keyword-based and fixed-embedding baselines on $1$-of-$100$ accuracy when trained on the full dataset with a margin-based loss and large batch sizes.
Network CRSF for 6G
- Metrics: Aggregate priority-weighted QoS, Assignment Success Rate (ASR), Average per-request QoS.
- Simulation Setting: 5–20 SFs, 10–200 requests per scheduling slot, realistic QoS parameterization from 3GPP standards, 100 Monte Carlo trials per experiment.
- Results:
- Aggregate QoS gain: 20–35% over the priority-only baseline as request load rises; 15–30% higher across a growing SF count (from 5 to 20).
- ASR: Slightly (1–3%) lower under overload, but both approaches converge to 100% as the resource pool grows.
- Per-request QoS: At maximum assignment, CRSF gives ≈10% gain at 20 SFs.
- Capacity effect: Increasing per-SF capacity from 30 to 50 widens CRSF's advantage over the benchmark from ≈12% to ≈40%.
- This suggests that joint assignment can yield substantial global utility gains compared to local or purely priority-based allocation.
6. Extensibility, Standardization, and Future Directions
The CRSF architecture in ML is agnostic to the corpus source so long as the data format and selection API are maintained; this allows rapid reproducibility, adoption of new conversational domains, and future extension to other structured language tasks.
In network systems, the CRSF is designed as a microservice-compatible, modular NF, and is intended for eventual 3GPP standardization:
- Service-type micro-SBA functions can be independently added.
- API surfaces are based on extensions to Nnrf_ServiceDiscovery and TS 29.510, ensuring compatibility with evolving 6G SBA stacks.
- QoS-aware selection algorithms are directly extensible to other domain-specific parameters, e.g., AI/ML analytics, localization, or energy management, by updating the repository schema and assignment logic, as sketched after this list.
- The architecture positions CRSF for inclusion in 6G local area network study items, and future releases may incorporate learning-based or federated optimization for dynamic, real-time selection.
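To make the schema-extension point concrete, here is a minimal sketch in which a repository record gains an energy field and the scoring logic picks it up as one more weighted term; all field names and weights are illustrative, not a 3GPP schema:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SFRecord:
    """Illustrative repository row; field names are assumptions."""
    sf_id: str
    domain: str
    categories: List[str]
    qos: Dict[str, float]      # e.g. {"latency_ms": 8.0, "reliability": 0.999}
    capacity: int
    energy_wh: float = 0.0     # new domain-specific parameter

def qos_score(record: SFRecord, weights: Dict[str, float]) -> float:
    """Extended scoring: the new parameter enters as one more weighted term."""
    score = sum(weights.get(k, 0.0) * v for k, v in record.qos.items())
    return score - weights.get("energy_wh", 0.0) * record.energy_wh
```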
7. Significance and Research Directions
CRSF in both machine learning and network service sharing provides a principled, reproducible foundation for large-scale open benchmarking or resource allocation tasks that require centralized aggregation of heterogeneous candidates and formalized, constraint-driven selection. In ML, this enables direct, apples-to-apples response selection evaluation at the scale of hundreds of millions of examples. In 6G, CRSF supports cross-domain collaboration, efficient resource pooling, and programmable, extensible service architectures beyond traditional connectivity.
A plausible implication is that, as resource sharing and intelligent orchestration become more central in large-scale AI and networking, the CRSF paradigm—centralized registration with mathematically sound, globally optimal selection—will be a fundamental component of both reproducible research and carrier-grade distributed systems.