TelecomInstruct Dataset Overview

Updated 22 January 2026
  • TelecomInstruct is a structured, large-scale corpus of 80K–120K instruction-response pairs that captures real telecom workflows, configurations, and automation tasks.
  • It integrates digital twin simulations, technical documents, community Q&A, and SME curation to deliver context-rich, accurate telecom data.
  • It is pivotal for domain-specific LLM fine-tuning, enhancing model benchmarking on tasks like configuration, troubleshooting, and protocol planning.

The TelecomInstruct dataset is a structured, large-scale corpus specifically constructed for instruction tuning and evaluation of LLMs in the telecommunications domain. Developed by multiple research groups for use with domain-adapted LLMs such as TSLAM-Mini (Ethiraj et al., 10 May 2025), TelcoLM (Barboule et al., 2024), and TelecomGPT (Zou et al., 2024), TelecomInstruct systematically addresses gaps in general-purpose LLMs by encapsulating complex, real-world telecom use cases, protocol procedures, mathematical modeling, and automation tasks.

1. Dataset Scope, Structure, and Taxonomy

TelecomInstruct comprises between 80,000 and 120,000 instruction–response pairs, depending on instantiation, with each sample purpose-built to reflect actual workflows, Q&A tasks, configuration scenarios, and mathematical modeling relevant to telecom engineering and operations (Ethiraj et al., 10 May 2025, Barboule et al., 2024, Zou et al., 2024).

The categorical structure in the TSLAM-Mini version encompasses 20 use-case categories with approximately 5,000 samples each, representing the following high-level topics:

Category Index | Domain Example | Representative Tasks
1 | Network Fundamentals, L2 Switching | VLAN config, STP troubleshooting
2–5 | IP Routing, MPLS, Services, QoS | OSPF/BGP configs, MPLS label operations
6–7 | Network Security | ACL logic, SIEM logs, VPN debugging
8–9 | Network Management | SNMP poll scripts, syslog analysis
10 | Network Automation | Ansible playbooks, CI/CD pipeline authoring
11–13 | OSS/BSS & Integration | Inventory mapping, TM Forum SID process
14–16 | RAN and Core Networks | eNB/gNB setup, AMF/UPF packet tracing
17–18 | Satellite/Transport Networks | Link budgets, OTN frame decoding
19 | Cloud Networking/Virtualization | SDN/NFV policy design
20 | Ethical AI | Bias analysis, privacy violations

Other variants (e.g., in TelcoLM or TelecomGPT) group by task type—MCQ Q&A, code generation, mathematical inference, protocol planning—for comprehensive domain coverage (Barboule et al., 2024, Zou et al., 2024).

A formal data-schema excerpt:

\begin{align}
\mathcal{D} &= \{(x_i, y_i)\}_{i=1}^{N}, \quad x_i \in \text{Prompts},\; y_i \in \text{Responses} \\
\text{category}(x_i) &\in \{1, \dots, 20\} \\
\text{split}(x_i) &\in \{\mathrm{train}, \mathrm{val}, \mathrm{test}\}
\end{align}

2. Data Sourcing and Generation Methodology

The dataset construction process integrates multiple pipelines to achieve granularity and applicability across telecom scenarios:

  • Digital Twin Simulation (TSLAM-Mini): TelecomInstruct leverages NetoAI DigiTwin to generate device logs, CLI transcripts, and telemetry. Virtual elements (switches, RAN, satellite links) are subjected to realistic scripted events (e.g., link failures, traffic surges) to provide context-rich, high-fidelity samples (Ethiraj et al., 10 May 2025).
  • Technical and Standards Documents: Extensive ingestion from IETF/3GPP RFCs, ITU, ETSI, IEEE, technical whitepapers. Parsing pipelines (Nougat, PDFMiner, HTML cleaners) and semantic filters extract context-preserving, instruction-relevant spans (Barboule et al., 2024, Zou et al., 2024).
  • Community Q&A and Code: StackExchange, StackOverflow, GitHub, and academic repositories are mined, filtered, and deduplicated. LLMs such as Zephyr-7B are used to ensure domain relevance (Barboule et al., 2024).
  • SME Augmentation: Network subject-matter experts author instruction templates, validate outputs (~10% SME review in TSLAM-Mini; consistent batch reviews in TelecomGPT), and manually craft hard examples for protocol corner cases (Ethiraj et al., 10 May 2025, Zou et al., 2024).

Automated and manual curation are balanced: approximately 80% of samples are generated via automated scripting and filtered technical ingestion, with the remainder hand-authored to address specification ambiguities, protocol integration, and multi-domain scenarios.
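The community-sourced pipeline above depends on deduplication before relevance filtering. A minimal sketch of exact-duplicate removal via normalized hashing (the field name and normalization rule are assumptions; a production pipeline would typically add near-duplicate detection such as MinHash):

```python
import hashlib

def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial variants hash identically.
    return " ".join(text.lower().split())

def deduplicate(samples):
    # Keep the first occurrence of each normalized 'instruction' text.
    seen, unique = set(), []
    for s in samples:
        digest = hashlib.sha256(normalize(s["instruction"]).encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(s)
    return unique

corpus = [
    {"instruction": "Configure OSPF area 0 on R1"},
    {"instruction": "configure  OSPF area 0 on r1"},  # trivial variant of the first
    {"instruction": "Explain BGP path selection"},
]
print(len(deduplicate(corpus)))  # 2
```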

Example preprocessing code for chat conversion in TSLAM-Mini (the published pseudocode rendered as runnable Python):

Algorithm 1: Chat Template Application for Instruction Fine-Tuning Data Preprocessing

EOS_TOKEN = "<|endoftext|>"

def apply_chat_template(example):
    # example: {'input': user_query, 'output': reference_response}
    system_msg    = "<|system|>\nYou are a helpful telecom expert assistant.<|end|>\n"
    user_msg      = "<|user|>\n" + example["input"] + "<|end|>\n"
    assistant_msg = "<|assistant|>\n" + example["output"] + "<|end|>\n"
    example["text"] = system_msg + user_msg + assistant_msg + EOS_TOKEN
    return example
(Ethiraj et al., 10 May 2025)

3. Instruction Taxonomies and Response Schema

Instruction prompts encompass three main formats:

  • Configuration tasks: e.g., “Configure OSPF multi-area with authentication.”
  • Troubleshooting scenarios: e.g., “Interface Gi0/1 flaps—identify root cause.”
  • Conceptual/theoretical questions: e.g., “Explain BGP path-selection process step by step.” (Ethiraj et al., 10 May 2025)

Responses emphasize explicit, structured reasoning:

  • Stepwise procedures or bulleted lists.
  • CLI snippets or code samples with realistic prompts.
  • Tabular, JSON, or YANG artifact outputs as context demands.
  • Stated assumptions or operational defaults wherever implicit in practice.

The format across dataset variants follows simple JSON or chat-based templates, with typical fields:

Key | Description
instruction | Natural-language task description
input | Context, scenario, or data snippet (optional; varies by task type)
output | Reference ground-truth response or solution
source/type | (Optional) Data provenance and formal task type
domain_tags | (Optional) High-level domain classification
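
Under this schema, a single JSONL record might look like the following (the content is an illustrative fabrication in the documented format, not an actual dataset sample):

```python
import json

# One record in the instruction/input/output schema described above.
record = {
    "instruction": "Configure OSPF multi-area with authentication.",
    "input": "Routers R1-R3; area 0 backbone, area 1 stub; MD5 auth required.",
    "output": "1. Enable OSPF: router ospf 1 ...",
    "source": "digital_twin",
    "domain_tags": ["ip_routing"],
}
line = json.dumps(record)  # one line of the JSONL file
parsed = json.loads(line)  # round-trips losslessly
```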

Training splits and tokenization follow the base LLM (Phi-4 Mini, LLaMA, etc.): TSLAM-Mini uses a SentencePiece tokenizer (200K vocabulary, 8K context), while TelcoLM uses the LLaMA-2 tokenizer (Ethiraj et al., 10 May 2025, Barboule et al., 2024).
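Because TSLAM-Mini's context is capped at 8K tokens, overlong samples must be filtered or truncated before training. A sketch of the filtering step, using a whitespace stand-in for the token count (a real pipeline would call the SentencePiece tokenizer):

```python
MAX_CONTEXT = 8192  # TSLAM-Mini's context window, per the text above

def count_tokens(text: str) -> int:
    # Whitespace stand-in; a real pipeline would use the model's tokenizer.
    return len(text.split())

def fits_context(example, limit=MAX_CONTEXT):
    return count_tokens(example["input"]) + count_tokens(example["output"]) <= limit

samples = [
    {"input": "Configure OSPF area 0.", "output": "router ospf 1 ..."},
    {"input": "log line " * 5000, "output": "summary"},  # overlong transcript
]
kept = [s for s in samples if fits_context(s)]
print(len(kept))  # 1
```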

4. Quality Control, Evaluation, and Annotation

TelecomInstruct integrates automated and human-in-the-loop quality control:

  • Manual SME Review: 5–10% of samples in each category are double-blind reviewed for semantic and factual accuracy. Corner cases or speculative outputs are scrutinized for correctness (e.g., multi-AS BGP corner cases, protocol edge conditions) (Ethiraj et al., 10 May 2025, Zou et al., 2024).
  • Automated Judging: Large LLMs (Qwen3-235B-A22B, GPT-4) serve as adjudicators, scoring outputs on instruction following, linguistic quality, technical relevance, and accuracy. Numeric Likert scaling (0–10) or metric-based (ROUGE-L, METEOR, MOS) benchmarks are used.
  • Targeted Thresholds: TSLAM-Mini targets a mean judge score ≥8.0 per category, no category below 7.5 (Ethiraj et al., 10 May 2025).
  • Preference Optimization (TelecomGPT): Post-instruct-tuning, low-scoring pairs are used in Direct Preference Optimization (DPO) to enforce precise, context-relevant responses (Zou et al., 2024).

TelecomInstruct sets a precedent in using both LLM-based and expert-based review loops for technical knowledge domains with high semantic density.
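The acceptance thresholds above (mean judge score ≥ 8.0, no category below 7.5) amount to a simple quality gate. A minimal sketch, assuming per-category lists of judge scores (function and data names are illustrative):

```python
from statistics import mean

def passes_quality_gate(scores_by_category, overall_floor=8.0, category_floor=7.5):
    # Gate: overall mean judge score >= 8.0 and every category mean >= 7.5.
    category_means = {c: mean(v) for c, v in scores_by_category.items()}
    overall = mean(category_means.values())
    return overall >= overall_floor and min(category_means.values()) >= category_floor

judge_scores = {
    "routing": [8.5, 9.0],
    "security": [7.6, 8.0],
    "automation": [8.2, 8.4],
}
print(passes_quality_gate(judge_scores))  # True
```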

5. Applications in Model Adaptation and Benchmarking

TelecomInstruct is integral to instruction tuning and benchmarking of telecom-specialized LLMs:

  • Domain-Specific Fine-Tuning: Enables supervised adaptation via PEFT, notably Quantized Low-Rank Adaptation (QLoRA), allowing resource-efficient deployment on edge or on-premise hardware (Ethiraj et al., 10 May 2025, Zou et al., 2024).
  • Pretraining and Instruction-Tuning Pipelines: Central corpus for stage-2 fine-tuning in models such as TelecomGPT and TSLAM-Mini, directly shaping their command of troubleshooting, scripting, code infilling, math modeling, and protocol conformance tasks (Zou et al., 2024).
  • Benchmarking Impact:
    • MCQ (TeleQnA) accuracy gain: +15–18 points (over base LLaMA-2-7B) (Barboule et al., 2024).
    • Open-generation task gains: ROUGE-L +0.11, METEOR +0.11–0.12, MOS +0.7 (roughly 30% relative improvements).
    • No significant performance degradation on general MCQs when mixing telco and general-purpose instructions (Barboule et al., 2024).
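The QLoRA adaptation noted above trains a low-rank update in place of the full weight matrices. A numpy sketch of the parameter arithmetic (dimensions and rank are illustrative; real QLoRA additionally quantizes the frozen base weights to 4-bit):

```python
import numpy as np

d, k, r, alpha = 1024, 1024, 16, 32     # layer dims, LoRA rank, scaling factor
W = np.random.randn(d, k)               # frozen base weight (quantized in QLoRA)
A = np.random.randn(r, k) * 0.01        # trainable down-projection
B = np.zeros((d, r))                    # trainable up-projection, zero-initialized

W_adapted = W + (alpha / r) * (B @ A)   # effective weight after adaptation

full_params = d * k
lora_params = d * r + r * k             # only A and B are trained
print(f"trainable fraction: {lora_params / full_params:.3%}")  # 3.125%
```

Zero-initializing B makes the adapted model start exactly at the base model, which is why fine-tuning is stable even at low rank.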

Summary table of impact (TelcoLM ablation, MCQ accuracy):

Adaptation | Telco-MCQ Accuracy | ∆ vs. Base
Base LLaMA-2-7B | 41.3 % | —
IAPT (telco+gen) | 58.7 % | +17.4
DAPT+IAPT (telco+gen) | 59.1 % | +17.8

(Barboule et al., 2024)

6. Access, Licensing, and Community Position

TelecomInstruct datasets are typically distributed as JSONL files, with each record containing key-value pairs for instruction, input, output, and metadata. Stratified splits (e.g., 80/10/10 train/validation/test, or 90/5/5 in TSLAM-Mini) are maintained, with SME-validated test sets reserved for final benchmarking (Ethiraj et al., 10 May 2025, Zou et al., 2024).
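A stratified 90/5/5 split like TSLAM-Mini's can be produced by splitting within each category so that every category is proportionally represented in train, validation, and test. A minimal sketch (the seed, ratio API, and "category" field name are assumptions):

```python
import random

def stratified_split(samples, ratios=(0.90, 0.05, 0.05), seed=42):
    # Split per category so all splits preserve the category proportions.
    by_cat = {}
    for s in samples:
        by_cat.setdefault(s["category"], []).append(s)
    rng = random.Random(seed)
    train, val, test = [], [], []
    for group in by_cat.values():
        rng.shuffle(group)
        n_train = int(len(group) * ratios[0])
        n_val = int(len(group) * ratios[1])
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]
    return train, val, test

data = [{"id": i, "category": i % 4} for i in range(400)]
tr, va, te = stratified_split(data)
print(len(tr), len(va), len(te))  # 360 20 20
```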

Licensing varies: the TSLAM-Mini dataset is proprietary to NetoAI, shareable under a CC-BY-NC-ND-4.0 style agreement for research partnerships (Ethiraj et al., 10 May 2025). Community datasets aligned with LLaMA or OpenTelecom base models may have analogous or more permissive academic-use restrictions (Barboule et al., 2024, Zou et al., 2024).

7. Significance in the LLM and Telecom Ecosystem

TelecomInstruct directly addresses the lack of instruction-tuning corpora for domains with dense procedural and protocol content such as telecommunications. Its integration of digital-twin simulation, SME curation, and rigorous benchmarking is cited as enabling operationally useful LLMs capable of automating config generation, troubleshooting, OSS/BSS flows, code/infrastructure automation, and standards conformance validation (Ethiraj et al., 10 May 2025, Barboule et al., 2024, Zou et al., 2024).

Furthermore, empirical results indicate that domain-adapted LLMs utilizing TelecomInstruct can match or exceed generalist models on telecom tasks while preserving performance on general tasks. The corpus supports the development of new benchmarking standards (e.g., Telecom Math Modeling, Protocol Planning), facilitating transparent progress measurement across telecom LLM research.

This corpus is foundational to the instruction-tuning and assessment of telecom-specific model architectures and methods, positioning it as a reference dataset in the intersection of telecommunications and language modeling research.
