
Semantics Over Syntax: Uncovering Pre-Authentication 5G Baseband Vulnerabilities

Published 5 Apr 2026 in cs.CR | (2604.04283v1)

Abstract: Modern 5G user equipment (UE) processes Radio Resource Control (RRC) configuration messages during early control-plane exchanges, before authentication and integrity protection are established. Prior work for testing 5G UEs has largely focused on constructing syntactically invalid inputs. In contrast, we show that syntactically valid but semantically inconsistent messages, which violate specification-level field constraints or cross-field dependencies, can drive baseband implementations into invalid states, triggering assertion failures or modem crashes. These findings reveal semantic inconsistencies in pre-authentication signaling as a critical yet underexplored attack surface in 5G UE implementations. To address this gap, we present Constraint-Guided Semantic Testing (ConSeT), a framework that systematically extracts specification-level constraints and leverages them to generate targeted semantic violations for testing 5G UEs. ConSeT decodes RRC messages into structured fields, derives schema-based rules, infers cross-field dependencies using an LLM in an evidence-bounded manner, and produces syntactically valid test cases that intentionally violate semantic constraints. We evaluate ConSeT on both commercial and open-source 5G UEs. On commercial smartphones, it uncovers 7 previously unknown vulnerabilities through responsible disclosure, including 3 high-severity CVEs, affecting 64 chipset models and over 542 commercially available smartphone models. On the open-source OAI UE, ConSeT additionally triggers 29 distinct crash sites.

Summary

  • The paper presents a constraint-guided semantic testing framework (CGST) that systematically extracts and violates ASN.1 semantic constraints to reveal pre-authentication 5G baseband vulnerabilities.
  • It combines deterministic rule extraction from 3GPP specifications with LLM-guided inference to generate minimally violating test cases, triggering modem crashes and denial-of-service issues.
  • Empirical tests on commercial and open-source 5G UEs demonstrate the discovery of seven new vulnerabilities and 29 crash sites, underscoring the need for enhanced semantic validation.

Uncovering Pre-Authentication 5G Baseband Vulnerabilities through Constraint-Guided Semantic Testing

Introduction

Pre-authentication processing in 5G User Equipment (UE) exposes an under-studied but critical attack surface: unprotected Radio Resource Control (RRC) configuration messages. Previous work has largely focused on syntactic anomalies or memory corruption, and this narrow scope overlooks vulnerabilities arising from semantic inconsistencies: cases where ASN.1-encoded messages are structurally valid but violate specification-defined semantic constraints. This paper presents a systematic methodology and toolchain, Constraint-Guided Semantic Testing (named ConSeT in the abstract and abbreviated CGST in this summary), that extracts these semantic rules from 3GPP specifications and synthesizes test cases that intentionally violate them. The paper demonstrates that this approach yields previously unknown vulnerabilities with substantial practical implications for both commercial and open-source 5G UE implementations (2604.04283).

Figure 1: The 5G connection procedure, highlighting the pre-authentication phase targeted in semantic testing.

Semantic Constraint Taxonomy and Threat Model

A four-class taxonomy is introduced to structure field-level constraints:

  1. Field Value Ranges: Numeric or enumerant bounds, explicitly specified in ASN.1 schemas.
  2. Field Presence Constraints: Optionals governed by “need codes” (e.g., maintain or remove on absence).
  3. Intra-IE Field Dependencies: Relations among fields within an information element (IE), typically implicit in normative text.
  4. Inter-IE Field Dependencies: Constraints crossing IE boundaries, such as cross-references and relational invariants.
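The four classes above can be written down as a small data model. The following is a minimal sketch, not the paper's implementation: the names `ConstraintClass`, `Constraint`, and `dependency_class`, and the dotted `IE.field` path convention, are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ConstraintClass(Enum):
    """The four constraint classes of the taxonomy."""
    VALUE_RANGE = "field value range"
    PRESENCE = "field presence"
    INTRA_IE = "intra-IE dependency"
    INTER_IE = "inter-IE dependency"


@dataclass(frozen=True)
class Constraint:
    """One semantic rule; `fields` holds dotted paths of the form IE.field."""
    kind: ConstraintClass
    fields: tuple
    description: str


def dependency_class(fields):
    """Classify a multi-field dependency by whether it crosses an IE boundary."""
    ies = {path.split(".")[0] for path in fields}
    return ConstraintClass.INTRA_IE if len(ies) == 1 else ConstraintClass.INTER_IE


# Example: a cross-reference between two IEs is an inter-IE dependency.
paths = ("SearchSpace.controlResourceSetId", "ControlResourceSet.controlResourceSetId")
rule = Constraint(dependency_class(paths), paths, "SearchSpace must reference a configured CORESET")
```

Keeping the class on each rule makes it straightforward to report results per constraint class later.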

The adversary model presumes a rogue gNodeB controlled via off-the-shelf software-defined radios (SDRs); the attacker exploits the plain-text, unauthenticated phase to inject messages that are syntactically correct but semantically inconsistent. This enables systematic, over-the-air (OTA) robustness testing across a spectrum of real and emulated UEs.

Figure 2: Threat model assuming adversarial control over a rogue gNodeB in the pre-authentication phase.

CGST Framework: Architecture and Methodology

CGST decodes every RRC message into both a hierarchical IE tree (preserving ASN.1 semantics) and a flat, field-annotated view for direct mutation. It extracts constraints via two paths:

  • Deterministic Extraction: For value range and presence constraints, rules are parsed directly from TS 38.331.
  • LLM-Guided Extraction: For intra- and inter-IE dependencies, a large language model (LLM, specifically GPT-4o) is seeded with evidence packs (ASN.1 definitions, normative snippets, and cross-references) drawn from 3GPP 38.2xx documents. Strict filtering and normalization yield a compact domain-specific language (DSL) rule set that explicitly encodes field dependencies and value alignments.

Figure 3: Design overview of CGST illustrating dual-abstraction decoding, constraint extraction from spec and natural language text, rule synthesis, and mutation pipeline.
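To make the deterministic path concrete, the sketch below pulls integer ranges and presence annotations out of ASN.1 text with a regular expression. It is an illustration under stated assumptions, not the paper's parser: the ASN.1 fragment is hypothetical (it is not copied from TS 38.331), and `FIELD_RE` and `extract_rules` are invented names. A real extractor would use a full ASN.1 parser rather than a regex.

```python
import re

# Hypothetical ASN.1 fragment for illustration only.
ASN1_SNIPPET = """
ExampleResource ::= SEQUENCE {
    startPosition    INTEGER (0..5),
    repetitionCount  INTEGER (1..4)    OPTIONAL   -- Need R
}
"""

# Matches "<name> INTEGER (lo..hi) [OPTIONAL][,] [-- Need X]" on a single line.
FIELD_RE = re.compile(
    r"^[ \t]*(?P<name>[A-Za-z][\w-]*)[ \t]+INTEGER[ \t]*"
    r"\((?P<lo>-?\d+)\.\.(?P<hi>-?\d+)\)"
    r"(?P<optional>[ \t]+OPTIONAL)?[ \t]*,?[ \t]*"
    r"(?:--[ \t]*Need[ \t]+(?P<need>[A-Z]))?",
    re.MULTILINE,
)


def extract_rules(asn1_text):
    """Return one value-range/presence record per INTEGER field found."""
    rules = []
    for m in FIELD_RE.finditer(asn1_text):
        rules.append({
            "field": m["name"],
            "range": (int(m["lo"]), int(m["hi"])),
            "optional": m["optional"] is not None,
            "need": m["need"],  # None when no need code is annotated
        })
    return rules


rules = extract_rules(ASN1_SNIPPET)
```

Each record carries both the value range and the presence information, which correspond to the first two constraint classes.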

Automated test case generation leverages the DSL: for each rule, CGST computes a minimally violating message that remains schema-conformant but breaks exactly one semantic predicate. This enables precise attribution between input perturbation and observed failures.

Figure 4: Example of evidence-bound DSL rule induction, showing extraction of a cross-IE dependency from 3GPP normative text into an executable predicate.
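The "exactly one violated predicate" property can be illustrated with a small generator that mutates a single field and discards any variant that breaks more than the targeted rule. This is a sketch under assumptions: rules are modeled as plain Python predicates rather than the paper's DSL, the second rule and the `symbolCount` field are hypothetical, and encoding the result back into a schema-conformant message is not modeled.

```python
import copy

# Illustrative rule set: each rule is a named predicate over a flat field view.
RULES = {
    "startPosition in [0, 5]": lambda m: 0 <= m["startPosition"] <= 5,
    # Hypothetical dependency between two fields, for demonstration only.
    "symbolCount <= 6 - startPosition": lambda m: m["symbolCount"] <= 6 - m["startPosition"],
}


def violated(message, rules):
    """Names of all rules that `message` breaks."""
    return [name for name, holds in rules.items() if not holds(message)]


def minimal_violations(baseline, field, candidates, target, rules):
    """Mutate one field; keep only variants that break `target` and nothing else."""
    kept = []
    for value in candidates:
        variant = copy.deepcopy(baseline)
        variant[field] = value
        if violated(variant, rules) == [target]:
            kept.append(variant)
    return kept


baseline = {"startPosition": 2, "symbolCount": 1}
variants = minimal_violations(
    baseline, "startPosition", (-1, 6), "startPosition in [0, 5]", RULES
)
```

Here the candidate value 6 is discarded because it also breaks the second rule, so a crash it triggered could not be attributed to one constraint.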

Case Study: Semantic Constraint Violations

The evaluation covers both simulation (OpenAirInterface, OAI) and real-world OTA campaigns on multiple commercial 5G smartphones. Several classes of errors are showcased:

Value Range Violations

CGST sets specific fields just beyond their declared ASN.1 bounds (e.g., startPosition ∉ [0,5]), yielding messages that are structurally valid but semantically incorrect. Devices process these messages until a failure arises in logic that depends on the constraint, resulting in crashes or persistent unresponsiveness.

Figure 5: Value range constraint violation delivered to OAI UE and commercial devices leads to crash and denial-of-service.

Presence Constraint Violations

By leveraging “need code” annotations that specify behavior for absent optionals, CGST induces configurations the implementation fails to handle (e.g., absence of spatialRelationInfo). The essential insight is that mishandled presence/absence semantics can cause fatal dereferences or inconsistent state even though the message contains no syntactic error.

Figure 6: Presence constraint violation—removing an optional field with a need code annotation—causes a deterministic crash.
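A presence violation and the receiver-side handling it exercises can be sketched as follows. This is an assumption-laden illustration, not the paper's code: the message layout, the helper names `drop_field` and `apply_optional`, and the `"maintain"`/`"release"` labels (standing in for the need-code behaviors mentioned above) are all hypothetical.

```python
import copy


def drop_field(message, path):
    """Return a deep copy of `message` with the field at `path` removed."""
    mutated = copy.deepcopy(message)
    node = mutated
    for key in path[:-1]:
        node = node[key]
    node.pop(path[-1], None)
    return mutated


def apply_optional(stored, received, field, on_absence):
    """Update stored config for one optional field, honoring absence semantics."""
    updated = dict(stored)
    if field in received:
        updated[field] = received[field]
    elif on_absence == "release":
        updated.pop(field, None)  # absence means: discard the stored value
    elif on_absence != "maintain":
        raise ValueError(f"unknown absence behavior: {on_absence}")
    return updated  # "maintain": the stored value is left untouched


# Hypothetical decoded message; the nesting is illustrative only.
message = {"srsResource": {"id": 0, "spatialRelationInfo": {"ref": 1}}}
mutated = drop_field(message, ("srsResource", "spatialRelationInfo"))
```

A receiver that reads the field unconditionally after such a mutation would hit the missing-value case that `apply_optional` handles explicitly.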

Intra-IE/Inter-IE Dependency Violations

Rules such as “field X must be greater than field Y” (intra-IE) or “field A in IE1 matches field B in IE2” (inter-IE) are induced from cross-document 3GPP prose and exercised via targeted, evidence-bound mutations. These typically surface as assertion failures in both open-source and proprietary device logs.

Figure 7: Intra-IE field constraint example, showing dependency between SRS-related fields extracted as a DSL predicate.

Figure 8: Inter-IE field constraint example—coherence between SearchSpace.controlResourceSetId and ControlResourceSet.controlResourceSetId.
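The inter-IE coherence rule in Figure 8 amounts to a referential-integrity check, which a validating receiver could run before applying a configuration. The sketch below assumes the decoded IEs are available as lists of dictionaries; the function name `dangling_coreset_refs` is hypothetical, and the actual rule in the specifications may carry exceptions that this simplified check ignores.

```python
def dangling_coreset_refs(search_spaces, coresets):
    """IDs of search spaces whose controlResourceSetId matches no configured CORESET."""
    known = {c["controlResourceSetId"] for c in coresets}
    return [
        s["searchSpaceId"]
        for s in search_spaces
        if s["controlResourceSetId"] not in known
    ]


# Illustrative decoded configuration: search space 2 points at a missing CORESET.
coresets = [{"controlResourceSetId": 1}]
search_spaces = [
    {"searchSpaceId": 1, "controlResourceSetId": 1},
    {"searchSpaceId": 2, "controlResourceSetId": 7},
]
broken = dangling_coreset_refs(search_spaces, coresets)
```

A test generator can use the same check in reverse: pick an identifier outside `known` to produce a message that violates only this rule.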

Empirical Results

Commercial Devices

Testing eight commercial 5G smartphones across four chipset families, CGST identifies seven previously unknown vulnerabilities (including three high-severity CVEs) affecting 64 chipset models and over 542 commercially available smartphone models. The failures predominantly manifest as modem crashes and persistent denial-of-service that require a reboot to clear. Each failing input maps to a single violated semantic constraint, which supports precise root-cause attribution.

Figure 9: Smartphone brands affected by confirmed flaws—demonstrating the widespread impact of identified vulnerabilities.

Open-Source OAI UE

On OAI UE, 29 distinct crash sites are induced, spanning all four constraint classes. GDB-assisted triage localizes root causes, confirming that semantic misconfigurations in RRC setup induce faults in MAC, PHY, and simulation logic distinct from those surfaced by syntactic grammars or memory fuzzing.

Baselines

Compared to exhaustive enumeration and grammar-based mutation, constraint-guided semantic testing yields orders of magnitude higher coverage and failure yield per test while needing far fewer test cases (1,458 constraint-guided vs. 30,600 enumerated for sampled field pairs, about a 21-fold reduction), exposes qualitatively richer bugs, and is operationally viable under OTA resource budgets.

Analysis and Discussion

Impact and Implications

The findings establish that semantic inconsistencies in 5G RRC handling are pervasive and easily reachable through unauthenticated message injection. Neither current OTA-based fuzzers nor conformance checkers robustly model or test these classes of dependencies, implying that a wide swath of the deployed smartphone baseband stacks remains at risk from straightforward, targeted DoS.

CGST’s methodology, which combines schema parsing, evidence-bound LLM inference, and DSL normalization, demonstrates that even with natural-language specifications, it is feasible to automatically extract and operationalize complex, cross-field protocol invariants at scale. The technique generalizes to other ASN.1-driven protocols, provided that message and field selection are reformulated; the paper includes a table of preliminary evidence on NGAP and F1AP.

LLM Admissibility and Hallucination Mitigation

The LLM is constrained to operate only on syntactic and phrase-level evidence slices; its outputs must carry explicit citations and pass verification gates. Ablation studies confirm that cross-document natural-language evidence is essential for high recall, while over-generation and hallucination risks are sharply reduced by slice-based evidence packaging, citation enforcement, and strict DSL normalization.
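One way to picture citation enforcement is a gate that accepts an LLM-proposed rule only when every citation quotes text that actually appears in the evidence pack. The sketch below is an assumption about how such a gate could look, not the paper's verification logic: the rule and evidence-pack layouts, the slice identifiers, and the names `normalize` and `passes_citation_gate` are hypothetical.

```python
def normalize(text):
    """Lowercase and collapse whitespace so quotes match despite reflowed text."""
    return " ".join(text.lower().split())


def passes_citation_gate(rule, evidence_pack):
    """Accept a proposed rule only if every citation quotes text present in the pack."""
    citations = rule.get("citations", [])
    if not citations:
        return False  # uncited rules are rejected outright
    for cite in citations:
        source_text = evidence_pack.get(cite["slice_id"])
        if source_text is None:
            return False  # citation points at a slice that was never supplied
        if normalize(cite["quote"]) not in normalize(source_text):
            return False  # quoted text does not occur in the cited slice
    return True


# Hypothetical evidence slice and two proposed rules.
evidence_pack = {"slice-1": "The field X shall be\n  greater than the field Y."}
grounded = {"dsl": "X > Y", "citations": [{"slice_id": "slice-1", "quote": "shall be greater than"}]}
invented = {"dsl": "X == Z", "citations": [{"slice_id": "slice-1", "quote": "shall equal field Z"}]}
```

A substring check like this only confirms that the quoted evidence exists; whether the rule follows from it would still need a separate check.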

Limitations

Coverage of multi-message, capability, or stateful protocol-level constraints is limited due to a focus on per-message semantic conditions. Some specification ambiguities and errors in extracting mathematical table relations may persist. Future directions include extending the DSL expressivity to temporal and state/feature negotiation constraints, as well as scaling the evidence extraction pipeline for broader ASN.1 ecosystems.

Conclusion

The study reveals that semantic inconsistencies (syntactically valid but semantically invalid configurations) constitute a broad, exploitable attack surface in pre-authentication 5G basebands. By systematically extracting, expressing, and testing semantic constraints using a hybrid deterministic and LLM-guided methodology, the authors have demonstrated effective discovery and attribution of critical vulnerabilities in both open-source and commercial UEs. The practical implication is an urgent need for protocol implementers and standards bodies to integrate explicit semantic validation layers in RRC and analogous 5G protocol stacks. Techniques exemplified by CGST are poised for adoption in both assurance tooling and next-generation security-oriented test generation.

References

All claims, methodology, and results referenced are detailed in "Semantics Over Syntax: Uncovering Pre-Authentication 5G Baseband Vulnerabilities" (2604.04283).
