SysVec: Unified Vector Systems

Updated 29 September 2025
  • SysVec names several independent, convergently named systems that encode "system"-level controls as vector-based entities, spanning LLM prompt handling, large-scale vector search, and RISC-V vector-architecture simulation.
  • In LLMs, it mitigates system-prompt leakage by replacing raw prompt text with a learned latent vector; in disk-based vector search, it cuts I/O overhead through sampling-based traversal and connectivity-aware reordering.
  • Related "system vectorization" work covers parameterized RISC-V vector simulation and SVE-enabled lattice QCD kernels, demonstrating scalable performance and adaptability from embedded to HPC settings.

SysVec refers to several independent but convergently named systems and frameworks across the domains of vector architectures, large-scale vector search, and LLM system prompt handling. In all usages, SysVec encapsulates the embedding or systematization of "system"-level controls or primitives as vector-based entities, whether at the architectural, algorithmic, or representational level.

1. System Vectors in LLMs: Encoding Prompts as Latent Representations

SysVec, as introduced in "You Can't Steal Nothing: Mitigating Prompt Leakages in LLMs via System Vectors" (Cao et al., 26 Sep 2025), addresses the longstanding vulnerability of prompt leakage in LLMs by removing raw textual system prompts from the explicit context. Instead, it injects the semantic effect of the system prompt as a learned vector $v_{sys}$ at an intermediate layer of the model's hidden states. The traditional formulation $y = f_\theta(s \oplus x)$ (with $s$ as the system prompt and $x$ as the user prompt) is replaced by a pipeline in which

$$f(x, v_{sys}) = f_\theta^{(\ell+1:L)}\big(f_\theta^{(1:\ell)}(x) + \alpha\, v_{sys}\big),$$

where $\ell$ is the insertion layer and $\alpha$ is a scaling factor.
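
In implementation terms, the injection amounts to adding a bias vector to the hidden states leaving layer $\ell$. Below is a minimal PyTorch sketch under assumed conventions: a HuggingFace-style decoder whose layers sit at `model.model.layers`, with a forward hook standing in for whatever mechanism the paper actually uses; all names here are illustrative, not the authors' code.

```python
import torch

def make_injection_hook(v_sys: torch.Tensor, alpha: float):
    """Forward hook adding alpha * v_sys to a decoder layer's hidden states."""
    def hook(module, inputs, output):
        # Transformer layers often return tuples; hidden states come first.
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * v_sys  # broadcasts over batch and sequence
        if isinstance(output, tuple):
            return (hidden,) + output[1:]
        return hidden
    return hook

def inject_system_vector(model, v_sys: torch.Tensor, ell: int, alpha: float = 1.0):
    """Attach the injection at layer `ell`; returns a handle for later removal."""
    layer = model.model.layers[ell]  # layer container path varies by model family
    return layer.register_forward_hook(make_injection_hook(v_sys, alpha))

# Usage: handle = inject_system_vector(model, v_sys, ell=12); ...; handle.remove()
```

Because the vector is added once to the hidden states rather than re-encoded as prompt tokens, the injection cost does not grow with conversation length, consistent with the overhead reduction reported below.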

To ensure that $v_{sys}$ induces the same behavioral bias as $s$, a preference-based optimization is used. For each user input $x$, the output generated with the system prompt, $y_w = f_\theta(s \oplus x)$, and the output generated without it, $y_l = f_\theta(x)$, are used in a loss of the form:

$$L(v_{sys}) = -\mathbb{E}_{x, y_w, y_l}\Big[\log \sigma\Big(\beta \Big( \log P\big(y_w \mid f_\theta^{(1:\ell)}(x) + v_{sys}\big) - \log P\big(y_w \mid f_\theta^{(1:\ell)}(x)\big) - \big( \log P\big(y_l \mid f_\theta^{(1:\ell)}(x) + v_{sys}\big) - \log P\big(y_l \mid f_\theta^{(1:\ell)}(x)\big) \big) \Big)\Big)\Big],$$

where $\sigma$ is the sigmoid and $\beta$ is a tuning parameter. This encourages outputs generated with $v_{sys}$ to match the "preferred" (system-guided) behavior.
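
Concretely, this is a DPO-style preference loss over four sequence log-likelihoods. A minimal sketch, assuming the log-probabilities have already been computed (function and argument names are illustrative):

```python
import torch
import torch.nn.functional as F

def sysvec_loss(logp_w_inj, logp_w_base, logp_l_inj, logp_l_base, beta: float = 0.1):
    """Preference loss on v_sys: raise the likelihood of the system-guided
    output y_w under injection and lower that of the unguided y_l,
    each measured relative to the uninjected baseline."""
    margin = (logp_w_inj - logp_w_base) - (logp_l_inj - logp_l_base)
    return -F.logsigmoid(beta * margin).mean()

# Example with dummy log-likelihood values:
lp = lambda v: torch.tensor(v)
loss = sysvec_loss(lp(-10.0), lp(-14.0), lp(-12.0), lp(-9.0))
```

In practice, the four log-likelihoods would be obtained by running the frozen model with and without the injection from the previous sketch; only $v_{sys}$ receives gradients.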

Experimental studies report the following:

  • SysVec minimizes leakage as measured by Prompt Leaking Similarity (PLS) and semantic similarity (SS) across attack variants (naive, ignore, completion, remember-the-start, and their combinations) on models such as Llama-2-7B, Llama-3-8B, and Mistral-7B.
  • Instruction-following utility is preserved, as indicated by near-identical scores to classic system-prompted models on benchmarks such as MMLU.
  • Superior long-context retention is demonstrated in multi-turn conversations, where SysVec maintains high Response Utility Score, unlike textual prompt baselines, which experience marked degradation.
  • Inference overhead is reduced, as redundant processing of the system prompt text is avoided; the vector is injected once regardless of sequence length.

The removal of raw text in the context makes SysVec resistant to a broad class of prompt injection attacks, as the adversary cannot extract ss from any repeated or manipulated model outputs.

2. SysVec in Dynamic Vector Search Systems

LSM-VEC (Zhong et al., 22 May 2025) embodies a "SysVec" approach in the disk-based vector search setting, coupling a hierarchical proximity graph index with Log-Structured Merge-tree (LSM-tree) storage. The system keeps the upper layers of the proximity graph (less than 1% of nodes) in memory for rapid navigation, while the bulk of the graph resides on disk in the LSM-tree, supporting efficient out-of-place updates.

The search module adopts sampling-based probabilistic traversal:

$$\text{Cost}_{full} = T\,(t_n + d\, t_v), \qquad \text{Cost}_{sampling} = T\,(t_n + \rho\, d\, t_v), \qquad \Delta = T\,(1-\rho)\, d\, t_v,$$

where $T$ is the number of nodes visited, $d$ is the node degree, $t_n$ is the time to access a node's adjacency metadata, $t_v$ is the time to access a vector, and $\rho$ is the sampling ratio. The system attains equivalently high recall with reduced I/O, as randomized expansion is guided by projection-based similarity. Additional connectivity-aware graph reordering, executed periodically during compaction, places frequently accessed nodes contiguously, minimizing random access.
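
As a toy numerical check of the cost model (all timings invented for illustration), sampling a quarter of each node's neighbors saves exactly $T(1-\rho)\,d\,t_v$:

```python
def traversal_cost(T, t_n, d, t_v, rho=1.0):
    """Cost = T * (t_n + rho * d * t_v), per the model above."""
    return T * (t_n + rho * d * t_v)

full = traversal_cost(T=1000, t_n=0.01, d=32, t_v=0.05)               # rho = 1
sampled = traversal_cost(T=1000, t_n=0.01, d=32, t_v=0.05, rho=0.25)
delta = full - sampled  # 1000 * (1 - 0.25) * 32 * 0.05 = 1200.0
```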

Empirical findings indicate:

  • LSM-VEC attains ~88.4% Recall 10@10 with lower query latency (4.7–4.9 ms) and update latency (4.6–4.9 ms) than DiskANN and SPFresh, while reducing memory footprint by over 66.2%.
  • The design supports real-time dynamic ANN for billion-scale workloads, enabling applications such as continuous retrieval-augmented generation, evolving recommendation, and multimodal search.
  • The probabilistic retrieval process is underpinned by random-projection SimHash filtering:

$$\text{Hash}(x) = [\operatorname{sgn}(x^\top a_1), \dots, \operatorname{sgn}(x^\top a_m)], \qquad \#\text{Col}(q, u) = \tfrac{1}{2}\big(m + \text{Hash}(q)^\top \text{Hash}(u)\big),$$

with a collision threshold $T_{\epsilon}^{SimHash}$ regulating candidate selection.
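
A minimal numpy sketch of this filter, assuming Gaussian random projections $a_i$ and an arbitrary threshold (all parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def simhash(x, A):
    """Sign pattern of m random projections; A has shape (m, dim)."""
    return np.sign(A @ x)  # entries in {-1, +1} almost surely

def collisions(q, u, A):
    """#Col(q, u) = (m + <Hash(q), Hash(u)>) / 2 = number of matching signs."""
    m = A.shape[0]
    return 0.5 * (m + simhash(q, A) @ simhash(u, A))

dim, m = 128, 64
A = rng.standard_normal((m, dim))
q, u = rng.standard_normal(dim), rng.standard_normal(dim)
is_candidate = collisions(q, u, A) >= 48  # threshold T_eps gates expansion
```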

Connectivity-aware physical ordering is formalized as maximizing

$$F(\phi) = \sum_{0 < \phi(v) - \phi(u) \leq w} S(u, v),$$

where $S(u,v)$ blends graph connectivity and dynamic sampling statistics.
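
Evaluating a candidate layout under this objective is straightforward. A sketch, with $S$ given as a pairwise-affinity map and $\phi$ as a node-to-position dict (both assumed inputs; the actual reordering pass would search for a $\phi$ maximizing this score):

```python
def layout_score(phi, S, w):
    """F(phi): total affinity of pairs placed within a forward window of w."""
    return sum(s for (u, v), s in S.items() if 0 < phi[v] - phi[u] <= w)

# Example: two orderings of three nodes under window w = 1.
S = {("a", "b"): 0.9, ("a", "c"): 0.4}
print(layout_score({"a": 0, "b": 1, "c": 2}, S, w=1))  # 0.9: b placed next to a
print(layout_score({"a": 0, "c": 1, "b": 2}, S, w=1))  # 0.4: c placed next to a
```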

3. SysVec in RISC-V Vector Architecture Simulation

The SysVec system described in "A RISC-V Simulator and Benchmark Suite for Designing and Evaluating Vector Architectures" (Lazo et al., 2021) is a parameterized extension to the gem5 simulator supporting the RISC-V vector extension (RVV). Here, "system vector" (SysVec) refers to a flexible, decoupled vector execution engine:

  • The scalar core fetches, decodes, and retires instructions, delegating vector operations (arithmetic, load/store) to an independent unit.
  • The simulator is parameterized by Maximum Vector Length (MVL), number of lanes, vector register file size, issue queue organizations, and interconnection network design (ring vs crossbar).
  • It supports arbitrary MVLs, from 8 to 256 64-bit elements (i.e., 512–16384 bits), and both single- and multi-lane (e.g., 8-lane) vector units.

A seven-application vectorized benchmark suite stresses diverse hardware modules and memory access patterns, including Blackscholes (dense regular DLP), Canneal (irregular DLP), Jacobi-2D, Particle Filter, Pathfinder, Streamcluster, and Swaptions. All applications are vector-length agnostic, using vector intrinsics.
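
The vector-length-agnostic style these benchmarks follow can be illustrated with a strip-mined loop in the spirit of RVV's vsetvl: the code never bakes in a fixed MVL, and the tail is handled by a shorter final strip. A Python analogy (not the actual intrinsics-based benchmark code):

```python
import numpy as np

def saxpy_vla(a, x, y, mvl=256):
    """y += a * x, processed in strips of at most `mvl` elements.
    Each iteration mimics vsetvl granting vl <= MVL active lanes."""
    n, i = len(x), 0
    while i < n:
        vl = min(mvl, n - i)             # active vector length for this strip
        y[i:i + vl] += a * x[i:i + vl]   # one vector macro-op over vl elements
        i += vl
    return y

x = np.arange(1000, dtype=np.float64)
y = np.zeros_like(x)
saxpy_vla(2.0, x, y, mvl=64)  # identical result for any mvl: length-agnostic
```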

Performance studies highlight:

  • Near-linear speedups for high DLP workloads with increasing lanes/MVL; startup and interconnect/port limitations dominate efficiency for workloads with irregular access or smaller vector lengths.
  • Memory subsystems (cache size and configuration) critically affect the attainable vector speedup, as seen in the significantly improved Swaptions throughput when the L2 cache grows from 256 KB to 1 MB.
  • The suite enables evaluation from embedded (low MVL, low register count) to HPC (large MVL, multi-lane, deep register file) scenarios.

4. Advanced Vectorization in Lattice QCD Supercomputing

"SVE-enabling Lattice QCD Codes" (Meyer et al., 2019) describes an ARM-based approach to "system vectorization" using the Scalable Vector Extension (SVE) ISA. The key capability is vector-length agnostic (VLA) SIMD, via vector units 128–2048 bits wide, operating with predication for tail handling.

For particle physics simulations (notably Lattice QCD), tight computational kernels such as the Dirac operator evaluation,

$$\psi'_x = \sum_{\mu=1}^{4} \Big\{ U_{x,\mu}\,(1+\gamma_\mu)\,\psi_{x+\hat{\mu}} + U^\dagger_{x-\hat{\mu},\mu}\,(1-\gamma_\mu)\,\psi_{x-\hat{\mu}} \Big\},$$

are mapped to SVE instructions, utilizing structure loads (for interleaved complex-color-spin layouts) and hardware-accelerated complex arithmetic (e.g., FCMLA fusion).
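
To illustrate the two features together, here is a numpy analogy of the fused complex multiply-accumulate that FCMLA performs, over an interleaved re/im layout, with a boolean mask standing in for the SVE predicate that disables tail lanes. This is illustrative only; the production kernels use SVE ACLE intrinsics.

```python
import numpy as np

def cmla_interleaved(acc, a, b, mask):
    """acc += a * b on interleaved [re0, im0, re1, im1, ...] arrays.
    `mask` (one bool per complex lane) mimics SVE tail predication."""
    ar, ai = a[0::2], a[1::2]
    br, bi = b[0::2], b[1::2]
    re = ar * br - ai * bi           # the partial products FCMLA fuses
    im = ar * bi + ai * br
    out = acc.copy()
    out[0::2][mask] += re[mask]      # only predicated lanes are written
    out[1::2][mask] += im[mask]
    return out

n = 5                                # 5 complex numbers, 10 interleaved floats
a = np.random.default_rng(1).standard_normal(2 * n)
b = np.random.default_rng(2).standard_normal(2 * n)
acc = np.zeros(2 * n)
mask = np.arange(n) < 3              # e.g., only 3 of 5 lanes active in a tail
acc = cmla_interleaved(acc, a, b, mask)
```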

Practical challenges include:

  • Data layout adaptation: SVE's sizeless vector types cannot be members of structures or unions, necessitating array-based storage with a compile-time fixed SVE_VECTOR_LENGTH.
  • Compiler limitations: for example, Arm clang 18.x may fail to yield an optimal FCMLA translation, motivating the use of explicit SVE ACLE intrinsics.
  • Portability trade-offs: hard-coding the SVE vector length improves performance on fixed hardware at the cost of cross-platform flexibility.

Performance gains manifest in reduced loop overhead (due to hardware tail predication), improved matrix–vector multiply throughput, and greater FLOP/Watt efficiency.

5. Synthesis and Future Directions

Across domains, SysVec denotes a principle of representing "system-level" information as vector constructs—whether as learned internal representations, architectural modules, or disk-based indexes. In LLMs, SysVec secures proprietary instruction sets against leakage and enables robust long-context instruction following. In large-scale vector search, "SysVec" solutions efficiently support real-time insertions/deletions and billion-scale queries by merging hierarchical indexing with adaptive I/O organization. Within system architecture, SysVec frameworks allow fine-grained, parameterized simulation and benchmarking of vector engines for both embedded and HPC settings.

A plausible implication is the transfer of ideas between these contexts. For example, the intermediate-layer vector injection of SysVec in LLMs may inspire analogous runtime control methods in hardware or database systems. Conversely, the parameterization and out-of-place update strategies of vector search systems could inform more dynamic, update-tolerant LLM architectures.

Research directions include:

  • In LLMs: more robust optimization of vsysv_{sys} for complex behavioral and multi-task system prompts; automated dynamic scaling or modulation based on dialogue state.
  • In vector search: tighter hardware–software integration (e.g., GPU-accelerated index maintenance) and ML-guided sampling/adaptation.
  • In vector architecture: further full-system simulation (including OS/hypervisor interactions), and exploration of truly reconfigurable or adaptive vector hardware responsive to workload analysis.

SysVec, in all usages, exemplifies the centrality of vectorized representations and processing in modern computational systems, highlighting the need for secure, efficient, and extensible encoding of system-level actions and constraints.
