Software Security Knowledge Boundary
- The software security knowledge boundary is the conceptual demarcation at which known security evidence, assumptions, and validation methods become insufficient to ensure safe system operation.
- It spans multiple dimensions, including assessor competence, structural attack surfaces, semantic gaps in validation, and organizational and lifecycle boundaries.
- Practical implications involve enhancing vulnerability scoring accuracy, refining code review practices, and evolving AI-assisted development to bridge security knowledge gaps.
The software security knowledge boundary denotes the limit at which available security knowledge, assumptions, or evidence cease to be sufficient for reliable assessment or safe operation. Recent work uses the concept in several closely related senses: as a boundary in assessor competence during vulnerability scoring, as a boundary between trust and enforceable security, as the exposed boundary of a software system’s attack surface, as the gap between explicit functional specifications and implicit security intent in AI-assisted development, and as the semantic gap that remains after syntactic validation at trust boundaries (Allodi et al., 2018; Pathan, 2012; Moshtari et al., 2021; Grynets et al., 29 May 2026; Kim et al., 2 Jul 2026). Taken together, these works treat the boundary not as a single line between “secure” and “insecure,” but as a set of limits on what is known, what is made explicit, what is validated, and what can be acted upon.
1. Conceptual foundations
One line of work distinguishes trust from security. In this view, trust is a vague term, an initial belief or assumption about an entity, and often the precondition for interaction, whereas security is the operational mechanism that ensures legitimacy and preserves privacy, authenticity, authority, integrity, and non-repudiation. Boundary lines preserve security; if they are crossed without legitimacy, the result is a security breach. The boundary therefore functions as a line of admission: entities inside it are legitimate and accepted, and entities outside it are unauthorized or insufficiently trusted (Pathan, 2012).
A second line of work defines the boundary in terms of attack surface. The notion of Attack Surface refers to the critical points on the boundary of a software system which are accessible from outside or contain valuable content for attackers. Grounded-theory analysis of vulnerability and weakness reports organizes those boundary components into three categories—Entry Points, Targets, and Mechanisms—across Code, Program, System, and Network levels. In this formulation, the boundary is not merely a perimeter but a layered distribution of reachable interfaces, valuable assets, and enabling conditions (Moshtari et al., 2021).
A third and more formal line of work defines a Trust Boundary Semantic Gap (TBSG). A TBSG exists when an artifact crosses a trust boundary, passes correctly implemented syntactic validation, and the assertions established by that pass are still insufficient for the receiving domain’s security requirements. The paper states this condition formally as a relation between two sets of assertions: what the boundary check actually proves, and what the receiving domain requires to process the artifact safely. MDTBSG organizes such unresolved assumptions into four dimensions (Identity, Spatial, Temporal, and Interpretation), thereby shifting attention from failed checks to security-relevant properties that remain unestablished after checks succeed (Kim et al., 2 Jul 2026).
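The gap described above can be sketched as a set difference between what a check establishes and what the receiver requires. This is a minimal illustration, not the paper's formalism: the `Assertion` type, the `semantic_gap` function, the example properties, and the assignment of each property to a dimension are all hypothetical.

```python
from dataclasses import dataclass

# The four MDTBSG dimensions named in the text.
DIMENSIONS = ("identity", "spatial", "temporal", "interpretation")


@dataclass(frozen=True)
class Assertion:
    """A security-relevant property tagged with one dimension (hypothetical type)."""
    dimension: str
    name: str


def semantic_gap(established, required):
    """Return the required assertions that the boundary check did not establish,
    grouped by dimension. An empty dict means no gap was found."""
    missing = set(required) - set(established)
    gap = {}
    for dim in DIMENSIONS:
        names = sorted(a.name for a in missing if a.dimension == dim)
        if names:
            gap[dim] = names
    return gap


# Hypothetical example: a token passes signature and schema checks,
# but the receiver also needs replay protection and audience binding.
established = {
    Assertion("identity", "signature_valid"),
    Assertion("interpretation", "schema_well_formed"),
}
required = established | {
    Assertion("temporal", "not_replayed"),
    Assertion("spatial", "issued_for_this_service"),
}

print(semantic_gap(established, required))
```

The point of the sketch is that the check itself is correct and still passes; the gap only becomes visible when the receiver's requirements are written down and compared against what the check proves.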
These formulations are compatible rather than mutually exclusive. This suggests that the software security knowledge boundary can be understood simultaneously as an epistemic boundary for human judgment, a structural boundary in system exposure, and a semantic boundary in what validation does and does not establish.
2. Human expertise and the boundary of assessment competence
The most direct empirical treatment of the term appears in work on vulnerability assessment using CVSS v3 as a prima facie proxy for software security skills. In a natural experiment with 73 participants (35 students with no security training, 19 students with specific security education, and 19 security professionals with several years of experience but no academic-level security specialization), participants assessed 30 vulnerabilities in 90 minutes using only the CVE description, an introductory seminar on CVSS v3, and a metric summary sheet. Of 2,190 assessments, 1,924 valid assessments were used in the analysis. The benchmark was the set of official assessments produced by the CVSS SIG, and the main analysis used mixed-effects regression models. The central result is that subjects with security knowledge are significantly more accurate than subjects with no security knowledge on all metrics; overall, one security-knowledge group was between 30% and 60% less likely than the untrained group to make an error. At the same time, the professional group does not appear to perform significantly better than the security-educated student group except on one metric, and the combination of skills explains most of the variance among subjects. Within the professional group, attack expertise decreases error by almost 60% for Attack Vector and Attack Complexity, and years of experience increases accuracy by roughly 20% per year for certain metrics (Allodi et al., 2018).
The same boundary appears in modern code review, but in organizational rather than purely cognitive form. A two-step investigation based on 10 interviews and 182 survey responses found that organizations frequently expect developers to ensure security during code review, yet most developers do not immediately report security as a primary review focus. Only 9 of 182 survey respondents mentioned security in open-ended descriptions of code review focus, and only 2 of 10 interviewees mentioned it without prompting. At the same time, 126 respondents said their company or project expects developers to ensure security, 166 agreed that it is part of the developer’s job to ensure the security of the application, and only 38 said their company or project provides security training. The most frequently reported challenge was lack of knowledge or training, with 44 respondents naming it as the main challenge, alongside third-party libraries, subtle vulnerabilities, and interactions among parts of code that could have security implications (Braz et al., 2022).
A related qualitative study in financial services describes the boundary as a division of labor and expertise. Security experts are expected to know threats and attack paths, security requirements, secure design, vulnerability-focused code review, testing methods and tooling, and compliance and risk interpretation, while developers are expected to understand security enough to build features securely, avoid common mistakes, write secure code, react to feedback from security specialists, and integrate security activities into day-to-day development. The study reports that developers are often not security specialists; secure engineering therefore depends on collaboration, training, documentation, tooling, and process design. Teams compensate for limited expertise through reliance on security experts, checkpoint-based reviews, tool-based enforcement, checklists and standards, and restriction or centralization of security-sensitive tasks (Arora et al., 2021).
Across these studies, the boundary is not a simple novice–expert divide. It is the point at which generic IT knowledge, generic review responsibility, or coarse labels such as “security professional” no longer predict good security judgment. Assessment quality depends on skill composition, contextual knowledge, and the support structures that allow specialized knowledge to cross into routine engineering practice.
3. Boundary representations in attack surfaces, program graphs, and security knowledge bases
Attack-surface research turns the boundary into an explicit taxonomy. Using Straussian Grounded Theory on 1,444 vulnerability and weakness records—810 CVE-related vulnerability reports and 634 CWE entries—the proposed model derives Entry Points, Targets, and Mechanisms as the core categories of attack surface components. Entry Points answer where attackers get in, Targets answer what assets or components they affect, and Mechanisms answer how vulnerabilities emerge. The model spans Code, Program, System, and Network levels, and the comparison with prior literature shows that in the best case previous works cover only 50% of the attack surface components at network level and only 6.7% of the components at code level. More specifically, prior work covers only about 10% of code-level entry points, 3.4% of code-level targets, and 10% of code-level mechanisms (Moshtari et al., 2021).
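The three categories and four levels form a 3 x 4 grid, which can be sketched as a small data structure. The category and level names come from the text; the `Component` type, the helper functions, and the example inventory are hypothetical and only show how such a grid exposes uncovered cells.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    ENTRY_POINT = "where attackers get in"
    TARGET = "what assets or components are affected"
    MECHANISM = "how the vulnerability emerges"


class Level(Enum):
    CODE = 1
    PROGRAM = 2
    SYSTEM = 3
    NETWORK = 4


@dataclass(frozen=True)
class Component:
    """One attack-surface component placed in the grid (hypothetical type)."""
    name: str
    category: Category
    level: Level


def coverage_by_cell(components):
    """Count components per (category, level) cell."""
    return Counter((c.category, c.level) for c in components)


def empty_cells(components):
    """List grid cells for which the inventory names no component."""
    filled = coverage_by_cell(components)
    return [(cat, lvl) for cat in Category for lvl in Level
            if (cat, lvl) not in filled]


# Hypothetical inventory for illustration only.
inventory = [
    Component("open TCP port", Category.ENTRY_POINT, Level.NETWORK),
    Component("credential store", Category.TARGET, Level.SYSTEM),
    Component("unchecked buffer length", Category.MECHANISM, Level.CODE),
]

for category, level in empty_cells(inventory):
    print(f"no component recorded for {category.name} at {level.name} level")
```

Listing empty cells mirrors the coverage comparison in the study: a model that names few code-level components leaves most code-level cells blank.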
Program Knowledge Graphs move the boundary from taxonomy into implementation-level checking. Instead of treating software security knowledge as something that lives only in CVE and CWE datasets, the approach embeds public weakness knowledge into a graph that also contains fine-grained program structure and behavior, including call graphs and, in principle, control-flow and data-flow graphs. The implementation uses a property-graph style model in Neo4j with Cypher queries. In evaluation, 15 C/C++ code examples from CWE pages spanning 8 weakness categories were queried, and the method identified 14 out of 15. The miss, CWE-401, is explicitly attributed to the fact that call graphs alone are insufficient and that this weakness requires data-flow graphs (Xie et al., 2023).
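The idea of querying program structure against weakness knowledge can be sketched without a graph database. The following is a plain-Python stand-in, not the Neo4j and Cypher implementation the study describes: the call graph, the function names, and the mapping from `strcpy` to CWE-120 are illustrative assumptions.

```python
from collections import deque

# Hypothetical call graph: caller -> list of callees.
CALL_GRAPH = {
    "main": ["parse_request", "log_event"],
    "parse_request": ["copy_field"],
    "copy_field": ["strcpy"],
    "log_event": ["snprintf"],
}

# Hypothetical weakness knowledge: risky callee -> CWE identifier.
RISKY_CALLEES = {"strcpy": "CWE-120"}


def reachable_weaknesses(graph, risky, entry):
    """Breadth-first search from `entry`; return (call path, CWE) pairs
    for every risky callee that is reachable through the call graph."""
    findings = []
    queue = deque([[entry]])
    seen = {entry}
    while queue:
        path = queue.popleft()
        for callee in graph.get(path[-1], []):
            if callee in seen:
                continue
            seen.add(callee)
            new_path = path + [callee]
            if callee in risky:
                findings.append((new_path, risky[callee]))
            queue.append(new_path)
    return findings


for path, cwe in reachable_weaknesses(CALL_GRAPH, RISKY_CALLEES, "main"):
    print(" -> ".join(path), "matches", cwe)
```

A query of this kind sees only which functions call which. It cannot tell whether an allocated buffer is later released, which is consistent with the reported miss on CWE-401: that weakness depends on how data flows, not on the call structure alone.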
SynAT extends security knowledge bases by moving the boundary outward from curated databases toward crowd discussions. The approach combines LLM-based scope restriction, transition-based joint event and relation extraction, and heuristic attack-tree synthesis. The original dataset comprises 5,070 Stack Overflow security posts, 1,354 attack trees, 3,010 events, and 7,756 relations; Cohen’s Kappa for annotation is 0.87. From 192 security posts published after Feb 1, 2023, the approach synthesized 121 attack trees. Of these, 40.50% appeared on Stack Overflow earlier than in CVE, 53.72% appeared earlier than in CAPEC, 16.53% may be new to CVE, and 19.83% may be new to CAPEC. For HUAWEI’s private attack-tree database, practitioners accepted 102 of the 121 synthesized trees, an 84.30% acceptance rate, which increased the database by 42.32% overall (Jiang et al., 5 Feb 2026).
These representational approaches differ in granularity, but they share a common claim: security knowledge is bounded by what the model can name, encode, or query. Attack-surface taxonomies show what should be considered; program graphs show what can be checked in real code; attack-tree synthesis shows how knowledge outside official databases can be structured and admitted into them.
4. Organizational and lifecycle boundaries
In practice, software security knowledge is also bounded by organizational structure, development tempo, and the form in which security evidence becomes actionable. The financial-services study reports that teams bridge the gap between security experts and generalist developers through semi-structured collaboration patterns rather than through universal expert competence. Requirements and design are described as the phase with the highest need for security knowledge, implementation as the phase where developers must act on that knowledge most directly, and testing and maintenance as phases where teams frequently rely on tooling, outside alerts, or specialist intervention. The study further reports that generic security advice is insufficient and that organizations need project-specific guidance, examples tied to the codebase, patterns for common tasks, and clear role boundaries for escalation (Arora et al., 2021).
The MOSS perspective generalizes this organizational boundary to contemporary software ecosystems. It argues that the software security knowledge boundary has moved outward and become fluid because modern projects are assembled from third-party components, updated weekly or even more often, and affected by security-relevant changes in subcomponents controlled by external parties. Traditional process-based assurance relies on rigid security gates, heavyweight techniques such as architecture risk analysis and code V&V, stable artefacts at fixed points in the lifecycle, and supervision by a few highly-skilled security specialists. The proposed shift is to artefact-based security evaluation over Models, Source code, Container images, and Services across the continuous lifecycle “Design, Develop, Deploy, Evaluate and back.” The paper argues for lightweight, intelligent, fully- or semi-automated screening tests of security-relevant events, enabling incremental re-certification at scale (Pashchenko et al., 2021).
A central issue in this literature is actionability. The MOSS analysis states that some outputs remain too coarse to guide action: ML tools that say a line contributes “x%” to vulnerabilities, or that a file “may contain a vulnerability,” are not very actionable. Dependency tools likewise may produce too much noise or insufficient context to determine exploitability or remediation priority. This suggests that the knowledge boundary is not only about detection. It is also the point at which raw signals fail to become decision-relevant guidance.
5. AI-assisted development and model-specific security boundaries
AI-assisted development introduces a new and explicit form of software security knowledge loss. In specification-driven generation, the central problem is that functional requirements are usually explicit, while security requirements are often under-specified. The proposed response is a Multilayer Specification Security Model connecting system context, threats and risks, security requirements, implementation rules, controls, verification scenarios, and evidence, together with a Security Knowledge Transition Method that transforms business and technical specifications into a validated security-enriched generation contract. The paper formalizes the workflow as a relation among four elements: the specification input, the Multilayer Specification Security Model, the generated implementation, and the verification evidence.
In backend generation for a tennis court booking API, evaluated against a hidden 221-test black-box API suite, modal failures decreased from 50 in the baseline to 42 with ASVS and 36 with the Multilayer Security Model. Pass rates were 77.38% for the baseline, 81.00% for ASVS, and 83.71% for the Multilayer Security Model, with the strongest gains in Business Logic, Admin Safety, JWT Token, and Schema Strictness (Grynets et al., 29 May 2026).
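The reported pass rates follow directly from the failure counts and the suite size. The short check below assumes that the pass rate is the share of the 221 tests that did not fail modally; under that assumption it reproduces the three percentages and also gives the relative reduction in failures against the baseline.

```python
TOTAL_TESTS = 221

# Modal failure counts as reported in the text.
modal_failures = {
    "baseline": 50,
    "ASVS": 42,
    "Multilayer Security Model": 36,
}


def pass_rate(total, failures):
    """Percentage of tests that did not fail, rounded to two decimals."""
    return round(100 * (total - failures) / total, 2)


def failure_reduction(baseline_failures, failures):
    """Relative reduction in failures against the baseline, in percent."""
    return round(100 * (baseline_failures - failures) / baseline_failures, 2)


for condition, failures in modal_failures.items():
    rate = pass_rate(TOTAL_TESTS, failures)
    cut = failure_reduction(modal_failures["baseline"], failures)
    print(f"{condition}: pass rate {rate}%, failures down {cut}% vs baseline")
```

Read this way, the Multilayer Security Model removes 14 of the 50 baseline failures, a 28% relative reduction, while the absolute pass rate moves by a little over six percentage points.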
A narrower but complementary boundary appears in LLM use of Java security APIs. The replication on JCA and JSSE APIs defines the boundary as the point at which a model may know an API in a general sense yet still fail to apply the security constraints needed to use it safely. The benchmark comprises 12 task functionalities, 3 semantically equivalent wordings per functionality, and 30 samples per prompt, for 1080 generated outputs per model-condition combination. Baseline misuse persists: GPT-5.5 produced 1019 valid programs and 646 misuse-free valid programs, for a misuse rate among valid programs of 36.60%; Llama-3.3-70B-Instruct produced 973 valid programs and 404 misuse-free valid programs, for 58.48%; historical GPT-4 had 65.91%. External security knowledge substantially improves the measured outcome, but the strongest knowledge type is model-dependent: secure code examples are the most effective single knowledge type for Llama-3.3-70B-Instruct, while explicit misuse patterns eliminate all detected security API misuses among valid GPT-5.5 programs, although some outputs remain invalid due to compilation errors or target-API mismatches (Lu et al., 29 May 2026).
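The misuse rates above are conditional on validity: they are computed over programs that compile and target the right API, not over all generated outputs. The sketch below makes that denominator explicit using the counts from the text; the function names are illustrative.

```python
# Benchmark shape as described in the text.
FUNCTIONALITIES = 12
WORDINGS_PER_FUNCTIONALITY = 3
SAMPLES_PER_PROMPT = 30
OUTPUTS_PER_CONDITION = (
    FUNCTIONALITIES * WORDINGS_PER_FUNCTIONALITY * SAMPLES_PER_PROMPT
)

# (valid programs, misuse-free valid programs) at baseline, from the text.
baseline_counts = {
    "GPT-5.5": (1019, 646),
    "Llama-3.3-70B-Instruct": (973, 404),
}


def misuse_rate(valid, misuse_free):
    """Share of valid programs containing at least one detected misuse, in percent."""
    return round(100 * (valid - misuse_free) / valid, 2)


def validity_rate(valid, total=OUTPUTS_PER_CONDITION):
    """Share of all generated outputs that count as valid programs, in percent."""
    return round(100 * valid / total, 2)


for model, (valid, misuse_free) in baseline_counts.items():
    print(f"{model}: {validity_rate(valid)}% valid, "
          f"{misuse_rate(valid, misuse_free)}% of valid programs misuse the API")
```

Keeping the two rates separate matters when comparing knowledge types: a condition can eliminate misuse among valid programs while still leaving some outputs invalid, as the text notes for GPT-5.5.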
The most explicit cognitive formalization of the boundary appears in evaluation of software security comprehension in LLMs. Using Bloom’s Taxonomy, the paper defines the software security knowledge boundary as the highest cognitive level of software security understanding that a model can sustain reliably and consistently. Given the ordered Bloom levels and a performance threshold, the boundary is the highest level at which the model’s performance still meets that threshold.
Across 1,172 tasks spanning remembering, understanding, applying, analyzing, evaluating, and creating, the study reports strong lower-level performance and marked degradation at higher levels. Table 6 reports GPT-5-Mini at Create at all thresholds, while Llama-3.1 falls to None at at least one threshold. The paper also identifies 51 recurring misconception patterns, ranging from factual confusions at lower Bloom levels to flawed architectural and design judgments at evaluating and creating levels (Siddiq et al., 24 Dec 2025).
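A threshold-based boundary of this kind can be sketched in a few lines. The sketch assumes one particular reading of "sustain": a level counts only if it and every lower level meet the threshold, and the result is `None` when even the lowest level falls short. That reading, the function name, and the example scores are assumptions for illustration, not the paper's definition or data.

```python
# Bloom levels in ascending cognitive order, as listed in the text.
BLOOM_LEVELS = ["Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create"]


def knowledge_boundary(scores, threshold):
    """Return the highest Bloom level such that this level and all lower
    levels score at or above `threshold`; return None if the lowest fails.

    Assumption: 'sustained' means no lower level drops below the threshold.
    """
    boundary = None
    for level in BLOOM_LEVELS:
        if scores.get(level, 0.0) >= threshold:
            boundary = level
        else:
            break
    return boundary


# Hypothetical per-level accuracy for one model.
example_scores = {
    "Remember": 0.95,
    "Understand": 0.90,
    "Apply": 0.82,
    "Analyze": 0.70,
    "Evaluate": 0.74,
    "Create": 0.60,
}

for threshold in (0.7, 0.8, 0.99):
    print(threshold, knowledge_boundary(example_scores, threshold))
```

With these made-up scores the boundary moves from Evaluate to Apply to `None` as the threshold tightens, which is the same qualitative pattern as a model that reaches Create at lenient thresholds and None at a strict one.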
These AI-focused studies converge on a common conclusion: security failures often arise not because models lack all relevant information, but because security intent is implicit, generic, or poorly transferred. Generic standards help, but application-specific constraints, threat-linked requirements, and model-aware knowledge selection improve security knowledge preservation more effectively.
6. Supply chains, operational scope, and the moving research frontier
In software supply chains, the knowledge boundary becomes recursive, population-dependent, and ecosystem-scale. A defense-oriented evaluation introduces AStRA, a graph-based model with four vertex types (Principals, Artifacts, Resources, and Steps) arranged as a directed acyclic graph with causal relations: principals use resources, resources carry out steps, and steps consume and produce artifacts. On this basis, the work defines security objectives for principals, artifacts, resources, and steps, together with objectives for the topology of the graph, and validates the model against case studies such as left-pad and SolarWinds as well as 72 attacks in the IQT Labs dataset. The boundary, in this view, lies between isolated attack descriptions and a complete, objective-driven, compositional theory of supply chain defense (Ishgair et al., 2024).
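The four vertex types and their causal relations can be sketched as a small typed graph. This is an illustrative model rather than AStRA itself: the vertex names, the relation labels, and the choice to draw consumption as an edge from artifact to step are assumptions made so that edges follow causal flow.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical vertices: name -> vertex type.
VERTICES = {
    "maintainer": "principal",
    "ci_runner": "resource",
    "build": "step",
    "source_tarball": "artifact",
    "release_binary": "artifact",
}

# Hypothetical directed edges: (source, relation, destination).
EDGES = [
    ("maintainer", "uses", "ci_runner"),
    ("ci_runner", "carries_out", "build"),
    ("source_tarball", "consumed_by", "build"),
    ("build", "produces", "release_binary"),
]

# Which vertex types each relation may connect.
ALLOWED = {
    "uses": ("principal", "resource"),
    "carries_out": ("resource", "step"),
    "consumed_by": ("artifact", "step"),
    "produces": ("step", "artifact"),
}


def validate(vertices, edges):
    """Check edge typing and acyclicity; return one topological order."""
    predecessors = {v: set() for v in vertices}
    for src, rel, dst in edges:
        if (vertices[src], vertices[dst]) != ALLOWED[rel]:
            raise ValueError(f"{rel} cannot connect {src} to {dst}")
        predecessors[dst].add(src)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError as exc:
        raise ValueError("graph is not acyclic") from exc


def downstream(edges, start):
    """Everything causally reachable from `start`, e.g. a compromised principal."""
    successors = {}
    for src, _, dst in edges:
        successors.setdefault(src, set()).add(dst)
    seen, stack = set(), [start]
    while stack:
        for nxt in successors.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


print(validate(VERTICES, EDGES))
print(sorted(downstream(EDGES, "maintainer")))
```

Reachability over such a graph gives a rough picture of blast radius: in this toy example a compromised maintainer reaches the runner, the build step, and the released binary, but not the source tarball it did not produce.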
A parallel systematization of knowledge defines three orthogonal secure design properties for software supply chains: transparency, validity, and separation. It also proposes a four-stage attack pattern—compromise, alteration, propagation, and exploitation—and argues that transparency primarily mitigates compromise, validity primarily mitigates alteration, and separation primarily mitigates propagation. Its comparison of SCIM, SLSA 4, and CNCF Software Supply Chain Best Practices reports that SCIM is narrower, especially on separation and actor-related concerns, while SLSA 4 and CNCF cover all three properties more broadly but impose higher implementation complexity, with SLSA 4 having 20 requirements and CNCF nearly 60 requirements (Okafor et al., 2024).
The problem of boundary definition also appears in empirical research itself. An RSSC-oriented taxonomy for research software supply chain studies introduces four dimensions—Actor unit, Supply chain role, Research role, and Distribution pathway—to make the operational boundary of “research software” explicit. Applied to 6,966 entries from the Research Software Encyclopedia, with Scorecard successfully computed for 5,937 of 6,966 entries, or 85.2% coverage, the study reports an overall RS median Scorecard score of 2.9 versus 3.9 for an Apache Software Foundation baseline, with median missingness of 22% for RS and 28% for ASF. Stratification matters: community or foundation-maintained research software has median 3.6 and 11% missingness, while individual maintainer research software has median 2.7 and 28% missingness (Kalu et al., 28 Jan 2026).
The roadmap for software security analysis pushes the boundary still further outward. It argues that the field has largely optimized around shallow bugs that are easy to define, easy to trigger, and easy to observe, especially memory-safety errors, while future work must address deep vulnerabilities hidden in interactions, specifications, architectures, dependency chains, and emergent behaviors. OSS-Fuzz is cited as finding 10K+ vulnerabilities and 36K+ bugs across 1000+ projects, illustrating how much progress has been made on shallow, reachable faults. At the same time, the roadmap argues for an “assume breach” posture, ecosystem-scale analysis, and new methods for heterogeneous systems and machine-generated code, because guarantees become partial, probabilistic, or compositional as systems mix conventional software, neural components, supply-chain dependencies, and runtime defenses (Böhme et al., 2024).
The supply-chain literature therefore extends the notion of boundary in two directions. First, it shows that security knowledge depends on how the system and its population are scoped. Second, it shows that the frontier is dynamic: as local bug-finding improves, the remaining boundary shifts toward transitive risk, provenance, semantics, and resilience under residual uncertainty.