WiFiPenTester: GenAI Wireless PenTest
- WiFiPenTester is an experimental, governed GenAI-assisted framework for wireless penetration testing that integrates large language models with human oversight.
- It uses a modular architecture that combines passive scanning, LLM decision support, and strict operator approvals to ensure reproducibility and auditability.
- Experimental evaluations in three RF environments report 11–19 percentage-point accuracy gains, roughly one-third shorter assessment times, and higher handshake-capture success than heuristic-only prioritization.
WiFiPenTester is an experimental, governed, GenAI-assisted wireless penetration testing framework. It restructures the reconnaissance and attack-prioritization phases of wireless ethical hacking by integrating LLMs under strict human-in-the-loop controls and budget-aware execution. Its design objective is to augment practitioner efficiency and reproducibility, not to automate or supplant the operator's critical judgment, which marks a shift in how wireless security assessments are conducted (Al-Sinani et al., 30 Jan 2026).
1. System Components and Architecture
WiFiPenTester employs a modular architecture of five core subsystems, with the governance layer acting as the central orchestrator. This governance interlock separates reconnaissance from decision-making, preventing uncontrolled LLM invocations and unsanctioned active wireless operations.
- Reconnaissance Module: Uses passive 802.11 scanners, such as airodump-ng, to ingest beacons, probe responses, and client telemetry. The output is normalized into a structured metadata record for each observed network.
- Governance Layer: Inserted as an approval gate and cost estimator for every LLM (GenAI) interaction and every initiation of active RF transmissions. No action proceeds without explicit operator consent.
- LLM Decision-Support Engine: Applies Chain-of-Thought reasoning to the structured session metadata and emits a deterministic, schema-validated JSON object containing ranked targets, feasibility scores, risk factors, and justifications. The output is advisory only; execution commands are strictly prohibited.
- Execution Controller: Presents the LLM's ranked recommendations alongside the raw scan data. The operator selects a target; the controller then locks the monitor-mode interface to the target's channel, optionally performs deauthentication, and manages handshake capture and protocol-aware verification.
- Evidence Store & Report Generator: Collects every artifact of the assessment (scan results, prompts, LLM outputs, execution logs, handshake captures) for audit, compliance, and reproducibility. Structured, anonymized penetration test reports can be generated by the LLM from a facts-only, masked prompt.
This architecture ensures reproducibility, control, and full auditability. Human-in-the-loop (HITL) is enforced at each critical juncture to prevent unintended actions or cost overruns (Al-Sinani et al., 30 Jan 2026).
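A minimal sketch of the Reconnaissance Module's normalization step, assuming the CSV file that airodump-ng writes alongside its captures (e.g. with `-w` and `--output-format csv`): the code parses the access-point block into one record per network. The column names follow the layout airodump-ng commonly writes, but they are an assumption to verify against the installed version; `NetworkRecord`, `parse_ap_section`, and the chosen fields are hypothetical, not the authors' schema. MFP status is not part of this CSV and would have to come from another source.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkRecord:
    """Hypothetical normalized metadata for one observed access point."""
    bssid: str
    essid: str
    channel: int
    privacy: str         # e.g. "WPA2", "WEP", "OPN"
    authentication: str  # e.g. "PSK", "SAE", "MGT"; empty for WEP/open
    power_dbm: int


def parse_ap_section(csv_text: str) -> list[NetworkRecord]:
    """Parse the access-point block of an airodump-ng style CSV export.

    ASSUMPTION: the block starts with a header row beginning with 'BSSID'
    and ends at the first blank line, after which the station block follows.
    """
    block = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            break  # blank line separates the AP block from the station block
        block.append(line)
    if not block:
        return []

    # airodump-ng separates cells with ", ", so skip spaces after commas.
    reader = csv.reader(io.StringIO("\n".join(block)), skipinitialspace=True)
    header = [name.strip() for name in next(reader)]
    records = []
    for row in reader:
        cells = [cell.strip() for cell in row]
        if len(cells) < len(header) - 1:
            continue  # skip truncated rows (trailing 'Key' cell may be absent)
        fields = dict(zip(header, cells))
        records.append(
            NetworkRecord(
                bssid=fields["BSSID"],
                essid=fields["ESSID"],
                channel=int(fields["channel"]),
                privacy=fields["Privacy"],
                authentication=fields["Authentication"],
                power_dbm=int(fields["Power"]),
            )
        )
    return records
```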
2. Threat Model and Formalism
WiFiPenTester models the operator as an ethical adversary constrained to:
- Passive eavesdropping of 802.11 management and data frames
- Injection of deauthentication frames (bounded by Management Frame Protection (MFP))
- Capture and offline dictionary attacks on WPA/WPA2 handshakes
- Configuration-only analysis of WPA3-SAE due to handshake-capture infeasibility under strict MFP enforcement
The set of discovered networks is subject to LLM-based prioritization. For each candidate network, a vulnerability score and a feasibility score are computed under a linear model, i.e., as weighted sums of coded features. Representative feature codings include:
- Protocol: WEP is coded as the most vulnerable, followed by $0.7$ for WPA2-PSK, $0.4$ for WPA3-SAE mixed (transition) mode, and $0.2$ for SAE-only
- MFP: a reduced value if MFP is enabled (attack resistant), else $1$
The candidate set keeps only networks whose vulnerability score clears the threshold $\theta_{\text{vuln}}$ and whose risk/feasibility score clears a separate feasibility threshold.
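Budget constraints on LLM API usage are strictly enforced: the estimated costs of all LLM calls in a session must satisfy $\sum_j c_j \le B$, where $c_j$ is the estimated cost of the $j$-th call and $B$ is the operator-defined budget.
This formal structure ensures that resource usage, output justification, and risk management are mathematically bounded and operator-verifiable (Al-Sinani et al., 30 Jan 2026).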
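The scoring and thresholding steps can be sketched as follows, assuming simple weighted sums. Only the protocol codings for WPA2-PSK (0.7), WPA3-SAE mixed mode (0.4), and SAE-only (0.2), the vulnerability threshold of 0.5, and the cap of three targets come from the text. The WEP coding of 1.0, the MFP factor of 0.0 when enabled, every weight, the signal/client feasibility features, and the feasibility threshold are hypothetical placeholders chosen for illustration, not the paper's values.

```python
from dataclasses import dataclass

# Protocol codings from the text; the WEP value is an ASSUMPTION.
PROTOCOL_CODING = {
    "WEP": 1.0,             # ASSUMED: the source's WEP value is not preserved
    "WPA2-PSK": 0.7,
    "WPA3-SAE-MIXED": 0.4,  # hypothetical label for WPA3-SAE mixed mode
    "WPA3-SAE": 0.2,        # SAE-only
}

# Hypothetical weights and thresholds (illustrative, not from the paper).
W_PROTOCOL, W_MFP = 0.7, 0.3
W_SIGNAL, W_CLIENTS = 0.6, 0.4
THETA_VULN = 0.5   # matches θ_vuln=0.5 in the prompt template
THETA_FEAS = 0.4   # hypothetical feasibility threshold
MAX_TARGETS = 3    # matches max_targets=3 in the prompt template


@dataclass(frozen=True)
class Candidate:
    essid: str
    protocol: str
    mfp_enabled: bool
    signal_dbm: int
    client_count: int


def mfp_factor(c: Candidate) -> float:
    # 1 when MFP is off (per the text); 0.0 when on is an ASSUMED value.
    return 0.0 if c.mfp_enabled else 1.0


def vulnerability(c: Candidate) -> float:
    return W_PROTOCOL * PROTOCOL_CODING[c.protocol] + W_MFP * mfp_factor(c)


def feasibility(c: Candidate) -> float:
    # Map signal from [-90, -30] dBm onto [0, 1]; client count saturates at 3.
    signal = min(max((c.signal_dbm + 90) / 60, 0.0), 1.0)
    clients = min(c.client_count, 3) / 3
    return W_SIGNAL * signal + W_CLIENTS * clients


def select_targets(cands: list[Candidate]) -> list[tuple[str, float, float]]:
    """Keep candidates clearing both thresholds, ranked by (vuln, feas)."""
    kept = [
        (c.essid, round(vulnerability(c), 3), round(feasibility(c), 3))
        for c in cands
        if vulnerability(c) >= THETA_VULN and feasibility(c) >= THETA_FEAS
    ]
    kept.sort(key=lambda t: (t[1], t[2]), reverse=True)
    return kept[:MAX_TARGETS]
```

With these placeholder values, a WPA2-PSK network with MFP enabled scores 0.49 and falls just below $\theta_{\text{vuln}}$, which mirrors the threat model's note that deauthentication is bounded by MFP.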
3. Governance and Human Oversight
Governance is implemented through several mechanisms:
- Approval Gates: Every LLM invocation and active wireless command (e.g., deauthentication, channel-lock) is preceded by explicit operator consent.
- Cost-Aware Execution: The system estimates the token count and resulting API cost of each LLM prompt; further actions are refused when the cumulative spend exceeds user-defined limits.
- Audit Logging: Every LLM-related API call is logged as a tuple (timestamp, prompt, returned JSON, cost).
- Safety Guardrails: The LLM is constrained by structured prompts and output-schema validation. Its output is advisory only: generating operational commands or returning any text outside JSON is explicitly forbidden, and if the output fails validation, the action is aborted. If MFP is enabled, WPA3-SAE attacks are automatically marked infeasible by both the prompt and the post-parsing logic.
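The approval-gate, cost-aware execution, and audit-logging mechanisms listed above can be combined in a small wrapper. This is a minimal sketch under stated assumptions: `Governor`, `AuditEntry`, the flat per-1k-token price, and the four-characters-per-token estimate are all hypothetical, and the cost estimate covers the prompt only, not completion tokens. The same `approve` callback could gate active wireless commands in the same way.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class AuditEntry:
    """One logged LLM call: (timestamp, prompt, returned JSON, cost)."""
    timestamp: float
    prompt: str
    response_json: str
    cost_usd: float


@dataclass
class Governor:
    """Hypothetical approval gate + budget ledger + audit log."""
    budget_usd: float
    usd_per_1k_tokens: float          # ASSUMED flat price per 1k tokens
    approve: Callable[[str], bool]    # operator consent callback
    spent_usd: float = 0.0
    log: list[AuditEntry] = field(default_factory=list)

    def estimate_cost(self, prompt: str) -> float:
        # Crude ASSUMED heuristic: ~4 characters per token, prompt side only.
        tokens = max(1, len(prompt) // 4)
        return tokens / 1000 * self.usd_per_1k_tokens

    def call_llm(self, prompt: str, llm: Callable[[str], str]) -> Optional[str]:
        cost = self.estimate_cost(prompt)
        if self.spent_usd + cost > self.budget_usd:
            return None  # refuse: the call would exceed the session budget
        if not self.approve(f"Send prompt to LLM (estimated ${cost:.4f})?"):
            return None  # operator declined; nothing is sent or logged
        response = llm(prompt)
        self.spent_usd += cost
        self.log.append(AuditEntry(time.time(), prompt, response, cost))
        return response
```

Checking the budget before asking for consent means the operator is never prompted to approve a call that would be refused anyway.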
These governance methods collectively ensure bounded autonomy, accountability, and legal compliance in real penetration test cycles (Al-Sinani et al., 30 Jan 2026).
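The schema-validation guardrail can be sketched as a function that either returns the parsed recommendations or raises, so the caller aborts on failure. The field names come from the prompt template; `validate_recommendations`, `ValidationError`, the `protocol`/`mfp_enabled` metadata keys, and the choice to express "infeasible" as a feasibility of 0.0 are assumptions. Rejecting any BSSID that does not appear in the scan is an extra check added here to catch hallucinated targets, not one the text describes.

```python
import json

REQUIRED_FIELDS = {"essid", "bssid", "score", "feasibility", "justification"}


class ValidationError(ValueError):
    """Raised when LLM output violates the schema; the action is aborted."""


def validate_recommendations(raw: str, networks: dict[str, dict]) -> list[dict]:
    """Validate advisory LLM output against the expected JSON schema.

    `networks` maps BSSID -> scan metadata with hypothetical keys
    'protocol' and 'mfp_enabled'.
    """
    try:
        data = json.loads(raw)  # any text outside the JSON makes this fail
    except json.JSONDecodeError as exc:
        raise ValidationError(f"non-JSON output: {exc}") from exc
    if not isinstance(data, dict) or set(data) != {"recommendations"}:
        raise ValidationError("expected exactly one key: 'recommendations'")
    recs = data["recommendations"]
    if not isinstance(recs, list):
        raise ValidationError("'recommendations' must be a list")
    for rec in recs:
        if not isinstance(rec, dict) or set(rec) != REQUIRED_FIELDS:
            raise ValidationError(f"bad fields in {rec!r}")
        meta = networks.get(rec["bssid"])
        if meta is None:  # extra check: target not present in the scan
            raise ValidationError(f"unknown BSSID {rec['bssid']}")
        # Post-parsing rule: WPA3 with MFP enabled is marked infeasible.
        if "WPA3" in meta["protocol"] and meta["mfp_enabled"]:
            rec["feasibility"] = 0.0
    return recs
```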
4. LLM Integration and Prompt Engineering
LLM integration is performed through an abstract connector module (supporting, e.g., OpenAI gpt-4o-mini), wrapped in prompt templates that encode operator intent, contextual metadata, and system constraints.
Prompt template (abridged):
```text
You are a seasoned wireless penetration tester. Below is session {session_id} at {timestamp}.
Networks: {JSON-array of structured metadata}
Constraints: max_targets=3, θ_vuln=0.5
Task: Rank each network by expected attack feasibility and vulnerability. Use Chain-of-Thought.
Output MUST be valid JSON with fields:
recommendations: [ { essid, bssid, score, feasibility, justification } … ]
Prohibited: any operational commands, any text outside JSON.
```
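Filling the abridged template can be sketched with Python's `str.format`. The template text is copied from the prompt above, but the `networks` placeholder name (standing in for the JSON array of structured metadata), `render_prompt`, and its defaults are hypothetical. The literal braces in the `recommendations` line are doubled so that `format` emits them verbatim.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Abridged template; {networks} is a hypothetical name for the JSON array.
PROMPT_TEMPLATE = (
    "You are a seasoned wireless penetration tester. "
    "Below is session {session_id} at {timestamp}.\n"
    "Networks: {networks}\n"
    "Constraints: max_targets={max_targets}, θ_vuln={theta_vuln}\n"
    "Task: Rank each network by expected attack feasibility and "
    "vulnerability. Use Chain-of-Thought.\n"
    "Output MUST be valid JSON with fields:\n"
    "recommendations: [ {{ essid, bssid, score, feasibility, justification }} … ]\n"
    "Prohibited: any operational commands, any text outside JSON."
)


def render_prompt(session_id: str, networks: list[dict],
                  max_targets: int = 3, theta_vuln: float = 0.5,
                  now: Optional[datetime] = None) -> str:
    """Render the prompt; sorted JSON keys keep identical scans byte-identical."""
    ts = (now or datetime.now(timezone.utc)).isoformat()
    return PROMPT_TEMPLATE.format(
        session_id=session_id,
        timestamp=ts,
        networks=json.dumps(networks, sort_keys=True),
        max_targets=max_targets,
        theta_vuln=theta_vuln,
    )
```

Serializing with `sort_keys=True` is a small design choice in favor of reproducibility: the same scan always produces the same prompt text, which makes audit-log entries directly comparable across runs.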
5. Target Ranking and Attack Strategy Recommendation
Target prioritization is formalized as a weighted score combining four factors, with typical weight values given by Al-Sinani et al. (30 Jan 2026).
Attack strategy selection is protocol-aware and is codified in both the prompt and the parsing logic:
- If protocol == "WEP": FMS/KoreK statistical attack
- If protocol == "WPA2-PSK" and …: deauthentication, 4-way handshake capture, then offline dictionary attack
- If protocol contains "WPA3": configuration audit only (offline cracking disabled)
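The protocol-to-strategy mapping above can be sketched as a pure function that returns a label rather than a command, consistent with the advisory-only design. The second WPA2-PSK condition is not preserved in the text; this sketch assumes it is "MFP disabled", in line with the threat model's note that deauthentication is bounded by MFP. The `Strategy` enum and the `NONE` fallback for other networks (for example open or enterprise ones) are also hypothetical.

```python
from enum import Enum


class Strategy(Enum):
    WEP_STATISTICAL = "FMS/KoreK statistical attack"
    WPA2_HANDSHAKE = ("deauthentication, 4-way handshake capture, "
                      "then offline dictionary attack")
    CONFIG_AUDIT = "configuration audit only (offline cracking disabled)"
    NONE = "no automated strategy"  # hypothetical fallback


def select_strategy(protocol: str, mfp_enabled: bool) -> Strategy:
    """Map a network's protocol to an advisory strategy label (no commands)."""
    if protocol == "WEP":
        return Strategy.WEP_STATISTICAL
    if protocol == "WPA2-PSK" and not mfp_enabled:  # ASSUMED second condition
        return Strategy.WPA2_HANDSHAKE
    if "WPA3" in protocol:
        return Strategy.CONFIG_AUDIT
    return Strategy.NONE
```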
This rule-based mapping removes ambiguity from strategy selection and makes it both replicable and transparent.
6. Experimental Evaluation and Metrics
Experiments in virtualized Kali Linux environments, using MediaTek MT7601U adapters and the OpenAI gpt-4o-mini API, compared GenAI-augmented and heuristic-only (baseline) modes:
| Environment | Baseline accuracy | GenAI accuracy | ΔA (pp) | T_base (min) | T_GenAI (min) | Handshake success (baseline) | Handshake success (GenAI) |
|---|---|---|---|---|---|---|---|
| Static | 70% | 81% | +11 | 15.4 | 10.2 | 78% | 85% |
| Dynamic | 60% | 79% | +19 | 20.1 | 13.6 | 65% | 73% |
| Dense | 58% | 76% | +18 | 21.5 | 13.1 | 60% | 70% |
The reported average reduction in assessment time is ≈34%; per-environment reductions in the table range from about 32% to 39%. Handshake-capture success for top-ranked targets improves by 7–10 percentage points. The standard deviation of target rank (σ_rank) decreased by 25%, indicating more consistent recommendations under variable RF conditions.
These results support the conclusion that, with guardrails in place, LLM integration improves both the efficiency and the accuracy of wireless penetration testing in the tested environments (Al-Sinani et al., 30 Jan 2026).
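The per-environment deltas can be recomputed directly from the table as a quick arithmetic check. The numbers below are copied from the table; `ROWS` and `summarize` are illustrative names. The simple mean of the three time reductions comes out near 35%, close to the reported ≈34%; the source does not say how its average was taken.

```python
# (environment, acc_base, acc_genai, t_base_min, t_genai_min, p_base, p_genai)
ROWS = [
    ("Static", 70, 81, 15.4, 10.2, 78, 85),
    ("Dynamic", 60, 79, 20.1, 13.6, 65, 73),
    ("Dense", 58, 76, 21.5, 13.1, 60, 70),
]


def summarize(rows):
    """Accuracy and handshake gains in percentage points; time saved in %."""
    out = {}
    for env, a0, a1, t0, t1, p0, p1 in rows:
        out[env] = {
            "accuracy_gain_pp": a1 - a0,
            "time_reduction_pct": round(100 * (t0 - t1) / t0, 1),
            "handshake_gain_pp": p1 - p0,
        }
    return out


for env, stats in summarize(ROWS).items():
    print(env, stats)
```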
7. Safety, Ethical, and Practical Considerations
WiFiPenTester’s architecture enforces operator accountability through human-in-the-loop requirements for all disruptive actions. Persistent and structured audit logging ensures full traceability and enables post-hoc regulatory review. LLM output schema enforcement mitigates hallucination and injection risk.
Limitations include dependence on point-in-time passive scan snapshots, incomplete automation of WPA3-SAE attacks (configuration and downgrade analysis only), and sensitivity of the LLM to prompt wording. Mitigations include operator feedback loops and continuous prompt engineering.
Future work includes extension to enterprise 802.1X/EAP networks, improved protocol-aware SAE analysis (e.g., Dragonblood side-channel attacks), adaptive prompt refinement based on real-time RF feedback, and support for self-hosted or offline LLMs (Al-Sinani et al., 30 Jan 2026).
WiFiPenTester exemplifies a principled, auditable, and reproducible methodology for GenAI-assisted wireless penetration testing, demonstrating substantial improvement over heuristic-only prioritization while preserving operator control and compliance in adversarial RF environments.