DeepInception Framework Overview

Updated 26 January 2026
  • The DeepInception framework denotes a family of multi-branch architectures that use parallel inception modules for efficient feature extraction, with applications across computer vision, finance, and security.
  • It enhances model performance by factorizing convolutions and reducing computational load, achieving competitive accuracy with fewer parameters.
  • It extends to LLM security by leveraging recursive role-play strategies for jailbreak evaluations, prompting new defenses against adversarial attacks.

The DeepInception framework refers to a family of architectures, attack methods, and analytical models rooted in inception-style multi-branch computation, initially introduced for efficient deep learning in computer vision and since adopted for quantitative finance and security analysis in LLMs. Canonical implementations include GoogLeNet/Inception (Szegedy et al., 2014), Inception-V3 (Szegedy et al., 2015), MBInception (Froughirad et al., 2024), systematic trading models (Liu et al., 2023), and adversarial “jailbreak” evaluation protocols (Li et al., 2023, Huang et al., 19 Jan 2026). In all cases, the central motif is parallel, multi-scale feature extraction or manipulation, either for neural network efficiency, signal mining, or adversarial vulnerability evaluation.

1. Origins: Multi-Branch Convolutional Networks

The inception module, as instantiated in GoogLeNet (Szegedy et al., 2014), is defined by four parallel branches operating on the same input tensor:

  • 1×1 convolution: projects cross-channel features and enables non-linear embeddings. Given $C_{in}$ input channels and $F_1$ filters, $P_1 = 1\times 1\times C_{in}\times F_1$ parameters.
  • 1×1→3×3 path: channel reduction to $R_3$ filters, followed by $F_3$ 3×3 filters; $P_2 = (1\times 1\times C_{in}\times R_3) + (3\times 3\times R_3\times F_3)$.
  • 1×1→5×5 path: reduction to $R_5$ filters, then $F_5$ 5×5 filters; $P_3 = (1\times 1\times C_{in}\times R_5) + (5\times 5\times R_5\times F_5)$.
  • Pooling→1×1 projection: 3×3 pooling (stride 1), then $F_p$ 1×1 projections; $P_4 = 1\times 1\times C_{in}\times F_p$.

The outputs are concatenated along the channel axis: for an $H\times W$ input, the result has shape $H\times W\times (F_1+F_3+F_5+F_p)$.

The architectural rationale is grounded in the Hebbian principle ("neurons that fire together, wire together") and the empirical finding that multi-scale receptive fields are needed to model correlated structure at multiple scales efficiently. The parallel branches approximate sparse connectivity while keeping computation in dense, hardware-friendly blocks. FLOP counts per branch are computed explicitly to ensure computational tractability.
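
The per-branch parameter counts above can be verified with a short calculation. The sketch below uses the filter counts of GoogLeNet's inception(3a) module ($C_{in}=192$, $F_1=64$, $R_3=96$, $F_3=128$, $R_5=16$, $F_5=32$, $F_p=32$) as illustrative values; the helper name is our own:

```python
def inception_params(c_in, f1, r3, f3, r5, f5, fp):
    """Weight counts for one inception module's four branches (biases omitted)."""
    p1 = 1 * 1 * c_in * f1                        # 1x1 branch
    p2 = 1 * 1 * c_in * r3 + 3 * 3 * r3 * f3      # 1x1 -> 3x3 branch
    p3 = 1 * 1 * c_in * r5 + 5 * 5 * r5 * f5      # 1x1 -> 5x5 branch
    p4 = 1 * 1 * c_in * fp                        # pool -> 1x1 projection
    out_channels = f1 + f3 + f5 + fp              # channel-wise concatenation
    return p1 + p2 + p3 + p4, out_channels

# GoogLeNet inception(3a): C_in=192, F1=64, R3=96, F3=128, R5=16, F5=32, Fp=32
total, c_out = inception_params(192, 64, 96, 128, 16, 32, 32)
print(total, c_out)  # 163328 weights, 256 output channels
```

Note how the 1×1 reductions dominate the savings: without the $R_3$ bottleneck, the 3×3 branch alone would cost $3\times 3\times 192\times 128 = 221{,}184$ weights instead of $129{,}024$.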

2. Architectural Evolution and Application Areas

Following Inception V1, the DeepInception paradigm expanded to deeper, more factorized topologies motivated by parameter- and compute-efficiency.

Inception-V3 and Convolutional Factorization

Inception-V3 (Szegedy et al., 2015) factorizes wider filters:

  • A $5\times 5$ convolution is replaced by two stacked $3\times 3$ convolutions, reducing multiply-adds by 28%. For $C_{mid}=C_{out}$:

$$\frac{M_{\text{fact}}}{M_{5\times 5}} = \frac{2\cdot 3^2}{5^2} = \frac{18}{25} \approx 0.72$$

  • Asymmetric decomposition (e.g., $3×3$ to $3×1$ then $1×3$) results in 33% savings.
  • Every spatial aggregation is preceded by a $1×1$ bottleneck.
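
These savings can be checked numerically (a minimal sketch; the ratio assumes equal channel counts and spatial resolution, as in the $C_{mid}=C_{out}$ case above, and the function name is our own):

```python
def factorization_ratio(k_large, k_small, n_small):
    """Multiply-add ratio of n_small stacked (k_small x k_small) convolutions
    versus one (k_large x k_large) convolution, with matched channels and
    spatial resolution."""
    return (n_small * k_small ** 2) / k_large ** 2

print(factorization_ratio(5, 3, 2))  # two 3x3 replacing one 5x5: 18/25 = 0.72

# asymmetric decomposition: 3x1 followed by 1x3 replacing one 3x3
asym = (3 * 1 + 1 * 3) / (3 * 3)
print(asym)  # 6/9, i.e. roughly 33% savings
```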

Modules are stacked to form networks up to 42 layers (Inception-V3), yielding accuracy–cost trade-offs superior to prior VGG and ResNet configurations at similar or reduced parameter counts.

MBInception and Efficient Image Classification

MBInception (Froughirad et al., 2024) applies three or four sequential inception blocks, with each block employing two inception modules (each with 1×1, 1×3/3×3, 1×5/5×5, and pool–1×1 branches and bottlenecks), batch normalization, and dropout. This design is validated on benchmarks (CIFAR-10, CIFAR-100, MNIST, Fashion-MNIST): parameter efficiency and accuracy are consistently competitive with, or superior to, VGG, ResNet, and MobileNet.

Quantitative Finance: Deep Inception Networks (DIN)

DINs (Liu et al., 2023) adapt multi-branch inception modules for end-to-end multivariate time-series and cross-sectional feature learning in portfolio management. The FE (feature extractor) learns temporal, cross-asset, and joint time–cross-sectional patterns via four-branch custom inception modules (time-series conv, cross-sectional conv, joint conv, identity). The successor PS (position sizer) uses LSTM or transformer variants to generate portfolio weights directly optimizing for Sharpe ratio with explicit transaction cost and correlation regularization.

3. DeepInception in LLM Security and Jailbreak Analysis

DeepInception is also used to describe adversarial attack frameworks targeting LLM safety constraints. Two principal protocols have emerged:

Virtual, Nested Scene Hypnosis

(Li et al., 2023) proposes recursive scene-based personification, drawing on Milgram experiment psychology. The LLM is prompted to inhabit a virtual context with multiple recursively nested characters and scenarios (“layered inception”), inducing the model to ignore default guardrails. This approach is effective against both open- and closed-source LLMs, exceeding baseline "PAIR" and instruction-based jailbreak techniques. Empirical trials show a maximum jailbreak success rate (JSR) of 48.8% (Vicuna-v1.5), with robustness to simple defenses.

Black-box Jailbreak via Multi-turn Role-play and Scenario Simulation

(Huang et al., 19 Jan 2026) defines a process involving sequential role-play, scenario simulation, and multi-turn dialogue, each represented as composable modules. Scoring uses a seven-level hierarchical scale, with success defined as any dialogue with total score <0. The protocol is implemented in LLM jailbreak evaluations for Chinese medical ethics. Using this framework, models reach 82.1% attack success rate under adversarial prompting, compared to ≈0% for zero-shot queries; the ASR Gain is >80 percentage points.

Key formulas include:

$$ASR = \frac{1}{N} \sum_{i=1}^{N} \mathbb{I}(\mathrm{Score}_i < 0)$$

$$ASR_{\text{Gain}} = ASR_{\text{jailbreak}} - ASR_{\text{baseline}}$$
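
Both metrics reduce to a few lines of code. The sketch below uses made-up per-dialogue scores (the values are illustrative, not from the paper):

```python
def asr(scores):
    """Attack success rate: fraction of dialogues whose total score is below 0."""
    return sum(1 for s in scores if s < 0) / len(scores)

def asr_gain(jailbreak_scores, baseline_scores):
    """ASR gain of the jailbreak protocol over the zero-shot baseline."""
    return asr(jailbreak_scores) - asr(baseline_scores)

# illustrative scores: a negative total marks a successful attack
jb = [-3, -1, 2, -4, 1, -2, -1, 0, -5, -2]   # 7 of 10 succeed
base = [4, 3, 5, 2, 6, 4, 3, 5, 2, 4]        # 0 of 10 succeed
print(asr(jb), asr_gain(jb, base))  # 0.7 0.7
```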

4. Training, Regularization, and Optimization Methodologies

Image Models

Aggressive regularization regimes are necessary for deep Inception-style modules. In GoogLeNet (Szegedy et al., 2014):

  • Auxiliary classifiers (after deep modules) assist gradient flow; 0.3 loss weight each.
  • Dropout: 40% (main tower), 70% (auxiliary).
  • Polyak averaging, asynchronous SGD (momentum 0.9), fixed-epoch decay.
  • Data augmentation: random crops (area 8–100%, aspect ratio $[3/4, 4/3]$), photometric distortions, random interpolation.
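
The crop sampling in the last bullet can be sketched as follows (a minimal implementation assuming uniform sampling of the area fraction and aspect ratio with a bounded number of retries; parameter names and the fallback behavior are our own choices):

```python
import math
import random

def sample_inception_crop(height, width, rng=None,
                          area_range=(0.08, 1.0), ratio_range=(3 / 4, 4 / 3),
                          max_tries=10):
    """Sample a crop whose area is 8-100% of the image and whose aspect ratio
    lies in [3/4, 4/3]; fall back to the full image if no valid crop is found."""
    rng = rng or random.Random()
    area = height * width
    for _ in range(max_tries):
        target_area = rng.uniform(*area_range) * area
        aspect = rng.uniform(*ratio_range)            # width / height
        w = int(round(math.sqrt(target_area * aspect)))
        h = int(round(math.sqrt(target_area / aspect)))
        if 0 < w <= width and 0 < h <= height:
            x = rng.randint(0, width - w)
            y = rng.randint(0, height - h)
            return x, y, w, h
    return 0, 0, width, height                        # fallback: full image

x, y, w, h = sample_inception_crop(480, 640, random.Random(0))
```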

In Inception-V3 (Szegedy et al., 2015):

  • Batch normalization on all conv outputs and FC layers in aux classifiers (≈0.4% Top-1 improvement).
  • Label-smoothing (LSR) to avoid overconfident outputs, confers ≈0.2% improvement.
  • Dropout (40%) is used in the penultimate layer.
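
Label smoothing with a uniform prior takes only a few lines (shown here with Inception-V3's $\epsilon = 0.1$ and $K = 1000$ classes; the helper name is our own):

```python
def smoothed_targets(num_classes, true_class, eps=0.1):
    """Label-smoothing regularization: mix the one-hot target with the
    uniform distribution over classes, keeping total mass 1."""
    uniform = eps / num_classes
    targets = [uniform] * num_classes
    targets[true_class] += 1.0 - eps
    return targets

t = smoothed_targets(1000, 42)
print(t[42], sum(t))  # true class gets ~0.9001; masses sum to 1
```

Capping the target at $1 - \epsilon + \epsilon/K$ rather than 1 keeps the cross-entropy loss bounded away from requiring infinitely confident logits, which is the overconfidence LSR is meant to prevent.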

DINs (Liu et al., 2023) use Adam optimizer, loss is the negative Sharpe ratio augmented with correlation-based regularization; random seeds are ensembled to limit initialization variance.
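
A hedged sketch of such an objective follows (not the paper's exact loss: the annualization factor, correlation weight, and penalty form are illustrative assumptions):

```python
import numpy as np

def neg_sharpe_loss(returns, benchmark=None, corr_weight=0.1, eps=1e-8):
    """Illustrative DIN-style training loss: negative annualized Sharpe ratio
    of daily portfolio returns, plus an optional penalty on the absolute
    correlation to a benchmark return series."""
    r = np.asarray(returns, dtype=float)
    sharpe = np.sqrt(252.0) * r.mean() / (r.std() + eps)
    loss = -sharpe
    if benchmark is not None:
        corr = np.corrcoef(r, np.asarray(benchmark, dtype=float))[0, 1]
        loss += corr_weight * abs(corr)               # decorrelation penalty
    return loss

rng = np.random.default_rng(0)
pnl = 0.0005 + 0.01 * rng.standard_normal(252)        # one year of daily P&L
print(neg_sharpe_loss(pnl))
```

Minimizing this loss with Adam pushes mean returns up relative to volatility while the penalty discourages positions that merely replicate the benchmark; ensembling over random seeds, as the paper does, then averages out initialization variance.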

Security/Ethics Evaluations

DeepInception-based attack evaluations (Li et al., 2023, Huang et al., 19 Jan 2026) are fully black-box and require no model internals, gradient information, or system prompt modification. Evaluation is purely via scenario and persona-driven prompt engineering, scored by human raters or hierarchical scales.

5. Quantitative Performance and Benchmarks

Computer Vision

Model        Params (M)   CIFAR-10 Acc.   CIFAR-100 Acc.   MNIST Acc.   F-MNIST Acc.
MBInception  19           65.3%           42.1%            99.22%       91.34%
ResNet50     22           66.9%           41.0%            99.16%       91.23%
VGG16        15           64.4%           40.6%            99.09%       84.96%
MobileNet    5            63.3%           37.3%            99.02%       83.96%

GoogLeNet set a 7.89% top-5 error (144 crops) on ILSVRC 2014; ensemble of 7 models achieved 6.67%. Compared to AlexNet (16.4% top-5) and VGGNet (7.32%), Inception/GoogLeNet used 12× fewer parameters and half the inference FLOPs, with superior accuracy (Szegedy et al., 2014, Szegedy et al., 2015, Froughirad et al., 2024).

Quantitative Finance

Best DIN results (Futures, target vol=15%): Sharpe ratio 2.95, MDD 13.8%, break-even transaction cost 4.82 bps, correlation to Long-only 11%. LSTM and other variants show reduced Sharpe and higher drawdown. DINs outperform traditional momentum/cross-sectional strategies over multiple asset classes (Liu et al., 2023).

LLM Security

In medical-ethics jailbreak analysis (Huang et al., 19 Jan 2026), ASR baseline ≈0%, ASR (jailbreak) 82.1%. DeepInception (nested-scene method) achieves JSRs up to 48.8% (Vicuna-v1.5), outperforming baseline techniques (Li et al., 2023).

6. Limitations, Trade-offs, and Defenses

Limitations cited include human-introduced bias (scenario/scoring design) in LLM adversarial evaluations, limited generalizability across domains, and practical inefficiencies at extreme model depths. Scene/layer depth in LLM jailbreaks suffers from “forgetting” at high recursion. Some application domains are more vulnerable than others (e.g., high-sensitivity legal/ethical contexts).

Defensive strategies proposed:

  • Shift from outcome supervision to auditable, process-based supervision (LLM security).
  • Multi-factor identity verification for critical roles.
  • Cross-model "joint defense" to prevent low-guardrail model fallback.
  • InceptionNet extensions using attention blocks (SE/CBAM) and depthwise-separable convolutions for further parameter reduction.
  • Provenance-aware detection and policy penalization for recursive-role prompts in LLMs.

7. Impact and Generalization

The DeepInception framework is a prototypical example of multi-path, multi-scale architectural design yielding demonstrable efficiency gains across vision, finance, and model security domains. Its widespread adoption reflects the ongoing need for balancing representational power, computational cost, and safety in large-scale machine learning systems. Adoption of auxiliary module regularization, label-smoothing, and enforced bottlenecks has become foundational in subsequent architectures. In LLM security, DeepInception exposes vulnerabilities rooted in model obedience and role-play capacity, motivating a new generation of defense mechanisms and highlighting the criticality of prompt-based attack models.

Key references: (Szegedy et al., 2014, Szegedy et al., 2015, Froughirad et al., 2024, Liu et al., 2023, Li et al., 2023, Huang et al., 19 Jan 2026)
