Block Induced Signature GAN (BISGAN)
- The paper introduces BISGAN, a generative framework with block-induced, attention-enhanced architecture that achieves spoofing success rates up to 100% against top CNN verifiers.
- The generator employs a CycleGAN-derived encoder–decoder with residual blocks augmented by mini-Inception modules and spatial self-attention for effective multi-scale feature extraction.
- Empirical evaluations on biometric datasets show BISGAN outperforms prior GAN and diffusion-based approaches, with spoofing rates averaging above 95% and robust statistical validation via the Generated Quality Metric (GQM).
Block Induced Signature Generative Adversarial Network (BISGAN) is a generative model framework designed for synthesizing high-quality forged handwritten signatures with the explicit goal of spoofing state-of-the-art signature verification systems. BISGAN extends the CycleGAN paradigm by introducing multi-scale Inception-inspired modules and spatial self-attention in its generators, and pairs this with a specialized discriminator derived from the SigCNN architecture, enhanced with Spatial Pyramid Pooling. The evaluation of generated signature quality employs a custom metric, the Generated Quality Metric (GQM), incorporating Mahalanobis and Cook’s distances to locate forgeries within the manifold of genuine signatures. BISGAN demonstrates spoofing success rates from 88% to 100% against a variety of deep learning baselines and outperforms prior GAN and diffusion-based approaches, as empirically validated on standard biometric datasets (Amjad et al., 2024).
1. Generator Architecture
BISGAN’s generator is a CycleGAN-derived model equipped with “block induced” multi-scale connectivity and attention. Two generators, $G$ (genuine to forgery) and $F$ (forgery to genuine), implement the following backbone:
- Encoder–transformer–decoder structure: The input image passes through an initial 7×7 convolution with instance normalization and ReLU, followed by two down-sampling 3×3 conv layers (stride 2), compressing spatial dimensions by a factor of four.
- Residual (ResNet-style) transformer stage: A stack of residual blocks, each block being augmented by an internal Inception module followed by self-attention.
- Decoder: Up-sampling employs either fractional-stride convolution or nearest-neighbor interpolation, culminating in a final 7×7 convolution followed by a Tanh activation.
Inception-style block (block induced):
All convolutional steps (including those inside residuals) substitute the standard 3×3 conv for a mini-Inception design composed of:
- Four branches: a 1×1 conv, a 3×3 conv, a 5×5 conv, and a 3×3 max-pooling followed by a 1×1 conv,
- Each branch produces its own set of feature maps, which are concatenated across the channel dimension,
- Enabling joint modeling of fine and coarse spatial details inherent in handwritten signatures.
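The channel bookkeeping of such a four-branch block can be sketched as follows (a minimal NumPy illustration; the function name, branch widths, and the random-projection stand-in for padded convolutions are assumptions, not the paper’s implementation):

```python
import numpy as np

def mini_inception_shapes(x, branch_channels=(16, 16, 16, 16)):
    """Illustrate the channel bookkeeping of a four-branch mini-Inception block.

    Each branch (1x1 conv, 3x3 conv, 5x5 conv, 3x3 max-pool + 1x1 conv)
    preserves spatial size via padding and emits its own channel count;
    outputs are concatenated along the channel axis. Convolutions are
    stubbed with random channel projections purely to show the shapes.
    """
    n, c, h, w = x.shape
    branches = []
    for c_out in branch_channels:
        # Stand-in for a padded convolution: any op mapping (n,c,h,w) -> (n,c_out,h,w)
        proj = np.random.randn(c_out, c)
        branches.append(np.einsum("oc,nchw->nohw", proj, x))
    return np.concatenate(branches, axis=1)  # channel counts add up across branches

x = np.random.randn(1, 8, 32, 32)
y = mini_inception_shapes(x)
print(y.shape)  # (1, 64, 32, 32)
```

Because each branch keeps the spatial grid intact, the concatenated output can drop into a residual block wherever a single 3×3 conv would otherwise sit.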
Self-attention:
Within each residual block, after the Inception module, a scaled dot-product self-attention mechanism is applied:
- For a feature map $x$, projections $Q = W_Q x$, $K = W_K x$, and $V = W_V x$ are computed,
- The attention map $A = \mathrm{softmax}\!\left(QK^{\top}/\sqrt{d_k}\right)$,
- The attended output $AV$ is projected and additively combined with the block’s input.
This joint Inception–attention design allows the generator to model subtle and global characteristics of signatures, facilitating forgeries that are challenging to distinguish from genuine signatures both visually and statistically.
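The scaled dot-product attention step over spatial positions can be sketched as below (a minimal NumPy illustration; the function name, projection shapes, and single-feature-map treatment are assumptions for clarity):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def spatial_self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over the H*W spatial positions
    of a single feature map x of shape (C, H, W); wq/wk/wv are (d, C)."""
    c, h, w = x.shape
    feats = x.reshape(c, h * w).T                       # (HW, C): one token per position
    q, k, v = feats @ wq.T, feats @ wk.T, feats @ wv.T  # (HW, d) each
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (HW, HW) attention map
    out = attn @ v                                      # (HW, d) attended features
    return out.T.reshape(-1, h, w), attn

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
wq = wk = wv = rng.standard_normal((8, 8))
y, attn = spatial_self_attention(x, wq, wk, wv)
print(y.shape, attn.shape)  # (8, 4, 4) (16, 16)
```

In the residual block, the attended output would additionally be projected back to the input channel count and added to the block’s input; that final projection is omitted here for brevity.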
2. Discriminator Design
The BISGAN discriminator is an adaptation of the SigCNN model, incorporating multi-scale filtering and flexible spatial aggregation:
- Inception blocks at all stages: Every convolutional stage is followed by an Inception module (1×1, 3×3, 5×5, max-pool branches) and further downstream by strided max-pooling.
- Layer sequence: Three convolutional blocks (3×3 with 64, 128, then 256 filters, each stride 2 and LeakyReLU), with Inception modules and pooling between each.
- Spatial Pyramid Pooling (SPP): Replaces global pooling, enabling the model to process input signatures of arbitrary size and to better capture hierarchical spatial structure via aggregation over progressively finer grids (e.g., 1×1, 2×2, and 4×4 bins).
- Final classifier: Flattened SPP output traverses two parallel fully connected (FC) layers (512 units each), concatenated and projected to a scalar sigmoid output.
Relative to the standard PatchGAN, this structure leverages multi-scale feature extraction and spatial aggregation to robustly model structural and textural properties critical to signature verification.
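SPP’s fixed-length property can be illustrated with the following sketch (assuming 1×1, 2×2, and 4×4 max-pooling levels, a common SPP configuration; the paper’s exact levels may differ):

```python
import numpy as np

def spatial_pyramid_pool(x, levels=(1, 2, 4)):
    """Max-pool a feature map x of shape (C, H, W) into fixed grids of
    1x1, 2x2, and 4x4 bins and concatenate the results, yielding a
    fixed-length vector regardless of the input's spatial size."""
    c, h, w = x.shape
    pooled = []
    for n in levels:
        # Bin edges chosen so an n x n grid of bins covers the whole map
        hs = np.linspace(0, h, n + 1).astype(int)
        ws = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                pooled.append(x[:, hs[i]:hs[i + 1], ws[j]:ws[j + 1]].max(axis=(1, 2)))
    return np.concatenate(pooled)  # length = C * sum(n*n for n in levels)

# Two different input sizes produce identically sized descriptors
vec_a = spatial_pyramid_pool(np.random.randn(32, 16, 16))
vec_b = spatial_pyramid_pool(np.random.randn(32, 24, 40))
print(vec_a.shape, vec_b.shape)  # (672,) (672,) since 32 * (1 + 4 + 16) = 672
```

This fixed-length output is what lets the downstream fully connected layers accept signatures of arbitrary resolution.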
3. Training Objectives and Optimization
BISGAN’s learning procedure follows the CycleGAN philosophy, with dedicated adversarial and cycle-consistency losses:
- Adversarial loss:
  $\mathcal{L}_{\mathrm{GAN}}(G, D_B) = \mathbb{E}_{b}\left[\log D_B(b)\right] + \mathbb{E}_{a}\left[\log\left(1 - D_B(G(a))\right)\right]$
  and, symmetrically, $\mathcal{L}_{\mathrm{GAN}}(F, D_A)$ for $F$ and $D_A$.
- Cycle-consistency loss:
  $\mathcal{L}_{\mathrm{cyc}} = \mathbb{E}_{a}\left[\lVert F(G(a)) - a\rVert_1\right] + \mathbb{E}_{b}\left[\lVert G(F(b)) - b\rVert_1\right]$
- Optional identity loss: $\mathcal{L}_{\mathrm{id}} = \mathbb{E}_{b}\left[\lVert G(b) - b\rVert_1\right] + \mathbb{E}_{a}\left[\lVert F(a) - a\rVert_1\right]$ (included in CycleGAN; not explicitly evaluated in (Amjad et al., 2024))
- Total generator objective:
  $\mathcal{L} = \mathcal{L}_{\mathrm{GAN}}(G, D_B) + \mathcal{L}_{\mathrm{GAN}}(F, D_A) + \lambda_{\mathrm{cyc}}\mathcal{L}_{\mathrm{cyc}} + \lambda_{\mathrm{id}}\mathcal{L}_{\mathrm{id}}$
- Optimization settings: Adam optimizer ($\beta_1 = 0.5$, $\beta_2 = 0.999$) with learning rate $2 \times 10^{-4}$ (the CycleGAN defaults), linearly annealed after 100 out of 200 epochs, batch size 1.
$\lambda_{\mathrm{cyc}}$ and, when used, $\lambda_{\mathrm{id}}$ control the relative weighting of the consistency and identity losses.
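The loss terms can be sketched numerically as follows (a NumPy illustration; the log-form adversarial loss and the weights $\lambda_{\mathrm{cyc}} = 10$, $\lambda_{\mathrm{id}} = 5$ follow common CycleGAN defaults and are assumptions here, not values confirmed by the paper):

```python
import numpy as np

def adversarial_loss(d_real, d_fake):
    """Log-form GAN value the discriminator maximizes:
    E[log D(real)] + E[log(1 - D(fake))], with sigmoid outputs in (0, 1)."""
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

def cycle_loss(a, a_rec, b, b_rec):
    """L1 cycle-consistency: ||F(G(a)) - a||_1 + ||G(F(b)) - b||_1."""
    return np.mean(np.abs(a_rec - a)) + np.mean(np.abs(b_rec - b))

def identity_loss(a, f_a, b, g_b):
    """Optional L1 identity terms: ||F(a) - a||_1 + ||G(b) - b||_1."""
    return np.mean(np.abs(f_a - a)) + np.mean(np.abs(g_b - b))

def total_generator_objective(adv_ab, adv_ba, cyc, idt, lam_cyc=10.0, lam_id=5.0):
    """Weighted sum of both adversarial terms, cycle, and identity losses."""
    return adv_ab + adv_ba + lam_cyc * cyc + lam_id * idt

# Toy check: a perfect cycle reconstruction drives the cycle term to zero
a = np.random.rand(4, 8)
print(cycle_loss(a, a, a, a))  # 0.0
```

In training, the discriminators ascend `adversarial_loss` while the generators descend `total_generator_objective`, exactly as in the CycleGAN two-player setup.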
4. Quality Evaluation: Generated Quality Metric (GQM)
To rigorously quantify how well generated signatures approximate true forgeries or genuine samples, BISGAN introduces the GQM. The procedure is as follows:
- Influential point extraction via Mahalanobis distance:
For a dataset $X$ with mean $\mu$ and covariance $\Sigma$, compute $D_M(x) = \sqrt{(x - \mu)^{\top} \Sigma^{-1} (x - \mu)}$. Select the top-$k$ most distant signatures in both the genuine and forged domains as “influential points.”
- Cook’s distance:
For a candidate generated signature $g$, fit a linear model on the influential points. Compute Cook’s distance for $g$, quantifying the perturbation that including $g$ would induce in the linear fit.
- Scoring:
Let $d_{\mathrm{gen}}$ and $d_{\mathrm{forg}}$ be the Mahalanobis distances of $g$ to the influential genuine and forged sets, normalized so that $d_{\mathrm{gen}} + d_{\mathrm{forg}} = 1$. A signature is graded “O” (good forgery) if $d_{\mathrm{gen}} < d_{\mathrm{forg}}$, otherwise “F.”
This approach provides a sample- and cohort-sensitive evaluation, ranking a generated signature’s relative authenticity in terms of statistical influence and distributional proximity.
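A simplified, hypothetical sketch of GQM-style scoring is given below (the Cook’s-distance influence analysis is replaced here by plain top-$k$ Mahalanobis selection, so this approximates the procedure rather than reproducing the paper’s exact algorithm):

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """sqrt((x - mu)^T Sigma^{-1} (x - mu))."""
    d = x - mean
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

def gqm_grade(g, genuine, forged, k=5):
    """Hypothetical GQM-style scoring sketch.

    For each cohort, keep the k most Mahalanobis-distant points as
    'influential points', measure the candidate's distance to their
    centroid, and normalize the two distances to sum to one. Grade 'O'
    if the candidate lies closer to the genuine cohort than the forged one.
    """
    def cohort_distance(cohort):
        mu, cov = cohort.mean(axis=0), np.cov(cohort, rowvar=False)
        d = np.array([mahalanobis(x, mu, cov) for x in cohort])
        influential = cohort[np.argsort(d)[-k:]]  # k most distant points
        return mahalanobis(g, influential.mean(axis=0), cov)

    d_gen, d_forg = cohort_distance(genuine), cohort_distance(forged)
    total = d_gen + d_forg
    d_gen, d_forg = d_gen / total, d_forg / total  # normalized scores sum to 1
    return d_gen, d_forg, ("O" if d_gen < d_forg else "F")

# Synthetic demo: well-separated genuine and forged feature clusters
rng = np.random.default_rng(1)
genuine = rng.normal(0, 1, (50, 2))
forged = rng.normal(10, 1, (50, 2))
print(gqm_grade(genuine.mean(axis=0), genuine, forged)[2])  # "O": near the genuine cohort
```

The normalization mirrors the GQM result tables, where the genuine and forged scores of each model sum to one.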
5. Empirical Performance and Comparative Evaluation
BISGAN’s efficacy is established through experiments targeting four CNN-based signature verifiers—VGG-16, AlexNet, CapsNet, SigNet-F—across standard biometric signature datasets, notably CEDAR, SVC2021 EvalDB, and DeepSignDB.
Spoofing Success Rates:
| Model | VGG-16 | AlexNet | CapsNet | SigNet-F |
|---|---|---|---|---|
| CycleGAN | 35.0% | 38.8% | 48.8% | 36.3% |
| Stroke-cCycleGAN | 68.8% | 62.5% | 76.3% | 58.8% |
| BISGAN | 96.3% | 88.8% | 97.5% | 96.3% |
| BISGAN (paradigm) | 97.5% | 91.3% | 100.0% | 98.8% |
BISGAN yields 88.8%–98.8% acceptance of forged signatures as genuine (standard training), and up to 100% in the “paradigm shift” setting (training on forged→genuine).
Comparison with Baselines:
DCGAN, MaskGIT, OSVGAN, RSAEG, and a recent diffusion-model attack are surpassed by BISGAN, with mean spoofing success rates for BISGAN averaging above 95%, vs. 40–75% for alternatives.
GQM Results (CEDAR dataset, representative sample):
| Model | d_genuine | d_forged | Grade |
|---|---|---|---|
| CycleGAN | 0.59 | 0.41 | F |
| Stroke-cCycleGAN | 0.37 | 0.63 | O |
| BISGAN | 0.21 | 0.79 | O |
| BISGAN (paradigm) | 0.12 | 0.88 | O |
BISGAN forgeries are consistently graded “O,” reflecting proximity to genuine signature manifolds by GQM assessment.
6. Contextual Significance and Research Implications
BISGAN introduces architectural and evaluation advances for biometric forgery synthesis:
- Generator-centric improvements: By equipping the generator with multi-scale, attention-rich blocks, BISGAN prioritizes the quality and diversity of forgeries, addressing prior GANs’ focus on discriminator strength and patch-based realism.
- Evaluation methodology: The GQM provides a nuanced and distribution-aware quantification of forgery realism, moving beyond classifier-based accuracy.
- Empirical generalization: High spoofing rates across several verification architectures and datasets suggest the approach is robust to both data and verifier specifics.
This suggests that the block-induced and attention-based generator design is crucial for advancing the fidelity of biometric forgery models. A plausible implication is that similar techniques may generalize to other biometric modalities or adversarial attack contexts. The paradigm shift in training, whereby the generator is driven towards genuine reconstruction from forgeries, further amplifies the potential for generating undetectable spoofs.
7. Limitations and Future Directions
While BISGAN’s architecture and evaluation demonstrate clear empirical advantages, there are practical considerations:
- Identity loss was not extensively explored in (Amjad et al., 2024), so its contribution to forgery realism remains an open question.
- GQM, though effective for pairwise data evaluation, may require adaptation for multi-writer datasets or in adversarial settings where signature distribution shifts.
- A plausible implication is the necessity for developing complementary forensics tools and model regularization, especially as such generative models mature.
Further research may focus on extending block-induced generator motifs to unsupervised or few-shot signature generation, scaling BISGAN to different biometric tasks, or developing more adversarially robust verification systems responsive to such high-fidelity forgeries.