
FVD Authorship Verification in Handwriting

Updated 26 January 2026
  • The paper demonstrates that FVD authorship verification quantitatively distinguishes document authors via computed feature vector differences with near-perfect accuracies on several datasets.
  • It employs intrinsic feature extraction, including text-line area heights, word spacing, and character size via CNN-based embeddings, to construct detailed statistical descriptors.
  • The methodology shows practical scalability and reliability in forensic analysis, while suggesting enhancements through deep learning for richer dynamic ink-trace properties.

Feature Vector Difference (FVD) Authorship Verification is a quantitative approach to determining document authorship by mathematically comparing feature representations derived from document content. This methodology constructs intrinsic feature vectors comprising statistical descriptors of manuscript properties and performs authorship discrimination based on their vectorial differences, notably using Euclidean metrics. FVD approaches are particularly influential in forensic document analysis and computational stylometry due to their objectivity, reproducibility, and compatibility with both conventional and digital writing media, as well as their scalability to large corpora and pairwise comparison frameworks.

1. Intrinsic Feature Extraction Methodologies

FVD-based verification frameworks extract document-intrinsic features reflecting handwriting geometry and spacing statistics. Specifically, the method outlined in "Innovative Methods for Non-Destructive Inspection of Handwritten Documents" (Breci et al., 2023) utilizes:

  • Text-Line Area Heights: Images are binarized to segment ink traces. Horizontal histograms of foreground pixels ($H_\text{row}[r]$) enable segmentation into upper, middle, and lower text-line zones per detected line. Noise suppression and peak detection delimit these areas, yielding $\{h_\text{upper}^{(\ell)}, h_\text{middle}^{(\ell)}, h_\text{lower}^{(\ell)}\}$ for all lines $\ell$.
  • Word Spacing: Vertical histograms ($H_c[c]$) over cropped text lines identify sequences of consecutive zeros (word gaps) exceeding a threshold $S_o$. Recorded gap widths $\{s_i\}$ quantify inter-word spacings.
  • Character Size Extraction: Experts supply a character template $T$. A Siamese CNN trained on the A–Z set computes $128$-dimensional embeddings ($\phi(x)$) for character images. Sliding window operations and the Euclidean embedding distance $d = \|\phi(T) - \phi(p)\|_2$ permit robust template matching for $d < T_c$, followed by smoothing and character size recording ($\{h_\text{char}, w_\text{char}\}$).
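The word-spacing step can be illustrated with a minimal sketch. It assumes a binarized, cropped text line stored as a NumPy array with 1 for ink and 0 for background; the function name `word_gap_widths` and the parameter `min_gap` (standing in for the gap threshold) are hypothetical, and dropping blank runs that touch the image border is an added assumption, not something the source specifies.

```python
import numpy as np

def word_gap_widths(line_img, min_gap):
    """Return widths of blank column runs wider than min_gap (illustrative sketch).

    line_img: 2-D binary array for one cropped text line (1 = ink, 0 = background).
    min_gap:  hypothetical stand-in for the word-gap threshold, in pixels.
    """
    line_img = np.asarray(line_img)
    col_hist = line_img.sum(axis=0)              # vertical histogram: ink pixels per column
    is_blank = col_hist == 0
    # Pad with False so every blank run produces a start and an end transition.
    padded = np.concatenate(([False], is_blank, [False])).astype(int)
    changes = np.diff(padded)
    starts = np.flatnonzero(changes == 1)        # first blank column of each run
    ends = np.flatnonzero(changes == -1)         # first ink column after each run
    # Assumption: runs touching the left or right border are margins, not word gaps.
    interior = (starts > 0) & (ends < line_img.shape[1])
    widths = (ends - starts)[interior]
    return widths[widths > min_gap].tolist()
```

The returned list plays the role of the recorded gap widths for one line; gaps from all lines of a document would be pooled before computing summary statistics.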

Document-level features comprise the mean ($\eta_x$) and standard deviation ($\sigma_x$) of each measure set, capturing global geometrical properties.

2. Feature Vector Construction and Quantification

Each document $d$ is encoded as a $2K$-dimensional feature vector, where $K$ is the number of distinct measure types extracted. For the six measures used here (upper, middle, and lower zone heights, word spacing, character height, and character width), $K=6$ and the vector has 12 components:

$$\mathbf{f}^{(d)} = [\,\eta_{h_u},\,\sigma_{h_u},\,\eta_{h_m},\,\sigma_{h_m},\,\eta_{h_\ell},\,\sigma_{h_\ell},\,\eta_s,\,\sigma_s,\,\eta_{h_{\mathrm{char}}},\,\sigma_{h_{\mathrm{char}}},\,\eta_{w_{\mathrm{char}}},\,\sigma_{w_{\mathrm{char}}}\,] \in \mathbb{R}^{2K}$$

These aggregate statistics are computed as:

$$\eta_x = \frac{1}{N} \sum_{n=1}^N x_n \quad ; \quad \sigma_x = \sqrt{ \frac{1}{N} \sum_{n=1}^N (x_n - \eta_x)^2 }$$

This construction provides a compact but information-rich representation of the document's handwriting characteristics.
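As a rough illustration of this construction, the sketch below interleaves the mean and the population standard deviation (the $1/N$ form) of each measure set. The function name `build_feature_vector` and the input layout (one sequence of raw measurements per measure type, in a fixed order) are assumptions made for the example.

```python
import numpy as np

def build_feature_vector(measures):
    """Build [mean_1, std_1, mean_2, std_2, ...] from K measurement sets (sketch).

    measures: list of K non-empty 1-D sequences, one per measure type, in a fixed order.
    Returns a vector of length 2K.
    """
    feats = []
    for x in measures:
        x = np.asarray(x, dtype=float)
        feats.append(x.mean())
        feats.append(x.std())   # NumPy default ddof=0, i.e. the 1/N population form
    return np.array(feats)
```

Keeping the order of measure types identical across documents matters, since the later distance computation compares vectors component by component.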

3. Feature Vector Difference Computation and Decision Protocols

Authorship discrimination is based on the Euclidean distance between feature vectors associated with different documents. The FVD metric is defined:

$$D(\mathbf{f}_i, \mathbf{f}_j) = \sqrt{ \sum_{k=1}^{2K} (f_{i,k} - f_{j,k})^2 }$$

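Pairwise FVD quantifies similarity: the smaller the distance, the more alike the two handwriting profiles. A decision threshold $\tau$, set via cross-validation, classifies two documents as "same author" if $D < \tau$; alternatively, the distances support nearest-neighbor attribution against a database of known writers.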
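A minimal sketch of the distance and the threshold rule follows. The function names `fvd` and `same_author` are hypothetical, and the threshold value passed in is assumed to have been chosen elsewhere (for example by cross-validation, as described above).

```python
import numpy as np

def fvd(f_i, f_j):
    """Euclidean distance between two document feature vectors of equal length."""
    f_i = np.asarray(f_i, dtype=float)
    f_j = np.asarray(f_j, dtype=float)
    return float(np.linalg.norm(f_i - f_j))

def same_author(f_i, f_j, tau):
    """Apply the decision rule D < tau; tau is assumed to be tuned beforehand."""
    return fvd(f_i, f_j) < tau
```

Because the features have different units and scales (pixel heights, gap widths), an implementation might standardize each component before computing the distance; the source does not say whether this is done, so it is left out here.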

Standard workflow:

  1. Binarize input images, segment lines, and extract geometric/spacing features.
  2. Build document feature vectors.
  3. Compute the FVD for document pairs.
  4. Evaluate authorship by thresholding or 1-NN assignment.
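The final step of the workflow, in its 1-NN form, might look like the sketch below. It assumes a gallery of feature vectors with known authors; `nearest_author`, `gallery`, and `authors` are illustrative names rather than anything defined in the source.

```python
import numpy as np

def nearest_author(query, gallery, authors):
    """Attribute a query document to the author of its closest gallery vector (sketch).

    query:   feature vector of length 2K.
    gallery: array of shape (n_docs, 2K) holding reference feature vectors.
    authors: sequence of n_docs author labels aligned with the gallery rows.
    Returns (author_label, distance_to_nearest).
    """
    gallery = np.asarray(gallery, dtype=float)
    query = np.asarray(query, dtype=float)
    dists = np.linalg.norm(gallery - query, axis=1)   # FVD to every gallery document
    best = int(np.argmin(dists))
    return authors[best], float(dists[best])
```

Returning the distance alongside the label lets a caller combine both decision modes, for instance rejecting an attribution when even the nearest neighbor lies beyond the threshold.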

4. Empirical Protocols, Datasets, and Performance Metrics

Evaluation uses several handwriting datasets: CVL (1600 documents, 311 writers), CSAFE (2430 samples, 90 writers), and a new pen-paper vs tablet corpus (362 documents, 124 authors), spanning traditional and digital media (Breci et al., 2023). Pairwise document comparisons generate accuracy statistics for both binary thresholding and nearest-neighbor identification tasks.

Reported FVD performance:

| Dataset | Accuracy | Competing Baselines |
|---|---|---|
| CVL (He & Schomaker ’20) | 99.8% | FragNet, Bagged-VLAD, SEG-WI |
| CSAFE | 100% | - |
| CVL+CSAFE | 99.9% | Crawford et al. ’23 (93.3%) |
| Pen-paper vs Tablet | 96% | - |

The high reported accuracies indicate FVD's capacity for objective cross-media authorship discrimination and, on these datasets, performance above the listed statistical and CNN-based baselines.

5. Relationship to Diff-Vector Approaches in Stylometry

The FVD methodology is conceptually related to the "Diff-Vector" (DV) paradigm for authorship tasks, as articulated in "Same or Different? Diff-Vectors for Authorship Analysis" (Corbara et al., 2023). DVs represent unordered document pairs not with their absolute geometry, but via the absolute difference of frequency-based feature vectors (typically tf–idf, function words, character n-grams):

$$\Delta(d_1, d_2) = |x_1 - x_2|$$

DV solvers train on $O(n^2)$ pairwise differences, enhancing binary "Same vs. Different" discrimination. This approach excels under data scarcity, with the LazyAA and StackedAA methods further extending it to closed-set authorship attribution and verification.
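To make the pairwise construction concrete, the sketch below builds the element-wise absolute difference for every unordered pair of documents and labels each pair as same-author (1) or different-author (0). The function name `diff_vector_pairs` and the input layout are assumptions for illustration; this is not the reference implementation from Corbara et al.

```python
from itertools import combinations

import numpy as np

def diff_vector_pairs(X, authors):
    """Build diff-vectors |x_i - x_j| for all unordered document pairs (sketch).

    X:       array of shape (n_docs, n_features) with per-document feature vectors.
    authors: sequence of n_docs author labels aligned with the rows of X.
    Returns (diffs, labels): diffs has n*(n-1)/2 rows; labels are 1 for same author.
    """
    X = np.asarray(X, dtype=float)
    diffs, labels = [], []
    for i, j in combinations(range(len(X)), 2):
        diffs.append(np.abs(X[i] - X[j]))
        labels.append(int(authors[i] == authors[j]))
    return np.array(diffs), np.array(labels)
```

With $n$ documents this yields $n(n-1)/2$ training examples, which is where the quadratic growth in training data comes from.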

Both FVD and DV encode authorship-relevant distinctions as explicit vectorial differences, but FVD is domain-specific (handwriting geometry/statistics), whereas DV is text-content based (stylometry). The underlying logic of exploiting pairwise vector comparisons for robust classification is common to both.

6. Limitations and Prospective Directions

Current FVD implementations focus exclusively on geometric statistics, without leveraging dynamic ink-trace properties such as stroke orientation, slant, speed, or pressure (Breci et al., 2023). The system is sensitive to binarization and segmentation errors, which makes deep-learning-based layout analysis a natural augmentation. The use of a single template constrains the modeling of character variability; extension to multiple templates or end-to-end learning over the full character set is suggested to boost discrimination. Metric learning may replace the fixed Euclidean FVD to further enhance separability.

Active research directions therefore include cross-media dataset expansion, integrated feature extraction using CNNs, and algorithmic innovation in metric and representation learning.

7. Practical Implementation Considerations

Implementation requires careful feature extraction, vector assembly, and threshold optimization. Intrinsic feature statistics must be robust against noise and segmentation errors. For large-scale document databases, computational cost depends on pairwise comparison volume and feature vector dimensionality.

Recommended practices in related DV literature (Corbara et al., 2023) include balancing "Same" and "Different" pairs, capping the total number of vector examples, and using $\ell_2$-regularized logistic regression for classifier training. Standardization and $\chi^2$-based feature selection optimize dense and sparse textual features, respectively. Parallelization of pairwise computation and matrix filling is advised for tractability.
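The balancing and capping practices can be sketched as a subsampling step over pair labels. The function `balanced_pair_indices`, its `max_total` cap, and the random-subsampling strategy are assumptions chosen for illustration; the cited work may select pairs differently.

```python
import numpy as np

def balanced_pair_indices(labels, max_total=None, seed=0):
    """Pick equal numbers of same (1) and different (0) pairs, optionally capped (sketch).

    labels:    sequence of 0/1 pair labels.
    max_total: hypothetical cap on the total number of pairs kept (None = no cap).
    Returns sorted indices into labels.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    same = np.flatnonzero(labels == 1)
    diff = np.flatnonzero(labels == 0)
    per_class = min(len(same), len(diff))        # the rarer class limits the balance
    if max_total is not None:
        per_class = min(per_class, max_total // 2)
    keep = np.concatenate([
        rng.choice(same, size=per_class, replace=False),
        rng.choice(diff, size=per_class, replace=False),
    ])
    return np.sort(keep)
```

Since "Different" pairs typically far outnumber "Same" pairs when every document is compared with every other, subsampling of this kind keeps the classifier from being dominated by the majority class.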

This suggests that for large handwriting corpora, scalable data-handling and automated feature selection pipelines are essential to sustain the rigor and reproducibility of FVD-based authorship verification systems.
