
BookNet: Reader Analysis & Image Rectification

Updated 5 February 2026
  • BookNet is a dual-framework approach integrating network analysis for reader preference with deep learning for correcting book image deformations.
  • Its reader analysis employs bipartite graphs, Louvain modularity, and PCA to classify genres and enhance recommendation systems.
  • The image rectification framework uses a ResNet-Transformer architecture with cross-page attention, achieving improved geometric correction and OCR accuracy.

BookNet refers to two distinct frameworks in computational research: (1) a network-based methodology for modeling reader preferences and genre structures in large-scale book consumption data, and (2) an end-to-end deep neural architecture for book image rectification that explicitly models the coupled deformations present in camera-captured book spreads. Each framework addresses a challenge intrinsic to its own domain (community detection in user–item graphs and geometric correction of document images, respectively) through principled algorithmic and architectural innovations.

1. BookNet for Reader Preference and Genre Structure Analysis

Network Construction

BookNet, as developed for modeling reader preference, starts with a bipartite user–book graph $G_{ub}$, where each edge $(u,b)$ carries a Goodreads star rating $r_{ub} \in \{1,2,3,4,5\}$. Two projections are constructed:

  • Reader network $G^{(R)}$: Ignores ratings, connecting books $i,j$ with weight $w^{(R)}_{ij}$ equal to the Jaccard index of their reader sets: $w^{(R)}_{ij} = |R_i \cap R_j| / |R_i \cup R_j|$.
  • Enjoyment network $G^{(E)}$: Retains only $(u,b)$ with $r_{ub} \ge 4$. Edge weights $w^{(E)}_{ij}$ between books are also computed via the Jaccard index, now over the sets of users who rated each book 4 or higher.
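The two projections can be sketched in a few lines of Python. This is a minimal illustration on toy data, not the authors' pipeline: the function name `jaccard_projection`, the `min_rating` parameter, and the `(user, book, stars)` triple format are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def jaccard_projection(ratings, min_rating=None):
    """Project (user, book, stars) triples onto a weighted book-book graph.

    min_rating=None ignores stars (reader network); min_rating=4 keeps only
    highly rated edges (enjoyment network). Hypothetical helper for illustration.
    Returns {(book_i, book_j): jaccard_weight} for pairs sharing at least one user.
    """
    readers = defaultdict(set)  # book -> set of users retained for that book
    for user, book, stars in ratings:
        if min_rating is None or stars >= min_rating:
            readers[book].add(user)

    weights = {}
    for i, j in combinations(sorted(readers), 2):
        shared = len(readers[i] & readers[j])
        if shared:  # pairs with no shared users get no edge
            weights[(i, j)] = shared / len(readers[i] | readers[j])
    return weights


# Toy data (invented for the example).
toy = [
    ("u1", "A", 5), ("u1", "B", 4),
    ("u2", "A", 2), ("u2", "B", 5),
    ("u3", "A", 4), ("u3", "C", 3),
]
reader_net = jaccard_projection(toy)
enjoy_net = jaccard_projection(toy, min_rating=4)
```

The all-pairs loop is quadratic in the number of books, which is fine for a toy example; at the scale of thousands of books one would more likely iterate over each user's book list to enumerate only co-read pairs.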

After filtering, the typical node set sizes are $N \approx 10{,}800$ (reader network) and $N \approx 6{,}800$ (enjoyment network), based on a universe of books read by 26,076 users (Sakal et al., 2023).

Community Detection

Louvain modularity maximization, with resolution $\gamma=1$, detects communities in both networks, yielding 6 (reader) and 7 (enjoyment) top-level clusters. The modularity values are $Q^{(R)} \approx 0.302$ and $Q^{(E)} \approx 0.487$. Community centrality is assessed via weighted degree and eigenvector centrality restricted to the community subgraph (Sakal et al., 2023).
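For reference, the objective that Louvain greedily maximizes can be evaluated directly for any given partition. The sketch below computes weighted modularity with a resolution parameter, using the standard form $Q = \sum_c \left[ L_c/m - \gamma \, (d_c/2m)^2 \right]$, where $L_c$ is the edge weight inside community $c$, $d_c$ its total weighted degree, and $m$ the total edge weight. It only scores a partition and does not perform the Louvain search itself; the function name and input format are assumptions for the example.

```python
from collections import defaultdict


def modularity(edges, community, gamma=1.0):
    """Weighted modularity Q of a partition, with resolution gamma.

    edges: {(i, j): weight} for an undirected graph, each pair listed once,
           no self-loops. community: {node: label}. Illustrative helper only.
    """
    strength = defaultdict(float)  # weighted degree of each node
    internal = defaultdict(float)  # edge weight falling inside each community
    total = 0.0                    # m: sum of all edge weights
    for (i, j), w in edges.items():
        strength[i] += w
        strength[j] += w
        total += w
        if community[i] == community[j]:
            internal[community[i]] += w

    comm_strength = defaultdict(float)  # d_c: summed degree per community
    for node, k in strength.items():
        comm_strength[community[node]] += k

    return sum(
        internal[c] / total - gamma * (comm_strength[c] / (2 * total)) ** 2
        for c in comm_strength
    )


# Two triangles joined by a single bridge edge (toy graph).
edges = {(0, 1): 1, (0, 2): 1, (1, 2): 1, (3, 4): 1, (3, 5): 1, (4, 5): 1, (2, 3): 1}
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {n: "a" for n in range(6)}
q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
```

Splitting the toy graph at the bridge gives a positive score, while placing every node in one community gives zero, which is the behavior the reported $Q$ values rely on when comparing the two networks.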

Typical communities in the enjoyment network include "Manga", "Thriller", "Children’s", "Fantasy/Sci-Fi", "Young Adult", "Contemporary/Realistic", and "Modern Classics".

Genre Assignment by Subject Enrichment

Each book is further annotated with Open Library subject tags, after thresholding for prevalence ($M=279$ subjects retained). For each community $C$ and subject $s$, the enrichment ratio $E_{s,C}$ and $z$-score $z_{s,C}$ quantify overrepresentation. Communities are interpreted by their highest-enriched subjects, which correspond to traditional fiction genres (Sakal et al., 2023).
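The text above does not give the exact formulas for $E_{s,C}$ and $z_{s,C}$, so the following sketch adopts one common convention as an explicit assumption: the enrichment ratio is the subject's frequency inside the community divided by its frequency across all books, and the $z$-score compares the observed count to a binomial null. The function name and data layout are likewise hypothetical.

```python
import math


def enrichment(book_subjects, book_community, subject, community):
    """Return (enrichment_ratio, z_score) for one subject/community pair.

    ASSUMED definitions (not confirmed by the source):
      ratio = (n_sC / n_C) / (n_s / N)
      z     = (n_sC - n_C * p_s) / sqrt(n_C * p_s * (1 - p_s)),  p_s = n_s / N
    Undefined when the subject appears in no books or in every book.
    """
    n_total = len(book_subjects)
    members = [b for b in book_subjects if book_community[b] == community]
    n_c = len(members)
    n_s = sum(subject in tags for tags in book_subjects.values())
    n_sc = sum(subject in book_subjects[b] for b in members)

    p_s = n_s / n_total
    ratio = (n_sc / n_c) / p_s
    z = (n_sc - n_c * p_s) / math.sqrt(n_c * p_s * (1 - p_s))
    return ratio, z


# Toy data: 8 books, two communities of 4; "magic" tags 3 books in X, 1 in Y.
book_subjects = {
    "b1": {"magic"}, "b2": {"magic"}, "b3": {"magic"}, "b4": {"crime"},
    "b5": {"magic"}, "b6": {"crime"}, "b7": {"crime"}, "b8": {"crime"},
}
book_community = {b: ("X" if i < 4 else "Y") for i, b in enumerate(sorted(book_subjects))}
ratio_x, z_x = enrichment(book_subjects, book_community, "magic", "X")
```

Under these assumed definitions, a ratio above 1 with a positive $z$-score marks a subject as overrepresented in the community, which is how the highest-enriched subjects would be ranked.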

Principal Component Analysis and Genre Axes

Principal Component Analysis is applied to the $M \times K$ subject–community frequency matrix $X$, producing a covariance matrix $\Sigma = \frac{1}{M} X^T X$. The first two PCs explain approximately 33% and 26% of the variance, corresponding to "maturity" (adult/horror vs. children’s) and "realism" (modern/realistic vs. fantasy/sci-fi). Communities are mapped in this explanatory plane, enabling coarse genre classification and visualization of genre gaps (Sakal et al., 2023).
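The PCA step can be sketched with NumPy by eigendecomposing the $K \times K$ covariance matrix. This sketch assumes $X$ is column-centered before forming $\Sigma$ (the formula above does not show centering explicitly) and uses random placeholder data rather than real subject frequencies; the function name `pca` is illustrative.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA via the covariance matrix Sigma = (1/M) Xc^T Xc.

    X: (M, K) array, rows = subjects, columns = communities.
    Returns (components, explained): components is (K, n_components), one
    loading per community; explained holds the variance fractions.
    Column centering is an assumption of this sketch.
    """
    Xc = X - X.mean(axis=0)                 # center each community column
    cov = Xc.T @ Xc / Xc.shape[0]           # K x K covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    explained = eigvals[order] / eigvals.sum()
    return eigvecs[:, order], explained


# Placeholder data with the shapes named in the text (M=279 subjects, K=7 communities).
rng = np.random.default_rng(0)
X = rng.normal(size=(279, 7))
components, explained = pca(X)
```

Each row of `components` gives one community's coordinates on the first two axes, which is the kind of two-dimensional placement the maturity and realism plane describes.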

Implications and Applications

BookNet’s network communities align with established genre categories, confirming the empirical validity of traditional taxonomy while revealing that enjoyment-based similarity exhibits greater granularity than co-reading. The maturity-realism plane offers a two-dimensional characterization supporting recommender systems and editorial market analysis. The distinction between reader-network and enjoyment-network similarity allows recommendation engines to be tuned toward either sales or customer satisfaction objectives.

2. BookNet for Dual-Page Book Image Rectification

Problem Formulation and Motivation

Camera-captured book images are distorted due to asymmetric page curvature and gutter constraints; two independent single-page rectification models cannot guarantee cross-gutter consistency. BookNet addresses this by inferring page-specific and whole-spread geometric flow fields, enforcing consistency at the gutter while preserving the correction of unique deformations per side (Liu et al., 29 Jan 2026).

Architectural Composition

Encoder

BookNet employs a ResNet-style CNN backbone for local feature extraction, followed by a Transformer encoder (4 layers, 8 heads per layer), producing $F_{enc} \in \mathbb{R}^{H/8 \times W/8 \times 256}$ encoding long-range context (Liu et al., 29 Jan 2026).

Dual-Branch Decoder with Cross-Page Attention

There are two learnable query maps $Q_\ell, Q_r$ (one per page), processed independently at first, then through a cross-page multi-head attention mechanism, e.g.,

$$\tilde{F}_\ell = \mathrm{LN}\left[ F_\ell^{(1)} + \mathrm{MA}(Q_\ell, K_r, V_r) \right]$$

and analogously for $\tilde{F}_r$. This explicitly couples feature learning between left and right pages.
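The cross-page update can be illustrated with a small NumPy sketch of the residual, attention, and layer-norm pattern in the equation above. Several details are assumptions made for the example: queries are obtained by projecting the page's own features, the output projection and learnable layer-norm scale and shift are omitted, and the default of 8 heads is borrowed from the encoder description rather than stated for the decoder.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize each token over its channels (no learnable affine terms)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def multi_head_attention(q, k, v, num_heads):
    """Scaled dot-product attention; q is (n_q, d), k and v are (n_k, d)."""
    n_q, d = q.shape
    d_head = d // num_heads

    def split(x):  # (tokens, d) -> (heads, tokens, d_head)
        return x.reshape(x.shape[0], num_heads, d_head).transpose(1, 0, 2)

    qh, kh, vh = split(q), split(k), split(v)
    scores = qh @ kh.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, n_q, n_k)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    out = weights @ vh                                     # (heads, n_q, d_head)
    return out.transpose(1, 0, 2).reshape(n_q, d)          # merge heads


def cross_page_update(f_self, f_other, w_q, w_k, w_v, num_heads=8):
    """LN[f_self + MA(Q_self, K_other, V_other)] for one page (sketch)."""
    q = f_self @ w_q    # queries from this page (assumed projection)
    k = f_other @ w_k   # keys from the opposite page
    v = f_other @ w_v   # values from the opposite page
    return layer_norm(f_self + multi_head_attention(q, k, v, num_heads))


# Toy shapes: 6 left-page tokens, 5 right-page tokens, 32 channels.
rng = np.random.default_rng(0)
f_left, f_right = rng.normal(size=(6, 32)), rng.normal(size=(5, 32))
w_q, w_k, w_v = (rng.normal(size=(32, 32)) * 0.1 for _ in range(3))
f_left_new = cross_page_update(f_left, f_right, w_q, w_k, w_v)
```

Swapping the roles of `f_left` and `f_right` gives the analogous update for the right page, so each side's features are conditioned on the other's.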

Flow Prediction and Fusion

The model outputs three flows at high resolution: $M_\ell$ (left page), $M_r$ (right page), $M_f$ (full spread). Flows are upsampled by a learnable convex combination with softmaxed weights and used for bilinear sampling of the original image to produce the rectified spread.
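The final sampling step can be sketched as follows. This covers only bilinear sampling, not the learnable convex upsampling, and it assumes the flow stores absolute source coordinates $(x, y)$ in pixels for every output pixel; the source does not state the flow parameterization, so treat that convention and the function name as hypothetical.

```python
import numpy as np


def bilinear_sample(image, coords):
    """Sample `image` at fractional positions.

    image:  (H, W) or (H, W, C) array.
    coords: (H_out, W_out, 2) array of (x, y) source positions in pixels
            (assumed convention). Positions outside the image are clamped.
    """
    h, w = image.shape[:2]
    x = np.clip(coords[..., 0], 0, w - 1)
    y = np.clip(coords[..., 1], 0, h - 1)

    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)  # right/bottom neighbors, clamped at the border
    y1 = np.minimum(y0 + 1, h - 1)

    wx = x - x0                     # fractional offsets used as blend weights
    wy = y - y0
    if image.ndim == 3:             # broadcast weights over the channel axis
        wx, wy = wx[..., None], wy[..., None]

    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


# An identity map reproduces the input; a half-pixel shift blends neighbors.
image = np.arange(12, dtype=float).reshape(3, 4)
ys, xs = np.mgrid[0:3, 0:4]
identity = np.stack([xs, ys], axis=-1).astype(float)
same = bilinear_sample(image, identity)
shifted = bilinear_sample(image, np.array([[[1.5, 0.5]]]))
```

Because the output is a weighted average of four neighbors, the operation is differentiable with respect to the coordinates, which is what allows a flow field to be trained end to end.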

Datasets: Book3D and Book100

  • Book3D: 56,000 synthetic high-resolution dual-page spreads rendered with Blender and Cycles, simulating academic book layouts, realistic page curvature, and lighting; provides pixelwise ground-truth geometric supervision.
  • Book100: 100 real-world smartphone-captured spreads across multiple languages and lighting environments, each paired with a verified reference scan (Liu et al., 29 Jan 2026).

Supervision, Training, and Evaluation

A multi-task $L_1$ loss supervises all three flows jointly:

$$L = \|M_\ell - M_\ell^{gt}\|_1 + \|M_r - M_r^{gt}\|_1 + \|M_f - M_f^{gt}\|_1$$

Hyperparameters include the AdamW optimizer with weight decay $1 \times 10^{-5}$, a OneCycle learning rate schedule, 65 epochs, and batch size 4 per GPU.
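As a worked example of the loss, the sketch below sums the three $L_1$ terms with equal weight, as the formula states. The dictionary keys and the use of a plain sum rather than a per-pixel mean are assumptions for illustration; a training implementation might well normalize by the number of pixels.

```python
import numpy as np


def multitask_l1(pred, target):
    """Sum of L1 distances over the left, right, and full-spread flows.

    pred, target: dicts with keys "left", "right", "full" (hypothetical layout),
    each mapping to an array of identical shape in both dicts.
    """
    return sum(
        np.abs(pred[key] - target[key]).sum()
        for key in ("left", "right", "full")
    )


# Toy flows of shape (2, 2, 2): 8 values each.
pred = {
    "left": np.zeros((2, 2, 2)),
    "right": np.ones((2, 2, 2)),
    "full": np.full((2, 2, 2), 2.0),
}
target = {
    "left": np.ones((2, 2, 2)),        # every value off by 1.0 -> contributes 8
    "right": np.ones((2, 2, 2)),       # exact match             -> contributes 0
    "full": np.full((2, 2, 2), 0.5),   # every value off by 1.5  -> contributes 12
}
loss = multitask_l1(pred, target)
```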

Evaluation uses Multi-Scale SSIM (MSSIM) for perceptual quality, Local Distortion (LD) and Aligned Distortion (AD) for geometric accuracy, and Character Error Rate (CER) and Edit Distance (ED) for OCR quality. On Book100, BookNet achieves MSSIM=0.48, LD=12.42, AD=0.53, CER=0.3452, and ED=948.63, surpassing all listed baselines in geometric and downstream OCR accuracy (Liu et al., 29 Jan 2026).
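The two OCR metrics are closely related: ED is the Levenshtein distance between the recognized text and the reference, and CER is commonly defined as that distance divided by the reference length. A minimal sketch under that standard definition follows (the function names are illustrative, and the evaluation protocol of the paper may differ in details such as text normalization).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]                    # turning ref[:i] into "" costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,          # delete r
                curr[j - 1] + 1,      # insert h
                prev[j - 1] + cost,   # match or substitute
            ))
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character Error Rate: edit distance normalized by reference length."""
    return edit_distance(ref, hyp) / len(ref)


ed_example = edit_distance("kitten", "sitting")
cer_example = cer("book", "bock")
```

Since ED is an absolute count, it grows with page length, whereas CER is normalized, which is why the two are usually reported together.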

Ablations and Component Analysis

Removal of cross-page attention, the transformer encoder, or flow supervision significantly degrades rectification. Cross-page attention, in particular, reduces LD by 5.3%, CER by 34.1%, and ED by 44.1%. Joint supervision across left, right, and spread flows enforces both local and global consistency, improving gutter alignment.

Limitations and Future Work

BookNet may fail with extremely severe warping, heavy occlusion, or complex commercial/magazine layouts. The network comprises 30.1M parameters and runs at 24.4 FPS on an RTX 3090; real-time mobile deployment remains a challenge. Future directions include OCR-in-the-loop feedback, extension to multi-sheet books, diffusion-based generative postprocessing, and model distillation for mobile (Liu et al., 29 Jan 2026).

3. Comparative Summary of Methodologies

| Domain | Core Function | Principal Techniques |
|---|---|---|
| Reader Analysis | Network-based genre discovery | Bipartite projection, Louvain modularity, PCA |
| Image Rectification | Dual-page geometric correction | CNN–Transformer, cross-page attention, flow fusion |

Each framework referred to as "BookNet" illustrates a domain-specific advance: statistical network modeling for interpretability and recommendation in one case, and cross-structural deep learning for dual-page document rectification in the other.

4. Research Impact and Practical Relevance

BookNet (reader network) establishes a quantitative, scalable foundation for genre exploration, classification, and recommendation in book ecosystems, with robust alignment to conventional taxonomy and extensibility via Open Library subjects. The dual-page BookNet image rectifier demonstrates that explicit geometric coupling via cross-attention architectures induces significant improvements in both downstream and low-level metrics, setting new baselines for camera-based digitization in computational document analysis.

5. Future Directions

Open research directions for BookNet in the reader domain include exploiting finer-grained subject ontologies, temporal or sequential models of reader behavior, and dynamic community detection in evolving datasets. For document rectification, enhancing robustness to commercial layouts, integrating OCR supervision, and achieving efficient mobile inference remain active technical challenges. Multi-modal, generative, and human-in-the-loop extensions offer promising trajectories for both strands of BookNet research.


References: (Sakal et al., 2023, Liu et al., 29 Jan 2026)
