CVChess: Deep Learning-Based Image-to-FEN Converter
- CVChess is a deep learning-based system that converts smartphone-captured chessboard images into FEN notation using advanced computer vision techniques.
- The system employs a deterministic pipeline featuring image preprocessing, homography rectification, grid segmentation, and a residual CNN for robust per-square piece classification.
- CVChess demonstrates high in-domain accuracy (98.93% square-level) on the ChessReD dataset, while highlighting challenges and strategies for improving out-of-domain performance.
CVChess is a deep learning–based system for converting smartphone-captured images of physical chessboards into Forsyth–Edwards Notation (FEN), enabling downstream integration with digital chess engines. It addresses the analog–digital divide in chess analysis by reliably transcribing piece placements from real-world photographs, leveraging computer vision techniques, residual convolutional networks, and a robust preprocessing pipeline. The system is trained and validated on the Chess Recognition Dataset (ChessReD), comprising annotated images from diverse capture conditions, and evaluated on challenging out-of-domain test sets (Abeykoon et al., 14 Nov 2025).
1. System Architecture and Image-to-FEN Pipeline
The CVChess framework is organized around a deterministic pipeline transforming an RGB chessboard image into its corresponding FEN string encoding. The principal stages are outlined below:
- Image Capture: The user acquires an RGB photograph of a physical chessboard using any smartphone device.
- Image Preprocessing: The image undergoes grayscale conversion and Gaussian blur (5×5 kernel) to suppress noise, followed by Canny edge detection (thresholds 50/150) and morphological dilation (5×5) to bridge gaps from shadows and occlusion. The board contour is then extracted via cv2.findContours, isolating the largest convex quadrilateral with area >5% of the image. The corners are ordered to enforce the (a8, h8, a1, h1) layout, guaranteeing consistent anchor mapping.
- Homography & Rectification: A 3×3 projective transformation (homography matrix H) is computed using the Direct Linear Transform and SVD. The board is warped to a canonical 400×400 top-down representation.
- Grid Segmentation: The warped image is partitioned into an 8×8 grid, yielding 64 patches (50×50 pixels each), corresponding to squares a8–h1.
- Piece Recognition: A residual CNN classifies each of the 64 squares into one of 13 classes: {P, N, B, R, Q, K} (white), {p, n, b, r, q, k} (black), or ‘.’ (empty).
- FEN Serialization: The 64 predicted labels are serialized rank-by-rank; sequences of empty squares are compacted using FEN digit notation. Only the piece-placement field of the FEN string is produced; castling rights, side-to-move, en-passant, and counters are not inferred.
A table summarizing the pipeline is provided below.
| Stage | Operation | Output |
|---|---|---|
| Image Capture | Smartphone photo | RGB image |
| Preprocessing | Grayscale/blur/edge/dilation | Board polygon |
| Homography | Warp via homography H | 400×400 board image |
| Segmentation | 8×8 grid split | 64 square patches |
| Piece Recognition | Residual CNN classification | 64 class labels |
| FEN Encoding | Serialize per FEN rules | Piece-placement field |
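In code, the pipeline reduces to a short deterministic composition. The sketch below uses hypothetical helper names (rectify_board, classify_board, to_fen_placement); Sections 2, 3, and 6 sketch each piece in turn:

```python
def image_to_fen(img_bgr):
    """End-to-end sketch: smartphone photo -> FEN piece-placement field."""
    board = rectify_board(img_bgr)     # Section 2: detect board, warp to 400x400
    if board is None:
        raise ValueError("board not detected")  # e.g. severe glare or extreme angle
    labels = classify_board(board)     # Section 3: residual CNN -> 8x8 label grid
    return to_fen_placement(labels)    # Section 6: serialize per FEN rules
```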
2. Preprocessing, Board Rectification, and Segmentation
The robustness of CVChess in diverse imaging conditions is contingent on its preprocessing module, which mitigates geometric distortions, lighting variation, and partial occlusions.
- Edge Detection: Grayscale images are blurred and subjected to Canny edge detection, with morphological dilation closing small gaps, notably for shadowed borders or occluded pieces.
- Contour and Homography Extraction: The largest quadrilateral contour corresponds to the chessboard; its vertices are ordered to align the top-left corner with square a8 (always a light square after orientation standardization). The homography H is computed via the Direct Linear Transform, minimizing ‖Ah‖ subject to the scale normalization ‖h‖ = 1 and solved via SVD. This transformation rectifies the image to a canonical orientation and aspect ratio.
- Grid Partitioning: The canonical 400×400 board is divided into equally sized 50×50 squares. This segmentation ensures each patch isolates a physically contiguous square suitable for downstream classification.
This preprocessing pipeline is essential for generalization: ablation experiments reveal that omitting preprocessing severely impairs performance in realistic settings.
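These geometric stages map directly onto standard OpenCV primitives. The sketch below follows the stated parameters (5×5 blur and dilation kernels, Canny thresholds 50/150, >5% area filter, 400×400 canonical size); cv2.getPerspectiveTransform stands in for the DLT/SVD estimate, since the two coincide for four exact correspondences, and the corner-ordering heuristic shown is purely geometric, leaving the a8-versus-h1 orientation standardization to surrounding logic:

```python
import cv2
import numpy as np

def order_corners(pts):
    """Order 4 corner points as top-left, top-right, bottom-left, bottom-right.
    Geometric ordering only; mapping to (a8, h8, a1, h1) additionally requires
    the orientation standardization described above."""
    s = pts.sum(axis=1)               # x + y: min = top-left, max = bottom-right
    d = np.diff(pts, axis=1).ravel()  # y - x: min = top-right, max = bottom-left
    return np.float32([pts[np.argmin(s)], pts[np.argmin(d)],
                       pts[np.argmax(d)], pts[np.argmax(s)]])

def rectify_board(img_bgr, size=400):
    """Detect the board quadrilateral and warp it to a canonical top-down view."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blur, 50, 150)
    edges = cv2.dilate(edges, np.ones((5, 5), np.uint8))  # bridge shadow/occlusion gaps

    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    min_area = 0.05 * img_bgr.shape[0] * img_bgr.shape[1]  # >5% of image area
    for c in sorted(contours, key=cv2.contourArea, reverse=True):
        if cv2.contourArea(c) < min_area:
            break
        approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
        if len(approx) == 4 and cv2.isContourConvex(approx):
            src = order_corners(approx.reshape(4, 2).astype(np.float32))
            dst = np.float32([[0, 0], [size, 0], [0, size], [size, size]])
            H = cv2.getPerspectiveTransform(src, dst)  # exact 4-point homography
            return cv2.warpPerspective(img_bgr, H, (size, size))
    return None  # board detection failure

def split_squares(board, n=8):
    """Partition the canonical board into n*n equal patches, ordered a8..h1."""
    s = board.shape[0] // n  # 50 px for a 400x400 board
    return [board[r*s:(r+1)*s, c*s:(c+1)*s]
            for r in range(n) for c in range(n)]
```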
3. Residual Convolutional Neural Network for Piece Classification
CVChess employs a residual deep convolutional neural network for per-square piece recognition:
- Architecture:
- Input: the full 3×400×400 RGB rectified board image (all 64 squares are classified jointly).
- Stem: Convolution (64 filters, 7×7, stride=2, padding=3) → BatchNorm → ReLU → MaxPool (3×3, stride=2).
- Residual Layers: Three sequential stages with increasing channel dimensionality (64→128, 128→256, 256→512), each comprising sequences of pre-activation residual blocks. Each block implements y = F(x) + x, with shortcut-path projections (1×1 convolutions) as needed when dimensions change.
- Pooling and Classification: Adaptive average pooling to an 8×8 map, reshaped, then classified via a linear layer to 64×13 logits, followed by per-square softmax.
- Regularization: Dropout layers (rate not specified) between residual layers; ReLU activation after each batch normalization; final softmax for cross-entropy loss over 13 classes.
This architecture enables effective retention of low-level features (e.g., board edges, piece contours) while supporting deep hierarchical feature extraction necessary for robust classification under real-world conditions.
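No reference implementation accompanies the paper; the following PyTorch sketch matches the stated stem, channel progression, adaptive pooling, and 64×13 head, while the number of blocks per stage and the per-stage strides are assumptions (dropout, whose rate the paper does not specify, is omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreActBlock(nn.Module):
    """Pre-activation residual block: y = F(x) + x, with BN -> ReLU -> conv ordering."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
        self.shortcut = (nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False)
                         if (stride != 1 or in_ch != out_ch) else nn.Identity())

    def forward(self, x):
        out = self.conv1(F.relu(self.bn1(x)))
        out = self.conv2(F.relu(self.bn2(out)))
        return out + self.shortcut(x)

class BoardNet(nn.Module):
    """Residual CNN mapping a 3x400x400 board image to 64x13 per-square logits."""
    def __init__(self, n_classes=13):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1))
        # two blocks per stage is an assumption; the paper fixes only channel growth
        self.stage1 = nn.Sequential(PreActBlock(64, 128, stride=2), PreActBlock(128, 128))
        self.stage2 = nn.Sequential(PreActBlock(128, 256, stride=2), PreActBlock(256, 256))
        self.stage3 = nn.Sequential(PreActBlock(256, 512, stride=2), PreActBlock(512, 512))
        self.pool = nn.AdaptiveAvgPool2d(8)    # one spatial cell per board square
        self.head = nn.Linear(512, n_classes)  # applied independently to each cell

    def forward(self, x):                      # x: (B, 3, 400, 400)
        x = self.stage3(self.stage2(self.stage1(self.stem(x))))
        x = self.pool(x)                       # (B, 512, 8, 8)
        x = x.flatten(2).transpose(1, 2)       # (B, 64, 512)
        return self.head(x)                    # (B, 64, 13) logits
```

Per-square class predictions then follow from logits.argmax(-1), yielding the 8×8 label grid consumed by the FEN serializer.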
4. Dataset Construction and Training Methodology
The Chess Recognition Dataset (ChessReD) underlies all supervised training and evaluation:
- Composition: 10,800 annotated smartphone images, spanning three device models and varied resolutions, with diverse lighting and camera angles. Each image is meticulously labeled with ground truth FEN and per-square class.
- Partitioning: 6,479 images for training, 2,192 for validation, and 2,129 for testing. An additional out-of-domain set comprises 445 images (from the Kasparov–Topalov 1999 match, 89 positions × 5 angles) serving for generalization assessment.
- Training Procedures: Cross-entropy loss per square; optimizer, learning-rate schedule, batch size, and epoch count are not explicitly stated. Early stopping and model selection are based on validation set accuracy.
- Data Augmentation: Not specified beyond acquisition under varied angles and lighting conditions.
This dataset reflects practical acquisition conditions and emphasizes generalization to real-world scenarios with minimal synthetic augmentation.
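Because the model emits logits for all 64 squares jointly, the per-square cross-entropy objective reduces to folding the square dimension into the batch. A minimal training-step sketch (the Adam optimizer here is an assumption, since the paper does not state its optimizer):

```python
import torch
import torch.nn as nn

model = BoardNet()                                # from the sketch in Section 3
optimizer = torch.optim.Adam(model.parameters())  # assumption: optimizer unspecified
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    """images: (B, 3, 400, 400); labels: (B, 64) class indices in 0..12."""
    logits = model(images)                        # (B, 64, 13)
    loss = criterion(logits.reshape(-1, 13), labels.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```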
5. Performance Metrics and Evaluation
The system’s performance is quantified at both the square and board (FEN) level across in-domain and out-of-domain evaluations.
- ChessReD Test Set (in-domain, 2,129 images):
- Overall square-level accuracy: 98.93% (across all 13 classes).
- Non-empty square accuracy: 97.11% (excluding empty squares).
- Perfect FEN transcriptions (no misclassifications): 1,145 out of the 1,790 boards where detection succeeded (63.96%).
- Error analysis: Principal confusion between visually similar pieces—pawns versus bishops (~4%) and queens versus kings (~5%).
- Kasparov–Topalov Out-of-Domain Set (445 images):
- Square-level accuracy: 65.17% (overall), 54.06% (non-empty).
- Perfect FEN boards: 133 (29.88%).
- Ablation results: Performance collapses without preprocessing; increases in training set size and improvements to corner detection each yield similar accuracy gains.
6. FEN Construction and Output Specification
The output of CVChess is the FEN piece-placement string, denoting the arrangement of pieces:
- Serialization rules:
- Ranks are serialized in order 8→1, files a→h; sequential empty squares are collapsed into digits; ranks are separated by ‘/’.
- Example: “rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR”
- Algorithm:
```python
def to_fen_placement(pred):
    """Serialize an 8x8 grid of labels into the FEN piece-placement field.
    pred[0] is rank 8 (squares a8..h8), pred[7] is rank 1; '.' marks empty."""
    ranks = []
    for row in pred:                   # ranks 8 down to 1
        fen_rank, empty_count = "", 0
        for label in row:              # files a through h
            if label == '.':
                empty_count += 1
            else:
                if empty_count > 0:    # compact the run of empty squares
                    fen_rank += str(empty_count)
                    empty_count = 0
                fen_rank += label      # e.g. 'P', 'n', etc.
        if empty_count > 0:
            fen_rank += str(empty_count)
        ranks.append(fen_rank)
    return "/".join(ranks)
```
- Caveats: Only the piece-placement field is generated; side-to-move, castling, en-passant targets, and move counters are excluded. If detection yields missing squares, the board is rejected.
7. Limitations and Prospective Enhancements
CVChess demonstrates robust in-domain performance, but several limitations are reported:
- Detection Failures: Approximately 16% of images exhibit board detection failure under severe glare or extreme camera angles.
- Misclassification Regimes: Small pieces (especially pawns) are susceptible to occlusion-related misclassification. Domain shift (due to novel boards, pieces, or cameras) reduces accuracy by ~35 percentage points on new data.
- Enhancement Strategies: Suggested improvements involve:
- Replacing contour-based board detection with learned keypoint detectors (e.g., CNN-based), or more sophisticated line clustering.
- Expanding the training distribution with synthetically warped images and adversarial lighting to mitigate domain shift.
- Augmenting the model with attention modules or retraining residual blocks to better distinguish visually similar pieces.
- Integrating end-to-end backbones (e.g., U-Net) for simultaneous localization and piece recognition.
- Exploring context-based inference to extract castling/en-passant status from move sequence (PGN) analysis.
Overall, CVChess establishes a comprehensive and end-to-end approach for conversion from RGB chessboard imagery to digital FEN representations, setting a technical baseline for subsequent research in physical–digital chess state recognition (Abeykoon et al., 14 Nov 2025).