LithoSeg: Advanced Segmentation Framework

Updated 22 November 2025
  • LithoSeg is a multi-domain segmentation framework that uses a coarse-to-fine approach for accurate delineation in SEM, digital rock, and battery XCT applications.
  • The framework integrates advanced models like SAM with human-in-the-loop bootstrapping, random forests, and TransforCNN to refine edge and structure details.
  • LithoSeg minimizes manual annotation via weakly supervised techniques and achieves state-of-the-art accuracy and robustness across diverse imaging scenarios.

LithoSeg refers to multiple segmentation frameworks developed for precise delineation and characterization in diverse domains including lithography scanning electron microscopy (SEM), digital rock typing from micro-computed tomography (micro-CT) in geoscience, and X-ray computed tomography (XCT) in lithium battery inspection. Despite their application differences, LithoSeg approaches are notable for their use of multi-stage workflows, advanced machine learning models, and domain-adapted strategies to achieve state-of-the-art accuracy and robustness with minimal supervision. The following sections detail the main design principles, algorithmic structures, training protocols, evaluation metrics, and notable findings from seminal LithoSeg implementations in recent literature.

1. Conceptual Overview and Problem Scope

LithoSeg frameworks are designed to solve segmentation and classification challenges where conventional methods are limited by annotation demands, generalization, or metrology-grade precision. In lithography and metrology, LithoSeg targets precise groove-contour extraction in noisy SEM images with diverse pattern geometries and varying process windows (He et al., 15 Nov 2025). In digital rock physics, the goal is to assign every voxel in 3D carbonate core scans a discrete “rock type” encoding lithology, porosity, permeability, and capillary-pressure behavior—entirely replacing destructive laboratory assays with non-destructive imaging and machine learning (Alfarisi et al., 2021). Battery inspection applications focus on delineating dendrites in XCT data for automated defect quantification (Quenum et al., 2023).

Key commonalities across domains include:

  • Emphasis on high-precision edge or morphology segmentation
  • Human-in-the-loop and weakly supervised annotation reduction strategies
  • Integration of multi-scale or multi-stage prediction—first for coarse object localization, then for fine-grained refinement

2. Coarse-to-Fine Architectural Designs

LithoSeg architectures generally adopt a two-step or hierarchical approach that separates rough region-of-interest determination from subsequent edge or structure refinement.

For lithography SEM (He et al., 15 Nov 2025), the LithoSeg pipeline decomposes the task as follows:

  1. Coarse Segmentation: Utilizes the Segment Anything Model (SAM), a vision foundation model, adapted via a Human-in-the-Loop Bootstrapping scheme. Initial bounding-box prompts (from CAD layouts) guide SAM to generate coarse binary masks. These are curated by non-expert humans using keep/delete decisions, after which prompt-free fine-tuning aligns SAM to SEM data.
  2. Fine Point-wise Regression: Transforms 2D segmentation into thousands of independent 1D regression problems. At each coarse contour point, a normal profile (a 1D intensity line sampled along the edge normal) is extracted, and a lightweight multi-layer perceptron (MLP, about 0.4 MB of parameters) uses it to predict the sub-pixel displacement along that normal. Brightest-center alignment of the profile provides translational invariance. The overall mask is then reconstructed by rasterizing the refined polygon.
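
The fine stage can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names, the edge-value padding, and the way the alignment shift is combined with the regressor output are all assumptions, and `predict_offset` is a hypothetical stand-in for the trained MLP. The profile is assumed to be sampled at one-pixel spacing, centered on the coarse point, with the index increasing along the unit normal.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def align_brightest_center(profile: Sequence[float]) -> Tuple[List[float], int]:
    """Shift a 1D normal profile so its brightest sample sits at the center index.

    Returns the aligned profile and the shift (in samples) that was applied.
    Samples shifted in from outside the profile are padded with the nearest
    edge value (an assumption; the source does not state its padding rule).
    """
    n = len(profile)
    center = n // 2
    peak = max(range(n), key=lambda i: profile[i])  # first index of the maximum
    shift = center - peak
    aligned = [profile[min(max(i - shift, 0), n - 1)] for i in range(n)]
    return aligned, shift


def refine_point(
    point: Point,
    unit_normal: Point,
    profile: Sequence[float],
    predict_offset: Callable[[List[float]], float],
) -> Point:
    """Move one coarse contour point along its normal by a sub-pixel amount."""
    aligned, shift = align_brightest_center(profile)
    residual = predict_offset(aligned)  # hypothetical stand-in for the MLP
    # Assumption: the regressor output is measured from the aligned center,
    # so the alignment shift is undone to get the offset from the coarse point.
    displacement = residual - shift
    x, y = point
    nx, ny = unit_normal
    return (x + displacement * nx, y + displacement * ny)
```

Because each contour point is handled independently, the same two functions can be mapped over every point of the coarse contour before the refined polygon is rasterized.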

In digital rock typing for carbonates (Alfarisi et al., 2021), LithoSeg deploys two stages:

  • Patch-wise Classifier: Extracts 32³-voxel windows with overlap, computes multi-scale Difference-of-Gaussian features at σ = [1,2,4] voxels, and classifies each voxel via a random forest (RF) of 100 trees, depth 20. Majority voting across overlapping patches propagates fine-scale context and acts as an analogue of skip-connections in CNNs.
  • 3D Median Filtering & Conditional Random Field (CRF): A post-processing 3×3×3 median filter removes isolated voxels. A 3D CRF with Potts model smooths labels via mean-field inference, using RF probabilities as unary energies.
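
As a rough illustration of the first post-processing step, the sketch below applies a 3×3×3 median filter to a small integer label volume using only the Python standard library. The boundary handling (a window clipped to the volume) and the use of `median_low` so that the output is always an existing label are assumptions, not details taken from the source.

```python
from statistics import median_low
from typing import List

Volume = List[List[List[int]]]


def median_filter_3x3x3(vol: Volume) -> Volume:
    """Apply a 3x3x3 median filter to an integer label volume indexed [z][y][x].

    Border voxels use a window clipped to the volume (an assumption; the
    source does not state its boundary handling).
    """
    nz, ny, nx = len(vol), len(vol[0]), len(vol[0][0])
    out = [[[0] * nx for _ in range(ny)] for _ in range(nz)]
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                window = [
                    vol[k][j][i]
                    for k in range(max(z - 1, 0), min(z + 2, nz))
                    for j in range(max(y - 1, 0), min(y + 2, ny))
                    for i in range(max(x - 1, 0), min(x + 2, nx))
                ]
                # median_low always returns a value present in the window,
                # so the result stays a valid integer label.
                out[z][y][x] = median_low(window)
    return out
```

A single voxel labeled differently from all of its neighbors is outvoted in every window that contains it, which is how the filter removes isolated voxels. For real micro-CT volumes an array library would be used instead of nested lists; the triple loop here is only meant to make the operation explicit.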

For battery XCT, LithoSeg leverages a TransforCNN model (Quenum et al., 2023):

  • Transformer Encoder: Processes 128×128 patches via eight layers of multi-head self-attention, operating on flattened 16×16 sub-patches with positional encodings.
  • CNN Decoder: Reconstructs high-resolution outputs with upsampling and skip-fusion mechanisms, using outputs from multiple transformer depths.
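
To make the encoder's input shape concrete, the following sketch splits a 128×128 patch into flattened 16×16 sub-patches. It assumes the sub-patches are non-overlapping and flattened in row-major order, which the source does not state; the function name `patch_to_tokens` is hypothetical.

```python
from typing import List


def patch_to_tokens(patch: List[List[float]], sub: int = 16) -> List[List[float]]:
    """Split a square 2D patch into sub x sub blocks, each flattened row-major.

    Assumes non-overlapping blocks; tokens are ordered left to right, then
    top to bottom, so the token index can serve as its position.
    """
    size = len(patch)
    if size % sub != 0 or any(len(row) != size for row in patch):
        raise ValueError("patch must be square with a side divisible by sub")
    tokens = []
    for by in range(0, size, sub):
        for bx in range(0, size, sub):
            tokens.append(
                [patch[y][x] for y in range(by, by + sub) for x in range(bx, bx + sub)]
            )
    return tokens
```

Under these assumptions a 128×128 patch yields (128/16)² = 64 tokens of 16·16 = 256 values each, and a positional encoding would be added per token index before the self-attention layers.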

3. Training, Losses, and Regularization

Across domains, the selection of training objectives and regularization schemes is tailored to maximize label precision and generalization while preventing overfitting.

| Domain | Loss Functions | Regularization/Pruning |
|---|---|---|
| Lithography SEM | Dice + cross-entropy (SAM fine-tuning); L1/L2 for 1D regression | No mask editing, minimal epochs |
| Digital Rock Typing | RF: Cross-entropy + Dice (λ=0.5); CRF post-processing | Tree pruning (min samples/leaf=5); L2 normalisation per DoG channel |
| Battery XCT | Binary cross-entropy (BCE) | Dropout in training, no explicit weight decay |
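
The combined objectives listed above can be sketched for the binary case as follows. The weighting form (cross-entropy plus λ times a soft Dice loss, with λ = 0.5) is an assumption about how the two terms are combined; the source gives only the value of λ. All function names are illustrative.

```python
import math
from typing import Sequence


def bce(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; probabilities are clamped away from 0 and 1."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


def soft_dice_loss(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-7) -> float:
    """One minus the soft Dice coefficient computed on predicted probabilities."""
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def combined_loss(probs: Sequence[float], targets: Sequence[int], lam: float = 0.5) -> float:
    """Assumed combination: cross-entropy + lam * Dice loss."""
    return bce(probs, targets) + lam * soft_dice_loss(probs, targets)
```

The cross-entropy term scores each pixel or voxel independently, while the Dice term depends on the overlap of the whole mask, which is the usual motivation for pairing them when foreground classes are small.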

For LithoSeg in digital rock typing, training continues until validation cross-entropy stagnates for five successive depth-increments, mitigating overfitting (Alfarisi et al., 2021). In the SEM domain, prompt-free fine-tuning cycles always restart from pre-trained weights to avoid catastrophic forgetting (He et al., 15 Nov 2025).
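
The stopping rule described for digital rock typing resembles patience-based early stopping. A minimal sketch, assuming "stagnates" means no strict improvement over the best validation cross-entropy seen so far (the `min_delta` tolerance and the function name are hypothetical):

```python
from typing import Optional, Sequence


def stop_index(
    val_losses: Sequence[float], patience: int = 5, min_delta: float = 0.0
) -> Optional[int]:
    """Return the step at which training would stop, or None if it never does.

    A step counts as stagnant when the validation loss fails to improve on
    the best value so far by more than min_delta.
    """
    best = float("inf")
    stale = 0
    for i, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return i
    return None
```

With `patience=5`, each entry of `val_losses` would correspond to one depth increment, and the model from the step with the best validation loss would typically be the one retained.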

4. Segmentation Performance and Evaluation Metrics

LithoSeg methods demonstrate notable improvements over established baselines with respect to both segmentation and metrology performance metrics.

On four lithography SEM test subsets (“Easy”, “Medium”, “Hard”, “Extreme”), LithoSeg achieves (He et al., 15 Nov 2025):

  • Mask IoU: 98.13% (“Easy”)
  • Critical dimension (CD) error: –0.12 nm
  • Line-edge roughness (LER) mean and RMS errors: $R_{Ea}=0.02$, $R_{Eq}^2=0.01$ nm

Even on “Extreme” patterns, IoU remains at 89.6%.

Ablation studies confirm that:

  • Brightest-center alignment is essential (IoU drops to 77.8% if omitted)
  • Alternative regressors (CNN/transformer) underperform the MLP
  • The scan size of the normal profile must be chosen according to the Rayleigh criterion

In digital rock typing, LithoSeg achieves, on held-out test sets (mean ± std over 5 folds) (Alfarisi et al., 2021):

  • Accuracy: 0.92 ± 0.02
  • Mean IoU: 0.85 ± 0.03
  • Mean Precision/Recall: 0.88/0.87

Compared to intensity-threshold baselines:

  • +15% overall IoU gain
  • +12% grain-matrix vs. vug recall improvement

On battery XCT data, LithoSeg-TransforCNN achieves (Quenum et al., 2023):

  • mIoU: 0.9511
  • mDSC: 0.9647
  • Inference latency per patch: 206.75 ms

TransforCNN outperforms U-Net (mIoU: 0.8698) and Y-Net (mIoU: 0.8481), but the method is ~3× slower than U-Net.
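
For reference, the two overlap metrics reported here can be computed for binary masks as in the sketch below. It assumes mIoU and mDSC are averages over a set of masks; the source does not say whether the mean is taken over images, patches, or classes, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def iou_and_dice(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Intersection-over-union and Dice coefficient for two flattened binary masks."""
    inter = sum(1 for p, t in zip(pred, truth) if p and t)
    p_sum = sum(1 for p in pred if p)
    t_sum = sum(1 for t in truth if t)
    union = p_sum + t_sum - inter
    if union == 0:
        # Both masks are empty: treat as perfect agreement (a convention).
        return 1.0, 1.0
    return inter / union, 2.0 * inter / (p_sum + t_sum)


def mean_metrics(pairs: List[Tuple[Sequence[int], Sequence[int]]]) -> Tuple[float, float]:
    """Average IoU and Dice over a list of (prediction, ground truth) pairs."""
    scores = [iou_and_dice(p, t) for p, t in pairs]
    n = len(scores)
    return sum(s[0] for s in scores) / n, sum(s[1] for s in scores) / n
```

For a single mask pair the two metrics are linked by Dice = 2·IoU / (1 + IoU), so Dice is never lower than IoU; after averaging over many masks this identity no longer holds exactly.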

5. Annotation, Supervision, and Training Efficiency

LithoSeg is notable for its reduction in manual annotation:

  • In Lithography SEM, only layout-derived bounding boxes and keep/delete mask curation are required; fine-stage annotation is limited to simple 1D profile corrections (He et al., 15 Nov 2025).
  • In digital rock typing, data augmentation (rotations, flips, intensity jitter) and patch overlap increase robustness with modest core-level split (70% train, 15% validation, 15% test) (Alfarisi et al., 2021).
  • For battery XCT, standard data-augmentation is employed; hand-labeled patches are relatively few (4,433) (Quenum et al., 2023).

LithoSeg’s training pipeline can converge in 13 minutes on a single GPU in SEM applications, compared to 50–100 minutes for comparably supervised baselines (He et al., 15 Nov 2025).

6. Domain Adaptation and Robustness

The combination of a universal segmentation model (SAM or transformer-based backbone) and efficient fine-tuning or refinement yields strong domain adaptation properties:

  • Human-in-the-loop bootstrapping allows SAM to be rapidly tailored to new SEM domains, process windows, or fab nodes. Retraining on new wafers can occur in minutes (He et al., 15 Nov 2025).
  • In digital rock typing, the statistical random forest approach generalizes across Cretaceous carbonate textures but demonstrates limitations in transfer to untested lithologies (Alfarisi et al., 2021).
  • Battery XCT segmentation, though highly accurate, is currently limited by 2D slice-based design and hand-labeled data scale (Quenum et al., 2023).

Robustness experiments on SEM segmentation showed that injecting up to 30% random mask noise in the bootstrapped set caused negligible IoU degradation (±1%), illustrating resilience to annotation imperfections (He et al., 15 Nov 2025).

7. Limitations and Prospective Extensions

Current LithoSeg implementations are domain-specialized and exhibit certain limitations:

  • Lithography SEM: 2D–1D design may neglect full 2D–3D context; scan size hyperparameters are critical; annotator involvement, though lightweight, is not zero (He et al., 15 Nov 2025).
  • Digital Rock Typing: Model is resolution-dependent (requires at least 25–30 μm for microporosity), and underperforms on vug networks much larger than patch size. Transfer learning to other carbonate assemblages (oolitic, bioclastic) is untested. Future work targets RF replacement with 3D U-Net, integration of wettability logs, and coupling to flow simulators (Alfarisi et al., 2021).
  • Battery XCT: Processing speed may limit inline inspection; 2D slice-based approach does not leverage through-slice continuity. Scaling to multi-class segmentation and semi-supervised learning is a prospective direction (Quenum et al., 2023).

A plausible implication is that the core “coarse-to-fine” paradigm—leveraging generalist models for invariant detection, followed by 1D or patch-wise domain-specific specialists—offers a generic pattern for segmentation tasks where minimal supervision, high generalization, and metrology-grade accuracy are required.
