PointRAFT: 3D Regression for Yield Estimation

Updated 6 January 2026

PointRAFT is a high-throughput point cloud regression network that directly estimates continuous 3D shape properties from partial scans using a dedicated height embedding to mitigate self-occlusion.
Its hierarchical architecture, combining set abstraction modules and a regression head, achieves significantly lower errors compared to traditional reconstruction-based methods.
Evaluated on extensive field data, PointRAFT delivers real-time yield estimation performance and demonstrates generalization across different cultivars and practical harvesting conditions.

PointRAFT is a high-throughput point cloud regression network designed to directly predict continuous three-dimensional (3D) shape properties such as potato tuber weight from partial point clouds acquired by RGB-D cameras under practical harvesting conditions. Departing from conventional approaches that attempt explicit 3D reconstruction, PointRAFT achieves non-contact, real-time yield estimation by inferring target values using raw, incomplete geometric data. Its core architectural innovation is the incorporation of a dedicated object height embedding, which mitigates the effect of self-occlusion and enhances its predictive accuracy. Developed and evaluated on automatically acquired field data, PointRAFT delivers high performance and throughput suitable for commercial agricultural robotics and extends to other 3D phenotyping and robotic perception tasks (Blok et al., 30 Dec 2025).

1. Motivation and Problem Scope

In precision agriculture, yield estimation is crucial for informed cultivation and logistics. Potato tuber weight can be estimated in situ on harvesters using RGB-D sensors that reconstruct 3D representations of tubers moving along conveyor belts. However, these point clouds are inevitably partial due to self-occlusion: sizable portions of the object surface remain unobserved, resulting in systematic errors if full volume is estimated using traditional geometric algorithms. Conventional completion-based approaches require paired full scans, explicit mesh post-processing, and external calibration, increasing both system complexity and acquisition time.

PointRAFT (Regression of Amodal Full Three-dimensional shape properties from partial point clouds) addresses this bottleneck by regressing shape properties—most notably weight—directly from a partial 3D point cloud. This paradigm shift obviates the need for full reconstruction and external geometric calibration while retaining the latent relationship between visible geometry and missing mass. The principal challenge is designing a model capable of extracting shape cues robustly from incomplete data subject to practical operational noise, orientation variability, and geometric diversity.

2. Network Architecture and Core Components

PointRAFT is architecturally based on hierarchical set abstraction modules inspired by PointNet++ Single-Scale Grouping (SSG), and is augmented with a custom object height embedding. The complete pipeline performs the following steps:

Input Preprocessing and Sampling: Each partial point cloud $P_0 = \{(x_i, y_i, z_i)\}$ containing possibly several thousand points is downsampled to $N=1024$ using farthest-point sampling (FPS) and centered to the origin to eliminate positional bias.
Hierarchical Feature Extraction:
- Set Abstraction 1: FPS reduces $P_0$ to $P_1$ (512 points). For each $p \in P_1$ , up to 64 neighbors are grouped within radius $r_1=0.2$ m. A shared multilayer perceptron (MLP) with layers [3→64→64→128] produces per-point features.
- Set Abstraction 2: FPS on $P_1$ yields $P_2$ (128 points). Neighborhoods of up to 64 points within $r_2=0.4$ m are encoded by an MLP 131→128→128→256.
- Set Abstraction 3 (Global): A global abstraction over $P_2$ , using [259→256→512→1024] MLP, followed by global max pooling, yields the shape feature $f_{\text{shape}} \in \mathbb{R}^{1024}$ .
Object Height Embedding: Assuming tubers are lying flat, height $h$ is estimated as $h = H_\text{cam} - \min_i z_i$ where $H_\text{cam}$ is the calibrated camera-to-belt distance. Height $h$ is embedded by $MLP_{\text{height}}:[1\to32\to32]$ into $f_h\in\mathbb{R}^{32}$ .
Regression Head: The global shape feature and height embedding are concatenated ( $f_{\text{concat}} \in \mathbb{R}^{1056}$ ) and passed to $MLP_{\text{reg}}:[1056\to512\to256\to1]$ with dropout $p=0.5$ to produce the predicted weight $\hat{w}$ .

The model’s weight prediction is computed as:

$\hat{w} = W_3 \cdot \sigma(W_2 \cdot \sigma(W_1 \cdot f_{\text{concat}} + b_1) + b_2) + b_3$

where $\sigma$ denotes the ReLU nonlinearity and $W_1 \in \mathbb{R}^{512\times1056}$ , $W_2 \in \mathbb{R}^{256\times512}$ , $W_3 \in \mathbb{R}^{1\times256}$ .

3. Training Methodology and Regularization

PointRAFT was trained and validated on a dataset comprising 26,688 partial point clouds acquired from 859 potato tubers (four cultivars, three seasons, Japan). Each tuber was scanned in approximately 31 distinct poses during conveyor transport, stratified and split by weight bins as follows:

Training: 515 tubers, 16,108 clouds
Validation: 172 tubers, 5,326 clouds
Test: 172 tubers, 5,254 clouds

The network was optimized using Adam ( $\text{lr}=1 \times 10^{-3}$ , weight decay $1 \times 10^{-4}$ , batch size 32), with an exponential learning rate decay (0.97 per epoch) over 50 epochs. The final model was selected via the validation Smooth L1 loss.

Loss Functions: The central loss is Smooth L1 with $\beta=20.0$ :

$L_{\text{SmoothL1}}(w, \hat w) = \begin{cases} 0.5\cdot(w-\hat w)^2/\beta, & |w-\hat w| < \beta\ |w-\hat w| - 0.5\cdot\beta, & \text{otherwise} \end{cases}$

Hyperparameter selection (Optuna) established its robustness to outliers, outperforming standard MSE and MAE alternatives.

Data Augmentation: Jitter ( $5 \times 10^{-4}$ m), random rotation (0–2°), flips (50% probability x/y), and shear (factor 0.2) are applied on-the-fly. An ImbalancedSampler addresses the nonuniform weight distribution by binning weights into 10 classes and inversely scaling sample probabilities.
Regularization: Dropout ( $p=0.5$ ) in the regression head proved essential for generalization.

4. Performance and Comparative Evaluation

PointRAFT was benchmarked against a linear regression baseline using 3D bounding box features (length, width, height estimate $h$ ) and a standard PointNet++ regression network. The test set results are summarized below:

Method	MAE [g]	RMSE [g]	R²
Linear Regression	23.0	31.8	0.91
PointRAFT	12.0	17.2	0.97

Thus, across cultivars and seasons, PointRAFT reduced mean absolute error by approximately 48% and root mean squared error by approximately 46% over the baseline (Blok et al., 30 Dec 2025).

Throughput: Average inference time is 6.3 ms per point cloud, enabling processing rates up to 150 tubers/s on an NVIDIA RTX 4090 Laptop GPU, not accounting for point cloud generation overhead.
Ablation Analysis:
- Removing height embedding increased MAE to 16.5 g (+37.5%), RMSE to 24.5 g (+42.4%).
- Disabling dropout increased MAE to 13.1 g (+9.2%), RMSE to 20.7 g (+20.3%).
- Reducing input points to 512 increased MAE to 14.5 g (+20.8%), RMSE to 21.4 g (+24.4%).
- Increasing to 2048 points led to slightly better MAE (13.0 g), RMSE (19.6 g), but reduced throughput (9.0 ms).

5. Generalization and Extension Capabilities

The direct regression framework of PointRAFT has demonstrated generalization across camera setups (two enclosures, two belt distances) and cultivars with distinct morphological characteristics. Its modular, efficient architecture enables portability to related 3D phenotyping and robotic perception domains:

Estimation of occluded fruit/vegetable size or volume from partial scans
6D pose or grasp-force prediction supporting autonomous harvesting systems
Lightweight encoder for shape-completion pipelines (e.g., CoRe, CoRe++)

A plausible implication is that PointRAFT’s approach—where full geometry is neither reconstructed nor explicitly imputed—could be further extended to real-time inference settings for diverse, non-contact characterization tasks in industrial robotics and plant phenomics.

6. Connections to RAFT-derived and Scene Flow Methods

PointRAFT’s nomenclature and core design are rooted in the regression of amodal 3D properties; however, it is architecturally and functionally distinct from RAFT-like scene flow models such as PV-RAFT (Wei et al., 2020). PV-RAFT, designed for scene flow estimation of 3D point clouds, utilizes point–voxel correlation fields and recurrent all-pairs field transforms for iterative correspondence assignment. The PointRAFT model instead targets 3D regression, relying on hierarchical set abstraction and geometric embedding rather than all-pairs correlation pyramids or recurrent flow updates.

This suggests that, while inspired at the level of robust 3D operations and data efficiency, the methodological innovations of PointRAFT—particularly the direct shape regression and height embedding—are tailored for incomplete, amodal shape inference rather than explicit scene flow or correspondence. Future work may explore synergies between these frameworks, for example in occlusion-aware regression downstream of scene flow estimation.

7. Availability and Resources

The PointRAFT codebase, pretrained weights, and subset of the training dataset are publicly accessible at https://github.com/pieterblok/pointraft.git, enabling further benchmarking and transfer to diverse computational phenotyping and perception scenarios (Blok et al., 30 Dec 2025).

Markdown Upgrade to Chat

References (2)

PointRAFT: 3D deep learning for high-throughput prediction of potato tuber weight from partial point clouds (2025)

PV-RAFT: Point-Voxel Correlation Fields for Scene Flow Estimation of Point Clouds (2020)

Topic to Video (Beta)

No one has generated a video about this topic yet.

Whiteboard

No one has generated a whiteboard explanation for this topic yet.

Follow Topic

Get notified by email when new papers are published related to PointRAFT.