Deep Transfer Learning of Pick Points on Fabric for Robot Bed-Making (1809.09810v3)

Published 26 Sep 2018 in cs.RO and cs.AI

Abstract: A fundamental challenge in manipulating fabric for clothes folding and textiles manufacturing is computing "pick points" to effectively modify the state of an uncertain manifold. We present a supervised deep transfer learning approach to locate pick points using depth images for invariance to color and texture. We consider the task of bed-making, where a robot sequentially grasps and pulls at pick points to increase blanket coverage. We perform physical experiments with two mobile manipulator robots, the Toyota HSR and the Fetch, and three blankets of different colors and textures. We compare coverage results from (1) human supervision, (2) a baseline of picking at the uppermost blanket point, and (3) learned pick points. On a quarter-scale twin bed, a model trained with combined data from the two robots achieves 92% blanket coverage compared with 83% for the baseline and 95% for human supervisors. The model transfers to two novel blankets and achieves 93% coverage. Average coverage results of 92% for 193 beds suggest that transfer-invariant robot pick points on fabric can be effectively learned.

Citations (19)

Summary

  • The paper introduces a deep transfer learning method that predicts fabric pick points from depth images, with an average prediction error of approximately 27 pixels.
  • It leverages a modified YOLO architecture with fixed pre-trained layers, addressing challenges in deformable fabric manipulation that hand-engineered traditional methods struggle with.
  • Experiments on HSR and Fetch robots show enhanced blanket coverage and efficiency, demonstrating robust generalization across diverse fabric types.

This paper (Deep Transfer Learning of Pick Points on Fabric for Robot Bed-Making, 2018) addresses the challenging problem of manipulating deformable objects like fabric, focusing specifically on the task of robot bed-making. The core challenge is to identify effective "pick points" on the fabric that, when grasped and pulled, lead to a desired outcome, such as increasing blanket coverage on a bed. Unlike rigid objects, fabric has an infinite-dimensional configuration space, making traditional model-based approaches difficult.

The authors propose a supervised deep learning approach to identify these pick points directly from sensor data. They use depth images as input to the neural network, which provides invariance to the color and texture of the fabric, a crucial factor for real-world variability.

Implementation Details:

  1. Input: The system takes a depth image from the robot's head camera. The original 640x480 depth images are resized to 448x448 and the single depth channel is triplicated to match the input dimensions expected by the pre-trained network (see the preprocessing and training sketch after this list).
  2. Model Architecture: A deep convolutional neural network based on the YOLO (You Only Look Once) architecture is used. The network is adapted for this task: it utilizes pre-trained weights from YOLO trained on RGB images (Pascal VOC 2012) for initial feature extraction. The first 32 million parameters are fixed (transfer learning), and additional convolutional and dense layers are trained.
  3. Output: The network outputs a 2D pixel coordinate (x, y) representing the predicted pick point on the input depth image.
  4. Grasping and Pulling: The predicted 2D pick point is projected into 3D space using the corresponding depth value from the depth image and known camera parameters (see the deprojection sketch after this list). The robot then moves its gripper to this 3D location with an orientation orthogonal to the bed surface, grasps the fabric, and pulls it towards the nearest uncovered bed corner.
  5. System Loop: The robot is positioned at one side of the bed and performs grasp-and-pull actions. After each attempt, it uses a coverage heuristic (implemented as another neural network trained to detect sufficient coverage) to decide if another attempt is needed on that side. The robot is limited to a maximum of four attempts per side before moving to the other side of the bed.
  6. Training Data: Data was collected using two mobile manipulator robots (HSR and Fetch) and a white blanket with a red marker at the corner to facilitate automatic labeling. Human supervisors generated initial blanket configurations and then performed short pulls, recording the robot's depth image and the pixel location of the marked corner as the ground truth pick point. A dataset of 2018 image-label pairs was collected from both robots.
  7. Training Procedure: The model is trained end-to-end using supervised learning to minimize the L2 pixel error between the predicted and ground truth pick points. The Adam optimizer is used. Data augmentation techniques (noise, random dots, vertical flips) were applied to increase the dataset size by 10x. Hyperparameter tuning was done using 10-fold cross-validation.
  8. Transfer Learning: The use of pre-trained YOLO weights on RGB data, despite the input being depth images, is a key aspect of the transfer learning approach. While training the full network from scratch on depth data was attempted, fixing the pre-trained weights for initial layers performed better, suggesting the initial feature extractors learned from RGB data are still useful for processing depth information in this domain.
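
A minimal sketch of items 1, 2, and 7 (depth preprocessing, a frozen pre-trained trunk with a trainable regression head, and the L2 pixel loss) is shown below. This is not the authors' implementation: they adapt a YOLO backbone pre-trained on Pascal VOC, whereas this sketch substitutes a torchvision ResNet-18 purely as a stand-in pre-trained feature extractor, and the normalization and learning rate are assumptions.

```python
import cv2
import numpy as np
import torch
import torch.nn as nn
from torchvision import models


def preprocess_depth(depth_640x480: np.ndarray) -> torch.Tensor:
    """Resize a single-channel depth image to 448x448 and triplicate the channel
    so it matches the 3-channel input of the pre-trained backbone."""
    d = cv2.resize(depth_640x480.astype(np.float32), (448, 448))
    d = (d - d.mean()) / (d.std() + 1e-6)         # simple normalization (assumption)
    d3 = np.repeat(d[None, :, :], 3, axis=0)      # shape (3, 448, 448)
    return torch.from_numpy(d3).unsqueeze(0)      # add batch dimension


class PickPointNet(nn.Module):
    """Frozen pre-trained trunk + small trainable head regressing an (x, y) pixel."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.trunk = nn.Sequential(*list(backbone.children())[:-1])  # drop classifier
        for p in self.trunk.parameters():
            p.requires_grad = False               # transfer learning: fix early layers
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(512, 256), nn.ReLU(),
                                  nn.Linear(256, 2))

    def forward(self, x):
        return self.head(self.trunk(x))           # predicted pick point in pixels


model = PickPointNet()
optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad),
                             lr=1e-4)

# One supervised step: minimize L2 pixel distance between prediction and label.
depth_image = np.random.rand(480, 640)            # placeholder for a robot depth image
label = torch.tensor([[224.0, 224.0]])            # ground-truth pick point (pixels)
pred = model(preprocess_depth(depth_image))
loss = torch.norm(pred - label, dim=1).mean()
optimizer.zero_grad(); loss.backward(); optimizer.step()
```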

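The 2D-to-3D projection in item 4 follows the standard pinhole camera model. A minimal sketch is below, assuming known intrinsics (fx, fy, cx, cy) and a metric depth value at the predicted pixel; the transform into the robot's base frame and the choice of gripper orientation are handled downstream and depend on each robot's camera driver and frame conventions.

```python
import numpy as np


def deproject_pixel(u: float, v: float, depth_m: float,
                    fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Convert a pixel (u, v) with depth depth_m into a 3D point in the camera
    frame using the pinhole model."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.array([x, y, depth_m])


# Example with hypothetical intrinsics for a 640x480 depth camera.
point_cam = deproject_pixel(u=320.0, v=240.0, depth_m=0.9,
                            fx=525.0, fy=525.0, cx=319.5, cy=239.5)
```
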
Experimental Evaluation and Results:

The method was evaluated on a quarter-scale bed using the HSR and Fetch robots with three different blankets (white, multicolored Y&B, and teal). Performance was benchmarked against:

  1. Analytic Baseline: The robot grasps at the highest reachable point on the blanket (see the sketch after this list).
  2. Human Supervisor: A human selects the pick point by clicking on the robot's depth image via a web interface.
  3. RGB-based Network: A comparable network trained on RGB images (used for generalization comparison).
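
Baseline (1) can be approximated directly from the depth image by selecting, among blanket pixels, the point closest to the camera. A minimal sketch, assuming a precomputed blanket mask and a roughly top-down view; the paper's reachability filtering is not reproduced here.

```python
import numpy as np


def highest_point_pixel(depth: np.ndarray, blanket_mask: np.ndarray) -> tuple[int, int]:
    """Return the (u, v) pixel of the highest blanket point, taken as the smallest
    depth value inside the mask (an approximation for a camera looking down at
    the bed)."""
    masked = np.where(blanket_mask, depth, np.inf)   # ignore non-blanket pixels
    v, u = np.unravel_index(np.argmin(masked), masked.shape)
    return int(u), int(v)
```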

Key findings include:

  • Pick Point Accuracy: The trained depth-based grasp network achieved an average L2 pixel error of around 27 pixels on a held-out validation set. An ablation study showed that increasing training data size improved accuracy, with diminishing returns observed after using about two-thirds of the full dataset.
  • Comparison to Traditional Methods: A classical method like the Harris Corner Detector applied to depth images performed significantly worse (175.0 pixel error) and often failed to find corners, highlighting the difficulty of using hand-engineered features for this task.
  • Coverage Performance (White Blanket): On the white blanket used for training, the learned depth-based network achieved an average coverage of 92% across both robots (93% for HSR, 90% for Fetch). This outperformed the analytic highest-point baseline (83-85% coverage) with statistical significance (p = 0.00034 for HSR). The performance was statistically comparable to the human supervisor (93-96% coverage, p = 0.0889 for HSR).
  • Efficiency: The learned network required fewer grasp attempts per rollout (average 4.3-4.4) than the analytic baseline (average 4.9-6.2), and was closer to the human supervisor (average 2.8-3.0).
  • Generalization to Novel Blankets: The depth-based network trained only on the white blanket successfully generalized to the multicolored Y&B and teal blankets, achieving 92% and 93% coverage, respectively. Statistical tests showed no significant difference in coverage performance between the different blankets, demonstrating the depth input's effectiveness in handling variations in color and texture.
  • Depth vs. RGB: An RGB-based network trained on the white blanket performed noticeably worse (86% coverage) when tested on the Y&B and teal blankets, confirming the superior generalization capability of the depth-based approach.
  • Timing: The neural network inference itself was very fast (mean 0.1s). The main time bottlenecks in the bed-making process were robot movements (moving between sides) and grasp execution (moving the arm and pulling), taking tens of seconds per action.

Practical Implications and Future Work:

The research demonstrates a practical, learning-based approach for a challenging fabric manipulation task. By using depth images and transfer learning, the system achieves good performance and generalizes well to different fabric types without requiring retraining. This approach could be applicable to other tasks involving manipulating deformable materials where identifying key manipulation points is critical.

Future work suggested by the authors includes:

  • Exploring Fog robotics for scalable pick point learning.
  • Reducing hard-coded robot actions and learning more general policies, potentially using reinforcement learning and simulators.
  • Applying the method to other deformable objects like furniture covers, table cloths, and textiles.

The authors have made their code and data publicly available, facilitating further research and implementation.