Leveraging Foundation Model Automatic Data Augmentation Strategies and Skeletal Points for Hands Action Recognition in Industrial Assembly Lines
Abstract: On modern industrial assembly lines, many intelligent algorithms have been developed to replace or supervise workers. However, we found bottlenecks in both training datasets and real-time performance when deploying such algorithms on an actual assembly line. We therefore developed a strategy for expanding industrial datasets that uses large models with strong generalization ability to achieve efficient, high-quality, large-scale dataset expansion, addressing the problem of insufficient and low-quality industrial datasets. We also applied this strategy to video action recognition. To address the real-time performance problem of industrial algorithms, we proposed a method that converts hand action recognition into hand skeletal trajectory classification. In the "hand movements during wire insertion" scenario on an actual assembly line, the accuracy of hand action recognition reached 98.8%. We conducted a detailed experimental analysis to demonstrate the effectiveness and superiority of the method, and deployed the entire process on Midea's actual assembly line.
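To make the idea of treating hand action recognition as trajectory classification concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not specify the classifier, so a nearest-centroid rule is swapped in purely for illustration. The two-keypoint hand, the labels `insert` and `withdraw`, and all function names are hypothetical, and the trajectories are synthetic.

```python
from math import dist

def resample(trajectory, num_frames):
    """Pick num_frames evenly spaced frames so every clip has the same length."""
    last = len(trajectory) - 1
    if num_frames == 1:
        return [trajectory[0]]
    return [trajectory[round(i * last / (num_frames - 1))] for i in range(num_frames)]

def to_feature(trajectory, num_frames=8):
    """Flatten a keypoint trajectory into a translation-invariant vector."""
    frames = resample(trajectory, num_frames)
    ox, oy = frames[0][0]  # first keypoint of the first frame serves as origin
    return [c for frame in frames for (x, y) in frame for c in (x - ox, y - oy)]

def fit_centroids(trajectories, labels, num_frames=8):
    """Average the feature vectors of each class (stand-in for a real model)."""
    sums, counts = {}, {}
    for traj, label in zip(trajectories, labels):
        feat = to_feature(traj, num_frames)
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], feat)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def classify(trajectory, centroids, num_frames=8):
    """Return the label whose centroid is closest in Euclidean distance."""
    feat = to_feature(trajectory, num_frames)
    return min(centroids, key=lambda label: dist(feat, centroids[label]))

def make_trajectory(start, step, length):
    """Synthetic clip: two keypoints moving together by `step` per frame."""
    (sx, sy), (dx, dy) = start, step
    return [[(sx + t * dx, sy + t * dy), (sx + t * dx + 1, sy + t * dy)]
            for t in range(length)]

# Hypothetical training clips: rightward motion = "insert", leftward = "withdraw".
train = [
    make_trajectory((0, 0), (1.0, 0), 10),
    make_trajectory((5, 5), (1.2, 0), 12),
    make_trajectory((10, 0), (-1.0, 0), 10),
    make_trajectory((3, 3), (-0.8, 0), 9),
]
labels = ["insert", "insert", "withdraw", "withdraw"]
centroids = fit_centroids(train, labels)

# A new clip at a different image position and with a different duration.
prediction = classify(make_trajectory((100, 50), (1.0, 0), 20), centroids)
print(prediction)
```

Because the classifier sees only a short vector of keypoint coordinates rather than full video frames, the per-clip cost is small, which is the intuition behind using skeletal trajectories for real-time use. In practice the keypoints would come from a hand pose estimator or point tracker, and the classifier would be a learned sequence model.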