Leveraging Foundation Model Automatic Data Augmentation Strategies and Skeletal Points for Hands Action Recognition in Industrial Assembly Lines

Published 14 Mar 2024 in cs.CV | (2403.09056v1)

Abstract: On modern industrial assembly lines, many intelligent algorithms have been developed to replace or supervise workers. However, we found bottlenecks in both training datasets and real-time performance when deploying these algorithms on actual assembly lines. We therefore developed a strategy for expanding industrial datasets that uses large models with strong generalization abilities to achieve efficient, high-quality, large-scale dataset expansion, addressing the problem of insufficient and low-quality industrial datasets. We also applied this strategy to video action recognition. We propose converting the hand action recognition problem into a hand skeletal trajectory classification problem, which addresses the real-time performance problem of industrial algorithms. In the "hand movements during wire insertion" scenario on an actual assembly line, hand action recognition accuracy reached 98.8%. We conducted detailed experimental analysis to demonstrate the effectiveness and superiority of the method, and deployed the entire process on Midea's actual assembly line.
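
The abstract's central idea, treating hand action recognition as classification of hand skeletal trajectories, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the trajectory layout (a fixed number of frames, each a list of 2D keypoints), the helper names `to_features`, `fit_centroids`, `classify`, and `make_toy`, and the nearest-centroid classifier, which is a deliberately simple stand-in and not the model used in the paper.

```python
from math import dist

# Assumed layout: a trajectory is a list of frames, and each frame is a
# list of (x, y) hand keypoints. All trajectories share the same shape.
Frame = list[tuple[float, float]]
Trajectory = list[Frame]


def to_features(traj: Trajectory) -> list[float]:
    """Flatten a trajectory into one vector, with coordinates taken
    relative to the first keypoint of the first frame."""
    ox, oy = traj[0][0]
    return [c for frame in traj for (x, y) in frame for c in (x - ox, y - oy)]


def fit_centroids(samples: list[tuple[Trajectory, str]]) -> dict[str, list[float]]:
    """Compute the mean feature vector per label (stand-in classifier)."""
    sums: dict[str, list[float]] = {}
    counts: dict[str, int] = {}
    for traj, label in samples:
        feats = to_features(traj)
        if label not in sums:
            sums[label] = [0.0] * len(feats)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], feats)]
        counts[label] += 1
    return {label: [s / counts[label] for s in vec] for label, vec in sums.items()}


def classify(traj: Trajectory, centroids: dict[str, list[float]]) -> str:
    """Return the label whose centroid is closest in Euclidean distance."""
    feats = to_features(traj)
    return min(centroids, key=lambda label: dist(feats, centroids[label]))


def make_toy(dx: float) -> Trajectory:
    """Hypothetical toy data: two keypoints moving dx per frame along x."""
    return [[(t * dx, 0.0), (t * dx + 1.0, 0.0)] for t in range(3)]


# Hypothetical labels: a moving hand versus a stationary one.
centroids = fit_centroids([(make_toy(1.0), "insert"), (make_toy(0.0), "idle")])
print(classify(make_toy(0.9), centroids))
print(classify(make_toy(0.1), centroids))
```

Because the classifier only sees a short vector of keypoint coordinates rather than raw video frames, the per-sample cost is small, which is the intuition behind using skeletal trajectories when real-time performance matters.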


Authors (2)
