
Low-Cost and Real-Time Industrial Human Action Recognitions Based on Large-Scale Foundation Models (2403.08420v1)

Published 13 Mar 2024 in cs.CV

Abstract: Industrial management tasks, including quality control and cost and safety optimization, rely heavily on high-quality industrial human action recognition (IHAR), which has been difficult to deploy in large-scale industrial scenes because of its high cost and poor real-time performance. In this paper, we propose a large-scale foundation model (LSFM)-based IHAR method in which various LSFMs and lightweight methods are, for the first time, used jointly to achieve low-cost dataset establishment and real-time IHAR. Comprehensive tests on in-situ large-scale industrial manufacturing lines show that the proposed method greatly reduces labor costs while delivering superior real-time performance and satisfactory accuracy and generalization, indicating its strong potential as a backbone IHAR method, especially for large-scale industrial applications.
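The abstract describes a two-stage idea: a frozen large-scale foundation model annotates unlabeled factory footage (the low-cost dataset establishment step), and a lightweight model trained on those annotations performs the real-time recognition. The paper's exact models and training recipe are not reproduced here; the following is a minimal sketch of that idea, where `foundation_model`, `LightActionHead`, and all shapes and hyperparameters are assumptions rather than the authors' implementation.

```python
# Minimal sketch of the LSFM-to-lightweight pipeline outlined in the abstract.
# Assumptions (not from the paper): `foundation_model` is any frozen vision
# foundation model returning per-frame action logits; the lightweight head and
# all hyperparameters below are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LightActionHead(nn.Module):
    """Small classifier intended to run in real time on line-side hardware."""

    def __init__(self, feat_dim: int, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_actions),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.net(feats)


@torch.no_grad()
def pseudo_label(foundation_model: nn.Module, feats: torch.Tensor) -> torch.Tensor:
    """Low-cost dataset establishment: the frozen LSFM annotates unlabeled
    frame features, replacing manual labeling."""
    return foundation_model(feats).argmax(dim=-1)


def train_step(light_model: nn.Module, optimizer: torch.optim.Optimizer,
               feats: torch.Tensor, labels: torch.Tensor) -> float:
    """One supervised step on LSFM-generated labels; only the light model
    is deployed for real-time IHAR."""
    optimizer.zero_grad()
    loss = F.cross_entropy(light_model(feats), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

In such a setup, frames from line cameras would be labeled once by the LSFM offline, and only the lightweight head would be trained and served, which is what makes the real-time and cost claims plausible.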
