Low-Cost and Real-Time Industrial Human Action Recognitions Based on Large-Scale Foundation Models (2403.08420v1)
Abstract: Industrial management tasks, including quality control and cost and safety optimization, rely heavily on high-quality industrial human action recognition (IHAR), which has been hard to implement in large-scale industrial scenes due to its high cost and poor real-time performance. In this paper, we propose a large-scale foundation model (LSFM)-based IHAR method in which various LSFMs and lightweight methods are jointly used, for the first time, to achieve low-cost dataset establishment and real-time IHAR. Comprehensive tests on in-situ large-scale industrial manufacturing lines show that the proposed method greatly reduces employment costs while delivering superior real-time performance and satisfactory accuracy and generalization, indicating its strong potential as a backbone IHAR method, especially for large-scale industrial applications.
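The abstract describes pairing large-scale foundation models (for cheap dataset establishment, i.e. pseudo-labeling) with lightweight models (for real-time inference). The paper's code is not included here, but the general pseudo-label-then-distill pattern can be sketched as follows; `lsfm_pseudo_label` is a hypothetical stand-in for a real foundation model (e.g. CLIP-style zero-shot scoring), and the "lightweight model" is reduced to a logistic regression on synthetic frame features.

```python
import numpy as np

rng = np.random.default_rng(0)

def lsfm_pseudo_label(frame_feature: np.ndarray) -> int:
    # Hypothetical stand-in for an LSFM labeler: a real system would run a
    # foundation model over each video frame and emit an action label.
    # Here we just threshold a synthetic feature so the sketch is runnable.
    return int(frame_feature.sum() > 0)

# Synthetic "frame features" standing in for embeddings of video frames.
frames = rng.normal(size=(200, 8))

# Step 1 (low-cost dataset establishment): label frames with the LSFM
# instead of paying human annotators.
labels = np.array([lsfm_pseudo_label(f) for f in frames])

# Step 2 (real-time model): train a lightweight student on the pseudo-labels.
# Plain gradient descent on a logistic-regression objective.
w, b, lr = np.zeros(8), 0.0, 0.5
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(frames @ w + b)))  # predicted probabilities
    grad = p - labels                            # dLoss/dlogit
    w -= lr * frames.T @ grad / len(frames)
    b -= lr * grad.mean()

# The cheap student now reproduces the LSFM's labels at a fraction of the
# inference cost, which is what enables real-time deployment.
preds = (frames @ w + b > 0).astype(int)
accuracy = (preds == labels).mean()
```

This is only a structural sketch under the stated assumptions; the paper's actual pipeline involves real LSFMs (the reference list suggests models such as CLIP and Grounding DINO) and lightweight deep networks rather than a linear classifier.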