Hand Hygiene Assessment via Joint Step Segmentation and Key Action Scorer (2209.12221v4)

Published 25 Sep 2022 in cs.CV

Abstract: Hand hygiene is a standard six-step hand-washing procedure proposed by the World Health Organization (WHO). However, there is no effective way to supervise medical staff in performing hand hygiene, which creates a potential risk of disease spread. Existing action assessment works usually make an overall quality prediction for an entire video, yet the internal structure of the hand hygiene action is important for its assessment. We therefore propose a novel fine-grained learning framework that performs step segmentation and key action scoring jointly for accurate hand hygiene assessment. Existing temporal segmentation methods usually employ multi-stage convolutional networks to improve segmentation robustness, but they are prone to over-segmentation because they lack long-range dependence. To address this issue, we design a multi-stage convolution-transformer network for step segmentation. Based on the observation that each hand-washing step involves several key actions that determine the washing quality, we design a set of key action scorers to evaluate the quality of the key actions in each step. In addition, hand hygiene assessment lacks a unified dataset, so, under the supervision of medical staff, we contribute a video dataset containing 300 video sequences with fine-grained annotations. Extensive experiments on this dataset show that our method assesses hand hygiene videos accurately and achieves outstanding performance.
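
The abstract describes the architecture only at a high level. The following is a minimal PyTorch sketch, not the authors' code, of how a multi-stage convolution-transformer step segmenter and a set of per-step key action scorers might be combined in one model. The class names, layer sizes, number of stages, use of pre-extracted frame features, and the soft-pooling used to score each step are all illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation) of joint step segmentation
# and per-step key action scoring for hand hygiene assessment.
# Assumptions: pre-extracted frame features (e.g. from an I3D backbone),
# 6 WHO hand-washing steps + 1 background class, and soft pooling of frames
# belonging to each step before scoring it.
import torch
import torch.nn as nn


class ConvTransformerStage(nn.Module):
    """Dilated temporal convolutions for local context, followed by
    self-attention to capture long-range dependence across frames."""

    def __init__(self, dim=64, num_layers=4, num_heads=4):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv1d(dim, dim, kernel_size=3, padding=2 ** i, dilation=2 ** i)
            for i in range(num_layers)
        ])
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                 # x: (batch, frames, dim)
        h = x.transpose(1, 2)             # (batch, dim, frames) for Conv1d
        for conv in self.convs:
            h = torch.relu(conv(h)) + h   # residual dilated convolutions
        h = h.transpose(1, 2)             # back to (batch, frames, dim)
        a, _ = self.attn(h, h, h)         # frame-to-frame self-attention
        return self.norm(h + a)


class HandHygieneAssessor(nn.Module):
    """Joint model: frame-wise step segmentation + one quality score per step."""

    def __init__(self, feat_dim=1024, dim=64, num_steps=7, num_stages=3):
        super().__init__()
        self.embed = nn.Linear(feat_dim, dim)
        self.stages = nn.ModuleList([ConvTransformerStage(dim) for _ in range(num_stages)])
        self.seg_head = nn.Linear(dim, num_steps)          # step label per frame
        self.scorers = nn.ModuleList([nn.Linear(dim, 1) for _ in range(num_steps)])

    def forward(self, feats):             # feats: (batch, frames, feat_dim)
        h = self.embed(feats)
        for stage in self.stages:         # multi-stage refinement
            h = stage(h)
        seg_logits = self.seg_head(h)     # (batch, frames, num_steps)
        probs = seg_logits.softmax(dim=-1)
        step_scores = []
        for k, scorer in enumerate(self.scorers):
            # soft-pool the frames assigned to step k, then score that step
            w = probs[..., k:k + 1]
            pooled = (w * h).sum(dim=1) / (w.sum(dim=1) + 1e-6)
            step_scores.append(scorer(pooled))
        return seg_logits, torch.cat(step_scores, dim=-1)   # (B, T, K), (B, K)


if __name__ == "__main__":
    model = HandHygieneAssessor()
    video_feats = torch.randn(2, 300, 1024)   # dummy pre-extracted frame features
    seg, scores = model(video_feats)
    print(seg.shape, scores.shape)            # (2, 300, 7) and (2, 7)
```

In this sketch the segmentation probabilities gate which frames each scorer sees, which is one simple way to tie the two tasks together; the paper's actual coupling of the segmenter and the key action scorers may differ.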
