WheelPose: Data Synthesis Techniques to Improve Pose Estimation Performance on Wheelchair Users (2404.17063v1)
Abstract: Existing pose estimation models perform poorly on wheelchair users because wheelchair users are underrepresented in training data. We present a data synthesis pipeline that addresses this gap in data collection and thereby improves pose estimation performance for wheelchair users. Our configurable pipeline generates synthetic data of wheelchair users from motion capture data and motion generation outputs, simulated in the Unity game engine. We validated the pipeline on a set of synthetic datasets, generated with varied backgrounds, models, and postures, through a human evaluation of perceived realism, an assessment of dataset diversity, and an AI performance evaluation. Human evaluators perceived our generated datasets as realistic, the datasets were more diverse than existing image datasets, and fine-tuning existing pose estimation models on them improved person detection and pose estimation performance. Through this work, we hope to establish a foothold for future efforts to make AI more inclusive in a data-centric and human-centric manner using the data synthesis techniques demonstrated here. Finally, so that future work can build on ours, we open-source all code from this research and provide the fully configurable Unity environment used to generate our datasets. For any models we cannot share because of redistribution and licensing policies, we provide detailed instructions for sourcing and replacing them.
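The idea of a configurable pipeline that varies backgrounds, human models, and postures can be illustrated with a small domain-randomization sketch. This is a hypothetical, language-agnostic illustration in Python, not the authors' Unity implementation or the Unity Perception API: `SceneConfig`, `sample_scenes`, and every asset name below are invented placeholders. Each generated frame draws one option independently from each configurable axis, and a fixed seed keeps the dataset reproducible.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SceneConfig:
    """Hypothetical per-frame configuration: what to render for one synthetic image."""
    background: str
    human_model: str
    posture: str  # e.g. a motion capture clip or a motion generation output


def sample_scenes(backgrounds, human_models, postures, n, seed=0):
    """Draw n scene configurations, choosing each axis independently (domain randomization).

    Using a dedicated, seeded random.Random instance makes the sampled dataset reproducible.
    """
    rng = random.Random(seed)
    return [
        SceneConfig(
            background=rng.choice(backgrounds),
            human_model=rng.choice(human_models),
            posture=rng.choice(postures),
        )
        for _ in range(n)
    ]


if __name__ == "__main__":
    # All asset names are illustrative placeholders.
    scenes = sample_scenes(
        backgrounds=["indoor_hdri_01", "street_hdri_02"],
        human_models=["model_a", "model_b", "model_c"],
        postures=["mocap_push_forward", "generated_reach_up"],
        n=5,
        seed=42,
    )
    for scene in scenes:
        print(scene)
```

Keeping the axes as separate lists means that a dataset variant (for example, one with only new backgrounds) can be produced by changing a single list while holding the others fixed.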