Accelerated Event-Based Feature Detection and Compression for Surveillance Video Systems (2312.08213v2)
Abstract: The strong temporal consistency of surveillance video enables compelling compression performance with traditional methods, but downstream vision applications operate on decoded image frames with a high data rate. Since it is not straightforward for applications to extract information on temporal redundancy from the compressed video representations, we propose a novel system which conveys temporal redundancy within a sparse decompressed representation. We leverage a video representation framework called ADDER to transcode framed videos to sparse, asynchronous intensity samples. We introduce mechanisms for content adaptation, lossy compression, and asynchronous forms of classical vision algorithms. We evaluate our system on the VIRAT surveillance video dataset, and we show a median 43.7% speed improvement in FAST feature detection compared to OpenCV. We run the same algorithm as OpenCV, but only process pixels that receive new asynchronous events, rather than process every pixel in an image frame. Our work paves the way for upcoming neuromorphic sensors and is amenable to future applications with spiking neural networks.
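The event-driven detection idea in the abstract can be illustrated with a minimal sketch. It assumes each event has already been decoded into an `(x, y, intensity)` tuple, which simplifies ADDER's actual event format, and it uses the standard FAST segment test (a contiguous arc of 9 of the 16 circle pixels brighter or darker than the center by a threshold). The function names `is_fast_corner` and `detect_on_events` are hypothetical and are not taken from the paper or from OpenCV.

```python
import numpy as np

# Bresenham circle of radius 3 used by the FAST segment test, as (dx, dy) offsets.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img, x, y, threshold=20, arc=9):
    """Segment test at one pixel: is there a contiguous arc of `arc` circle
    pixels that are all brighter or all darker than the center by `threshold`?"""
    h, w = img.shape
    if not (3 <= x < w - 3 and 3 <= y < h - 3):
        return False  # the circle would fall outside the image
    center = int(img[y, x])
    ring = [int(img[y + dy, x + dx]) for dx, dy in CIRCLE]
    for sign in (1, -1):  # +1 tests "brighter", -1 tests "darker"
        flags = [sign * (v - center) > threshold for v in ring]
        run = 0
        for f in flags + flags[:arc - 1]:  # repeat the start to handle wrap-around
            run = run + 1 if f else 0
            if run >= arc:
                return True
    return False


def detect_on_events(img, events, threshold=20):
    """Update the running intensity image and test only pixels that received
    an event (assumption: events are pre-decoded (x, y, intensity) tuples)."""
    corners = set()
    for x, y, intensity in events:
        img[y, x] = intensity
        if is_fast_corner(img, x, y, threshold):
            corners.add((x, y))
    return corners
```

A frame-based detector would call the segment test for every pixel of every frame; here the number of tests scales with the number of events, which is small when most of a surveillance scene is static. This sketch re-tests only the updated pixel and does not revisit neighbors whose circle contains it, which is a simplification of this example.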