Self-supervised Multi-task Learning Framework for Safety and Health-Oriented Connected Driving Environment Perception using Onboard Camera (2306.11822v1)
Abstract: Cutting-edge connected vehicle (CV) technologies have drawn much attention in recent years. Real-time traffic data captured by a CV can be shared with other CVs and data centers, opening new possibilities for solving diverse transportation problems. However, imagery captured by onboard cameras in a connected environment has not been sufficiently investigated, especially for safety and health-oriented visual perception. This paper proposes a bidirectional process of image synthesis and decomposition (BPISD) approach and, building on it, a novel self-supervised multi-task learning framework that simultaneously estimates depth map, atmospheric visibility, airlight, and PM2.5 mass concentration. Depth map and visibility are considered highly associated with traffic safety, while airlight and PM2.5 mass concentration are directly correlated with human health. Both the training and testing phases of the proposed system require only a single image as input. Owing to the innovative training pipeline, the depth estimation network can handle various levels of visibility and overcome inherent problems in current image-synthesis-based depth estimation, thereby generating high-quality depth maps even in low-visibility situations and, in turn, supporting accurate estimation of visibility, airlight, and PM2.5 mass concentration. Extensive experiments on synthesized data from KITTI and real-world data collected in Beijing demonstrate that the proposed method can (1) achieve depth estimation performance competitive with state-of-the-art methods when taking clear images as input; (2) predict vivid depth maps for images contaminated by various levels of haze; and (3) accurately estimate visibility, airlight, and PM2.5 mass concentration. Beneficial applications can be developed based on the presented work to improve traffic safety, air quality, and public health.
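The abstract does not spell out the image formation model behind the synthesis and decomposition directions. As a rough illustration only, the sketch below assumes the atmospheric scattering model that is standard in the dehazing literature, I = J·t + A·(1 − t) with t = exp(−β·d), together with the Koschmieder relation between visibility and the extinction coefficient β. All function names and the 5% contrast threshold are hypothetical choices made for this example, and the closed-form inversion stands in for what the paper learns with neural networks.

```python
import math


def extinction_from_visibility(visibility_m, contrast_threshold=0.05):
    """Koschmieder relation: beta = -ln(threshold) / V.

    The threshold is an assumption; 0.05 and 0.02 are both in common use.
    """
    return -math.log(contrast_threshold) / visibility_m


def transmission(depth_m, beta):
    """Fraction of scene radiance that survives a path of depth_m metres."""
    return math.exp(-beta * depth_m)


def synthesize_hazy_pixel(clear, depth_m, beta, airlight):
    """Synthesis direction: clear intensity and depth give a hazy intensity."""
    t = transmission(depth_m, beta)
    return clear * t + airlight * (1.0 - t)


def recover_depth(clear, hazy, beta, airlight):
    """Decomposition direction: invert the same model to obtain depth.

    Only defined when clear != airlight and the implied transmission is > 0.
    """
    t = (hazy - airlight) / (clear - airlight)
    return -math.log(t) / beta


# Example: a dark object 50 m away under 200 m visibility.
beta = extinction_from_visibility(200.0)
hazy = synthesize_hazy_pixel(clear=0.2, depth_m=50.0, beta=beta, airlight=0.9)
depth = recover_depth(clear=0.2, hazy=hazy, beta=beta, airlight=0.9)
```

Under these assumptions the round trip returns the original 50 m depth, which shows why depth, visibility (through β), and airlight are coupled quantities that can plausibly be estimated jointly from one hazy image.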