
Cross-attention Spatio-temporal Context Transformer for Semantic Segmentation of Historical Maps (2310.12616v1)

Published 19 Oct 2023 in cs.CV and cs.AI

Abstract: Historical maps provide useful spatio-temporal information on the Earth's surface before modern earth observation techniques came into being. To extract information from maps, neural networks, which have gained wide popularity in recent years, have replaced hand-crafted map processing methods and tedious manual labor. However, aleatoric uncertainty, also known as data-dependent uncertainty, challenges the model to make correct predictions. It is inherent in the drawing, scanning, and fading defects of the original map sheets, and in the inadequate context that results when maps are cropped into small tiles to fit the memory limits of the training process. As aleatoric uncertainty cannot be reduced even when more training data are collected, we argue that complementary spatio-temporal contexts can be helpful. To achieve this, we propose a U-Net-based network that fuses spatio-temporal features with cross-attention transformers (U-SpaTem), aggregating information over a larger spatial range as well as through a temporal sequence of images. Our model achieves better performance than other state-of-the-art models that use either temporal or spatial contexts. Compared with pure vision transformers, our model is more lightweight and effective. To the best of our knowledge, leveraging both spatial and temporal contexts has rarely been explored in segmentation tasks. Even though our application is the segmentation of historical maps, we believe that the method can be transferred to other fields with similar problems, such as temporal sequences of satellite images. Our code is freely accessible at https://github.com/chenyizi086/wu.2023.sigspatial.git.
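
The fusion idea in the abstract can be illustrated with a generic single-head cross-attention step, in which features of the tile being segmented act as queries and features from a wider spatial area or from other time steps act as keys and values. This is only a minimal NumPy sketch under stated assumptions, not the U-SpaTem implementation: the function name `cross_attention`, the token counts, the feature size, the random projection matrices, and the residual addition are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(target, context, w_q, w_k, w_v):
    """Generic scaled dot-product cross-attention (illustrative only).

    target:  (n_target, d) tokens of the tile to segment (queries)
    context: (n_context, d) tokens from spatial or temporal context
    Returns the attended features and the attention weights.
    """
    q = target @ w_q                    # queries come from the target tile
    k = context @ w_k                   # keys come from the context
    v = context @ w_v                   # values come from the context
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1 over context tokens
    return weights @ v, weights


rng = np.random.default_rng(0)
d = 8                                       # hypothetical feature size
target_tokens = rng.normal(size=(16, d))    # e.g. a 4x4 feature map, flattened
context_tokens = rng.normal(size=(48, d))   # e.g. three context maps, flattened
w_q, w_k, w_v = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))

attended, weights = cross_attention(target_tokens, context_tokens, w_q, w_k, w_v)
fused = target_tokens + attended            # residual fusion (an assumption)
```

Each target token ends up as a weighted mixture of context tokens, which is how information from outside the cropped tile can reach the prediction for that tile.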
