ELEV-VISION-SAM: Integrated Vision Language and Foundation Model for Automated Estimation of Building Lowest Floor Elevation (2404.12606v1)
Abstract: Street view imagery, aided by advancements in image quality and accessibility, has emerged as a valuable resource for urban analytics research. Recent studies have explored its potential for estimating lowest floor elevation (LFE), offering a scalable alternative to traditional on-site measurements, which is crucial for assessing properties' flood risk and damage extent. While earlier methods rely on object detection, the introduction of image segmentation has broadened the utility of street view images for LFE estimation, although challenges remain in segmentation quality and in the ability to distinguish front doors from other doors. To address these challenges, this study integrates the Segment Anything model, a segmentation foundation model, with vision language models to conduct text-prompt image segmentation on street view images for LFE estimation. By evaluating various vision language models, integration methods, and text prompts, we identify the most suitable model for street view image analytics and LFE estimation tasks, thereby improving the availability of the current segmentation-based LFE estimation model from 33% to 56% of properties. Remarkably, the proposed method extends the availability of LFE estimation to almost all properties in which the front door is visible in the street view image. The findings also present the first baseline and comparison of various vision models for street view image-based LFE estimation. The model and findings not only contribute to advancing street view image segmentation for urban analytics but also provide a novel approach to image segmentation for other civil engineering and infrastructure analytics tasks.
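The availability figures quoted in the abstract can be read as the share of properties for which the pipeline produces any LFE estimate at all. The following is a minimal sketch of that reading, not the authors' evaluation code: the function name `availability`, the convention of using `None` for a property with no estimate, and the toy values are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def availability(estimates: Sequence[Optional[float]]) -> float:
    """Return the fraction of properties that have an LFE estimate.

    Assumption: each entry is one property, and None marks a property
    for which no estimate could be produced (for example, because no
    front door was segmented in its street view image).
    """
    if not estimates:
        raise ValueError("at least one property is required")
    available = sum(1 for value in estimates if value is not None)
    return available / len(estimates)


# Hypothetical toy data (made-up elevations in metres), not results
# from the paper: nine properties under two imaginary model variants.
variant_a = [0.4, None, None, 0.9, None, None, None, 0.2, None]
variant_b = [0.4, 0.6, None, 0.9, None, 0.3, None, 0.2, None]

print(f"variant A availability: {availability(variant_a):.0%}")
print(f"variant B availability: {availability(variant_b):.0%}")
```

Under this reading, availability only counts whether an estimate exists; it says nothing about how accurate the estimate is, which would need a separate error metric.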