MegaLoc: One Retrieval to Place Them All
Abstract: Retrieving images from the same location as a given query is an important component of multiple computer vision tasks, like Visual Place Recognition, Landmark Retrieval, Visual Localization, 3D reconstruction, and SLAM. However, existing solutions are built to specifically work for one of these tasks, and are known to fail when the requirements slightly change or when they meet out-of-distribution data. In this paper we combine a variety of existing methods, training techniques, and datasets to train a retrieval model, called MegaLoc, that is performant on multiple tasks. We find that MegaLoc (1) achieves state of the art on a large number of Visual Place Recognition datasets, (2) achieves impressive results on common Landmark Retrieval datasets, and (3) sets a new state of the art for Visual Localization on the LaMAR datasets, where we only changed the retrieval method to the existing localization pipeline. The code for MegaLoc is available at https://github.com/gmberton/MegaLoc
Explain it Like I'm 14
MegaLoc: One Retrieval to Place Them All — A Simple Explanation
What this paper is about (brief overview)
The paper introduces MegaLoc, a computer program that can find photos taken in the same place as a given picture. What’s special is that it works well across different tasks and situations—whether the photos are indoors or outdoors, taken years apart, or from very different viewpoints. Instead of building separate tools for each task, the authors train one model that performs strongly on many of them.
What the researchers wanted to find out (key objectives)
In simple terms, they asked:
- Can we train one model that recognizes the same place in photos for many different uses (like navigation, mapping, and landmark search)?
- Can this one model be as good as, or better than, the best task‑specific tools?
- Will it stay accurate even when the photos look very different (time of day, seasons, camera angles, or indoor vs. outdoor)?
How they did it (methods in everyday language)
Think of each photo as getting a unique “fingerprint” made of numbers. If two photos are from the same place, their fingerprints should be very similar; if not, they should be different. MegaLoc learns how to make these fingerprints.
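To make the fingerprint idea concrete, here is a minimal sketch of how retrieval could work once every photo has a fingerprint (descriptor). The three-number descriptors, the photo names, and the function names are illustrative assumptions; MegaLoc's real descriptors are much longer and are produced by a neural network.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def retrieve(query, database, top_k=2):
    """Return the top_k database names ranked by cosine similarity to the query."""
    q = l2_normalize(query)
    scored = []
    for name, desc in database.items():
        d = l2_normalize(desc)
        similarity = sum(a * b for a, b in zip(q, d))
        scored.append((similarity, name))
    scored.sort(reverse=True)  # most similar first
    return [name for _, name in scored[:top_k]]

# Hypothetical toy fingerprints; real ones would come from the model.
database = {
    "bridge_day": [0.9, 0.1, 0.0],
    "bridge_night": [0.8, 0.2, 0.1],
    "museum_hall": [0.0, 0.2, 0.9],
}
query = [1.0, 0.1, 0.05]
print(retrieve(query, database))
```

In this toy setup the two bridge photos rank above the museum photo because their fingerprints point in nearly the same direction as the query's.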
To teach MegaLoc, the authors:
- Collected training photos from many sources so the model sees lots of variety:
  - SF‑XL (huge San Francisco street images across years)
  - GSV‑Cities (Google Street View photos from many cities)
  - MSLS (street photos taken in sequences over time)
  - MegaScenes (3D reconstructions from internet photos of landmarks)
  - ScanNet (indoor rooms and buildings)
- Built training mini‑groups (called “quadruplets”) of 4 images from the same place to show the model “these go together,” and used other images as “look‑alikes but different places” to make the task challenging.
- Used a “push‑pull” learning rule (multi‑similarity loss): it pulls fingerprints of same‑place photos closer and pushes different‑place photos apart—like organizing a giant photo album where each place forms its own tight cluster.
- Used a strong vision backbone (DINOv2, a modern Vision Transformer) and a smart “feature combiner” (SALAD) that turns image details into a compact, powerful fingerprint.
- Trained efficiently by computing the gradients for one sub-batch of images at a time and freeing that memory before moving on to the next, so training could run on GPUs without using huge amounts of memory.
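The "push‑pull" rule in the list above can be sketched in a few lines. This is a simplified version of the multi-similarity loss in the form it is commonly written, without the pair-mining step; the hyperparameter values (`alpha`, `beta`, `base`) and the toy similarity matrices are assumptions for illustration, not the settings used to train MegaLoc.

```python
import math

def multi_similarity_loss(sim, labels, alpha=2.0, beta=50.0, base=0.5):
    """Simplified multi-similarity loss (no pair mining).

    sim:    square matrix (list of lists) of pairwise cosine similarities
    labels: place ID per image; equal IDs mean the same place
    alpha, beta, base: illustrative hyperparameters (assumed values)
    """
    n = len(labels)
    total = 0.0
    for i in range(n):
        pos = [sim[i][k] for k in range(n) if k != i and labels[k] == labels[i]]
        neg = [sim[i][k] for k in range(n) if labels[k] != labels[i]]
        # Pull term: grows when same-place pairs have low similarity.
        pull = math.log(1.0 + sum(math.exp(-alpha * (s - base)) for s in pos)) / alpha
        # Push term: grows when different-place pairs have high similarity.
        push = math.log(1.0 + sum(math.exp(beta * (s - base)) for s in neg)) / beta
        total += pull + push
    return total / n

labels = [0, 0, 1, 1]  # two places, two photos each
# Well-organized fingerprints: same-place pairs similar, others dissimilar.
good = [[1.0, 0.9, 0.1, 0.1], [0.9, 1.0, 0.1, 0.1],
        [0.1, 0.1, 1.0, 0.9], [0.1, 0.1, 0.9, 1.0]]
# Confused fingerprints: different places look more alike than the same place.
bad = [[1.0, 0.3, 0.8, 0.8], [0.3, 1.0, 0.8, 0.8],
       [0.8, 0.8, 1.0, 0.3], [0.8, 0.8, 0.3, 1.0]]
print(multi_similarity_loss(good, labels), multi_similarity_loss(bad, labels))
```

The confused matrix produces the larger loss, which is the signal that drives training toward tight, well-separated clusters.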
What they found (main results and why they matter)
MegaLoc performed extremely well across three major types of tasks:
- Visual Place Recognition (VPR): Finding photos taken within about 25 meters of the query photo.
  - MegaLoc reached state‑of‑the‑art results on many standard VPR datasets (including hard cases like night, occlusions, and indoor scenes).
- Landmark Retrieval: Finding photos of the same landmark (like a church or monument), even if taken from far away or different sides.
  - MegaLoc did exceptionally well on famous landmark tests (Revisited Oxford and Paris), beating previous methods by a large margin.
- Visual Localization (part of AR and robotics): Finding the best database images to help precisely locate the camera in 3D.
  - On the LaMAR benchmark (with both phone and HoloLens images, indoors and outdoors), simply swapping the retrieval step with MegaLoc set new state‑of‑the‑art results, without changing the rest of the pipeline.
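For the VPR case, results like these are usually reported as Recall@K: the percentage of queries for which at least one of the top K retrieved images lies within the distance threshold. A minimal sketch follows, assuming flat (x, y) positions in meters and hypothetical toy data; real benchmarks derive positions from GPS or camera poses.

```python
import math

def recall_at_k(predictions, query_positions, db_positions, k=1, threshold_m=25.0):
    """Percentage of queries with a correct match among the top-k results.

    predictions:     per query, a ranked list of database indices
    query_positions: per query, an (x, y) position in meters
    db_positions:    per database image, an (x, y) position in meters
    """
    hits = 0
    for ranked, q_pos in zip(predictions, query_positions):
        for db_idx in ranked[:k]:
            if math.dist(q_pos, db_positions[db_idx]) <= threshold_m:
                hits += 1
                break  # one correct match is enough for this query
    return 100.0 * hits / len(predictions)

# Hypothetical toy data: three database images along a street, two queries.
db_positions = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0)]
query_positions = [(10.0, 0.0), (110.0, 0.0)]
predictions = [[0, 1, 2], [2, 1, 0]]  # ranked database indices per query

print(recall_at_k(predictions, query_positions, db_positions, k=1))  # 50.0
print(recall_at_k(predictions, query_positions, db_positions, k=2))  # 100.0
```

The second query only finds its correct match at rank 2, so it counts toward Recall@2 but not Recall@1.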
Why this matters:
- One model that works across many scenarios means simpler systems and fewer failures when conditions change.
- It helps apps like AR navigation, robot mapping (SLAM), and 3D reconstruction by giving them better starting matches between images.
What this could change (implications and impact)
- Fewer specialized tools: Teams won’t need separate retrieval models for different tasks—MegaLoc can often do them all.
- More reliable real‑world performance: Because it was trained on very diverse data, MegaLoc is more robust when photos are taken from unusual angles, at different times, or indoors.
- Better foundations for 3D mapping and localization: Stronger image retrieval improves the whole pipeline, leading to faster and more accurate location estimates.
A quick note on limitations
- In sequences where all photos face forward along roads (like parts of MSLS), a specialized method (CliqueMining) can be slightly better.
- In unusual natural scenes (forests, caves), another method (AnyLoc) may perform better.
- For tiny devices where memory and speed are critical, lighter models (like small ResNets) might be preferable.
Overall, MegaLoc shows that with the right mix of data and training tricks, one “place‑finding” model can perform well across many jobs, making location‑based apps and systems simpler, stronger, and more dependable.
Knowledge Gaps
Knowledge Gaps, Limitations, and Open Questions
The paper leaves the following gaps and open questions that future work could concretely address:
- Lack of ablations on training design choices: no systematic study of how each dataset, sampler (EigenPlaces, CliqueMining, overlap constraints), and augmentation contribute to performance across tasks.
- No analysis of loss composition: the six multi-similarity losses are simply summed with equal weight; the effect of alternative weightings, curriculum learning, or adaptive domain weighting remains unexplored.
- Sub-batch isolation across datasets: the loss is computed per sub-batch/dataset, meaning no cross-dataset positives/negatives are ever contrasted; the impact of mixing datasets within batches or explicit cross-domain hard-negative mining is unknown.
- Descriptor dimensionality choice unexamined: the 8448-D projection is fixed without reporting trade-offs versus accuracy, memory, and latency; PCA/whitening, product quantization, or learned compression are not evaluated.
- Aggregator/backbone design space unexplored: only DINOv2-base + SALAD (fixed hyperparameters) is used; there is no ablation on number of clusters, token dims, global token usage, MLP size, or alternative backbones (e.g., DINOv2-L/G, CLIP-ViT) and their cost–benefit.
- Training schedule sensitivity unreported: the effect of number of iterations (40k), optimizer/scheduler choices, learning-rate schedules, and warm-up on convergence and generalization is not studied.
- Orientation bias and forward-facing sequences: MegaLoc underperforms on MSLS (mostly forward-facing); it is unclear how to make the model robust to camera orientation biases (e.g., orientation-aware sampling/augmentations or orientation-conditioned descriptors).
- Limited domain coverage for natural environments: forests, caves, underwater, subterranean, and other non-urban/natural scenes (where AnyLoc excels) are not in training or evaluation; strategies to generalize there are open.
- Camera/intrinsics modality gaps: robustness to fisheye lenses, very wide FOV, mobile ultra-wide cameras, rolling shutter, and lens distortions is not assessed; no experiments on non-RGB modalities (thermal, NIR, multispectral).
- Resource footprint and deployment constraints: the 228M-parameter model and 8448-D descriptors incur heavy training (60 GB VRAM with custom backward scheduling) and retrieval costs; distillation, pruning, low-rank adapters, quantization-aware training, or lightweight backbones for edge deployment are not explored.
- Large-scale indexing scalability: approximate nearest neighbor choices (e.g., IVF-PQ, HNSW), index quantization, and recall–latency trade-offs for multi-million to billion-scale databases are not analyzed.
- End-to-end re-ranking/geometric verification: although the paper notes many failure cases may be fixable via re-ranking (matchers, majority voting), it does not integrate or quantify end-to-end gains, compute overheads, or failure transitions with common verifiers (e.g., SuperGlue, RANSAC variants).
- Label noise robustness: incorrect GPS labels in GSV/MSLS are observed but not addressed; noise-robust training (loss correction, confidence-weighted sampling, co-teaching) and uncertainty modeling remain open.
- Evaluation metrics beyond fixed thresholds: VPR commonly uses a fixed 25-meter threshold to define positives; the paper argues for retrieval beyond nearest-coverage but does not propose or test continuous geodesic/angle metrics, variable thresholds, or coverage-aware scoring that reflect real deployments.
- Dataset and geographic bias: training relies heavily on SF-XL (San Francisco) and specific sources; generalization across underrepresented regions, architectural styles, and socio-environmental conditions (weather, seasons, cultural landmarks) is not characterized.
- Indoor granularity and cross-floor ambiguities: while ScanNet is included, there is no targeted analysis of cross-floor aliasing, room-level disambiguation, or building-scale transitions in complex indoor spaces.
- Sequential/temporal cues unused: the model operates on single images; leveraging sequence information (temporal aggregation, pose-graph-aware retrieval) for SLAM/loop closure is left open.
- SLAM and reconstruction outcomes: downstream effects on SLAM quality (ATE, drift, loop-closure precision/recall, map completeness) and 3D reconstruction metrics are not evaluated despite claims of broad utility.
- Visual localization breadth: evaluation is limited to LaMAR; generalization to standard VL benchmarks (e.g., Aachen Day-Night, InLoc, Cambridge Landmarks) and diverse capture conditions is untested.
- Data augmentation specificity: only RandAugment is used; the benefits of task/domain-specific augmentations (viewpoint/random homographies, photometric night/rain/fog, motion blur, sensor noise) are not assessed.
- Handling sparse or biased database coverage: proposed remedies (e.g., multi-directional capture) are discussed qualitatively; methods for synthetic view generation, viewpoint completion, or learned view synthesis for retrieval are not investigated.
- Sensor and prior fusion: integration of IMU, coarse GPS, map priors, semantics, or depth to guide retrieval is not considered; multi-modal fusion remains an open direction.
- Reproducibility and variance: results are reported without run-to-run variance, confidence intervals, or sensitivity to random seeds; robustness of conclusions to stochasticity is unclear.
- Failure-case taxonomy quantification: the four categories are illustrated but not quantified; automatic detection/mitigation strategies and their prevalence across datasets remain unmeasured.
- Ethical, legal, and privacy considerations: training on sources like GSV/Mapillary raises PII and licensing questions; the paper does not discuss compliance, data filtering, or privacy-preserving training.
- Unified benchmarking: while advocating one model for LR, VPR, and VL, the paper does not release a consolidated benchmark or protocol that jointly evaluates cross-task performance with standardized metrics and compute budgets.
Practical Applications
Immediate Applications
Below are specific, deployable use cases enabled by MegaLoc’s unified image-retrieval model and training strategy. Each item lists sectors, potential tools/workflows/products, and key assumptions/dependencies.
- Sector: Software/3D Vision/Mapping
- Use case: Drop-in upgrade for visual localization and 3D reconstruction pipelines (e.g., HLoc, COLMAP, GLOMAP, InLoc) to improve retrieval of candidate images across small and large scenes, indoor and outdoor.
- Tools/workflows: Replace retrieval module with MegaLoc; pair with local feature matchers (e.g., SuperGlue/LoFTR/LightGlue) for verification; index embeddings with ANN libraries (e.g., FAISS with PQ/IVF).
- Assumptions/dependencies: Requires a geo-referenced image database with sufficient coverage and view diversity; memory footprint for 8.4k-d embeddings at city scale demands ANN + compression; licensing/usage rights for imagery.
- Sector: Robotics/Autonomous Systems (ground robots, drones, warehouse AMRs)
- Use case: Robust loop-closure detection and re-localization under GPS loss, viewpoint changes, and indoor/outdoor transitions.
- Tools/workflows: Swap retrieval in SLAM stacks; integrate into ROS-based navigation; combine with temporal re-ranking or sequence models (e.g., JIST-style) for further gains.
- Assumptions/dependencies: Adequate prior mapping imagery; compute constraints on embedded platforms (consider server-offload or distillation to lighter backbones for edge).
- Sector: AR/VR/Spatial Computing
- Use case: Persistent AR content anchoring and fast device re-localization in venues (malls, museums, campuses, stadiums) and mixed indoor–outdoor spaces.
- Tools/workflows: Build an “AR cloud” index of place embeddings; on-device query, server-side retrieval, then local pose refinement; use majority voting/re-ranking for ambiguity resolution.
- Assumptions/dependencies: Pre-built and maintained visual maps; data freshness with seasonal/time-of-day changes; privacy and consent for visual indexing.
- Sector: Transportation/Navigation
- Use case: Camera-only navigation fallback for driver assistance and micro-mobility; geo-localization in urban canyons where GPS is unreliable.
- Tools/workflows: Integrate MegaLoc into navigation stacks for candidate retrieval; fuse with IMU and map priors; verify with local matching for precise pose.
- Assumptions/dependencies: City-scale image databases with directional coverage; legal and safety validation for on-road deployment.
- Sector: Infrastructure/Construction/Utilities
- Use case: Photo-based progress monitoring and asset inspection by retrieving historical views of the same site (bridges, towers, pipelines).
- Tools/workflows: “Find prior views by place” portal; timeline visualization; change detection after retrieval.
- Assumptions/dependencies: Archived imagery over time; consistent metadata; occlusions and night imagery partially mitigated but not fully solved.
- Sector: Security/Public Safety/Disaster Response
- Use case: Image/video geo-localization for incident mapping and rapid situational awareness when GPS metadata is missing.
- Tools/workflows: OSINT toolchain to index public imagery and retrieve likely locations; human-in-the-loop verification with feature matching.
- Assumptions/dependencies: Availability/legality of indexing public images; risk of mis-localization in visually repetitive areas; ethical and privacy safeguards.
- Sector: Cultural Heritage/Tourism/Media
- Use case: Landmark retrieval for content organization, tour generation, and media asset search (“find all images of this landmark”).
- Tools/workflows: Photo library de-duplication and curation; recommendation engines for points of interest; content verification for UGC.
- Assumptions/dependencies: Database breadth (different sides/facades); time-of-day/occlusion variance partly handled but benefits from re-ranking.
- Sector: E-commerce/Insurance/Real Estate
- Use case: Location claim verification for listings and claims by matching to known place imagery; fraud reduction.
- Tools/workflows: Backend verification API; manual review UI with top-k retrieved matches; escalate to local matching-based verification.
- Assumptions/dependencies: Sufficient coverage of relevant locales; consider adversarial manipulations; maintain explainability workflows for auditors.
- Sector: Academia/Research Engineering
- Use case: Unified benchmarking across VPR, Landmark Retrieval, and Visual Localization; teaching and rapid prototyping for place-centric pipelines.
- Tools/workflows: Open-source MegaLoc; memory-efficient training technique (independent backward calls per sub-batch); multi-dataset samplers (quadruplets with overlap constraints).
- Assumptions/dependencies: Access to datasets (GSV-Cities, MSLS, SF-XL, MegaScenes, ScanNet); large-scale training compute for reproduction (though inference is straightforward).
- Sector: Consumer Apps/Daily Life
- Use case: “Where was this photo taken?” on-device/offline geo-hinting; private-by-design local retrieval against a downloaded city pack.
- Tools/workflows: Compressed embedding packs for cities; optional server-side re-ranking for high precision.
- Assumptions/dependencies: Storage and bandwidth for packs; privacy-preserving defaults; compute adaptation for mobile.
- Sector: Open Mapping/Community GIS
- Use case: Better deduplication, indexing, and coverage analysis for crowd-sourced street-level imagery (e.g., OpenStreetMap communities).
- Tools/workflows: Coverage heatmaps based on retrieval misses; suggest capture directions (forward/sideways) to close gaps.
- Assumptions/dependencies: Community policies and data-sharing; fair compute access for volunteers.
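Several of the workflows above pair retrieval with a verification or re-ranking step. A minimal sketch of that pattern is given here, assuming a hypothetical `verify` callable that scores one candidate (for example, by counting geometrically consistent local matches). The image IDs and inlier counts are made up for illustration.

```python
def rerank(candidates, verify, top_k=3):
    """Re-order the first top_k retrieval candidates by a verification score.

    candidates: database image IDs, already ranked by descriptor similarity
    verify:     hypothetical callable returning a score for one candidate
    """
    head, tail = candidates[:top_k], candidates[top_k:]
    # sorted() is stable, so tied candidates keep their retrieval order.
    head = sorted(head, key=verify, reverse=True)
    return head + tail

# Made-up inlier counts standing in for a real local feature matcher.
inliers = {"img_17": 12, "img_03": 85, "img_42": 12, "img_08": 200}
retrieved = ["img_17", "img_03", "img_42", "img_08"]

print(rerank(retrieved, inliers.get, top_k=3))
```

Only the first `top_k` candidates are verified, which bounds the cost of the expensive matcher; `img_08` stays last here despite its high score because it was never checked.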
Long-Term Applications
These applications are promising but require further research, scaling, or engineering (e.g., model compression, broader domain training, policy frameworks).
- Sector: Edge/Embedded AI
- Use case: Real-time, on-device unified place retrieval for wearables, drones, and automotive-grade hardware.
- Needed advances: Distillation/quantization of DINOv2+SALAD; hardware-aware architectures; mixed-precision ANN indices on-device.
- Assumptions/dependencies: Robust performance under tight power/memory budgets; safety certification for automotive/aviation use.
- Sector: Natural Environments (Forests, Caves, Off-road)
- Use case: Reliable place recognition in visually repetitive, texture-poor settings.
- Needed advances: Domain-specific training (beyond MegaLoc’s current strengths); hybrid features (multispectral, LiDAR, event cameras); sequence modeling.
- Assumptions/dependencies: New datasets and continual learning to avoid catastrophic forgetting; weather/season generalization.
- Sector: Global AR Cloud and Digital Twins
- Use case: World-scale spatial indexing for persistent, shared AR and city digital twins.
- Needed advances: Massive-scale indexing with privacy-by-design; dynamic updates and drift handling; federated, jurisdiction-compliant data governance.
- Assumptions/dependencies: Stable funding and data partnerships; standards for interoperability and safety; robust re-localization across dense urban aliasing.
- Sector: Continual/Online Learning and Domain Adaptation
- Use case: Retrieval models that adapt to new cities, renovations, and long-term changes without full retraining.
- Needed advances: Incremental model updates; rehearsal-free learning; confidence estimation and automatic hard-negative mining at scale.
- Assumptions/dependencies: Reliable monitoring for performance regressions; human-in-the-loop safeguards for critical systems.
- Sector: Multimodal Place Recognition (Vision + IMU/LiDAR/GNSS/Audio)
- Use case: Robust cross-sensor retrieval and localization for autonomous systems and smartphones.
- Needed advances: Fusion architectures for retrieval; alignment losses across modalities; dataset curation for synchronized sensing.
- Assumptions/dependencies: Sensor synchronization; calibration pipelines; increased storage and compute for multimodal indices.
- Sector: Policy/Regulation and Ethical Tech
- Use case: Standards and guardrails for image-based geo-localization (consent, opt-out, retention limits, transparency).
- Needed advances: Policy frameworks balancing public safety with privacy; provenance tracking and explainability; red-teaming for mis-use.
- Assumptions/dependencies: Multistakeholder collaboration (industry, civil society, regulators); compliance automation and auditing.
- Sector: Planetary/Remote Sensing Extensions
- Use case: Visual localization for planetary rovers and aerial platforms; cross-domain retrieval (e.g., Earth-from-space to ground imagery).
- Needed advances: Training on extraterrestrial terrains and multi-altitude imagery; domain transfer and simulation-to-reality methods.
- Assumptions/dependencies: Specialized datasets and simulators; limited bandwidth and compute constraints for space systems.
- Sector: Automated Coverage Planning and Data Acquisition
- Use case: Use retrieval misses and failure cases to plan optimal capture routes (directions, times, viewpoints) for mapping fleets.
- Needed advances: Closed-loop systems coupling retrieval confidence with active planning; economic optimization for fleet operations.
- Assumptions/dependencies: Access to fleet telemetry; integration with routing/operations platforms.
- Sector: High-Assurance Verification (Insurance, Compliance, Journalism)
- Use case: End-to-end, auditable pipelines that verify image location claims at scale with calibrated confidence and human escalation.
- Needed advances: Standardized evaluation and reporting; adversarial robustness; provenance (C2PA) integration.
- Assumptions/dependencies: Legal frameworks and acceptance of machine-assisted verification; continuous benchmarking across domains.
Cross-cutting Assumptions and Dependencies
The following factors influence feasibility across many use cases:
- Data coverage and quality: Balanced indoor/outdoor, multiple view directions, time-of-day/seasonal variety; label accuracy (training sets may contain GPS errors).
- Compute and scalability: City-scale indices require ANN and compression; naive kNN on float32 embeddings is memory-prohibitive at multi-million scale.
- Domain mismatch risks: Performance drops in forward-only sequences (MSLS-like) and unusual natural environments; may require domain-specific fine-tuning or sequence cues.
- Pipeline design: Best results when retrieval is followed by geometric verification/re-ranking; sequence or majority voting mitigates aliasing.
- Privacy, consent, and governance: Retrieval can infer location from images—policies and opt-outs are essential for consumer and public-sector deployments.
- Hardware constraints: The DINOv2-base + SALAD stack is not ideal for embedded; deployment may require distillation/quantization or server-offloaded inference.
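As a back-of-the-envelope check on the compute and scalability point, the sketch below estimates the size of a flat, uncompressed index of 8448-dimensional float32 descriptors. The database sizes and the 64-byte compressed code are illustrative assumptions, not figures from the paper; 41 million is borrowed from the size of SF-XL only as an example of scale.

```python
def flat_index_gb(num_images, dims=8448, bytes_per_value=4):
    """Gigabytes (10**9 bytes) needed to store raw descriptors uncompressed."""
    return num_images * dims * bytes_per_value / 1e9

def compressed_index_gb(num_images, code_bytes=64):
    """Gigabytes needed if each descriptor is replaced by a fixed-size code.
    The 64-byte code is a hypothetical product-quantization setting."""
    return num_images * code_bytes / 1e9

for n in (1_000_000, 41_000_000):
    print(f"{n:>10,} images: {flat_index_gb(n):8.1f} GB flat, "
          f"{compressed_index_gb(n):5.2f} GB compressed")
```

Under these assumptions, one million images already need about 33.8 GB as raw float32 (8448 values × 4 bytes each), which is why ANN indices with compression come up repeatedly in the use cases above.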
Glossary
- 3D reconstruction: The process of building 3D models or scene geometry from multiple images. "Imagine you are doing 3D reconstruction, where image retrieval is a fundamental component"
- AdamW: An optimization algorithm that decouples weight decay from the gradient-based updates in Adam. "and AdamW as optimizer."
- aggregation layer: A network component that aggregates local or token-level features into a single global descriptor for retrieval. "followed by a SALAD aggregation layer"
- bag-of-words: A vector quantization approach that represents images by counts of visual word occurrences, commonly used in classical retrieval. "like RootSIFT with bag-of-words"
- backward(): The automatic differentiation call in PyTorch that computes gradients and frees the computation graph. "calling backward() in PyTorch not only computes the gradient (which is added to any existing gradient), but also frees the computational graph (hence freeing memory)."
- COLMAP: A widely used structure-from-motion and multi-view stereo pipeline for 3D reconstruction. "3D vision pipelines like COLMAP"
- DINO-v2-base: A self-supervised Vision Transformer backbone from the DINOv2 family, used for feature extraction. "consists of a DINO-v2-base backbone"
- EigenPlaces: A training/sampling strategy aimed at improving viewpoint robustness for visual place recognition. "we use the sampling technique presented in EigenPlaces"
- GLOMAP: A global structure-from-motion pipeline for 3D reconstruction, often used as an alternative to COLMAP's incremental approach. "and GLOMAP keep using outdated retrieval methods"
- Google Street View Cities (GSV-Cities): A geolocated dataset of street-view images organized by places for VPR training/evaluation. "Google Street View Cities (GSV-Cities) is a dataset of 530k images"
- hard negatives: Non-matching samples that are visually similar to the query, used to make training more discriminative. "places (i.e. hard negatives)"
- Hierarchical Localization: A two-stage visual localization pipeline combining retrieval and local feature matching. "Hierarchical Localization"
- InLoc: A retrieval-based visual localization method for indoor environments. "and InLoc"
- kNN: k-nearest neighbors search used for large-scale similarity retrieval. "for a float32-based kNN"
- L2 normalization: Normalizing a vector to unit Euclidean norm to standardize descriptor magnitude. "and an L2 normalization."
- LaMAR: A benchmark suite for large-scale augmented reality visual localization across devices and environments. "Visual Localization on the LaMAR datasets"
- Landmark Retrieval (LR): The task of retrieving images depicting the same landmark, regardless of camera proximity. "Landmark Retrieval (LR) folks will tell you"
- Mapillary Street-Level Sequences (MSLS): A large-scale dataset of street-level images organized in temporal sequences across many cities. "Mapillary Street-Level Sequences (MSLS) is a dataset of 1.6M images split in contiguous sequences, across 30 different cities over 9 years."
- MegaScenes: A large collection of community photo-based 3D reconstructions used for training robust retrieval models. "MegaScenes is a collection of 100k 3D structure-from-motion reconstructions"
- multi-similarity loss: A deep metric learning loss that combines multiple similarity measures to better separate positives and negatives. "use a multi-similarity loss computed over each sub-batch."
- NetVLAD: A CNN-based aggregation module that produces VLAD-like global descriptors for place recognition. "and NetVLAD"
- out-of-distribution data: Inputs whose distribution differs from the training data, often causing model performance degradation. "or when they meet out-of-distribution data."
- RandAugment: An automated data augmentation policy that applies randomized transformations during training. "We use RandAugment for data augmentation"
- Recall@1: An evaluation metric indicating the percentage of queries whose correct match is ranked at position 1. "Recall@1 and Recall@10 on multiple VPR datasets."
- re-ranking: A post-processing step that reorders initial retrieval results using additional cues or matchers to improve accuracy. "e.g. re-ranking with image matchers"
- RootSIFT: A variant of SIFT descriptors that applies square-root normalization to improve matching performance. "like RootSIFT with bag-of-words"
- SALAD: A state-of-the-art learnable aggregation layer for VPR that clusters tokens and builds powerful global descriptors. "followed by a SALAD aggregation layer"
- San Francisco eXtra Large (SF-XL): A massive geolocated street-view dataset covering San Francisco across time for VPR research. "San Francisco eXtra Large (SF-XL) is a dataset of 41M images"
- ScanNet: A dataset of RGB-D scans from indoor environments used for training and evaluating localization and recognition models. "ScanNet is a dataset of 2.5M views from 1500 scans from 707 indoor places."
- SLAM: Simultaneous Localization and Mapping; the joint task of building a map while tracking the camera pose. "and SLAM."
- state of the art: The best reported performance or method at the time of writing. "achieves state of the art on a large number of Visual Place Recognition datasets,"
- structure-from-motion: A technique to reconstruct 3D structure and camera poses from multiple overlapping images. "3D structure-from-motion reconstructions"
- visual aliasing: The phenomenon where distinct places look visually similar, confusing recognition and localization. "which comprise various challenges, including plenty of visual aliasing"
- visual overlap: The presence of shared scene content between images indicating overlapping fields of view. "each of these four images should have visual overlap with each other"
- Visual Localization (VL): Estimating the precise 6-DoF camera pose of a query image in a known environment. "Visual Localization (VL) / 3D Vision researchers"
- Visual Place Recognition (VPR): Retrieving images of the same place (often within a set distance threshold) as a given query. "Visual Place Recognition (VPR) people set a camera pose distance of 25 meters"
- VRAM: Video RAM on GPUs used to store model parameters, features, and computation graphs during training/inference. "This simple technique reduces the VRAM requirement of training MegaLoc from (roughly) 300GB to 60GB."
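The backward() and VRAM entries describe the same memory-saving idea: because each backward call adds to the gradient already stored, gradients can be accumulated one sub-batch at a time instead of differentiating one large summed loss. The sketch below illustrates only the arithmetic behind that, under loud assumptions: a toy one-weight model and a hand-written gradient stand in for a real network and autograd, and all function names are hypothetical.

```python
def sub_batch_grad(w, batch):
    """Analytic gradient of sum((w * x - y) ** 2) over one sub-batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch)

def accumulated_grad(w, sub_batches):
    """Process sub-batches one at a time, adding each gradient to a running
    total, so only one sub-batch's intermediates need to exist at once."""
    grad = 0.0
    for batch in sub_batches:
        grad += sub_batch_grad(w, batch)  # plays the role of one backward() call
    return grad

def joint_grad(w, sub_batches):
    """Gradient of the summed loss with all sub-batches processed together."""
    merged = [pair for batch in sub_batches for pair in batch]
    return sub_batch_grad(w, merged)

# Hypothetical (x, y) training pairs split into three sub-batches.
sub_batches = [
    [(1.0, 2.0), (2.0, 3.0)],
    [(0.5, 1.0)],
    [(3.0, 1.0), (1.5, 0.0)],
]
w = 0.7
print(accumulated_grad(w, sub_batches), joint_grad(w, sub_batches))
```

Both routes give the same gradient up to floating-point rounding, so the optimizer step is unchanged; the saving comes from never holding every sub-batch's computation graph in memory at once.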