LoMa: Local Feature Matching Revisited
This presentation challenges the prevailing view that local feature matching has reached its limits in computer vision. It introduces LoMa, a family of scalable descriptor-matcher models that demonstrate how proper scaling in data and model size can revitalize the detector-based matching paradigm. Through the introduction of HardMatch—a rigorous benchmark of hand-annotated challenging pairs—and comprehensive empirical analysis, the talk reveals that local feature matchers can outperform dense and feed-forward methods when trained at scale, achieving state-of-the-art results across visual localization, Structure-from-Motion, and correspondence estimation tasks.
Script
Local feature matching has been declared obsolete by many researchers, supposedly surpassed by dense matchers and feed-forward pipelines. But what if the problem wasn't the paradigm itself, but how we've been scaling it?
The authors introduce LoMa, combining the DaD detector, DeDoDe descriptor, and an enhanced LightGlue matcher trained on 17 diverse datasets. The breakthrough is simple but powerful: local feature matchers improve predictably when you scale them properly, just like vision transformers do.
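To make the detector-based paradigm concrete: a detector proposes keypoints, a descriptor embeds each keypoint as a vector, and a matcher pairs descriptors across the two images. The sketch below shows the simplest possible matcher, mutual nearest neighbors over L2-normalized descriptors, as an illustrative stand-in for a learned matcher like LightGlue; the function name and shapes are our own, not from the paper.

```python
import numpy as np

def mutual_nn_match(desc_a: np.ndarray, desc_b: np.ndarray) -> np.ndarray:
    """Match two sets of L2-normalized descriptors by mutual nearest neighbor.

    desc_a: (N, D) descriptors from image A; desc_b: (M, D) from image B.
    Returns a (K, 2) array of index pairs (i, j) where A's keypoint i and
    B's keypoint j are each other's nearest neighbor under cosine similarity.
    """
    sim = desc_a @ desc_b.T            # (N, M) pairwise similarity matrix
    nn_ab = sim.argmax(axis=1)         # best B-match for each A-descriptor
    nn_ba = sim.argmax(axis=0)         # best A-match for each B-descriptor
    ids = np.arange(len(desc_a))
    mutual = nn_ba[nn_ab] == ids       # keep pairs that agree in both directions
    return np.stack([ids[mutual], nn_ab[mutual]], axis=1)

# Toy check: three orthogonal descriptors, permuted in image B.
desc_a = np.eye(3)
desc_b = np.eye(3)[[2, 0, 1]]          # B's row 1 is A's descriptor 0, etc.
matches = mutual_nn_match(desc_a, desc_b)
# → [[0, 1], [1, 2], [2, 0]]
```

A learned matcher replaces this hard argmax with attention layers and a differentiable assignment, which is where the transformer-depth scaling discussed later comes in.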
To prove this required a benchmark that wouldn't flatter existing methods.
HardMatch is deliberately adversarial: it curates hand-annotated scenarios where matching is expected to fail—ambiguous, visually similar scenes, decade-spanning temporal changes, sketch-to-photo pairs. On such cases, previous benchmarks had saturated, hiding the real limits of matching systems.
The numbers tell a clear story. On HardMatch, LoMa-G outperforms LightGlue by 18 points and beats the best dense matcher by over 6 points. On pose estimation and localization benchmarks, the gains are even more dramatic—up to 24 points on IMC2022.
The scaling experiments are revealing. More training pairs and larger transformer depth both yield consistent improvements, defying claims that local matchers had hit a ceiling. The remaining challenges are precisely the adversarial cases HardMatch was designed to expose.
Local feature matching isn't obsolete—it was just waiting to be scaled properly, and LoMa proves the paradigm still has room to grow. Visit EmergentMind.com to explore this paper further and create your own research video presentations.