RaCo: Ranking and Covariance for Practical Learned Keypoints
Abstract: This paper introduces RaCo, a lightweight neural network designed to learn robust and versatile keypoints suitable for a variety of 3D computer vision tasks. The model integrates three key components: a repeatable keypoint detector, a differentiable ranker to maximize matches with a limited number of keypoints, and a covariance estimator to quantify spatial uncertainty in metric scale. Trained on perspective image crops only, RaCo operates without the need for covisible image pairs. It achieves strong rotational robustness through extensive data augmentation, even without the use of computationally expensive equivariant network architectures. The method is evaluated on several challenging datasets, where it demonstrates state-of-the-art performance in keypoint repeatability and two-view matching, particularly under large in-plane rotations. Ultimately, RaCo provides an effective and simple strategy to independently estimate keypoint ranking and metric covariance without additional labels, detecting interpretable and repeatable interest points. The code is available at https://github.com/cvg/RaCo.
Explain it Like I'm 14
What this paper is about
This paper presents RaCo, a small and fast computer program (a neural network) that finds “keypoints” in pictures. Keypoints are special spots in an image—like corners or distinctive patterns—that are easy to find again in other photos of the same scene. If you can find and match these spots across photos, you can do useful things like build 3D models, estimate camera motion, or help robots understand where they are.
RaCo focuses on three practical things:
- Detecting repeatable keypoints, even when the image is rotated.
- Ranking the keypoints so the best ones are kept when you can only afford a few.
- Estimating how uncertain each keypoint’s position is, so later steps can weigh reliable points more and noisy points less.
What questions the paper tries to answer
- How can we reliably detect the same keypoints across different photos, even if images are rotated, darker, or slightly warped?
- When we can only keep a small number of keypoints (to save time or memory), how do we pick the ones most likely to match across photos?
- Can we estimate how “sure” we are about each keypoint’s exact location in pixels, so that 3D reconstruction and pose estimation become more accurate?
How RaCo works, in simple terms
Training with image “warp-and-rotate” pairs
Instead of needing special labeled pairs of images, RaCo trains on single images by making two different “views” of the same picture:
- Imagine printing a photo on stretchy paper and then rotating, scaling, and slightly skewing it—that’s a “homography,” a kind of 2D warp.
- They also change brightness and contrast to mimic different lighting.
By knowing exactly how one warped view maps to the other, the system can check whether a keypoint detected in one view shows up in the right place in the other.
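As a rough illustration of the idea above, the sketch below builds a known 2D warp and checks whether a keypoint from one view reappears at the right place in the other. It is a minimal example under stated assumptions, not the paper's training pipeline: the function names (`make_homography`, `warp_point`, `is_repeated`), the 3-pixel tolerance, and the specific warp parameters are all hypothetical choices made for illustration.

```python
import math

def make_homography(angle_deg, scale=1.0, tx=0.0, ty=0.0, px=0.0, py=0.0):
    """Build a 3x3 homography: rotation + scale + translation, with an
    optional small perspective term (px, py) in the last row."""
    a = math.radians(angle_deg)
    c, s = scale * math.cos(a), scale * math.sin(a)
    return [[c, -s, tx],
            [s,  c, ty],
            [px, py, 1.0]]

def warp_point(H, x, y):
    """Map pixel (x, y) through H using homogeneous coordinates."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w

def is_repeated(kp_a, kps_b, H, radius=3.0):
    """True if keypoint kp_a from view A, warped into view B, lands within
    `radius` pixels of some keypoint detected in view B.
    The radius value is an assumption for this example."""
    u, v = warp_point(H, kp_a[0], kp_a[1])
    return any(math.hypot(u - xb, v - yb) <= radius for xb, yb in kps_b)

# Known mapping from view A to view B: rotate by 90 degrees, shift 10 px in x.
H_ab = make_homography(angle_deg=90.0, tx=10.0)

# A detection in A and two candidate detections in B.
print(is_repeated((1.0, 0.0), [(10.5, 1.5)], H_ab))   # close to the warped point
print(is_repeated((1.0, 0.0), [(50.0, 50.0)], H_ab))  # far from the warped point
```

Because the warp is generated rather than estimated, the check needs no labels: the "ground truth" is simply the matrix that produced the second view.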
1) The keypoint detector: finding repeatable “landmarks”
- Think of the detector as a heatmap that highlights likely “landmarks” (corners or distinctive textures).
- It picks the strongest local peaks (so points don’t cluster too closely).
- To handle rotations, the authors don’t use heavy, slow rotation-aware networks. Instead, they heavily rotate images during training. This simple trick teaches RaCo to be robust to any in-plane rotation, across the full 360° range.
Analogy: The detector is like a treasure map that marks “X” at places you can easily recognize again—no matter if the map has been rotated.
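The "strongest local peaks" step can be sketched as non-maximum suppression followed by top-K selection on a score map. This is a simplified, pure-Python illustration: the function name `nms_topk`, the toy heatmap, and the radius are assumptions for the example, and the paper's detector additionally refines positions to subpixel accuracy, which is not shown here.

```python
def nms_topk(heatmap, radius=1, k=2):
    """Keep pixels that are the strict maximum of their
    (2*radius+1) x (2*radius+1) neighbourhood, then return the k
    highest-scoring ones as (score, row, col) tuples."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            score = heatmap[r][c]
            neighbours = [
                heatmap[rr][cc]
                for rr in range(max(0, r - radius), min(h, r + radius + 1))
                for cc in range(max(0, c - radius), min(w, c + radius + 1))
                if (rr, cc) != (r, c)
            ]
            if all(score > n for n in neighbours):
                peaks.append((score, r, c))
    peaks.sort(reverse=True)  # highest score first
    return peaks[:k]

# Toy 4x4 score map with two clear local peaks.
heatmap = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.2, 0.7],
    [0.0, 0.0, 0.1, 0.4],
]
print(nms_topk(heatmap, radius=1, k=2))
```

A larger `radius` spreads keypoints further apart; a smaller `k` is the "budget" that the ranker described next is meant to spend wisely.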
2) The ranker: keeping the most matchable points
- Often you can’t keep every detected keypoint—phones and drones have limited speed and memory.
- RaCo adds a ranking module that scores keypoints so that, when you keep only the top ones, you still get as many matches as possible between two images.
- It learns to give similar ranks to corresponding points in both images, so matches aren’t accidentally thrown away when the list is shortened.
Analogy: If you can only bring a few tools on a trip, the ranker helps you pack the most useful ones. It also tries to pack matching tools for both “sides” so they work together.
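To see why consistent ranks matter, the toy example below counts how many correspondences survive when each view keeps only its top-K points. It is an illustration only: the scores are made up, the helper names are hypothetical, and it assumes that point `i` in view A corresponds to point `i` in view B.

```python
def top_k_ids(scores, k):
    """Indices of the k highest scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])

def surviving_matches(scores_a, scores_b, k):
    """Point i in view A corresponds to point i in view B. A match survives
    only if both endpoints are kept in their own view's top-k."""
    return len(top_k_ids(scores_a, k) & top_k_ids(scores_b, k))

scores_a = [0.9, 0.8, 0.3, 0.2]
consistent_b = [0.7, 0.95, 0.1, 0.4]    # the same two points rank highest
inconsistent_b = [0.1, 0.2, 0.9, 0.8]   # the other two points rank highest

print(surviving_matches(scores_a, consistent_b, k=2))
print(surviving_matches(scores_a, inconsistent_b, k=2))
```

With a budget of two points per image, the consistent scoring keeps both matches while the inconsistent one keeps none, even though every point was detected in both views.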
3) The covariance estimator: measuring uncertainty
- Even good keypoints can be slightly off by a fraction of a pixel.
- RaCo predicts a 2D “uncertainty bubble” (an ellipse) for each keypoint in pixel units, called its covariance.
- This tells later algorithms how much to trust each point. For example, on bland, smooth walls the uncertainty is bigger; on sharp corners it’s smaller.
Analogy: If you draw a dot to show a location, the covariance is like drawing an oval around it that says, “I’m pretty sure the dot is somewhere inside this oval.”
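One common way to make a network output a valid 2D covariance is to predict the entries of a lower-triangular factor and keep its diagonal positive, which matches the Cholesky and Softplus ingredients quoted in the glossary. The sketch below assumes that parametrization; the three raw inputs `a`, `b`, `c` and the function names are hypothetical, and the exact head used in the paper may differ.

```python
import math

def softplus(x):
    """Numerically stable softplus: log(1 + exp(x)), always positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def covariance_from_raw(a, b, c):
    """Build Sigma = L L^T from raw network outputs, where
    L = [[softplus(a), 0], [b, softplus(c)]]. A positive diagonal
    guarantees that Sigma is symmetric positive definite."""
    l11, l21, l22 = softplus(a), b, softplus(c)
    sxx = l11 * l11
    sxy = l11 * l21
    syy = l21 * l21 + l22 * l22
    return [[sxx, sxy], [sxy, syy]]

def ellipse_axes(cov):
    """1-sigma semi-axis lengths (major, minor) of the uncertainty ellipse,
    i.e. square roots of the eigenvalues of the 2x2 covariance."""
    sxx, sxy, syy = cov[0][0], cov[0][1], cov[1][1]
    mean = 0.5 * (sxx + syy)
    diff = math.hypot(0.5 * (sxx - syy), sxy)
    return math.sqrt(mean + diff), math.sqrt(max(mean - diff, 0.0))

cov = covariance_from_raw(1.0, -2.0, 0.5)
print(cov)
print(ellipse_axes(cov))  # elongated ellipse: major axis longer than minor
```

When the two axes are equal the "bubble" is a circle; a large ratio between them signals a point that is well localized in one direction but can slide along the other, as on an edge.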
A note on training style
- The detector is trained with a trial-and-error idea from reinforcement learning: points that reappear correctly in the second view get a “reward,” and the detector learns to pick more of those.
- The ranker uses a “soft sorting” trick so the network can learn which order maximizes matches.
- The covariance head learns by trying to explain the tiny differences between matched points across views as a realistic “uncertainty bubble.”
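Two of these training signals can be written down compactly. The sketch below shows (1) a pairwise-sigmoid relaxation of ranking, chosen here for brevity and not necessarily the soft-sorting operator used in the paper, and (2) the negative log-likelihood of a 2D residual under a Gaussian with a predicted covariance. The function names, the temperature `tau`, and the example numbers are assumptions for illustration.

```python
import math

def soft_ranks(scores, tau=0.1):
    """Smooth surrogate for descending ranks (1 = best):
    rank_i = 1 + sum_{j != i} sigmoid((s_j - s_i) / tau).
    As tau shrinks, the values approach the hard ranks."""
    def sigmoid(z):
        return 0.5 * (1.0 + math.tanh(0.5 * z))  # overflow-safe form
    return [
        1.0 + sum(sigmoid((sj - si) / tau)
                  for j, sj in enumerate(scores) if j != i)
        for i, si in enumerate(scores)
    ]

def gaussian_nll_2d(error, cov):
    """Negative log-likelihood of a 2D error vector under N(0, cov):
    0.5 * (e^T cov^-1 e + log det cov) + log(2 * pi)."""
    ex, ey = error
    sxx, sxy, syy = cov[0][0], cov[0][1], cov[1][1]
    det = sxx * syy - sxy * sxy
    maha = (syy * ex * ex - 2.0 * sxy * ex * ey + sxx * ey * ey) / det
    return 0.5 * (maha + math.log(det)) + math.log(2.0 * math.pi)

print(soft_ranks([0.9, 0.1, 0.5], tau=0.1))

# A 2-pixel error along x is explained better by a covariance that is
# wide along x than by a unit covariance, so its NLL is lower.
wide_x = [[4.0, 0.0], [0.0, 1.0]]
unit = [[1.0, 0.0], [0.0, 1.0]]
print(gaussian_nll_2d((2.0, 0.0), wide_x) < gaussian_nll_2d((2.0, 0.0), unit))
```

The log-determinant term is what keeps the covariance honest: predicting a huge "bubble" everywhere would shrink the first term but is penalized by the second.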
What the experiments show and why it matters
Here are the main takeaways from tests on standard datasets:
- Strong rotation robustness:
  - RaCo stays accurate even when images are rotated all the way around. This is achieved with smart rotation training, not heavy special-purpose architectures.
  - Why it matters: Real-world cameras tilt and rotate; drone or phone photos are rarely perfectly upright.
- High repeatability and solid matching:
  - RaCo finds keypoints that show up reliably in other views, leading to many correct matches. It competes with or beats well-known methods, especially under big rotations and lighting changes.
- Better keypoint selection when you have a small budget:
  - The ranker significantly boosts performance when you can only keep, say, 128 or 256 keypoints per image.
  - It also improves other detectors (like SuperPoint) when attached to them, showing it’s a practical, plug-in tool.
  - Why it matters: On edge devices and in large-scale systems, limiting the number of keypoints saves time and memory without sacrificing accuracy.
- Useful, metric uncertainty for 3D:
  - The covariance estimates (the “uncertainty bubbles”) improve 3D triangulation and filtering of bad points.
  - They’re well-calibrated in pixel units, so downstream steps can weight points properly and get more accurate 3D reconstructions.
  - Why it matters: Knowing how sure you are about a point helps all later steps make smarter decisions.
Why this work is important
- Practical and lightweight: RaCo is designed to be simple and fast, making it more usable on real devices.
- Flexible: It learns from single images with synthetic warps—no need for expensive labeled pairs.
- Robust to rotations: A real pain point in many applications is handled with a simple and effective training trick.
- Plays well with others: The ranker can enhance other detectors, and the uncertainty estimates make downstream 3D tasks more reliable.
In short, RaCo makes the basic building blocks of 3D vision—finding, choosing, and trusting keypoints—more reliable and efficient. This helps improve tasks like mapping, augmented reality, and robot navigation, especially when computing power is limited or images come in at odd angles.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
The paper leaves several aspects unaddressed or insufficiently explored. Below is a single, consolidated list of concrete gaps and questions to guide future work:
- End-to-end system validation: Quantify RaCo’s impact in full SfM pipelines (track-building, incremental reconstruction, bundle adjustment), including convergence quality, track length distribution, accuracy vs. completeness, and wall-clock/runtime/memory improvements when using the ranker and covariances.
- Descriptor compatibility: Evaluate how RaCo’s keypoints interact with modern matchers and descriptors (e.g., SuperGlue, LightGlue, LoFTR, HardNet), including match counts, precision/recall, and pose accuracy under varying keypoint budgets and ranking strategies.
- Covariance in downstream estimators: Investigate the effect of metric 2D covariances in RANSAC (e.g., weighted residuals, adaptive thresholds) and in robust essential matrix/homography estimation; compare against score-based weighting and DAC in these tasks.
- Non-homography training signals: Assess whether training covariance and ranker with real multi-view correspondences (e.g., SfM tracks, posed datasets) improves anisotropy, scale calibration, and generalization beyond planar homographies and within-scene augmentations.
- Metric consistency across sensors/resolutions: Test whether pixel-scale covariances remain well-calibrated across different camera intrinsics, resolutions, and lens distortions; provide procedures for re-scaling and calibration when input resolution or sensor changes.
- Aleatoric vs. epistemic uncertainty: Extend covariance modeling to include epistemic components (e.g., ensembles, MC dropout) and evaluate improvements in calibration, filtering efficacy, and robustness in out-of-distribution conditions.
- Viewpoint and scale robustness: Beyond in-plane rotations, rigorously evaluate RaCo under out-of-plane rotations, foreshortening, large baselines/parallax, and multi-scale changes; measure repeatability, localization error, and matching stability in these regimes.
- Degradation robustness: Systematically test performance under motion blur, rolling shutter, JPEG compression, sensor noise, low-light/night, dynamic scenes, specularities, and textureless regions; analyze failure modes and covariance behavior in these cases.
- Domain generalization: Benchmark across diverse domains (aerial, robotics, industrial, medical, underwater) to quantify generalization from the Oxford-Paris 1M training set; include nighttime/seasonal/weather variations and different camera models.
- Fairness of rotation comparisons: Replicate comparable rotation augmentations for all baselines (including DISK, DaD) or report controlled ablations to ensure apples-to-apples rotational robustness comparisons.
- Ranker spatial coverage: Analyze whether the learned ranking preserves spatial coverage (avoids clustering) under tight budgets; consider adding explicit coverage/diversity terms and quantify coverage vs. matchability trade-offs.
- Ranker generality: Test zero-shot applicability of the ranker to other detectors without retraining, and quantify the benefit of detector-specific vs. universal rankers across more detectors (e.g., DISK, ALIKED, DaD).
- Runtime/compute footprint: Report comprehensive efficiency metrics (FLOPs, memory, energy, latency on CPU/GPU/mobile) for detector+ranker+covariance heads; quantify the net overhead of the ranker and whether gains in downstream pruning offset added compute.
- Hyperparameter sensitivity: Ablate NMS radius, top-K budgets, reward shaping (ρpos/ρneg, dmax), soft-argmax subpixel settings, and augmentation recipes; report stability, convergence speed, and variance in policy-gradient training.
- RL training stability: Analyze policy-gradient variance, sample efficiency, reward scaling, and potential emergence of degenerate policies (e.g., “light/dark” detectors); compare to alternative training paradigms (supervised or direct differentiable objectives).
- Subpixel localization alternatives: Compare soft-argmax with other localization refinements (e.g., quadratic fit, Lucas–Kanade, gradient-based refinements) and measure effects on localization accuracy and covariance calibration.
- Covariance calibration diagnostics: Beyond log-log slope, provide reliability curves, calibration error metrics (ECE/MCE), and per-scene calibration analyses; study how calibration changes with image resolution and detector confidence.
- Integration of covariances into matching: Explore using predicted 2D covariances to guide feature matching (e.g., anisotropic search windows, adaptive match thresholds, uncertainty-aware keypoint pairing) and quantify gains.
- Lens distortion and non-pinhole cameras: Evaluate performance and covariance behavior under radially distorted, fisheye, and omnidirectional lenses; incorporate distortion-aware Jacobians in covariance propagation.
- Multi-scale detection behavior: Measure scale invariance explicitly (e.g., repeatability across zoom/resolution changes) and ablate the multi-scale backbone’s contribution relative to augmentation.
- Large-scale scalability: Test RaCo on very large image collections to quantify memory/runtime, track stability, pruning effectiveness via ranker and covariances, and practical speed-ups for mapping and localization.
- Joint detection–ranking learning: Investigate training a shared backbone with multi-task heads (detector, ranker, covariance) vs. separate ranker network; quantify performance, compute savings, and interference between tasks.
- Orientation estimation: Study whether learning keypoint orientation (or canonical frames) benefits rotational robustness further and improves descriptor compatibility; assess trade-offs vs. augmentation-only approach.
- Homography bias in training: Quantify any bias introduced by synthetic homographies (planarity, limited parallax) on real 3D scenes; evaluate whether additional 3D-aware augmentations or posed pairs reduce this gap.
- Use-case-specific metrics: Demonstrate ranker/covariance benefits in concrete downstream workloads (e.g., RANSAC iterations, BA convergence iterations, LightGlue inference time) under varying budgets to substantiate “compute-constrained” claims.
- Failure case analysis: Provide qualitative/quantitative analyses of scenarios where RaCo underperforms (e.g., repetitive patterns, extreme lighting, textureless regions) and how ranker and covariance respond; propose mitigations.
- Data and augmentation transparency: Detail the photometric and geometric augmentation distributions and the training curriculum; examine how each augmentation contributes to rotation robustness and generalization.
Practical Applications
Below is a concise mapping from the paper’s technical contributions (rotation-robust keypoint detection, differentiable ranker for budget-limited matching, and metric 2D covariance estimation) to practical applications across sectors. Each item includes sector tags, potential tools/products/workflows, and feasibility notes.
Immediate Applications
- Robust, scalable SfM and photogrammetry upgrades (software, mapping/survey, cultural heritage)
  - Replace existing detectors (e.g., SIFT, SuperPoint) in pipelines like COLMAP/OpenMVG/Metashape with RaCo to boost repeatability, rotation robustness, and matching stability; use RaCo’s per-observation covariances to weight bundle adjustment and prune unstable 3D points.
  - Potential tools/products/workflows: COLMAP plugin that reads RaCo keypoints + per-keypoint covariances; covariance-weighted BA module; ranker-driven “adaptive keypoint budget” mode to keep reconstructions fast on large datasets.
  - Assumptions/Dependencies: Your SfM/BA stack must accept per-residual covariances (or be extended to do so). For best results, adopt the paper’s rotation augmentation recipe when (re)training; descriptors or a dense matcher are still needed for matching.
- Real-time Visual SLAM/VIO front-end on embedded/edge devices (robotics, AR/MR, drones)
  - Use RaCo as a drop-in keypoint front-end to improve tracking under arbitrary camera rotations; employ the ranker to maintain accuracy with small keypoint budgets to stay within latency/power constraints; feed covariances into EKF/graph-optimization for uncertainty-aware state estimation.
  - Potential tools/products/workflows: ROS/ROS2 node providing RaCo keypoints + covariances; SLAM front-end module with dynamic budget control; integration with SuperGlue/LightGlue/LoFTR/RoMa matchers.
  - Assumptions/Dependencies: Real-time feasibility depends on target hardware (convert model to CoreML/NNAPI/TFLite). Calibrate thresholds (NMS radius, keypoint counts) for your camera/lens; ensure downstream estimators accept anisotropic measurement covariances.
- Drone mapping and infrastructure inspection (energy, construction, transportation)
  - Increase success rate of aerial mosaicking and 3D mapping under strong attitude changes; use covariances to downweight blurry/low-texture regions and to filter low-precision tracks before model delivery.
  - Potential tools/products/workflows: Onboard RaCo inference with adaptive keypoint budgets; post-processing pipeline that prunes points using 3D marginal covariances for higher accuracy/completeness.
  - Assumptions/Dependencies: Rolling-shutter, motion blur, and extreme lighting require validation; consider in-domain fine-tuning with unlabeled flight images using the paper’s homography-based self-supervision.
- Mobile AR/VR tracking stabilization (consumer software, e-commerce try-on)
  - Improve robustness of feature tracking during phone rotations and rapid motions; ranker keeps matchable points even when device thermals force lower budgets; covariance can trigger guardrails (e.g., re-localization when uncertainty spikes).
  - Potential tools/products/workflows: Mobile SDK exposing RaCo detection + ranker + covariance; “performance modes” that scale keypoint budgets to hit FPS targets.
  - Assumptions/Dependencies: Must port the model efficiently (GPU/NPU). Pair with existing tracking frameworks (ARKit/ARCore) via plugin or custom front-end.
- Industrial alignment and registration (manufacturing QA, robotics manipulation)
  - Use rotation-robust keypoints to register parts/fixtures from arbitrary camera orientations; employ per-keypoint covariances to set acceptance/tolerance thresholds and to flag low-confidence alignments.
  - Potential tools/products/workflows: Vision-cell plugin that outputs alignment transforms with uncertainty bounds; dashboards that color-code low-confidence features for operator review.
  - Assumptions/Dependencies: Domain-specific texture/lighting may require fine-tuning. Downstream control loops must interpret uncertainty correctly.
- Document scanning, panorama stitching, photo utilities (daily life, mobile apps)
  - More reliable corner detection and homography estimation for document capture, and better match yield under rotations for panorama stitching.
  - Potential tools/products/workflows: Lightweight on-device detector with ranker to maintain quality at low compute; covariance-weighted homography estimation for robust auto-cropping/warping.
  - Assumptions/Dependencies: Strong motion blur or severe lens distortion may require tuning; descriptors or patch correlation still needed for matching.
- Detector-agnostic keypoint re-ranking (academia, software)
  - Apply the ranker as a “plug-and-play” module to existing detectors (e.g., SuperPoint) to increase matches at small budgets without changing downstream code.
  - Potential tools/products/workflows: Standalone ranker head trained on your image distribution; drop-in score replacement for top-K selection.
  - Assumptions/Dependencies: Ranker should be (re)trained on representative unlabeled images; otherwise expect partial gains.
- Privacy-friendly, unlabeled training at scale (industry R&D, on-prem)
  - Train/finetune RaCo on proprietary images without covisible pairs or depth/pose labels; homography-based synthetic pairs plus heavy rotation augmentations suffice.
  - Potential tools/products/workflows: Secure on-prem training runs that never export data; scheduled continual finetuning on freshly collected images.
  - Assumptions/Dependencies: Good augmentation coverage is crucial; ensure realistic photometric/geometry ranges for target deployments.
Long-Term Applications
- Uncertainty-aware autonomy certification and standards (policy, safety-critical robotics)
  - Standardize per-observation covariance propagation from features to pose and map quality for safety cases, audits, and compliance.
  - Potential tools/products/workflows: Test protocols that validate metric calibration (slope β≈1) of covariances; certification checklists requiring uncertainty-aware perception.
  - Assumptions/Dependencies: Broad validation on domain-specific datasets (automotive, rail, UAM); harmonization with ISO/UL safety standards.
- End-to-end uncertainty propagation and risk-aware planning (robotics, digital twins)
  - Extend covariance usage from local BA to whole SLAM stacks and task planners (next-best-view, active SLAM, safe navigation) that act on predicted uncertainty.
  - Potential tools/products/workflows: Differentiable BA with learned covariances; planners that prioritize viewpoints reducing map entropy.
  - Assumptions/Dependencies: Requires system-level changes across perception, state estimation, and planning; robust covariance calibration across sensors.
- Cross-domain registration in medicine and remote sensing (healthcare, earth observation)
  - Adapt RaCo to medical/endoscopic and satellite/aerial multi-spectral imagery, where deformations/terrain parallax or modality changes challenge classic detectors.
  - Potential tools/products/workflows: Domain-adapted training with tailored augmentations (non-rigid warps, spectrum shifts); uncertainty-guided registration for image fusion and change detection.
  - Assumptions/Dependencies: Homography-based self-supervision may be insufficient; add domain priors and non-planar augmentations; regulatory validation in clinical workflows.
- Descriptor-light or descriptor-free pipelines (software, robotics)
  - Combine RaCo with modern dense matchers (e.g., LoFTR, RoMa, DKM) to reduce descriptor storage and speed up long-term mapping via sparse tracks + dense refinement.
  - Potential tools/products/workflows: Hybrid matching stacks using RaCo keypoints for indexing/seeding, then dense refinement where needed.
  - Assumptions/Dependencies: Tailored schedulers deciding when to use dense vs. sparse; memory/runtime trade-offs remain app-specific.
- On-device continual self-supervised learning (mobile, drones, edge)
  - Periodically refine detectors/rankers on-device with unlabeled captures to maintain robustness under drift (new scenes, sensors, lighting).
  - Potential tools/products/workflows: Federated/self-supervised updates with homography augmentation; guardrails from covariance to prevent model regressions.
  - Assumptions/Dependencies: Reliable on-device training loops and energy budgets; privacy/consent frameworks for data handling.
- Wide-FOV, fisheye, and omnidirectional cameras; event cameras (robotics, XR)
  - Extend training to strong lens distortions and asynchronous sensors; optionally combine with equivariant layers for extreme rotations.
  - Potential tools/products/workflows: Distortion-aware augmentations; event-frame synthesis for self-supervision; fast kernels for soft ranking and Cholesky heads on NPUs.
  - Assumptions/Dependencies: Additional architectural changes may be needed; calibration and distortion models must be integrated.
- Multi-sensor fusion with principled covariance handling (robotics, surveying)
  - Fuse RaCo’s per-feature covariances with IMU/LiDAR in factor graphs to improve consistency and observability analysis.
  - Potential tools/products/workflows: Factor-graph libraries that accept anisotropic, keypoint-level measurement covariances; diagnostics for degeneracy detection.
  - Assumptions/Dependencies: Consistent noise models across sensors; accurate time sync and calibration.
- Benchmarks and evaluation protocols for detectors (academia, standards)
  - Formalize detector-only evaluation (repeatability, rotation AUC, localization error) divorced from descriptors; include uncertainty calibration metrics.
  - Potential tools/products/workflows: Public leaderboards with rotation and budget-robustness tracks; datasets with homography/pose GT and calibration checks.
  - Assumptions/Dependencies: Community adoption; curation of diverse domains (indoor/outdoor, aerial, mobile).
Notes on feasibility and integration
- RaCo is lightweight and fast, and code is released (GitHub). For best results, match the paper’s training strategy (strong rotation/photometric augmentations; homography-based self-supervision).
- The ranker can be trained for any detector; expect immediate gains in match yield at low budgets, especially for grid-sampled detectors.
- Covariance utility depends on downstream support for per-feature anisotropic weighting and for interpreting uncertainty in decision logic.
- For production, porting to mobile/embedded accelerators and establishing calibration/QA processes for uncertainty are key dependencies.
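As a concrete picture of "per-feature anisotropic weighting", many least-squares solvers only see plain residuals, so a 2D covariance is typically applied by whitening each residual with the Cholesky factor of that covariance. The sketch below illustrates that standard trick in pure Python; the function names are hypothetical and it is not tied to any particular solver's API.

```python
import math

def cholesky_2x2(cov):
    """Entries (l11, l21, l22) of the lower-triangular L with L L^T = cov,
    for a symmetric positive-definite 2x2 matrix."""
    sxx, sxy, syy = cov[0][0], cov[0][1], cov[1][1]
    l11 = math.sqrt(sxx)
    l21 = sxy / l11
    l22 = math.sqrt(syy - l21 * l21)
    return l11, l21, l22

def whiten_residual(residual, cov):
    """Solve L z = r by forward substitution. The squared norm of z equals
    the squared Mahalanobis distance r^T cov^-1 r, so an ordinary
    least-squares solver fed with z weights r by the covariance."""
    l11, l21, l22 = cholesky_2x2(cov)
    rx, ry = residual
    zx = rx / l11
    zy = (ry - l21 * zx) / l22
    return zx, zy

# Reprojection residual in pixels and the keypoint's predicted covariance.
residual = (2.0, 3.0)
cov = [[4.0, 2.0], [2.0, 2.0]]
zx, zy = whiten_residual(residual, cov)
print((zx, zy), zx * zx + zy * zy)
```

A point with a large predicted covariance produces a small whitened residual and therefore pulls less on the solution, which is the behavior the notes above rely on.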
Glossary
- AdamW: An optimizer that decouples weight decay from gradient updates to improve generalization. "It is trained using the AdamW~\cite{loshchilov2017Fixing} optimizer in PyTorch~\cite{pytorch} with an NVIDIA 2080 Ti GPU."
- Aleatoric and epistemic uncertainty: Aleatoric is data-inherent noise; epistemic is model uncertainty due to limited knowledge. "UAPoint~\cite{zeng2025uncertainty} models aleatoric and epistemic uncertainty~\cite{kendall2017what} during training, which improves repeatability and matchability."
- Anisotropic covariance: A covariance that has direction-dependent variance, modeling elongated uncertainty ellipses. "Previous works quantified spatial uncertainty either through an up-to-scale anisotropic covariance~\cite{tirado2023dac,zeisl2009estimation} or a spatial confidence score~\cite{santellani2024gmm}."
- Area Under the Curve (AUC): A scalar summary of a performance curve, here for recall or repeatability over thresholds. "We also estimate homographies or relative poses and report the Area Under the recall Curve (AUC)."
- Bundle adjustment: Joint optimization of 3D structure and camera parameters by minimizing reprojection error across views. "estimating this is crucial for error propagation in downstream tasks, e.g., bundle adjustment in SfM~\cite{agarwal2010bundle,schoenberger2016sfm}."
- Cholesky decomposition: Factorization of a positive-definite matrix into a lower-triangular form to enable stable inverses and determinants. "the matrix inverse and determinant are computed efficiently via Cholesky decomposition of $\boldsymbol{\Sigma}_{\text{error}}^{i}$."
- Cosine annealing schedule: A learning-rate decay strategy that follows a cosine curve to smoothly reduce the rate. "decaying according to a cosine annealing schedule to a terminal learning rate of ."
- Covisible image pairs: Pairs of images that observe overlapping parts of a scene. "RaCo operates without the need for covisible image pairs."
- Differentiable ranker: A module that assigns scores enabling learned ordering via differentiable approximations of ranks. "a differentiable ranker to maximize matches with a limited number of keypoints"
- Epipolar constraints: Geometric relationships restricting corresponding points to epipolar lines when cameras are related by a rigid motion. "used descriptors and ground truth depth or epipolar constraints to calculate reward values."
- Equivariant convolutions: Neural layers whose outputs transform predictably under input transformations (e.g., rotations). "uses equivariant convolutions~\cite{weiler2019general,cesa2022program,lee2022self} to increase rotational robustness."
- Gaussian mixture model (GMM): A probabilistic model representing data as a mixture of multiple Gaussian components. "in a gaussian mixture model (GMM) to both refine and score keypoints."
- Homography: A projective transformation mapping points between two views of a planar scene or under pure rotation. "Let be two views of an image related by a known ground-truth homography $\mathbf{H}_{A\rightarrow B}$."
- Homography adaptation: Training or finetuning using synthetic or estimated homographies to improve detector robustness. "finetuned with homography adaptation on random images to close the synthetic-to-real domain gap."
- Jacobian: The matrix of partial derivatives used to linearly approximate how transformations affect local coordinates. "where is the Jacobian of the homography transformation evaluated at "
- Mutual nearest neighbors: A matching strategy where two points are paired if each is the nearest neighbor of the other. "Keypoint matching is subsequently performed by identifying mutual nearest neighbors within a specified reprojection radius in both views."
- Negative log-likelihood (NLL): A loss that measures how probable observed data is under a probabilistic model; minimizing NLL fits the model to data. "We compute the negative log-likelihood (NLL) of the reprojection error as"
- Non-linear least squares: Optimization that minimizes sums of squared residuals for problems with nonlinear relationships. "We then refine the 3D points with non-linear least squares optimization, implemented with PyCeres and COLMAP~\cite{ceres-solver,schoenberger2016sfm}"
- Non-Maximum Suppression (NMS): A procedure to keep only local maxima in a score map and prevent clustered detections. "Keypoints are selected by applying Non-Maximum Suppression (NMS)~\cite{superpoint,zhao2023aliked,tyszkiewicz2020disk} followed by top-K selection"
- Policy-gradient: A reinforcement learning method that optimizes parameters by estimating gradients of expected rewards. "we adopt a policy-gradient~\cite{NIPS1999_464d828b_policy_gradient} approach to train a detector that produces repeatable keypoints."
- Reprojection error: The discrepancy between an observed point and its projection from estimated 3D or transformed 2D coordinates. "We train our covariance estimator by maximizing the log-likelihood of the reprojection error between corresponding keypoints."
- ResNet: A deep CNN architecture with skip connections that eases training of very deep networks. "The ranker is a separate ResNet~\cite{he2016deep} backbone which takes as input the normalized RGB image and outputs the ranker score map ."
- Rotation equivariance: A property where detector outputs transform consistently under image rotations. "we evaluate the rotation equivariance of keypoint detectors using in-plane rotations."
- Soft-argmax: A differentiable approximation to argmax that computes a weighted average of positions based on soft scores. "we use subpixel sampling based on the soft-argmax over the patch around the selected keypoint"
- Softplus: A smooth function ensuring positivity, often used to constrain parameters like variances. "the diagonal entries of are passed through a Softplus activation."
- Spearman's rank correlation coefficient: A nonparametric measure of monotonic association between ranked variables. "Maximizing Spearman's rank correlation coefficient~\cite{Spearman1904} of our ordered keypoints"
- Structure-from-Motion (SfM): Reconstructing 3D structure and camera motion from multiple images. "and Structure-from-Motion (SfM)~\cite{imwchallenge2021}."
- Triangulation: Estimating 3D point positions from their projections in multiple views and known camera poses. "We evaluate the covariances for the task of 3D triangulation on the ETH3D dataset~\cite{schops2017multi}."
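To make one of the glossary entries concrete, the sketch below illustrates mutual-nearest-neighbor matching within a reprojection radius, the kind of check used when measuring repeatability. It is a brute-force, pure-Python illustration: the function name, the 3-pixel radius, and the sample points are assumptions, and the keypoints from view A are assumed to be already warped into view B.

```python
import math

def mutual_nearest_neighbors(pts_a_in_b, pts_b, radius=3.0):
    """Return (i, j) pairs where warped point i from view A and point j from
    view B are each other's nearest neighbour and lie within `radius` px."""
    if not pts_a_in_b or not pts_b:
        return []

    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])

    # Nearest B index for every A point, and nearest A index for every B point.
    nn_ab = [min(range(len(pts_b)), key=lambda j: dist(p, pts_b[j]))
             for p in pts_a_in_b]
    nn_ba = [min(range(len(pts_a_in_b)), key=lambda i: dist(pts_a_in_b[i], q))
             for q in pts_b]

    return [(i, j) for i, j in enumerate(nn_ab)
            if nn_ba[j] == i and dist(pts_a_in_b[i], pts_b[j]) <= radius]

pts_a_in_b = [(10.0, 10.0), (50.0, 50.0), (12.0, 10.0)]
pts_b = [(10.5, 10.0), (80.0, 80.0)]
print(mutual_nearest_neighbors(pts_a_in_b, pts_b))
```

In this example the second pair is mutual but too far apart, and the third point loses out to a closer competitor, so only one match is kept.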
