- The paper extends triplet loss to model heteroscedastic uncertainty, enabling unsupervised estimation of input-dependent noise in retrieval tasks.
- Evaluations on Clothing1M and HDD datasets show improved retrieval performance by identifying and mitigating noisy and mislabeled data.
- Quantifying uncertainty aids in data cleaning and enhances the robustness and safety of visual retrieval systems in real-world scenarios.
Unsupervised Data Uncertainty Learning in Visual Retrieval Systems
This paper introduces an approach to modeling heteroscedastic uncertainty in visual retrieval systems by extending the triplet loss. It explores uncertainty quantification in image and video retrieval, where it can improve both the performance and the interpretability of retrieval models. By quantifying data uncertainty, the method improves the handling of noisy observations in the datasets used for retrieval tasks.
Heteroscedastic Uncertainty in Retrieval
Heteroscedastic uncertainty refers to noise that varies with the input, as opposed to homoscedastic noise, which is constant across observations. The paper proposes an extension of the triplet loss that incorporates uncertainty as a variable dependent on the input data. This allows the model to down-weight the contribution of high-uncertainty inputs during training, strengthening the robustness and interpretability of the embedding spaces used in retrieval systems.
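As a concrete sketch (assuming the learned loss-attenuation form popularized by Kendall and Gal; the paper's exact formulation may differ), the triplet term is scaled by a predicted, input-dependent variance, and a log-variance penalty prevents the network from declaring every input uncertain:

$$
\mathcal{L}(a, p, n) = \frac{1}{\sigma(a)^2}\Big[\|f(a)-f(p)\|_2^2 - \|f(a)-f(n)\|_2^2 + m\Big]_+ + \log \sigma(a)^2
$$

Here $f(\cdot)$ is the embedding network, $(a, p, n)$ an anchor-positive-negative triplet, $m$ the margin, and $\sigma(a)^2$ the variance predicted for the anchor; $f$ and $\sigma$ are learned jointly.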
In this approach, the extended triplet loss enables estimation of data uncertainty without uncertainty labels, which is beneficial in scenarios where labeling heteroscedastic noise is impractical. The model learns to predict an additional dimension, representing uncertainty, alongside the standard feature embedding.
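A minimal PyTorch sketch of this two-head design follows; the input feature size, embedding size, and head names are illustrative assumptions rather than the paper's architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UncertainEmbeddingNet(nn.Module):
    """Predicts an embedding plus a per-sample log-variance.

    Hypothetical sketch: `backbone_dim` features are assumed to come from
    any CNN backbone; the paper's exact architecture is not reproduced here.
    """
    def __init__(self, backbone_dim=512, embed_dim=128):
        super().__init__()
        self.embed_head = nn.Linear(backbone_dim, embed_dim)  # standard embedding
        self.logvar_head = nn.Linear(backbone_dim, 1)         # extra uncertainty output

    def forward(self, features):
        z = F.normalize(self.embed_head(features), dim=-1)    # L2-normalized embedding
        log_var = self.logvar_head(features).squeeze(-1)      # unconstrained log sigma^2
        return z, log_var

def heteroscedastic_triplet_loss(za, zp, zn, log_var, margin=0.2):
    """Triplet loss attenuated by the anchor's predicted uncertainty.

    High predicted variance down-weights the triplet term but pays a
    log-variance penalty, so the model cannot mark everything uncertain.
    """
    triplet = F.relu((za - zp).pow(2).sum(-1) - (za - zn).pow(2).sum(-1) + margin)
    return (torch.exp(-log_var) * triplet + log_var).mean()
```

During training, a chronically confusing anchor earns a large predicted variance and contributes less gradient through the triplet term, which is how the uncertainty estimate can emerge without labels.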
Evaluation and Results
Fashion Image Retrieval
The Clothing1M dataset, known for its substantial label noise, serves as a testbed for the proposed method. The uncertainty-learning model identified noisy samples and achieved better retrieval performance than a baseline triplet-based retrieval system. Performance metrics improved further after cleaning the dataset based on the uncertainty estimates, underscoring the practical utility of uncertainty quantification on real-world noisy data.
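One simple way to operationalize this cleaning step is a percentile cutoff on the predicted uncertainties; the 90% keep-fraction below is a hypothetical choice, not a value from the paper:

```python
import numpy as np

def clean_by_uncertainty(log_vars, keep_fraction=0.9):
    """Return indices of the `keep_fraction` least-uncertain training samples."""
    cutoff = np.quantile(log_vars, keep_fraction)   # log-variance threshold
    return np.nonzero(log_vars <= cutoff)[0]        # indices to retain

# Usage: rebuild the training set from these indices, then re-train the
# retrieval model on the cleaned data.
# keep_idx = clean_by_uncertainty(predicted_log_vars)
```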
Visualization of Uncertain Data: A qualitative evaluation on the Clothing1M dataset showed that the model can identify confusing and potentially mislabeled images (Figure 1).

Figure 1: Qualitative evaluation using three very-high-uncertainty queries from the Clothing1M dataset. Outline colors indicate the degree of uncertainty; red is very high. Inter-class similarity is a primary source of confusion.
Autonomous Navigation
For video retrieval tasks, particularly in autonomous-driving scenarios, the method was evaluated on the Honda Research Institute Driving Dataset (HDD). The results indicate that modeling uncertainty can improve event retrieval and provide insight into confusing driving scenarios, which is crucial in safety-critical applications.
Performance on Large Datasets: Whereas conventional models can perform suboptimally in the presence of noise, the proposed method improved retrieval precision by modeling and accounting for uncertainty while learning the embedding.
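For reference, a plain precision@k under nearest-neighbour retrieval looks like the sketch below; the paper's HDD evaluation protocol may use a different ranking measure, so treat this as an illustrative baseline metric:

```python
import numpy as np

def precision_at_k(query_emb, gallery_emb, query_labels, gallery_labels, k=10):
    """Mean precision@k with Euclidean nearest-neighbour ranking."""
    # Pairwise squared distances between every query and every gallery item.
    dists = ((query_emb[:, None, :] - gallery_emb[None, :, :]) ** 2).sum(-1)
    topk = np.argsort(dists, axis=1)[:, :k]               # k nearest gallery items
    hits = gallery_labels[topk] == query_labels[:, None]  # label match per neighbour
    return hits.mean()
```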
Implications and Future Work
The ability to estimate and interpret uncertainty opens new opportunities for improving retrieval performance in noisy environments. The proposed method can aid data cleaning and enhance the robustness of retrieval models in the presence of uncertain and noisy data. Future work could extend these techniques to more complex ranking losses and deploy attention mechanisms to better understand the causes of uncertainty, especially in video datasets where contextual interpretation is more challenging.
In conclusion, unsupervised data uncertainty learning can be a pivotal technique for enhancing the robustness and efficiency of visual retrieval systems. Its applications span multiple domains, suggesting potential gains in model safety and interpretability, particularly on noise-prone real-world datasets.