Learning to rank in person re-identification with metric ensembles (1503.01543v1)

Published 5 Mar 2015 in cs.CV

Abstract: We propose an effective structured learning based approach to the problem of person re-identification which outperforms the current state-of-the-art on most benchmark data sets evaluated. Our framework is built on the basis of multiple low-level hand-crafted and high-level visual features. We then formulate two optimization algorithms, which directly optimize evaluation measures commonly used in person re-identification, also known as the Cumulative Matching Characteristic (CMC) curve. Our new approach is practical to many real-world surveillance applications as the re-identification performance can be concentrated in the range of most practical importance. The combination of these factors leads to a person re-identification system which outperforms most existing algorithms. More importantly, we advance state-of-the-art results on person re-identification by improving the rank-$1$ recognition rates from $40\%$ to $50\%$ on the iLIDS benchmark, $16\%$ to $18\%$ on the PRID2011 benchmark, $43\%$ to $46\%$ on the VIPeR benchmark, $34\%$ to $53\%$ on the CUHK01 benchmark and $21\%$ to $62\%$ on the CUHK03 benchmark.

Citations (470)


Summary

The paper presents a structured learning approach that integrates multiple metric learning strategies to optimize ranking and top-k recognition in person re-identification.
It employs both triplet-based optimization and structured learning for top-k recognition, achieving significant rank-1 improvements on benchmarks such as iLIDS and CUHK01.
By combining low-level handcrafted features with high-level CNN descriptors and concentrating accuracy at the top of the ranked candidate list, the framework targets the operating range that matters most in real-world surveillance applications.

Learning to Rank in Person Re-Identification with Metric Ensembles

The paper presents a structured learning approach for enhancing person re-identification (re-id) through an ensemble of metric learning algorithms, optimized specifically for practical surveillance applications. It integrates both low-level handcrafted and high-level visual features and concentrates re-id performance in the range of most practical importance: the first few positions of the ranked candidate list.

Methodological Framework

The authors propose a framework that directly optimizes evaluation measures commonly used in re-id, in particular the Cumulative Matching Characteristic (CMC) curve, which plots, for each rank r, the fraction of probe images whose correct match appears within the top r gallery candidates. Two optimization algorithms are introduced: one that optimizes relative distances using triplet information, and another that maximizes the average rank-$k$ recognition rate.
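To make the evaluation measure concrete, the following is a minimal sketch that computes a CMC curve from a probe-by-gallery distance matrix; the rank-$k$ recognition rate is simply the value of the curve at rank $k$. It assumes a single-shot gallery (one image per identity) and breaks distance ties by gallery order; the function names `cmc_curve` and `rank_k_rate` are hypothetical and are not taken from the paper's code.

```python
def cmc_curve(dist, probe_ids, gallery_ids):
    """Single-shot CMC curve.

    dist[i][j] is the distance from probe i to gallery image j.
    Returns a list where entry r is the fraction of probes whose correct
    gallery match is ranked within the top r + 1 candidates.
    """
    n_gallery = len(gallery_ids)
    counts = [0] * n_gallery
    for row, pid in zip(dist, probe_ids):
        # Rank gallery images from nearest to farthest; sorted() is stable,
        # so ties keep their original gallery order.
        order = sorted(range(n_gallery), key=lambda j: row[j])
        # 0-based rank of the first gallery image with the probe's identity.
        hit = next((r for r, j in enumerate(order) if gallery_ids[j] == pid), None)
        if hit is None:
            continue  # no true match in the gallery: never counted as recognized
        for r in range(hit, n_gallery):
            counts[r] += 1
    return [c / len(probe_ids) for c in counts]


def rank_k_rate(cmc, k):
    """Rank-k recognition rate: the CMC value at rank k (1-based)."""
    return cmc[k - 1]


if __name__ == "__main__":
    dist = [[0.1, 0.5, 0.9],
            [0.4, 0.2, 0.3],
            [0.2, 0.8, 0.6]]
    cmc = cmc_curve(dist, probe_ids=[0, 1, 2], gallery_ids=[0, 1, 2])
    print(cmc)
    print(rank_k_rate(cmc, 1))
```

In this toy matrix two of the three probes find their match at rank 1 and the third at rank 2, so the curve is `[2/3, 1.0, 1.0]` and the rank-1 rate is 2/3.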

Triplet-based Optimization: This method requires that, for each probe, an image of the same individual be closer than an image of a different individual. It formulates a constrained, margin-based optimization problem over such triplets in which violated constraints are penalized, so that fewer non-matching images are ranked ahead of the correct match.
Structured Learning for Top-k Recognition: Building on the principles of structured learning, this approach directly maximizes the average rank-$k$ recognition rate, i.e., how often the correct identity appears among the top $k$ retrieved candidates. This matches real-world surveillance practice, where typically only the first few matches are reviewed.
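The triplet idea can be sketched as learning non-negative weights for an ensemble distance $d_{\mathbf{w}}(p, g) = \sum_t w_t\, d_t(p, g)$ over base metrics $d_t$. For each triplet (probe $p$, matching image $g^+$, non-matching image $g^-$) the code stores the per-metric gap $a_t = d_t(p, g^-) - d_t(p, g^+)$ and minimizes a regularized hinge loss $\max(0, 1 - \mathbf{w}^\top \mathbf{a})$ by projected subgradient descent. This is a minimal sketch under assumptions: the weighted-sum form, the unit margin, the regularizer, the step size, the solver, and the function names are illustrative choices rather than the authors' exact formulation or optimizer, and the structured top-$k$ variant is not shown.

```python
def learn_ensemble_weights(triplet_gaps, reg=0.01, lr=0.1, epochs=200):
    """Learn non-negative weights w for the ensemble distance sum_t w[t] * d_t.

    triplet_gaps: one vector per (probe, positive, negative) triplet, with
    entry t equal to d_t(probe, negative) - d_t(probe, positive).
    Minimizes reg/2 * ||w||^2 + mean_i max(0, 1 - w . a_i) subject to w >= 0,
    using projected subgradient descent (an illustrative solver choice).
    """
    n_metrics = len(triplet_gaps[0])
    n_triplets = len(triplet_gaps)
    w = [1.0 / n_metrics] * n_metrics  # start from a uniform ensemble
    for _ in range(epochs):
        grad = [reg * wt for wt in w]  # gradient of the L2 regularizer
        for a in triplet_gaps:
            margin = sum(wt * at for wt, at in zip(w, a))
            if margin < 1.0:
                # Hinge is active: the non-match is not far enough behind the match.
                for t in range(n_metrics):
                    grad[t] -= a[t] / n_triplets
        # Subgradient step, then project onto the non-negative orthant.
        w = [max(0.0, wt - lr * g) for wt, g in zip(w, grad)]
    return w


def ensemble_distance(w, base_dists):
    """Weighted sum of the base-metric distances for one image pair."""
    return sum(wt * d for wt, d in zip(w, base_dists))


if __name__ == "__main__":
    # Metric 0 always puts the true match closer; metric 1 never does.
    gaps = [[0.8, -0.5], [0.6, -0.4], [0.9, -0.6]]
    print(learn_ensemble_weights(gaps))
```

On this toy input the weight on the misleading metric is driven to exactly zero, while the informative metric's weight grows until every triplet clears the unit margin. The non-negativity constraint keeps the learned combination interpretable as a weighted sum of distances.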

Experimental Results

The efficacy of the proposed methods is evaluated through extensive experiments on five benchmark data sets: iLIDS, PRID2011, VIPeR, CUHK01, and CUHK03. Rank-$1$ recognition rates improved over the previous state of the art, including:

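iLIDS: Improved rank-$1$ recognition rate from 40% to 50%.
PRID2011: Improved rank-$1$ recognition rate from 16% to 18%.
VIPeR: Improved rank-$1$ recognition rate from 43% to 46%.
CUHK01: Improved rank-$1$ recognition rate from 34% to 53%.
CUHK03: Improved rank-$1$ recognition rate from 21% to 62%.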

These results underscore the value of leveraging diverse visual features and optimization strategies in enhancing person re-id systems.

Visual Features and Metrics

The feature set includes SIFT with LAB color, LBP with RGB color, region covariance features, and convolutional neural network (CNN) features. Linear and non-linear (kernel-based) metric learning methods, such as KISSME and kLFDA, are combined into a single learned ensemble to obtain a more discriminative distance.
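As an illustration of a linear base metric of the KISSME type, the sketch below computes the closed-form estimate $M = \Sigma_S^{-1} - \Sigma_D^{-1}$, where $\Sigma_S$ and $\Sigma_D$ are the mean outer products of feature differences over same-person and different-person pairs, and scores a pair by $(x - y)^\top M (x - y)$. It is a minimal pure-Python sketch under assumptions: it omits the eigenvalue clipping that KISSME uses to keep $M$ positive semidefinite as well as any dimensionality reduction applied to the features first, it is suitable only for small dense matrices, and the helper names are hypothetical rather than taken from the paper or any library.

```python
def pair_diffs(pairs):
    """Feature difference x - y for each (x, y) image pair."""
    return [[a - b for a, b in zip(x, y)] for x, y in pairs]


def mean_outer(diffs):
    """Mean of the outer products d d^T (the pairwise covariance)."""
    n = len(diffs[0])
    cov = [[0.0] * n for _ in range(n)]
    for d in diffs:
        for i in range(n):
            for j in range(n):
                cov[i][j] += d[i] * d[j] / len(diffs)
    return cov


def invert(m):
    """Gauss-Jordan inverse with partial pivoting; small dense matrices only."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kissme(similar_pairs, dissimilar_pairs):
    """Closed-form KISSME matrix M = inv(Sigma_S) - inv(Sigma_D).

    The PSD projection (eigenvalue clipping) used by KISSME is omitted here.
    """
    s_inv = invert(mean_outer(pair_diffs(similar_pairs)))
    d_inv = invert(mean_outer(pair_diffs(dissimilar_pairs)))
    return [[s - d for s, d in zip(rs, rd)] for rs, rd in zip(s_inv, d_inv)]


def mahalanobis_sq(m, x, y):
    """Squared distance (x - y)^T M (x - y) under the learned matrix M."""
    d = [a - b for a, b in zip(x, y)]
    return sum(d[i] * m[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
```

On toy 2-D data where same-person differences are ten times smaller than different-person differences (covariances of $0.005 I$ and $0.5 I$), $M$ comes out close to $\mathrm{diag}(198, 198)$, so pairs that differ like different people receive much larger distances than pairs that differ like the same person.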

Implications and Future Directions

The integration of multiple visual descriptors with the metric ensemble raises rank-$1$ recognition rates on all five benchmarks. This matters for surveillance systems, where operators typically review only a short list of top-ranked candidates.

Focusing on optimizing performance for the first few retrieved candidates rather than overall match performance reflects a practical shift towards operational usability in security systems. The authors' approach of combining diverse features and metrics into a unified learning model indicates potential for further enhancements by incorporating additional feature types or advanced learning paradigms such as deep metric learning.

Future work may explore the scalability of these approaches to larger data sets and more complex environments, including highly dynamic and crowded scenes, to test robustness in diverse real-world settings. Integrating more advanced deep learning architectures, capable of capturing more complex appearance patterns, is another possible direction.
