Top-push Video-based Person Re-identification (1604.08683v2)

Published 29 Apr 2016 in cs.CV

Abstract: Most existing person re-identification (re-id) models focus on matching still person images across disjoint camera views. Since only limited information can be exploited from still images, it is hard (if not impossible) to overcome the occlusion, pose and camera-view change, and lighting variation problems. In comparison, video-based re-id methods can utilize extra space-time information, which contains much richer cues for matching to overcome the mentioned problems. However, we find that when using a video-based representation, some inter-class differences can be much more obscure than when using a still-image-based representation, because different people could not only have similar appearance but also have similar motions and actions which are hard to align. To solve this problem, we propose a top-push distance learning model (TDL), in which we integrate a top-push constraint for matching video features of persons. The top-push constraint enforces the optimization on top-rank matching in re-id, so as to make the matching model more effective at selecting more discriminative features to distinguish different persons. Our experiments show that the proposed video-based re-id framework outperforms the state-of-the-art video-based re-id methods.

Authors (4)

Jinjie You
Ancong Wu
Xiang Li
Wei-Shi Zheng

Summary

Evaluation of Top-push Video-based Person Re-identification

Person re-identification (re-id) research has predominantly focused on matching still images across disjoint camera views, where occlusion, pose and camera-view changes, and lighting variation are hard to overcome. The paper "Top-push Video-based Person Re-identification" instead matches video sequences, which carry additional space-time information. The authors observe that video representations introduce their own difficulty: different people can have similar motions as well as similar appearance, which obscures inter-class differences. To address this, they introduce a top-push distance learning model (TDL) that optimizes top-rank matching performance on video features.

Summary of Approach

The TDL model integrates a top-push constraint into distance learning for video-based re-id. The constraint enforces optimization on top-rank matching, which pushes the model toward more discriminative features and helps resolve the inter-class ambiguities that arise in video data, where different people can share similar appearance and motion. For the feature representation, the authors use HOG3D descriptors for spatio-temporal information, together with average-pooled color histograms and LBP features for appearance cues. The model is optimized to minimize intra-class variation while requiring that the distance of each positive pair be smaller than the minimum inter-class distance for the same sample, a requirement expressed through a hinge loss. The authors employ a stochastic gradient descent projection method to optimize the distance matrix, keeping it positive semi-definite at each iteration.
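The objective described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the trade-off weight `alpha`, the margin `rho`, and the use of a squared Mahalanobis distance parameterized by a matrix `M` are all choices made for this example. The sketch computes an intra-class compactness term plus a hinge penalty that compares each positive pair against the nearest differently-labeled sample, and shows one common way to restore positive semi-definiteness (clipping negative eigenvalues).

```python
import numpy as np

def mahalanobis_sq(M, X):
    """Pairwise squared distances d(i, j) = (x_i - x_j)^T M (x_i - x_j)."""
    diff = X[:, None, :] - X[None, :, :]              # shape (n, n, d)
    return np.einsum("ijk,kl,ijl->ij", diff, M, diff)

def top_push_loss(M, X, y, alpha=0.5, rho=1.0):
    """Illustrative top-push style objective (alpha and rho are assumed names).

    compact: sum of distances over same-identity pairs.
    push:    hinge penalty when a positive pair is not closer, by margin rho,
             than the anchor's nearest different-identity sample.
    """
    D = mahalanobis_sq(M, X)
    same = y[:, None] == y[None, :]
    pos = same & ~np.eye(len(y), dtype=bool)          # positive pairs, i != j
    neg = ~same                                       # different identities
    # Nearest impostor distance for each anchor i (inf if it has no impostor).
    min_neg = np.where(neg, D, np.inf).min(axis=1)
    hinge = np.maximum(D - min_neg[:, None] + rho, 0.0)
    compact = D[pos].sum()
    push = hinge[pos].sum()
    return (1.0 - alpha) * compact + alpha * push

def project_psd(M):
    """Project a matrix onto the PSD cone by clipping negative eigenvalues."""
    M = (M + M.T) / 2.0                               # symmetrize first
    w, V = np.linalg.eigh(M)
    return (V * np.clip(w, 0.0, None)) @ V.T

# Toy usage: four 2-D "sequence features", two identities.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [0.0, 4.0]])
y = np.array([0, 0, 1, 1])
loss = top_push_loss(np.eye(2), X, y, alpha=0.5, rho=10.0)
```

In a projected gradient scheme of the kind the summary describes, a step on `M` would be followed by a call such as `project_psd(M)` so that the learned distance stays valid. The gradient step itself is omitted here.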

Experimental Results and Discussion

The authors evaluate the TDL model on two publicly available datasets, PRID 2011 and iLIDS-VID, and compare it against several state-of-the-art video-based and image-based re-id methods, including SDALF, Salience, and Color+LBP+DVR. According to the reported results, TDL outperforms these baselines, with a notable improvement in Rank-1 matching accuracy, particularly on the more challenging iLIDS-VID dataset. The paper argues that video-based approaches that exploit temporal information improve re-id performance over still-image methods, and the ranking results obtained with TDL support this.
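For readers unfamiliar with the metric, Rank-k matching accuracy can be computed from a probe-by-gallery distance matrix as sketched below. The function name `rank_k_accuracy` and the toy distances are hypothetical, and the sketch does not reproduce the paper's exact evaluation protocol (for example, how the data is split or how many trials are averaged); it only shows what "a correct match appears among the k nearest gallery entries" means.

```python
import numpy as np

def rank_k_accuracy(dist, probe_ids, gallery_ids, k=1):
    """Fraction of probes whose true identity is among the k nearest gallery items.

    dist[i, j] is the distance between probe i and gallery item j.
    """
    probe_ids = np.asarray(probe_ids)
    gallery_ids = np.asarray(gallery_ids)
    order = np.argsort(dist, axis=1)                  # nearest gallery item first
    topk_ids = gallery_ids[order[:, :k]]              # shape (n_probe, k)
    hits = (topk_ids == probe_ids[:, None]).any(axis=1)
    return hits.mean()

# Toy example: three probes, three gallery identities.
dist = np.array([[0.1, 0.9, 0.5],
                 [0.8, 0.7, 0.2],
                 [0.3, 0.6, 0.4]])
ids = [0, 1, 2]
rank1 = rank_k_accuracy(dist, ids, ids, k=1)   # only probe 0 is matched at rank 1
rank2 = rank_k_accuracy(dist, ids, ids, k=2)   # all probes are matched by rank 2
```

Evaluating this function for increasing k gives the cumulative matching curve that re-id papers commonly report.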

Implications and Future Directions

This research is relevant to practical video surveillance systems, where robustness against occlusion and varying conditions remains a key requirement. By exploiting motion and temporal data, the TDL model offers a promising direction for refining person re-id systems. Open questions include how well the approach scales to real-time video processing and to large networks of surveillance cameras.

Future research might also integrate deep learning techniques with the proposed distance learning model to improve feature extraction and representation. Hybrid models that combine video-based methods with recent still-image methods could yield frameworks for robust and efficient person re-id across varying environments. Overall, the paper underlines the value of video features and of constraints that target top-rank performance in re-id.
