Learning Correspondence Structures for Person Re-identification
Weiyao Lin, Yang Shen, Junchi Yan, Mingliang Xu, Jianxin Wu, Jingdong Wang, Ke Lu
Department of Electronic Engineering,
Shanghai Jiao Tong University, China
This paper addresses the problem of handling spatial misalignments due to camera-view changes or human-pose variations in person re-identification. We first introduce a boosting-based approach to learn a correspondence structure which indicates the patch-wise matching probabilities between images from a target camera pair. The learned correspondence structure can not only capture the spatial correspondence pattern between cameras but also handle the viewpoint or human-pose variation in individual images. We further introduce a global constraint-based matching process. It integrates a global matching constraint over the learned correspondence structure to exclude cross-view misalignments during the image patch matching process, hence achieving a more reliable matching score between images. Finally, we also extend our approach by introducing a multi-structure scheme, which learns a set of local correspondence structures to capture the spatial correspondence sub-patterns between a camera pair, so as to handle the spatial misalignments between individual images in a more precise way. Experimental results on various datasets demonstrate the effectiveness of our approach.
This figure shows the framework of the proposed approach. During training, we use a boosting-based process to learn the correspondence structure between the target camera pair. During prediction, given a probe image and a set of gallery images, we use the learned correspondence structure to evaluate the patch-wise correlations between the probe image and each gallery image, find the optimal one-to-one mapping between patches, and accordingly compute the matching score. The final Re-ID result is obtained by ranking the gallery images according to their matching scores.
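The prediction stage described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: it assumes a small correspondence-structure matrix `P` of learned patch-wise matching probabilities and an appearance-similarity matrix `S`, and finds the probability-weighted one-to-one patch assignment by brute force (at realistic patch counts one would use a proper assignment solver under the paper's global matching constraint).

```python
import itertools

def matching_score(P, S):
    """Best one-to-one patch assignment score between two images.

    P[i][j]: learned correspondence probability between probe patch i
             and gallery patch j (the correspondence structure).
    S[i][j]: appearance similarity between probe patch i and gallery patch j.
    Returns the maximum total probability-weighted similarity over all
    one-to-one patch mappings (brute force; fine for a handful of patches).
    """
    n = len(P)
    best = float("-inf")
    for perm in itertools.permutations(range(n)):
        score = sum(P[i][j] * S[i][j] for i, j in enumerate(perm))
        best = max(best, score)
    return best

# Toy example: 3 probe patches vs 3 gallery patches (values are illustrative).
P = [[0.8, 0.1, 0.1],
     [0.2, 0.7, 0.1],
     [0.0, 0.2, 0.8]]
S = [[0.9, 0.3, 0.2],
     [0.4, 0.8, 0.1],
     [0.1, 0.2, 0.7]]
score = matching_score(P, S)

# Gallery images are then ranked by matching score (higher = better match);
# the second score here is a hypothetical competitor.
gallery_scores = {"gallery_1": score, "gallery_2": 0.5}
ranking = sorted(gallery_scores, key=gallery_scores.get, reverse=True)
```

Weighting each patch similarity by its learned correspondence probability is what lets the matching tolerate spatial misalignment: patch pairs that rarely correspond across the camera pair contribute little even if they look alike.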
The learned correspondence structures for various datasets. (a, d): The correspondence structures learned by our approach (with the KISSME metric) for the VIPeR, PRID 450S, 3DPeS, and SYSU-sReID datasets, respectively (the correspondence structure for our ROAD dataset is shown in a separate figure). (b, e): The complete correspondence structure matrices of (a, d) learned by our approach. (c, f): The correspondence structure matrices of (a, d)'s datasets obtained by the simple-average method. (Patches in (b, e) and (c, f) are organized in a row-first scanning order. All correspondence structure matrices are down-sampled for a clearer illustration of the correspondence pattern.) (Best viewed in color.)
CMC results on the VIPeR dataset (Proposed (single-KMFA) and Proposed (multi-manu-KMFA) denote our approach using a single correspondence structure and multiple correspondence structures, respectively, with the KMFA-Rx2 metric)
CMC results on the PRID 450S dataset
CMC results on the 3DPeS dataset
CMC results on the ROAD dataset
CMC results on the SYSU-sReID dataset (Note: The TA-W method uses the same feature and distance metric as the Proposed (single-KMFA) method)
The CMC results of the different methods are shown in Tables I-V. Moreover, since many works also report fusion results on the VIPeR dataset (i.e., combining multiple Re-ID results to achieve higher performance), we also show a fusion result of our approach (svmml+Proposed+multi-manu-KMFA) and compare it with the other fusion results in Table I. References in Tables I-V can be found in our paper.
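For reference, the Cumulative Matching Characteristic (CMC) values reported in these tables measure the fraction of probe images whose correct gallery match appears within the top-k ranks. A minimal sketch of this computation, assuming we already have the 1-based rank at which each probe's true match was retrieved (the input ranks below are illustrative):

```python
def cmc(match_ranks, max_rank):
    """CMC curve: for k = 1..max_rank, the fraction of probes whose
    correct gallery match appears within the top-k ranked results.

    match_ranks: 1-based rank of the true match for each probe image.
    """
    n = len(match_ranks)
    return [sum(r <= k for r in match_ranks) / n for k in range(1, max_rank + 1)]

# Example: 5 probes whose true matches were retrieved at these ranks.
ranks = [1, 3, 1, 2, 5]
curve = cmc(ranks, 5)  # rank-1 accuracy is curve[0]
```

The rank-1 point of this curve is the strictest figure of merit; the curve is non-decreasing in k and reaches 1.0 once k covers the worst retrieval rank.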
Institute of Media, Information, and Network (MIN Lab)