Learning Correspondence Structures for Person Re-identification


Weiyao Lin, Yang Shen, Junchi Yan, Mingliang Xu, Jianxin Wu, Jingdong Wang, Ke Lu


Department of Electronic Engineering,

Shanghai Jiao Tong University, China






This paper addresses the problem of handling spatial misalignments due to camera-view changes or human-pose variations in person re-identification. We first introduce a boosting-based approach to learn a correspondence structure which indicates the patch-wise matching probabilities between images from a target camera pair. The learned correspondence structure can not only capture the spatial correspondence pattern between cameras but also handle the viewpoint or human-pose variation in individual images. We further introduce a global constraint-based matching process. It integrates a global matching constraint over the learned correspondence structure to exclude cross-view misalignments during the image patch matching process, hence achieving a more reliable matching score between images. Finally, we also extend our approach by introducing a multi-structure scheme, which learns a set of local correspondence structures to capture the spatial correspondence sub-patterns between a camera pair, so as to handle the spatial misalignments between individual images in a more precise way. Experimental results on various datasets demonstrate the effectiveness of our approach.


Framework of the proposed approach



This figure shows the framework of the proposed approach. During training, a boosting-based process learns the correspondence structure between the target camera pair. During prediction, given a probe image and a set of gallery images, we use the correspondence structure to evaluate the patch correlations between the probe image and each gallery image, find the optimal one-to-one mapping between patches, and compute the matching score accordingly. The Re-ID result is obtained by ranking the gallery images according to their matching scores.
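The prediction stage described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes that the correlation of a patch pair is the product of its appearance similarity and its learned matching probability, and it finds the best one-to-one mapping by exhaustive search, which is only practical for a handful of patches (an assignment solver such as the Hungarian algorithm would be the usual choice at realistic sizes). The names `matching_score` and `rank_gallery` are hypothetical.

```python
from itertools import permutations


def matching_score(similarity, structure):
    """Return (score, mapping) for the best one-to-one patch matching.

    similarity[i][j]: appearance similarity of probe patch i and gallery patch j.
    structure[i][j]:  learned matching probability of that patch pair.
    Assumption: patch correlation = structure[i][j] * similarity[i][j].
    """
    n = len(similarity)
    weighted = [
        [structure[i][j] * similarity[i][j] for j in range(n)] for i in range(n)
    ]

    def total(mapping):
        # Sum of correlations when probe patch i is matched to mapping[i].
        return sum(weighted[i][mapping[i]] for i in range(n))

    # Exhaustive search over all one-to-one mappings (tiny n only).
    best = max(permutations(range(n)), key=total)
    return total(best), list(best)


def rank_gallery(similarities, structure):
    """Return gallery indices ordered by decreasing matching score."""
    scores = [matching_score(s, structure)[0] for s in similarities]
    return sorted(range(len(scores)), key=lambda g: -scores[g])


# Toy example with two patches per image and two gallery images.
structure = [[0.1, 0.9], [0.9, 0.1]]   # cross-view pattern: patches swap places
gallery_0 = [[0.9, 0.2], [0.2, 0.9]]   # similar only at unlikely patch pairs
gallery_1 = [[0.2, 0.9], [0.9, 0.2]]   # similar at the likely patch pairs
ranking = rank_gallery([gallery_0, gallery_1], structure)
```

In this toy case the second gallery image is ranked first, because its high similarities fall on the patch pairs that the correspondence structure considers likely.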





Figure 1



The learned correspondence structures for various datasets. (a, d): The correspondence structures learned by our approach (with the KISSME metric) for the VIPeR [41], PRID 450S [9], 3DPeS [43], and SYSU-sReID [44] datasets, respectively (the correspondence structure for our ROAD dataset is shown in Figure). The line widths are proportional to the patch-wise probability values. (b, e): The complete correspondence structure matrices of (a, d, g, j) learned by our approach. (c, f): The correspondence structure matrices for the datasets in (a, d), obtained by the simple-average method. (Patches in (b, e) and (c, f) are organized in a row-first scanning order. All the correspondence structure matrices are down-sampled for a clearer illustration of the correspondence pattern.) (Best viewed in color.)


Table Ⅰ

CMC results on the VIPeR dataset (Proposed(single-KMFA) and Proposed(multi-manu-KMFA) indicate our approach using a single correspondence structure and multiple correspondence structures, respectively, with the KMFA-Rx2 metric)



Table Ⅱ

CMC results on the PRID 450S dataset



Table Ⅲ

CMC results on the 3DPeS dataset



Table Ⅳ

CMC results on the Road dataset



Table Ⅴ

CMC results on the SYSU-sReID dataset (Note: The TA-W method uses the same feature and distance metric as the Proposed (single-KMFA) method)



The CMC results of different methods are shown in Tables Ⅰ-Ⅴ. Moreover, since many works also report fusion results on the VIPeR dataset (i.e., adding multiple Re-ID results together to achieve a better Re-ID result), we also show a fusion result of our approach (svmml+Proposed+multi-manu-KMFA) and compare it with the other fusion results in Table Ⅰ. References in Tables Ⅰ-Ⅴ can be found in our paper.
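For readers unfamiliar with the metric, a CMC (cumulative matching characteristic) value at rank k is the fraction of probe images whose correct gallery match appears among the top k ranked results. The following is a minimal sketch of that standard definition, assuming each probe has exactly one correct match in the gallery; the function name `cmc_curve` and the sample ranks are illustrative and not taken from the paper.

```python
def cmc_curve(true_match_ranks, max_rank):
    """Compute CMC values for ranks 1..max_rank.

    true_match_ranks: for each probe, the 1-based rank at which its
    correct gallery image appears in the sorted result list.
    """
    n = len(true_match_ranks)
    return [
        sum(rank <= k for rank in true_match_ranks) / n
        for k in range(1, max_rank + 1)
    ]


# Illustrative ranks for five probes (made-up numbers, not paper results).
curve = cmc_curve([1, 3, 2, 1, 5], max_rank=5)
```

Each entry of `curve` corresponds to one rank column of a CMC table, and the curve is non-decreasing by construction.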





The Road dataset is a dataset that we constructed ourselves. It includes 416 image pairs taken by two cameras, with camera A monitoring an exit region and camera B monitoring a road region. The dataset exhibits large variations in human pose and camera angle, and its images are taken from a realistic, crowded road scene.







W. Lin, Y. Shen, J. Yan, M. Xu, J. Wu, J. Wang, K. Lu, "Learning correspondence structures for person re-identification," IEEE Trans. Image Processing, vol. 26, no. 5, pp. 2438-2453, 2017.






Institute of Media, Information, and Network (MIN Lab)