
Learning Correspondence Structures for Person Re-identification

 

Weiyao Lin, Yang Shen, Junchi Yan, Mingliang Xu, Jianxin Wu, Jingdong Wang, Ke Lu

 

Department of Electronic Engineering,

Shanghai Jiao Tong University, China

 

wylin@sjtu.edu.cn

 

 

Abstract

This paper addresses the problem of handling spatial misalignments due to camera-view changes or human-pose variations in person re-identification. We first introduce a boosting-based approach to learn a correspondence structure which indicates the patch-wise matching probabilities between images from a target camera pair. The learned correspondence structure can not only capture the spatial correspondence pattern between cameras but also handle the viewpoint or human-pose variation in individual images. We further introduce a global constraint-based matching process. It integrates a global matching constraint over the learned correspondence structure to exclude cross-view misalignments during the image patch matching process, hence achieving a more reliable matching score between images. Finally, we also extend our approach by introducing a multi-structure scheme, which learns a set of local correspondence structures to capture the spatial correspondence sub-patterns between a camera pair, so as to handle the spatial misalignments between individual images in a more precise way. Experimental results on various datasets demonstrate the effectiveness of our approach.

 

Framework of the proposed approach


 

This figure shows the framework of the proposed approach. During the training stage, a boosting-based process learns the correspondence structure between the target camera pair. During the prediction stage, given a probe image and a set of gallery images, we use the learned correspondence structure to evaluate the patch-wise correlations between the probe image and each gallery image, find the optimal one-to-one mapping between patches, and compute the corresponding matching score. The Re-ID result is obtained by ranking the gallery images according to their matching scores.
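The prediction stage above can be sketched in a few lines. This is an illustrative sketch only, not the paper's implementation: patch features are assumed to be plain vectors, the similarity measure is cosine (the paper uses learned metrics such as KISSME or KMFA), and the one-to-one assignment is resolved greedily rather than optimally to keep the sketch dependency-free.

```python
import numpy as np

def matching_score(probe_feats, gallery_feats, corr_structure):
    """Patch-wise matching score between a probe and one gallery image.

    probe_feats:    (P, d) array of probe-image patch features
    gallery_feats:  (G, d) array of gallery-image patch features
    corr_structure: (P, G) learned patch-wise matching probabilities
    """
    # Cosine similarity between every probe/gallery patch pair,
    # weighted by the learned correspondence probability.
    p = probe_feats / np.linalg.norm(probe_feats, axis=1, keepdims=True)
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    weighted = corr_structure * (p @ g.T)

    # One-to-one patch assignment under the global matching constraint.
    # (The paper solves this optimally; greedily picking the current
    # best remaining pair is a simplification for illustration.)
    w = weighted.copy()
    score = 0.0
    for _ in range(min(w.shape)):
        i, j = np.unravel_index(np.argmax(w), w.shape)
        score += w[i, j]
        w[i, :] = -np.inf   # probe patch i is now matched
        w[:, j] = -np.inf   # gallery patch j is now matched
    return score
```

Ranking all gallery images by this score then yields the Re-ID result for a given probe.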

 

 

Results

 

Figure 1


The learned correspondence structures for various datasets. (a, d): The correspondence structures learned by our approach (with the KISSME metric) for the VIPeR [41], PRID 450S [9], 3DPeS [43], and SYSU-sReID [44] datasets, respectively (the correspondence structure for our Road dataset is shown in a separate figure). The line widths are proportional to the patch-wise probability values. (b, e): The complete correspondence structure matrices of (a, d) learned by our approach. (c, f): The correspondence structure matrices of (a, d)'s datasets obtained by the simple-average method. (Patches in (b, e) and (c, f) are organized in a row-first scanning order. All correspondence structure matrices are down-sampled for a clearer illustration of the correspondence pattern.) (Best viewed in color)

 

Table I

CMC results on the VIPeR dataset (Proposed (single-KMFA) and Proposed (multi-manu-KMFA) indicate our approach using a single correspondence structure and multiple correspondence structures, respectively, with the KMFA-Rx2 metric)

 

Table II

CMC results on the PRID 450S dataset

 

Table III

CMC results on the 3DPeS dataset

 

Table IV

CMC results on the Road dataset

 

Table V

CMC results on the SYSU-sReID dataset (Note: the TA-W method uses the same feature and distance metric as the Proposed (single-KMFA) method)

 

The CMC results of different methods are shown in Tables I-V. Moreover, since many works also report fusion results on the VIPeR dataset (i.e., combining multiple Re-ID results to achieve higher accuracy), we also show a fusion result of our approach (svmml+Proposed+multi-manu-KMFA) and compare it with other fusion results in Table I. References in Tables I-V can be found in our paper.
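For readers unfamiliar with the CMC evaluation protocol used in the tables above, the following is a minimal sketch of how a CMC curve is computed. The function name and the single-shot setting (gallery index i is the true match of probe i) are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def cmc_curve(scores):
    """CMC from a square score matrix (single-shot setting).

    scores[i, j] is the matching score between probe i and gallery j,
    where gallery i is assumed to be the true match of probe i.
    """
    n = scores.shape[0]
    # Rank of the true match for each probe (0 means a rank-1 hit):
    # count how many gallery entries score strictly higher.
    ranks = np.array([(scores[i] > scores[i, i]).sum() for i in range(n)])
    # CMC(k): fraction of probes whose true match lies within the top k.
    return np.array([(ranks < k).mean() for k in range(1, n + 1)])
```

A rank-1 value of, say, 0.5 means the correct gallery image is ranked first for half of the probes; the curve is non-decreasing and reaches 1.0 at k = n.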

 

 

Dataset

 

The Road dataset is a dataset we constructed ourselves. It includes 416 image pairs taken by two cameras, with camera A monitoring an exit region and camera B monitoring a road region. The dataset exhibits large variations in human pose and camera angle, and its images are taken from a realistic, crowded road scene.

CAM_A:

CAM_B:

 

Road Dataset

 

Citation

W. Lin, Y. Shen, J. Yan, M. Xu, J. Wu, J. Wang, and K. Lu, "Learning correspondence structures for person re-identification," IEEE Trans. Image Processing, vol. 26, no. 5, pp. 2438-2453, 2017.

 

PDF

 

 

 

Institute of Media, Information, and Network (MIN Lab)
