Transductive Video Segmentation on Tree-Structured Model


Botao Wang, Zhihui Fu, Hongkai Xiong, Senior Member, IEEE, and Yuan F. Zheng, Fellow, IEEE


Abstract: This paper presents a transductive multi-component video segmentation algorithm capable of segmenting a pre-defined object of interest in the frames of a video sequence. To ensure temporal consistency, a temporally coherent parametric min-cut algorithm is developed to generate segmentation hypotheses from visual and motion cues. Each hypothesis is then evaluated by an energy function that combines foreground resemblance, foreground/background divergence, boundary strength, and visual saliency. In particular, the state-of-the-art R-CNN descriptor is used to encode the visual appearance of the foreground object. Finally, the optimal segmentation of each frame is obtained by assembling the segmentation hypotheses via Monte Carlo approximation. To capture variations of the foreground object in shape, pose, etc., multiple foreground components are built. To group the frames into these components, a tree-structured graphical model named the temporal tree is designed, in which visually similar and temporally coherent frames are arranged in the same branch. The temporal tree is constructed by iteratively adding frames to its active nodes through probabilistic clustering. Each component, consisting of the frames in one branch, is characterized by an SVM classifier learned in a transductive fashion by jointly maximizing the margin over the labeled and unlabeled frames. Because frames from the same video sequence follow the same distribution, these transductive classifiers generalize better than inductive ones. Experimental results on public benchmarks demonstrate the effectiveness of the proposed method in comparison with other state-of-the-art supervised and unsupervised video segmentation methods.
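The abstract does not spell out how hypotheses are scored or assembled. The Python sketch below illustrates one plausible reading, under stated assumptions. Each hypothesis is a flattened binary mask. Its energy is the negated weighted sum of the four cue scores, and the weights in `CUE_WEIGHTS` are hypothetical placeholders. The hypotheses are treated as samples from a Boltzmann distribution p(h) proportional to exp(-E(h)/T), and each pixel's foreground probability is estimated as a self-normalized weighted average of the masks. The names `hypothesis_energy`, `assemble_segmentation`, `temperature`, and `threshold` are illustrative only and do not come from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple

# HYPOTHETICAL cue weights: the abstract names the four cues but not how
# they are combined, so a plain weighted sum is assumed here.
CUE_WEIGHTS: Dict[str, float] = {
    "resemblance": 1.0,  # foreground resemblance
    "divergence": 1.0,   # foreground/background divergence
    "boundary": 1.0,     # boundary strength
    "saliency": 1.0,     # visual saliency
}


def hypothesis_energy(cues: Dict[str, float]) -> float:
    """Energy of one hypothesis. Each cue score is assumed 'higher is better',
    so a lower (more negative) energy means a better hypothesis."""
    return -sum(weight * cues[name] for name, weight in CUE_WEIGHTS.items())


def assemble_segmentation(
    masks: Sequence[Sequence[int]],
    energies: Sequence[float],
    temperature: float = 1.0,
    threshold: float = 0.5,
) -> Tuple[List[float], List[int]]:
    """Weight each binary mask by exp(-E/T), normalize the weights, and
    average the masks to estimate a per-pixel foreground probability."""
    if not masks or len(masks) != len(energies):
        raise ValueError("need a non-empty list of masks and one energy per mask")
    e_min = min(energies)  # shift by the minimum energy to avoid overflow
    raw = [math.exp(-(e - e_min) / temperature) for e in energies]
    total = sum(raw)
    weights = [r / total for r in raw]
    n_pixels = len(masks[0])
    prob = [sum(w * m[p] for w, m in zip(weights, masks)) for p in range(n_pixels)]
    final = [1 if q >= threshold else 0 for q in prob]
    return prob, final


if __name__ == "__main__":
    # Three toy hypotheses over a 4-pixel "frame" (flattened masks).
    masks = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
    energies = [0.0, 1.0, 2.0]
    prob, final = assemble_segmentation(masks, energies)
    print(final)  # [1, 1, 0, 0]
```

Lowering `temperature` concentrates the weight on the lowest-energy hypothesis. Raising it makes the weights nearly uniform, so the thresholded result approaches a plain majority vote over the masks.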
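For reference, the standard transductive SVM objective, in the form popularized by Joachims, makes the phrase "jointly maximizing the margin over the labeled and unlabeled frames" concrete. In this objective, the unknown labels y*_j of the unlabeled samples are optimized together with the separating hyperplane (w, b). Here x_i with labels y_i are samples from labeled frames, x*_j are samples from unlabeled frames of the same component, and C and C* weight the slack penalties of the two groups. This is the textbook form. The paper's exact objective, features, and sample granularity (pixel, superpixel, or region) may differ.

```latex
\begin{aligned}
\min_{\mathbf{w},\, b,\, \{y^*_j\},\, \boldsymbol{\xi},\, \boldsymbol{\xi}^*}\quad
  & \tfrac{1}{2}\|\mathbf{w}\|^2 + C\sum_{i=1}^{n}\xi_i + C^*\sum_{j=1}^{k}\xi^*_j \\
\text{s.t.}\quad
  & y_i\,(\mathbf{w}^\top\mathbf{x}_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0,\quad i = 1,\dots,n, \\
  & y^*_j\,(\mathbf{w}^\top\mathbf{x}^*_j + b) \ge 1 - \xi^*_j,\quad \xi^*_j \ge 0,\quad y^*_j \in \{-1,+1\},\quad j = 1,\dots,k.
\end{aligned}
```

Because the y*_j are free variables, the optimal hyperplane is pushed through low-density regions of the unlabeled data. This is where the shared distribution of frames within one video is exploited.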





Citation: Botao Wang, Zhihui Fu, Hongkai Xiong, and Y. F. Zheng, "Transductive Video Segmentation on Tree-Structured Model", IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), vol. 27, no. 5, pp. 992-1005, May 2017.


Institute of Media, Information, and Network (MIN Lab)