Partition-Aware Adaptive Switching Neural Networks for Post-Processing in HEVC
Weiyao Lin, Xiaoyi He, Xintong Han, Dong Liu, John See, Junni Zou, Hongkai Xiong, and Feng Wu
Department of Electronic Engineering,
Shanghai Jiao Tong University, China
wylin@sjtu.edu.cn
This paper addresses neural network based post-processing for the state-of-the-art video coding standard, High Efficiency Video Coding (HEVC). We first propose a partition-aware Convolutional Neural Network (CNN) that utilizes the partition information produced by the encoder to assist in the post-processing. In contrast to existing CNN-based approaches, which take only the decoded frame as input, the proposed approach considers the coding unit (CU) size information and combines it with the distorted decoded frame so that the artifacts introduced by HEVC are efficiently reduced. We further introduce an adaptive-switching neural network (ASN) that consists of multiple independent CNNs to adaptively handle the variations in content and distortion within compressed-video frames, providing further reduction in visual artifacts. Additionally, an iterative training procedure is proposed to train these independent CNNs attentively on different local patch-wise classes. Experiments on benchmark sequences demonstrate the effectiveness of our partition-aware and adaptive-switching neural networks.
The frameworks of the proposed partition-aware CNN and adaptive-switching scheme are shown in this figure. The partition-aware CNN is shown on the left. For each patch in a decoded frame, we generate a mask from the patch's partition information and feed it, together with the patch, into the partition-aware CNN. Inside this CNN, the features of the mask and the decoded patch are first extracted through two individual streams and then fused into one. The remaining layers of the partition-aware CNN perform feature enhancement, mapping, and reconstruction, and output the post-processed patch with reduced artifacts. In the adaptive-switching scheme, each patch is post-processed at the encoder side by a bank of trained CNNs, consisting of three local CNNs (CNN 0(L), CNN 1(L), CNN 2(L)) and one global CNN (CNN 3(G)). For each patch, the encoder chooses the CNN whose output differs least from the original patch; that is, it greedily selects the CNN whose post-processed patch has the highest PSNR with respect to the original. The indices of the chosen CNNs are binarized and written directly into the bitstream.
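The encoder-side selection described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: patches are flat sequences of 8-bit pixel values, each member of `cnn_bank` is any callable that maps a decoded patch to a post-processed patch, and `binarize_index` uses a fixed-length code because the exact binarization is not specified here. The names `psnr`, `select_cnn`, and `binarize_index` are hypothetical.

```python
import math


def psnr(reference, candidate, peak=255.0):
    """PSNR in dB between two equally sized patches given as flat sequences."""
    if len(reference) != len(candidate):
        raise ValueError("patches must have the same number of pixels")
    mse = sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)
    if mse == 0:
        return math.inf  # identical patches
    return 10.0 * math.log10(peak * peak / mse)


def select_cnn(original_patch, decoded_patch, cnn_bank):
    """Greedily pick the bank member whose output has the highest PSNR.

    Returns (index, post_processed_patch). Ties keep the lowest index.
    """
    best_index, best_output, best_score = None, None, -math.inf
    for index, cnn in enumerate(cnn_bank):
        output = cnn(decoded_patch)
        score = psnr(original_patch, output)
        if score > best_score:
            best_index, best_output, best_score = index, output, score
    return best_index, best_output


def binarize_index(index, bank_size=4):
    """Fixed-length binary string for a CNN index (assumed binarization)."""
    bits = max(1, math.ceil(math.log2(bank_size)))
    return format(index, "0{}b".format(bits))
```

Because the original patch is needed to compute PSNR, this selection can only run at the encoder; the decoder would read the signalled index and apply the corresponding CNN.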
For evaluation, we tested our trained model on 20 benchmark sequences (not included in our training set) from the Common Test Conditions of HEVC under the same configuration as in training, Low-delay P [31]. Performance is measured by PSNR improvement and by rate-distortion performance, expressed as Bjøntegaard delta rate (BD-rate) savings calculated at QP = 22, 27, 32, 37, over the standard HEVC test model HM-16.0 (i.e., the HM-16.0 baseline). A larger PSNR improvement or a larger BD-rate saving indicates that more visual artifacts are reduced.
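For readers unfamiliar with BD-rate, the following is a minimal sketch of the commonly used Bjøntegaard calculation for four rate points (one per QP). It is not necessarily the exact tool used for the reported numbers. It assumes four (bitrate, PSNR) pairs per curve with distinct PSNR values, fits a cubic through log10(bitrate) as a function of PSNR, and averages the difference over the overlapping PSNR range. The function names are hypothetical. A negative result means the tested method needs less bitrate than the anchor at equal PSNR, i.e. a BD-rate saving.

```python
import math


def _lagrange(xs, ys, x):
    """Evaluate the polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _mean_log_rate(psnrs, rates, lo, hi):
    """Mean of the cubic fit of log10(rate) over the PSNR interval [lo, hi]."""
    log_rates = [math.log10(r) for r in rates]

    def fit(x):
        return _lagrange(psnrs, log_rates, x)

    # Simpson's rule integrates polynomials up to degree 3 exactly.
    integral = (hi - lo) / 6.0 * (fit(lo) + 4.0 * fit((lo + hi) / 2.0) + fit(hi))
    return integral / (hi - lo)


def bd_rate(anchor_rates, anchor_psnrs, test_rates, test_psnrs):
    """BD-rate in percent of the test curve relative to the anchor curve."""
    curves = (anchor_rates, anchor_psnrs, test_rates, test_psnrs)
    if any(len(c) != 4 for c in curves):
        raise ValueError("this sketch expects exactly four points per curve")
    lo = max(min(anchor_psnrs), min(test_psnrs))
    hi = min(max(anchor_psnrs), max(test_psnrs))
    if hi <= lo:
        raise ValueError("PSNR ranges do not overlap")
    diff = (_mean_log_rate(test_psnrs, test_rates, lo, hi)
            - _mean_log_rate(anchor_psnrs, anchor_rates, lo, hi))
    return (10.0 ** diff - 1.0) * 100.0
```

As a sanity check on the definition, a test curve that reaches the same PSNR values at 90% of the anchor's bitrate at every point yields a BD-rate of -10%.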
Table Ⅰ: Comparison of different methods on BD-rate saving over HM-16.0 baseline under different configurations.
@article{lin2020partition,
  title={Partition-Aware Adaptive Switching Neural Networks for Post-Processing in HEVC},
  author={Lin, Weiyao and He, Xiaoyi and Han, Xintong and Liu, Dong and See, John and Zou, Junni and Xiong, Hongkai and Wu, Feng},
  journal={IEEE Transactions on Multimedia},
  volume={22},
  number={11},
  pages={2749--2763},
  year={2020},
  publisher={IEEE}
}
Institute of Media, Information, and Network (MIN Lab)