Hanyue Tu

Multi-Scale Invertible Neural Network for Wide-Range Variable-Rate Learned Image Compression

Mar 27, 2025

Hanyue Tu, Siqi Wu, Li Li, Wengang Zhou, Houqiang Li

Abstract: Autoencoder-based structures have dominated recent learned image compression methods. However, the inherent information loss associated with autoencoders limits their rate-distortion performance at high bit rates and restricts their flexibility of rate adaptation. In this paper, we present a variable-rate image compression model based on an invertible transform to overcome these limitations. Specifically, we design a lightweight multi-scale invertible neural network, which bijectively maps the input image into multi-scale latent representations. To improve the compression efficiency, a multi-scale spatial-channel context model with extended gain units is devised to estimate the entropy of the latent representation from high to low levels. Experimental results demonstrate that the proposed method achieves state-of-the-art performance compared to existing variable-rate methods, and remains competitive with recent multi-model approaches. Notably, our method is the first learned image compression solution that outperforms VVC across a very wide range of bit rates using a single model, especially at high bit rates. The source code is available at https://github.com/hytu99/MSINN-VRLIC.

* Accepted to IEEE Transactions on Multimedia 2025
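
The abstract above relies on a transform that maps its input bijectively, so nothing is lost before quantization. As a rough illustration of what "invertible" means here, the sketch below implements a generic additive coupling step, a common building block of invertible networks. It is not the architecture from the paper: the function names and the toy `shift_fn` (standing in for a learned sub-network) are assumptions made for this example.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def shift_fn(x: Vector) -> Vector:
    """Hypothetical stand-in for a learned sub-network.

    It does not need to be invertible itself; the coupling
    structure is what makes the overall step invertible.
    """
    return [0.5 * v * v + 1.0 for v in x]


def coupling_forward(
    x1: Vector, x2: Vector, t: Callable[[Vector], Vector] = shift_fn
) -> Tuple[Vector, Vector]:
    """Additive coupling: pass x1 through unchanged, shift x2 by t(x1)."""
    y1 = list(x1)
    y2 = [b + s for b, s in zip(x2, t(x1))]
    return y1, y2


def coupling_inverse(
    y1: Vector, y2: Vector, t: Callable[[Vector], Vector] = shift_fn
) -> Tuple[Vector, Vector]:
    """Exact inverse: y1 equals x1, so t(y1) can be recomputed and subtracted."""
    x1 = list(y1)
    x2 = [b - s for b, s in zip(y2, t(y1))]
    return x1, x2
```

Calling `coupling_inverse(*coupling_forward(x1, x2))` recovers `(x1, x2)` up to floating-point rounding, whereas an autoencoder's decoder only approximates its input.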

End-to-End Estimation of Multi-Person 3D Poses from Multiple Cameras

Apr 13, 2020

Hanyue Tu, Chunyu Wang, Wenjun Zeng

Abstract: We present an approach to estimate 3D poses of multiple people from multiple camera views. In contrast to previous efforts, which require establishing cross-view correspondence based on noisy and incomplete 2D pose estimations, we present an end-to-end solution which operates directly in the 3D space and therefore avoids making incorrect decisions in the 2D space. To achieve this goal, the features in all camera views are warped and aggregated in a common 3D space, and fed into a Cuboid Proposal Network (CPN) to coarsely localize all people. Then we propose a Pose Regression Network (PRN) to estimate a detailed 3D pose for each proposal. The approach is robust to occlusion, which occurs frequently in practice. Without bells and whistles, it outperforms the state of the art on the public datasets. Code will be released at https://github.com/microsoft/multiperson-pose-estimation-pytorch.
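
To make the "warped and aggregated in a common 3D space" step more concrete, here is a minimal sketch of one plausible reading: project a 3D voxel center into each camera, sample that view's 2D feature map at the projected pixel, and average over the views that see the point. Everything here is an assumption for illustration (the `Camera` class, the rotation-free pinhole model, nearest-pixel sampling, and mean aggregation); the released code may do this differently.

```python
import math
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]
Heatmap = List[List[float]]  # 2D feature map, indexed [row][col]


class Camera:
    """Hypothetical pinhole camera looking along +z, with no rotation."""

    def __init__(self, focal: float, cx: float, cy: float, position: Point3D):
        self.focal = focal
        self.cx = cx
        self.cy = cy
        self.position = position

    def project(self, p: Point3D) -> Optional[Tuple[float, float]]:
        """Return pixel coordinates (u, v), or None if p is behind the camera."""
        x = p[0] - self.position[0]
        y = p[1] - self.position[1]
        z = p[2] - self.position[2]
        if z <= 0:
            return None
        return (self.focal * x / z + self.cx, self.focal * y / z + self.cy)


def sample_nearest(heatmap: Heatmap, uv: Tuple[float, float]) -> Optional[float]:
    """Nearest-pixel lookup; returns None when (u, v) falls outside the map."""
    col = int(math.floor(uv[0] + 0.5))
    row = int(math.floor(uv[1] + 0.5))
    if 0 <= row < len(heatmap) and 0 <= col < len(heatmap[0]):
        return heatmap[row][col]
    return None


def aggregate_voxel(
    point: Point3D, cameras: List[Camera], heatmaps: List[Heatmap]
) -> float:
    """Average the 2D features sampled at the projections of one 3D point."""
    values = []
    for cam, hm in zip(cameras, heatmaps):
        uv = cam.project(point)
        if uv is None:
            continue  # point is behind this camera
        val = sample_nearest(hm, uv)
        if val is not None:
            values.append(val)
    # Voxels seen by no camera get a zero feature.
    return sum(values) / len(values) if values else 0.0
```

Running `aggregate_voxel` over every cell of a 3D grid would give a feature volume that a 3D network could consume. No cross-view matching of 2D detections is needed, because each voxel gathers its evidence from all views by geometry alone.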
