Chenjian Gao

LoRA-Edit: Controllable First-Frame-Guided Video Editing via Mask-Aware LoRA Fine-Tuning

Jun 11, 2025

Chenjian Gao, Lihe Ding, Xin Cai, Zhanpeng Huang, Zibin Wang, Tianfan Xue

Abstract: Video editing using diffusion models has achieved remarkable results in generating high-quality edits for videos. However, current methods often rely on large-scale pretraining, which limits flexibility for specific edits. First-frame-guided editing provides control over the first frame but lacks flexibility over subsequent frames. To address this, we propose a mask-based LoRA (Low-Rank Adaptation) tuning method that adapts pretrained Image-to-Video (I2V) models for flexible video editing. Our approach preserves background regions while enabling controllable edit propagation. This solution offers efficient and adaptable video editing without altering the model architecture. To better steer this process, we incorporate additional references, such as alternate viewpoints or representative scene states, which serve as visual anchors for how content should unfold. We address the control challenge using a mask-driven LoRA tuning strategy that adapts a pretrained image-to-video model to the editing context. The model must learn from two distinct sources: the input video provides spatial structure and motion cues, while reference images offer appearance guidance. A spatial mask enables region-specific learning by dynamically modulating what the model attends to, ensuring that each area draws from the appropriate source. Experimental results show our method achieves superior video editing performance compared to state-of-the-art methods.

* 12 pages
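
The abstract relies on LoRA (Low-Rank Adaptation), which leaves the pretrained weight frozen and learns a low-rank update. As a minimal sketch of that general idea only (not the paper's mask-aware training procedure), the effective weight can be written as W + (alpha / rank) * B A. The function names, the `alpha` scaling convention, and the toy matrices below are illustrative assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the effective weight under LoRA.

    w: frozen pretrained weight, shape (out, in)
    a: trainable down-projection, shape (rank, in)
    b: trainable up-projection, shape (out, rank)
    """
    scale = alpha / rank
    delta = matmul(b, a)  # (out, rank) @ (rank, in) -> (out, in)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy example (hypothetical values): a 2x2 weight with a rank-1 update.
w = [[1.0, 0.0], [0.0, 1.0]]
a = [[1.0, 2.0]]
b_zero = [[0.0], [0.0]]   # B is commonly initialised to zero
b_tuned = [[1.0], [0.0]]  # a made-up "trained" B

unchanged = lora_weight(w, a, b_zero, alpha=1.0, rank=1)
adapted = lora_weight(w, a, b_tuned, alpha=1.0, rank=1)
```

With B at zero the adapted weight equals the pretrained one, which is why the model architecture and its initial behaviour are left unchanged before tuning.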

GenesisTex: Adapting Image Denoising Diffusion to Texture Space

Mar 26, 2024

Chenjian Gao, Boyan Jiang, Xinghui Li, Yingpeng Zhang, Qian Yu

Abstract: We present GenesisTex, a novel method for synthesizing textures for 3D geometries from text descriptions. GenesisTex adapts the pretrained image diffusion model to texture space by texture space sampling. Specifically, we maintain a latent texture map for each viewpoint, which is updated with predicted noise on the rendering of the corresponding viewpoint. The sampled latent texture maps are then decoded into a final texture map. During the sampling process, we focus on both global and local consistency across multiple viewpoints: global consistency is achieved through the integration of style consistency mechanisms within the noise prediction network, and low-level consistency is achieved by dynamically aligning latent textures. Finally, we apply reference-based inpainting and img2img on denser views for texture refinement. Our approach overcomes the limitations of slow optimization in distillation-based methods and instability in inpainting-based methods. Experiments on meshes from various sources demonstrate that our method surpasses the baseline methods quantitatively and qualitatively.

* 12 pages, 10 figures

Multi-Sample Training for Neural Image Compression

Sep 28, 2022

Tongda Xu, Yan Wang, Dailan He, Chenjian Gao, Han Gao, Kunzan Liu, Hongwei Qin

Abstract: This paper considers the problem of lossy neural image compression (NIC). Current state-of-the-art (sota) methods adopt a uniform posterior to approximate quantization noise, and a single-sample pathwise estimator to approximate the gradient of the evidence lower bound (ELBO). In this paper, we propose to train NIC with a multiple-sample importance weighted autoencoder (IWAE) target, which is tighter than the ELBO and converges to the log likelihood as the sample size increases. First, we identify that the uniform posterior of NIC has special properties, which affect the variance and bias of pathwise and score function estimators of the IWAE target. Moreover, we provide insights on a commonly adopted trick in NIC from a gradient variance perspective. Based on these analyses, we further propose multiple-sample NIC (MS-NIC), an enhanced IWAE target for NIC. Experimental results demonstrate that it improves sota NIC methods. Our MS-NIC is plug-and-play, and can be easily extended to other neural compression tasks.

* NeurIPS 2022
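
The abstract contrasts the single-sample ELBO with the multiple-sample IWAE target. As a rough sketch of the standard IWAE estimator (not the MS-NIC variant the paper proposes), the bound is the log of the mean importance weight over K samples, computed here with a numerically stable log-mean-exp. The function names and the example log-weights are illustrative assumptions.

```python
import math


def log_mean_exp(values):
    """Stable log((1/K) * sum_k exp(v_k))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def iwae_bound(log_weights):
    """IWAE estimate from K log importance weights log p(x, z_k) - log q(z_k | x)."""
    return log_mean_exp(log_weights)


def elbo_estimate(log_weights):
    """Average of the same log weights, i.e. a K-sample Monte Carlo ELBO estimate."""
    return sum(log_weights) / len(log_weights)


# Hypothetical log importance weights for K = 3 samples.
log_w = [-1.0, -2.0, -3.0]
iwae = iwae_bound(log_w)
elbo = elbo_estimate(log_w)
single = iwae_bound([-2.0])  # with K = 1 the IWAE estimate is just the log weight
```

By Jensen's inequality the log-mean-exp is never below the plain average of the log weights, which is the sense in which the IWAE target is tighter than the ELBO.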

Bit Allocation using Optimization

Sep 20, 2022

Tongda Xu, Han Gao, Chenjian Gao, Jinyong Pi, Yanghao Li, Yuanyuan Wang, Ziyu Zhu, Dailan He, Mao Ye, Hongwei Qin (+1 more)

Abstract: In this paper, we consider the problem of bit allocation in neural video compression (NVC). Due to the frame reference structure, current NVC methods using the same R-D (Rate-Distortion) trade-off parameter $\lambda$ for all frames are suboptimal, which brings the need for bit allocation. Unlike previous methods based on heuristic and empirical R-D models, we propose to solve this problem by gradient-based optimization. Specifically, we first propose a continuous bit implementation method based on Semi-Amortized Variational Inference (SAVI). Then, we propose a pixel-level implicit bit allocation method using iterative optimization by changing the SAVI target. Moreover, we derive the precise R-D model based on the differentiable trait of NVC. We also show the optimality of our method by proving its equivalence to bit allocation with the precise R-D model. Experimental results show that our approach significantly improves NVC methods and outperforms existing bit allocation methods. Our approach is plug-and-play for all differentiable NVC methods, and it can be directly adopted on existing pre-trained models.

Flexible Neural Image Compression via Code Editing

Sep 19, 2022

Chenjian Gao, Tongda Xu, Dailan He, Hongwei Qin, Yan Wang

Abstract: Neural image compression (NIC) has outperformed traditional image codecs in rate-distortion (R-D) performance. However, it usually requires a dedicated encoder-decoder pair for each point on the R-D curve, which greatly hinders its practical deployment. While some recent works have enabled bitrate control via conditional coding, they impose a strong prior during training and provide limited flexibility. In this paper, we propose Code Editing, a highly flexible coding method for NIC based on semi-amortized inference and adaptive quantization. Our work is a new paradigm for variable bitrate NIC. Furthermore, experimental results show that our method surpasses existing variable-rate methods, and achieves ROI coding and multi-distortion trade-off with a single decoder.

* NeurIPS 2022

SketchSampler: Sketch-based 3D Reconstruction via View-dependent Depth Sampling

Aug 14, 2022

Chenjian Gao, Qian Yu, Lu Sheng, Yi-Zhe Song, Dong Xu

Abstract: Reconstructing a 3D shape based on a single sketch image is challenging due to the large domain gap between a sparse, irregular sketch and a regular, dense 3D shape. Existing works try to employ the global feature extracted from the sketch to directly predict the 3D coordinates, but the results usually lose fine details and are not faithful to the input sketch. Through analyzing the 3D-to-2D projection process, we notice that the density map that characterizes the distribution of 2D point clouds (i.e., the probability of points projected at each location of the projection plane) can be used as a proxy to facilitate the reconstruction process. To this end, we first translate a sketch via an image translation network to a more informative 2D representation that can be used to generate a density map. Next, a 3D point cloud is reconstructed via a two-stage probabilistic sampling process: first recovering the 2D points (i.e., the x and y coordinates) by sampling the density map; and then predicting the depth (i.e., the z coordinate) by sampling the depth values at the ray determined by each 2D point. Extensive experiments are conducted, and both quantitative and qualitative results show that our proposed approach significantly outperforms other baseline methods.

* 16 pages, 7 figures, conference
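
The two-stage sampling described in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's network: the density map and the per-pixel depth distributions are hand-written tables here (in the paper they would be predicted), and `sample_points`, `depth_candidates`, and `depth_probs` are hypothetical names.

```python
import random


def sample_points(density, depth_candidates, depth_probs, n, seed=0):
    """Sample n 3D points: (x, y) from a 2D density map, then z along each ray.

    density[y][x]       : unnormalised probability that a point projects to (x, y)
    depth_candidates    : discrete depth values considered along every ray
    depth_probs[y][x][k]: weight of depth_candidates[k] on the ray through (x, y)
    """
    rng = random.Random(seed)
    cells = [(x, y) for y, row in enumerate(density) for x, _ in enumerate(row)]
    weights = [p for row in density for p in row]  # same row-major order as cells
    points = []
    # Stage 1: recover 2D locations by sampling the density map.
    for x, y in rng.choices(cells, weights=weights, k=n):
        # Stage 2: sample a depth value on the ray determined by (x, y).
        z = rng.choices(depth_candidates, weights=depth_probs[y][x], k=1)[0]
        points.append((x, y, z))
    return points


# Hypothetical 2x2 density map: only cells (x=1, y=0) and (x=0, y=1) carry mass.
density = [[0.0, 1.0], [3.0, 0.0]]
depth_candidates = [0.5, 1.5]
depth_probs = [
    [[1, 1], [1, 0]],  # row y=0: cell (1, 0) always takes depth 0.5
    [[0, 1], [1, 1]],  # row y=1: cell (0, 1) always takes depth 1.5
]
cloud = sample_points(density, depth_candidates, depth_probs, n=50)
```

Because zero-density cells are never drawn in the first stage, the sampled cloud only contains points whose projection is supported by the density map.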

PO-ELIC: Perception-Oriented Efficient Learned Image Coding

May 28, 2022

Dailan He, Ziming Yang, Hongjiu Yu, Tongda Xu, Jixiang Luo, Yuan Chen, Chenjian Gao, Xinjie Shi, Hongwei Qin, Yan Wang

Abstract: In recent years, learned image compression (LIC) has achieved remarkable performance. Recent LIC methods outperform VVC in both PSNR and MS-SSIM. However, the low bit-rate reconstructions of LIC suffer from artifacts such as blurring, color drifting and texture missing. Moreover, those varied artifacts make image quality metrics correlate badly with human perceptual quality. In this paper, we propose PO-ELIC, i.e., Perception-Oriented Efficient Learned Image Coding. To be specific, we adapt ELIC, one of the state-of-the-art LIC models, with adversarial training techniques. We apply a mixture of losses including hinge-form adversarial loss, Charbonnier loss, and style loss, to finetune the model towards better perceptual quality. Experimental results demonstrate that our method achieves perceptual quality comparable to HiFiC at a much lower bitrate.

* CVPR 2022 Workshop, 5th CLIC Image Compression Track
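
Of the losses named in the abstract, the Charbonnier loss has a compact standard form, sqrt((x - y)^2 + eps^2), a smooth variant of the L1 loss. The sketch below shows that standard definition only; the `eps` value, the averaging over elements, and the function name are assumptions, and the paper's loss weights are not reproduced here.

```python
import math


def charbonnier_loss(pred, target, eps=1e-3):
    """Mean Charbonnier penalty sqrt((p - t)^2 + eps^2) over paired values."""
    return sum(
        math.sqrt((p - t) ** 2 + eps ** 2) for p, t in zip(pred, target)
    ) / len(pred)


# Hypothetical pixel values.
perfect = charbonnier_loss([1.0, 2.0], [1.0, 2.0])  # floor of eps when pred == target
large = charbonnier_loss([3.0], [0.0])              # close to |3 - 0| for big errors
```

For large errors the penalty behaves like the absolute difference, while near zero it stays smooth and differentiable, which is the usual reason for preferring it over plain L1.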
