Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tiancheng Zhi

Lynx: Towards High-Fidelity Personalized Video Generation

Sep 19, 2025

Shen Sang, Tiancheng Zhi, Tianpei Gu, Jing Liu, Linjie Luo

Abstract:We present Lynx, a high-fidelity model for personalized video synthesis from a single input image. Built on an open-source Diffusion Transformer (DiT) foundation model, Lynx introduces two lightweight adapters to ensure identity fidelity. The ID-adapter employs a Perceiver Resampler to convert ArcFace-derived facial embeddings into compact identity tokens for conditioning, while the Ref-adapter integrates dense VAE features from a frozen reference pathway, injecting fine-grained details across all transformer layers through cross-attention. These modules collectively enable robust identity preservation while maintaining temporal coherence and visual realism. Through evaluation on a curated benchmark of 40 subjects and 20 unbiased prompts, which yielded 800 test cases, Lynx has demonstrated superior face resemblance, competitive prompt following, and strong video quality, thereby advancing the state of personalized video generation.

* Lynx Technical Report

Via

Access Paper or Ask Questions

COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection

Nov 26, 2024

Jinqi Xiao, Shen Sang, Tiancheng Zhi, Jing Liu, Qing Yan, Linjie Luo, Bo Yuan

Figure 1 for COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection

Figure 2 for COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection

Figure 3 for COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection

Figure 4 for COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection

Abstract:Training large-scale neural networks in vision, and multimodal domains demands substantial memory resources, primarily due to the storage of optimizer states. While LoRA, a popular parameter-efficient method, reduces memory usage, it often suffers from suboptimal performance due to the constraints of low-rank updates. Low-rank gradient projection methods (e.g., GaLore, Flora) reduce optimizer memory by projecting gradients and moment estimates into low-rank spaces via singular value decomposition or random projection. However, they fail to account for inter-projection correlation, causing performance degradation, and their projection strategies often incur high computational costs. In this paper, we present COAP (Correlation-Aware Gradient Projection), a memory-efficient method that minimizes computational overhead while maintaining training performance. Evaluated across various vision, language, and multimodal tasks, COAP outperforms existing methods in both training speed and model performance. For LLaMA-1B, it reduces optimizer memory by 61% with only 2% additional time cost, achieving the same PPL as AdamW. With 8-bit quantization, COAP cuts optimizer memory by 81% and achieves 4x speedup over GaLore for LLaVA-v1.5-7B fine-tuning, while delivering higher accuracy.

Via

Access Paper or Ask Questions

ID-Patch: Robust ID Association for Group Photo Personalization

Nov 20, 2024

Yimeng Zhang, Tiancheng Zhi, Jing Liu, Shen Sang, Liming Jiang, Qing Yan, Sijia Liu, Linjie Luo

Figure 1 for ID-Patch: Robust ID Association for Group Photo Personalization

Figure 2 for ID-Patch: Robust ID Association for Group Photo Personalization

Figure 3 for ID-Patch: Robust ID Association for Group Photo Personalization

Figure 4 for ID-Patch: Robust ID Association for Group Photo Personalization

Abstract:The ability to synthesize personalized group photos and specify the positions of each identity offers immense creative potential. While such imagery can be visually appealing, it presents significant challenges for existing technologies. A persistent issue is identity (ID) leakage, where injected facial features interfere with one another, resulting in low face resemblance, incorrect positioning, and visual artifacts. Existing methods suffer from limitations such as the reliance on segmentation models, increased runtime, or a high probability of ID leakage. To address these challenges, we propose ID-Patch, a novel method that provides robust association between identities and 2D positions. Our approach generates an ID patch and ID embeddings from the same facial features: the ID patch is positioned on the conditional image for precise spatial control, while the ID embeddings integrate with text embeddings to ensure high resemblance. Experimental results demonstrate that ID-Patch surpasses baseline methods across metrics, such as face ID resemblance, ID-position association accuracy, and generation efficiency. Project Page is: https://byteaigc.github.io/ID-Patch/

* Project Page is: https://byteaigc.github.io/ID-Patch/

Via

Access Paper or Ask Questions

Learning Feature-Preserving Portrait Editing from Generated Pairs

Jul 29, 2024

Bowei Chen, Tiancheng Zhi, Peihao Zhu, Shen Sang, Jing Liu, Linjie Luo

Abstract:Portrait editing is challenging for existing techniques due to difficulties in preserving subject features like identity. In this paper, we propose a training-based method leveraging auto-generated paired data to learn desired editing while ensuring the preservation of unchanged subject features. Specifically, we design a data generation process to create reasonably good training pairs for desired editing at low cost. Based on these pairs, we introduce a Multi-Conditioned Diffusion Model to effectively learn the editing direction and preserve subject features. During inference, our model produces accurate editing mask that can guide the inference process to further preserve detailed subject features. Experiments on costume editing and cartoon expression editing show that our method achieves state-of-the-art quality, quantitatively and qualitatively.

Via

Access Paper or Ask Questions

FG-Net: Facial Action Unit Detection with Generalizable Pyramidal Features

Aug 23, 2023

Yufeng Yin, Di Chang, Guoxian Song, Shen Sang, Tiancheng Zhi, Jing Liu, Linjie Luo, Mohammad Soleymani

Figure 1 for FG-Net: Facial Action Unit Detection with Generalizable Pyramidal Features

Figure 2 for FG-Net: Facial Action Unit Detection with Generalizable Pyramidal Features

Figure 3 for FG-Net: Facial Action Unit Detection with Generalizable Pyramidal Features

Figure 4 for FG-Net: Facial Action Unit Detection with Generalizable Pyramidal Features

Abstract:Automatic detection of facial Action Units (AUs) allows for objective facial expression analysis. Due to the high cost of AU labeling and the limited size of existing benchmarks, previous AU detection methods tend to overfit the dataset, resulting in a significant performance loss when evaluated across corpora. To address this problem, we propose FG-Net for generalizable facial action unit detection. Specifically, FG-Net extracts feature maps from a StyleGAN2 model pre-trained on a large and diverse face image dataset. Then, these features are used to detect AUs with a Pyramid CNN Interpreter, making the training efficient and capturing essential local features. The proposed FG-Net achieves a strong generalization ability for heatmap-based AU detection thanks to the generalizable and semantic-rich features extracted from the pre-trained generative model. Extensive experiments are conducted to evaluate within- and cross-corpus AU detection with the widely-used DISFA and BP4D datasets. Compared with the state-of-the-art, the proposed method achieves superior cross-domain performance while maintaining competitive within-domain performance. In addition, FG-Net is data-efficient and achieves competitive performance even when trained on 1000 samples. Our code will be released at \url{https://github.com/ihp-lab/FG-Net}

Via

Access Paper or Ask Questions

AgileGAN3D: Few-Shot 3D Portrait Stylization by Augmented Transfer Learning

Mar 24, 2023

Guoxian Song, Hongyi Xu, Jing Liu, Tiancheng Zhi, Yichun Shi, Jianfeng Zhang, Zihang Jiang, Jiashi Feng, Shen Sang, Linjie Luo

Figure 1 for AgileGAN3D: Few-Shot 3D Portrait Stylization by Augmented Transfer Learning

Figure 2 for AgileGAN3D: Few-Shot 3D Portrait Stylization by Augmented Transfer Learning

Figure 3 for AgileGAN3D: Few-Shot 3D Portrait Stylization by Augmented Transfer Learning

Figure 4 for AgileGAN3D: Few-Shot 3D Portrait Stylization by Augmented Transfer Learning

Abstract:While substantial progresses have been made in automated 2D portrait stylization, admirable 3D portrait stylization from a single user photo remains to be an unresolved challenge. One primary obstacle here is the lack of high quality stylized 3D training data. In this paper, we propose a novel framework \emph{AgileGAN3D} that can produce 3D artistically appealing and personalized portraits with detailed geometry. New stylization can be obtained with just a few (around 20) unpaired 2D exemplars. We achieve this by first leveraging existing 2D stylization capabilities, \emph{style prior creation}, to produce a large amount of augmented 2D style exemplars. These augmented exemplars are generated with accurate camera pose labels, as well as paired real face images, which prove to be critical for the downstream 3D stylization task. Capitalizing on the recent advancement of 3D-aware GAN models, we perform \emph{guided transfer learning} on a pretrained 3D GAN generator to produce multi-view-consistent stylized renderings. In order to achieve 3D GAN inversion that can preserve subject's identity well, we incorporate \emph{multi-view consistency loss} in the training of our encoder. Our pipeline demonstrates strong capability in turning user photos into a diverse range of 3D artistic portraits. Both qualitative results and quantitative evaluations have been conducted to show the superior performance of our method. Code and pretrained models will be released for reproduction purpose.

Via

Access Paper or Ask Questions

AgileAvatar: Stylized 3D Avatar Creation via Cascaded Domain Bridging

Nov 15, 2022

Shen Sang, Tiancheng Zhi, Guoxian Song, Minghao Liu, Chunpong Lai, Jing Liu, Xiang Wen, James Davis, Linjie Luo

Abstract:Stylized 3D avatars have become increasingly prominent in our modern life. Creating these avatars manually usually involves laborious selection and adjustment of continuous and discrete parameters and is time-consuming for average users. Self-supervised approaches to automatically create 3D avatars from user selfies promise high quality with little annotation cost but fall short in application to stylized avatars due to a large style domain gap. We propose a novel self-supervised learning framework to create high-quality stylized 3D avatars with a mix of continuous and discrete parameters. Our cascaded domain bridging framework first leverages a modified portrait stylization approach to translate input selfies into stylized avatar renderings as the targets for desired 3D avatars. Next, we find the best parameters of the avatars to match the stylized avatar renderings through a differentiable imitator we train to mimic the avatar graphics engine. To ensure we can effectively optimize the discrete parameters, we adopt a cascaded relaxation-and-search pipeline. We use a human preference study to evaluate how well our method preserves user identity compared to previous work as well as manual creation. Our results achieve much higher preference scores than previous work and close to those of manual creation. We also provide an ablation study to justify the design choices in our pipeline.

* ACM SIGGRAPH Asia 2022 Conference Proceedings

Via

Access Paper or Ask Questions

Learning Continuous Implicit Representation for Near-Periodic Patterns

Aug 25, 2022

Bowei Chen, Tiancheng Zhi, Martial Hebert, Srinivasa G. Narasimhan

Figure 1 for Learning Continuous Implicit Representation for Near-Periodic Patterns

Figure 2 for Learning Continuous Implicit Representation for Near-Periodic Patterns

Abstract:Near-Periodic Patterns (NPP) are ubiquitous in man-made scenes and are composed of tiled motifs with appearance differences caused by lighting, defects, or design elements. A good NPP representation is useful for many applications including image completion, segmentation, and geometric remapping. But representing NPP is challenging because it needs to maintain global consistency (tiled motifs layout) while preserving local variations (appearance differences). Methods trained on general scenes using a large dataset or single-image optimization struggle to satisfy these constraints, while methods that explicitly model periodicity are not robust to periodicity detection errors. To address these challenges, we learn a neural implicit representation using a coordinate-based MLP with single image optimization. We design an input feature warping module and a periodicity-guided patch loss to handle both global consistency and local variations. To further improve the robustness, we introduce a periodicity proposal module to search and use multiple candidate periodicities in our pipeline. We demonstrate the effectiveness of our method on more than 500 images of building facades, friezes, wallpapers, ground, and Mondrian patterns on single and multi-planar scenes.

* ECCV 2022. Project page: https://armastuschen.github.io/projects/NPP_Net/

Via

Access Paper or Ask Questions

TexMesh: Reconstructing Detailed Human Texture and Geometry from RGB-D Video

Aug 29, 2020

Tiancheng Zhi, Christoph Lassner, Tony Tung, Carsten Stoll, Srinivasa G. Narasimhan, Minh Vo

Figure 1 for TexMesh: Reconstructing Detailed Human Texture and Geometry from RGB-D Video

Figure 2 for TexMesh: Reconstructing Detailed Human Texture and Geometry from RGB-D Video

Figure 3 for TexMesh: Reconstructing Detailed Human Texture and Geometry from RGB-D Video

Figure 4 for TexMesh: Reconstructing Detailed Human Texture and Geometry from RGB-D Video

Abstract:We present TexMesh, a novel approach to reconstruct detailed human meshes with high-resolution full-body texture from RGB-D video. TexMesh enables high quality free-viewpoint rendering of humans. Given the RGB frames, the captured environment map, and the coarse per-frame human mesh from RGB-D tracking, our method reconstructs spatiotemporally consistent and detailed per-frame meshes along with a high-resolution albedo texture. By using the incident illumination we are able to accurately estimate local surface geometry and albedo, which allows us to further use photometric constraints to adapt a synthetically trained model to real-world sequences in a self-supervised manner for detailed surface geometry and high-resolution texture estimation. In practice, we train our models on a short example sequence for self-adaptation and the model runs at interactive framerate afterwards. We validate TexMesh on synthetic and real-world data, and show it outperforms the state of art quantitatively and qualitatively.

* ECCV 2020

Via

Access Paper or Ask Questions