Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xianfeng Gu

OT-Talk: Animating 3D Talking Head with Optimal Transportation

May 03, 2025

Xinmu Wang, Xiang Gao, Xiyun Song, Heather Yu, Zongfang Lin, Liang Peng, Xianfeng Gu

Abstract:Animating 3D head meshes using audio inputs has significant applications in AR/VR, gaming, and entertainment through 3D avatars. However, bridging the modality gap between speech signals and facial dynamics remains a challenge, often resulting in incorrect lip syncing and unnatural facial movements. To address this, we propose OT-Talk, the first approach to leverage optimal transportation to optimize the learning model in talking head animation. Building on existing learning frameworks, we utilize a pre-trained Hubert model to extract audio features and a transformer model to process temporal sequences. Unlike previous methods that focus solely on vertex coordinates or displacements, we introduce Chebyshev Graph Convolution to extract geometric features from triangulated meshes. To measure mesh dissimilarities, we go beyond traditional mesh reconstruction errors and velocity differences between adjacent frames. Instead, we represent meshes as probability measures and approximate their surfaces. This allows us to leverage the sliced Wasserstein distance for modeling mesh variations. This approach facilitates the learning of smooth and accurate facial motions, resulting in coherent and natural facial animations. Our experiments on two public audio-mesh datasets demonstrate that our method outperforms state-of-the-art techniques both quantitatively and qualitatively in terms of mesh reconstruction accuracy and temporal alignment. In addition, we conducted a user perception study with 20 volunteers to further assess the effectiveness of our approach.

Via

Access Paper or Ask Questions

HOTS3D: Hyper-Spherical Optimal Transport for Semantic Alignment of Text-to-3D Generation

Jul 19, 2024

Zezeng Li, Weimin Wang, WenHai Li, Na Lei, Xianfeng Gu

Figure 1 for HOTS3D: Hyper-Spherical Optimal Transport for Semantic Alignment of Text-to-3D Generation

Figure 2 for HOTS3D: Hyper-Spherical Optimal Transport for Semantic Alignment of Text-to-3D Generation

Figure 3 for HOTS3D: Hyper-Spherical Optimal Transport for Semantic Alignment of Text-to-3D Generation

Figure 4 for HOTS3D: Hyper-Spherical Optimal Transport for Semantic Alignment of Text-to-3D Generation

Abstract:Recent CLIP-guided 3D generation methods have achieved promising results but struggle with generating faithful 3D shapes that conform with input text due to the gap between text and image embeddings. To this end, this paper proposes HOTS3D which makes the first attempt to effectively bridge this gap by aligning text features to the image features with spherical optimal transport (SOT). However, in high-dimensional situations, solving the SOT remains a challenge. To obtain the SOT map for high-dimensional features obtained from CLIP encoding of two modalities, we mathematically formulate and derive the solution based on Villani's theorem, which can directly align two hyper-sphere distributions without manifold exponential maps. Furthermore, we implement it by leveraging input convex neural networks (ICNNs) for the optimal Kantorovich potential. With the optimally mapped features, a diffusion-based generator and a Nerf-based decoder are subsequently utilized to transform them into 3D shapes. Extensive qualitative and qualitative comparisons with state-of-the-arts demonstrate the superiority of the proposed HOTS3D for 3D shape generation, especially on the consistency with text semantics.

Via

Access Paper or Ask Questions

Global Parameterization-based Texture Space Optimization

Jun 06, 2024

Wei Chen, Yuxue Ren, Na Lei, Zhongxuan Luo, Xianfeng Gu

Figure 1 for Global Parameterization-based Texture Space Optimization

Figure 2 for Global Parameterization-based Texture Space Optimization

Figure 3 for Global Parameterization-based Texture Space Optimization

Figure 4 for Global Parameterization-based Texture Space Optimization

Abstract:Texture mapping is a common technology in the area of computer graphics, it maps the 3D surface space onto the 2D texture space. However, the loose texture space will reduce the efficiency of data storage and GPU memory addressing in the rendering process. Many of the existing methods focus on repacking given textures, but they still suffer from high computational cost and hardly produce a wholly tight texture space. In this paper, we propose a method to optimize the texture space and produce a new texture mapping which is compact based on global parameterization. The proposed method is computationally robust and efficient. Experiments show the effectiveness of the proposed method and the potency in improving the storage and rendering efficiency.

* Preprint submitted to Comput. Math. Math. Phys

Via

Access Paper or Ask Questions

Backdoor Attack with Mode Mixture Latent Modification

Mar 12, 2024

Hongwei Zhang, Xiaoyin Xu, Dongsheng An, Xianfeng Gu, Min Zhang

Abstract:Backdoor attacks become a significant security concern for deep neural networks in recent years. An image classification model can be compromised if malicious backdoors are injected into it. This corruption will cause the model to function normally on clean images but predict a specific target label when triggers are present. Previous research can be categorized into two genres: poisoning a portion of the dataset with triggered images for users to train the model from scratch, or training a backdoored model alongside a triggered image generator. Both approaches require significant amount of attackable parameters for optimization to establish a connection between the trigger and the target label, which may raise suspicions as more people become aware of the existence of backdoor attacks. In this paper, we propose a backdoor attack paradigm that only requires minimal alterations (specifically, the output layer) to a clean model in order to inject the backdoor under the guise of fine-tuning. To achieve this, we leverage mode mixture samples, which are located between different modes in latent space, and introduce a novel method for conducting backdoor attacks. We evaluate the effectiveness of our method on four popular benchmark datasets: MNIST, CIFAR-10, GTSRB, and TinyImageNet.

Via

Access Paper or Ask Questions

DPM-OT: A New Diffusion Probabilistic Model Based on Optimal Transport

Jul 21, 2023

Zezeng Li, ShengHao Li, Zhanpeng Wang, Na Lei, Zhongxuan Luo, Xianfeng Gu

Figure 1 for DPM-OT: A New Diffusion Probabilistic Model Based on Optimal Transport

Figure 2 for DPM-OT: A New Diffusion Probabilistic Model Based on Optimal Transport

Figure 3 for DPM-OT: A New Diffusion Probabilistic Model Based on Optimal Transport

Figure 4 for DPM-OT: A New Diffusion Probabilistic Model Based on Optimal Transport

Abstract:Sampling from diffusion probabilistic models (DPMs) can be viewed as a piecewise distribution transformation, which generally requires hundreds or thousands of steps of the inverse diffusion trajectory to get a high-quality image. Recent progress in designing fast samplers for DPMs achieves a trade-off between sampling speed and sample quality by knowledge distillation or adjusting the variance schedule or the denoising equation. However, it can't be optimal in both aspects and often suffer from mode mixture in short steps. To tackle this problem, we innovatively regard inverse diffusion as an optimal transport (OT) problem between latents at different stages and propose the DPM-OT, a unified learning framework for fast DPMs with a direct expressway represented by OT map, which can generate high-quality samples within around 10 function evaluations. By calculating the semi-discrete optimal transport map between the data latents and the white noise, we obtain an expressway from the prior distribution to the data distribution, while significantly alleviating the problem of mode mixture. In addition, we give the error bound of the proposed method, which theoretically guarantees the stability of the algorithm. Extensive experiments validate the effectiveness and advantages of DPM-OT in terms of speed and quality (FID and mode mixture), thus representing an efficient solution for generative modeling. Source codes are available at https://github.com/cognaclee/DPM-OT

* iccv2023 accepted

Via

Access Paper or Ask Questions

Large language models improve Alzheimer's disease diagnosis using multi-modality data

May 26, 2023

Yingjie Feng, Jun Wang, Xianfeng Gu, Xiaoyin Xu, Min Zhang

Abstract:In diagnosing challenging conditions such as Alzheimer's disease (AD), imaging is an important reference. Non-imaging patient data such as patient information, genetic data, medication information, cognitive and memory tests also play a very important role in diagnosis. Effect. However, limited by the ability of artificial intelligence models to mine such information, most of the existing models only use multi-modal image data, and cannot make full use of non-image data. We use a currently very popular pre-trained large language model (LLM) to enhance the model's ability to utilize non-image data, and achieved SOTA results on the ADNI dataset.

Via

Access Paper or Ask Questions

What's the Situation with Intelligent Mesh Generation: A Survey and Perspectives

Nov 11, 2022

Zezeng Li, Zebin Xu, Ying Li, Xianfeng Gu, Na Lei

Figure 1 for What's the Situation with Intelligent Mesh Generation: A Survey and Perspectives

Figure 2 for What's the Situation with Intelligent Mesh Generation: A Survey and Perspectives

Figure 3 for What's the Situation with Intelligent Mesh Generation: A Survey and Perspectives

Figure 4 for What's the Situation with Intelligent Mesh Generation: A Survey and Perspectives

Abstract:Intelligent mesh generation (IMG) refers to a technique to generate mesh by machine learning, which is a relatively new and promising research field. Within its short life span, IMG has greatly expanded the generalizability and practicality of mesh generation techniques and brought many breakthroughs and potential possibilities for mesh generation. However, there is a lack of surveys focusing on IMG methods covering recent works. In this paper, we are committed to a systematic and comprehensive survey describing the contemporary IMG landscape. Focusing on 110 preliminary IMG methods, we conducted an in-depth analysis and evaluation from multiple perspectives, including the core technique and application scope of the algorithm, agent learning goals, data types, targeting challenges, advantages and limitations. With the aim of literature collection and classification based on content extraction, we propose three different taxonomies from three views of key technique, output mesh unit element, and applicable input data types. Finally, we highlight some promising future research directions and challenges in IMG. To maximize the convenience of readers, a project page of IMG is provided at \url{https://github.com/xzb030/IMG_Survey}.

Via

Access Paper or Ask Questions

Efficient Optimal Transport Algorithm by Accelerated Gradient descent

Apr 12, 2021

Dongsheng An, Na Lei, Xianfeng Gu

Figure 1 for Efficient Optimal Transport Algorithm by Accelerated Gradient descent

Figure 2 for Efficient Optimal Transport Algorithm by Accelerated Gradient descent

Figure 3 for Efficient Optimal Transport Algorithm by Accelerated Gradient descent

Figure 4 for Efficient Optimal Transport Algorithm by Accelerated Gradient descent

Abstract:Optimal transport (OT) plays an essential role in various areas like machine learning and deep learning. However, computing discrete optimal transport plan for large scale problems with adequate accuracy and efficiency is still highly challenging. Recently, methods based on the Sinkhorn algorithm add an entropy regularizer to the prime problem and get a trade off between efficiency and accuracy. In this paper, we propose a novel algorithm to further improve the efficiency and accuracy based on Nesterov's smoothing technique. Basically, the non-smooth c-transform of the Kantorovich potential is approximated by the smooth Log-Sum-Exp function, which finally smooths the original non-smooth Kantorovich dual functional (energy). The smooth Kantorovich functional can be optimized by the fast proximal gradient algorithm (FISTA) efficiently. Theoretically, the computational complexity of the proposed method is given by $O(n^{\frac{5}{2}} \sqrt{\log n} /\epsilon)$, which is lower than that of the Sinkhorn algorithm. Empirically, compared with the Sinkhorn algorithm, our experimental results demonstrate that the proposed method achieves faster convergence and better accuracy with the same parameter.

Via

Access Paper or Ask Questions

AE-OT-GAN: Training GANs from data specific latent distribution

Jan 27, 2020

Dongsheng An, Yang Guo, Min Zhang, Xin Qi, Na Lei, Shing-Tung Yau, Xianfeng Gu

Figure 1 for AE-OT-GAN: Training GANs from data specific latent distribution

Figure 2 for AE-OT-GAN: Training GANs from data specific latent distribution

Figure 3 for AE-OT-GAN: Training GANs from data specific latent distribution

Figure 4 for AE-OT-GAN: Training GANs from data specific latent distribution

Abstract:Though generative adversarial networks (GANs) areprominent models to generate realistic and crisp images,they often encounter the mode collapse problems and arehard to train, which comes from approximating the intrinsicdiscontinuous distribution transform map with continuousDNNs. The recently proposed AE-OT model addresses thisproblem by explicitly computing the discontinuous distribu-tion transform map through solving a semi-discrete optimaltransport (OT) map in the latent space of the autoencoder.However the generated images are blurry. In this paper, wepropose the AE-OT-GAN model to utilize the advantages ofthe both models: generate high quality images and at thesame time overcome the mode collapse/mixture problems.Specifically, we first faithfully embed the low dimensionalimage manifold into the latent space by training an autoen-coder (AE). Then we compute the optimal transport (OT)map that pushes forward the uniform distribution to the la-tent distribution supported on the latent manifold. Finally,our GAN model is trained to generate high quality imagesfrom the latent distribution, the distribution transform mapfrom which to the empirical data distribution will be con-tinuous. The paired data between the latent code and thereal images gives us further constriction about the generator.Experiments on simple MNIST dataset and complex datasetslike Cifar-10 and CelebA show the efficacy and efficiency ofour proposed method.

Via

Access Paper or Ask Questions

Mode Collapse and Regularity of Optimal Transportation Maps

Feb 08, 2019

Na lei, Yang Guo, Dongsheng An, Xin Qi, Zhongxuan Luo, Shing-Tung Yau, Xianfeng Gu

Figure 1 for Mode Collapse and Regularity of Optimal Transportation Maps

Figure 2 for Mode Collapse and Regularity of Optimal Transportation Maps

Figure 3 for Mode Collapse and Regularity of Optimal Transportation Maps

Figure 4 for Mode Collapse and Regularity of Optimal Transportation Maps

Abstract:This work builds the connection between the regularity theory of optimal transportation map, Monge-Amp\`{e}re equation and GANs, which gives a theoretic understanding of the major drawbacks of GANs: convergence difficulty and mode collapse. According to the regularity theory of Monge-Amp\`{e}re equation, if the support of the target measure is disconnected or just non-convex, the optimal transportation mapping is discontinuous. General DNNs can only approximate continuous mappings. This intrinsic conflict leads to the convergence difficulty and mode collapse in GANs. We test our hypothesis that the supports of real data distribution are in general non-convex, therefore the discontinuity is unavoidable using an Autoencoder combined with discrete optimal transportation map (AE-OT framework) on the CelebA data set. The testing result is positive. Furthermore, we propose to approximate the continuous Brenier potential directly based on discrete Brenier theory to tackle mode collapse. Comparing with existing method, this method is more accurate and effective.

Via

Access Paper or Ask Questions