Jacek Naruniec

FastGHA: Generalized Few-Shot 3D Gaussian Head Avatars with Real-Time Animation

Jan 20, 2026

Xinya Ji, Sebastian Weiss, Manuel Kansy, Jacek Naruniec, Xun Cao, Barbara Solenthaler, Derek Bradley

Abstract: Despite recent progress in 3D Gaussian-based head avatar modeling, efficiently generating high-fidelity avatars remains a challenge. Current methods typically rely on extensive multi-view capture setups or monocular videos with per-identity optimization during inference, limiting their scalability and ease of use on unseen subjects. To overcome these efficiency drawbacks, we propose FastGHA, a feed-forward method to generate high-quality Gaussian head avatars from only a few input images while supporting real-time animation. Our approach directly learns a per-pixel Gaussian representation from the input images, and aggregates multi-view information using a transformer-based encoder that fuses image features from both DINOv3 and Stable Diffusion VAE. For real-time animation, we extend the explicit Gaussian representations with per-Gaussian features and introduce a lightweight MLP-based dynamic network to predict 3D Gaussian deformations from expression codes. Furthermore, to enhance geometric smoothness of the 3D head, we employ point maps from a pre-trained large reconstruction model as geometry supervision. Experiments show that our approach significantly outperforms existing methods in both rendering quality and inference efficiency, while supporting real-time dynamic avatar animation.
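
As a rough illustration of the dynamic network mentioned in the abstract, the sketch below feeds each Gaussian's feature vector, concatenated with a shared expression code, through a small MLP that outputs a 3D position offset. The layer sizes, the feature and code dimensions, the random untrained weights, and the choice to deform only the Gaussian centers are all assumptions made for this example; they are not taken from the paper.

```python
import numpy as np

def init_mlp(in_dim, hidden_dim, out_dim, rng):
    """Random weights for a two-layer MLP (illustrative and untrained)."""
    return {
        "w1": rng.normal(0.0, 0.1, size=(in_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "w2": rng.normal(0.0, 0.1, size=(hidden_dim, out_dim)),
        "b2": np.zeros(out_dim),
    }

def predict_offsets(params, gaussian_features, expression_code):
    """Predict one 3D position offset per Gaussian.

    gaussian_features: (N, F) per-Gaussian feature vectors
    expression_code:   (E,)   expression code shared by all Gaussians
    returns:           (N, 3) position offsets
    """
    n = gaussian_features.shape[0]
    # Repeat the shared expression code for every Gaussian.
    tiled = np.broadcast_to(expression_code, (n, expression_code.shape[0]))
    x = np.concatenate([gaussian_features, tiled], axis=1)
    hidden = np.maximum(x @ params["w1"] + params["b1"], 0.0)  # ReLU
    return hidden @ params["w2"] + params["b2"]

rng = np.random.default_rng(0)
num_gaussians, feat_dim, expr_dim = 1000, 32, 64  # assumed sizes
params = init_mlp(feat_dim + expr_dim, 128, 3, rng)
means = rng.normal(size=(num_gaussians, 3))        # canonical Gaussian centers
features = rng.normal(size=(num_gaussians, feat_dim))
expression = rng.normal(size=expr_dim)
deformed_means = means + predict_offsets(params, features, expression)
```

Because the network runs once per frame as a batched matrix product over all Gaussians, a design of this kind is cheap enough to be plausible for real-time animation.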

Reenact Anything: Semantic Video Motion Transfer Using Motion-Textual Inversion

Aug 01, 2024

Manuel Kansy, Jacek Naruniec, Christopher Schroers, Markus Gross, Romann M. Weber

Abstract: Recent years have seen a tremendous improvement in the quality of video generation and editing approaches. While several techniques focus on editing appearance, few address motion. Current approaches using text, trajectories, or bounding boxes are limited to simple motions, so we specify motions with a single motion reference video instead. We further propose to use a pre-trained image-to-video model rather than a text-to-video model. This approach allows us to preserve the exact appearance and position of a target object or scene and helps disentangle appearance from motion. Our method, called motion-textual inversion, leverages our observation that image-to-video models extract appearance mainly from the (latent) image input, while the text/image embedding injected via cross-attention predominantly controls motion. We thus represent motion using text/image embedding tokens. By operating on an inflated motion-text embedding containing multiple text/image embedding tokens per frame, we achieve a high temporal motion granularity. Once optimized on the motion reference video, this embedding can be applied to various target images to generate videos with semantically similar motions. Our approach does not require spatial alignment between the motion reference video and target image, generalizes across various domains, and can be applied to various tasks such as full-body and face reenactment, as well as controlling the motion of inanimate objects and the camera. We empirically demonstrate the effectiveness of our method in the semantic video motion transfer task, significantly outperforming existing methods in this context.

* Preprint. All videos in this paper are best viewed as animations with Acrobat Reader by pressing the highlighted frame of each video

Controllable Inversion of Black-Box Face-Recognition Models via Diffusion

Mar 23, 2023

Manuel Kansy, Anton Raël, Graziana Mignone, Jacek Naruniec, Christopher Schroers, Markus Gross, Romann M. Weber

Abstract: Face recognition models embed a face image into a low-dimensional identity vector containing abstract encodings of identity-specific facial features that allow individuals to be distinguished from one another. We tackle the challenging task of inverting the latent space of pre-trained face recognition models without full model access (i.e., the black-box setting). A variety of methods have been proposed in the literature for this task, but they have serious shortcomings such as a lack of realistic outputs, long inference times, and strong requirements for the data set and accessibility of the face recognition model. Through an analysis of the black-box inversion problem, we show that the conditional diffusion model loss naturally emerges and that we can effectively sample from the inverse distribution even without an identity-specific loss. Our method, named identity denoising diffusion probabilistic model (ID3PM), leverages the stochastic nature of the denoising diffusion process to produce high-quality, identity-preserving face images with various backgrounds, lighting, poses, and expressions. We demonstrate state-of-the-art performance in terms of identity preservation and diversity both qualitatively and quantitatively. Our method is the first black-box face recognition model inversion method that offers intuitive control over the generation process and does not suffer from any of the common shortcomings of competing methods.

* 34 pages. Preprint. Under review
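
For orientation, the conditional diffusion model loss referred to in the abstract is usually written in the standard simplified DDPM form shown below. The notation is the common textbook one and is an assumption here, not necessarily the paper's: x_0 is a face image, y is its identity vector obtained by querying the black-box face recognition model, t is the diffusion step, and the network epsilon_theta predicts the added noise.

```latex
\[
\mathcal{L}(\theta)
  = \mathbb{E}_{x_0,\, y,\, t,\, \epsilon \sim \mathcal{N}(0, I)}
    \left[ \left\lVert \epsilon - \epsilon_\theta(x_t, t, y) \right\rVert_2^2 \right],
\qquad
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon .
\]
```

The loss only needs the identity vectors as conditioning inputs, so no gradients through the face recognition model are required, which is consistent with the black-box setting described above.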

Augmentation for small object detection

Feb 19, 2019

Mate Kisantal, Zbigniew Wojna, Jakub Murawski, Jacek Naruniec, Kyunghyun Cho

Abstract: In recent years, object detection has experienced impressive progress. Despite these improvements, there is still a significant gap in performance between the detection of small and large objects. We analyze the current state-of-the-art model, Mask-RCNN, on a challenging dataset, MS COCO. We show that the overlap between small ground-truth objects and the predicted anchors is much lower than the expected IoU threshold. We conjecture this is due to two factors: (1) only a few images contain small objects, and (2) small objects do not appear often enough even within the images that contain them. We thus propose to oversample those images with small objects and augment each of those images by copy-pasting small objects many times. This allows us to trade off the quality of the detector on large objects against that on small objects. We evaluate different pasting augmentation strategies, and ultimately, we achieve a 9.7% relative improvement on instance segmentation and 7.1% on object detection of small objects, compared to the current state-of-the-art method on MS COCO.
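
The copy-pasting idea can be sketched as follows: cut out a small object using its mask and paste copies of it at random positions in the same image. The function names, the rejection of positions that overlap existing boxes, and the retry limit are assumptions of this example rather than details from the paper, which evaluates several pasting strategies.

```python
import numpy as np

def boxes_overlap(a, b):
    """True if two (x0, y0, x1, y1) boxes with exclusive upper bounds intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def copy_paste_small_object(image, mask, box, existing_boxes, num_copies, rng,
                            max_tries=50):
    """Paste up to `num_copies` copies of a masked object at free locations.

    image: (H, W, C) array, mask: (H, W) boolean array marking the object,
    box: tight (x0, y0, x1, y1) box around the object.
    Returns the augmented image and the boxes of the pasted copies.
    """
    out = image.copy()
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    patch = image[y0:y1, x0:x1]
    patch_mask = mask[y0:y1, x0:x1]
    occupied = list(existing_boxes)
    pasted = []
    height, width = image.shape[:2]
    for _ in range(num_copies):
        for _ in range(max_tries):
            nx = int(rng.integers(0, width - w + 1))
            ny = int(rng.integers(0, height - h + 1))
            candidate = (nx, ny, nx + w, ny + h)
            if any(boxes_overlap(candidate, b) for b in occupied):
                continue  # try another location
            region = out[ny:ny + h, nx:nx + w]   # view into `out`
            region[patch_mask] = patch[patch_mask]
            occupied.append(candidate)
            pasted.append(candidate)
            break
    return out, pasted

rng = np.random.default_rng(0)
image = np.zeros((64, 64, 3), dtype=np.uint8)
image[10:14, 10:14] = 255                     # a 4x4 "small object"
mask = np.zeros((64, 64), dtype=bool)
mask[10:14, 10:14] = True
box = (10, 10, 14, 14)
augmented, pasted = copy_paste_small_object(image, mask, box, [box], 3, rng)
```

In a real training pipeline the pasted boxes and masks would also be appended to the image's annotations so that the detector is supervised on the new instances.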

Deep Alignment Network: A convolutional neural network for robust face alignment

Aug 10, 2017

Marek Kowalski, Jacek Naruniec, Tomasz Trzcinski

Abstract: In this paper, we propose Deep Alignment Network (DAN), a robust face alignment method based on a deep neural network architecture. DAN consists of multiple stages, where each stage improves the locations of the facial landmarks estimated by the previous stage. Our method uses entire face images at all stages, in contrast to recently proposed face alignment methods that rely on local patches. This is possible thanks to the use of landmark heatmaps, which provide visual information about landmark locations estimated at the previous stages of the algorithm. The use of entire face images rather than patches allows DAN to handle face images with large variation in head pose and difficult initializations. An extensive evaluation on two publicly available datasets shows that DAN reduces the state-of-the-art failure rate by up to 70%. Our method has also been submitted for evaluation as part of the Menpo challenge.

* IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW) 2017
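
To make the landmark heatmap idea concrete, the sketch below renders a single-channel image whose intensity is highest at the current landmark estimates and falls off with distance to the nearest landmark. The inverse-distance formula and the function name are assumptions chosen for simplicity; the abstract does not specify the exact form used in DAN.

```python
import numpy as np

def landmark_heatmap(landmarks, height, width):
    """Render a heatmap with value 1 / (1 + distance to the nearest landmark).

    landmarks: (K, 2) array of (x, y) pixel coordinates
    returns:   (height, width) array indexed as heatmap[y, x]
    """
    ys, xs = np.mgrid[0:height, 0:width]
    grid = np.stack([xs, ys], axis=-1).astype(np.float64)      # (H, W, 2)
    diffs = grid[:, :, None, :] - landmarks[None, None, :, :]  # (H, W, K, 2)
    nearest = np.linalg.norm(diffs, axis=-1).min(axis=-1)      # (H, W)
    return 1.0 / (1.0 + nearest)

landmarks = np.array([[8.0, 8.0], [24.0, 8.0], [16.0, 20.0]])
heatmap = landmark_heatmap(landmarks, 32, 32)
```

A later stage could then take such a heatmap as an extra input channel next to the full face image, which is how information about the previous estimate would reach the network without cropping local patches.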

Face Alignment Using K-Cluster Regression Forests With Weighted Splitting

Jun 06, 2017

Marek Kowalski, Jacek Naruniec

Abstract: In this work, we present a face alignment pipeline based on two novel methods: weighted splitting for K-cluster Regression Forests and 3D Affine Pose Regression for face shape initialization. Our face alignment method is based on the Local Binary Feature framework, where instead of the standard regression forests and pixel difference features used in the original method, we use our K-cluster Regression Forests with Weighted Splitting (KRFWS) and Pyramid HOG features. We also use KRFWS to perform Affine Pose Regression (APR) and 3D-Affine Pose Regression (3D-APR), which are intended to improve the face shape initialization. APR applies a rigid 2D transform to the initial face shape that compensates for inaccuracy in the initial face location, size and in-plane rotation. 3D-APR estimates the parameters of a 3D transform that additionally compensates for out-of-plane rotation. The resulting pipeline, consisting of APR and 3D-APR followed by face alignment, shows an improvement of 20% over standard LBF on the challenging IBUG dataset, and state-of-the-art accuracy on the entire 300-W dataset.

* IEEE Signal Processing Letters, vol. 23, no. 11, pp. 1567-1571 (Nov. 2016)
* Postprint of an article published in IEEE Signal Processing Letters in 2016. A video explaining the method: https://www.youtube.com/watch?v=F4tgihZLrYw
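
The 2D correction that APR applies to the initial face shape can be illustrated with a transform made of a scale, an in-plane rotation, and a translation. The parameterization and the function name below are assumptions for this sketch; how KRFWS regresses the parameters from image features is not shown.

```python
import numpy as np

def apply_similarity(shape, scale, angle_rad, translation):
    """Scale, rotate, and translate a (K, 2) array of (x, y) landmarks."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rotation = np.array([[c, -s], [s, c]])
    # Row-vector convention: p @ R.T equals (R @ p) for every landmark p.
    return scale * shape @ rotation.T + np.asarray(translation)

mean_shape = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
aligned = apply_similarity(mean_shape, 2.0, np.pi / 2, (10.0, 5.0))
# aligned is approximately [[10, 3], [10, 7], [8, 5]]
```

Here the scale accounts for face size, the rotation for in-plane rotation, and the translation for face location, matching the three sources of inaccuracy listed in the abstract.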