Tianshu Kuai

Towards Unsupervised Blind Face Restoration using Diffusion Prior

Oct 06, 2024

Tianshu Kuai, Sina Honari, Igor Gilitschenski, Alex Levinshtein

Abstract: Blind face restoration methods have shown remarkable performance, particularly when trained on large-scale synthetic datasets with supervised learning. These datasets are often generated by simulating low-quality face images with a handcrafted image degradation pipeline. The models trained on such synthetic degradations, however, cannot deal with inputs of unseen degradations. In this paper, we address this issue by using only a set of input images, with unknown degradations and without ground truth targets, to fine-tune a restoration model that learns to map them to clean and contextually consistent outputs. We utilize a pre-trained diffusion model as a generative prior through which we generate high-quality images from the natural image distribution while maintaining the input image content through consistency constraints. These generated images are then used as pseudo targets to fine-tune a pre-trained restoration model. Unlike many recent approaches that employ diffusion models at test time, we only do so during training and thus maintain efficient inference-time performance. Extensive experiments show that the proposed approach can consistently improve the perceptual quality of pre-trained blind face restoration models while maintaining great consistency with the input contents. Our best model also achieves state-of-the-art results on both synthetic and real-world datasets.

* Project page: https://dt-bfr.github.io/

CAMM: Building Category-Agnostic and Animatable 3D Models from Monocular Videos

Apr 14, 2023

Tianshu Kuai, Akash Karthikeyan, Yash Kant, Ashkan Mirzaei, Igor Gilitschenski

Abstract: Animating an object in 3D often requires an articulated structure, e.g., a kinematic chain or skeleton of the manipulated object with proper skinning weights, to obtain smooth movements and surface deformations. However, existing models that allow direct pose manipulations are either limited to specific object categories or built with specialized equipment. To reduce the work needed for creating animatable 3D models, we propose a novel reconstruction method that learns an animatable kinematic chain for any articulated object. Our method operates on monocular videos without prior knowledge of the object's shape or underlying structure. Our approach is on par with state-of-the-art 3D surface reconstruction methods on various articulated object categories while enabling direct pose manipulations by re-posing the learned kinematic chain.

* Project page: https://camm3d.github.io/
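
To make "re-posing the learned kinematic chain" concrete, here is a minimal sketch of forward kinematics on a planar chain. It is a 2D toy under stated assumptions, not the CAMM method: the function name `forward_kinematics`, the bone lengths, and the angles are all hypothetical, and the abstract does not describe how the chain is parameterized.

```python
import math


def forward_kinematics(bone_lengths, joint_angles):
    """Return 2D joint positions of a planar kinematic chain rooted at the origin.

    joint_angles[i] is the rotation (in radians) of bone i relative to its
    parent, so changing one angle re-poses every joint further down the chain.
    This is an illustrative toy, not the representation used in the paper.
    """
    if len(bone_lengths) != len(joint_angles):
        raise ValueError("need exactly one angle per bone")
    x, y, heading = 0.0, 0.0, 0.0
    positions = [(x, y)]
    for length, angle in zip(bone_lengths, joint_angles):
        heading += angle  # accumulate the rotations of all parent joints
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        positions.append((x, y))
    return positions


# Two unit-length bones: a straight rest pose and a bent pose.
rest_pose = forward_kinematics([1.0, 1.0], [0.0, 0.0])
bent_pose = forward_kinematics([1.0, 1.0], [math.pi / 2, -math.pi / 2])
```

Only the joint angles change between the two poses; the chain itself (bone lengths and connectivity) stays fixed, which is what makes direct pose manipulation possible once a chain is available.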

Self-Supervised Image-to-Point Distillation via Semantically Tolerant Contrastive Loss

Jan 12, 2023

Anas Mahmoud, Jordan S. K. Hu, Tianshu Kuai, Ali Harakeh, Liam Paull, Steven L. Waslander

Abstract: An effective framework for learning 3D representations for perception tasks is distilling rich self-supervised image features via contrastive learning. However, image-to-point representation learning for autonomous driving datasets faces two main challenges: 1) the abundance of self-similarity, which results in the contrastive losses pushing away semantically similar point and image regions and thus disturbing the local semantic structure of the learned representations, and 2) severe class imbalance as pretraining gets dominated by over-represented classes. We propose to alleviate the self-similarity problem through a novel semantically tolerant image-to-point contrastive loss that takes into consideration the semantic distance between positive and negative image regions to minimize contrasting semantically similar point and image regions. Additionally, we address class imbalance by designing a class-agnostic balanced loss that approximates the degree of class imbalance through an aggregate sample-to-samples semantic similarity measure. We demonstrate that our semantically tolerant contrastive loss with class balancing improves state-of-the-art 2D-to-3D representation learning in all evaluation settings on 3D semantic segmentation. Our method consistently outperforms state-of-the-art 2D-to-3D representation learning frameworks across a wide range of 2D self-supervised pretrained models.
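
The idea of not contrasting semantically similar regions can be sketched as a weighted InfoNCE loss for a single point feature. This is a simplified illustration and not the loss defined in the paper: the function name `tolerant_info_nce`, the linear weight `1 - similarity`, and the temperature value are assumptions made for the example.

```python
import math


def tolerant_info_nce(pos_sim, neg_sims, neg_semantic_sims, temperature=0.07):
    """InfoNCE-style loss for one point feature with down-weighted negatives.

    pos_sim: similarity between the point feature and its matching image region.
    neg_sims: similarities between the point feature and negative image regions.
    neg_semantic_sims: semantic similarity in [0, 1] between the positive image
        region and each negative image region (hypothetical input).
    A negative that is semantically close to the positive gets a small weight,
    so the loss pushes it away less. The weighting rule is an assumption.
    """
    if len(neg_sims) != len(neg_semantic_sims):
        raise ValueError("need one semantic similarity per negative")
    # Clip to [0, 1], then turn similarity into a weight in [0, 1].
    weights = [1.0 - min(max(s, 0.0), 1.0) for s in neg_semantic_sims]
    pos_term = math.exp(pos_sim / temperature)
    neg_term = sum(w * math.exp(s / temperature) for w, s in zip(weights, neg_sims))
    return -math.log(pos_term / (pos_term + neg_term))


# With all weights equal to 1 this reduces to the standard InfoNCE loss.
standard = tolerant_info_nce(0.8, [0.7, 0.1], [0.0, 0.0])
# The first negative is semantically close to the positive, so it counts less.
tolerant = tolerant_info_nce(0.8, [0.7, 0.1], [0.9, 0.0])
```

In this toy setting `tolerant` is smaller than `standard`, because the hard negative that resembles the positive contributes less to the denominator.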

Point Density-Aware Voxels for LiDAR 3D Object Detection

Mar 22, 2022

Jordan S. K. Hu, Tianshu Kuai, Steven L. Waslander

Abstract: LiDAR has become one of the primary 3D object detection sensors in autonomous driving. However, LiDAR's diverging point pattern with increasing distance results in a non-uniformly sampled point cloud ill-suited to discretized volumetric feature extraction. Current methods either rely on voxelized point clouds or use inefficient farthest point sampling to mitigate detrimental effects caused by density variation but largely ignore point density as a feature and its predictable relationship with distance from the LiDAR sensor. Our proposed solution, Point Density-Aware Voxel network (PDV), is an end-to-end two-stage LiDAR 3D object detection architecture that is designed to account for these point density variations. PDV efficiently localizes voxel features from the 3D sparse convolution backbone through voxel point centroids. The spatially localized voxel features are then aggregated through a density-aware RoI grid pooling module using kernel density estimation (KDE) and self-attention with point density positional encoding. Finally, we exploit LiDAR's point density to distance relationship to refine our final bounding box confidences. PDV outperforms all state-of-the-art methods on the Waymo Open Dataset and achieves competitive results on the KITTI dataset. We provide a code release for PDV which is available at https://github.com/TRAILab/PDV.

* Accepted in CVPR 2022
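
As a rough illustration of "voxel point centroids", the sketch below groups points into voxels and returns, for each occupied voxel, the mean of the points inside it together with a raw point count. It is not taken from the PDV code release: the function name `voxel_point_centroids` and the sample points are hypothetical, and the per-voxel count is only a crude stand-in for the KDE-based density the abstract mentions.

```python
import math
from collections import defaultdict


def voxel_point_centroids(points, voxel_size):
    """Map each occupied voxel index to (centroid of its points, point count).

    The centroid is the mean of the points that fall inside the voxel, which
    generally differs from the geometric center of the voxel. The count is a
    simple density proxy; the paper uses kernel density estimation instead.
    """
    buckets = defaultdict(list)
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        buckets[key].append((x, y, z))
    result = {}
    for key, pts in buckets.items():
        n = len(pts)
        centroid = tuple(sum(coords) / n for coords in zip(*pts))
        result[key] = (centroid, n)
    return result


# Two points share the voxel at index (0, 0, 0); one point sits in (1, 0, 0).
points = [(0.1, 0.1, 0.1), (0.3, 0.1, 0.1), (1.2, 0.0, 0.0)]
voxels = voxel_point_centroids(points, voxel_size=1.0)
```

Here the centroid of voxel `(0, 0, 0)` lies near `(0.2, 0.1, 0.1)` rather than at the voxel center `(0.5, 0.5, 0.5)`, which shows why centroids localize features more precisely in sparsely sampled regions.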
