Abstract: Gradient inversion attacks can leak data privacy when clients share weight updates with the server in federated learning (FL). Existing studies mainly use L2 or cosine distance as the loss function for gradient matching in the attack. Our empirical investigation shows that the vulnerability ranking of samples varies with the loss function used. The gradient norm, which is commonly used as a vulnerability proxy for gradient inversion attacks, cannot explain this, as it remains constant regardless of the gradient matching loss function. In this paper, we propose, for the first time, a loss-aware vulnerability proxy (LAVP). LAVP refers to either the maximum or the minimum eigenvalue of the Hessian of the gradient matching loss at the ground truth. This proposal is based on our theoretical findings regarding the local optimization of gradient inversion in the proximity of the ground truth, which corresponds to the worst-case attack scenario. We demonstrate the effectiveness of LAVP on various architectures and datasets, showing its consistent superiority over the gradient norm in capturing sample vulnerability. The performance of each proxy is measured in terms of Spearman's rank correlation with several similarity scores. This work will contribute to enhancing FL security against potential loss functions beyond L2 or cosine distance in the future.
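A minimal sketch of how such a proxy could be computed, assuming an L2 gradient-matching loss and a toy linear classifier; the helper names (`client_grads`, `matching_loss`) are ours, not the paper's:

```python
# Sketch: extreme Hessian eigenvalues of a gradient-matching loss at the
# ground truth, our reading of LAVP (toy model, L2 matching loss assumed).
import torch
from torch.autograd.functional import hessian

torch.manual_seed(0)
model = torch.nn.Linear(8, 3)           # stand-in for the client model
x_true = torch.randn(1, 8)              # ground-truth client sample
y_true = torch.tensor([1])

def client_grads(x):
    """Flattened parameter gradients for input x (the shared update)."""
    loss = torch.nn.functional.cross_entropy(model(x), y_true)
    grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
    return torch.cat([g.reshape(-1) for g in grads])

g_star = client_grads(x_true).detach()  # gradients observed by the server

def matching_loss(x_flat):
    """L2 gradient-matching loss the attacker would minimize."""
    return ((client_grads(x_flat.view(1, 8)) - g_star) ** 2).sum()

# Hessian w.r.t. the input, evaluated at the ground truth.
H = hessian(matching_loss, x_true.reshape(-1))
eigvals = torch.linalg.eigvalsh(H)
print(f"max eig: {eigvals[-1]:.4f}, min eig: {eigvals[0]:.4f}")
```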
Abstract: This work addresses the problem of reconstructing a 3D human body mesh from a given 2D image. Despite the inherent ambiguity of the human mesh recovery task, most existing works regress a single output. In contrast, we propose a generative framework, called "Diffusion-based Human Mesh Recovery (Diff-HMR)", that takes advantage of the denoising diffusion process to account for multiple plausible outcomes. During the training phase, SMPL parameters are diffused from the ground-truth parameters to a random distribution, and Diff-HMR learns the reverse of this diffusion process. In the inference phase, the model progressively refines given random SMPL parameters into parameters that align with the input image. Being a generative approach, Diff-HMR is capable of generating diverse results for the same input image as the input noise varies. Our validation experiments demonstrate that the proposed framework effectively models the inherent ambiguity of human mesh recovery in a probabilistic manner. The code is available at https://github.com/hanbyel0105/Diff-HMR
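An illustrative sketch of the described training scheme, not the released Diff-HMR code: the noise schedule, feature dimensions, and denoiser architecture below are assumptions.

```python
# Sketch: diffusing SMPL parameters and training a conditional denoiser
# on image features (all shapes and the schedule are illustrative).
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # assumed linear schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

denoiser = torch.nn.Sequential(                   # stand-in for the model
    torch.nn.Linear(82 + 512 + 1, 256), torch.nn.SiLU(), torch.nn.Linear(256, 82)
)

def training_step(theta_gt, img_feat):
    """theta_gt: (B, 82) ground-truth SMPL params; img_feat: (B, 512)."""
    B = theta_gt.shape[0]
    t = torch.randint(0, T, (B,))
    eps = torch.randn_like(theta_gt)
    a = alpha_bar[t].unsqueeze(1)
    theta_t = a.sqrt() * theta_gt + (1 - a).sqrt() * eps  # forward diffusion
    inp = torch.cat([theta_t, img_feat, t.unsqueeze(1) / T], dim=1)
    return torch.nn.functional.mse_loss(denoiser(inp), eps)  # predict noise

loss = training_step(torch.randn(4, 82), torch.randn(4, 512))
loss.backward()
```

At inference, repeating the reverse step from different random seeds would yield the diverse plausible meshes the abstract describes.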
Abstract: From an image of a person, we can easily infer the natural 3D pose and shape of the person even when ambiguity exists. This is because we have a mental model that allows us to imagine the person's appearance from different viewing directions given a single image, and to exploit the consistency between those views for inference. However, due to their structural limitations, existing human mesh recovery methods only consider the direction in which the image was taken. Hence, we propose "Implicit 3D Human Mesh Recovery (ImpHMR)", which can implicitly imagine a person in 3D space at the feature level via Neural Feature Fields. In ImpHMR, a feature field is generated by a CNN-based image encoder for a given image. A 2D feature map is then volume-rendered from the feature field for a given viewing direction, and the pose and shape parameters are regressed from the rendered feature. To exploit consistency with the pose and shape from unseen views, when 3D labels are available, the model predicts results, including the silhouette, from an arbitrary direction and constrains them to match the rotated ground truth. When only 2D labels are available, we perform self-supervised learning through the constraint that the pose and shape parameters inferred from different directions should be identical. Extensive evaluations show the efficacy of the proposed method.
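A rough sketch of the 2D-label consistency constraint as we read it from the abstract; the toy "rendering" and both module definitions are our stand-ins, not the authors' implementation.

```python
# Sketch: parameters regressed from two viewing directions of the same
# feature field are pushed to agree (self-supervised consistency).
import torch

feat_field = torch.nn.Linear(3 + 3, 64)  # stand-in: (point, view dir) -> feature
regressor = torch.nn.Linear(64, 82)      # feature -> SMPL pose/shape params

def render_feature(points, view_dir):
    """Toy stand-in for volume rendering: pool features along sampled points."""
    d = view_dir.expand_as(points)
    return feat_field(torch.cat([points, d], dim=-1)).mean(dim=0)

points = torch.randn(128, 3)             # sampled 3D points
dir_a = torch.tensor([0.0, 0.0, 1.0])    # input-image direction
dir_b = torch.tensor([1.0, 0.0, 0.0])    # unseen view

theta_a = regressor(render_feature(points, dir_a))
theta_b = regressor(render_feature(points, dir_b))
consistency_loss = torch.nn.functional.mse_loss(theta_a, theta_b)
consistency_loss.backward()
```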
Abstract: Since the beginning of the worldwide COVID-19 pandemic, facial masks have been recommended to limit the spread of the disease. However, these masks hide certain facial attributes, making it difficult for existing face recognition systems to perform identity verification on masked faces. In this context, Masked Face Recognition (MFR) is needed for contactless biometric recognition systems. Thus, in this paper, we propose Complementary Attention Learning and Multi-Focal Spatial Attention, which precisely remove the masked region by training complementary spatial attention to focus on two distinct regions: masked regions and backgrounds. In our method, the standard spatial attention and the network focus on unmasked regions to extract mask-invariant features while minimizing the loss of conventional Face Recognition (FR) performance. For conventional FR, we evaluate performance on the IJB-C, Age-DB, CALFW, and CPLFW datasets. We evaluate MFR performance on the ICCV2021-MFR/Insightface track and demonstrate improved performance on both the MFR and FR datasets. Additionally, we empirically verify that the spatial attention of the proposed method is activated more precisely in unmasked regions.
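A hedged sketch of the complementary-attention idea as we read it from the abstract; the module name and 1x1-conv attention head are our assumptions.

```python
# Sketch: one attention map attends to unmasked face regions; its
# complement covers masked regions and background.
import torch

conv_attn = torch.nn.Conv2d(64, 1, kernel_size=1)  # spatial attention head

def complementary_attention(feat):
    """feat: (B, 64, H, W). Returns mask-invariant and complementary features."""
    A = torch.sigmoid(conv_attn(feat))   # attends to unmasked face regions
    return feat * A, feat * (1.0 - A)    # complement: mask + background

feat = torch.randn(2, 64, 7, 7)
face_feat, comp_feat = complementary_attention(feat)
print(face_feat.shape, comp_feat.shape)  # both torch.Size([2, 64, 7, 7])
```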
Abstract: The Non-local (NL) block is a popular module that can model global contexts. However, the NL block generally incurs heavy computation and memory costs, so it is impractical to apply it to high-resolution feature maps. In this paper, to investigate the efficacy of the NL block, we empirically analyze whether the magnitude and direction of input feature vectors properly affect the attention between vectors. The results show the inefficacy of the softmax operation, which is generally used to normalize the attention map of the NL block. Attention maps normalized with the softmax operation rely heavily on the magnitude of the key vectors, and performance degrades if this magnitude information is removed. By replacing the softmax operation with a scaling factor, we demonstrate improved performance on CIFAR-10, CIFAR-100, and Tiny-ImageNet. In addition, our method is robust to embedding channel reduction and embedding weight initialization. Notably, our method makes multi-head attention employable without additional computational cost.
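A minimal sketch contrasting softmax normalization with the scaling-factor normalization the abstract describes; the exact choice of scale (1/N here) is our assumption.

```python
# Sketch: NL-block attention with softmax vs. a plain scaling factor.
import torch

def nl_attention(q, k, v, use_softmax=False):
    """q, k, v: (B, N, C) embedded feature vectors of an NL block."""
    attn = torch.bmm(q, k.transpose(1, 2))   # (B, N, N) pairwise similarities
    if use_softmax:
        attn = torch.softmax(attn, dim=-1)   # magnitude-sensitive normalization
    else:
        attn = attn / attn.shape[-1]         # scaling factor (assumed 1/N)
    return torch.bmm(attn, v)

q = k = v = torch.randn(2, 16, 8)
out = nl_attention(q, k, v)
print(out.shape)                             # torch.Size([2, 16, 8])
```

Because the scaling variant is linear in the attention logits, splitting it into multiple heads adds no extra normalization cost, consistent with the multi-head claim above.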
Abstract: When a high-resolution (HR) image is degraded into a low-resolution (LR) image, some of the existing information is lost. Consequently, multiple HR images can correspond to a single LR image. Most existing methods do not consider the uncertainty caused by this stochastic attribute, which can only be inferred probabilistically. As a result, the predicted HR images are often blurry because the network tries to reflect all possibilities in a single output image. To overcome this limitation, this paper proposes a novel face super-resolution (SR) scheme that takes the uncertainty into account through stochastic modeling. Specifically, the information in LR images is separately encoded into deterministic and stochastic attributes. Furthermore, an Input Conditional Attribute Predictor is proposed and trained separately to predict the partially alive stochastic attributes from the LR images alone. Extensive evaluation shows that the proposed method successfully reduces the uncertainty in the learning process and outperforms the existing state-of-the-art approaches.
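A loose sketch of the deterministic/stochastic split, assuming a VAE-style reparameterized latent; all layer shapes and names are our construction.

```python
# Sketch: encode an LR feature into a deterministic attribute plus a
# sampled stochastic attribute, then decode an HR code.
import torch

enc_det = torch.nn.Linear(64, 32)     # deterministic attribute encoder
enc_mu = torch.nn.Linear(64, 32)      # stochastic attribute: mean
enc_logvar = torch.nn.Linear(64, 32)  # stochastic attribute: log-variance
decoder = torch.nn.Linear(64, 256)    # stand-in for the HR decoder

def super_resolve(lr_feat):
    """lr_feat: (B, 64). Samples the stochastic attribute per forward pass."""
    det = enc_det(lr_feat)
    mu, logvar = enc_mu(lr_feat), enc_logvar(lr_feat)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
    return decoder(torch.cat([det, z], dim=1))

hr_a = super_resolve(torch.randn(1, 64))  # repeated calls give distinct
hr_b = super_resolve(torch.randn(1, 64))  # plausible HR codes
```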
Abstract: Existing 3D human pose estimation algorithms trained on distortion-free datasets suffer a performance drop when applied to new scenarios with a specific camera distortion. In this paper, we propose a simple yet effective model for 3D human pose estimation in video that can quickly adapt to any distortion environment by utilizing MAML, a representative optimization-based meta-learning algorithm. We treat a sequence of 2D keypoints under a particular distortion as a single MAML task. However, owing to the absence of large-scale datasets captured in distorted environments, we propose an efficient method to generate synthetic distorted data from undistorted 2D keypoints. For the evaluation, we assume two practical testing situations depending on whether a motion capture sensor is available. In particular, we propose Inference Stage Optimization using bone-length symmetry and consistency. Extensive evaluation shows that our proposed method successfully adapts to various degrees of distortion at test time and outperforms the existing state-of-the-art approaches. The proposed method is practical because it requires neither camera calibration nor additional computation in the testing setup.
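A sketch of one way to synthesize distorted keypoints with a standard radial (Brown-Conrady style) model; the coefficients and function name are illustrative, not the paper's exact generation procedure.

```python
# Sketch: warp undistorted 2D keypoints with radial distortion; each
# sampled (k1, k2) pair could define one synthetic MAML task.
import torch

def distort_keypoints(kpts, k1=-0.2, k2=0.05):
    """kpts: (J, 2) keypoints in normalized coords centered at the principal point."""
    r2 = (kpts ** 2).sum(dim=1, keepdim=True)  # squared radius per joint
    scale = 1.0 + k1 * r2 + k2 * r2 ** 2       # radial distortion factor
    return kpts * scale

kpts = torch.rand(17, 2) * 2 - 1               # 17 joints in [-1, 1]^2
distorted = distort_keypoints(kpts)            # synthetic distorted sample
```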
Abstract: Batch Whitening is a technique that accelerates and stabilizes training by transforming input features to have zero mean (Centering) and unit variance (Scaling), and by removing linear correlation between channels (Decorrelation). In commonly used structures, which were empirically optimized for Batch Normalization, the normalization layer appears between the convolution and the activation function. Subsequent Batch Whitening studies have employed the same structure without further analysis, even though Batch Whitening was analyzed on the premise that the input of a linear layer is whitened. To bridge this gap, we propose a new Convolutional Unit that is in line with the theory, and our method generally improves the performance of Batch Whitening. Moreover, we show the inefficacy of the original Convolutional Unit by investigating the rank and correlation of features. As our method can employ off-the-shelf whitening modules, we use Iterative Normalization (IterNorm), the state-of-the-art whitening module, and obtain significantly improved performance on five image classification datasets: CIFAR-10, CIFAR-100, CUB-200-2011, Stanford Dogs, and ImageNet. Notably, we verify that our method improves the stability and performance of whitening when using a large learning rate, group size, and iteration number.
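A sketch contrasting the conventional unit with one arrangement consistent with the stated premise, that the convolution (linear layer) should receive whitened input; this ordering is our reading of the abstract, not necessarily the paper's exact unit, and BatchNorm2d stands in for a whitening module such as IterNorm.

```python
# Sketch: conventional Conv -> Norm -> ReLU unit vs. a reordered unit
# where the convolution sees normalized (whitened) input.
import torch

conventional = torch.nn.Sequential(
    torch.nn.Conv2d(16, 16, 3, padding=1),
    torch.nn.BatchNorm2d(16),              # swap in IterNorm for full whitening
    torch.nn.ReLU(),
)
reordered = torch.nn.Sequential(
    torch.nn.BatchNorm2d(16),              # whiten first: conv input is whitened
    torch.nn.Conv2d(16, 16, 3, padding=1),
    torch.nn.ReLU(),
)
x = torch.randn(2, 16, 8, 8)
print(conventional(x).shape, reordered(x).shape)
```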