Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Donghwan Kim

DeClotH: Decomposable 3D Cloth and Human Body Reconstruction from a Single Image

Mar 25, 2025

Hyeongjin Nam, Donghwan Kim, Jeongtaek Oh, Kyoung Mu Lee

Abstract:Most existing methods of 3D clothed human reconstruction from a single image treat the clothed human as a single object without distinguishing between cloth and human body. In this regard, we present DeClotH, which separately reconstructs 3D cloth and human body from a single image. This task remains largely unexplored due to the extreme occlusion between cloth and the human body, making it challenging to infer accurate geometries and textures. Moreover, while recent 3D human reconstruction methods have achieved impressive results using text-to-image diffusion models, directly applying such an approach to this problem often leads to incorrect guidance, particularly in reconstructing 3D cloth. To address these challenges, we propose two core designs in our framework. First, to alleviate the occlusion issue, we leverage 3D template models of cloth and human body as regularizations, which provide strong geometric priors to prevent erroneous reconstruction by the occlusion. Second, we introduce a cloth diffusion model specifically designed to provide contextual information about cloth appearance, thereby enhancing the reconstruction of 3D cloth. Qualitative and quantitative experiments demonstrate that our proposed approach is highly effective in reconstructing both 3D cloth and the human body. More qualitative results are provided at https://hygenie1228.github.io/DeClotH/.

* Published at CVPR 2025, 17 pages including the supplementary material

Via

Access Paper or Ask Questions

Stochastic Extragradient with Flip-Flop Shuffling & Anchoring: Provable Improvements

Dec 31, 2024

Jiseok Chae, Chulhee Yun, Donghwan Kim

Abstract:In minimax optimization, the extragradient (EG) method has been extensively studied because it outperforms the gradient descent-ascent method in convex-concave (C-C) problems. Yet, stochastic EG (SEG) has seen limited success in C-C problems, especially for unconstrained cases. Motivated by the recent progress of shuffling-based stochastic methods, we investigate the convergence of shuffling-based SEG in unconstrained finite-sum minimax problems, in search of convergent shuffling-based SEG. Our analysis reveals that both random reshuffling and the recently proposed flip-flop shuffling alone can suffer divergence in C-C problems. However, with an additional simple trick called anchoring, we develop the SEG with flip-flop anchoring (SEG-FFA) method which successfully converges in C-C problems. We also show upper and lower bounds in the strongly-convex-strongly-concave setting, demonstrating that SEG-FFA has a provably faster convergence rate compared to other shuffling-based methods.

* 73+7 pages, 4 figures. Published in NeurIPS 2024

Via

Access Paper or Ask Questions

Multi-hypotheses Conditioned Point Cloud Diffusion for 3D Human Reconstruction from Occluded Images

Sep 27, 2024

Donghwan Kim, Tae-Kyun Kim

Figure 1 for Multi-hypotheses Conditioned Point Cloud Diffusion for 3D Human Reconstruction from Occluded Images

Figure 2 for Multi-hypotheses Conditioned Point Cloud Diffusion for 3D Human Reconstruction from Occluded Images

Figure 3 for Multi-hypotheses Conditioned Point Cloud Diffusion for 3D Human Reconstruction from Occluded Images

Figure 4 for Multi-hypotheses Conditioned Point Cloud Diffusion for 3D Human Reconstruction from Occluded Images

Abstract:3D human shape reconstruction under severe occlusion due to human-object or human-human interaction is a challenging problem. Parametric models i.e., SMPL(-X), which are based on the statistics across human shapes, can represent whole human body shapes but are limited to minimally-clothed human shapes. Implicit-function-based methods extract features from the parametric models to employ prior knowledge of human bodies and can capture geometric details such as clothing and hair. However, they often struggle to handle misaligned parametric models and inpaint occluded regions given a single RGB image. In this work, we propose a novel pipeline, MHCDIFF, Multi-hypotheses Conditioned Point Cloud Diffusion, composed of point cloud diffusion conditioned on probabilistic distributions for pixel-aligned detailed 3D human reconstruction under occlusion. Compared to previous implicit-function-based methods, the point cloud diffusion model can capture the global consistent features to generate the occluded regions, and the denoising process corrects the misaligned SMPL meshes. The core of MHCDIFF is extracting local features from multiple hypothesized SMPL(-X) meshes and aggregating the set of features to condition the diffusion model. In the experiments on CAPE and MultiHuman datasets, the proposed method outperforms various SOTA methods based on SMPL, implicit functions, point cloud diffusion, and their combined, under synthetic and real occlusions.

* 17 pages, 7 figures, accepted NeurIPS 2024

Via

Access Paper or Ask Questions

Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics

Sep 06, 2024

Woojin Cho, Jihyun Lee, Minjae Yi, Minje Kim, Taeyun Woo, Donghwan Kim, Taewook Ha, Hyokeun Lee, Je-Hwan Ryu, Woontack Woo(+1 more)

Figure 1 for Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics

Figure 2 for Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics

Figure 3 for Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics

Figure 4 for Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics

Abstract:Existing datasets for 3D hand-object interaction are limited either in the data cardinality, data variations in interaction scenarios, or the quality of annotations. In this work, we present a comprehensive new training dataset for hand-object interaction called HOGraspNet. It is the only real dataset that captures full grasp taxonomies, providing grasp annotation and wide intraclass variations. Using grasp taxonomies as atomic actions, their space and time combinatorial can represent complex hand activities around objects. We select 22 rigid objects from the YCB dataset and 8 other compound objects using shape and size taxonomies, ensuring coverage of all hand grasp configurations. The dataset includes diverse hand shapes from 99 participants aged 10 to 74, continuous video frames, and a 1.5M RGB-Depth of sparse frames with annotations. It offers labels for 3D hand and object meshes, 3D keypoints, contact maps, and \emph{grasp labels}. Accurate hand and object 3D meshes are obtained by fitting the hand parametric model (MANO) and the hand implicit function (HALO) to multi-view RGBD frames, with the MoCap system only for objects. Note that HALO fitting does not require any parameter tuning, enabling scalability to the dataset's size with comparable accuracy to MANO. We evaluate HOGraspNet on relevant tasks: grasp classification and 3D hand pose estimation. The result shows performance variations based on grasp type and object class, indicating the potential importance of the interaction space captured by our dataset. The provided data aims at learning universal shape priors or foundation models for 3D hand-object interaction. Our dataset and code are available at https://hograspnet2024.github.io/.

* 14 pages except for references. It will be published at European Conference on Computer Vision(ECCV) 2024

Via

Access Paper or Ask Questions

Extending the Reach of First-Order Algorithms for Nonconvex Min-Max Problems with Cohypomonotonicity

Feb 07, 2024

Ahmet Alacaoglu, Donghwan Kim, Stephen J. Wright

Abstract:We focus on constrained, $L$-smooth, nonconvex-nonconcave min-max problems either satisfying $\rho$-cohypomonotonicity or admitting a solution to the $\rho$-weakly Minty Variational Inequality (MVI), where larger values of the parameter $\rho>0$ correspond to a greater degree of nonconvexity. These problem classes include examples in two player reinforcement learning, interaction dominant min-max problems, and certain synthetic test problems on which classical min-max algorithms fail. It has been conjectured that first-order methods can tolerate value of $\rho$ no larger than $\frac{1}{L}$, but existing results in the literature have stagnated at the tighter requirement $\rho < \frac{1}{2L}$. With a simple argument, we obtain optimal or best-known complexity guarantees with cohypomonotonicity or weak MVI conditions for $\rho < \frac{1}{L}$. The algorithms we analyze are inexact variants of Halpern and Krasnosel'ski\u{\i}-Mann (KM) iterations. We also provide algorithms and complexity guarantees in the stochastic case with the same range on $\rho$. Our main insight for the improvements in the convergence analyses is to harness the recently proposed "conic nonexpansiveness" property of operators. As byproducts, we provide a refined analysis for inexact Halpern iteration and propose a stochastic KM iteration with a multilevel Monte Carlo estimator.

Via

Access Paper or Ask Questions

Joint Downlink and Uplink Optimization for RIS-Aided FDD MIMO Communication Systems

Jan 21, 2024

Gyoseung Lee, Hyeongtaek Lee, Donghwan Kim, Jaehoon Chung, A. Lee. Swindlehurst, Junil Choi

Abstract:This paper investigates reconfigurable intelligent surface (RIS)-aided frequency division duplexing (FDD) communication systems. Since the downlink and uplink signals are simultaneously transmitted in FDD, the phase shifts at the RIS should be designed to support both transmissions. Considering a single-user multiple-input multiple-output system, we formulate a weighted sum-rate maximization problem to jointly maximize the downlink and uplink system performance. To tackle the non-convex optimization problem, we adopt an alternating optimization (AO) algorithm, in which two phase shift optimization techniques are developed to handle the unit-modulus constraints induced by the reflection coefficients at the RIS. The first technique exploits the manifold optimization-based algorithm, while the second uses a lower-complexity AO approach. Numerical results verify that the proposed techniques rapidly converge to local optima and significantly improve the overall system performance compared to existing benchmark schemes.

* Accepted to IEEE Transactions on Wireless Communications

Via

Access Paper or Ask Questions

NeuJeans: Private Neural Network Inference with Joint Optimization of Convolution and Bootstrapping

Dec 07, 2023

Jae Hyung Ju, Jaiyoung Park, Jongmin Kim, Donghwan Kim, Jung Ho Ahn

Abstract:Fully homomorphic encryption (FHE) is a promising cryptographic primitive for realizing private neural network inference (PI) services by allowing a client to fully offload the inference task to a cloud server while keeping the client data oblivious to the server. This work proposes NeuJeans, an FHE-based solution for the PI of deep convolutional neural networks (CNNs). NeuJeans tackles the critical problem of the enormous computational cost for the FHE evaluation of convolutional layers (conv2d), mainly due to the high cost of data reordering and bootstrapping. We first propose an encoding method introducing nested structures inside encoded vectors for FHE, which enables us to develop efficient conv2d algorithms with reduced data reordering costs. However, the new encoding method also introduces additional computations for conversion between encoding methods, which could negate its advantages. We discover that fusing conv2d with bootstrapping eliminates such computations while reducing the cost of bootstrapping. Then, we devise optimized execution flows for various types of conv2d and apply them to end-to-end implementation of CNNs. NeuJeans accelerates the performance of conv2d by up to 5.68 times compared to state-of-the-art FHE-based PI work and performs the PI of a CNN at the scale of ImageNet (ResNet18) within a mere few seconds

* 16 pages, 9 figures

Via

Access Paper or Ask Questions

FourierHandFlow: Neural 4D Hand Representation Using Fourier Query Flow

Jul 16, 2023

Jihyun Lee, Junbong Jang, Donghwan Kim, Minhyuk Sung, Tae-Kyun Kim

Figure 1 for FourierHandFlow: Neural 4D Hand Representation Using Fourier Query Flow

Figure 2 for FourierHandFlow: Neural 4D Hand Representation Using Fourier Query Flow

Figure 3 for FourierHandFlow: Neural 4D Hand Representation Using Fourier Query Flow

Figure 4 for FourierHandFlow: Neural 4D Hand Representation Using Fourier Query Flow

Abstract:Recent 4D shape representations model continuous temporal evolution of implicit shapes by (1) learning query flows without leveraging shape and articulation priors or (2) decoding shape occupancies separately for each time value. Thus, they do not effectively capture implicit correspondences between articulated shapes or regularize jittery temporal deformations. In this work, we present FourierHandFlow, which is a spatio-temporally continuous representation for human hands that combines a 3D occupancy field with articulation-aware query flows represented as Fourier series. Given an input RGB sequence, we aim to learn a fixed number of Fourier coefficients for each query flow to guarantee smooth and continuous temporal shape dynamics. To effectively model spatio-temporal deformations of articulated hands, we compose our 4D representation based on two types of Fourier query flow: (1) pose flow that models query dynamics influenced by hand articulation changes via implicit linear blend skinning and (2) shape flow that models query-wise displacement flow. In the experiments, our method achieves state-of-the-art results on video-based 4D reconstruction while being computationally more efficient than the existing 3D/4D implicit shape representations. We additionally show our results on motion inter- and extrapolation and texture transfer using the learned correspondences of implicit shapes. To the best of our knowledge, FourierHandFlow is the first neural 4D continuous hand representation learned from RGB videos. The code will be publicly accessible.

* 16 pages, 6 figures, under review

Via

Access Paper or Ask Questions

Two-timescale Extragradient for Finding Local Minimax Points

May 25, 2023

Jiseok Chae, Kyuwon Kim, Donghwan Kim

Figure 1 for Two-timescale Extragradient for Finding Local Minimax Points

Figure 2 for Two-timescale Extragradient for Finding Local Minimax Points

Figure 3 for Two-timescale Extragradient for Finding Local Minimax Points

Abstract:Minimax problems are notoriously challenging to optimize. However, we demonstrate that the two-timescale extragradient can be a viable solution. By utilizing dynamical systems theory, we show that it converges to points that satisfy the second-order necessary condition of local minimax points, under a mild condition. This work surpasses all previous results as we eliminate a crucial assumption that the Hessian, with respect to the maximization variable, is nondegenerate.

* 34 pages, 3 figures

Via

Access Paper or Ask Questions

HyPHEN: A Hybrid Packing Method and Optimizations for Homomorphic Encryption-Based Neural Networks

Feb 05, 2023

Donghwan Kim, Jaiyoung Park, Jongmin Kim, Sangpyo Kim, Jung Ho Ahn

Abstract:Convolutional neural network (CNN) inference using fully homomorphic encryption (FHE) is a promising private inference (PI) solution due to the capability of FHE that enables offloading the whole computation process to the server while protecting the privacy of sensitive user data. However, prior FHEbased CNN (HCNN) implementations are far from being practical due to the high computational and memory overheads of FHE. To overcome this limitation, we present HyPHEN, a deep HCNN construction that features an efficient FHE convolution algorithm, data packing methods (hybrid packing and image slicing), and FHE-specific optimizations. Such enhancements enable HyPHEN to substantially reduce the memory footprint and the number of expensive homomorphic operations, such as ciphertext rotation and bootstrapping. As a result, HyPHEN brings the latency of HCNN CIFAR-10 inference down to a practical level at 1.40s (ResNet20) and demonstrates HCNN ImageNet inference for the first time at 16.87s (ResNet18).

* 14 pages, 13 figures

Via

Access Paper or Ask Questions