Jun-Hai Yong

DiP-GO: A Diffusion Pruner via Few-step Gradient Optimization

Oct 22, 2024

Haowei Zhu, Dehua Tang, Ji Liu, Mingjie Lu, Jintu Zheng, Jinzhang Peng, Dong Li, Yu Wang, Fan Jiang, Lu Tian(+5 more)

Abstract: Diffusion models have achieved remarkable progress in the field of image generation due to their outstanding capabilities. However, these models require substantial computing resources because of the multi-step denoising process during inference. While traditional pruning methods have been employed to optimize these models, the retraining process necessitates large-scale training datasets and extensive computational costs to maintain generalization ability, making it neither convenient nor efficient. Recent studies attempt to utilize the similarity of features across adjacent denoising stages to reduce computational costs through simple and static strategies. However, these strategies cannot fully harness the potential of the similar feature patterns across adjacent timesteps. In this work, we propose a novel pruning method that derives an efficient diffusion model via a more intelligent and differentiable pruner. At the core of our approach is casting the model pruning process into a SubNet search process. Specifically, we first introduce a SuperNet based on standard diffusion via adding some backup connections built upon the similar features. We then construct a plugin pruner network and design optimization losses to identify redundant computation. Finally, our method can identify an optimal SubNet through few-step gradient optimization and a simple post-processing procedure. We conduct extensive experiments on various diffusion models including Stable Diffusion series and DiTs. Our DiP-GO approach achieves a 4.4x speedup for SD-1.5 without any loss of accuracy, significantly outperforming the previous state-of-the-art methods.
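
The idea of reusing similar features across adjacent timesteps can be illustrated with a toy sketch. This is not the DiP-GO implementation: the function name `run_with_reuse`, the scalar "blocks", and the hand-written gate table are all assumptions made for illustration, and the learned pruner network and its losses are not shown. The sketch only demonstrates how a binary mask can skip a block and fall back to the output cached at the previous step.

```python
from typing import Callable, List, Optional, Tuple


def run_with_reuse(
    blocks: List[Callable[[float], float]],
    gates: List[List[int]],
    x0: float,
) -> Tuple[float, int]:
    """Run a toy multi-step loop where gates[t][b] == 1 means "compute
    block b at step t" and 0 means "reuse the output cached at an earlier step".

    Hypothetical illustration only; returns the final value and the
    number of block evaluations actually performed.
    """
    cache: List[Optional[float]] = [None] * len(blocks)
    computed = 0
    x = x0
    for step_gates in gates:
        h = x
        for b, block in enumerate(blocks):
            # A block must be computed at least once before it can be reused.
            if step_gates[b] or cache[b] is None:
                cache[b] = block(h)
                computed += 1
            h = cache[b]
        x = h
    return x, computed


# Two toy blocks, three steps; two of the six evaluations are skipped.
toy_blocks = [lambda v: v + 1.0, lambda v: v * 2.0]
toy_gates = [[1, 1], [0, 1], [1, 0]]
result, n_computed = run_with_reuse(toy_blocks, toy_gates, 1.0)
```

In this toy setting the gate table is fixed by hand; in the approach described above, deciding which computations are redundant is the job of the pruner network.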

Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation

Aug 09, 2024

Yifan Feng, Jiangang Huang, Shaoyi Du, Shihui Ying, Jun-Hai Yong, Yipeng Li, Guiguang Ding, Rongrong Ji, Yue Gao

Abstract: We introduce Hyper-YOLO, a new object detection method that integrates hypergraph computations to capture the complex high-order correlations among visual features. Traditional YOLO models, while powerful, have limitations in their neck designs that restrict the integration of cross-level features and the exploitation of high-order feature interrelationships. To address these challenges, we propose the Hypergraph Computation Empowered Semantic Collecting and Scattering (HGC-SCS) framework, which transposes visual feature maps into a semantic space and constructs a hypergraph for high-order message propagation. This enables the model to acquire both semantic and structural information, advancing beyond conventional feature-focused learning. Hyper-YOLO incorporates the proposed Mixed Aggregation Network (MANet) in its backbone for enhanced feature extraction and introduces the Hypergraph-Based Cross-Level and Cross-Position Representation Network (HyperC2Net) in its neck. HyperC2Net operates across five scales and breaks free from traditional grid structures, allowing for sophisticated high-order interactions across levels and positions. This synergy of components positions Hyper-YOLO as a state-of-the-art architecture at various model scales, as evidenced by its superior performance on the COCO dataset. Specifically, Hyper-YOLO-N significantly outperforms the advanced YOLOv8-N and YOLOv9-T, with 12% $\text{AP}^{val}$ and 9% $\text{AP}^{val}$ improvements, respectively. The source code is available at https://github.com/iMoonLab/Hyper-YOLO.
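
To make "high-order message propagation" on a hypergraph concrete, here is a minimal two-stage sketch: vertex features are first averaged into each hyperedge, then each vertex averages the features of the hyperedges it belongs to. This is a generic textbook-style scheme, not the HGC-SCS or HyperC2Net code; the function name `hypergraph_propagate`, the scalar features, and the mean aggregation are assumptions chosen for brevity.

```python
from typing import List


def hypergraph_propagate(
    features: List[float], hyperedges: List[List[int]]
) -> List[float]:
    """One round of vertex -> hyperedge -> vertex mean aggregation.

    Hypothetical illustration: each hyperedge is a non-empty list of
    vertex indices and may connect more than two vertices.
    """
    # Stage 1: each hyperedge collects the mean of its member vertices.
    edge_feats = [sum(features[v] for v in e) / len(e) for e in hyperedges]

    # Stage 2: each vertex receives the mean of its incident hyperedges.
    out = []
    for v in range(len(features)):
        incident = [edge_feats[i] for i, e in enumerate(hyperedges) if v in e]
        # A vertex that belongs to no hyperedge keeps its own feature.
        out.append(sum(incident) / len(incident) if incident else features[v])
    return out


# Three vertices, two overlapping hyperedges.
updated = hypergraph_propagate([1.0, 3.0, 5.0], [[0, 1], [1, 2]])
```

Because a hyperedge can join any number of vertices, a single round mixes information among whole groups rather than only pairs, which is the sense in which the propagation is "high-order".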

Hand-Object Interaction Controller (HOIC): Deep Reinforcement Learning for Reconstructing Interactions with Physics

May 04, 2024

Haoyu Hu, Xinyu Yi, Zhe Cao, Jun-Hai Yong, Feng Xu

Abstract: Manipulating objects by hand is an important interaction in our daily activities. We faithfully reconstruct this motion with a single RGBD camera using a novel deep reinforcement learning method that leverages physics. First, we propose object compensation control, which establishes direct object control to make the network training more stable. Meanwhile, by leveraging the compensation force and torque, we seamlessly upgrade the simple point contact model to a more physically plausible surface contact model, further improving the reconstruction accuracy and physical correctness. Experiments indicate that, without involving any heuristic physical rules, this work still successfully incorporates physics into the reconstruction of hand-object interactions, which are complex motions that are hard to imitate with deep reinforcement learning. Our code and data are available at https://github.com/hu-hy17/HOIC.

* SIGGRAPH 2024 Conference Track

Distribution-Aware Data Expansion with Diffusion Models

Mar 11, 2024

Haowei Zhu, Ling Yang, Jun-Hai Yong, Wentao Zhang, Bin Wang

Abstract: The scale and quality of a dataset significantly impact the performance of deep models. However, acquiring large-scale annotated datasets is both costly and time-consuming. To address this challenge, dataset expansion technologies aim to automatically augment datasets, unlocking the full potential of deep models. Current data expansion methods encompass image transformation-based and synthesis-based methods. Transformation-based methods introduce only local variations, resulting in poor diversity. Synthesis-based methods, by contrast, can create entirely new content, significantly enhancing informativeness. However, existing synthesis methods carry the risk of distribution deviations, potentially degrading model performance with out-of-distribution samples. In this paper, we propose DistDiff, an effective data expansion framework based on the distribution-aware diffusion model. DistDiff constructs hierarchical prototypes to approximate the real data distribution, optimizing latent data points within diffusion models with hierarchical energy guidance. We demonstrate its ability to generate distribution-consistent samples, achieving substantial improvements in data expansion tasks. Specifically, without additional training, DistDiff achieves a 30.7% improvement in accuracy across six image datasets compared to the model trained on original datasets and a 9.8% improvement compared to the state-of-the-art diffusion-based method. Our code is available at https://github.com/haoweiz23/DistDiff.

* Project: https://github.com/haoweiz23/DistDiff

Relightable and Animatable Neural Avatars from Videos

Dec 20, 2023

Wenbin Lin, Chengwei Zheng, Jun-Hai Yong, Feng Xu

Abstract: Lightweight creation of 3D digital avatars is a highly desirable but challenging task. With only sparse videos of a person under unknown illumination, we propose a method to create relightable and animatable neural avatars, which can be used to synthesize photorealistic images of humans under novel viewpoints, body poses, and lighting. The key challenge here is to disentangle the geometry, material of the clothed body, and lighting, which becomes more difficult due to the complex geometry and shadow changes caused by body motions. To solve this ill-posed problem, we propose novel techniques to better model the geometry and shadow changes. For geometry change modeling, we propose an invertible deformation field, which helps to solve the inverse skinning problem and leads to better geometry quality. To model the spatially and temporally varying shading cues, we propose a pose-aware part-wise light visibility network to estimate light occlusion. Extensive experiments on synthetic and real datasets show that our approach reconstructs high-quality geometry and generates realistic shadows under different body poses. Code and data are available at https://wenbin-lin.github.io/RelightableAvatar-page/.

* Accepted by AAAI 2024

Focused and Collaborative Feedback Integration for Interactive Image Segmentation

Mar 21, 2023

Qiaoqiao Wei, Hui Zhang, Jun-Hai Yong

Abstract: Interactive image segmentation aims at obtaining a segmentation mask for an image using simple user annotations. During each round of interaction, the segmentation result from the previous round serves as feedback to guide the user's annotation and provides dense prior information for the segmentation model, effectively acting as a bridge between interactions. Existing methods overlook the importance of feedback or simply concatenate it with the original input, leading to underutilization of feedback and an increase in the number of required annotations. To address this, we propose an approach called Focused and Collaborative Feedback Integration (FCFI) to fully exploit the feedback for click-based interactive image segmentation. FCFI first focuses on a local area around the new click and corrects the feedback based on the similarities of high-level features. It then alternately and collaboratively updates the feedback and deep features to integrate the feedback into the features. The efficacy and efficiency of FCFI were validated on four benchmarks, namely GrabCut, Berkeley, SBD, and DAVIS. Experimental results show that FCFI achieved new state-of-the-art performance with less computational overhead than previous methods. The source code is available at https://github.com/veizgyauzgyauz/FCFI.

* Accepted for publication at CVPR 2023

Physical Interaction: Reconstructing Hand-object Interactions with Physics

Sep 22, 2022

Haoyu Hu, Xinyu Yi, Hao Zhang, Jun-Hai Yong, Feng Xu

Abstract: Single-view reconstruction of hand-object interaction is challenging due to the severe loss of observations caused by occlusions. This paper proposes a physics-based method to better resolve the ambiguities in the reconstruction. It first proposes a force-based dynamic model of the in-hand object, which not only recovers the unobserved contacts but also solves for plausible contact forces. Next, a confidence-based slide prevention scheme is proposed, which combines both the kinematic confidences and the contact forces to jointly model static and sliding contact motion. Qualitative and quantitative experiments show that the proposed technique reconstructs hand-object interactions that are both physically plausible and more accurate, and estimates plausible contact forces in real time with a single RGBD sensor.

* SIGGRAPH Asia 2022 Conference Track

OcclusionFusion: Occlusion-aware Motion Estimation for Real-time Dynamic 3D Reconstruction

Mar 15, 2022

Wenbin Lin, Chengwei Zheng, Jun-Hai Yong, Feng Xu

Abstract: RGBD-based real-time dynamic 3D reconstruction suffers from inaccurate inter-frame motion estimation as errors may accumulate with online tracking. This problem is even more severe for single-view-based systems due to strong occlusions. Based on these observations, we propose OcclusionFusion, a novel method to calculate occlusion-aware 3D motion to guide the reconstruction. In our technique, the motion of visible regions is first estimated and combined with temporal information to infer the motion of the occluded regions through an LSTM-involved graph neural network. Furthermore, our method computes the confidence of the estimated motion by modeling the network output with a probabilistic model, which alleviates untrustworthy motions and enables robust tracking. Experimental results on public datasets and our own recorded data show that our technique outperforms existing single-view-based real-time methods by a large margin. With the reduction of the motion errors, the proposed technique can handle long and challenging motion sequences. Please check out the project page for sequence results: https://wenbin-lin.github.io/OcclusionFusion.

* Accepted by CVPR 2022. Project page: https://wenbin-lin.github.io/OcclusionFusion

Finding Optimal Tangent Points for Reducing Distortions of Hard-label Attacks

Nov 18, 2021

Chen Ma, Xiangyu Guo, Li Chen, Jun-Hai Yong, Yisen Wang

Abstract: One major problem in black-box adversarial attacks is the high query complexity in the hard-label attack setting, where only the top-1 predicted label is available. In this paper, we propose a novel geometry-based approach called Tangent Attack (TA), which identifies an optimal tangent point of a virtual hemisphere located on the decision boundary to reduce the distortion of the attack. Assuming the decision boundary is locally flat, we theoretically prove that the minimum $\ell_2$ distortion can be obtained by reaching the decision boundary along the tangent line passing through such a tangent point in each iteration. To improve the robustness of our method, we further propose a generalized method which replaces the hemisphere with a semi-ellipsoid to adapt to curved decision boundaries. Our approach is free of hyperparameters and pre-training. Extensive experiments conducted on the ImageNet and CIFAR-10 datasets demonstrate that our approach needs only a small number of queries to achieve low-magnitude distortion. The implementation source code is released online at https://github.com/machanic/TangentAttack.

* accepted at NeurIPS 2021, including the appendix
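
The tangent-point geometry mentioned in the Tangent Attack abstract can be illustrated in the plane. The sketch below is not the released TangentAttack code and does not perform an attack: it only computes a tangent point on a circle centered at the origin as seen from an external point, using the standard closed form. The function name `tangent_point` and the 2D setting are assumptions for illustration; the paper works with a hemisphere in high-dimensional image space.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def tangent_point(p: Point, radius: float) -> Point:
    """Return one tangent point on the circle |t| = radius (centered at
    the origin) for the tangent line through the external point p.

    Hypothetical 2D illustration. With d = |p|, the tangent point is
    t = (R^2 / d^2) * p + (R * sqrt(d^2 - R^2) / d^2) * p_perp,
    where p_perp is p rotated by 90 degrees counterclockwise.
    """
    d2 = p[0] ** 2 + p[1] ** 2
    if d2 <= radius ** 2:
        raise ValueError("p must lie strictly outside the circle")
    a = radius ** 2 / d2
    b = radius * math.sqrt(d2 - radius ** 2) / d2
    perp = (-p[1], p[0])
    return (a * p[0] + b * perp[0], a * p[1] + b * perp[1])


# External point at distance 5 from the center of a circle of radius 3.
t = tangent_point((5.0, 0.0), 3.0)
```

The returned point lies on the circle, and the segment from `p` to it is perpendicular to the radius at that point, which is the defining property of a tangent. The second tangent point is obtained by flipping the sign of the `perp` term.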
