Dong Hwan Kim

Spectral-Adaptive Modulation Networks for Visual Perception

Mar 31, 2025

Guhnoo Yun, Juhan Yoo, Kijung Kim, Jeongho Lee, Paul Hongsuck Seo, Dong Hwan Kim

Abstract: Recent studies have shown that 2D convolution and self-attention exhibit distinct spectral behaviors, and that optimizing their spectral properties can enhance vision model performance. However, theoretical analyses remain limited in explaining why 2D convolution is more effective at high-pass filtering than self-attention and why larger kernels favor shape bias, akin to self-attention. In this paper, we employ graph spectral analysis to theoretically simulate and compare the frequency responses of 2D convolution and self-attention within a unified framework. Our results corroborate previous empirical findings and reveal that node connectivity, modulated by window size, is a key factor in shaping spectral functions. Leveraging this insight, we introduce a spectral-adaptive modulation (SPAM) mixer, which processes visual features in a spectral-adaptive manner using multi-scale convolutional kernels and a spectral re-scaling mechanism to refine spectral components. Based on SPAM, we develop SPANetV2 as a novel vision backbone. Extensive experiments demonstrate that SPANetV2 outperforms state-of-the-art models across multiple vision tasks, including ImageNet-1K classification, COCO object detection, and ADE20K semantic segmentation.
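
The link between window size and spectral response mentioned in the abstract can be illustrated with a small sketch. It is not the paper's derivation: it assumes a ring graph with uniform averaging weights, where the filter is circulant and its eigenvalues are simply the discrete Fourier transform of the averaging kernel. The function name `ring_average_response` and its parameters are hypothetical.

```python
import math

def ring_average_response(window_radius, num_nodes):
    """Frequency response of uniform averaging on a ring graph.

    Assumption (not from the paper): each node averages itself and
    `window_radius` neighbors on each side with equal weights. The
    resulting circulant filter has eigenvalues equal to the DFT of
    the kernel, evaluated here for frequencies 0 .. pi.
    """
    width = 2 * window_radius + 1
    responses = []
    for m in range(num_nodes // 2 + 1):
        omega = 2.0 * math.pi * m / num_nodes
        total = 1.0 + 2.0 * sum(
            math.cos(j * omega) for j in range(1, window_radius + 1)
        )
        responses.append(total / width)
    return responses

# Compare a 3-node window with a 7-node window on a 64-node ring.
small_window = ring_average_response(window_radius=1, num_nodes=64)
large_window = ring_average_response(window_radius=3, num_nodes=64)

# Index 0 is the DC component; the last index is the highest frequency.
print("DC gain:", small_window[0], large_window[0])
print("highest-frequency gain:", small_window[-1], large_window[-1])
```

In this toy setting both windows pass the DC component unchanged, while the wider window leaves a smaller magnitude at the highest frequency, which is one simple way to see how connectivity shapes the spectral function.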

SPANet: Frequency-balancing Token Mixer using Spectral Pooling Aggregation Modulation

Aug 22, 2023

Guhnoo Yun, Juhan Yoo, Kijung Kim, Jeongho Lee, Dong Hwan Kim

Abstract: Recent studies show that self-attentions behave like low-pass filters (as opposed to convolutions) and that enhancing their high-pass filtering capability improves model performance. Contrary to this idea, we investigate existing convolution-based models with spectral analysis and observe that improving the low-pass filtering in convolution operations also leads to performance improvement. To account for this observation, we hypothesize that utilizing optimal token mixers that capture balanced representations of both high- and low-frequency components can enhance the performance of models. We verify this by decomposing visual features into the frequency domain and combining them in a balanced manner. To handle this, we replace the balancing problem with a mask filtering problem in the frequency domain. Then, we introduce a novel token mixer named SPAM and leverage it to derive a MetaFormer model termed SPANet. Experimental results show that the proposed method provides a way to achieve this balance, and that the balanced representations of both high- and low-frequency components can improve the performance of models on multiple computer vision tasks. Our code is available at https://doranlyong.github.io/projects/spanet/.

* Accepted paper at ICCV 2023
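
The idea of treating frequency balancing as mask filtering can be sketched in a few lines. This is a generic illustration under stated assumptions, not the SPAM mixer itself: it works on a 1D signal, uses a hard cutoff, and applies two fixed gains, whereas the paper operates on visual features. The names `dft`, `idft`, `rebalance`, `cutoff`, `low_gain`, and `high_gain` are hypothetical.

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

def rebalance(signal, cutoff, low_gain, high_gain):
    """Scale low- and high-frequency bins with a two-level mask.

    Assumption: bins whose frequency index is at most `cutoff` count
    as low frequency; all others count as high frequency.
    """
    n = len(signal)
    masked = []
    for k, coeff in enumerate(dft(signal)):
        freq = min(k, n - k)  # treat bin k and its mirror n - k alike
        gain = low_gain if freq <= cutoff else high_gain
        masked.append(coeff * gain)
    # The mask is symmetric, so a real input gives a real output.
    return [value.real for value in idft(masked)]

# A constant (low-frequency) part plus an alternating (high-frequency) part.
signal = [1.0 + 0.5 * (-1) ** t for t in range(8)]
smoothed = rebalance(signal, cutoff=1, low_gain=1.0, high_gain=0.0)
sharpened = rebalance(signal, cutoff=1, low_gain=1.0, high_gain=2.0)
print(smoothed)
print(sharpened)
```

Setting `high_gain` to zero leaves only the constant part, while a gain above one amplifies the alternating part; a balanced mixer would sit somewhere between these extremes.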

CycleMorph: Cycle Consistent Unsupervised Deformable Image Registration

Aug 13, 2020

Boah Kim, Dong Hwan Kim, Seong Ho Park, Jieun Kim, June-Goo Lee, Jong Chul Ye

Abstract: Image registration is a fundamental task in medical image analysis. Recently, deep learning based image registration methods have been extensively investigated because they combine excellent performance with ultra-fast computation. However, existing deep learning methods are still limited in preserving the original topology during deformation with registration vector fields. To address this issue, we present a cycle-consistent deformable image registration method. The cycle consistency enhances image registration performance by providing an implicit regularization that preserves topology during the deformation. The proposed method is flexible enough to be applied to both 2D and 3D registration problems in various applications, and can be easily extended to a multi-scale implementation to deal with the memory issues in large volume registration. Experimental results on various datasets from medical and non-medical applications demonstrate that the proposed method provides effective and accurate registration on diverse image pairs within a few seconds. Qualitative and quantitative evaluations on deformation fields also verify the effectiveness of the cycle consistency of the proposed method.
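
The cycle-consistency idea can be illustrated with a toy example. This is a generic sketch, not the CycleMorph networks or loss: it assumes a 1D signal, hand-written displacement fields instead of learned ones, and linear interpolation with clamped borders. The names `warp_1d` and `cycle_consistency_l1` are hypothetical.

```python
def warp_1d(signal, displacement):
    """Resample so that out[i] = signal(i + displacement[i]).

    Uses linear interpolation; sample positions outside the signal
    are clamped to the nearest border.
    """
    n = len(signal)
    out = []
    for i, d in enumerate(displacement):
        pos = min(max(i + d, 0.0), n - 1.0)
        left = int(pos)
        right = min(left + 1, n - 1)
        frac = pos - left
        out.append((1.0 - frac) * signal[left] + frac * signal[right])
    return out

def cycle_consistency_l1(image, forward_field, backward_field):
    """Mean absolute error between an image and its round-trip warp."""
    moved = warp_1d(image, forward_field)
    restored = warp_1d(moved, backward_field)
    return sum(abs(a - b) for a, b in zip(image, restored)) / len(image)

image = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
forward = [1.0] * 8        # shift content one sample to the left
good_backward = [-1.0] * 8 # exact inverse of the forward shift
bad_backward = [0.0] * 8   # no correction at all

print(cycle_consistency_l1(image, forward, good_backward))
print(cycle_consistency_l1(image, forward, bad_backward))
```

A backward field that inverts the forward one brings the penalty to zero, while an inconsistent pair is penalized; used as a training term, this kind of penalty discourages deformations that cannot be undone.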

Planning for target retrieval using a robotic manipulator in cluttered and occluded environments

Jul 09, 2019

Changjoo Nam, Jinhwi Lee, Younggil Cho, Jeongho Lee, Dong Hwan Kim, ChangHwan Kim

Abstract: This paper presents planning algorithms for a fixed-base robotic manipulator that must grasp a target object in a cluttered environment. We consider a configuration of objects packed so densely in a confined space that no collision-free path to the target exists. The robot must relocate some objects to retrieve the target while avoiding collisions. For fast completion of the retrieval task, the robot needs to compute a plan optimizing an appropriate objective value directly related to the execution time of the relocation plan. We propose planning algorithms that aim to minimize the number of objects to be relocated. Our objective value is appropriate for the object retrieval task because grasping and releasing objects often dominate the total running time. In addition to the algorithm working in fully known and static environments, we propose algorithms that can deal with uncertain and dynamic situations incurred by occluded views. The proposed algorithms are shown to be complete and run in polynomial time. Our methods reduce the total running time significantly compared to a baseline method (e.g., a 25.1% reduction in a known static environment with 10 objects

* 8 pages, 14 figures

Unsupervised Deformable Image Registration Using Cycle-Consistent CNN

Jul 02, 2019

Boah Kim, Jieun Kim, June-Goo Lee, Dong Hwan Kim, Seong Ho Park, Jong Chul Ye

Abstract: Medical image registration is one of the key processing steps in biomedical image analysis tasks such as cancer diagnosis. Recently, deep learning based supervised and unsupervised image registration methods have been extensively studied because they offer excellent performance with far shorter computation times than classical approaches. In this paper, we present a novel unsupervised medical image registration method that trains a deep neural network for deformable registration of 3D volumes using cycle consistency. Thanks to the cycle consistency, the proposed deep neural networks can take diverse pairs of images with severe deformation and register them accurately. Experimental results using multiphase liver CT images demonstrate that our method provides very precise 3D image registration within a few seconds, resulting in more accurate cancer size estimation.

* accepted for MICCAI 2019
