Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kyong Hwan Jin

Towards Lossless Implicit Neural Representation via Bit Plane Decomposition

Feb 28, 2025

Woo Kyoung Han, Byeonghun Lee, Hyunmin Cho, Sunghoon Im, Kyong Hwan Jin

Abstract:We quantify the upper bound on the size of the implicit neural representation (INR) model from a digital perspective. The upper bound of the model size increases exponentially as the required bit-precision increases. To this end, we present a bit-plane decomposition method that makes INR predict bit-planes, producing the same effect as reducing the upper bound of the model size. We validate our hypothesis that reducing the upper bound leads to faster convergence with constant model size. Our method achieves lossless representation in 2D image and audio fitting, even for high bit-depth signals, such as 16-bit, which was previously unachievable. We pioneered the presence of bit bias, which INR prioritizes as the most significant bit (MSB). We expand the application of the INR task to bit depth expansion, lossless image compression, and extreme network quantization. Our source code is available at https://github.com/WooKyoungHan/LosslessINR

Via

Access Paper or Ask Questions

Identity-preserving Distillation Sampling by Fixed-Point Iterator

Feb 27, 2025

SeonHwa Kim, Jiwon Kim, Soobin Park, Donghoon Ahn, Jiwon Kang, Seungryong Kim, Kyong Hwan Jin, Eunju Cha

Abstract:Score distillation sampling (SDS) demonstrates a powerful capability for text-conditioned 2D image and 3D object generation by distilling the knowledge from learned score functions. However, SDS often suffers from blurriness caused by noisy gradients. When SDS meets the image editing, such degradations can be reduced by adjusting bias shifts using reference pairs, but the de-biasing techniques are still corrupted by erroneous gradients. To this end, we introduce Identity-preserving Distillation Sampling (IDS), which compensates for the gradient leading to undesired changes in the results. Based on the analysis that these errors come from the text-conditioned scores, a new regularization technique, called fixed-point iterative regularization (FPR), is proposed to modify the score itself, driving the preservation of the identity even including poses and structures. Thanks to a self-correction by FPR, the proposed method provides clear and unambiguous representations corresponding to the given prompts in image-to-image editing and editable neural radiance field (NeRF). The structural consistency between the source and the edited data is obviously maintained compared to other state-of-the-art methods.

Via

Access Paper or Ask Questions

BF-STVSR: B-Splines and Fourier-Best Friends for High Fidelity Spatial-Temporal Video Super-Resolution

Jan 19, 2025

Eunjin Kim, Hyeonjin Kim, Kyong Hwan Jin, Jaejun Yoo

Abstract:Enhancing low-resolution, low-frame-rate videos to high-resolution, high-frame-rate quality is essential for a seamless user experience, motivating advancements in Continuous Spatial-Temporal Video Super Resolution (C-STVSR). While prior methods employ Implicit Neural Representation (INR) for continuous encoding, they often struggle to capture the complexity of video data, relying on simple coordinate concatenation and pre-trained optical flow network for motion representation. Interestingly, we find that adding position encoding, contrary to common observations, does not improve-and even degrade performance. This issue becomes particularly pronounced when combined with pre-trained optical flow networks, which can limit the model's flexibility. To address these issues, we propose BF-STVSR, a C-STVSR framework with two key modules tailored to better represent spatial and temporal characteristics of video: 1) B-spline Mapper for smooth temporal interpolation, and 2) Fourier Mapper for capturing dominant spatial frequencies. Our approach achieves state-of-the-art PSNR and SSIM performance, showing enhanced spatial details and natural temporal consistency.

* 11pages, 5 figures

Via

Access Paper or Ask Questions

A Noise is Worth Diffusion Guidance

Dec 05, 2024

Donghoon Ahn, Jiwon Kang, Sanghyun Lee, Jaewon Min, Minjae Kim, Wooseok Jang, Hyoungwon Cho, Sayak Paul, SeonHwa Kim, Eunju Cha(+2 more)

Figure 1 for A Noise is Worth Diffusion Guidance

Abstract:Diffusion models excel in generating high-quality images. However, current diffusion models struggle to produce reliable images without guidance methods, such as classifier-free guidance (CFG). Are guidance methods truly necessary? Observing that noise obtained via diffusion inversion can reconstruct high-quality images without guidance, we focus on the initial noise of the denoising pipeline. By mapping Gaussian noise to `guidance-free noise', we uncover that small low-magnitude low-frequency components significantly enhance the denoising process, removing the need for guidance and thus improving both inference throughput and memory. Expanding on this, we propose \ours, a novel method that replaces guidance methods with a single refinement of the initial noise. This refined noise enables high-quality image generation without guidance, within the same diffusion pipeline. Our noise-refining model leverages efficient noise-space learning, achieving rapid convergence and strong performance with just 50K text-image pairs. We validate its effectiveness across diverse metrics and analyze how refined noise can eliminate the need for guidance. See our project page: https://cvlab-kaist.github.io/NoiseRefine/.

* Project page: https://cvlab-kaist.github.io/NoiseRefine/

Via

Access Paper or Ask Questions

BurstM: Deep Burst Multi-scale SR using Fourier Space with Optical Flow

Sep 21, 2024

EungGu Kang, Byeonghun Lee, Sunghoon Im, Kyong Hwan Jin

Figure 1 for BurstM: Deep Burst Multi-scale SR using Fourier Space with Optical Flow

Figure 2 for BurstM: Deep Burst Multi-scale SR using Fourier Space with Optical Flow

Figure 3 for BurstM: Deep Burst Multi-scale SR using Fourier Space with Optical Flow

Figure 4 for BurstM: Deep Burst Multi-scale SR using Fourier Space with Optical Flow

Abstract:Multi frame super-resolution(MFSR) achieves higher performance than single image super-resolution (SISR), because MFSR leverages abundant information from multiple frames. Recent MFSR approaches adapt the deformable convolution network (DCN) to align the frames. However, the existing MFSR suffers from misalignments between the reference and source frames due to the limitations of DCN, such as small receptive fields and the predefined number of kernels. From these problems, existing MFSR approaches struggle to represent high-frequency information. To this end, we propose Deep Burst Multi-scale SR using Fourier Space with Optical Flow (BurstM). The proposed method estimates the optical flow offset for accurate alignment and predicts the continuous Fourier coefficient of each frame for representing high-frequency textures. In addition, we have enhanced the network flexibility by supporting various super-resolution (SR) scale factors with the unimodel. We demonstrate that our method has the highest performance and flexibility than the existing MFSR methods. Our source code is available at https://github.com/Egkang-Luis/burstm

* 12 pages

Via

Access Paper or Ask Questions

JDEC: JPEG Decoding via Enhanced Continuous Cosine Coefficients

Apr 03, 2024

Woo Kyoung Han, Sunghoon Im, Jaedeok Kim, Kyong Hwan Jin

Figure 1 for JDEC: JPEG Decoding via Enhanced Continuous Cosine Coefficients

Figure 2 for JDEC: JPEG Decoding via Enhanced Continuous Cosine Coefficients

Figure 3 for JDEC: JPEG Decoding via Enhanced Continuous Cosine Coefficients

Figure 4 for JDEC: JPEG Decoding via Enhanced Continuous Cosine Coefficients

Abstract:We propose a practical approach to JPEG image decoding, utilizing a local implicit neural representation with continuous cosine formulation. The JPEG algorithm significantly quantizes discrete cosine transform (DCT) spectra to achieve a high compression rate, inevitably resulting in quality degradation while encoding an image. We have designed a continuous cosine spectrum estimator to address the quality degradation issue that restores the distorted spectrum. By leveraging local DCT formulations, our network has the privilege to exploit dequantization and upsampling simultaneously. Our proposed model enables decoding compressed images directly across different quality factors using a single pre-trained model without relying on a conventional JPEG decoder. As a result, our proposed network achieves state-of-the-art performance in flexible color image JPEG artifact removal tasks. Our source code is available at https://github.com/WooKyoungHan/JDEC.

Via

Access Paper or Ask Questions

Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance

Mar 26, 2024

Donghoon Ahn, Hyoungwon Cho, Jaewon Min, Wooseok Jang, Jungwoo Kim, SeonHwa Kim, Hyun Hee Park, Kyong Hwan Jin, Seungryong Kim

Figure 1 for Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance

Figure 2 for Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance

Figure 3 for Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance

Figure 4 for Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance

Abstract:Recent studies have demonstrated that diffusion models are capable of generating high-quality samples, but their quality heavily depends on sampling guidance techniques, such as classifier guidance (CG) and classifier-free guidance (CFG). These techniques are often not applicable in unconditional generation or in various downstream tasks such as image restoration. In this paper, we propose a novel sampling guidance, called Perturbed-Attention Guidance (PAG), which improves diffusion sample quality across both unconditional and conditional settings, achieving this without requiring additional training or the integration of external modules. PAG is designed to progressively enhance the structure of samples throughout the denoising process. It involves generating intermediate samples with degraded structure by substituting selected self-attention maps in diffusion U-Net with an identity matrix, by considering the self-attention mechanisms' ability to capture structural information, and guiding the denoising process away from these degraded samples. In both ADM and Stable Diffusion, PAG surprisingly improves sample quality in conditional and even unconditional scenarios. Moreover, PAG significantly improves the baseline performance in various downstream tasks where existing guidances such as CG or CFG cannot be fully utilized, including ControlNet with empty prompts and image restoration such as inpainting and deblurring.

* Project page is available at https://ku-cvlab.github.io/Perturbed-Attention-Guidance

Via

Access Paper or Ask Questions

BroadBEV: Collaborative LiDAR-camera Fusion for Broad-sighted Bird's Eye View Map Construction

Sep 25, 2023

Minsu Kim, Giseop Kim, Kyong Hwan Jin, Sunwook Choi

Abstract:A recent sensor fusion in a Bird's Eye View (BEV) space has shown its utility in various tasks such as 3D detection, map segmentation, etc. However, the approach struggles with inaccurate camera BEV estimation, and a perception of distant areas due to the sparsity of LiDAR points. In this paper, we propose a broad BEV fusion (BroadBEV) that addresses the problems with a spatial synchronization approach of cross-modality. Our strategy aims to enhance camera BEV estimation for a broad-sighted perception while simultaneously improving the completion of LiDAR's sparsity in the entire BEV space. Toward that end, we devise Point-scattering that scatters LiDAR BEV distribution to camera depth distribution. The method boosts the learning of depth estimation of the camera branch and induces accurate location of dense camera features in BEV space. For an effective BEV fusion between the spatially synchronized features, we suggest ColFusion that applies self-attention weights of LiDAR and camera BEV features to each other. Our extensive experiments demonstrate that BroadBEV provides a broad-sighted BEV perception with remarkable performance gains.

Via

Access Paper or Ask Questions

Implicit Neural Image Stitching With Enhanced and Blended Feature Reconstruction

Sep 12, 2023

Minsu Kim, Jaewon Lee, Byeonghun Lee, Sunghoon Im, Kyong Hwan Jin

Abstract:Existing frameworks for image stitching often provide visually reasonable stitchings. However, they suffer from blurry artifacts and disparities in illumination, depth level, etc. Although the recent learning-based stitchings relax such disparities, the required methods impose sacrifice of image qualities failing to capture high-frequency details for stitched images. To address the problem, we propose a novel approach, implicit Neural Image Stitching (NIS) that extends arbitrary-scale super-resolution. Our method estimates Fourier coefficients of images for quality-enhancing warps. Then, the suggested model blends color mismatches and misalignment in the latent space and decodes the features into RGB values of stitched images. Our experiments show that our approach achieves improvement in resolving the low-definition imaging of the previous deep image stitching with favorable accelerated image-enhancing methods. Our source code is available at https://github.com/minshu-kim/NIS.

Via

Access Paper or Ask Questions

Learning Residual Elastic Warps for Image Stitching under Dirichlet Boundary Condition

Sep 07, 2023

Minsu Kim, Yongjun Lee, Woo Kyoung Han, Kyong Hwan Jin

Figure 1 for Learning Residual Elastic Warps for Image Stitching under Dirichlet Boundary Condition

Figure 2 for Learning Residual Elastic Warps for Image Stitching under Dirichlet Boundary Condition

Figure 3 for Learning Residual Elastic Warps for Image Stitching under Dirichlet Boundary Condition

Figure 4 for Learning Residual Elastic Warps for Image Stitching under Dirichlet Boundary Condition

Abstract:Trendy suggestions for learning-based elastic warps enable the deep image stitchings to align images exposed to large parallax errors. Despite the remarkable alignments, the methods struggle with occasional holes or discontinuity between overlapping and non-overlapping regions of a target image as the applied training strategy mostly focuses on overlap region alignment. As a result, they require additional modules such as seam finder and image inpainting for hiding discontinuity and filling holes, respectively. In this work, we suggest Recurrent Elastic Warps (REwarp) that address the problem with Dirichlet boundary condition and boost performances by residual learning for recurrent misalign correction. Specifically, REwarp predicts a homography and a Thin-plate Spline (TPS) under the boundary constraint for discontinuity and hole-free image stitching. Our experiments show the favorable aligns and the competitive computational costs of REwarp compared to the existing stitching methods. Our source code is available at https://github.com/minshu-kim/REwarp.

Via

Access Paper or Ask Questions