Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sung-Ho Bae

I-INR: Iterative Implicit Neural Representations

Apr 24, 2025

Ali Haider, Muhammad Salman Ali, Maryam Qamar, Tahir Khalil, Soo Ye Kim, Jihyong Oh, Enzo Tartaglione, Sung-Ho Bae

Abstract:Implicit Neural Representations (INRs) have revolutionized signal processing and computer vision by modeling signals as continuous, differentiable functions parameterized by neural networks. However, their inherent formulation as a regression problem makes them prone to regression to the mean, limiting their ability to capture fine details, retain high-frequency information, and handle noise effectively. To address these challenges, we propose Iterative Implicit Neural Representations (I-INRs) a novel plug-and-play framework that enhances signal reconstruction through an iterative refinement process. I-INRs effectively recover high-frequency details, improve robustness to noise, and achieve superior reconstruction quality. Our framework seamlessly integrates with existing INR architectures, delivering substantial performance gains across various tasks. Extensive experiments show that I-INRs outperform baseline methods, including WIRE, SIREN, and Gauss, in diverse computer vision applications such as image restoration, image denoising, and object occupancy prediction.

Via

Access Paper or Ask Questions

ELMGS: Enhancing memory and computation scaLability through coMpression for 3D Gaussian Splatting

Oct 30, 2024

Muhammad Salman Ali, Sung-Ho Bae, Enzo Tartaglione

Figure 1 for ELMGS: Enhancing memory and computation scaLability through coMpression for 3D Gaussian Splatting

Figure 2 for ELMGS: Enhancing memory and computation scaLability through coMpression for 3D Gaussian Splatting

Figure 3 for ELMGS: Enhancing memory and computation scaLability through coMpression for 3D Gaussian Splatting

Figure 4 for ELMGS: Enhancing memory and computation scaLability through coMpression for 3D Gaussian Splatting

Abstract:3D models have recently been popularized by the potentiality of end-to-end training offered first by Neural Radiance Fields and most recently by 3D Gaussian Splatting models. The latter has the big advantage of naturally providing fast training convergence and high editability. However, as the research around these is still in its infancy, there is still a gap in the literature regarding the model's scalability. In this work, we propose an approach enabling both memory and computation scalability of such models. More specifically, we propose an iterative pruning strategy that removes redundant information encoded in the model. We also enhance compressibility for the model by including in the optimization strategy a differentiable quantization and entropy coding estimator. Our results on popular benchmarks showcase the effectiveness of the proposed approach and open the road to the broad deployability of such a solution even on resource-constrained devices.

Via

Access Paper or Ask Questions

Trimming the Fat: Efficient Compression of 3D Gaussian Splats through Pruning

Jun 26, 2024

Muhammad Salman Ali, Maryam Qamar, Sung-Ho Bae, Enzo Tartaglione

Figure 1 for Trimming the Fat: Efficient Compression of 3D Gaussian Splats through Pruning

Figure 2 for Trimming the Fat: Efficient Compression of 3D Gaussian Splats through Pruning

Figure 3 for Trimming the Fat: Efficient Compression of 3D Gaussian Splats through Pruning

Figure 4 for Trimming the Fat: Efficient Compression of 3D Gaussian Splats through Pruning

Abstract:In recent times, the utilization of 3D models has gained traction, owing to the capacity for end-to-end training initially offered by Neural Radiance Fields and more recently by 3D Gaussian Splatting (3DGS) models. The latter holds a significant advantage by inherently easing rapid convergence during training and offering extensive editability. However, despite rapid advancements, the literature still lives in its infancy regarding the scalability of these models. In this study, we take some initial steps in addressing this gap, showing an approach that enables both the memory and computational scalability of such models. Specifically, we propose "Trimming the fat", a post-hoc gradient-informed iterative pruning technique to eliminate redundant information encoded in the model. Our experimental findings on widely acknowledged benchmarks attest to the effectiveness of our approach, revealing that up to 75% of the Gaussians can be removed while maintaining or even improving upon baseline performance. Our approach achieves around 50$\times$ compression while preserving performance similar to the baseline model, and is able to speed-up computation up to 600~FPS.

Via

Access Paper or Ask Questions

Revisiting Learning-based Video Motion Magnification for Real-time Processing

Mar 04, 2024

Hyunwoo Ha, Oh Hyun-Bin, Kim Jun-Seong, Kwon Byung-Ki, Kim Sung-Bin, Linh-Tam Tran, Ji-Yun Kim, Sung-Ho Bae, Tae-Hyun Oh

Figure 1 for Revisiting Learning-based Video Motion Magnification for Real-time Processing

Figure 2 for Revisiting Learning-based Video Motion Magnification for Real-time Processing

Figure 3 for Revisiting Learning-based Video Motion Magnification for Real-time Processing

Figure 4 for Revisiting Learning-based Video Motion Magnification for Real-time Processing

Abstract:Video motion magnification is a technique to capture and amplify subtle motion in a video that is invisible to the naked eye. The deep learning-based prior work successfully demonstrates the modelling of the motion magnification problem with outstanding quality compared to conventional signal processing-based ones. However, it still lags behind real-time performance, which prevents it from being extended to various online applications. In this paper, we investigate an efficient deep learning-based motion magnification model that runs in real time for full-HD resolution videos. Due to the specified network design of the prior art, i.e. inhomogeneous architecture, the direct application of existing neural architecture search methods is complicated. Instead of automatic search, we carefully investigate the architecture module by module for its role and importance in the motion magnification task. Two key findings are 1) Reducing the spatial resolution of the latent motion representation in the decoder provides a good trade-off between computational efficiency and task quality, and 2) surprisingly, only a single linear layer and a single branch in the encoder are sufficient for the motion magnification task. Based on these findings, we introduce a real-time deep learning-based motion magnification model with4.2X fewer FLOPs and is 2.7X faster than the prior art while maintaining comparable quality.

* 19 pages

Via

Access Paper or Ask Questions

Descanning: From Scanned to the Original Images with a Color Correction Diffusion Model

Feb 08, 2024

Junghun Cha, Ali Haider, Seoyun Yang, Hoeyeong Jin, Subin Yang, A. F. M. Shahab Uddin, Jaehyoung Kim, Soo Ye Kim, Sung-Ho Bae

Figure 1 for Descanning: From Scanned to the Original Images with a Color Correction Diffusion Model

Figure 2 for Descanning: From Scanned to the Original Images with a Color Correction Diffusion Model

Figure 3 for Descanning: From Scanned to the Original Images with a Color Correction Diffusion Model

Figure 4 for Descanning: From Scanned to the Original Images with a Color Correction Diffusion Model

Abstract:A significant volume of analog information, i.e., documents and images, have been digitized in the form of scanned copies for storing, sharing, and/or analyzing in the digital world. However, the quality of such contents is severely degraded by various distortions caused by printing, storing, and scanning processes in the physical world. Although restoring high-quality content from scanned copies has become an indispensable task for many products, it has not been systematically explored, and to the best of our knowledge, no public datasets are available. In this paper, we define this problem as Descanning and introduce a new high-quality and large-scale dataset named DESCAN-18K. It contains 18K pairs of original and scanned images collected in the wild containing multiple complex degradations. In order to eliminate such complex degradations, we propose a new image restoration model called DescanDiffusion consisting of a color encoder that corrects the global color degradation and a conditional denoising diffusion probabilistic model (DDPM) that removes local degradations. To further improve the generalization ability of DescanDiffusion, we also design a synthetic data generation scheme by reproducing prominent degradations in scanned images. We demonstrate that our DescanDiffusion outperforms other baselines including commercial restoration products, objectively and subjectively, via comprehensive experiments and analyses.

* Accepted to AAAI 2024

Via

Access Paper or Ask Questions

MST-compression: Compressing and Accelerating Binary Neural Networks with Minimum Spanning Tree

Aug 26, 2023

Quang Hieu Vo, Linh-Tam Tran, Sung-Ho Bae, Lok-Won Kim, Choong Seon Hong

Abstract:Binary neural networks (BNNs) have been widely adopted to reduce the computational cost and memory storage on edge-computing devices by using one-bit representation for activations and weights. However, as neural networks become wider/deeper to improve accuracy and meet practical requirements, the computational burden remains a significant challenge even on the binary version. To address these issues, this paper proposes a novel method called Minimum Spanning Tree (MST) compression that learns to compress and accelerate BNNs. The proposed architecture leverages an observation from previous works that an output channel in a binary convolution can be computed using another output channel and XNOR operations with weights that differ from the weights of the reused channel. We first construct a fully connected graph with vertices corresponding to output channels, where the distance between two vertices is the number of different values between the weight sets used for these outputs. Then, the MST of the graph with the minimum depth is proposed to reorder output calculations, aiming to reduce computational cost and latency. Moreover, we propose a new learning algorithm to reduce the total MST distance during training. Experimental results on benchmark models demonstrate that our method achieves significant compression ratios with negligible accuracy drops, making it a promising approach for resource-constrained edge-computing devices.

* 11 pages, 9 figures, ICCV 2023

Via

Access Paper or Ask Questions

Faster Segment Anything: Towards Lightweight SAM for Mobile Applications

Jul 01, 2023

Chaoning Zhang, Dongshen Han, Yu Qiao, Jung Uk Kim, Sung-Ho Bae, Seungkyu Lee, Choong Seon Hong

Abstract:Segment Anything Model (SAM) has attracted significant attention due to its impressive zero-shot transfer performance and high versatility for numerous vision applications (like image editing with fine-grained control). Many of such applications need to be run on resource-constraint edge devices, like mobile phones. In this work, we aim to make SAM mobile-friendly by replacing the heavyweight image encoder with a lightweight one. A naive way to train such a new SAM as in the original SAM paper leads to unsatisfactory performance, especially when limited training sources are available. We find that this is mainly caused by the coupled optimization of the image encoder and mask decoder, motivated by which we propose decoupled distillation. Concretely, we distill the knowledge from the heavy image encoder (ViT-H in the original SAM) to a lightweight image encoder, which can be automatically compatible with the mask decoder in the original SAM. The training can be completed on a single GPU within less than one day, and the resulting lightweight SAM is termed MobileSAM which is more than 60 times smaller yet performs on par with the original SAM. For inference speed, With a single GPU, MobileSAM runs around 10ms per image: 8ms on the image encoder and 4ms on the mask decoder. With superior performance, our MobileSAM is around 5 times faster than the concurrent FastSAM and 7 times smaller, making it more suitable for mobile applications. Moreover, we show that MobileSAM can run relatively smoothly on CPU. The code for our project is provided at \href{https://github.com/ChaoningZhang/MobileSAM}{\textcolor{red}{MobileSAM}}), with a demo showing that MobileSAM can run relatively smoothly on CPU.

* First work to make SAM lightweight for mobile applications

Via

Access Paper or Ask Questions

Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era

May 10, 2023

Chenghao Li, Chaoning Zhang, Atish Waghwase, Lik-Hang Lee, Francois Rameau, Yang Yang, Sung-Ho Bae, Choong Seon Hong

Figure 1 for Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era

Figure 2 for Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era

Figure 3 for Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era

Figure 4 for Generative AI meets 3D: A Survey on Text-to-3D in AIGC Era

Abstract:Generative AI (AIGC, a.k.a. AI generated content) has made remarkable progress in the past few years, among which text-guided content generation is the most practical one since it enables the interaction between human instruction and AIGC. Due to the development in text-to-image as well 3D modeling technologies (like NeRF), text-to-3D has become a newly emerging yet highly active research field. Our work conducts the first yet comprehensive survey on text-to-3D to help readers interested in this direction quickly catch up with its fast development. First, we introduce 3D data representations, including both Euclidean data and non-Euclidean data. On top of that, we introduce various foundation technologies as well as summarize how recent works combine those foundation technologies to realize satisfactory text-to-3D. Moreover, we summarize how text-to-3D technology is used in various applications, including avatar generation, texture generation, shape transformation, and scene generation.

Via

Access Paper or Ask Questions

Attack-SAM: Towards Attacking Segment Anything Model With Adversarial Examples

May 08, 2023

Chenshuang Zhang, Chaoning Zhang, Taegoo Kang, Donghun Kim, Sung-Ho Bae, In So Kweon

Figure 1 for Attack-SAM: Towards Attacking Segment Anything Model With Adversarial Examples

Figure 2 for Attack-SAM: Towards Attacking Segment Anything Model With Adversarial Examples

Figure 3 for Attack-SAM: Towards Attacking Segment Anything Model With Adversarial Examples

Figure 4 for Attack-SAM: Towards Attacking Segment Anything Model With Adversarial Examples

Abstract:Segment Anything Model (SAM) has attracted significant attention recently, due to its impressive performance on various downstream tasks in a zero-short manner. Computer vision (CV) area might follow the natural language processing (NLP) area to embark on a path from task-specific vision models toward foundation models. However, deep vision models are widely recognized as vulnerable to adversarial examples, which fool the model to make wrong predictions with imperceptible perturbation. Such vulnerability to adversarial attacks causes serious concerns when applying deep models to security-sensitive applications. Therefore, it is critical to know whether the vision foundation model SAM can also be fooled by adversarial attacks. To the best of our knowledge, our work is the first of its kind to conduct a comprehensive investigation on how to attack SAM with adversarial examples. With the basic attack goal set to mask removal, we investigate the adversarial robustness of SAM in the full white-box setting and transfer-based black-box settings. Beyond the basic goal of mask removal, we further investigate and find that it is possible to generate any desired mask by the adversarial attack.

* The first work to attack Segment Anything Model with adversarial examples

Via

Access Paper or Ask Questions

Segment Anything Model Meets Glass: Mirror and Transparent Objects Cannot Be Easily Detected

Apr 29, 2023

Dongsheng Han, Chaoning Zhang, Yu Qiao, Maryam Qamar, Yuna Jung, SeungKyu Lee, Sung-Ho Bae, Choong Seon Hong

Figure 1 for Segment Anything Model Meets Glass: Mirror and Transparent Objects Cannot Be Easily Detected

Figure 2 for Segment Anything Model Meets Glass: Mirror and Transparent Objects Cannot Be Easily Detected

Figure 3 for Segment Anything Model Meets Glass: Mirror and Transparent Objects Cannot Be Easily Detected

Figure 4 for Segment Anything Model Meets Glass: Mirror and Transparent Objects Cannot Be Easily Detected

Abstract:Meta AI Research has recently released SAM (Segment Anything Model) which is trained on a large segmentation dataset of over 1 billion masks. As a foundation model in the field of computer vision, SAM (Segment Anything Model) has gained attention for its impressive performance in generic object segmentation. Despite its strong capability in a wide range of zero-shot transfer tasks, it remains unknown whether SAM can detect things in challenging setups like transparent objects. In this work, we perform an empirical evaluation of two glass-related challenging scenarios: mirror and transparent objects. We found that SAM often fails to detect the glass in both scenarios, which raises concern for deploying the SAM in safety-critical situations that have various forms of glass.

Via

Access Paper or Ask Questions