Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ki-Ung Song

Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features

Apr 01, 2025

Jewon Lee, Ki-Ung Song, Seungmin Yang, Donguk Lim, Jaeyeon Kim, Wooksu Shin, Bo-Kyeong Kim, Yong Jae Lee, Tae-Ho Kim

Abstract:Visual token reduction lowers inference costs caused by extensive image features in large vision-language models (LVLMs). Unlike relevant studies that prune tokens in self-attention-only LVLMs, our work uniquely addresses cross-attention-based models, which achieve superior performance. We identify that the key-value (KV) cache size for image tokens in cross-attention layers significantly exceeds that of text tokens in self-attention layers, posing a major compute bottleneck. To mitigate this issue, we exploit the sparse nature in cross-attention maps to selectively prune redundant visual features. Our Trimmed Llama effectively reduces KV cache demands without requiring additional training. By benefiting from 50%-reduced visual features, our model can reduce inference latency and memory usage while achieving benchmark parity.

* accepted at CVPR 2025 Workshop on ELVM

Via

Access Paper or Ask Questions

Applying Regularized Schrödinger-Bridge-Based Stochastic Process in Generative Modeling

Aug 15, 2022

Ki-Ung Song

Figure 1 for Applying Regularized Schrödinger-Bridge-Based Stochastic Process in Generative Modeling

Figure 2 for Applying Regularized Schrödinger-Bridge-Based Stochastic Process in Generative Modeling

Figure 3 for Applying Regularized Schrödinger-Bridge-Based Stochastic Process in Generative Modeling

Figure 4 for Applying Regularized Schrödinger-Bridge-Based Stochastic Process in Generative Modeling

Abstract:Compared to the existing function-based models in deep generative modeling, the recently proposed diffusion models have achieved outstanding performance with a stochastic-process-based approach. But a long sampling time is required for this approach due to many timesteps for discretization. Schr\"odinger bridge (SB)-based models attempt to tackle this problem by training bidirectional stochastic processes between distributions. However, they still have a slow sampling speed compared to generative models such as generative adversarial networks. And due to the training of the bidirectional stochastic processes, they require a relatively long training time. Therefore, this study tried to reduce the number of timesteps and training time required and proposed regularization terms to the existing SB models to make the bidirectional stochastic processes consistent and stable with a reduced number of timesteps. Each regularization term was integrated into a single term to enable more efficient training in computation time and memory usage. Applying this regularized stochastic process to various generation tasks, the desired translations between different distributions were obtained, and accordingly, the possibility of generative modeling based on a stochastic process with faster sampling speed could be confirmed. The code is available at https://github.com/KiUngSong/RSB.

* A dissertation submitted for the degree of Master of Science

Via

Access Paper or Ask Questions

FS-NCSR: Increasing Diversity of the Super-Resolution Space via Frequency Separation and Noise-Conditioned Normalizing Flow

Apr 20, 2022

Ki-Ung Song, Dongseok Shim, Kang-wook Kim, Jae-young Lee, Younggeun Kim

Figure 1 for FS-NCSR: Increasing Diversity of the Super-Resolution Space via Frequency Separation and Noise-Conditioned Normalizing Flow

Figure 2 for FS-NCSR: Increasing Diversity of the Super-Resolution Space via Frequency Separation and Noise-Conditioned Normalizing Flow

Figure 3 for FS-NCSR: Increasing Diversity of the Super-Resolution Space via Frequency Separation and Noise-Conditioned Normalizing Flow

Figure 4 for FS-NCSR: Increasing Diversity of the Super-Resolution Space via Frequency Separation and Noise-Conditioned Normalizing Flow

Abstract:Super-resolution suffers from an innate ill-posed problem that a single low-resolution (LR) image can be from multiple high-resolution (HR) images. Recent studies on the flow-based algorithm solve this ill-posedness by learning the super-resolution space and predicting diverse HR outputs. Unfortunately, the diversity of the super-resolution outputs is still unsatisfactory, and the outputs from the flow-based model usually suffer from undesired artifacts which causes low-quality outputs. In this paper, we propose FS-NCSR which produces diverse and high-quality super-resolution outputs using frequency separation and noise conditioning compared to the existing flow-based approaches. As the sharpness and high-quality detail of the image rely on its high-frequency information, FS-NCSR only estimates the high-frequency information of the high-resolution outputs without redundant low-frequency components. Through this, FS-NCSR significantly improves the diversity score without significant image quality degradation compared to the NCSR, the winner of the previous NTIRE 2021 challenge.

* CVPRW 2022, First three authors are equally contributed

Via

Access Paper or Ask Questions