Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhihui Fu

GreedyPrune: Retenting Critical Visual Token Set for Large Vision Language Models

Jun 16, 2025

Ruiguang Pei, Weiqing Sun, Zhihui Fu, Jun Wang

Abstract:Although Large Vision Language Models (LVLMs) have demonstrated remarkable performance in image understanding tasks, their computational efficiency remains a significant challenge, particularly on resource-constrained devices due to the high cost of processing large numbers of visual tokens. Recently, training-free visual token pruning methods have gained popularity as a low-cost solution to this issue. However, existing approaches suffer from two key limitations: semantic saliency-based strategies primarily focus on high cross-attention visual tokens, often neglecting visual diversity, whereas visual diversity-based methods risk inadvertently discarding semantically important tokens, especially under high compression ratios. In this paper, we introduce GreedyPrune, a training-free plug-and-play visual token pruning algorithm designed to jointly optimize semantic saliency and visual diversity. We formalize the token pruning process as a combinatorial optimization problem and demonstrate that greedy algorithms effectively balance computational efficiency with model accuracy. Extensive experiments validate the effectiveness of our approach, showing that GreedyPrune achieves state-of-the-art accuracy across various multimodal tasks and models while significantly reducing end-to-end inference latency.

Via

Access Paper or Ask Questions

Accelerating Prefilling for Long-Context LLMs via Sparse Pattern Sharing

May 26, 2025

Dan Peng, Zhihui Fu, Zewen Ye, Zhuoran Song, Jun Wang

Abstract:Sparse attention methods exploit the inherent sparsity in attention to speed up the prefilling phase of long-context inference, mitigating the quadratic complexity of full attention computation. While existing sparse attention methods rely on predefined patterns or inaccurate estimations to approximate attention behavior, they often fail to fully capture the true dynamics of attention, resulting in reduced efficiency and compromised accuracy. Instead, we propose a highly accurate sparse attention mechanism that shares similar yet precise attention patterns across heads, enabling a more realistic capture of the dynamic behavior of attention. Our approach is grounded in two key observations: (1) attention patterns demonstrate strong inter-head similarity, and (2) this similarity remains remarkably consistent across diverse inputs. By strategically sharing computed accurate patterns across attention heads, our method effectively captures actual patterns while requiring full attention computation for only a small subset of heads. Comprehensive evaluations demonstrate that our approach achieves superior or comparable speedup relative to state-of-the-art methods while delivering the best overall accuracy.

* Under review

Via

Access Paper or Ask Questions

Joint Similarity Item Exploration and Overlapped User Guidance for Multi-Modal Cross-Domain Recommendation

Feb 22, 2025

Weiming Liu, Chaochao Chen, Jiahe Xu, Xinting Liao, Fan Wang, Xiaolin Zheng, Zhihui Fu, Ruiguang Pei, Jun Wang

Abstract:Cross-Domain Recommendation (CDR) has been widely investigated for solving long-standing data sparsity problem via knowledge sharing across domains. In this paper, we focus on the Multi-Modal Cross-Domain Recommendation (MMCDR) problem where different items have multi-modal information while few users are overlapped across domains. MMCDR is particularly challenging in two aspects: fully exploiting diverse multi-modal information within each domain and leveraging useful knowledge transfer across domains. However, previous methods fail to cluster items with similar characteristics while filtering out inherit noises within different modalities, hurdling the model performance. What is worse, conventional CDR models primarily rely on overlapped users for domain adaptation, making them ill-equipped to handle scenarios where the majority of users are non-overlapped. To fill this gap, we propose Joint Similarity Item Exploration and Overlapped User Guidance (SIEOUG) for solving the MMCDR problem. SIEOUG first proposes similarity item exploration module, which not only obtains pair-wise and group-wise item-item graph knowledge, but also reduces irrelevant noise for multi-modal modeling. Then SIEOUG proposes user-item collaborative filtering module to aggregate user/item embeddings with the attention mechanism for collaborative filtering. Finally SIEOUG proposes overlapped user guidance module with optimal user matching for knowledge sharing across domains. Our empirical study on Amazon dataset with several different tasks demonstrates that SIEOUG significantly outperforms the state-of-the-art models under the MMCDR setting.

Via

Access Paper or Ask Questions

PocketLLM: Enabling On-Device Fine-Tuning for Personalized LLMs

Jul 01, 2024

Dan Peng, Zhihui Fu, Jun Wang

Figure 1 for PocketLLM: Enabling On-Device Fine-Tuning for Personalized LLMs

Figure 2 for PocketLLM: Enabling On-Device Fine-Tuning for Personalized LLMs

Figure 3 for PocketLLM: Enabling On-Device Fine-Tuning for Personalized LLMs

Abstract:Recent advancements in large language models (LLMs) have indeed showcased their impressive capabilities. On mobile devices, the wealth of valuable, non-public data generated daily holds great promise for locally fine-tuning personalized LLMs, while maintaining privacy through on-device processing. However, the constraints of mobile device resources pose challenges to direct on-device LLM fine-tuning, mainly due to the memory-intensive nature of derivative-based optimization required for saving gradients and optimizer states. To tackle this, we propose employing derivative-free optimization techniques to enable on-device fine-tuning of LLM, even on memory-limited mobile devices. Empirical results demonstrate that the RoBERTa-large model and OPT-1.3B can be fine-tuned locally on the OPPO Reno 6 smartphone using around 4GB and 6.5GB of memory respectively, using derivative-free optimization techniques. This highlights the feasibility of on-device LLM fine-tuning on mobile devices, paving the way for personalized LLMs on resource-constrained devices while safeguarding data privacy.

* Accepted to the ACL 2024 Workshop on Privacy in Natural Language Processing (PrivateNLP)

Via

Access Paper or Ask Questions

Arbitrary-sized Image Training and Residual Kernel Learning: Towards Image Fraud Identification

May 22, 2020

Hongyu Li, Xiaogang Huang, Zhihui Fu, Xiaolin Li

Figure 1 for Arbitrary-sized Image Training and Residual Kernel Learning: Towards Image Fraud Identification

Figure 2 for Arbitrary-sized Image Training and Residual Kernel Learning: Towards Image Fraud Identification

Figure 3 for Arbitrary-sized Image Training and Residual Kernel Learning: Towards Image Fraud Identification

Figure 4 for Arbitrary-sized Image Training and Residual Kernel Learning: Towards Image Fraud Identification

Abstract:Preserving original noise residuals in images are critical to image fraud identification. Since the resizing operation during deep learning will damage the microstructures of image noise residuals, we propose a framework for directly training images of original input scales without resizing. Our arbitrary-sized image training method mainly depends on the pseudo-batch gradient descent (PBGD), which bridges the gap between the input batch and the update batch to assure that model updates can normally run for arbitrary-sized images. In addition, a 3-phase alternate training strategy is designed to learn optimal residual kernels for image fraud identification. With the learnt residual kernels and PBGD, the proposed framework achieved the state-of-the-art results in image fraud identification, especially for images with small tampered regions or unseen images with different tampering distributions.

Via

Access Paper or Ask Questions

Two-Layer Mixture Network Ensemble for Apparel Attributes Classification

Jul 11, 2018

Tianqi Han, Zhihui Fu, Hongyu Li

Figure 1 for Two-Layer Mixture Network Ensemble for Apparel Attributes Classification

Figure 2 for Two-Layer Mixture Network Ensemble for Apparel Attributes Classification

Figure 3 for Two-Layer Mixture Network Ensemble for Apparel Attributes Classification

Figure 4 for Two-Layer Mixture Network Ensemble for Apparel Attributes Classification

Abstract:Recognizing apparel attributes has recently drawn great interest in the computer vision community. Methods based on various deep neural networks have been proposed for image classification, which could be applied to apparel attributes recognition. An interesting problem raised is how to ensemble these methods to further improve the accuracy. In this paper, we propose a two-layer mixture framework for ensemble different networks. In the first layer of this framework, two types of ensemble learning methods, bagging and boosting, are separately applied. Different from traditional methods, our bagging process makes use of the whole training set, not random subsets, to train each model in the ensemble, where several differentiated deep networks are used to promote model variance. To avoid the bias of small-scale samples, the second layer only adopts bagging to mix the results obtained with bagging and boosting in the first layer. Experimental results demonstrate that the proposed mixture framework outperforms any individual network model or either independent ensemble method in apparel attributes classification.

* To be published in Proc. of AIFT 2018

Via

Access Paper or Ask Questions