Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Panpan Xu

Collaborative LLM Numerical Reasoning with Local Data Protection

Apr 01, 2025

Min Zhang, Yuzhe Lu, Yun Zhou, Panpan Xu, Lin Lee Cheong, Chang-Tien Lu, Haozhu Wang

Abstract:Numerical reasoning over documents, which demands both contextual understanding and logical inference, is challenging for low-capacity local models deployed on computation-constrained devices. Although such complex reasoning queries could be routed to powerful remote models like GPT-4, exposing local data raises significant data leakage concerns. Existing mitigation methods generate problem descriptions or examples for remote assistance. However, the inherent complexity of numerical reasoning hinders the local model from generating logically equivalent queries and accurately inferring answers with remote guidance. In this paper, we present a model collaboration framework with two key innovations: (1) a context-aware synthesis strategy that shifts the query domains while preserving logical consistency; and (2) a tool-based answer reconstruction approach that reuses the remote-generated problem-solving pattern with code snippets. Experimental results demonstrate that our method achieves better reasoning accuracy than solely using local models while providing stronger data protection than fully relying on remote models. Furthermore, our method improves accuracy by 16.2% - 43.6% while reducing data leakage by 2.3% - 44.6% compared to existing data protection approaches.

Via

Access Paper or Ask Questions

Enhancing Multi-hop Reasoning in Vision-Language Models via Self-Distillation with Multi-Prompt Ensembling

Mar 03, 2025

Guande Wu, Huan Song, Yawei Wang, Qiaojing Yan, Yijun Tian, Lin Lee Cheong, Panpan Xu

Abstract:Multi-modal large language models have seen rapid advancement alongside large language models. However, while language models can effectively leverage chain-of-thought prompting for zero or few-shot learning, similar prompting strategies are less effective for multi-modal LLMs due to modality gaps and task complexity. To address this challenge, we explore two prompting approaches: a dual-query method that separates multi-modal input analysis and answer generation into two prompting steps, and an ensemble prompting method that combines multiple prompt variations to arrive at the final answer. Although these approaches enhance the model's reasoning capabilities without fine-tuning, they introduce significant inference overhead. Therefore, building on top of these two prompting techniques, we propose a self-distillation framework such that the model can improve itself without any annotated data. Our self-distillation framework learns representation intervention modules from the reasoning traces collected from ensembled dual-query prompts, in the form of hidden representations. The lightweight intervention modules operate in parallel with the frozen original model, which makes it possible to maintain computational efficiency while significantly improving model capability. We evaluate our method on five widely-used VQA benchmarks, demonstrating its effectiveness in performing multi-hop reasoning for complex tasks.

Via

Access Paper or Ask Questions

A Systematic Survey of Automatic Prompt Optimization Techniques

Feb 24, 2025

Kiran Ramnath, Kang Zhou, Sheng Guan, Soumya Smruti Mishra, Xuan Qi, Zhengyuan Shen, Shuai Wang, Sangmin Woo, Sullam Jeoung, Yawei Wang(+11 more)

Abstract:Since the advent of large language models (LLMs), prompt engineering has been a crucial step for eliciting desired responses for various Natural Language Processing (NLP) tasks. However, prompt engineering remains an impediment for end users due to rapid advances in models, tasks, and associated best practices. To mitigate this, Automatic Prompt Optimization (APO) techniques have recently emerged that use various automated techniques to help improve the performance of LLMs on various tasks. In this paper, we present a comprehensive survey summarizing the current progress and remaining challenges in this field. We provide a formal definition of APO, a 5-part unifying framework, and then proceed to rigorously categorize all relevant works based on their salient features therein. We hope to spur further research guided by our framework.

* 8 main pages, 31 total pages, 1 figure

Via

Access Paper or Ask Questions

VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration

Oct 29, 2024

Dezhan Tu, Danylo Vashchilenko, Yuzhe Lu, Panpan Xu

Figure 1 for VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration

Figure 2 for VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration

Figure 3 for VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration

Figure 4 for VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration

Abstract:Vision-Language Models (VLMs) have demonstrated impressive performance across a versatile set of tasks. A key challenge in accelerating VLMs is storing and accessing the large Key-Value (KV) cache that encodes long visual contexts, such as images or videos. While existing KV cache compression methods are effective for Large Language Models (LLMs), directly migrating them to VLMs yields suboptimal accuracy and speedup. To bridge the gap, we propose VL-Cache, a novel KV cache compression recipe tailored for accelerating VLM inference. In this paper, we first investigate the unique sparsity pattern of VLM attention by distinguishing visual and text tokens in prefill and decoding phases. Based on these observations, we introduce a layer-adaptive sparsity-aware cache budget allocation method that effectively distributes the limited cache budget across different layers, further reducing KV cache size without compromising accuracy. Additionally, we develop a modality-aware token scoring policy to better evaluate the token importance. Empirical results on multiple benchmark datasets demonstrate that retaining only 10% of KV cache achieves accuracy comparable to that with full cache. In a speed benchmark, our method accelerates end-to-end latency of generating 100 tokens by up to 2.33x and speeds up decoding by up to 7.08x, while reducing the memory footprint of KV cache in GPU by 90%.

Via

Access Paper or Ask Questions

Customize Your Own Paired Data via Few-shot Way

May 21, 2024

Jinshu Chen, Bingchuan Li, Miao Hua, Panpan Xu, Qian He

Figure 1 for Customize Your Own Paired Data via Few-shot Way

Figure 2 for Customize Your Own Paired Data via Few-shot Way

Figure 3 for Customize Your Own Paired Data via Few-shot Way

Figure 4 for Customize Your Own Paired Data via Few-shot Way

Abstract:Existing solutions to image editing tasks suffer from several issues. Though achieving remarkably satisfying generated results, some supervised methods require huge amounts of paired training data, which greatly limits their usages. The other unsupervised methods take full advantage of large-scale pre-trained priors, thus being strictly restricted to the domains where the priors are trained on and behaving badly in out-of-distribution cases. The task we focus on is how to enable the users to customize their desired effects through only few image pairs. In our proposed framework, a novel few-shot learning mechanism based on the directional transformations among samples is introduced and expands the learnable space exponentially. Adopting a diffusion model pipeline, we redesign the condition calculating modules in our model and apply several technical improvements. Experimental results demonstrate the capabilities of our method in various cases.

* Accepted by AI4CC CVPR2024 WorkShop

Via

Access Paper or Ask Questions

Efficient generation of realistic guided wave signals for reliability estimation

Mar 25, 2024

Panpan Xu, Robin Jones, Georgios Sarris, Peter Huthwaite

Figure 1 for Efficient generation of realistic guided wave signals for reliability estimation

Figure 2 for Efficient generation of realistic guided wave signals for reliability estimation

Figure 3 for Efficient generation of realistic guided wave signals for reliability estimation

Figure 4 for Efficient generation of realistic guided wave signals for reliability estimation

Abstract:Across non-destructive testing (NDT) and structural health monitoring (SHM), accurate knowledge of the systems' reliability for detecting defects, such as Probability of Detection (POD) analysis is essential to enabling widespread adoption. Traditionally this relies on access to extensive experimental data to cover all critical areas of the parametric space, which becomes expensive, and heavily undermines the benefit such systems bring. In response to these challenges, reliability estimation based on numerical simulation emerges as a practical solution, offering enhanced efficiency and cost-effectiveness. Nevertheless, precise reliability estimation demands that the simulated data faithfully represents the real-world performance. In this context, a numerical framework tailored to generate realistic signals for reliability estimation purposes is presented here, focusing on the application of guided wave SHM for pipe monitoring. It specifically incorporates key characteristics of real signals: random noise and coherent noise caused by the imbalance in transducer performance within guided wave monitoring systems. The effectiveness of our proposed methodology is demonstrated through a comprehensive comparative analysis between simulation-generated signals and experimental signals both individually and statistically. Furthermore, to assess the reliability of a guided wave system in terms of the inspection range for pipe monitoring, a series of POD analyses using simulation-generated data were conducted. The comparison of POD curves derived from ideal and realistic simulation data underscores the necessity of considering coherent noise for accurate POD curve calculations. Moreover, the POD analysis based on realistic simulation-generated data provides a quantitative estimation of the inspection range with more details compared to the current industry practice.

Via

Access Paper or Ask Questions

Effectively Fine-tune to Improve Large Multimodal Models for Radiology Report Generation

Dec 03, 2023

Yuzhe Lu, Sungmin Hong, Yash Shah, Panpan Xu

Figure 1 for Effectively Fine-tune to Improve Large Multimodal Models for Radiology Report Generation

Figure 2 for Effectively Fine-tune to Improve Large Multimodal Models for Radiology Report Generation

Figure 3 for Effectively Fine-tune to Improve Large Multimodal Models for Radiology Report Generation

Figure 4 for Effectively Fine-tune to Improve Large Multimodal Models for Radiology Report Generation

Abstract:Writing radiology reports from medical images requires a high level of domain expertise. It is time-consuming even for trained radiologists and can be error-prone for inexperienced radiologists. It would be appealing to automate this task by leveraging generative AI, which has shown drastic progress in vision and language understanding. In particular, Large Language Models (LLM) have demonstrated impressive capabilities recently and continued to set new state-of-the-art performance on almost all natural language tasks. While many have proposed architectures to combine vision models with LLMs for multimodal tasks, few have explored practical fine-tuning strategies. In this work, we proposed a simple yet effective two-stage fine-tuning protocol to align visual features to LLM's text embedding space as soft visual prompts. Our framework with OpenLLaMA-7B achieved state-of-the-art level performance without domain-specific pretraining. Moreover, we provide detailed analyses of soft visual prompts and attention mechanisms, shedding light on future research directions.

* Accepted to Deep Generative Models for Health Workshop at NeurIPS 2023

Via

Access Paper or Ask Questions

Graph Neural Prompting with Large Language Models

Sep 27, 2023

Yijun Tian, Huan Song, Zichen Wang, Haozhu Wang, Ziqing Hu, Fang Wang, Nitesh V. Chawla, Panpan Xu

Abstract:Large Language Models (LLMs) have shown remarkable generalization capability with exceptional performance in various language modeling tasks. However, they still exhibit inherent limitations in precisely capturing and returning grounded knowledge. While existing work has explored utilizing knowledge graphs to enhance language modeling via joint training and customized model architectures, applying this to LLMs is problematic owing to their large number of parameters and high computational cost. In addition, how to leverage the pre-trained LLMs and avoid training a customized model from scratch remains an open question. In this work, we propose Graph Neural Prompting (GNP), a novel plug-and-play method to assist pre-trained LLMs in learning beneficial knowledge from KGs. GNP encompasses various designs, including a standard graph neural network encoder, a cross-modality pooling module, a domain projector, and a self-supervised link prediction objective. Extensive experiments on multiple datasets demonstrate the superiority of GNP on both commonsense and biomedical reasoning tasks across different LLM sizes and settings.

Via

Access Paper or Ask Questions

ESCAPE: Countering Systematic Errors from Machine's Blind Spots via Interactive Visual Analysis

Mar 16, 2023

Yongsu Ahn, Yu-Ru Lin, Panpan Xu, Zeng Dai

Abstract:Classification models learn to generalize the associations between data samples and their target classes. However, researchers have increasingly observed that machine learning practice easily leads to systematic errors in AI applications, a phenomenon referred to as AI blindspots. Such blindspots arise when a model is trained with training samples (e.g., cat/dog classification) where important patterns (e.g., black cats) are missing or periphery/undesirable patterns (e.g., dogs with grass background) are misleading towards a certain class. Even more sophisticated techniques cannot guarantee to capture, reason about, and prevent the spurious associations. In this work, we propose ESCAPE, a visual analytic system that promotes a human-in-the-loop workflow for countering systematic errors. By allowing human users to easily inspect spurious associations, the system facilitates users to spontaneously recognize concepts associated misclassifications and evaluate mitigation strategies that can reduce biased associations. We also propose two statistical approaches, relative concept association to better quantify the associations between a concept and instances, and debias method to mitigate spurious associations. We demonstrate the utility of our proposed ESCAPE system and statistical measures through extensive evaluation including quantitative experiments, usage scenarios, expert interviews, and controlled user experiments.

Via

Access Paper or Ask Questions

Interactive Visual Pattern Search on Graph Data via Graph Representation Learning

Feb 18, 2022

Huan Song, Zeng Dai, Panpan Xu, Liu Ren

Figure 1 for Interactive Visual Pattern Search on Graph Data via Graph Representation Learning

Figure 2 for Interactive Visual Pattern Search on Graph Data via Graph Representation Learning

Figure 3 for Interactive Visual Pattern Search on Graph Data via Graph Representation Learning

Figure 4 for Interactive Visual Pattern Search on Graph Data via Graph Representation Learning

Abstract:Graphs are a ubiquitous data structure to model processes and relations in a wide range of domains. Examples include control-flow graphs in programs and semantic scene graphs in images. Identifying subgraph patterns in graphs is an important approach to understanding their structural properties. We propose a visual analytics system GraphQ to support human-in-the-loop, example-based, subgraph pattern search in a database containing many individual graphs. To support fast, interactive queries, we use graph neural networks (GNNs) to encode a graph as fixed-length latent vector representation, and perform subgraph matching in the latent space. Due to the complexity of the problem, it is still difficult to obtain accurate one-to-one node correspondences in the matching results that are crucial for visualization and interpretation. We, therefore, propose a novel GNN for node-alignment called NeuroAlign, to facilitate easy validation and interpretation of the query results. GraphQ provides a visual query interface with a query editor and a multi-scale visualization of the results, as well as a user feedback mechanism for refining the results with additional constraints. We demonstrate GraphQ through two example usage scenarios: analyzing reusable subroutines in program workflows and semantic scene graph search in images. Quantitative experiments show that NeuroAlign achieves 19-29% improvement in node-alignment accuracy compared to baseline GNN and provides up to 100x speedup compared to combinatorial algorithms. Our qualitative study with domain experts confirms the effectiveness for both usage scenarios.

* IEEE Transactions on Visualization and Computer Graphics. Published version: https://ieeexplore.ieee.org/abstract/document/9552902

Via

Access Paper or Ask Questions