Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bo Pan

Alzheimer's Disease Neuroimaging Initiative, the Australian Imaging Biomarkers and Lifestyle flagship study of ageing

VIS-Shepherd: Constructing Critic for LLM-based Data Visualization Generation

Jun 16, 2025

Bo Pan, Yixiao Fu, Ke Wang, Junyu Lu, Lunke Pan, Ziyang Qian, Yuhan Chen, Guoliang Wang, Yitao Zhou, Li Zheng(+8 more)

Abstract:Data visualization generation using Large Language Models (LLMs) has shown promising results but often produces suboptimal visualizations that require human intervention for improvement. In this work, we introduce VIS-Shepherd, a specialized Multimodal Large Language Model (MLLM)-based critic to evaluate and provide feedback for LLM-generated data visualizations. At the core of our approach is a framework to construct a high-quality visualization critique dataset, where we collect human-created visualization instances, synthesize corresponding LLM-generated instances, and construct high-quality critiques. We conduct both model-based automatic evaluation and human preference studies to evaluate the effectiveness of our approach. Our experiments show that even small (7B parameters) open-source MLLM models achieve substantial performance gains by leveraging our high-quality visualization critique dataset, reaching levels comparable to much larger open-source or even proprietary models. Our work demonstrates significant potential for MLLM-based automated visualization critique and indicates promising directions for enhancing LLM-based data visualization generation. Our project page: https://github.com/bopan3/VIS-Shepherd.

Via

Access Paper or Ask Questions

Can Past Experience Accelerate LLM Reasoning?

May 27, 2025

Bo Pan, Liang Zhao

Figure 1 for Can Past Experience Accelerate LLM Reasoning?

Figure 2 for Can Past Experience Accelerate LLM Reasoning?

Figure 3 for Can Past Experience Accelerate LLM Reasoning?

Figure 4 for Can Past Experience Accelerate LLM Reasoning?

Abstract:Allocating more compute to large language models (LLMs) reasoning has generally been demonstrated to improve their effectiveness, but also results in increased inference time. In contrast, humans can perform tasks faster and better with increased experience and exposure. Hence, this paper aims to investigate the question: Can LLMs also become faster at reasoning through recurrent exposure on relevant tasks, and if so, how can it be achieved? To address these questions, we first formalize the problem setting of LLM reasoning speedup systematically in the dimensions of task relevancy and compute budget calculation. We then propose SpeedupLLM, a theoretically guaranteed framework to implement and benchmark such reasoning speedup behaviour based on adaptive compute allocation and memory mechanisms. We further conduct comprehensive experiments to benchmark such behaviour across different question similarity levels, memory methods, and reasoning methods. Results show that LLMs can generally reason faster with past experience, achieving up to a 56% reduction in compute cost when equipped with appropriate memory and reasoning methods.

Via

Access Paper or Ask Questions

Exploring Multimodal Prompt for Visualization Authoring with Large Language Models

Apr 18, 2025

Zhen Wen, Luoxuan Weng, Yinghao Tang, Runjin Zhang, Yuxin Liu, Bo Pan, Minfeng Zhu, Wei Chen

Figure 1 for Exploring Multimodal Prompt for Visualization Authoring with Large Language Models

Figure 2 for Exploring Multimodal Prompt for Visualization Authoring with Large Language Models

Figure 3 for Exploring Multimodal Prompt for Visualization Authoring with Large Language Models

Figure 4 for Exploring Multimodal Prompt for Visualization Authoring with Large Language Models

Abstract:Recent advances in large language models (LLMs) have shown great potential in automating the process of visualization authoring through simple natural language utterances. However, instructing LLMs using natural language is limited in precision and expressiveness for conveying visualization intent, leading to misinterpretation and time-consuming iterations. To address these limitations, we conduct an empirical study to understand how LLMs interpret ambiguous or incomplete text prompts in the context of visualization authoring, and the conditions making LLMs misinterpret user intent. Informed by the findings, we introduce visual prompts as a complementary input modality to text prompts, which help clarify user intent and improve LLMs' interpretation abilities. To explore the potential of multimodal prompting in visualization authoring, we design VisPilot, which enables users to easily create visualizations using multimodal prompts, including text, sketches, and direct manipulations on existing visualizations. Through two case studies and a controlled user study, we demonstrate that VisPilot provides a more intuitive way to create visualizations without affecting the overall task efficiency compared to text-only prompting approaches. Furthermore, we analyze the impact of text and visual prompts in different visualization tasks. Our findings highlight the importance of multimodal prompting in improving the usability of LLMs for visualization authoring. We discuss design implications for future visualization systems and provide insights into how multimodal prompts can enhance human-AI collaboration in creative visualization tasks. All materials are available at https://OSF.IO/2QRAK.

* 11 pages, 8 figures

Via

Access Paper or Ask Questions

MEGL: Multimodal Explanation-Guided Learning

Nov 20, 2024

Yifei Zhang, Tianxu Jiang, Bo Pan, Jingyu Wang, Guangji Bai, Liang Zhao

Figure 1 for MEGL: Multimodal Explanation-Guided Learning

Figure 2 for MEGL: Multimodal Explanation-Guided Learning

Figure 3 for MEGL: Multimodal Explanation-Guided Learning

Figure 4 for MEGL: Multimodal Explanation-Guided Learning

Abstract:Explaining the decision-making processes of Artificial Intelligence (AI) models is crucial for addressing their "black box" nature, particularly in tasks like image classification. Traditional eXplainable AI (XAI) methods typically rely on unimodal explanations, either visual or textual, each with inherent limitations. Visual explanations highlight key regions but often lack rationale, while textual explanations provide context without spatial grounding. Further, both explanation types can be inconsistent or incomplete, limiting their reliability. To address these challenges, we propose a novel Multimodal Explanation-Guided Learning (MEGL) framework that leverages both visual and textual explanations to enhance model interpretability and improve classification performance. Our Saliency-Driven Textual Grounding (SDTG) approach integrates spatial information from visual explanations into textual rationales, providing spatially grounded and contextually rich explanations. Additionally, we introduce Textual Supervision on Visual Explanations to align visual explanations with textual rationales, even in cases where ground truth visual annotations are missing. A Visual Explanation Distribution Consistency loss further reinforces visual coherence by aligning the generated visual explanations with dataset-level patterns, enabling the model to effectively learn from incomplete multimodal supervision. We validate MEGL on two new datasets, Object-ME and Action-ME, for image classification with multimodal explanations. Experimental results demonstrate that MEGL outperforms previous approaches in prediction accuracy and explanation quality across both visual and textual domains. Our code will be made available upon the acceptance of the paper.

Via

Access Paper or Ask Questions

TAGExplainer: Narrating Graph Explanations for Text-Attributed Graph Learning Models

Oct 20, 2024

Bo Pan, Zhen Xiong, Guanchen Wu, Zheng Zhang, Yifei Zhang, Liang Zhao

Abstract:Representation learning of Text-Attributed Graphs (TAGs) has garnered significant attention due to its applications in various domains, including recommendation systems and social networks. Despite advancements in TAG learning methodologies, challenges remain in explainability due to the black-box nature of existing TAG representation learning models. This paper presents TAGExplainer, the first method designed to generate natural language explanations for TAG learning. TAGExplainer employs a generative language model that maps input-output pairs to explanations reflecting the model's decision-making process. To address the lack of annotated ground truth explanations in real-world scenarios, we propose first generating pseudo-labels that capture the model's decisions from saliency-based explanations, then the pseudo-label generator is iteratively trained based on three training objectives focusing on faithfulness and brevity via Expert Iteration, to improve the quality of generated pseudo-labels. The high-quality pseudo-labels are finally utilized to train an end-to-end explanation generator model. Extensive experiments are conducted to demonstrate the effectiveness of TAGExplainer in producing faithful and concise natural language explanations.

Via

Access Paper or Ask Questions

Costal Cartilage Segmentation with Topology Guided Deformable Mamba: Method and Benchmark

Aug 14, 2024

Senmao Wang, Haifan Gong, Runmeng Cui, Boyao Wan, Yicheng Liu, Zhonglin Hu, Haiqing Yang, Jingyang Zhou, Bo Pan, Lin Lin(+1 more)

Figure 1 for Costal Cartilage Segmentation with Topology Guided Deformable Mamba: Method and Benchmark

Figure 2 for Costal Cartilage Segmentation with Topology Guided Deformable Mamba: Method and Benchmark

Figure 3 for Costal Cartilage Segmentation with Topology Guided Deformable Mamba: Method and Benchmark

Figure 4 for Costal Cartilage Segmentation with Topology Guided Deformable Mamba: Method and Benchmark

Abstract:Costal cartilage segmentation is crucial to various medical applications, necessitating precise and reliable techniques due to its complex anatomy and the importance of accurate diagnosis and surgical planning. We propose a novel deep learning-based approach called topology-guided deformable Mamba (TGDM) for costal cartilage segmentation. The TGDM is tailored to capture the intricate long-range costal cartilage relationships. Our method leverages a deformable model that integrates topological priors to enhance the adaptability and accuracy of the segmentation process. Furthermore, we developed a comprehensive benchmark that contains 165 cases for costal cartilage segmentation. This benchmark sets a new standard for evaluating costal cartilage segmentation techniques and provides a valuable resource for future research. Extensive experiments conducted on both in-domain benchmarks and out-of domain test sets demonstrate the superiority of our approach over existing methods, showing significant improvements in segmentation precision and robustness.

Via

Access Paper or Ask Questions

TAGA: Text-Attributed Graph Self-Supervised Learning by Synergizing Graph and Text Mutual Transformations

May 27, 2024

Zheng Zhang, Yuntong Hu, Bo Pan, Chen Ling, Liang Zhao

Figure 1 for TAGA: Text-Attributed Graph Self-Supervised Learning by Synergizing Graph and Text Mutual Transformations

Figure 2 for TAGA: Text-Attributed Graph Self-Supervised Learning by Synergizing Graph and Text Mutual Transformations

Figure 3 for TAGA: Text-Attributed Graph Self-Supervised Learning by Synergizing Graph and Text Mutual Transformations

Figure 4 for TAGA: Text-Attributed Graph Self-Supervised Learning by Synergizing Graph and Text Mutual Transformations

Abstract:Text-Attributed Graphs (TAGs) enhance graph structures with natural language descriptions, enabling detailed representation of data and their relationships across a broad spectrum of real-world scenarios. Despite the potential for deeper insights, existing TAG representation learning primarily relies on supervised methods, necessitating extensive labeled data and limiting applicability across diverse contexts. This paper introduces a new self-supervised learning framework, Text-And-Graph Multi-View Alignment (TAGA), which overcomes these constraints by integrating TAGs' structural and semantic dimensions. TAGA constructs two complementary views: Text-of-Graph view, which organizes node texts into structured documents based on graph topology, and the Graph-of-Text view, which converts textual nodes and connections into graph data. By aligning representations from both views, TAGA captures joint textual and structural information. In addition, a novel structure-preserving random walk algorithm is proposed for efficient training on large-sized TAGs. Our framework demonstrates strong performance in zero-shot and few-shot scenarios across eight real-world datasets.

Via

Access Paper or Ask Questions

GRAG: Graph Retrieval-Augmented Generation

May 26, 2024

Yuntong Hu, Zhihan Lei, Zheng Zhang, Bo Pan, Chen Ling, Liang Zhao

Figure 1 for GRAG: Graph Retrieval-Augmented Generation

Figure 2 for GRAG: Graph Retrieval-Augmented Generation

Figure 3 for GRAG: Graph Retrieval-Augmented Generation

Figure 4 for GRAG: Graph Retrieval-Augmented Generation

Abstract:While Retrieval-Augmented Generation (RAG) enhances the accuracy and relevance of responses by generative language models, it falls short in graph-based contexts where both textual and topological information are important. Naive RAG approaches inherently neglect the structural intricacies of textual graphs, resulting in a critical gap in the generation process. To address this challenge, we introduce $\textbf{Graph Retrieval-Augmented Generation (GRAG)}$, which significantly enhances both the retrieval and generation processes by emphasizing the importance of subgraph structures. Unlike RAG approaches that focus solely on text-based entity retrieval, GRAG maintains an acute awareness of graph topology, which is crucial for generating contextually and factually coherent responses. Our GRAG approach consists of four main stages: indexing of $k$-hop ego-graphs, graph retrieval, soft pruning to mitigate the impact of irrelevant entities, and generation with pruned textual subgraphs. GRAG's core workflow-retrieving textual subgraphs followed by soft pruning-efficiently identifies relevant subgraph structures while avoiding the computational infeasibility typical of exhaustive subgraph searches, which are NP-hard. Moreover, we propose a novel prompting strategy that achieves lossless conversion from textual subgraphs to hierarchical text descriptions. Extensive experiments on graph multi-hop reasoning benchmarks demonstrate that in scenarios requiring multi-hop reasoning on textual graphs, our GRAG approach significantly outperforms current state-of-the-art RAG methods while effectively mitigating hallucinations.

* 14 pages, 4 figures

Via

Access Paper or Ask Questions

Deep Causal Generative Models with Property Control

May 25, 2024

Qilong Zhao, Shiyu Wang, Guangji Bai, Bo Pan, Zhaohui Qin, Liang Zhao

Abstract:Generating data with properties of interest by external users while following the right causation among its intrinsic factors is important yet has not been well addressed jointly. This is due to the long-lasting challenge of jointly identifying key latent variables, their causal relations, and their correlation with properties of interest, as well as how to leverage their discoveries toward causally controlled data generation. To address these challenges, we propose a novel deep generative framework called the Correlation-aware Causal Variational Auto-encoder (C2VAE). This framework simultaneously recovers the correlation and causal relationships between properties using disentangled latent vectors. Specifically, causality is captured by learning the causal graph on latent variables through a structural causal model, while correlation is learned via a novel correlation pooling algorithm. Extensive experiments demonstrate C2VAE's ability to accurately recover true causality and correlation, as well as its superiority in controllable data generation compared to baseline models.

* 13 pages, 6 figures

Via

Access Paper or Ask Questions

ELAD: Explanation-Guided Large Language Models Active Distillation

Feb 20, 2024

Yifei Zhang, Bo Pan, Chen Ling, Yuntong Hu, Liang Zhao

Figure 1 for ELAD: Explanation-Guided Large Language Models Active Distillation

Figure 2 for ELAD: Explanation-Guided Large Language Models Active Distillation

Figure 3 for ELAD: Explanation-Guided Large Language Models Active Distillation

Figure 4 for ELAD: Explanation-Guided Large Language Models Active Distillation

Abstract:The deployment and application of Large Language Models (LLMs) is hindered by their memory inefficiency, computational demands, and the high costs of API inferences. Traditional distillation methods, which transfer the capabilities of LLMs to smaller models, often fail to determine whether the knowledge has been sufficiently transferred, potentially resulting in high costs or incomplete distillation. In this paper, we propose an Explanation-Guided LLMs Active Distillation (ELAD) framework that employs an active learning strategy to optimize the balance between annotation costs and model performance. To improve efficient sample selection, we introduce an explanation-guided sample selection method that identifies samples challenging its reasoning by exploiting uncertainties in explanation steps. Additionally, we present a customized LLM-annotated explanation revision technique where the teacher model detects and corrects flaws in the student model's reasoning. Our experiments across various reasoning datasets demonstrate that our framework significantly enhances the efficiency of LLM knowledge distillation.

Via

Access Paper or Ask Questions