Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Changjian Chen

InfoChartQA: A Benchmark for Multimodal Question Answering on Infographic Charts

May 25, 2025

Minzhi Lin, Tianchi Xie, Mengchen Liu, Yilin Ye, Changjian Chen, Shixia Liu

Abstract:Understanding infographic charts with design-driven visual elements (e.g., pictograms, icons) requires both visual recognition and reasoning, posing challenges for multimodal large language models (MLLMs). However, existing visual-question answering benchmarks fall short in evaluating these capabilities of MLLMs due to the lack of paired plain charts and visual-element-based questions. To bridge this gap, we introduce InfoChartQA, a benchmark for evaluating MLLMs on infographic chart understanding. It includes 5,642 pairs of infographic and plain charts, each sharing the same underlying data but differing in visual presentations. We further design visual-element-based questions to capture their unique visual designs and communicative intent. Evaluation of 20 MLLMs reveals a substantial performance decline on infographic charts, particularly for visual-element-based questions related to metaphors. The paired infographic and plain charts enable fine-grained error analysis and ablation studies, which highlight new opportunities for advancing MLLMs in infographic chart understanding. We release InfoChartQA at https://github.com/CoolDawnAnt/InfoChartQA.

Via

Access Paper or Ask Questions

Human-Guided Image Generation for Expanding Small-Scale Training Image Datasets

Dec 24, 2024

Changjian Chen, Fei Lv, Yalong Guan, Pengcheng Wang, Shengjie Yu, Yifan Zhang, Zhuo Tang

Figure 1 for Human-Guided Image Generation for Expanding Small-Scale Training Image Datasets

Figure 2 for Human-Guided Image Generation for Expanding Small-Scale Training Image Datasets

Figure 3 for Human-Guided Image Generation for Expanding Small-Scale Training Image Datasets

Figure 4 for Human-Guided Image Generation for Expanding Small-Scale Training Image Datasets

Abstract:The performance of computer vision models in certain real-world applications (e.g., rare wildlife observation) is limited by the small number of available images. Expanding datasets using pre-trained generative models is an effective way to address this limitation. However, since the automatic generation process is uncontrollable, the generated images are usually limited in diversity, and some of them are undesired. In this paper, we propose a human-guided image generation method for more controllable dataset expansion. We develop a multi-modal projection method with theoretical guarantees to facilitate the exploration of both the original and generated images. Based on the exploration, users refine the prompts and re-generate images for better performance. Since directly refining the prompts is challenging for novice users, we develop a sample-level prompt refinement method to make it easier. With this method, users only need to provide sample-level feedback (e.g., which samples are undesired) to obtain better prompts. The effectiveness of our method is demonstrated through the quantitative evaluation of the multi-modal projection method, improved model performance in the case study for both classification and object detection tasks, and positive feedback from the experts.

* Accepted by TVCG2025

Via

Access Paper or Ask Questions

A Scalable Matrix Visualization for Understanding Tree Ensemble Classifiers

Sep 05, 2024

Zhen Li, Weikai Yang, Jun Yuan, Jing Wu, Changjian Chen, Yao Ming, Fan Yang, Hui Zhang, Shixia Liu

Figure 1 for A Scalable Matrix Visualization for Understanding Tree Ensemble Classifiers

Figure 2 for A Scalable Matrix Visualization for Understanding Tree Ensemble Classifiers

Figure 3 for A Scalable Matrix Visualization for Understanding Tree Ensemble Classifiers

Figure 4 for A Scalable Matrix Visualization for Understanding Tree Ensemble Classifiers

Abstract:The high performance of tree ensemble classifiers benefits from a large set of rules, which, in turn, makes the models hard to understand. To improve interpretability, existing methods extract a subset of rules for approximation using model reduction techniques. However, by focusing on the reduced rule set, these methods often lose fidelity and ignore anomalous rules that, despite their infrequency, play crucial roles in real-world applications. This paper introduces a scalable visual analysis method to explain tree ensemble classifiers that contain tens of thousands of rules. The key idea is to address the issue of losing fidelity by adaptively organizing the rules as a hierarchy rather than reducing them. To ensure the inclusion of anomalous rules, we develop an anomaly-biased model reduction method to prioritize these rules at each hierarchical level. Synergized with this hierarchical organization of rules, we develop a matrix-based hierarchical visualization to support exploration at different levels of detail. Our quantitative experiments and case studies demonstrate how our method fosters a deeper understanding of both common and anomalous rules, thereby enhancing interpretability without sacrificing comprehensiveness.

* 15 pages, 10 figures

Via

Access Paper or Ask Questions

A Unified Interactive Model Evaluation for Classification, Object Detection, and Instance Segmentation in Computer Vision

Aug 09, 2023

Changjian Chen, Yukai Guo, Fengyuan Tian, Shilong Liu, Weikai Yang, Zhaowei Wang, Jing Wu, Hang Su, Hanspeter Pfister, Shixia Liu

Figure 1 for A Unified Interactive Model Evaluation for Classification, Object Detection, and Instance Segmentation in Computer Vision

Figure 2 for A Unified Interactive Model Evaluation for Classification, Object Detection, and Instance Segmentation in Computer Vision

Figure 3 for A Unified Interactive Model Evaluation for Classification, Object Detection, and Instance Segmentation in Computer Vision

Figure 4 for A Unified Interactive Model Evaluation for Classification, Object Detection, and Instance Segmentation in Computer Vision

Abstract:Existing model evaluation tools mainly focus on evaluating classification models, leaving a gap in evaluating more complex models, such as object detection. In this paper, we develop an open-source visual analysis tool, Uni-Evaluator, to support a unified model evaluation for classification, object detection, and instance segmentation in computer vision. The key idea behind our method is to formulate both discrete and continuous predictions in different tasks as unified probability distributions. Based on these distributions, we develop 1) a matrix-based visualization to provide an overview of model performance; 2) a table visualization to identify the problematic data subsets where the model performs poorly; 3) a grid visualization to display the samples of interest. These visualizations work together to facilitate the model evaluation from a global overview to individual samples. Two case studies demonstrate the effectiveness of Uni-Evaluator in evaluating model performance and making informed improvements.

* Accepted to IEEE VIS 2023

Via

Access Paper or Ask Questions

VideoPro: A Visual Analytics Approach for Interactive Video Programming

Aug 01, 2023

Jianben He, Xingbo Wang, Kam Kwai Wong, Xijie Huang, Changjian Chen, Zixin Chen, Fengjie Wang, Min Zhu, Huamin Qu

Figure 1 for VideoPro: A Visual Analytics Approach for Interactive Video Programming

Figure 2 for VideoPro: A Visual Analytics Approach for Interactive Video Programming

Figure 3 for VideoPro: A Visual Analytics Approach for Interactive Video Programming

Figure 4 for VideoPro: A Visual Analytics Approach for Interactive Video Programming

Abstract:Constructing supervised machine learning models for real-world video analysis require substantial labeled data, which is costly to acquire due to scarce domain expertise and laborious manual inspection. While data programming shows promise in generating labeled data at scale with user-defined labeling functions, the high dimensional and complex temporal information in videos poses additional challenges for effectively composing and evaluating labeling functions. In this paper, we propose VideoPro, a visual analytics approach to support flexible and scalable video data programming for model steering with reduced human effort. We first extract human-understandable events from videos using computer vision techniques and treat them as atomic components of labeling functions. We further propose a two-stage template mining algorithm that characterizes the sequential patterns of these events to serve as labeling function templates for efficient data labeling. The visual interface of VideoPro facilitates multifaceted exploration, examination, and application of the labeling templates, allowing for effective programming of video data at scale. Moreover, users can monitor the impact of programming on model performance and make informed adjustments during the iterative programming process. We demonstrate the efficiency and effectiveness of our approach with two case studies and expert interviews.

* 11 pages, 7 figures

Via

Access Paper or Ask Questions

OoDAnalyzer: Interactive Analysis of Out-of-Distribution Samples

Feb 08, 2020

Changjian Chen, Jun Yuan, Yafeng Lu, Yang Liu, Hang Su, Songtao Yuan, Shixia Liu

Figure 1 for OoDAnalyzer: Interactive Analysis of Out-of-Distribution Samples

Figure 2 for OoDAnalyzer: Interactive Analysis of Out-of-Distribution Samples

Figure 3 for OoDAnalyzer: Interactive Analysis of Out-of-Distribution Samples

Figure 4 for OoDAnalyzer: Interactive Analysis of Out-of-Distribution Samples

Abstract:One major cause of performance degradation in predictive models is that the test samples are not well covered by the training data. Such not well-represented samples are called OoD samples. In this paper, we propose OoDAnalyzer, a visual analysis approach for interactively identifying OoD samples and explaining them in context. Our approach integrates an ensemble OoD detection method and a grid-based visualization. The detection method is improved from deep ensembles by combining more features with algorithms in the same family. To better analyze and understand the OoD samples in context, we have developed a novel kNN-based grid layout algorithm motivated by Hall's theorem. The algorithm approximates the optimal layout and has $O(kN^2)$ time complexity, faster than the grid layout algorithm with overall best performance but $O(N^3)$ time complexity. Quantitative evaluation and case studies were performed on several datasets to demonstrate the effectiveness and usefulness of OoDAnalyzer.

* 14 pages, 13 figures

Via

Access Paper or Ask Questions

Quaternion Convolutional Neural Networks

Mar 02, 2019

Xuanyu Zhu, Yi Xu, Hongteng Xu, Changjian Chen

Figure 1 for Quaternion Convolutional Neural Networks

Figure 2 for Quaternion Convolutional Neural Networks

Figure 3 for Quaternion Convolutional Neural Networks

Figure 4 for Quaternion Convolutional Neural Networks

Abstract:Neural networks in the real domain have been studied for a long time and achieved promising results in many vision tasks for recent years. However, the extensions of the neural network models in other number fields and their potential applications are not fully-investigated yet. Focusing on color images, which can be naturally represented as quaternion matrices, we propose a quaternion convolutional neural network (QCNN) model to obtain more representative features. In particular, we redesign the basic modules like convolution layer and fully-connected layer in the quaternion domain, which can be used to establish fully-quaternion convolutional neural networks. Moreover, these modules are compatible with almost all deep learning techniques and can be plugged into traditional CNNs easily. We test our QCNN models in both color image classification and denoising tasks. Experimental results show that they outperform the real-valued CNNs with same structures.

Via

Access Paper or Ask Questions

Recent Research Advances on Interactive Machine Learning

Nov 12, 2018

Liu Jiang, Shixia Liu, Changjian Chen

Figure 1 for Recent Research Advances on Interactive Machine Learning

Figure 2 for Recent Research Advances on Interactive Machine Learning

Figure 3 for Recent Research Advances on Interactive Machine Learning

Figure 4 for Recent Research Advances on Interactive Machine Learning

Abstract:Interactive Machine Learning (IML) is an iterative learning process that tightly couples a human with a machine learner, which is widely used by researchers and practitioners to effectively solve a wide variety of real-world application problems. Although recent years have witnessed the proliferation of IML in the field of visual analytics, most recent surveys either focus on a specific area of IML or aim to summarize a visualization field that is too generic for IML. In this paper, we systematically review the recent literature on IML and classify them into a task-oriented taxonomy built by us. We conclude the survey with a discussion of open challenges and research opportunities that we believe are inspiring for future work in IML.

Via

Access Paper or Ask Questions