Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shanshan Liu

MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning

Jul 29, 2025

Tianhong Gao, Yannian Fu, Weiqun Wu, Haixiao Yue, Shanshan Liu, Gang Zhang

Abstract:Large Language Models (LLMs), enhanced through agent tuning, have demonstrated remarkable capabilities in Chain-of-Thought (CoT) and tool utilization, significantly surpassing the performance of standalone models. However, the multimodal domain still lacks a large-scale, high-quality agent tuning dataset to unlock the full potential of multimodal large language models. To bridge this gap, we introduce MMAT-1M, the first million-scale multimodal agent tuning dataset designed to support CoT, reflection, and dynamic tool usage. Our dataset is constructed through a novel four-stage data engine: 1) We first curate publicly available multimodal datasets containing question-answer pairs; 2) Then, leveraging GPT-4o, we generate rationales for the original question-answer pairs and dynamically integrate API calls and Retrieval Augmented Generation (RAG) information through a multi-turn paradigm; 3) Furthermore, we refine the rationales through reflection to ensure logical consistency and accuracy, creating a multi-turn dialogue dataset with both Rationale and Reflection (RR); 4) Finally, to enhance efficiency, we optionally compress multi-turn dialogues into a One-turn Rationale and Reflection (ORR) format. By fine-tuning open-source multimodal models on the MMAT-1M, we observe significant performance gains. For instance, the InternVL2.5-8B-RR model achieves an average improvement of 2.7% across eight public benchmarks and 8.8% on the RAG benchmark Dyn-VQA, demonstrating the dataset's effectiveness in enhancing multimodal reasoning and tool-based capabilities. The dataset is publicly available at https://github.com/VIS-MPU-Agent/MMAT-1M.

Via

Access Paper or Ask Questions

MA-COIR: Leveraging Semantic Search Index and Generative Models for Ontology-Driven Biomedical Concept Recognition

May 19, 2025

Shanshan Liu, Noriki Nishida, Rumana Ferdous Munne, Narumi Tokunaga, Yuki Yamagata, Kouji Kozaki, Yuji Matsumoto

Abstract:Recognizing biomedical concepts in the text is vital for ontology refinement, knowledge graph construction, and concept relationship discovery. However, traditional concept recognition methods, relying on explicit mention identification, often fail to capture complex concepts not explicitly stated in the text. To overcome this limitation, we introduce MA-COIR, a framework that reformulates concept recognition as an indexing-recognition task. By assigning semantic search indexes (ssIDs) to concepts, MA-COIR resolves ambiguities in ontology entries and enhances recognition efficiency. Using a pretrained BART-based model fine-tuned on small datasets, our approach reduces computational requirements to facilitate adoption by domain experts. Furthermore, we incorporate large language models (LLMs)-generated queries and synthetic data to improve recognition in low-resource settings. Experimental results on three scenarios (CDR, HPO, and HOIP) highlight the effectiveness of MA-COIR in recognizing both explicit and implicit concepts without the need for mention-level annotations during inference, advancing ontology-driven concept recognition in biomedical domain applications. Our code and constructed data are available at https://github.com/sl-633/macoir-master.

* preprint

Via

Access Paper or Ask Questions

On Data Synthesis and Post-training for Visual Abstract Reasoning

Apr 02, 2025

Ke Zhu, Yu Wang, Jiangjiang Liu, Qunyi Xie, Shanshan Liu, Gang Zhang

Figure 1 for On Data Synthesis and Post-training for Visual Abstract Reasoning

Figure 2 for On Data Synthesis and Post-training for Visual Abstract Reasoning

Figure 3 for On Data Synthesis and Post-training for Visual Abstract Reasoning

Figure 4 for On Data Synthesis and Post-training for Visual Abstract Reasoning

Abstract:This paper is a pioneering work attempting to address abstract visual reasoning (AVR) problems for large vision-language models (VLMs). We make a common LLaVA-NeXT 7B model capable of perceiving and reasoning about specific AVR problems, surpassing both open-sourced (e.g., Qwen-2-VL-72B) and closed-sourced powerful VLMs (e.g., GPT-4o) with significant margin. This is a great breakthrough since almost all previous VLMs fail or show nearly random performance on representative AVR benchmarks. Our key success is our innovative data synthesis and post-training process, aiming to fully relieve the task difficulty and elicit the model to learn, step by step. Our 7B model is also shown to be behave well on AVR without sacrificing common multimodal comprehension abilities. We hope our paper could serve as an early effort in this area and would inspire further research in abstract visual reasoning.

Via

Access Paper or Ask Questions

Speed and Conversational Large Language Models: Not All Is About Tokens per Second

Feb 23, 2025

Javier Conde, Miguel González, Pedro Reviriego, Zhen Gao, Shanshan Liu, Fabrizio Lombardi

Abstract:The speed of open-weights large language models (LLMs) and its dependency on the task at hand, when run on GPUs, is studied to present a comparative analysis of the speed of the most popular open LLMs.

* Computer (Volume: 57, Issue: 8, August 2024)

Via

Access Paper or Ask Questions

Can ChatGPT Learn to Count Letters?

Feb 23, 2025

Javier Conde, Gonzalo Martínez, Pedro Reviriego, Zhen Gao, Shanshan Liu, Fabrizio Lombardi

Abstract:Large language models (LLMs) struggle on simple tasks such as counting the number of occurrences of a letter in a word. In this paper, we investigate if ChatGPT can learn to count letters and propose an efficient solution.

* Computer (Volume: 58, Issue: 3, March 2025)

Via

Access Paper or Ask Questions

Adaptive Resolution Inference (ARI): Energy-Efficient Machine Learning for Internet of Things

Aug 26, 2024

Ziheng Wang, Pedro Reviriego, Farzad Niknia, Javier Conde, Shanshan Liu, Fabrizio Lombardi

Figure 1 for Adaptive Resolution Inference (ARI): Energy-Efficient Machine Learning for Internet of Things

Figure 2 for Adaptive Resolution Inference (ARI): Energy-Efficient Machine Learning for Internet of Things

Figure 3 for Adaptive Resolution Inference (ARI): Energy-Efficient Machine Learning for Internet of Things

Figure 4 for Adaptive Resolution Inference (ARI): Energy-Efficient Machine Learning for Internet of Things

Abstract:The implementation of machine learning in Internet of Things devices poses significant operational challenges due to limited energy and computation resources. In recent years, significant efforts have been made to implement simplified ML models that can achieve reasonable performance while reducing computation and energy, for example by pruning weights in neural networks, or using reduced precision for the parameters and arithmetic operations. However, this type of approach is limited by the performance of the ML implementation, i.e., by the loss for example in accuracy due to the model simplification. In this article, we present adaptive resolution inference (ARI), a novel approach that enables to evaluate new tradeoffs between energy dissipation and model performance in ML implementations. The main principle of the proposed approach is to run inferences with reduced precision (quantization) and use the margin over the decision threshold to determine if either the result is reliable, or the inference must run with the full model. The rationale is that quantization only introduces small deviations in the inference scores, such that if the scores have a sufficient margin over the decision threshold, it is unlikely that the full model would have a different result. Therefore, we can run the quantized model first, and only when the scores do not have a sufficient margin, the full model is run. This enables most inferences to run with the reduced precision model and only a small fraction requires the full model, so significantly reducing computation and energy while not affecting model performance. The proposed ARI approach is presented, analyzed in detail, and evaluated using different data sets for floating-point and stochastic computing implementations. The results show that ARI can significantly reduce the energy for inference in different configurations with savings between 40% and 85%.

* IEEE Internet of Things Journal 2023 Volume:11, Issue:8

Via

Access Paper or Ask Questions

Dependability Evaluation of Stable Diffusion with Soft Errors on the Model Parameters

Mar 30, 2024

Zhen Gao, Lini Yuan, Pedro Reviriego, Shanshan Liu, Fabrizio Lombardi

Abstract:Stable Diffusion is a popular Transformer-based model for image generation from text; it applies an image information creator to the input text and the visual knowledge is added in a step-by-step fashion to create an image that corresponds to the input text. However, this diffusion process can be corrupted by errors from the underlying hardware, which are especially relevant for implementations at the nanoscales. In this paper, the dependability of Stable Diffusion is studied focusing on soft errors in the memory that stores the model parameters; specifically, errors are injected into some critical layers of the Transformer in different blocks of the image information creator, to evaluate their impact on model performance. The simulations results reveal several conclusions: 1) errors on the down blocks of the creator have a larger impact on the quality of the generated images than those on the up blocks, while the errors on middle block have negligible effect; 2) errors on the self-attention (SA) layers have larger impact on the results than those on the cross-attention (CA) layers; 3) for CA layers, errors on deeper levels result in a larger impact; 4) errors on blocks at the first levels tend to introduce noise in the image, and those on deep layers tend to introduce large colored blocks. These results provide an initial understanding of the impact of errors on Stable Diffusion.

* 6 pages, 16 figures

Via

Access Paper or Ask Questions

Concurrent Linguistic Error Detection (CLED) for Large Language Models

Mar 25, 2024

Jinhua Zhu, Javier Conde, Zhen Gao, Pedro Reviriego, Shanshan Liu, Fabrizio Lombardi

Abstract:The wide adoption of Large language models (LLMs) makes their dependability a pressing concern. Detection of errors is the first step to mitigating their impact on a system and thus, efficient error detection for LLMs is an important issue. In many settings, the LLM is considered as a black box with no access to the internal nodes; this prevents the use of many error detection schemes that need access to the model's internal nodes. An interesting observation is that the output of LLMs in error-free operation should be valid and normal text. Therefore, when the text is not valid or differs significantly from normal text, it is likely that there is an error. Based on this observation we propose to perform Concurrent Linguistic Error Detection (CLED); this scheme extracts some linguistic features of the text generated by the LLM and feeds them to a concurrent classifier that detects errors. Since the proposed error detection mechanism only relies on the outputs of the model, then it can be used on LLMs in which there is no access to the internal nodes. The proposed CLED scheme has been evaluated on the T5 model when used for news summarization and on the OPUS-MT model when used for translation. In both cases, the same set of linguistic features has been used for error detection to illustrate the applicability of the proposed scheme beyond a specific case. The results show that CLED can detect most of the errors at a low overhead penalty. The use of the concurrent classifier also enables a trade-off between error detection effectiveness and its associated overhead, so providing flexibility to a designer.

* 11 pages, 6 figures, 30 references

Via

Access Paper or Ask Questions

Automatic laminectomy cutting plane planning based on artificial intelligence in robot assisted laminectomy surgery

Dec 26, 2023

Zhuofu Li, Yonghong Zhang, Chengxia Wang, Shanshan Liu, Xiongkang Song, Xuquan Ji, Shuai Jiang, Woquan Zhong, Lei Hu, Weishi Li

Abstract:Objective: This study aims to use artificial intelligence to realize the automatic planning of laminectomy, and verify the method. Methods: We propose a two-stage approach for automatic laminectomy cutting plane planning. The first stage was the identification of key points. 7 key points were manually marked on each CT image. The Spatial Pyramid Upsampling Network (SPU-Net) algorithm developed by us was used to accurately locate the 7 key points. In the second stage, based on the identification of key points, a personalized coordinate system was generated for each vertebra. Finally, the transverse and longitudinal cutting planes of laminectomy were generated under the coordinate system. The overall effect of planning was evaluated. Results: In the first stage, the average localization error of the SPU-Net algorithm for the seven key points was 0.65mm. In the second stage, a total of 320 transverse cutting planes and 640 longitudinal cutting planes were planned by the algorithm. Among them, the number of horizontal plane planning effects of grade A, B, and C were 318(99.38%), 1(0.31%), and 1(0.31%), respectively. The longitudinal planning effects of grade A, B, and C were 622(97.18%), 1(0.16%), and 17(2.66%), respectively. Conclusions: In this study, we propose a method for automatic surgical path planning of laminectomy based on the localization of key points in CT images. The results showed that the method achieved satisfactory results. More studies are needed to confirm the reliability of this approach in the future.

Via

Access Paper or Ask Questions

Concurrent Classifier Error Detection (CCED) in Large Scale Machine Learning Systems

Jun 02, 2023

Pedro Reviriego, Ziheng Wang, Alvaro Alonso, Zhen Gao, Farzad Niknia, Shanshan Liu, Fabrizio Lombardi

Figure 1 for Concurrent Classifier Error Detection (CCED) in Large Scale Machine Learning Systems

Figure 2 for Concurrent Classifier Error Detection (CCED) in Large Scale Machine Learning Systems

Figure 3 for Concurrent Classifier Error Detection (CCED) in Large Scale Machine Learning Systems

Figure 4 for Concurrent Classifier Error Detection (CCED) in Large Scale Machine Learning Systems

Abstract:The complexity of Machine Learning (ML) systems increases each year, with current implementations of large language models or text-to-image generators having billions of parameters and requiring billions of arithmetic operations. As these systems are widely utilized, ensuring their reliable operation is becoming a design requirement. Traditional error detection mechanisms introduce circuit or time redundancy that significantly impacts system performance. An alternative is the use of Concurrent Error Detection (CED) schemes that operate in parallel with the system and exploit their properties to detect errors. CED is attractive for large ML systems because it can potentially reduce the cost of error detection. In this paper, we introduce Concurrent Classifier Error Detection (CCED), a scheme to implement CED in ML systems using a concurrent ML classifier to detect errors. CCED identifies a set of check signals in the main ML system and feeds them to the concurrent ML classifier that is trained to detect errors. The proposed CCED scheme has been implemented and evaluated on two widely used large-scale ML models: Contrastive Language Image Pretraining (CLIP) used for image classification and Bidirectional Encoder Representations from Transformers (BERT) used for natural language applications. The results show that more than 95 percent of the errors are detected when using a simple Random Forest classifier that is order of magnitude simpler than CLIP or BERT. These results illustrate the potential of CCED to implement error detection in large-scale ML models.

Via

Access Paper or Ask Questions