Chris McCarthy

Swinburne University of Technology

A Lightweight Large Vision-language Model for Multimodal Medical Images

Apr 08, 2025

Belal Alsinglawi, Chris McCarthy, Sara Webb, Christopher Fluke, Navid Toosy Saidy

Abstract: Medical Visual Question Answering (VQA) enhances clinical decision-making by enabling systems to interpret medical images and answer clinical queries. However, developing efficient, high-performance VQA models is challenging due to the complexity of medical imagery and the diversity of imaging modalities. In this paper, we introduce a lightweight, multimodal VQA model integrating BiomedCLIP for image feature extraction and LLaMA-3 for text processing. Designed for medical VQA tasks, our model achieves state-of-the-art performance on the OmniMedVQA dataset. With approximately 8 billion parameters, it requires only two NVIDIA A100 GPUs (40 GB each), demonstrating superior efficiency over larger models. Our results show 73.4% accuracy on open-ended questions, surpassing existing models and validating the model's potential for real-world medical applications. Key contributions include a specialized multimodal VQA model, a resource-efficient architecture, and strong performance in answering open-ended clinical questions.

* 10 pages, 4 figures

Linked Adapters: Linking Past and Future to Present for Effective Continual Learning

Dec 14, 2024

Dupati Srikar Chandra, P. K. Srijith, Dana Rezazadegan, Chris McCarthy

Abstract: Continual learning allows a system to learn and adapt to new tasks while retaining the knowledge acquired from previous tasks. However, deep learning models suffer from catastrophic forgetting of knowledge learned from earlier tasks while learning a new task. Moreover, retraining large models like transformers from scratch for every new task is costly. An effective approach to continual learning is to use a large pre-trained model with task-specific adapters to adapt to new tasks. Though this approach can mitigate catastrophic forgetting, it fails to transfer knowledge across tasks because each task learns its adapter separately. To address this, we propose a novel approach, Linked Adapters, that allows knowledge transfer through a weighted attention mechanism to other task-specific adapters. Linked Adapters use a multi-layer perceptron (MLP) to model the attention weights, which overcomes the challenge of backward knowledge transfer in continual learning in addition to modeling forward knowledge transfer. During inference, our proposed approach effectively leverages knowledge transfer through MLP-based attention weights across all the lateral task adapters. Through numerous experiments conducted on diverse image classification datasets, we demonstrate improved performance on continual learning tasks using Linked Adapters.

* 13 Pages, 5 Figures
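
The linking idea in the abstract above can be illustrated with a small NumPy sketch. This is not the authors' implementation: the bottleneck adapter shape, the per-task embeddings, the way the MLP is fed, and every name and dimension below are assumptions chosen only to show how an MLP-produced weight could mix the outputs of other tasks' adapters into the current task's output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: feature dim, adapter bottleneck, task-embedding dim,
# MLP hidden width, number of tasks.
D, R, E, H, T = 16, 4, 8, 16, 3

def make_adapter():
    # Assumed bottleneck adapter: down-projection, ReLU, up-projection.
    return {"down": rng.normal(0, 0.1, (D, R)), "up": rng.normal(0, 0.1, (R, D))}

def adapter_forward(adapter, h):
    return np.maximum(h @ adapter["down"], 0.0) @ adapter["up"]

adapters = [make_adapter() for _ in range(T)]   # one adapter per task
task_emb = rng.normal(0, 1.0, (T, E))           # assumed learnable task embeddings
W1 = rng.normal(0, 0.1, (2 * E, H))             # MLP layer 1 (assumed shape)
W2 = rng.normal(0, 0.1, (H, 1))                 # MLP layer 2 (assumed shape)

def link_weight(t, k):
    # MLP maps the pair (task t, task k) to a scalar mixing weight.
    x = np.concatenate([task_emb[t], task_emb[k]])
    return float((np.maximum(x @ W1, 0.0) @ W2)[0])

def linked_forward(t, h):
    # Task t's own adapter with a residual connection on the backbone feature h.
    out = h + adapter_forward(adapters[t], h)
    # Weighted contributions from every other task's adapter. Both k < t
    # (earlier tasks) and k > t (later tasks) take part in the sum.
    for k in range(T):
        if k != t:
            out = out + link_weight(t, k) * adapter_forward(adapters[k], h)
    return out

h = rng.normal(size=(2, D))   # stand-in for frozen backbone features
y = linked_forward(1, h)
print(y.shape)
```

In this sketch the weights depend only on the task pair, so adapters trained for later tasks can contribute to an earlier task at inference time without retraining that task's own adapter. Whether the paper conditions the weights this way is not stated in the abstract.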

Towards self-attention based visual navigation in the real world

Sep 19, 2022

Jaime Ruiz-Serra, Jack White, Stephen Petrie, Tatiana Kameneva, Chris McCarthy

Abstract: Vision-guided navigation requires processing complex visual information to inform task-orientated decisions. Applications include autonomous robots, self-driving cars, and assistive vision for humans. A key element is the extraction and selection of relevant features in pixel space upon which to base action choices, for which Machine Learning techniques are well suited. However, Deep Reinforcement Learning agents trained in simulation often exhibit unsatisfactory results when deployed in the real world due to perceptual differences known as the $\textit{reality gap}$. One approach to bridging this gap that has yet to be explored is self-attention. In this paper we (1) perform a systematic exploration of the hyperparameter space for self-attention based navigation of 3D environments and qualitatively appraise behaviour observed from different hyperparameter sets, including their ability to generalise; (2) present strategies to improve the agents' generalisation abilities and navigation behaviour; and (3) show how models trained in simulation are capable of processing real-world images meaningfully in real time. To our knowledge, this is the first demonstration of a self-attention based agent successfully trained in navigating a 3D action space, using fewer than 4000 parameters.

* Submitted to The 2022 Australian Conference on Robotics and Automation (ACRA 2022)

Adapting a General Purpose Social Robot for Paediatric Rehabilitation through In-situ Design

Mar 08, 2018

Felip Martí, Jo Butchart, Sarah Knight, Adam Scheinberg, Lisa Wise, Leon Sterling, Chris McCarthy

Abstract: Socially Assistive Robots (SARs) offer great promise for improving outcomes in paediatric rehabilitation. However, the design of software and interactive capabilities for SARs must be carefully considered in the context of their intended clinical use. While previous work has explored specific roles and functionalities to support paediatric rehabilitation, few studies have considered the design of such capabilities in the context of ongoing clinical deployment. In this paper we present a two-phase in-situ design process for SARs in health care, emphasising stakeholder engagement and on-site development. We explore this in the context of developing the humanoid social robot NAO as a socially assistive rehabilitation aid for children with cerebral palsy. We present and evaluate our design process, outcomes achieved, and preliminary results from ongoing clinical testing with 9 patients and 5 therapists over 14 sessions. We argue that our in-situ design methodology has been central to the rapid and successful deployment of our system.

* Submitted to the Journal of Human-Robot Interaction (JHRI), since rebranded as Transactions on Human-Robot Interaction (THRI). Paper presented at the 13th Annual ACM/IEEE International Conference on Human Robot Interaction, Chicago, 8 March 2018
