Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ayan Banerjee

Generating customized prompts for Zero-Shot Rare Event Medical Image Classification using LLM

Jan 27, 2025

Payal Kamboj, Ayan Banerjee, Bin Xu, Sandeep Gupta

Abstract:Rare events, due to their infrequent occurrences, do not have much data, and hence deep learning techniques fail in estimating the distribution for such data. Open-vocabulary models represent an innovative approach to image classification. Unlike traditional models, these models classify images into any set of categories specified with natural language prompts during inference. These prompts usually comprise manually crafted templates (e.g., 'a photo of a {}') that are filled in with the names of each category. This paper introduces a simple yet effective method for generating highly accurate and contextually descriptive prompts containing discriminative characteristics. Rare event detection, especially in medicine, is more challenging due to low inter-class and high intra-class variability. To address these, we propose a novel approach that uses domain-specific expert knowledge on rare events to generate customized and contextually relevant prompts, which are then used by large language models for image classification. Our zero-shot, privacy-preserving method enhances rare event classification without additional training, outperforming state-of-the-art techniques.

* Accepted in IEEE ISBI, 2025

Via

Access Paper or Ask Questions

Recovering implicit physics model under real-world constraints

Dec 03, 2024

Ayan Banerjee, Sandeep K. S. Gupta

Figure 1 for Recovering implicit physics model under real-world constraints

Figure 2 for Recovering implicit physics model under real-world constraints

Figure 3 for Recovering implicit physics model under real-world constraints

Figure 4 for Recovering implicit physics model under real-world constraints

Abstract:Recovering a physics-driven model, i.e. a governing set of equations of the underlying dynamical systems, from the real-world data has been of recent interest. Most existing methods either operate on simulation data with unrealistically high sampling rates or require explicit measurements of all system variables, which is not amenable in real-world deployments. Moreover, they assume the timestamps of external perturbations to the physical system are known a priori, without uncertainty, implicitly discounting any sensor time-synchronization or human reporting errors. In this paper, we propose a novel liquid time constant neural network (LTC-NN) based architecture to recover underlying model of physical dynamics from real-world data. The automatic differentiation property of LTC-NN nodes overcomes problems associated with low sampling rates, the input dependent time constant in the forward pass of the hidden layer of LTC-NN nodes creates a massive search space of implicit physical dynamics, the physics model solver based data reconstruction loss guides the search for the correct set of implicit dynamics, and the use of the dropout regularization in the dense layer ensures extraction of the sparsest model. Further, to account for the perturbation timing error, we utilize dense layer nodes to search through input shifts that results in the lowest reconstruction loss. Experiments on four benchmark dynamical systems, three with simulation data and one with the real-world data show that the LTC-NN architecture is more accurate in recovering implicit physics model coefficients than the state-of-the-art sparse model recovery approaches. We also introduce four additional case studies (total eight) on real-life medical examples in simulation and with real-world clinical data to show effectiveness of our approach in recovering underlying model in practice.

* 27 th European conference on Artificial Intelligence 2024
* This paper is published in ECAI 2024, https://ebooks.iospress.nl/volumearticle/69651

Via

Access Paper or Ask Questions

STORM: Strategic Orchestration of Modalities for Rare Event Classification

Dec 03, 2024

Payal Kamboj, Ayan Banerjee, Sandeep K. S. Gupta

Abstract:In domains such as biomedical, expert insights are crucial for selecting the most informative modalities for artificial intelligence (AI) methodologies. However, using all available modalities poses challenges, particularly in determining the impact of each modality on performance and optimizing their combinations for accurate classification. Traditional approaches resort to manual trial and error methods, lacking systematic frameworks for discerning the most relevant modalities. Moreover, although multi-modal learning enables the integration of information from diverse sources, utilizing all available modalities is often impractical and unnecessary. To address this, we introduce an entropy-based algorithm STORM to solve the modality selection problem for rare event. This algorithm systematically evaluates the information content of individual modalities and their combinations, identifying the most discriminative features essential for rare class classification tasks. Through seizure onset zone detection case study, we demonstrate the efficacy of our algorithm in enhancing classification performance. By selecting useful subset of modalities, our approach paves the way for more efficient AI-driven biomedical analyses, thereby advancing disease diagnosis in clinical settings.

* Accepted in IEEE Asilomar Conference on Signals, Systems, and Computers, 2024

Via

Access Paper or Ask Questions

Framework for developing and evaluating ethical collaboration between expert and machine

Nov 17, 2024

Ayan Banerjee, Payal Kamboj, Sandeep Gupta

Abstract:Precision medicine is a promising approach for accessible disease diagnosis and personalized intervention planning in high-mortality diseases such as coronary artery disease (CAD), drug-resistant epilepsy (DRE), and chronic illnesses like Type 1 diabetes (T1D). By leveraging artificial intelligence (AI), precision medicine tailors diagnosis and treatment solutions to individual patients by explicitly modeling variance in pathophysiology. However, the adoption of AI in medical applications faces significant challenges, including poor generalizability across centers, demographics, and comorbidities, limited explainability in clinical terms, and a lack of trust in ethical decision-making. This paper proposes a framework to develop and ethically evaluate expert-guided multi-modal AI, addressing these challenges in AI integration within precision medicine. We illustrate this framework with case study on insulin management for T1D. To ensure ethical considerations and clinician engagement, we adopt a co-design approach where AI serves an assistive role, with final diagnoses or treatment plans emerging from collaboration between clinicians and AI.

* Accepted in ECAI Workshop AIEB

Via

Access Paper or Ask Questions

DistilDoc: Knowledge Distillation for Visually-Rich Document Applications

Jun 12, 2024

Jordy Van Landeghem, Subhajit Maity, Ayan Banerjee, Matthew Blaschko, Marie-Francine Moens, Josep Lladós, Sanket Biswas

Figure 1 for DistilDoc: Knowledge Distillation for Visually-Rich Document Applications

Figure 2 for DistilDoc: Knowledge Distillation for Visually-Rich Document Applications

Figure 3 for DistilDoc: Knowledge Distillation for Visually-Rich Document Applications

Figure 4 for DistilDoc: Knowledge Distillation for Visually-Rich Document Applications

Abstract:This work explores knowledge distillation (KD) for visually-rich document (VRD) applications such as document layout analysis (DLA) and document image classification (DIC). While VRD research is dependent on increasingly sophisticated and cumbersome models, the field has neglected to study efficiency via model compression. Here, we design a KD experimentation methodology for more lean, performant models on document understanding (DU) tasks that are integral within larger task pipelines. We carefully selected KD strategies (response-based, feature-based) for distilling knowledge to and from backbones with different architectures (ResNet, ViT, DiT) and capacities (base, small, tiny). We study what affects the teacher-student knowledge gap and find that some methods (tuned vanilla KD, MSE, SimKD with an apt projector) can consistently outperform supervised student training. Furthermore, we design downstream task setups to evaluate covariate shift and the robustness of distilled DLA models on zero-shot layout-aware document visual question answering (DocVQA). DLA-KD experiments result in a large mAP knowledge gap, which unpredictably translates to downstream robustness, accentuating the need to further explore how to efficiently obtain more semantic document layout awareness.

* Accepted to ICDAR 2024 (Athens, Greece)

Via

Access Paper or Ask Questions

CPS-LLM: Large Language Model based Safe Usage Plan Generator for Human-in-the-Loop Human-in-the-Plant Cyber-Physical System

May 19, 2024

Ayan Banerjee, Aranyak Maity, Payal Kamboj, Sandeep K. S. Gupta

Abstract:We explore the usage of large language models (LLM) in human-in-the-loop human-in-the-plant cyber-physical systems (CPS) to translate a high-level prompt into a personalized plan of actions, and subsequently convert that plan into a grounded inference of sequential decision-making automated by a real-world CPS controller to achieve a control goal. We show that it is relatively straightforward to contextualize an LLM so it can generate domain-specific plans. However, these plans may be infeasible for the physical system to execute or the plan may be unsafe for human users. To address this, we propose CPS-LLM, an LLM retrained using an instruction tuning framework, which ensures that generated plans not only align with the physical system dynamics of the CPS but are also safe for human users. The CPS-LLM consists of two innovative components: a) a liquid time constant neural network-based physical dynamics coefficient estimator that can derive coefficients of dynamical models with some unmeasured state variables; b) the model coefficients are then used to train an LLM with prompts embodied with traces from the dynamical system and the corresponding model coefficients. We show that when the CPS-LLM is integrated with a contextualized chatbot such as BARD it can generate feasible and safe plans to manage external events such as meals for automated insulin delivery systems used by Type 1 Diabetes subjects.

* Accepted for publication in AAAI 2024, Planning for Cyber Physical Systems

Via

Access Paper or Ask Questions

SVGCraft: Beyond Single Object Text-to-SVG Synthesis with Comprehensive Canvas Layout

Mar 30, 2024

Ayan Banerjee, Nityanand Mathur, Josep Lladós, Umapada Pal, Anjan Dutta

Figure 1 for SVGCraft: Beyond Single Object Text-to-SVG Synthesis with Comprehensive Canvas Layout

Figure 2 for SVGCraft: Beyond Single Object Text-to-SVG Synthesis with Comprehensive Canvas Layout

Figure 3 for SVGCraft: Beyond Single Object Text-to-SVG Synthesis with Comprehensive Canvas Layout

Figure 4 for SVGCraft: Beyond Single Object Text-to-SVG Synthesis with Comprehensive Canvas Layout

Abstract:Generating VectorArt from text prompts is a challenging vision task, requiring diverse yet realistic depictions of the seen as well as unseen entities. However, existing research has been mostly limited to the generation of single objects, rather than comprehensive scenes comprising multiple elements. In response, this work introduces SVGCraft, a novel end-to-end framework for the creation of vector graphics depicting entire scenes from textual descriptions. Utilizing a pre-trained LLM for layout generation from text prompts, this framework introduces a technique for producing masked latents in specified bounding boxes for accurate object placement. It introduces a fusion mechanism for integrating attention maps and employs a diffusion U-Net for coherent composition, speeding up the drawing process. The resulting SVG is optimized using a pre-trained encoder and LPIPS loss with opacity modulation to maximize similarity. Additionally, this work explores the potential of primitive shapes in facilitating canvas completion in constrained environments. Through both qualitative and quantitative assessments, SVGCraft is demonstrated to surpass prior works in abstraction, recognizability, and detail, as evidenced by its performance metrics (CLIP-T: 0.4563, Cosine Similarity: 0.6342, Confusion: 0.66, Aesthetic: 6.7832). The code will be available at https://github.com/ayanban011/SVGCraft.

Via

Access Paper or Ask Questions

GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation

Feb 20, 2024

Ayan Banerjee, Sanket Biswas, Josep Lladós, Umapada Pal

Figure 1 for GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation

Figure 2 for GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation

Figure 3 for GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation

Figure 4 for GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation

Abstract:Object detection in documents is a key step to automate the structural elements identification process in a digital or scanned document through understanding the hierarchical structure and relationships between different elements. Large and complex models, while achieving high accuracy, can be computationally expensive and memory-intensive, making them impractical for deployment on resource constrained devices. Knowledge distillation allows us to create small and more efficient models that retain much of the performance of their larger counterparts. Here we present a graph-based knowledge distillation framework to correctly identify and localize the document objects in a document image. Here, we design a structured graph with nodes containing proposal-level features and edges representing the relationship between the different proposal regions. Also, to reduce text bias an adaptive node sampling strategy is designed to prune the weight distribution and put more weightage on non-text nodes. We encode the complete graph as a knowledge representation and transfer it from the teacher to the student through the proposed distillation loss by effectively capturing both local and global information concurrently. Extensive experimentation on competitive benchmarks demonstrates that the proposed framework outperforms the current state-of-the-art approaches. The code will be available at: https://github.com/ayanban011/GraphKD.

Via

Access Paper or Ask Questions

The Expert Knowledge combined with AI outperforms AI Alone in Seizure Onset Zone Localization using resting state fMRI

Dec 14, 2023

Payal Kamboj, Ayan Banerjee, Varina L. Boerwinkle, Sandeep K. S. Gupta

Abstract:We evaluated whether integration of expert guidance on seizure onset zone (SOZ) identification from resting state functional MRI (rs-fMRI) connectomics combined with deep learning (DL) techniques enhances the SOZ delineation in patients with refractory epilepsy (RE), compared to utilizing DL alone. Rs-fMRI were collected from 52 children with RE who had subsequently undergone ic-EEG and then, if indicated, surgery for seizure control (n = 25). The resting state functional connectomics data were previously independently classified by two expert epileptologists, as indicative of measurement noise, typical resting state network connectivity, or SOZ. An expert knowledge integrated deep network was trained on functional connectomics data to identify SOZ. Expert knowledge integrated with DL showed a SOZ localization accuracy of 84.8& and F1 score, harmonic mean of positive predictive value and sensitivity, of 91.7%. Conversely, a DL only model yielded an accuracy of less than 50% (F1 score 63%). Activations that initiate in gray matter, extend through white matter and end in vascular regions are seen as the most discriminative expert identified SOZ characteristics. Integration of expert knowledge of functional connectomics can not only enhance the performance of DL in localizing SOZ in RE, but also lead toward potentially useful explanations of prevalent co-activation patterns in SOZ. RE with surgical outcomes and pre-operative rs-fMRI studies can yield expert knowledge most salient for SOZ identification.

* Accepted in Frontiers in Neurology journal, section Artificial Intelligence

Via

Access Paper or Ask Questions

Harnessing the Power of Multi-Lingual Datasets for Pre-training: Towards Enhancing Text Spotting Performance

Oct 06, 2023

Alloy Das, Sanket Biswas, Ayan Banerjee, Saumik Bhattacharya, Josep Lladós, Umapada Pal

Abstract:The adaptation capability to a wide range of domains is crucial for scene text spotting models when deployed to real-world conditions. However, existing state-of-the-art (SOTA) approaches usually incorporate scene text detection and recognition simply by pretraining on natural scene text datasets, which do not directly exploit the intermediate feature representations between multiple domains. Here, we investigate the problem of domain-adaptive scene text spotting, i.e., training a model on multi-domain source data such that it can directly adapt to target domains rather than being specialized for a specific domain or scenario. Further, we investigate a transformer baseline called Swin-TESTR to focus on solving scene-text spotting for both regular and arbitrary-shaped scene text along with an exhaustive evaluation. The results clearly demonstrate the potential of intermediate representations to achieve significant performance on text spotting benchmarks across multiple domains (e.g. language, synth-to-real, and documents). both in terms of accuracy and efficiency.

* 8 pages

Via

Access Paper or Ask Questions