Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chaitanya Devaguptapu

Hybrid Graphs for Table-and-Text based Question Answering using LLMs

Jan 29, 2025

Ankush Agarwal, Ganesh S, Chaitanya Devaguptapu

Figure 1 for Hybrid Graphs for Table-and-Text based Question Answering using LLMs

Figure 2 for Hybrid Graphs for Table-and-Text based Question Answering using LLMs

Figure 3 for Hybrid Graphs for Table-and-Text based Question Answering using LLMs

Figure 4 for Hybrid Graphs for Table-and-Text based Question Answering using LLMs

Abstract:Answering questions that require reasoning and aggregation across both structured (tables) and unstructured (raw text) data sources presents significant challenges. Current methods rely on fine-tuning and high-quality, human-curated data, which is difficult to obtain. Recent advances in Large Language Models (LLMs) have shown promising results for multi-hop question answering (QA) over single-source text data in a zero-shot setting, yet exploration into multi-source Table-Text QA remains limited. In this paper, we present a novel Hybrid Graph-based approach for Table-Text QA that leverages LLMs without fine-tuning. Our method constructs a unified Hybrid Graph from textual and tabular data, pruning information based on the input question to provide the LLM with relevant context concisely. We evaluate our approach on the challenging Hybrid-QA and OTT-QA datasets using state-of-the-art LLMs, including GPT-3.5, GPT-4, and LLaMA-3. Our method achieves the best zero-shot performance on both datasets, improving Exact Match scores by up to 10% on Hybrid-QA and 5.4% on OTT-QA. Moreover, our approach reduces token usage by up to 53% compared to the original context.

* Accepted at NAACL 2025 Main Track

Via

Access Paper or Ask Questions

Towards a Training Free Approach for 3D Scene Editing

Dec 17, 2024

Vivek Madhavaram, Shivangana Rawat, Chaitanya Devaguptapu, Charu Sharma, Manohar Kaul

Abstract:Text driven diffusion models have shown remarkable capabilities in editing images. However, when editing 3D scenes, existing works mostly rely on training a NeRF for 3D editing. Recent NeRF editing methods leverages edit operations by deploying 2D diffusion models and project these edits into 3D space. They require strong positional priors alongside text prompt to identify the edit location. These methods are operational on small 3D scenes and are more generalized to particular scene. They require training for each specific edit and cannot be exploited in real-time edits. To address these limitations, we propose a novel method, FreeEdit, to make edits in training free manner using mesh representations as a substitute for NeRF. Training-free methods are now a possibility because of the advances in foundation model's space. We leverage these models to bring a training-free alternative and introduce solutions for insertion, replacement and deletion. We consider insertion, replacement and deletion as basic blocks for performing intricate edits with certain combinations of these operations. Given a text prompt and a 3D scene, our model is capable of identifying what object should be inserted/replaced or deleted and location where edit should be performed. We also introduce a novel algorithm as part of FreeEdit to find the optimal location on grounding object for placement. We evaluate our model by comparing it with baseline models on a wide range of scenes using quantitative and qualitative metrics and showcase the merits of our method with respect to others.

Via

Access Paper or Ask Questions

Semantic Graph Consistency: Going Beyond Patches for Regularizing Self-Supervised Vision Transformers

Jun 18, 2024

Chaitanya Devaguptapu, Sumukh Aithal, Shrinivas Ramasubramanian, Moyuru Yamada, Manohar Kaul

Abstract:Self-supervised learning (SSL) with vision transformers (ViTs) has proven effective for representation learning as demonstrated by the impressive performance on various downstream tasks. Despite these successes, existing ViT-based SSL architectures do not fully exploit the ViT backbone, particularly the patch tokens of the ViT. In this paper, we introduce a novel Semantic Graph Consistency (SGC) module to regularize ViT-based SSL methods and leverage patch tokens effectively. We reconceptualize images as graphs, with image patches as nodes and infuse relational inductive biases by explicit message passing using Graph Neural Networks into the SSL framework. Our SGC loss acts as a regularizer, leveraging the underexploited patch tokens of ViTs to construct a graph and enforcing consistency between graph features across multiple views of an image. Extensive experiments on various datasets including ImageNet, RESISC and Food-101 show that our approach significantly improves the quality of learned representations, resulting in a 5-10\% increase in performance when limited labeled data is used for linear evaluation. These experiments coupled with a comprehensive set of ablations demonstrate the promise of our approach in various settings.

Via

Access Paper or Ask Questions

HOLMES: Hyper-Relational Knowledge Graphs for Multi-hop Question Answering using LLMs

Jun 10, 2024

Pranoy Panda, Ankush Agarwal, Chaitanya Devaguptapu, Manohar Kaul, Prathosh A P

Figure 1 for HOLMES: Hyper-Relational Knowledge Graphs for Multi-hop Question Answering using LLMs

Figure 2 for HOLMES: Hyper-Relational Knowledge Graphs for Multi-hop Question Answering using LLMs

Figure 3 for HOLMES: Hyper-Relational Knowledge Graphs for Multi-hop Question Answering using LLMs

Figure 4 for HOLMES: Hyper-Relational Knowledge Graphs for Multi-hop Question Answering using LLMs

Abstract:Given unstructured text, Large Language Models (LLMs) are adept at answering simple (single-hop) questions. However, as the complexity of the questions increase, the performance of LLMs degrade. We believe this is due to the overhead associated with understanding the complex question followed by filtering and aggregating unstructured information in the raw text. Recent methods try to reduce this burden by integrating structured knowledge triples into the raw text, aiming to provide a structured overview that simplifies information processing. However, this simplistic approach is query-agnostic and the extracted facts are ambiguous as they lack context. To address these drawbacks and to enable LLMs to answer complex (multi-hop) questions with ease, we propose to use a knowledge graph (KG) that is context-aware and is distilled to contain query-relevant information. The use of our compressed distilled KG as input to the LLM results in our method utilizing up to $67\%$ fewer tokens to represent the query relevant information present in the supporting documents, compared to the state-of-the-art (SoTA) method. Our experiments show consistent improvements over the SoTA across several metrics (EM, F1, BERTScore, and Human Eval) on two popular benchmark datasets (HotpotQA and MuSiQue).

* Accepted at ACL 2024 in the main track

Via

Access Paper or Ask Questions

Synergizing Contrastive Learning and Optimal Transport for 3D Point Cloud Domain Adaptation

Aug 27, 2023

Siddharth Katageri, Arkadipta De, Chaitanya Devaguptapu, VSSV Prasad, Charu Sharma, Manohar Kaul

Figure 1 for Synergizing Contrastive Learning and Optimal Transport for 3D Point Cloud Domain Adaptation

Figure 2 for Synergizing Contrastive Learning and Optimal Transport for 3D Point Cloud Domain Adaptation

Figure 3 for Synergizing Contrastive Learning and Optimal Transport for 3D Point Cloud Domain Adaptation

Figure 4 for Synergizing Contrastive Learning and Optimal Transport for 3D Point Cloud Domain Adaptation

Abstract:Recently, the fundamental problem of unsupervised domain adaptation (UDA) on 3D point clouds has been motivated by a wide variety of applications in robotics, virtual reality, and scene understanding, to name a few. The point cloud data acquisition procedures manifest themselves as significant domain discrepancies and geometric variations among both similar and dissimilar classes. The standard domain adaptation methods developed for images do not directly translate to point cloud data because of their complex geometric nature. To address this challenge, we leverage the idea of multimodality and alignment between distributions. We propose a new UDA architecture for point cloud classification that benefits from multimodal contrastive learning to get better class separation in both domains individually. Further, the use of optimal transport (OT) aims at learning source and target data distributions jointly to reduce the cross-domain shift and provide a better alignment. We conduct a comprehensive empirical study on PointDA-10 and GraspNetPC-10 and show that our method achieves state-of-the-art performance on GraspNetPC-10 (with approx 4-12% margin) and best average performance on PointDA-10. Our ablation studies and decision boundary analysis also validate the significance of our contrastive learning module and OT alignment.

Via

Access Paper or Ask Questions

$Δ$-Networks for Efficient Model Patching

Mar 26, 2023

Chaitanya Devaguptapu, Samarth Sinha, K J Joseph, Vineeth N Balasubramanian, Animesh Garg

Abstract:Models pre-trained on large-scale datasets are often finetuned to support newer tasks and datasets that arrive over time. This process necessitates storing copies of the model over time for each task that the pre-trained model is finetuned to. Building on top of recent model patching work, we propose $\Delta$-Patching for finetuning neural network models in an efficient manner, without the need to store model copies. We propose a simple and lightweight method called $\Delta$-Networks to achieve this objective. Our comprehensive experiments across setting and architecture variants show that $\Delta$-Networks outperform earlier model patching work while only requiring a fraction of parameters to be trained. We also show that this approach can be used for other problem settings such as transfer learning and zero-shot domain adaptation, as well as other tasks such as detection and segmentation.

Via

Access Paper or Ask Questions

Learning Modular Structures That Generalize Out-of-Distribution

Aug 07, 2022

Arjun Ashok, Chaitanya Devaguptapu, Vineeth Balasubramanian

Figure 1 for Learning Modular Structures That Generalize Out-of-Distribution

Abstract:Out-of-distribution (O.O.D.) generalization remains to be a key challenge for real-world machine learning systems. We describe a method for O.O.D. generalization that, through training, encourages models to only preserve features in the network that are well reused across multiple training domains. Our method combines two complementary neuron-level regularizers with a probabilistic differentiable binary mask over the network, to extract a modular sub-network that achieves better O.O.D. performance than the original network. Preliminary evaluation on two benchmark datasets corroborates the promise of our method.

* Accepted at AAAI 2022 Student Abstract and Poster Program

Via

Access Paper or Ask Questions

On Initial Pools for Deep Active Learning

Nov 30, 2020

Akshay L Chandra, Sai Vikas Desai, Chaitanya Devaguptapu, Vineeth N Balasubramanian

Figure 1 for On Initial Pools for Deep Active Learning

Abstract:Active Learning (AL) techniques aim to minimize the training data required to train a model for a given task. Pool-based AL techniques start with a small initial labeled pool and then iteratively pick batches of the most informative samples for labeling. Generally, the initial pool is sampled randomly and labeled to seed the AL iterations. While recent` studies have focused on evaluating the robustness of various query functions in AL, little to no attention has been given to the design of the initial labeled pool. Given the recent successes of learning representations in self-supervised/unsupervised ways, we propose to study if an intelligently sampled initial labeled pool can improve deep AL performance. We will investigate the effect of intelligently sampled initial labeled pools, including the use of self-supervised and unsupervised strategies, on deep AL methods. We describe our experimental details, implementation details, datasets, performance metrics as well as planned ablation studies in this proposal. If intelligently sampled initial pools improve AL performance, our work could make a positive contribution to boosting AL performance with no additional annotation, developing datasets with lesser annotation cost in general, and promoting further research in the use of unsupervised learning methods for AL.

* Accepted at Pre-registration Workshop NeurIPS (2020) - http://preregister.science/

Via

Access Paper or Ask Questions

An Empirical Study on the Robustness of NAS based Architectures

Jul 16, 2020

Chaitanya Devaguptapu, Devansh Agarwal, Gaurav Mittal, Vineeth N Balasubramanian

Figure 1 for An Empirical Study on the Robustness of NAS based Architectures

Figure 2 for An Empirical Study on the Robustness of NAS based Architectures

Figure 3 for An Empirical Study on the Robustness of NAS based Architectures

Figure 4 for An Empirical Study on the Robustness of NAS based Architectures

Abstract:Most existing methods for Neural Architecture Search (NAS) focus on achieving state-of-the-art (SOTA) performance on standard datasets and do not explicitly search for adversarially robust models. In this work, we study the adversarial robustness of existing NAS architectures, comparing it with state-of-the-art handcrafted architectures, and provide reasons for why it is essential. We draw some key conclusions on the capacity of current NAS methods to tackle adversarial attacks through experiments on datasets of different sizes.

Via

Access Paper or Ask Questions

Borrow from Anywhere: Pseudo Multi-modal Object Detection in Thermal Imagery

May 21, 2019

Chaitanya Devaguptapu, Ninad Akolekar, Manuj M Sharma, Vineeth N Balasubramanian

Figure 1 for Borrow from Anywhere: Pseudo Multi-modal Object Detection in Thermal Imagery

Figure 2 for Borrow from Anywhere: Pseudo Multi-modal Object Detection in Thermal Imagery

Figure 3 for Borrow from Anywhere: Pseudo Multi-modal Object Detection in Thermal Imagery

Figure 4 for Borrow from Anywhere: Pseudo Multi-modal Object Detection in Thermal Imagery

Abstract:Can we improve detection in the thermal domain by borrowing features from rich domains like visual RGB? In this paper, we propose a pseudo-multimodal object detector trained on natural image domain data to help improve the performance of object detection in thermal images. We assume access to a large-scale dataset in the visual RGB domain and relatively smaller dataset (in terms of instances) in the thermal domain, as is common today. We propose the use of well-known image-to-image translation frameworks to generate pseudo-RGB equivalents of a given thermal image and then use a multi-modal architecture for object detection in the thermal image. We show that our framework outperforms existing benchmarks without the explicit need for paired training examples from the two domains. We also show that our framework has the ability to learn with less data from thermal domain when using our approach.

* Accepted at Perception Beyond Visible Spectrum Workshop (CVPR 2019)

Via

Access Paper or Ask Questions