Ayush Gupta

Polysemantic Dropout: Conformal OOD Detection for Specialized LLMs

Sep 04, 2025

Ayush Gupta, Ramneet Kaur, Anirban Roy, Adam D. Cobb, Rama Chellappa, Susmit Jha

Abstract: We propose a novel inference-time out-of-domain (OOD) detection algorithm for specialized large language models (LLMs). Despite achieving state-of-the-art performance on in-domain tasks through fine-tuning, specialized LLMs remain vulnerable to incorrect or unreliable outputs when presented with OOD inputs, posing risks in critical applications. Our method leverages the Inductive Conformal Anomaly Detection (ICAD) framework, using a new non-conformity measure based on the model's dropout tolerance. Motivated by recent findings on polysemanticity and redundancy in LLMs, we hypothesize that in-domain inputs exhibit higher dropout tolerance than OOD inputs. We aggregate dropout tolerance across multiple layers via a valid ensemble approach, improving detection while maintaining theoretical false alarm bounds from ICAD. Experiments with medical-specialized LLMs show that our approach detects OOD inputs better than baseline methods, with AUROC improvements of $2\%$ to $37\%$ when treating OOD datapoints as positives and in-domain test datapoints as negatives.

* Accepted to EMNLP 2025 main conference
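
The ICAD decision rule and the AUROC convention mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each input has already been reduced to one scalar non-conformity score (higher means less conforming). How the paper derives that score from dropout tolerance, and how it combines layers, is not specified here, so the function names and the `epsilon` default are hypothetical and this is not the authors' implementation.

```python
import numpy as np


def icad_p_value(calibration_scores, test_score):
    """Conformal p-value: share of calibration scores at least as
    non-conforming as the test score, with the usual +1 smoothing."""
    cal = np.asarray(calibration_scores, dtype=float)
    return (np.sum(cal >= test_score) + 1) / (len(cal) + 1)


def is_ood(calibration_scores, test_score, epsilon=0.05):
    """Flag the input as OOD when its p-value falls below epsilon.
    Under exchangeability, epsilon bounds the false alarm rate."""
    return icad_p_value(calibration_scores, test_score) < epsilon


def auroc(ood_scores, in_domain_scores):
    """AUROC with OOD points as positives: probability that a random OOD
    score exceeds a random in-domain score (ties count as one half)."""
    pos = np.asarray(ood_scores, dtype=float)[:, None]
    neg = np.asarray(in_domain_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))
```

The calibration scores come from held-out in-domain data, so a test input is flagged only when few calibration points look stranger than it does.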

TOGA: Temporally Grounded Open-Ended Video QA with Weak Supervision

Jun 11, 2025

Ayush Gupta, Anirban Roy, Rama Chellappa, Nathaniel D. Bastian, Alvaro Velasquez, Susmit Jha

Abstract: We address the problem of video question answering (video QA) with temporal grounding in a weakly supervised setup, without any temporal annotations. Given a video and a question, we generate an open-ended answer grounded with the start and end time. For this task, we propose TOGA: a vision-language model for Temporally Grounded Open-Ended Video QA with Weak Supervision. We instruct-tune TOGA to jointly generate the answer and the temporal grounding. We operate in a weakly supervised setup where the temporal grounding annotations are not available. We generate pseudo labels for temporal grounding and ensure the validity of these labels by imposing a consistency constraint between the question of a grounding response and the response generated by a question referring to the same temporal segment. We notice that jointly generating the answers with the grounding improves performance on question answering as well as grounding. We evaluate TOGA on grounded QA and open-ended QA tasks. For grounded QA, we consider the NExT-GQA benchmark which is designed to evaluate weakly supervised grounded question answering. For open-ended QA, we consider the MSVD-QA and ActivityNet-QA benchmarks. We achieve state-of-the-art performance for both tasks on these benchmarks.

MimicGait: A Model Agnostic approach for Occluded Gait Recognition using Correlational Knowledge Distillation

Jan 26, 2025

Ayush Gupta, Rama Chellappa

Abstract: Gait recognition is an important biometric technique over large distances. State-of-the-art gait recognition systems perform very well in controlled environments at close range. Recently, there has been an increased interest in gait recognition in the wild prompted by the collection of outdoor, more challenging datasets containing variations in terms of illumination, pitch angles, and distances. An important problem in these environments is that of occlusion, where the subject is partially blocked from camera view. While important, this problem has received little attention. Thus, we propose MimicGait, a model-agnostic approach for gait recognition in the presence of occlusions. We train the network using a multi-instance correlational distillation loss to capture both inter-sequence and intra-sequence correlations in the occluded gait patterns of a subject, utilizing an auxiliary Visibility Estimation Network to guide the training of the proposed mimic network. We demonstrate the effectiveness of our approach on challenging real-world datasets like GREW, Gait3D and BRIAR. We release the code at https://github.com/Ayush-00/mimicgait.

* Accepted to WACV 2025 as Poster

Question Answering on Patient Medical Records with Private Fine-Tuned LLMs

Jan 23, 2025

Sara Kothari, Ayush Gupta

Abstract: Healthcare systems continuously generate vast amounts of electronic health records (EHRs), commonly stored in the Fast Healthcare Interoperability Resources (FHIR) standard. Despite the wealth of information in these records, their complexity and volume make it difficult for users to retrieve and interpret crucial health insights. Recent advances in Large Language Models (LLMs) offer a solution, enabling semantic question answering (QA) over medical data and allowing users to interact with their health records more effectively. However, ensuring privacy and compliance requires edge and private deployments of LLMs. This paper proposes a novel approach to semantic QA over EHRs by first identifying the most relevant FHIR resources for a user query (Task1) and subsequently answering the query based on these resources (Task2). We explore the performance of privately hosted, fine-tuned LLMs, evaluating them against benchmark models such as GPT-4 and GPT-4o. Our results demonstrate that fine-tuned LLMs, while 250x smaller in size, outperform GPT-4 family models by 0.55% in F1 score on Task1 and by 42% in METEOR score on Task2. Additionally, we examine advanced aspects of LLM usage, including sequential fine-tuning, model self-evaluation (narcissistic evaluation), and the impact of training data size on performance. The models and datasets are available at https://huggingface.co/genloop

You Can Run but not Hide: Improving Gait Recognition with Intrinsic Occlusion Type Awareness

Dec 04, 2023

Ayush Gupta, Rama Chellappa

Abstract: While gait recognition has seen many advances in recent years, the occlusion problem has largely been ignored. This problem is especially important for gait recognition from uncontrolled outdoor sequences at range, since any small obstruction can affect the recognition system. Most current methods assume the availability of complete body information while extracting the gait features. When parts of the body are occluded, these methods may hallucinate and output a corrupted gait signature as they try to look for body parts which are not present in the input at all. To address this, we exploit the learned occlusion type while extracting identity features from videos. Thus, in this work, we propose an occlusion-aware gait recognition method which can be used to build intrinsic occlusion awareness into potentially any state-of-the-art gait recognition method. Our experiments on the challenging GREW and BRIAR datasets show that networks enhanced with this occlusion awareness perform better at recognition tasks than their counterparts trained on similar occlusions.

* This work has been accepted to WACV 2024 as an Oral paper

DroneARchery: Human-Drone Interaction through Augmented Reality with Haptic Feedback and Multi-UAV Collision Avoidance Driven by Deep Reinforcement Learning

Oct 14, 2022

Ekaterina Dorzhieva, Ahmed Baza, Ayush Gupta, Aleksey Fedoseev, Miguel Altamirano Cabrera, Ekaterina Karmanova, Dzmitry Tsetserukou

Abstract: We propose a novel concept of augmented reality (AR) human-drone interaction driven by RL-based swarm behavior to achieve intuitive and immersive control of a swarm formation of unmanned aerial vehicles. The DroneARchery system allows the user to quickly deploy a swarm of drones, generating flight paths simulating archery. The haptic interface LinkGlide delivers a tactile stimulus of the bowstring tension to the forearm to increase the precision of aiming. The swarm of released drones dynamically avoids collisions between each other, the drone following the user, and external obstacles with behavior control based on deep reinforcement learning. The developed concept was tested in a scenario with a human, where the user shoots from a virtual bow with a real drone to hit the target. The human operator observes the ballistic trajectory of the drone in AR and achieves a realistic and highly recognizable experience of the bowstring tension through the haptic display. The experimental results revealed that the system improves trajectory prediction accuracy by 63.3% through applying AR technology and conveying haptic feedback of pulling force. DroneARchery users highlighted the naturalness (4.3 on a 5-point Likert scale) and increased confidence (4.7 out of 5) when controlling the drone. We have designed the tactile patterns to present four sliding distances (tension) and three applied force levels (stiffness) of the haptic display. Users demonstrated the ability to distinguish tactile patterns produced by the haptic display representing varying bowstring tension (average recognition rate of 72.8%) and stiffness (average recognition rate of 94.2%). The novelty of the research is the development of an AR-based approach for drone control that does not require special skills and training from the operator.

* Accepted to the IEEE Int. Symp. on Mixed and Augmented Reality (ISMAR 2022). Copyright 20XX IEEE. Personal use of this material is permitted

SwarMan: Anthropomorphic Swarm of Drones Avatar with Body Tracking and Deep Learning-Based Gesture Recognition

Oct 04, 2022

Ahmed Baza, Ayush Gupta, Ekaterina Dorzhieva, Aleksey Fedoseev, Dzmitry Tsetserukou

Abstract: Anthropomorphic robot avatars present a conceptually novel approach to remote affective communication, allowing people across the world a wider spectrum of emotional and social exchanges than traditional 2D and 3D image data. However, there are several limitations of current telepresence robots, such as the high weight, complexity of the system that prevents its fast deployment, and the limited workspace of the avatars mounted on either static or wheeled mobile platforms. In this paper, we present a novel concept of telecommunication through a robot avatar based on an anthropomorphic swarm of drones, SwarMan. The developed system consists of nine nanocopters controlled remotely by the operator through a gesture recognition interface. SwarMan allows operators to communicate by directly following their motions and by recognizing one of the prerecorded emotional patterns, thus rendering the captured emotion as illumination on the drones. The LSTM MediaPipe network was trained on a collected dataset of 600 short videos with five emotional gestures. The achieved emotion recognition accuracy was 97% on the test dataset. As communication through the swarm avatar significantly changes the visual appearance of the operator, we investigated the ability of the users to recognize and respond to emotions performed by the swarm of drones. The experimental results revealed a high consistency between the users in rating emotions. Additionally, users indicated low physical demand (2.25 on the Likert scale) and were satisfied with their performance (1.38 on the Likert scale) when communicating by the SwarMan interface.

* 6 pages, 8 figures, IEEE SMC 2022 conference

DandelionTouch: High Fidelity Haptic Rendering of Soft Objects in VR by a Swarm of Drones

Sep 22, 2022

Aleksey Fedoseev, Ahmed Baza, Ayush Gupta, Ekaterina Dorzhieva, Riya Neelesh Gujarathi, Dzmitry Tsetserukou

Abstract: To achieve high-fidelity haptic rendering of soft objects in a high-mobility virtual environment, we propose a novel haptic display, DandelionTouch. The tactile actuators are delivered to the fingertips of the user by a swarm of drones. Users of DandelionTouch are capable of experiencing tactile feedback in a large space that is not limited by the device's working area. Importantly, they will not experience muscle fatigue during long interactions with virtual objects. Hand tracking and a swarm control algorithm allow the swarm to be guided with hand motions and to avoid collisions inside the formation. Several topologies of the impedance connection between swarm units were investigated in this research. The experiment, in which drones performed a point-following task on a square trajectory in real time, revealed that drones connected in a Star topology performed the trajectory with low mean positional error (RMSE decreased by 20.6% in comparison with other impedance topologies and by 40.9% in comparison with potential field-based swarm control). The achieved velocities of the drones in all formations with impedance behavior were 28% higher than for the swarm controlled with the potential field algorithm. Additionally, the perception of several vibrotactile patterns was evaluated in a user study with 7 participants. The study has shown that the proposed combination of temporal delay and frequency modulation allows users to successfully recognize the surface property and motion direction in VR simultaneously (mean recognition rate of 70%, maximum of 93%). DandelionTouch suggests a new type of haptic feedback in VR systems where no hand-held or wearable interface is required.

* Accepted to the 2022 IEEE International Conference on Systems, Man, and Cybernetics (SMC). Copyright 20XX IEEE. Personal use of this material is permitted

SwarmHawk: Self-Sustaining Multi-Agent System for Landing on a Moving Platform through an Agent Supervision

Jun 17, 2022

Ayush Gupta, Ekaterina Dorzhieva, Ahmed Baza, Mert Alper, Aleksey Fedoseev, Dzmitry Tsetserukou

Abstract: Heterogeneous teams of mobile robots and UAVs offer a substantial benefit in autonomous exploration of the environment. Nevertheless, although joint exploration scenarios for such systems are widely discussed, they still suffer from low adaptability to changes in external conditions and faults of swarm agents during the UAV docking. We propose a novel vision-based drone swarm docking system for robust landing on a moving platform when one of the agents loses its position signal. The proposed SwarmHawk system relies on vision-based detection for the mobile platform tracking and navigation of its agents. Each drone of the swarm carries an RGB camera and AprilTag3 QR-code marker on board. SwarmHawk can switch between two modes of operation, acting as a homogeneous swarm in case of global UAV localization or assigning leader drones to navigate its neighbors in case of a camera fault in one of the drones or global localization failure. Two experiments were performed to evaluate SwarmHawk's performance under the global and local localization with static and moving platforms. The experimental results revealed sufficient accuracy in the swarm landing task on a static mobile platform (error of 4.2 cm in homogeneous formation and 1.9 cm in leader-follower formation) and on a moving platform (error of 6.9 cm in homogeneous formation and 4.7 cm in leader-follower formation). Moreover, the drones showed good landing performance on a platform moving along a complex trajectory (average error of 19.4 cm) in leader-follower formation. The proposed SwarmHawk technology can potentially be applied in various swarm scenarios, including complex environment exploration, inspection, and drone delivery.

* Accepted paper at IEEE International Conference on Unmanned Aircraft System (ICUAS 2022), IEEE copyright

SwarmHive: Heterogeneous Swarm of Drones for Robust Autonomous Landing on Moving Robot

Jun 17, 2022

Ayush Gupta, Ahmed Baza, Ekaterina Dorzhieva, Mert Alper, Mariia Makarova, Stepan Perminov, Aleksey Fedoseev, Dzmitry Tsetserukou

Abstract: The paper focuses on a heterogeneous swarm of drones to achieve a dynamic landing of formation on a moving robot. This challenging task has not yet been achieved. The key idea is that instead of equipping each agent of the swarm of drones with computer vision, which considerably increases the payload and shortens the flight time, we propose to install only one camera on the leader drone. The follower drones receive the commands from the leader UAV and maintain a collision-free trajectory with the artificial potential field. The experimental results revealed a high accuracy of the swarm landing on a static mobile platform (RMSE of 4.48 cm). RMSE of swarm landing on the mobile platform moving with the maximum velocities of 1.0 m/s and 1.5 m/s equals 8.76 cm and 8.98 cm, respectively. The proposed SwarmHive technology will allow the time-saving landing of the swarm for further drone recharging. This will make it possible to achieve self-sustainable operation of a multi-agent robotic system for such scenarios as rescue operations, inspection and maintenance, autonomous warehouse inventory, cargo delivery, etc.

* Accepted paper at IEEE Vehicular Technology Conference 2022 (IEEE VTC 2022), IEEE copyright
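
The artificial potential field that the SwarmHive abstract names for follower collision avoidance can be sketched in its textbook form: an attractive pull toward a target plus a repulsive push from anything inside an influence radius. The abstract does not give the paper's formulation, so the function name, the gains `k_att` and `k_rep`, and `influence_radius` are illustrative assumptions only.

```python
import numpy as np


def potential_field_force(position, goal, obstacles,
                          k_att=1.0, k_rep=0.5, influence_radius=1.0):
    """Classic artificial potential field: linear attraction to the goal
    plus repulsion from obstacles closer than influence_radius."""
    position = np.asarray(position, dtype=float)
    goal = np.asarray(goal, dtype=float)

    # Attractive term: pulls the agent straight toward its goal.
    force = k_att * (goal - position)

    for obstacle in obstacles:
        offset = position - np.asarray(obstacle, dtype=float)
        distance = np.linalg.norm(offset)
        # Repulsion acts only inside the influence radius and grows
        # sharply as the agent approaches the obstacle.
        if 0.0 < distance < influence_radius:
            magnitude = (k_rep * (1.0 / distance - 1.0 / influence_radius)
                         / distance ** 2)
            force += magnitude * offset / distance

    return force
```

In a leader-follower swarm, each follower could treat its commanded formation slot as `goal` and its neighbours as `obstacles`. Potential fields can stall in local minima, where attraction and repulsion cancel.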
