Kunal Shah

MAVEN: Multi-modal Attention for Valence-Arousal Emotion Network

Mar 16, 2025

Vrushank Ahire, Kunal Shah, Mudasir Nazir Khan, Nikhil Pakhale, Lownish Rai Sookha, M. A. Ganaie, Abhinav Dhall

Abstract:This paper introduces MAVEN (Multi-modal Attention for Valence-Arousal Emotion Network), a novel architecture for dynamic emotion recognition through dimensional modeling of affect. The model uniquely integrates visual, audio, and textual modalities via a bi-directional cross-modal attention mechanism with six distinct attention pathways, enabling comprehensive interactions between all modality pairs. Our proposed approach employs modality-specific encoders to extract rich feature representations from synchronized video frames, audio segments, and transcripts. The architecture's novelty lies in its cross-modal enhancement strategy, where each modality representation is refined through weighted attention from other modalities, followed by self-attention refinement through modality-specific encoders. Rather than directly predicting valence-arousal values, MAVEN predicts emotions in a polar coordinate form, aligning with psychological models of the emotion circumplex. Experimental evaluation on the Aff-Wild2 dataset demonstrates the effectiveness of our approach, with performance measured using Concordance Correlation Coefficient (CCC). The multi-stage architecture demonstrates superior ability to capture the complex, nuanced nature of emotional expressions in conversational videos, advancing the state-of-the-art (SOTA) in continuous emotion recognition in-the-wild. Code can be found at: https://github.com/Vrushank-Ahire/MAVEN_8th_ABAW.
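The abstract above names two ingredients that a short example can make concrete: the Concordance Correlation Coefficient (CCC) used for evaluation, and a polar-coordinate form of valence-arousal. The following is a minimal sketch, not the authors' code. The function names `ccc`, `to_polar`, and `to_cartesian` are hypothetical, and the polar form is assumed here to be the plain radius/angle of the (valence, arousal) point; the paper may parameterize it differently.

```python
import numpy as np


def ccc(y_true, y_pred):
    """Concordance Correlation Coefficient between two 1-D sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()  # population variances
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    # CCC penalizes both low correlation and shifts in mean or scale.
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


def to_polar(valence, arousal):
    """Assumed mapping: (valence, arousal) -> (radius, angle in radians)."""
    v = np.asarray(valence, dtype=float)
    a = np.asarray(arousal, dtype=float)
    return np.hypot(v, a), np.arctan2(a, v)


def to_cartesian(radius, angle):
    """Inverse of to_polar: (radius, angle) -> (valence, arousal)."""
    r = np.asarray(radius, dtype=float)
    t = np.asarray(angle, dtype=float)
    return r * np.cos(t), r * np.sin(t)
```

Under this assumption, a model would regress radius and angle, convert back with `to_cartesian`, and then be scored with `ccc` separately on valence and on arousal.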

Fusing uncalibrated IMUs and handheld smartphone video to reconstruct knee kinematics

May 27, 2024

J. D. Peiffer, Kunal Shah, Shawana Anarwala, Kayan Abdou, R. James Cotton

Abstract:Video and wearable sensor data provide complementary information about human movement. Video provides a holistic understanding of the entire body in the world while wearable sensors provide high-resolution measurements of specific body segments. A robust method to fuse these modalities and obtain biomechanically accurate kinematics would have substantial utility for clinical assessment and monitoring. While multiple video-sensor fusion methods exist, most assume that a time-intensive, and often brittle, sensor-body calibration process has already been performed. In this work, we present a method to combine handheld smartphone video and uncalibrated wearable sensor data at their full temporal resolution. Our monocular, video-only, biomechanical reconstruction already performs well, with only several degrees of error at the knee during walking compared to markerless motion capture. Reconstructing from a fusion of video and wearable sensor data further reduces this error. We validate this in a mixture of people with no gait impairments, lower limb prosthesis users, and individuals with a history of stroke. We also show that sensor data allows tracking through periods of visual occlusion.

* Accepted to International Conference on Biomedical Robotics and Biomechatronics 2024

Self-Supervised Learning of Gait-Based Biomarkers

Jul 30, 2023

R. James Cotton, J. D. Peiffer, Kunal Shah, Allison DeLillo, Anthony Cimorelli, Shawana Anarwala, Kayan Abdou, Tasos Karakostas

Abstract:Markerless motion capture (MMC) is revolutionizing gait analysis in clinical settings by making it more accessible, raising the question of how to extract the most clinically meaningful information from gait data. In multiple fields ranging from image processing to natural language processing, self-supervised learning (SSL) from large amounts of unannotated data produces very effective representations for downstream tasks. However, there has only been limited use of SSL to learn effective representations of gait and movement, and it has not been applied to gait analysis with MMC. One SSL objective that has not been applied to gait is contrastive learning, which finds representations that place similar samples closer together in the learned space. If the learned similarity metric captures clinically meaningful differences, this could produce a useful representation for many downstream clinical tasks. Contrastive learning can also be combined with causal masking to predict future timesteps, which is an appealing SSL objective given the dynamical nature of gait. We applied these techniques to gait analyses performed with MMC in a rehabilitation hospital from a diverse clinical population. We find that contrastive learning on unannotated gait data learns a representation that captures clinically meaningful information. We probe this learned representation using the framework of biomarkers and show it holds promise as both a diagnostic and response biomarker, by showing it can accurately classify diagnosis from gait and is responsive to inpatient therapy, respectively. We ultimately hope these learned representations will enable predictive and prognostic gait-based biomarkers that can facilitate precision rehabilitation through greater use of MMC to quantify movement in rehabilitation.

* Accepted to Ambient Intelligence for Healthcare workshop at MICCAI 2023
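The contrastive objective described in the abstract above, which places similar samples closer together in the learned space, can be illustrated with a small example. This is a sketch under stated assumptions, not the authors' implementation: the abstract does not say which contrastive loss was used, so the InfoNCE form is swapped in here, and the name `info_nce_loss` and the `temperature` value are hypothetical.

```python
import numpy as np


def info_nce_loss(anchors, positives, temperature=0.1):
    """InfoNCE loss for a batch of (anchor, positive) embedding pairs.

    Row i of `positives` is the positive for row i of `anchors`;
    every other row in the batch serves as a negative.
    """
    anchors = np.asarray(anchors, dtype=float)
    positives = np.asarray(positives, dtype=float)
    # L2-normalize so the dot product is cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    # Subtract the row-wise max for a numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching pair sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))
```

In a gait setting, an anchor and its positive might be two segments of the same walking trial, but that pairing rule is an assumption made for illustration.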

Markerless Motion Capture and Biomechanical Analysis Pipeline

Mar 19, 2023

R. James Cotton, Allison DeLillo, Anthony Cimorelli, Kunal Shah, J. D. Peiffer, Shawana Anarwala, Kayan Abdou, Tasos Karakostas

Abstract:Markerless motion capture using computer vision and human pose estimation (HPE) has the potential to expand access to precise movement analysis. This could greatly benefit rehabilitation by enabling more accurate tracking of outcomes and providing more sensitive tools for research. There are numerous steps between obtaining videos and extracting accurate biomechanical results, and limited research to guide many critical design decisions in these pipelines. In this work, we analyze several of these steps, including the algorithm used to detect keypoints and the keypoint set, the approach to reconstructing trajectories for biomechanical inverse kinematics (IK), and optimizing the IK process. Several features we find important are: 1) using a recent algorithm trained on many datasets that produces a dense set of biomechanically-motivated keypoints, 2) using an implicit representation to reconstruct smooth, anatomically constrained marker trajectories for IK, 3) iteratively optimizing the biomechanical model to match the dense markers, 4) appropriate regularization of the IK process. Our pipeline makes it easy to obtain accurate biomechanical estimates of movement in a rehabilitation hospital.

Improved Trajectory Reconstruction for Markerless Pose Estimation

Mar 08, 2023

R. James Cotton, Anthony Cimorelli, Kunal Shah, Shawana Anarwala, Scott Uhlrich, Tasos Karakostas

Abstract:Markerless pose estimation allows reconstructing human movement from multiple synchronized and calibrated views, and has the potential to make movement analysis easy and quick, including gait analysis. This could enable much more frequent and quantitative characterization of gait impairments, allowing better monitoring of outcomes and responses to interventions. However, the impact of different keypoint detectors and reconstruction algorithms on markerless pose estimation accuracy has not been thoroughly evaluated. We tested these algorithmic choices on data acquired from a multicamera system from a heterogeneous sample of 25 individuals seen in a rehabilitation hospital. We found that using a top-down keypoint detector and reconstructing trajectories with an implicit function enabled accurate, smooth and anatomically plausible trajectories, with a noise in the step width estimates compared to a GaitRite walkway of only 8mm.

Reciprocal Multi-Robot Collision Avoidance with Asymmetric State Uncertainty

Jul 22, 2021

Kunal Shah, Guillermo Angeris, Mac Schwager

Abstract:We present a general decentralized formulation for a large class of collision avoidance methods and show that all collision avoidance methods of this form are guaranteed to be collision free. This class includes several existing algorithms in the literature as special cases. We then present a particular instance of this collision avoidance method, CARP (Collision Avoidance by Reciprocal Projections), that is effective even when the estimates of other agents' positions and velocities are noisy. The method's main computational step involves the solution of a small convex optimization problem, which can be quickly solved in practice, even on embedded platforms, making it practical to use on computationally-constrained robots such as quadrotors. This method can be extended to find smooth polynomial trajectories for higher dynamic systems such as quadrotors. We demonstrate this algorithm's performance in simulations and on a team of physical quadrotors. Our method finds optimal projections in a median time of 17.12 ms for 285 instances of 100 randomly generated obstacles, and produces safe polynomial trajectories at over 60 Hz on-board quadrotors. Our paper is accompanied by an open source Julia implementation and ROS package.

* arXiv admin note: text overlap with arXiv:1905.12875

Fast Reciprocal Collision Avoidance Under Measurement Uncertainty

May 31, 2019

Guillermo Angeris, Kunal Shah, Mac Schwager

Abstract:We present a fully distributed collision avoidance algorithm based on convex optimization for a team of mobile robots. This method addresses the practical case in which agents sense each other via measurements from noisy on-board sensors with no inter-agent communication. Under some mild conditions, we provide guarantees on mutual collision avoidance for a broad class of policies including the one presented. Additionally, we provide numerical examples of computational performance and show that, in both 2D and 3D simulations, all agents avoid each other and reach their desired goals in spite of their uncertainty about the locations of other agents.

Isolated and Ensemble Audio Preprocessing Methods for Detecting Adversarial Examples against Automatic Speech Recognition

Sep 11, 2018

Krishan Rajaratnam, Kunal Shah, Jugal Kalita

Abstract:An adversarial attack is an exploitative process in which minute alterations are made to natural inputs, causing the inputs to be misclassified by neural models. In the field of speech recognition, this has become an issue of increasing significance. Although adversarial attacks were originally introduced in computer vision, they have since infiltrated the realm of speech recognition. In 2017, a genetic attack was shown to be quite potent against the Speech Commands Model. Limited-vocabulary speech classifiers, such as the Speech Commands Model, are used in a variety of applications, particularly in telephony; as such, adversarial examples produced by this attack pose a major security threat. This paper explores various methods of detecting these adversarial examples with combinations of audio preprocessing. One particular combined defense incorporating compressions, speech coding, filtering, and audio panning was shown to be quite effective against the attack on the Speech Commands Model, detecting audio adversarial examples with 93.5% precision and 91.2% recall.

* Accepted for oral presentation at the 30th Conference on Computational Linguistics and Speech Processing (ROCLING 2018)
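The detection result in the abstract above is reported as precision and recall. As a reminder of how those two numbers are computed for a binary detector, here is a minimal sketch. The function name `precision_recall` and the toy labels in the usage note are hypothetical and are not data from the paper.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = adversarial, 0 = benign)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # benign but flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # adversarial but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

For example, with made-up labels `y_true = [1, 1, 1, 0, 0, 0]` and `y_pred = [1, 1, 0, 1, 0, 0]`, two of the three flagged clips are truly adversarial and two of the three adversarial clips are caught, so both precision and recall equal 2/3.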
