for the Alzheimer's Disease Neuroimaging Initiative
Abstract: Transformer models have demonstrated remarkable success in many domains, such as natural language processing (NLP) and computer vision. With growing interest in transformer-based architectures, they are now being used for gesture recognition. We therefore explore and devise a novel ConvMixFormer architecture for dynamic hand gestures. Self-attention in transformers scales quadratically with the length of the sequential input, which makes these models computationally complex and heavy. To address this drawback, we design a resource-efficient model that replaces the self-attention in the transformer with a simple convolutional layer-based token mixer. The computational cost and the number of parameters of the convolution-based mixer are lower than those of quadratic self-attention. The convolution mixer helps the model capture local spatial features that self-attention struggles to capture due to its sequential processing nature. Further, an efficient gate mechanism is employed instead of the conventional feed-forward network in the transformer to help the model control the flow of features within different stages of the proposed model. This design uses nearly half as many learnable parameters as the vanilla transformer, which helps in fast and efficient training. The proposed method is evaluated on the NVidia Dynamic Hand Gesture and Briareo datasets, and our model achieves state-of-the-art results on single and multimodal inputs. We also show the parameter efficiency of the proposed ConvMixFormer model compared to other methods. The source code is available at https://github.com/mallikagarg/ConvMixFormer.
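The scaling argument in this abstract can be illustrated with a rough operation count. The sketch below is not taken from the ConvMixFormer code: the function names, the choice of a dense 1D convolution, and the kernel size of 3 are assumptions made only to show why attention cost grows quadratically with the number of tokens while a convolutional mixer grows linearly.

```python
def self_attention_macs(num_tokens: int, dim: int) -> int:
    """Approximate multiply-accumulates for one single-head self-attention layer."""
    # Four dim x dim projections (query, key, value, output) applied to every token.
    projections = 4 * num_tokens * dim * dim
    # Pairwise token scores: a (num_tokens x dim) by (dim x num_tokens) product.
    scores = num_tokens * num_tokens * dim
    # Weighted sum of values: a (num_tokens x num_tokens) by (num_tokens x dim) product.
    weighted_sum = num_tokens * num_tokens * dim
    return projections + scores + weighted_sum


def conv_mixer_macs(num_tokens: int, dim: int, kernel_size: int = 3) -> int:
    """Approximate multiply-accumulates for a dense 1D convolution over the tokens.

    Assumes `dim` input and output channels and "same" padding, so each of the
    num_tokens positions costs kernel_size * dim * dim operations.
    """
    return num_tokens * kernel_size * dim * dim


if __name__ == "__main__":
    dim = 512  # hypothetical embedding width
    for num_tokens in (40, 80, 160):
        attn = self_attention_macs(num_tokens, dim)
        conv = conv_mixer_macs(num_tokens, dim)
        print(f"tokens={num_tokens:4d}  attention={attn:>12,d}  conv={conv:>12,d}")
```

Under these assumptions, doubling the number of tokens doubles the convolution cost but more than doubles the attention cost, because the two token-by-token products grow with the square of the sequence length.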
Abstract: In this paper, we introduce a novel Multiscale Video Transformer Network (MVTN) for dynamic hand gesture recognition. Multiscale features can capture the variable size, pose, and shape of the hand, which is a challenge in hand gesture recognition. The proposed model incorporates a multiscale feature hierarchy to capture diverse levels of detail and context within hand gestures, which enhances the model's ability. This multiscale hierarchy is obtained by extracting different dimensions of attention in different transformer stages, with the initial stages modeling high-resolution features and the later stages modeling low-resolution features. Our approach also leverages multimodal data, utilizing depth maps, infrared data, and surface normals along with RGB images from the NVGesture and Briareo datasets. Experiments show that the proposed MVTN achieves state-of-the-art results with lower computational complexity and fewer parameters. The source code is available at https://github.com/mallikagarg/MVTN.
Abstract: Transformer models have achieved state-of-the-art results in many applications, such as NLP and classification, but their exploration in gesture recognition is still limited. We therefore propose a novel GestFormer architecture for dynamic hand gesture recognition. The motivation behind this design is to propose a resource-efficient transformer model, since transformers are computationally expensive and very complex. We propose to use a pooling-based token mixer named PoolFormer, since it uses only a pooling layer, which is a non-parametric layer, instead of quadratic attention. The proposed model also leverages the space-invariant features of the wavelet transform, and multiscale features are selected using multi-scale pooling. Further, a gated mechanism helps to focus on fine details of the gesture along with the contextual information. This enhances the performance of the proposed model compared to the traditional transformer, with fewer parameters, when evaluated on two dynamic hand gesture datasets, NVidia Dynamic Hand Gesture and Briareo. To demonstrate the efficacy of the proposed model, we experiment on single as well as multimodal inputs such as infrared, normals, depth, optical flow, and color images. We also compare the proposed GestFormer in terms of resource efficiency and number of operations. The source code is available at https://github.com/mallikagarg/GestFormer.
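The idea of a non-parametric, pooling-based token mixer can be sketched in a few lines. The example below is an illustration only and is not the GestFormer implementation: the function name `pool_token_mixer`, the 1D token layout, and the window size of 3 are assumptions. Each token is replaced by the average of its neighbours, so the mixer has no learnable weights.

```python
from typing import List


def pool_token_mixer(tokens: List[List[float]], window: int = 3) -> List[List[float]]:
    """Mix information across tokens with average pooling (stride 1).

    tokens: a sequence of feature vectors, one per token.
    window: odd number of neighbouring tokens averaged at each position.
    Positions near the ends average only over the tokens that exist,
    so padding never enters the mean.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    count = len(tokens)
    mixed = []
    for i in range(count):
        lo, hi = max(0, i - half), min(count, i + half + 1)
        neighbours = tokens[lo:hi]
        dim = len(tokens[i])
        # Channel-wise mean over the neighbouring tokens.
        mixed.append([sum(t[c] for t in neighbours) / len(neighbours) for c in range(dim)])
    return mixed


if __name__ == "__main__":
    sequence = [[0.0, 1.0], [3.0, 1.0], [6.0, 1.0]]
    print(pool_token_mixer(sequence))
```

Because the operation is a fixed average, its cost grows linearly with the number of tokens and it adds no parameters to the model, which is the property the abstract contrasts with quadratic attention.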
Abstract: Interest in electronic health record (EHR)-based computational models that can accurately predict a patient's risk of sepsis at a given point in time has grown rapidly in the last several years. Like other EHR vendors, Epic Systems Corporation has developed a proprietary sepsis prediction model (ESPM). Epic developed the model with penalized logistic regression, using data from three health systems. Demographic, comorbidity, vital sign, laboratory, medication, and procedural variables contribute to the model. The objective of this project was to compare the predictive performance of the ESPM with a regional health system's current Early Warning Score-based sepsis detection program.
Abstract: Associating genetic markers with a multidimensional phenotype is an important yet challenging problem. In this work, we establish the equivalence between two popular methods: kernel-machine regression (KMR) and kernel distance covariance (KDC). KMR is a semiparametric regression framework that models the covariate effects parametrically, while the genetic markers are modeled nonparametrically. KDC represents a class of methods that includes distance covariance (DC) and the Hilbert-Schmidt Independence Criterion (HSIC), which are nonparametric tests of independence. We show the equivalence between the score test of KMR and the KDC statistic under certain conditions. This result leads to a novel generalization of the KDC test that incorporates the covariates. Our contributions are three-fold: (1) establishing the equivalence between KMR and KDC; (2) showing that the principles of kernel machine regression can be applied to the interpretation of KDC; and (3) developing a broader class of KDC statistics whose members are quantities defined by different kernels. We demonstrate the proposals using simulation studies. Data from the Alzheimer's Disease Neuroimaging Initiative (ADNI) are used to explore the association between the genetic variants on gene \emph{FLJ16124} and phenotypes represented in 3D structural brain MR images, adjusting for age and gender. The results suggest that SNPs of \emph{FLJ16124} exhibit strong pairwise interaction effects that are correlated with the changes in brain region volumes.
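As a concrete instance of the KDC class mentioned above, the following is a minimal sketch of the biased empirical HSIC statistic for scalar samples. It is not the authors' code and does not include their covariate adjustment. The function names, the 1/n^2 normalization (some references use 1/(n-1)^2), and the default Gaussian kernel with bandwidth 1 are assumptions chosen for illustration.

```python
import math
from typing import Callable, List, Sequence

Kernel = Callable[[float, float], float]


def linear_kernel(a: float, b: float) -> float:
    return a * b


def gaussian_kernel(a: float, b: float, bandwidth: float = 1.0) -> float:
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def gram_matrix(values: Sequence[float], kernel: Kernel) -> List[List[float]]:
    """Kernel evaluated on every pair of samples."""
    return [[kernel(a, b) for b in values] for a in values]


def double_center(matrix: List[List[float]]) -> List[List[float]]:
    """Compute H M H for the centering matrix H = I - (1/n) 1 1^T."""
    n = len(matrix)
    row_means = [sum(row) / n for row in matrix]
    col_means = [sum(matrix[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [
        [matrix[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(
    x: Sequence[float],
    y: Sequence[float],
    kernel_x: Kernel = gaussian_kernel,
    kernel_y: Kernel = gaussian_kernel,
) -> float:
    """Biased empirical HSIC: trace(K H L H) / n^2 for Gram matrices K and L."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    k_centered = double_center(gram_matrix(x, kernel_x))
    l_centered = double_center(gram_matrix(y, kernel_y))
    # trace(K H L H) equals the elementwise inner product of the centered matrices.
    total = sum(k_centered[i][j] * l_centered[i][j] for i in range(n) for j in range(n))
    return total / (n * n)


if __name__ == "__main__":
    markers = [0.0, 1.0, 2.0, 1.0, 0.0]   # hypothetical genotype codes
    phenotype = [0.1, 0.9, 2.2, 1.1, 0.0]  # hypothetical phenotype values
    print(hsic(markers, phenotype))
```

With linear kernels on both sides, this statistic reduces to the squared (biased) sample covariance, which gives a quick sanity check. Swapping in other kernels yields other members of the same family of dependence measures.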