Abstract: Scaling deep learning models has been at the heart of recent revolutions in language modelling and image generation. Practitioners have observed a strong relationship between model size, dataset size, and performance. However, structure-based architectures such as Graph Neural Networks (GNNs) are yet to show the benefits of scale, mainly due to the lower efficiency of sparse operations, large data requirements, and a lack of clarity about the effectiveness of various architectures. We address this drawback of GNNs by studying their scaling behavior. Specifically, we analyze message-passing networks, graph Transformers, and hybrid architectures on the largest public collection of 2D molecular graphs. For the first time, we observe that GNNs benefit tremendously from increasing depth, width, number of molecules, number of labels, and diversity of the pretraining datasets, resulting in a 30.25% improvement when scaling to 1 billion parameters and a 28.98% improvement when increasing the dataset size eightfold. We further demonstrate strong finetuning scaling behavior on 38 tasks, outclassing previous large models. We hope that our work paves the way for an era where foundational GNNs drive pharmaceutical drug discovery.
Abstract: Multi-Agent Reinforcement Learning (MARL) has demonstrated significant success in training decentralised policies in a centralised manner by making use of value factorization methods. However, addressing surprise across spurious states and approximation bias remain open problems for multi-agent settings. We introduce the Energy-based MIXer (EMIX), an algorithm which minimizes surprise utilizing the energy across agents. Our contributions are threefold: (1) EMIX introduces a novel surprise minimization technique across multiple agents in multi-agent partially-observable settings; (2) EMIX highlights the first practical use of energy functions in MARL (to our knowledge), with theoretical guarantees and experimental validation of the energy operator; and (3) EMIX presents a novel technique for addressing overestimation bias across agents in MARL. When evaluated on a range of challenging StarCraft II micromanagement scenarios, EMIX demonstrates consistent state-of-the-art performance for multi-agent surprise minimization. Moreover, our ablation study highlights the necessity of the energy-based scheme and the need for elimination of overestimation bias in MARL. Our implementation of EMIX and videos of agents are available at https://karush17.github.io/emix-web/.
Abstract: Advances in Reinforcement Learning (RL) have successfully tackled sample efficiency and overestimation bias. However, these methods often fall short of scalable performance. On the other hand, genetic methods provide scalability but exhibit hyperparameter sensitivity to evolutionary operations. We present the Evolution-based Soft Actor-Critic (ESAC), a scalable RL algorithm. Our contributions are threefold: ESAC (1) abstracts exploration from exploitation by combining Evolution Strategies (ES) with Soft Actor-Critic (SAC), (2) provides dominant skill transfer between offspring by making use of soft winner selections and genetic crossovers in hindsight, and (3) reduces hyperparameter sensitivity in evolutions using Automatic Mutation Tuning (AMT). AMT gradually replaces the entropy framework of SAC, allowing the population to succeed at the task while acting as randomly as possible, without making use of backpropagation updates. On a range of challenging control tasks consisting of high-dimensional action spaces and sparse rewards, ESAC demonstrates state-of-the-art performance and sample efficiency equivalent to SAC. ESAC demonstrates scalability comparable to ES on the basis of hardware resources and algorithm overhead. A complete implementation of ESAC with notes on reproducibility and videos can be found at the project website https://karush17.github.io/esac-web/.
Abstract: Sign language is used by the deaf community all over the world. The work presented here proposes a novel one-dimensional deep capsule network (CapsNet) architecture for continuous Indian Sign Language recognition by means of signals obtained from a custom-designed wearable IMU system. The performance of the proposed CapsNet architecture is assessed by altering dynamic routing between capsule layers. The proposed CapsNet yields improved accuracy values of 94% for 3 routings and 92.50% for 5 routings in comparison with the convolutional neural network (CNN), which yields an accuracy of 87.99%. Improved learning of the proposed architecture is also validated by spatial activations depicting excited units at the predictive layer. Finally, a novel non-cooperative pick-and-predict competition is designed between CapsNet and CNN. A higher value of Nash equilibrium for CapsNet than for CNN indicates the suitability of the proposed approach.
Abstract: Recent advancements in diagnostic learning and the development of gesture-based human-machine interfaces have made surface electromyography (sEMG) significantly more important. Analysis of hand gestures requires an accurate assessment of sEMG signals. The proposed work presents a novel sequential master-slave architecture consisting of deep neural networks (DNNs) for classification of signs from the Indian sign language using signals recorded from multiple sEMG channels. The performance of the master-slave network is augmented by leveraging additional synthetic feature data generated by long short-term memory networks. Performance of the proposed network is compared to that of a conventional DNN before and after the addition of synthetic data. On addition of synthetic data, up to 14% improvement is observed in the conventional DNN and up to 9% improvement in the master-slave network, with an average accuracy value of 93.5%, asserting the suitability of the proposed approach.
Abstract: Surface electromyography (sEMG) has gained significant importance with recent advancements in consumer electronics for healthcare systems, gesture analysis and recognition, and sign language communication. For such systems, it is imperative to determine the regions of activity in a continuously recorded sEMG signal. The proposed work provides a novel activity detection approach based on Hidden Markov Models (HMMs) using sEMG signals recorded while various hand gestures are performed. The detection procedure takes a probabilistic view by making use of mathematical models. The requirement of a threshold for activity detection is obviated, making the approach subject- and activity-independent. Correctness of the predicted outputs is verified by classifying the signal segments around the detected transition regions as activity or rest. The classified outputs are compared with the transition regions in the stimulus given to the subject to perform the activity. Activity onsets are detected with an average accuracy of 96.25% and activity termination regions with an average accuracy of 87.5% for the considered set of six activities and four subjects.
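As an illustration of the general idea of HMM-based activity detection, the following is a minimal sketch and not the authors' model. It assumes a two-state HMM (rest, activity) with Gaussian emissions over a rectified signal envelope and decodes the most likely state sequence with the Viterbi algorithm. The function names, the emission parameters `means` and `stds`, and the `stay_prob` value are all hypothetical and hand-picked here; in practice they would typically be estimated from data.

```python
import math

def gaussian_logpdf(x, mean, std):
    """Log density of a univariate Gaussian."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)

def viterbi_rest_activity(envelope, means=(0.1, 1.0), stds=(0.1, 0.4), stay_prob=0.9):
    """Most likely state path (0 = rest, 1 = activity) for a signal envelope.

    All model parameters are illustrative assumptions, not values from the paper.
    """
    log_stay = math.log(stay_prob)
    log_switch = math.log(1.0 - stay_prob)
    # Uniform initial state distribution.
    score = [math.log(0.5) + gaussian_logpdf(envelope[0], means[s], stds[s]) for s in range(2)]
    backpointers = []
    for x in envelope[1:]:
        new_score, pointers = [], []
        for s in range(2):
            candidates = [score[p] + (log_stay if p == s else log_switch) for p in range(2)]
            best_prev = 0 if candidates[0] >= candidates[1] else 1
            pointers.append(best_prev)
            new_score.append(candidates[best_prev] + gaussian_logpdf(x, means[s], stds[s]))
        score = new_score
        backpointers.append(pointers)
    # Backtrack from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def transitions(path):
    """Indices where the decoded path enters (onsets) or leaves (offsets) activity."""
    onsets = [i for i in range(1, len(path)) if path[i - 1] == 0 and path[i] == 1]
    offsets = [i for i in range(1, len(path)) if path[i - 1] == 1 and path[i] == 0]
    return onsets, offsets

# Toy envelope: rest, a burst of activity, then rest again.
toy_envelope = [0.05, 0.1, 0.08, 0.9, 1.1, 1.0, 0.95, 0.1, 0.05, 0.07]
toy_path = viterbi_rest_activity(toy_envelope)
print(transitions(toy_path))  # ([3], [7])
```

In this sketch the onset and termination points fall out of the decoded state sequence, so no amplitude threshold is applied to the signal directly; the transition probability discourages spurious single-sample switches.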
Abstract: Surface electromyography (sEMG) is becoming exceedingly useful in applications involving analysis of human motion, such as human-machine interfaces, assistive technology, healthcare and prosthetic development. The proposed work presents a novel dual stage approach for classification of grasping gestures from sEMG signals. A statistical assessment of these activities is presented to determine the similar characteristics between the considered activities. Similar activities are grouped together. In the first stage of classification, an activity is identified as belonging to a group; in the second stage, it is further classified as one of the activities within that group. The performance of the proposed approach is compared to the conventional single stage classification approach in terms of classification accuracies. The classification accuracies obtained using the proposed dual stage classification are significantly higher than those for single stage classification.
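The two-stage procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' classifier: it uses a nearest-centroid rule at both stages, the grouping of activities is supplied by hand rather than derived from a statistical assessment, and the class name `DualStageClassifier`, the grasp labels, and the toy feature values are all hypothetical.

```python
import math

def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def nearest(centroids, x):
    """Key of the centroid closest to x in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], x))

class DualStageClassifier:
    """Stage 1 picks a group of similar activities; stage 2 picks an activity in it."""

    def __init__(self, groups):
        # groups: group name -> list of activity labels (assumed to be given).
        self.groups = groups
        self.activity_to_group = {a: g for g, acts in groups.items() for a in acts}

    def fit(self, features, labels):
        by_activity = {}
        for x, y in zip(features, labels):
            by_activity.setdefault(y, []).append(x)
        by_group = {}
        for y, rows in by_activity.items():
            by_group.setdefault(self.activity_to_group[y], []).extend(rows)
        self.group_centroids = {g: centroid(rows) for g, rows in by_group.items()}
        self.activity_centroids = {
            g: {a: centroid(by_activity[a]) for a in acts if a in by_activity}
            for g, acts in self.groups.items()
        }
        return self

    def predict(self, x):
        group = nearest(self.group_centroids, x)              # first stage
        activity = nearest(self.activity_centroids[group], x)  # second stage
        return group, activity

# Hypothetical toy data: two groups of two grasps each, 2-D features.
toy_groups = {"power": ["cylindrical", "spherical"], "precision": ["pinch", "tripod"]}
toy_features = [[1.0, 1.0], [1.2, 0.9], [1.0, 2.0], [0.9, 2.1],
                [5.0, 1.0], [5.1, 1.1], [5.0, 2.0], [4.9, 2.2]]
toy_labels = ["cylindrical", "cylindrical", "spherical", "spherical",
              "pinch", "pinch", "tripod", "tripod"]
clf = DualStageClassifier(toy_groups).fit(toy_features, toy_labels)
print(clf.predict([1.1, 1.9]))  # ('power', 'spherical')
```

The second-stage classifier only has to separate activities within one group, which is the intuition behind comparing this scheme against a single classifier over all activities.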
Abstract: Inertial Measurement Units (IMUs) are gaining significant importance in the field of hand gesture analysis, trajectory detection and kinematic functional study. An IMU consists of tri-axial accelerometers and gyroscopes which can together be used for formation analysis. The paper presents a novel classification approach using a Deep Neural Network (DNN) for classifying hand gestures obtained from wearable IMU sensors. An optimization objective is set for the classifier in order to reduce correlation between the activities and fit the signal-set with best performance parameters. Training of the network is carried out by feed-forward computation of the input features followed by the back-propagation of errors. The predicted outputs are analyzed in the form of classification accuracies, which are then compared to those of the conventional classification schemes of SVM and kNN. A 3-5% improvement in accuracies is observed in the case of DNN classification. Results are presented for the recorded accelerometer and gyroscope signals and the considered classification schemes.
Abstract: Advancements in gesture recognition algorithms have led to significant growth in sign language translation. By making use of efficient intelligent models, signs can be recognized with precision. The proposed work presents a novel one-dimensional Convolutional Neural Network (CNN) array architecture for recognition of signs from the Indian sign language using signals recorded from a custom-designed wearable IMU device. The IMU device makes use of a tri-axial accelerometer and gyroscope. The signals recorded using the IMU device are segregated on the basis of their context, such as whether they correspond to signing for a general sentence or an interrogative sentence. The array comprises two individual CNNs, one classifying the general sentences and the other classifying the interrogative sentences. Performances of the individual CNNs in the array architecture are compared to that of a conventional CNN classifying the unsegregated dataset. Peak classification accuracies of 94.20% for general sentences and 95.00% for interrogative sentences achieved with the proposed CNN array, in comparison to 93.50% for the conventional CNN, assert the suitability of the proposed approach.