Abstract:Hierarchical Reinforcement Learning (HRL) effectively tackles complex tasks by decomposing them into structured policies. However, HRL agents often face challenges with efficient exploration and rapid adaptation. To address this, we integrate meta-learning into HRL to enhance the agent's ability to learn and adapt hierarchical policies swiftly. Our approach employs meta-learning for rapid task adaptation based on prior experience, while intrinsic motivation mechanisms encourage efficient exploration by rewarding novel state visits. Specifically, our agent uses a high-level policy to select among multiple low-level policies operating within custom grid environments. We utilize gradient-based meta-learning with differentiable inner-loop updates, enabling optimization across a curriculum of increasingly difficult tasks. Experimental results demonstrate that our meta-learned hierarchical agent significantly outperforms traditional HRL agents without meta-learning and intrinsic motivation. The agent exhibits accelerated learning, higher cumulative rewards, and improved success rates in complex grid environments. These findings suggest that integrating meta-learning with HRL, alongside curriculum learning and intrinsic motivation, substantially enhances the agent's capability to handle complex tasks.
Abstract:Multimodal learning involves integrating information from various modalities to enhance learning and comprehension. We compare three modality fusion strategies in person identification and verification by processing two modalities: voice and face. In this paper, a one-dimensional convolutional neural network is employed for x-vector extraction from voice, while the pre-trained VGGFace2 network and transfer learning are utilized for face modality. In addition, gammatonegram is used as speech representation in engagement with the Darknet19 pre-trained network. The proposed systems are evaluated using the K-fold cross-validation technique on the 118 speakers of the test set of the VoxCeleb2 dataset. The comparative evaluations are done for single-modality and three proposed multimodal strategies in equal situations. Results demonstrate that the feature fusion strategy of gammatonegram and facial features achieves the highest performance, with an accuracy of 98.37% in the person identification task. However, concatenating facial features with the x-vector reaches 0.62% for EER in verification tasks.