Abstract: Spoken languages often use intonation, rhythm, intensity, and structure to communicate intention, and the same utterance can be interpreted differently depending on the rhythm with which it is spoken. These speech acts form the foundation of communication, and their expression is unique to each language. Recent attention-based models, which can learn powerful representations from multilingual datasets, have performed well in speech tasks and are well suited to modelling specific tasks in low-resource languages. Here, we develop a novel multimodal approach that combines two models, wav2vec2.0 for audio and MarianMT for text translation, using multimodal attention fusion to predict speech acts in our prepared Bengali speech corpus. We also show that our model, BeAts ($\underline{\textbf{Be}}$ngali speech acts recognition using Multimodal $\underline{\textbf{At}}$tention Fu$\underline{\textbf{s}}$ion), significantly outperforms both the unimodal baseline using only speech data and a simpler bimodal fusion using both speech and text data. Project page: https://soumitri2001.github.io/BeAts
Abstract: Metaheuristic algorithms are methods devised to efficiently solve computationally challenging optimization problems. Researchers have taken inspiration from various natural and physical processes to formulate metaheuristics that have successfully provided near-optimal or optimal solutions to several engineering tasks. This chapter focuses on metaheuristic algorithms modelled upon non-linear physical phenomena that have a concrete optimization paradigm and have shown formidable exploration and exploitation abilities for such optimization problems. Specifically, it presents several popular physics-based metaheuristics and describes the unique underlying physical process associated with each algorithm.