Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Menan Velayuthan

GADS: A Super Lightweight Model for Head Pose Estimation

Apr 22, 2025

Menan Velayuthan, Asiri Gawesha, Purushoth Velayuthan, Nuwan Kodagoda, Dharshana Kasthurirathna, Pradeepa Samarasinghe

Abstract:In human-computer interaction, head pose estimation profoundly influences application functionality. Although utilizing facial landmarks is valuable for this purpose, existing landmark-based methods prioritize precision over simplicity and model size, limiting their deployment on edge devices and in compute-poor environments. To bridge this gap, we propose \textbf{Grouped Attention Deep Sets (GADS)}, a novel architecture based on the Deep Set framework. By grouping landmarks into regions and employing small Deep Set layers, we reduce computational complexity. Our multihead attention mechanism extracts and combines inter-group information, resulting in a model that is $7.5\times$ smaller and executes $25\times$ faster than the current lightest state-of-the-art model. Notably, our method achieves an impressive reduction, being $4321\times$ smaller than the best-performing model. We introduce vanilla GADS and Hybrid-GADS (landmarks + RGB) and evaluate our models on three benchmark datasets -- AFLW2000, BIWI, and 300W-LP. We envision our architecture as a robust baseline for resource-constrained head pose estimation methods.

* 16 pages, 5 tables, 10 figures, not submitted to any conference or journal

Via

Access Paper or Ask Questions

Egalitarian Language Representation in Language Models: It All Begins with Tokenizers

Sep 17, 2024

Menan Velayuthan, Kengatharaiyer Sarveswaran

Abstract:Tokenizers act as a bridge between human language and the latent space of language models, influencing how language is represented in these models. Due to the immense popularity of English-Centric Large Language Models (LLMs), efforts are being made to adapt them for other languages. However, we demonstrate that, from a tokenization standpoint, not all tokenizers offer fair representation for complex script languages such as Tamil, Sinhala, and Hindi, primarily due to the choice of pre-tokenization methods. We go further to show that pre-tokenization plays a more critical role than the tokenization algorithm itself in achieving an egalitarian representation of these complex script languages. To address this, we introduce an improvement to the Byte Pair Encoding (BPE) algorithm by incorporating graphemes, which we term Grapheme Pair Encoding (GPE). Our experiments show that grapheme-based character extraction outperforms byte-level tokenizers for complex scripts. We validate this approach through experiments on Tamil, Sinhala, and Hindi.

* Content - 8 pages, References - 3 pages

Via

Access Paper or Ask Questions

Quality Does Matter: A Detailed Look at the Quality and Utility of Web-Mined Parallel Corpora

Feb 13, 2024

Surangika Ranathunga, Nisansa de Silva, Menan Velayuthan, Aloka Fernando, Charitha Rathnayake

Figure 1 for Quality Does Matter: A Detailed Look at the Quality and Utility of Web-Mined Parallel Corpora

Figure 2 for Quality Does Matter: A Detailed Look at the Quality and Utility of Web-Mined Parallel Corpora

Figure 3 for Quality Does Matter: A Detailed Look at the Quality and Utility of Web-Mined Parallel Corpora

Figure 4 for Quality Does Matter: A Detailed Look at the Quality and Utility of Web-Mined Parallel Corpora

Abstract:We conducted a detailed analysis on the quality of web-mined corpora for two low-resource languages (making three language pairs, English-Sinhala, English-Tamil and Sinhala-Tamil). We ranked each corpus according to a similarity measure and carried out an intrinsic and extrinsic evaluation on different portions of this ranked corpus. We show that there are significant quality differences between different portions of web-mined corpora and that the quality varies across languages and datasets. We also show that, for some web-mined datasets, Neural Machine Translation (NMT) models trained with their highest-ranked 25k portion can be on par with human-curated datasets.

Via

Access Paper or Ask Questions