Abstract: This paper presents a novel knowledge distillation neural architecture that leverages efficient transformer networks for effective image classification. Natural images exhibit intricate arrangements containing numerous extraneous elements. Vision transformers compute attention over localized patches; however, exclusive reliance on patch segmentation fails to capture the image as a whole. To address this issue, we propose an inner-outer transformer-based architecture that attends to both the global and local aspects of the image. Moreover, training transformer models is challenging because of their substantial resource, time, and data requirements. To tackle this, we integrate knowledge distillation into the architecture, enabling efficient learning: leveraging insights from a larger teacher model, our approach improves both learning efficiency and effectiveness. Notably, the transformer-in-transformer network becomes lightweight through distillation applied within the feature extraction layer. The robustness of the proposed network is established through extensive experiments on the MNIST, CIFAR-10, and CIFAR-100 datasets, demonstrating strong top-1 and top-5 accuracy. An ablative analysis validates the chosen parameters and settings and shows their superiority over contemporary methods. The proposed Transformer-in-Transformer Network (TITN) achieves a top-1 accuracy of 74.71% and a top-5 accuracy of 92.28% on CIFAR-100, a top-1 accuracy of 92.03% and a top-5 accuracy of 99.80% on CIFAR-10, and a top-1 accuracy of 99.56% on MNIST.
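
The distillation described above acts both on predictions and within the feature extraction layer. As a rough illustration of how such an objective is typically composed, the following is a minimal PyTorch sketch; the temperature, loss weights, and feature-projection dimensions are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hard-label cross-entropy blended with temperature-softened KL to the teacher."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # the T*T factor keeps gradient magnitudes comparable across temperatures
    return alpha * hard + (1.0 - alpha) * soft

# Feature-level distillation: match intermediate representations, projecting the
# lightweight student's feature width to the teacher's (192 -> 768 is an assumption).
proj = nn.Linear(192, 768)

def feature_hint_loss(student_feat, teacher_feat):
    return F.mse_loss(proj(student_feat), teacher_feat)
```

In training, the teacher's outputs would be computed under torch.no_grad() and the two terms added to the student's classification loss.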
Abstract: Robotic prostheses and exoskeletons can do wonders compared to their non-robotic counterparts. However, in a cost-soaring world where only 1 in every 10 patients has access to conventional medical prostheses, access to advanced ones is extremely limited, largely because of their high cost, a significant portion of which comes from the diagnosis and control units. Affordability is also often deprioritized when developing such devices, since reducing cost tends to reduce performance under the cost-versus-performance trade-off. Given these circumstances, the goal of this research was to propose an affordable, wearable, real-time gait diagnosis unit (GDU) for robotic prostheses and exoskeletons. As a proof of concept, a GDU prototype was developed that leverages TinyML to run two quantized int8 models in parallel on an ESP32 NodeMCU development board (7.30 USD), classifying five gait scenarios (idle, walk, run, hop, and skip) and generating an anomaly score from acceleration data received from two attached IMUs. The resulting stand-alone wearable gait diagnosis unit can be fitted to any prosthesis or exoskeleton, classifies the gait scenarios with an overall accuracy of 92%, and delivers anomaly scores within 95-96 ms from only 3 seconds of gait data in real time.
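
The abstract does not specify the toolchain used to produce the int8 models; a common route for ESP32-class TinyML deployments is TensorFlow Lite full-integer post-training quantization. The sketch below shows that route under stated assumptions: the toy Keras model, the 3 s window of 150 samples at 50 Hz, and the 6-axis input layout are all illustrative, not the paper's pipeline.

```python
import numpy as np
import tensorflow as tf

# Toy classifier over 3 s windows of acceleration data from two IMUs
# (150 samples x 6 axes -- both numbers are assumptions).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(150, 6)),
    tf.keras.layers.Conv1D(16, 5, activation="relu"),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(5, activation="softmax"),  # idle/walk/run/hop/skip
])

def representative_data():
    # Calibration samples drive the estimation of int8 scales and zero-points.
    for _ in range(100):
        yield [np.random.randn(1, 150, 6).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
open("gait_classifier_int8.tflite", "wb").write(converter.convert())
```

The resulting flatbuffer can then be compiled into ESP32 firmware (e.g. via TensorFlow Lite Micro) and run alongside a second model, as the prototype does.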
Abstract: This paper presents L-VITeX, a lightweight visual intuition system for terrain exploration designed for resource-constrained robots and swarms. L-VITeX aims to provide a hint of Regions of Interest (RoIs) without computationally expensive processing. Using the Faster Objects, More Objects (FOMO) TinyML architecture, the system achieves high accuracy (>99%) in RoI detection while operating on minimal hardware resources (peak RAM usage < 50 KB) with near real-time inference (<200 ms). The paper evaluates L-VITeX's performance across various terrains, including mountainous areas, underwater shipwreck-debris regions, and Martian rocky surfaces. It also demonstrates the system's application to 3D mapping using a small mobile robot driven by an ESP32-CAM and Gaussian Splats (GS), showcasing its potential to enhance exploration efficiency and decision-making.
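
FOMO itself ships with Edge Impulse; as a rough Keras approximation of its core idea, the sketch below truncates a MobileNetV2 backbone and attaches a 1x1 convolutional head that emits a coarse per-cell class heatmap instead of bounding boxes. The input size, width multiplier, and cut layer are assumptions, and the production FOMO implementation may differ.

```python
import tensorflow as tf

NUM_CLASSES = 2  # background + one RoI class (e.g. "rock"); assumption

base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), alpha=0.35, include_top=False, weights=None
)
# Cut the backbone early so the output grid stays coarse (stride 8 here):
cut = base.get_layer("block_6_expand_relu").output  # 12x12 feature map
head = tf.keras.layers.Conv2D(NUM_CLASSES, 1, activation="softmax")(cut)
fomo_like = tf.keras.Model(base.input, head)  # output: (12, 12, NUM_CLASSES)

# Each output cell predicts whether an object centroid falls inside it; this
# per-cell formulation is what keeps RAM use and latency low enough for an
# ESP32-CAM-class device.
```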