Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

M Ashraful Amin

Center for Computational & Data Sciences, Independent University, Bangladesh

FedCTTA: A Collaborative Approach to Continual Test-Time Adaptation in Federated Learning

May 19, 2025

Rakibul Hasan Rajib, Md Akil Raihan Iftee, Mir Sazzat Hossain, A. K. M. Mahbubur Rahman, Sajib Mistry, M Ashraful Amin, Amin Ahsan Ali

Abstract:Federated Learning (FL) enables collaborative model training across distributed clients without sharing raw data, making it ideal for privacy-sensitive applications. However, FL models often suffer performance degradation due to distribution shifts between training and deployment. Test-Time Adaptation (TTA) offers a promising solution by allowing models to adapt using only test samples. However, existing TTA methods in FL face challenges such as computational overhead, privacy risks from feature sharing, and scalability concerns due to memory constraints. To address these limitations, we propose Federated Continual Test-Time Adaptation (FedCTTA), a privacy-preserving and computationally efficient framework for federated adaptation. Unlike prior methods that rely on sharing local feature statistics, FedCTTA avoids direct feature exchange by leveraging similarity-aware aggregation based on model output distributions over randomly generated noise samples. This approach ensures adaptive knowledge sharing while preserving data privacy. Furthermore, FedCTTA minimizes the entropy at each client for continual adaptation, enhancing the model's confidence in evolving target distributions. Our method eliminates the need for server-side training during adaptation and maintains a constant memory footprint, making it scalable even as the number of clients or training rounds increases. Extensive experiments show that FedCTTA surpasses existing methods across diverse temporal and spatial heterogeneity scenarios.

* 8 pages, 5 figures, Accepted In IJCNN 2025

Via

Access Paper or Ask Questions

DM-Codec: Distilling Multimodal Representations for Speech Tokenization

Oct 19, 2024

Md Mubtasim Ahasan, Md Fahim, Tasnim Mohiuddin, A K M Mahbubur Rahman, Aman Chadha, Tariq Iqbal, M Ashraful Amin, Md Mofijul Islam, Amin Ahsan Ali

Figure 1 for DM-Codec: Distilling Multimodal Representations for Speech Tokenization

Figure 2 for DM-Codec: Distilling Multimodal Representations for Speech Tokenization

Figure 3 for DM-Codec: Distilling Multimodal Representations for Speech Tokenization

Figure 4 for DM-Codec: Distilling Multimodal Representations for Speech Tokenization

Abstract:Recent advancements in speech-language models have yielded significant improvements in speech tokenization and synthesis. However, effectively mapping the complex, multidimensional attributes of speech into discrete tokens remains challenging. This process demands acoustic, semantic, and contextual information for precise speech representations. Existing speech representations generally fall into two categories: acoustic tokens from audio codecs and semantic tokens from speech self-supervised learning models. Although recent efforts have unified acoustic and semantic tokens for improved performance, they overlook the crucial role of contextual representation in comprehensive speech modeling. Our empirical investigations reveal that the absence of contextual representations results in elevated Word Error Rate (WER) and Word Information Lost (WIL) scores in speech transcriptions. To address these limitations, we propose two novel distillation approaches: (1) a language model (LM)-guided distillation method that incorporates contextual information, and (2) a combined LM and self-supervised speech model (SM)-guided distillation technique that effectively distills multimodal representations (acoustic, semantic, and contextual) into a comprehensive speech tokenizer, termed DM-Codec. The DM-Codec architecture adopts a streamlined encoder-decoder framework with a Residual Vector Quantizer (RVQ) and incorporates the LM and SM during the training process. Experiments show DM-Codec significantly outperforms state-of-the-art speech tokenization models, reducing WER by up to 13.46%, WIL by 9.82%, and improving speech quality by 5.84% and intelligibility by 1.85% on the LibriSpeech benchmark dataset. The code, samples, and model checkpoints are available at https://github.com/mubtasimahasan/DM-Codec.

Via

Access Paper or Ask Questions

BD-SAT: High-resolution Land Use Land Cover Dataset & Benchmark Results for Developing Division: Dhaka, BD

Jun 09, 2024

Ovi Paul, Abu Bakar Siddik Nayem, Anis Sarker, Amin Ahsan Ali, M Ashraful Amin, AKM Mahbubur Rahman

Figure 1 for BD-SAT: High-resolution Land Use Land Cover Dataset & Benchmark Results for Developing Division: Dhaka, BD

Figure 2 for BD-SAT: High-resolution Land Use Land Cover Dataset & Benchmark Results for Developing Division: Dhaka, BD

Figure 3 for BD-SAT: High-resolution Land Use Land Cover Dataset & Benchmark Results for Developing Division: Dhaka, BD

Figure 4 for BD-SAT: High-resolution Land Use Land Cover Dataset & Benchmark Results for Developing Division: Dhaka, BD

Abstract:Land Use Land Cover (LULC) analysis on satellite images using deep learning-based methods is significantly helpful in understanding the geography, socio-economic conditions, poverty levels, and urban sprawl in developing countries. Recent works involve segmentation with LULC classes such as farmland, built-up areas, forests, meadows, water bodies, etc. Training deep learning methods on satellite images requires large sets of images annotated with LULC classes. However, annotated data for developing countries are scarce due to a lack of funding, absence of dedicated residential/industrial/economic zones, a large population, and diverse building materials. BD-SAT provides a high-resolution dataset that includes pixel-by-pixel LULC annotations for Dhaka metropolitan city and surrounding rural/urban areas. Using a strict and standardized procedure, the ground truth is created using Bing satellite imagery with a ground spatial distance of 2.22 meters per pixel. A three-stage, well-defined annotation process has been followed with support from GIS experts to ensure the reliability of the annotations. We performed several experiments to establish benchmark results. The results show that the annotated BD-SAT is sufficient to train large deep learning models with adequate accuracy for five major LULC classes: forest, farmland, built-up areas, water bodies, and meadows.

* 26 pages, 15 figures and 12 tables

Via

Access Paper or Ask Questions

Morphological Classification of Radio Galaxies using Semi-Supervised Group Equivariant CNNs

May 31, 2023

Mir Sazzat Hossain, Sugandha Roy, K. M. B. Asad, Arshad Momen, Amin Ahsan Ali, M Ashraful Amin, A. K. M. Mahbubur Rahman

Abstract:Out of the estimated few trillion galaxies, only around a million have been detected through radio frequencies, and only a tiny fraction, approximately a thousand, have been manually classified. We have addressed this disparity between labeled and unlabeled images of radio galaxies by employing a semi-supervised learning approach to classify them into the known Fanaroff-Riley Type I (FRI) and Type II (FRII) categories. A Group Equivariant Convolutional Neural Network (G-CNN) was used as an encoder of the state-of-the-art self-supervised methods SimCLR (A Simple Framework for Contrastive Learning of Visual Representations) and BYOL (Bootstrap Your Own Latent). The G-CNN preserves the equivariance for the Euclidean Group E(2), enabling it to effectively learn the representation of globally oriented feature maps. After representation learning, we trained a fully-connected classifier and fine-tuned the trained encoder with labeled data. Our findings demonstrate that our semi-supervised approach outperforms existing state-of-the-art methods across several metrics, including cluster quality, convergence rate, accuracy, precision, recall, and the F1-score. Moreover, statistical significance testing via a t-test revealed that our method surpasses the performance of a fully supervised G-CNN. This study emphasizes the importance of semi-supervised learning in radio galaxy classification, where labeled data are still scarce, but the prospects for discovery are immense.

* 9 pages, 6 figures, accepted in INNS Deep Learning Innovations and Applications (INNS DLIA 2023) workshop, IJCNN 2023, to be published in Procedia Computer Science

Via

Access Paper or Ask Questions

Variational Stacked Local Attention Networks for Diverse Video Captioning

Jan 04, 2022

Tonmoay Deb, Akib Sadmanee, Kishor Kumar Bhaumik, Amin Ahsan Ali, M Ashraful Amin, A K M Mahbubur Rahman

Figure 1 for Variational Stacked Local Attention Networks for Diverse Video Captioning

Figure 2 for Variational Stacked Local Attention Networks for Diverse Video Captioning

Figure 3 for Variational Stacked Local Attention Networks for Diverse Video Captioning

Figure 4 for Variational Stacked Local Attention Networks for Diverse Video Captioning

Abstract:While describing Spatio-temporal events in natural language, video captioning models mostly rely on the encoder's latent visual representation. Recent progress on the encoder-decoder model attends encoder features mainly in linear interaction with the decoder. However, growing model complexity for visual data encourages more explicit feature interaction for fine-grained information, which is currently absent in the video captioning domain. Moreover, feature aggregations methods have been used to unveil richer visual representation, either by the concatenation or using a linear layer. Though feature sets for a video semantically overlap to some extent, these approaches result in objective mismatch and feature redundancy. In addition, diversity in captions is a fundamental component of expressing one event from several meaningful perspectives, currently missing in the temporal, i.e., video captioning domain. To this end, we propose Variational Stacked Local Attention Network (VSLAN), which exploits low-rank bilinear pooling for self-attentive feature interaction and stacking multiple video feature streams in a discount fashion. Each feature stack's learned attributes contribute to our proposed diversity encoding module, followed by the decoding query stage to facilitate end-to-end diverse and natural captions without any explicit supervision on attributes. We evaluate VSLAN on MSVD and MSR-VTT datasets in terms of syntax and diversity. The CIDEr score of VSLAN outperforms current off-the-shelf methods by $7.8\%$ on MSVD and $4.5\%$ on MSR-VTT, respectively. On the same datasets, VSLAN achieves competitive results in caption diversity metrics.

* To be published in Winter Conference on Applications of Computer Vision 2022

Via

Access Paper or Ask Questions

Unified Spatio-Temporal Modeling for Traffic Forecasting using Graph Neural Network

Apr 28, 2021

Amit Roy, Kashob Kumar Roy, Amin Ahsan Ali, M Ashraful Amin, A K M Mahbubur Rahman

Figure 1 for Unified Spatio-Temporal Modeling for Traffic Forecasting using Graph Neural Network

Figure 2 for Unified Spatio-Temporal Modeling for Traffic Forecasting using Graph Neural Network

Figure 3 for Unified Spatio-Temporal Modeling for Traffic Forecasting using Graph Neural Network

Figure 4 for Unified Spatio-Temporal Modeling for Traffic Forecasting using Graph Neural Network

Abstract:Research in deep learning models to forecast traffic intensities has gained great attention in recent years due to their capability to capture the complex spatio-temporal relationships within the traffic data. However, most state-of-the-art approaches have designed spatial-only (e.g. Graph Neural Networks) and temporal-only (e.g. Recurrent Neural Networks) modules to separately extract spatial and temporal features. However, we argue that it is less effective to extract the complex spatio-temporal relationship with such factorized modules. Besides, most existing works predict the traffic intensity of a particular time interval only based on the traffic data of the previous one hour of that day. And thereby ignores the repetitive daily/weekly pattern that may exist in the last hour of data. Therefore, we propose a Unified Spatio-Temporal Graph Convolution Network (USTGCN) for traffic forecasting that performs both spatial and temporal aggregation through direct information propagation across different timestamp nodes with the help of spectral graph convolution on a spatio-temporal graph. Furthermore, it captures historical daily patterns in previous days and current-day patterns in current-day traffic data. Finally, we validate our work's effectiveness through experimental analysis, which shows that our model USTGCN can outperform state-of-the-art performances in three popular benchmark datasets from the Performance Measurement System (PeMS). Moreover, the training time is reduced significantly with our proposed USTGCN model.

* Accepted for publication in International Joint Conference on Neural Networks (IJCNN-2021)

Via

Access Paper or Ask Questions

Node Embedding using Mutual Information and Self-Supervision based Bi-level Aggregation

Apr 27, 2021

Kashob Kumar Roy, Amit Roy, A K M Mahbubur Rahman, M Ashraful Amin, Amin Ahsan Ali

Figure 1 for Node Embedding using Mutual Information and Self-Supervision based Bi-level Aggregation

Figure 2 for Node Embedding using Mutual Information and Self-Supervision based Bi-level Aggregation

Figure 3 for Node Embedding using Mutual Information and Self-Supervision based Bi-level Aggregation

Figure 4 for Node Embedding using Mutual Information and Self-Supervision based Bi-level Aggregation

Abstract:Graph Neural Networks (GNNs) learn low dimensional representations of nodes by aggregating information from their neighborhood in graphs. However, traditional GNNs suffer from two fundamental shortcomings due to their local ($l$-hop neighborhood) aggregation scheme. First, not all nodes in the neighborhood carry relevant information for the target node. Since GNNs do not exclude noisy nodes in their neighborhood, irrelevant information gets aggregated, which reduces the quality of the representation. Second, traditional GNNs also fail to capture long-range non-local dependencies between nodes. To address these limitations, we exploit mutual information (MI) to define two types of neighborhood, 1) \textit{Local Neighborhood} where nodes are densely connected within a community and each node would share higher MI with its neighbors, and 2) \textit{Non-Local Neighborhood} where MI-based node clustering is introduced to assemble informative but graphically distant nodes in the same cluster. To generate node presentations, we combine the embeddings generated by bi-level aggregation - local aggregation to aggregate features from local neighborhoods to avoid noisy information and non-local aggregation to aggregate features from non-local neighborhoods. Furthermore, we leverage self-supervision learning to estimate MI with few labeled data. Finally, we show that our model significantly outperforms the state-of-the-art methods in a wide range of assortative and disassortative graphs.

* Accepted at IJCNN 2021

Via

Access Paper or Ask Questions

Structure-Aware Hierarchical Graph Pooling using Information Bottleneck

Apr 27, 2021

Kashob Kumar Roy, Amit Roy, A K M Mahbubur Rahman, M Ashraful Amin, Amin Ahsan Ali

Figure 1 for Structure-Aware Hierarchical Graph Pooling using Information Bottleneck

Figure 2 for Structure-Aware Hierarchical Graph Pooling using Information Bottleneck

Figure 3 for Structure-Aware Hierarchical Graph Pooling using Information Bottleneck

Figure 4 for Structure-Aware Hierarchical Graph Pooling using Information Bottleneck

Abstract:Graph pooling is an essential ingredient of Graph Neural Networks (GNNs) in graph classification and regression tasks. For these tasks, different pooling strategies have been proposed to generate a graph-level representation by downsampling and summarizing nodes' features in a graph. However, most existing pooling methods are unable to capture distinguishable structural information effectively. Besides, they are prone to adversarial attacks. In this work, we propose a novel pooling method named as {HIBPool} where we leverage the Information Bottleneck (IB) principle that optimally balances the expressiveness and robustness of a model to learn representations of input data. Furthermore, we introduce a novel structure-aware Discriminative Pooling Readout ({DiP-Readout}) function to capture the informative local subgraph structures in the graph. Finally, our experimental results show that our model significantly outperforms other state-of-art methods on several graph classification benchmarks and more resilient to feature-perturbation attack than existing pooling methods.

* Accepted at IJCNN 2021

Via

Access Paper or Ask Questions

SST-GNN: Simplified Spatio-temporal Traffic forecasting model using Graph Neural Network

Mar 31, 2021

Amit Roy, Kashob Kumar Roy, Amin Ahsan Ali, M Ashraful Amin, A K M Mahbubur Rahman

Figure 1 for SST-GNN: Simplified Spatio-temporal Traffic forecasting model using Graph Neural Network

Figure 2 for SST-GNN: Simplified Spatio-temporal Traffic forecasting model using Graph Neural Network

Figure 3 for SST-GNN: Simplified Spatio-temporal Traffic forecasting model using Graph Neural Network

Figure 4 for SST-GNN: Simplified Spatio-temporal Traffic forecasting model using Graph Neural Network

Abstract:To capture spatial relationships and temporal dynamics in traffic data, spatio-temporal models for traffic forecasting have drawn significant attention in recent years. Most of the recent works employed graph neural networks(GNN) with multiple layers to capture the spatial dependency. However, road junctions with different hop-distance can carry distinct traffic information which should be exploited separately but existing multi-layer GNNs are incompetent to discriminate between their impact. Again, to capture the temporal interrelationship, recurrent neural networks are common in state-of-the-art approaches that often fail to capture long-range dependencies. Furthermore, traffic data shows repeated patterns in a daily or weekly period which should be addressed explicitly. To address these limitations, we have designed a Simplified Spatio-temporal Traffic forecasting GNN(SST-GNN) that effectively encodes the spatial dependency by separately aggregating different neighborhood representations rather than with multiple layers and capture the temporal dependency with a simple yet effective weighted spatio-temporal aggregation mechanism. We capture the periodic traffic patterns by using a novel position encoding scheme with historical and current data in two different models. With extensive experimental analysis, we have shown that our model has significantly outperformed the state-of-the-art models on three real-world traffic datasets from the Performance Measurement System (PeMS).

* Accepted for publication in 25th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-2021)

Via

Access Paper or Ask Questions

Hierarchical Self Attention Based Autoencoder for Open-Set Human Activity Recognition

Mar 07, 2021

M Tanjid Hasan Tonmoy, Saif Mahmud, A K M Mahbubur Rahman, M Ashraful Amin, Amin Ahsan Ali

Figure 1 for Hierarchical Self Attention Based Autoencoder for Open-Set Human Activity Recognition

Figure 2 for Hierarchical Self Attention Based Autoencoder for Open-Set Human Activity Recognition

Figure 3 for Hierarchical Self Attention Based Autoencoder for Open-Set Human Activity Recognition

Figure 4 for Hierarchical Self Attention Based Autoencoder for Open-Set Human Activity Recognition

Abstract:Wearable sensor based human activity recognition is a challenging problem due to difficulty in modeling spatial and temporal dependencies of sensor signals. Recognition models in closed-set assumption are forced to yield members of known activity classes as prediction. However, activity recognition models can encounter an unseen activity due to body-worn sensor malfunction or disability of the subject performing the activities. This problem can be addressed through modeling solution according to the assumption of open-set recognition. Hence, the proposed self attention based approach combines data hierarchically from different sensor placements across time to classify closed-set activities and it obtains notable performance improvement over state-of-the-art models on five publicly available datasets. The decoder in this autoencoder architecture incorporates self-attention based feature representations from encoder to detect unseen activity classes in open-set recognition setting. Furthermore, attention maps generated by the hierarchical model demonstrate explainable selection of features in activity recognition. We conduct extensive leave one subject out validation experiments that indicate significantly improved robustness to noise and subject specific variability in body-worn sensor signals. The source code is available at: github.com/saif-mahmud/hierarchical-attention-HAR

* Accepted for publication in 25th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-2021)

Via

Access Paper or Ask Questions