Fangchen Feng

Interactive Masked Image Modeling for Multimodal Object Detection in Remote Sensing

Sep 13, 2024

Minh-Duc Vu, Zuheng Ming, Fangchen Feng, Bissmella Bahaduri, Anissa Mokraoui

Abstract:Object detection in remote sensing imagery plays a vital role in various Earth observation applications. However, unlike object detection in natural scene images, this task is particularly challenging due to the abundance of small, often barely visible objects across diverse terrains. To address these challenges, multimodal learning can be used to integrate features from different data modalities, thereby improving detection accuracy. Nonetheless, the performance of multimodal learning is often constrained by the limited size of labeled datasets. In this paper, we propose to use Masked Image Modeling (MIM) as a pre-training technique, leveraging self-supervised learning on unlabeled data to enhance detection performance. However, conventional MIM methods such as MAE, which use masked tokens without any contextual information, struggle to capture fine-grained details due to a lack of interactions with other parts of the image. To address this, we propose a new interactive MIM method that can establish interactions between different tokens, which is particularly beneficial for object detection in remote sensing. Extensive ablation studies and evaluation demonstrate the effectiveness of our approach.
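
The masking step that conventional MIM relies on can be illustrated with a minimal sketch. It only shows how a random subset of patch indices is hidden from the encoder; the interactive variant proposed in the paper is not detailed in the abstract and is not reproduced here. The function name `random_patch_mask`, the 196-patch grid, and the fixed seed are illustrative assumptions; the 0.75 mask ratio is the value commonly used with MAE.

```python
import random


def random_patch_mask(num_patches, mask_ratio, seed=0):
    """Split patch indices into visible and masked sets (hypothetical helper)."""
    rng = random.Random(seed)           # local RNG so the split is reproducible
    order = list(range(num_patches))
    rng.shuffle(order)                  # random permutation of patch indices
    num_masked = int(num_patches * mask_ratio)
    masked = sorted(order[:num_masked])    # patches the model must reconstruct
    visible = sorted(order[num_masked:])   # patches the encoder actually sees
    return visible, masked


# Assumed setup: a 14 x 14 grid of patches (196 in total).
visible, masked = random_patch_mask(num_patches=196, mask_ratio=0.75)
print(len(visible), len(masked))
```

In this setup the encoder would process only the visible patches, and a decoder would be trained to reconstruct the masked ones.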

Universal End-to-End Neural Network for Lossy Image Compression

Sep 10, 2024

Bouzid Arezki, Fangchen Feng, Anissa Mokraoui

Abstract:This paper presents variable bitrate lossy image compression using a VAE-based neural network. An adaptable image quality adjustment strategy is proposed. The key innovation involves adeptly adjusting the input scale exclusively during the inference process, resulting in an exceptionally efficient rate-distortion mechanism. Through extensive experimentation, across diverse VAE-based compression architectures (CNN, ViT) and training methodologies (MSE, SSIM), our approach exhibits remarkable universality. This success is attributed to the inherent generalization capacity of neural networks. Unlike methods that adjust model architecture or loss functions, our approach emphasizes simplicity, reducing computational complexity and memory requirements. The experiments not only highlight the effectiveness of our approach but also indicate its potential to drive advancements in variable-rate neural network lossy image compression methodologies.

* Accepted at EUSIPCO (European Signal Processing Conference), August 26-30, 2024, Lyon, France

Convolutional Transformer-Based Image Compression

Sep 06, 2024

Bouzid Arezki, Fangchen Feng, Anissa Mokraoui

Abstract:In this paper, we present a novel transformer-based architecture for end-to-end image compression. Our architecture incorporates blocks that effectively capture local dependencies between tokens, eliminating the need for positional encoding by integrating convolutional operations within the multi-head attention mechanism. We demonstrate through experiments that our proposed framework surpasses state-of-the-art CNN-based architectures in terms of the trade-off between bit-rate and distortion and achieves comparable results to transformer-based methods while maintaining lower computational complexity.

* Published in: IEEE Signal Processing: Algorithms, Architectures, Arrangements, and Applications (SPA) 2023 Poznan, Poland

Efficient Image Compression Using Advanced State Space Models

Sep 04, 2024

Bouzid Arezki, Anissa Mokraoui, Fangchen Feng

Abstract:Transformers have led to learning-based image compression methods that outperform traditional approaches. However, these methods often suffer from high complexity, limiting their practical application. To address this, various strategies such as knowledge distillation and lightweight architectures have been explored, aiming to enhance efficiency without significantly sacrificing performance. This paper proposes a State Space Model-based Image Compression (SSMIC) architecture. This novel architecture balances performance and computational efficiency, making it suitable for real-world applications. Experimental evaluations confirm the effectiveness of our model in achieving a superior BD-rate while significantly reducing computational complexity and latency compared to competitive learning-based image compression methods.

* 6 pages, 4 figures; accepted at the MMSP 2024 conference, Indiana, USA, October 2-4, 2024 (https://attend.ieee.org/mmsp-2024/)
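
For readers unfamiliar with state space models, the sketch below shows the generic discrete linear recurrence such models build on: x_k = A x_{k-1} + B u_k, y_k = C x_k. It is not the SSMIC architecture, whose details the abstract does not give. The function name `ssm_scan`, the diagonal form of A, and the toy parameter values are assumptions made for illustration.

```python
def ssm_scan(A, B, C, inputs):
    """Run a diagonal linear state space model over a 1-D input sequence.

    A, B, C are lists of equal length (one entry per state dimension):
        x_k = A * x_{k-1} + B * u_k   (elementwise, since A is diagonal)
        y_k = sum(C * x_k)
    """
    state = [0.0] * len(A)
    outputs = []
    for u in inputs:
        # Update every state dimension from the previous state and the input.
        state = [a * x + b * u for a, x, b in zip(A, state, B)]
        # Read out a scalar output from the current state.
        outputs.append(sum(c * x for c, x in zip(C, state)))
    return outputs


# Toy example: one state dimension that decays by half at every step.
impulse_response = ssm_scan(A=[0.5], B=[1.0], C=[1.0], inputs=[1.0, 0.0, 0.0])
print(impulse_response)  # [1.0, 0.5, 0.25]
```

The loop visits each sequence element once, so the cost grows linearly with sequence length, whereas full self-attention grows quadratically.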

Multimodal Transformer Using Cross-Channel attention for Object Detection in Remote Sensing Images

Oct 21, 2023

Bissmella Bahaduri, Zuheng Ming, Fangchen Feng, Anissa Mokraoui

Abstract:Object detection in Remote Sensing Images (RSI) is a critical task for numerous applications in Earth Observation (EO). Unlike general object detection, object detection in RSI has specific challenges: 1) the scarcity of labeled data in RSI compared to general object detection datasets, and 2) the small objects presented in a high-resolution image with a vast background. To address these challenges, we propose a multimodal transformer exploring multi-source remote sensing data for object detection. Instead of directly combining the multimodal input through a channel-wise concatenation, which ignores the heterogeneity of different modalities, we propose a cross-channel attention module. This module learns the relationship between different channels, enabling the construction of a coherent multimodal input by aligning the different modalities at the early stage. We also introduce a new architecture based on the Swin transformer that incorporates convolution layers in non-shifting blocks while maintaining fixed dimensions, allowing for the generation of fine-to-coarse representations with a favorable accuracy-computation trade-off. The extensive experiments prove the effectiveness of the proposed multimodal fusion module and architecture, demonstrating their applicability to multimodal aerial imagery.

* Submitted to ICASSP 2023
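
The idea of learning relationships between channels, rather than simply concatenating them, can be sketched as attention in which each channel acts as one token. The code below is a simplified illustration only: it uses plain dot-product attention with no learned projections, whereas the paper's module is learned and its exact form is not given in the abstract. The function name `cross_channel_attention` and the flattened-channel input layout are assumptions.

```python
import math


def cross_channel_attention(channels):
    """Mix channels with softmax-weighted dot-product attention.

    channels: list of C flattened channel vectors, all of the same length.
    Returns C mixed vectors of that same length.
    """
    dim = len(channels[0])
    mixed = []
    for query in channels:
        # Similarity of this channel to every channel, scaled as in standard attention.
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in channels]
        # Numerically stable softmax over channels.
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Each output channel is a weighted combination of all input channels.
        mixed.append([sum(w * ch[i] for w, ch in zip(weights, channels))
                      for i in range(dim)])
    return mixed


# Toy example: 3 channels (for instance two RGB bands and one infrared band), 4 pixels each.
bands = [[0.2, 0.4, 0.1, 0.3], [0.3, 0.5, 0.2, 0.2], [0.9, 0.1, 0.8, 0.7]]
fused = cross_channel_attention(bands)
print(len(fused), len(fused[0]))
```

Because the attention weights of each output channel sum to one, every fused channel is a convex combination of the input channels, which is one way modalities can be aligned before they enter a backbone.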

Context Normalization for Robust Image Classification

Mar 14, 2023

Bilal Faye, Mohamed-Djallel Dilmi, Hanane Azzag, Mustapha Lebbah, Fangchen Feng

Abstract:Normalization is a pre-processing step that converts the data into a more usable representation. As part of the deep neural networks (DNNs), the batch normalization (BN) technique uses normalization to address the problem of internal covariate shift. It can be packaged as general modules, which have been extensively integrated into various DNNs, to stabilize and accelerate training, presumably leading to improved generalization. However, the effect of BN is dependent on the mini-batch size and it does not take into account any groups or clusters that may exist in the dataset when estimating population statistics. This study proposes a new normalization technique, called context normalization, for image data. This approach adjusts the scaling of features based on the characteristics of each sample, which improves the model's convergence speed and performance by adapting the data values to the context of the target task. The effectiveness of context normalization is demonstrated on various datasets, and its performance is compared to other standard normalization techniques.
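
The contrast with batch normalization can be made concrete with a minimal sketch in which each sample is normalized with the statistics of its own group rather than those of the whole mini-batch. How the paper defines or estimates contexts is not stated in the abstract, so the sketch assumes each sample arrives with a known context label and uses fixed per-context mean and standard deviation. The function name `context_normalize` and the `eps` value are illustrative.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def context_normalize(values, contexts, eps=1e-5):
    """Normalize each value with the mean and std of its own context group."""
    groups = defaultdict(list)
    for value, context in zip(values, contexts):
        groups[context].append(value)
    # Per-context statistics instead of one set of batch-wide statistics.
    stats = {c: (fmean(vs), pstdev(vs)) for c, vs in groups.items()}
    return [(v - stats[c][0]) / (stats[c][1] + eps)
            for v, c in zip(values, contexts)]


# Two contexts with very different value ranges.
normalized = context_normalize([1.0, 3.0, 10.0, 30.0], ["a", "a", "b", "b"])
print([round(x, 3) for x in normalized])
```

With batch-wide statistics the two samples from context "a" would both end up below the batch mean and close together; normalizing per context gives both groups the same range.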
