Abstract: Recently, computer-aided design models and electromagnetic simulations have been used to augment synthetic aperture radar (SAR) data for deep learning. However, an automatic target recognition (ATR) model trained on synthetic data struggles with domain shift because it learns the specific clutter patterns present in such data, which degrades performance on measured data with different clutter distributions. This study proposes IRASNet, a framework specifically designed for domain-generalized SAR-ATR that enables effective feature-level clutter reduction and domain-invariant feature learning. First, we propose a clutter reduction module (CRM) that maximizes the signal-to-clutter ratio on feature maps. The module reduces the impact of clutter at the feature level while preserving target and shadow information, thereby improving ATR performance. Second, we integrate adversarial learning with the CRM to extract clutter-reduced, domain-invariant features. This integration bridges the gap between synthetic and measured datasets without requiring measured data during training. Third, we improve feature extraction from target and shadow regions by implementing a positional supervision task using mask ground-truth encoding, which enhances the ability of the model to discriminate between classes. Our proposed IRASNet achieves new state-of-the-art results on public SAR datasets by utilizing target and shadow information, delivering superior performance across various test conditions. IRASNet not only enhances generalization but also significantly improves feature-level clutter reduction, making it a valuable advancement in radar image pattern recognition.
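As a conceptual illustration of the kind of feature-level clutter reduction described above, the sketch below shows a small PyTorch block that predicts a soft target/shadow mask from a feature map, attenuates responses outside the mask, and scores the resulting signal-to-clutter ratio as a loss. All class, function, and parameter names here are hypothetical; this is not the authors' actual CRM or training objective.

```python
# Minimal sketch (not the authors' CRM): feature-level clutter suppression
# via a predicted soft target/shadow mask, plus a signal-to-clutter loss.
import torch
import torch.nn as nn


class FeatureClutterSuppression(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolutions producing a per-pixel "target-or-shadow" score in [0, 1].
        self.mask_head = nn.Sequential(
            nn.Conv2d(channels, channels // 2, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 2, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, feat: torch.Tensor):
        mask = self.mask_head(feat)   # (B, 1, H, W) soft mask
        suppressed = feat * mask      # attenuate clutter regions
        return suppressed, mask


def scr_loss(feat: torch.Tensor, mask: torch.Tensor, eps: float = 1e-6):
    """Encourage a high signal-to-clutter ratio on the feature map:
    mean feature energy inside the predicted mask vs. outside it."""
    energy = feat.pow(2).mean(dim=1, keepdim=True)  # (B, 1, H, W)
    signal = (energy * mask).sum(dim=(2, 3)) / (mask.sum(dim=(2, 3)) + eps)
    clutter = (energy * (1 - mask)).sum(dim=(2, 3)) / ((1 - mask).sum(dim=(2, 3)) + eps)
    return -(torch.log(signal + eps) - torch.log(clutter + eps)).mean()


# Usage: plug the block between backbone stages; the predicted mask could also
# be supervised with target/shadow ground-truth masks (positional supervision).
crm = FeatureClutterSuppression(channels=64)
x = torch.randn(2, 64, 32, 32)
y, m = crm(x)
loss = scr_loss(y, m)
```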
Abstract: In this paper, we present the solution of the HYU MLLAB KT Team to the Multimodal Algorithmic Reasoning Task: SMART-101 CVPR 2024 Challenge. Beyond conventional visual question-answering problems, the SMART-101 challenge aims to achieve human-level multimodal understanding by tackling complex visio-linguistic puzzles designed for children in the 6-8 age group. To solve this problem, we suggest two main ideas. First, to utilize the reasoning ability of a large-scale language model (LLM), we ground the given visual cues (images) in the text modality: we generate highly detailed text captions that describe the context of the image and use these captions as input for the LLM. Second, because puzzle images often contain various geometric visual patterns, we apply an object detection step to ensure these patterns are not overlooked during captioning. We employ the Segment Anything Model (SAM), which can detect objects of various sizes, to capture the visual features of these geometric patterns and feed this information to the LLM. Under the puzzle-split configuration, we achieved an option selection accuracy (Oacc) of 29.5 on the test set and a weighted option selection accuracy (WOSA) of 27.1 on the challenge set.
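A minimal sketch of the described pipeline is shown below, assuming the public `segment_anything` package for region proposals; the captioner, prompt wording, and helper names are illustrative rather than the team's actual implementation.

```python
# Illustrative pipeline sketch (not the team's exact code): SAM proposes object
# regions, a caption describes the puzzle image, and both are composed into a
# text prompt for an LLM to reason over.
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator


def describe_regions(image_bgr, sam_checkpoint="sam_vit_h.pth"):
    """Run SAM's automatic mask generator and summarize detected regions
    (count and bounding boxes) so small geometric patterns are not overlooked."""
    sam = sam_model_registry["vit_h"](checkpoint=sam_checkpoint)
    mask_generator = SamAutomaticMaskGenerator(sam)
    image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB)
    masks = mask_generator.generate(image_rgb)
    lines = [f"{len(masks)} regions detected."]
    for m in sorted(masks, key=lambda m: -m["area"])[:20]:
        x, y, w, h = m["bbox"]
        lines.append(f"- region at ({x},{y}) with size {w}x{h}")
    return "\n".join(lines)


def build_llm_prompt(caption, region_summary, question, options):
    """Ground the visual cues in text: detailed caption + region summary + puzzle question."""
    opts = "\n".join(f"{chr(65 + i)}. {o}" for i, o in enumerate(options))
    return (
        f"Image description:\n{caption}\n\n"
        f"Detected visual elements:\n{region_summary}\n\n"
        f"Puzzle question: {question}\nOptions:\n{opts}\n"
        "Answer with the letter of the correct option."
    )
```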