Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chengcheng Chen

Bring Remote Sensing Object Detect Into Nature Language Model: Using SFT Method

Mar 11, 2025

Fei Wang, Chengcheng Chen, Hongyu Chen, Yugang Chang, Weiming Zeng

Figure 1 for Bring Remote Sensing Object Detect Into Nature Language Model: Using SFT Method

Figure 2 for Bring Remote Sensing Object Detect Into Nature Language Model: Using SFT Method

Figure 3 for Bring Remote Sensing Object Detect Into Nature Language Model: Using SFT Method

Figure 4 for Bring Remote Sensing Object Detect Into Nature Language Model: Using SFT Method

Abstract:Recently, large language models (LLMs) and visionlanguage models (VLMs) have achieved significant success, demonstrating remarkable capabilities in understanding various images and videos, particularly in classification and detection tasks. However, due to the substantial differences between remote sensing images and conventional optical images, these models face considerable challenges in comprehension, especially in detection tasks. Directly prompting VLMs with detection instructions often fails to yield satisfactory results. To address this issue, this letter explores the application of VLMs for object detection in remote sensing images. Specifically, we utilize publicly available remote sensing object detection datasets, including SSDD, HRSID, and NWPU-VHR-10, to convert traditional annotation information into natural language, thereby constructing an instruction-tuning (SFT) dataset for VLM training. We then evaluate the detection performance of different fine-tuning strategies for VLMs and obtain optimized model weights for object detection in remote sensing images. Finally, we assess the model's prior knowledge capabilities through natural language queries.Experimental results demonstrate that, without modifying the model architecture, remote sensing object detection can be effectively achieved using natural language alone. Additionally, the model exhibits the ability to perform certain vision question answering (VQA) tasks. Our dataset and relevant code will be released soon.

Via

Access Paper or Ask Questions

MID: A Comprehensive Shore-Based Dataset for Multi-Scale Dense Ship Occlusion and Interaction Scenarios

Dec 10, 2024

Yugang Chang, Hongyu Chen, Fei Wang, Chengcheng Chen, Weiming Zeng

Figure 1 for MID: A Comprehensive Shore-Based Dataset for Multi-Scale Dense Ship Occlusion and Interaction Scenarios

Figure 2 for MID: A Comprehensive Shore-Based Dataset for Multi-Scale Dense Ship Occlusion and Interaction Scenarios

Figure 3 for MID: A Comprehensive Shore-Based Dataset for Multi-Scale Dense Ship Occlusion and Interaction Scenarios

Figure 4 for MID: A Comprehensive Shore-Based Dataset for Multi-Scale Dense Ship Occlusion and Interaction Scenarios

Abstract:This paper introduces the Maritime Ship Navigation Behavior Dataset (MID), designed to address challenges in ship detection within complex maritime environments using Oriented Bounding Boxes (OBB). MID contains 5,673 images with 135,884 finely annotated target instances, supporting both supervised and semi-supervised learning. It features diverse maritime scenarios such as ship encounters under varying weather, docking maneuvers, small target clustering, and partial occlusions, filling critical gaps in datasets like HRSID, SSDD, and NWPU-10. MID's images are sourced from high-definition video clips of real-world navigation across 43 water areas, with varied weather and lighting conditions (e.g., rain, fog). Manually curated annotations enhance the dataset's variety, ensuring its applicability to real-world demands in busy ports and dense maritime regions. This diversity equips models trained on MID to better handle complex, dynamic environments, supporting advancements in maritime situational awareness. To validate MID's utility, we evaluated 10 detection algorithms, providing an in-depth analysis of the dataset, detection results from various models, and a comparative study of baseline algorithms, with a focus on handling occlusions and dense target clusters. The results highlight MID's potential to drive innovation in intelligent maritime traffic monitoring and autonomous navigation systems. The dataset will be made publicly available at https://github.com/VirtualNew/MID_DataSet.

Via

Access Paper or Ask Questions

A Visual Question Answering Method for SAR Ship: Breaking the Requirement for Multimodal Dataset Construction and Model Fine-Tuning

Nov 03, 2024

Fei Wang, Chengcheng Chen, Hongyu Chen, Yugang Chang, Weiming Zeng

Figure 1 for A Visual Question Answering Method for SAR Ship: Breaking the Requirement for Multimodal Dataset Construction and Model Fine-Tuning

Figure 2 for A Visual Question Answering Method for SAR Ship: Breaking the Requirement for Multimodal Dataset Construction and Model Fine-Tuning

Figure 3 for A Visual Question Answering Method for SAR Ship: Breaking the Requirement for Multimodal Dataset Construction and Model Fine-Tuning

Figure 4 for A Visual Question Answering Method for SAR Ship: Breaking the Requirement for Multimodal Dataset Construction and Model Fine-Tuning

Abstract:Current visual question answering (VQA) tasks often require constructing multimodal datasets and fine-tuning visual language models, which demands significant time and resources. This has greatly hindered the application of VQA to downstream tasks, such as ship information analysis based on Synthetic Aperture Radar (SAR) imagery. To address this challenge, this letter proposes a novel VQA approach that integrates object detection networks with visual language models, specifically designed for analyzing ships in SAR images. This integration aims to enhance the capabilities of VQA systems, focusing on aspects such as ship location, density, and size analysis, as well as risk behavior detection. Initially, we conducted baseline experiments using YOLO networks on two representative SAR ship detection datasets, SSDD and HRSID, to assess each model's performance in terms of detection accuracy. Based on these results, we selected the optimal model, YOLOv8n, as the most suitable detection network for this task. Subsequently, leveraging the vision-language model Qwen2-VL, we designed and implemented a VQA task specifically for SAR scenes. This task employs the ship location and size information output by the detection network to generate multi-turn dialogues and scene descriptions for SAR imagery. Experimental results indicate that this method not only enables fundamental SAR scene question-answering without the need for additional datasets or fine-tuning but also dynamically adapts to complex, multi-turn dialogue requirements, demonstrating robust semantic understanding and adaptability.

Via

Access Paper or Ask Questions

RSNet: A Light Framework for The Detection of Multi-scale Remote Sensing Targets

Oct 30, 2024

Hongyu Chen, Chengcheng Chen, Fei Wang, Yuhu Shi, Weiming Zeng

Figure 1 for RSNet: A Light Framework for The Detection of Multi-scale Remote Sensing Targets

Figure 2 for RSNet: A Light Framework for The Detection of Multi-scale Remote Sensing Targets

Figure 3 for RSNet: A Light Framework for The Detection of Multi-scale Remote Sensing Targets

Figure 4 for RSNet: A Light Framework for The Detection of Multi-scale Remote Sensing Targets

Abstract:Recent developments in synthetic aperture radar (SAR) ship detection have seen deep learning techniques achieve remarkable progress in accuracy and speed. However, the detection of small targets against complex backgrounds remains a significant challenge. To tackle these difficulties, this letter presents RSNet, a lightweight framework aimed at enhancing ship detection capabilities in SAR imagery. RSNet features the Waveletpool-ContextGuided (WCG) backbone for enhanced accuracy with fewer parameters, and the Waveletpool-StarFusion (WSF) head for efficient parameter reduction. Additionally, a Lightweight-Shared (LS) module minimizes the detection head's parameter load. Experiments on the SAR Ship Detection Dataset (SSDD) and High-Resolution SAR Image Dataset (HRSID) demonstrate that RSNet achieves a strong balance between lightweight design and detection performance, surpassing many state-of-the-art detectors, reaching 72.5\% and 67.6\% in \textbf{\(\mathbf{mAP_{.50:95}}\) }respectively with 1.49M parameters. Our code will be released soon.

Via

Access Paper or Ask Questions