Md Jahidul Islam

UStyle: Waterbody Style Transfer of Underwater Scenes by Depth-Guided Feature Synthesis

Mar 14, 2025

Md Abu Bakr Siddique, Junliang Liu, Piyush Singh, Md Jahidul Islam

Abstract:The concept of waterbody style transfer remains largely unexplored in the underwater imaging and vision literature. Traditional image style transfer (STx) methods primarily focus on artistic and photorealistic blending, often failing to preserve object and scene geometry in images captured in high-scattering mediums such as underwater. The wavelength-dependent nonlinear attenuation and depth-dependent backscattering artifacts further complicate learning underwater image STx from unpaired data. This paper introduces UStyle, the first data-driven learning framework for transferring waterbody styles across underwater images without requiring prior reference images or scene information. We propose a novel depth-aware whitening and coloring transform (DA-WCT) mechanism that integrates physics-based waterbody synthesis to ensure perceptually consistent stylization while preserving scene structure. To enhance style transfer quality, we incorporate carefully designed loss functions that guide UStyle to maintain colorfulness, lightness, structural integrity, and frequency-domain characteristics, as well as high-level content in VGG and CLIP (contrastive language-image pretraining) feature spaces. By addressing domain-specific challenges, UStyle provides a robust framework for no-reference underwater image STx, surpassing state-of-the-art (SOTA) methods that rely solely on end-to-end reconstruction loss. Furthermore, we introduce the UF7D dataset, a curated collection of high-resolution underwater images spanning seven distinct waterbody styles, establishing a benchmark to support future research in underwater image STx. The UStyle inference pipeline and UF7D dataset are released at: https://github.com/uf-robopi/UStyle.
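For readers unfamiliar with the whitening and coloring transform that DA-WCT builds on, the following is a minimal NumPy sketch of the plain (not depth-aware) transform. It does not reproduce UStyle's depth guidance or physics-based waterbody synthesis; the function name `whiten_and_color`, the `(channels, samples)` feature layout, and the `eps` clamp are assumptions made for illustration only.

```python
import numpy as np


def whiten_and_color(content, style, eps=1e-8):
    """Plain WCT on feature matrices of shape (channels, samples).

    Hypothetical helper for illustration; not the UStyle DA-WCT module.
    """
    # Center each channel around its own mean.
    c_mean = content.mean(axis=1, keepdims=True)
    s_mean = style.mean(axis=1, keepdims=True)
    fc = content - c_mean
    fs = style - s_mean

    # Unbiased channel-by-channel covariance matrices.
    cov_c = fc @ fc.T / (fc.shape[1] - 1)
    cov_s = fs @ fs.T / (fs.shape[1] - 1)

    # Symmetric eigendecompositions; clamp eigenvalues so the inverse
    # square root below never divides by zero.
    wc, vc = np.linalg.eigh(cov_c)
    ws, vs = np.linalg.eigh(cov_s)
    wc = np.maximum(wc, eps)
    ws = np.maximum(ws, eps)

    # Whitening removes the content correlations (covariance -> identity).
    whitened = vc @ np.diag(wc ** -0.5) @ vc.T @ fc
    # Coloring imposes the style correlations, then restores the style mean.
    colored = vs @ np.diag(ws ** 0.5) @ vs.T @ whitened
    return colored + s_mean
```

In a typical encoder-decoder setup, a feature map of shape `(C, H, W)` would be reshaped to `(C, H*W)` before calling such a function and reshaped back afterwards; the output then carries the style's channel statistics while keeping the content's spatial arrangement.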

ClipRover: Zero-shot Vision-Language Exploration and Target Discovery by Mobile Robots

Feb 12, 2025

Yuxuan Zhang, Adnan Abdullah, Sanjeev J. Koppal, Md Jahidul Islam

Abstract:Vision-language navigation (VLN) has emerged as a promising paradigm, enabling mobile robots to perform zero-shot inference and execute tasks without specific pre-programming. However, current systems often separate map exploration and path planning, with exploration relying on inefficient algorithms due to limited (partially observed) environmental information. In this paper, we present a novel navigation pipeline named "ClipRover" for simultaneous exploration and target discovery in unknown environments, leveraging the capabilities of a vision-language model named CLIP. Our approach requires only monocular vision and operates without any prior map or knowledge about the target. For comprehensive evaluations, we design the functional prototype of a UGV (unmanned ground vehicle) system named "Rover Master", a customized platform for general-purpose VLN tasks. We integrate and deploy the ClipRover pipeline on Rover Master to evaluate its throughput, obstacle avoidance capability, and trajectory performance across various real-world scenarios. Experimental results demonstrate that ClipRover consistently outperforms traditional map traversal algorithms and achieves performance comparable to path-planning methods that depend on prior map and target knowledge. Notably, ClipRover offers real-time active navigation without requiring pre-captured candidate images or pre-built node graphs, addressing key limitations of existing VLN pipelines.

* V1, 21 pages

Demonstrating CavePI: Autonomous Exploration of Underwater Caves by Semantic Guidance

Feb 07, 2025

Alankrit Gupta, Adnan Abdullah, Xianyao Li, Vaishnav Ramesh, Ioannis Rekleitis, Md Jahidul Islam

Abstract:Enabling autonomous robots to safely and efficiently navigate, explore, and map underwater caves is of significant importance to water resource management, hydrogeology, archaeology, and marine robotics. In this work, we demonstrate the system design and algorithmic integration of a visual servoing framework for semantically guided autonomous underwater cave exploration. We present the hardware and edge-AI design considerations to deploy this framework on a novel AUV (Autonomous Underwater Vehicle) named CavePI. The guided navigation is driven by a computationally light yet robust deep visual perception module, delivering a rich semantic understanding of the environment. Subsequently, a robust control mechanism enables CavePI to track the semantic guides and navigate within complex cave structures. We evaluate the system through field experiments in natural underwater caves and spring-water sites and further validate its ROS (Robot Operating System)-based digital twin in a simulation environment. Our results highlight how these integrated design choices facilitate reliable navigation under feature-deprived, GPS-denied, and low-visibility conditions.

* V1, 15 pages

Human-Machine Interfaces for Subsea Telerobotics: From Soda-straw to Natural Language Interactions

Dec 02, 2024

Adnan Abdullah, Ruo Chen, David Blow, Thanakon Uthai, Eric Jing Du, Md Jahidul Islam

Abstract:This review explores the evolution of human-machine interfaces (HMIs) for subsea telerobotics, tracing back the transition from traditional first-person "soda-straw" consoles (narrow field-of-view camera feed) to advanced interfaces powered by gesture recognition, virtual reality, and natural language models. First, we discuss various forms of subsea telerobotics applications, current state-of-the-art (SOTA) interface systems, and the challenges they face in robust underwater sensing, real-time estimation, and low-latency communication. Through this analysis, we highlight how advanced HMIs facilitate intuitive interactions between human operators and robots to overcome these challenges. A detailed review then categorizes and evaluates the cutting-edge HMI systems based on their offered features from both human perspectives (e.g., enhancing operator control and situational awareness) and machine perspectives (e.g., improving safety, mission accuracy, and task efficiency). Moreover, we examine the literature on bidirectional interaction and intelligent collaboration in terms of sensory feedback and intuitive control mechanisms for both physical and virtual interfaces. The paper concludes by identifying critical challenges, open research questions, and future directions, emphasizing the need for multidisciplinary collaboration in subsea telerobotics.

* 37 pages including 22 pages of main text

Optimized Vessel Segmentation: A Structure-Agnostic Approach with Small Vessel Enhancement and Morphological Correction

Nov 22, 2024

Dongning Song, Weijian Huang, Jiarun Liu, Md Jahidul Islam, Hao Yang, Shanshan Wang

Abstract:Accurate segmentation of blood vessels is essential for various clinical assessments and postoperative analyses. However, the inherent challenges of vascular imaging, such as sparsity, fine granularity, low contrast, data distribution variability, and the critical need for preserving topological structure, make generalized vessel segmentation particularly complex. While specialized segmentation methods have been developed for specific anatomical regions, their over-reliance on tailored models hinders broader applicability and generalization. General-purpose segmentation models introduced in medical imaging often fail to address critical vascular characteristics, including the connectivity of segmentation results. To overcome these limitations, we propose an optimized vessel segmentation framework: a structure-agnostic approach incorporating small vessel enhancement and morphological correction for multi-modality vessel segmentation. To train and validate this framework, we compiled a comprehensive multi-modality dataset spanning 17 datasets and benchmarked our model against six SAM-based methods and 17 expert models. The results demonstrate that our approach achieves superior segmentation accuracy, generalization, and a 34.6% improvement in connectivity, underscoring its clinical potential. An ablation study further validates the effectiveness of the proposed improvements. We will release the code and dataset on GitHub following the publication of this work.

* 12 pages, 7 figures, submitted to TIP

BlueME: Robust Underwater Robot-to-Robot Communication Using Compact Magnetoelectric Antennas

Nov 14, 2024

Mehron Talebi, Sultan Mahmud, Adam Khalifa, Md Jahidul Islam

Abstract:We present the design, development, and experimental validation of BlueME, a compact magnetoelectric (ME) antenna array system for underwater robot-to-robot communication. BlueME employs ME antennas operating at their natural mechanical resonance frequency to efficiently transmit and receive very-low-frequency (VLF) electromagnetic signals underwater. To evaluate its performance, we deployed BlueME on an autonomous surface vehicle (ASV) and a remotely operated vehicle (ROV) in open-water field trials. Our tests demonstrate that BlueME maintains reliable signal transmission at distances beyond 200 meters while consuming only 1 watt of power. Field trials show that the system operates effectively in challenging underwater conditions such as turbidity, obstacles, and multipath interference, which generally affect acoustics and optics. Our analysis also examines the impact of complete submersion on system performance and identifies key deployment considerations. This work represents the first practical underwater deployment of ME antennas outside the laboratory and implements the largest VLF ME array system to date. BlueME demonstrates significant potential for marine robotics and automation in multi-robot cooperative systems and remote sensor networks.

AquaFuse: Waterbody Fusion for Physics Guided View Synthesis of Underwater Scenes

Nov 02, 2024

Md Abu Bakr Siddique, Jiayi Wu, Ioannis Rekleitis, Md Jahidul Islam

Abstract:We introduce the idea of AquaFuse, a physics-based method for synthesizing waterbody properties in underwater imagery. We formulate a closed-form solution for waterbody fusion that facilitates realistic data augmentation and geometrically consistent underwater scene rendering. AquaFuse leverages the physical characteristics of light propagation underwater to synthesize the waterbody from one scene to the object contents of another. Unlike data-driven style transfer, AquaFuse preserves the depth consistency and object geometry in an input scene. We validate this unique feature by comprehensive experiments over diverse underwater scenes. We find that the AquaFused images preserve over 94% depth consistency and 90-95% structural similarity of the input scenes. We also demonstrate that it generates accurate 3D view synthesis by preserving object geometry while adapting to the inherent waterbody fusion process. AquaFuse opens up a new research direction in data augmentation by geometry-preserving style transfer for underwater imaging and robot vision applications.
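As a rough illustration of the kind of physical model that waterbody synthesis rests on, the sketch below applies a commonly used simplified underwater image formation model, I = J·exp(-βd) + B·(1 - exp(-βd)), to a single RGB pixel. This is not AquaFuse's closed-form solution, which the abstract does not spell out; the function name `render_underwater` and all coefficient values are hypothetical and chosen only to show the behavior.

```python
import math


def render_underwater(pixel, depth_m, beta, backscatter):
    """Apply a simplified underwater image formation model to one RGB pixel.

    pixel:       scene radiance J per channel, in [0, 1]
    depth_m:     camera-to-scene distance d in meters
    beta:        per-channel attenuation coefficients (1/m), assumed values
    backscatter: per-channel veiling light B of the waterbody, assumed values
    """
    observed = []
    for j, b, b_inf in zip(pixel, beta, backscatter):
        # Fraction of the scene signal that survives the water column.
        transmission = math.exp(-b * depth_m)
        # Attenuated direct signal plus depth-dependent backscatter.
        observed.append(j * transmission + b_inf * (1.0 - transmission))
    return tuple(observed)


# Hypothetical coefficients: red attenuates fastest, as is typical in water.
BETA = (0.40, 0.12, 0.08)
VEILING_LIGHT = (0.05, 0.35, 0.45)

near = render_underwater((0.8, 0.5, 0.2), 0.0, BETA, VEILING_LIGHT)
far = render_underwater((0.8, 0.5, 0.2), 1000.0, BETA, VEILING_LIGHT)
```

With these assumed values, `near` equals the input pixel because nothing is attenuated at zero distance, while `far` converges to the veiling light. Swapping `BETA` and `VEILING_LIGHT` for another waterbody's values while keeping the same scene and depth is the basic idea behind geometry-preserving waterbody changes.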

Word2Wave: Language Driven Mission Programming for Efficient Subsea Deployments of Marine Robots

Sep 27, 2024

Ruo Chen, David Blow, Adnan Abdullah, Md Jahidul Islam

Abstract:This paper explores the design and development of a language-based interface for dynamic mission programming of autonomous underwater vehicles (AUVs). The proposed 'Word2Wave' (W2W) framework enables interactive programming and parameter configuration of AUVs for remote subsea missions. The W2W framework includes: (i) a set of novel language rules and command structures for efficient language-to-mission mapping; (ii) a GPT-based prompt engineering module for training data generation; (iii) a small language model (SLM)-based sequence-to-sequence learning pipeline for mission command generation from human speech or text; and (iv) a novel user interface for 2D mission map visualization and human-machine interfacing. The proposed learning pipeline adapts an SLM named T5-Small that can learn language-to-mission mapping from processed language data effectively, providing robust and efficient performance. In addition to a benchmark evaluation against state-of-the-art methods, we conduct a user interaction study to demonstrate the effectiveness of W2W over commercial AUV programming interfaces. Across participants, W2W-based programming required less than 10% of the time needed for mission programming with traditional interfaces; it is deemed to be a simpler and more natural paradigm for subsea mission programming, with a usability score of 76.25. W2W opens up promising future research opportunities on hands-free AUV mission programming for efficient subsea deployments.

LightViz: Autonomous Light-field Surveying and Mapping for Distributed Light Pollution Monitoring

Jul 31, 2024

Sheng-En Huang, Kazi Farha Farzana Suhi, Md Jahidul Islam

Abstract:Existing technologies for distributed light-field mapping and light pollution monitoring (LPM) rely on either remote satellite imagery or manual light surveying with single-point sensors such as SQMs (sky quality meters). These modalities offer low-resolution data that are not informative for dense light-field mapping, pollutant factor identification, or sustainable policy implementation. In this work, we propose LightViz -- an interactive software interface to survey, simulate, and visualize light pollution maps in real-time. As opposed to manual error-prone methods, LightViz (i) automates the light-field data collection and mapping processes; (ii) provides a platform to simulate various light sources and intensity attenuation models; and (iii) facilitates effective policy identification for conservation. To validate the end-to-end computational pipeline, we design a distributed light-field sensor suite, collect data on Florida coasts, and visualize the distributed light-field maps. In particular, we perform a case study at St. Johns County in Florida, which has a two-decade conservation program for lighting ordinances. The experimental results demonstrate that LightViz can offer high-resolution light-field mapping and provide interactive features to simulate and formulate community policies for light pollution mitigation. We also propose a mathematical formulation for light footprint evaluation, which we integrated into LightViz for targeted LPM in vulnerable communities.

* 10 pages, 11 figures

Ego-to-Exo: Interfacing Third Person Visuals from Egocentric Views in Real-time for Improved ROV Teleoperation

Jun 30, 2024

Adnan Abdullah, Ruo Chen, Ioannis Rekleitis, Md Jahidul Islam

Abstract:Underwater ROVs (Remotely Operated Vehicles) are unmanned submersible vehicles designed for exploring and operating in the depths of the ocean. Despite using high-end cameras, typical teleoperation engines based on first-person (egocentric) views limit a surface operator's ability to maneuver and navigate the ROV in complex deep-water missions. In this paper, we present an interactive teleoperation interface that (i) offers on-demand "third"-person (exocentric) visuals from past egocentric views, and (ii) facilitates enhanced peripheral information with augmented ROV pose in real-time. We achieve this by integrating a 3D geometry-based Ego-to-Exo view synthesis algorithm into a monocular SLAM system for accurate trajectory estimation. The proposed closed-form solution only uses past egocentric views from the ROV and a SLAM backbone for pose estimation, which makes it portable to existing ROV platforms. Unlike data-driven solutions, it is invariant to applications and waterbody-specific scenes. We validate the geometric accuracy of the proposed framework through extensive experiments of 2-DOF indoor navigation and 6-DOF underwater cave exploration in challenging low-light conditions. We demonstrate the benefits of dynamic Ego-to-Exo view generation and real-time pose rendering for remote ROV teleoperation by following navigation guides such as cavelines inside underwater caves. This new way of interactive ROV teleoperation opens up promising opportunities for future research in underwater telerobotics.

* V1, 8 pages
