Rongtao Xu

Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision

Apr 03, 2025
Via arXiv

Structured Preference Optimization for Vision-Language Long-Horizon Task Planning

Feb 28, 2025
Via arXiv

Constraint-Aware Zero-Shot Vision-Language Navigation in Continuous Environments

Dec 13, 2024
Via arXiv

InfiniteWorld: A Unified Scalable Simulation Framework for General Visual-Language Robot Interaction

Dec 08, 2024
Via arXiv

InstruGen: Automatic Instruction Generation for Vision-and-Language Navigation Via Large Multimodal Models

Nov 18, 2024
Via arXiv

NeuroClips: Towards High-fidelity and Smooth fMRI-to-Video Reconstruction

Oct 28, 2024
Via arXiv

SkinFormer: Learning Statistical Texture Representation with Transformer for Skin Lesion Segmentation

Sep 13, 2024
Via arXiv

Generalization Boosted Adapter for Open-Vocabulary Segmentation

Sep 13, 2024
Via arXiv

PSTNet: Enhanced Polyp Segmentation with Multi-scale Alignment and Frequency Domain Integration

Sep 13, 2024
Via arXiv

HCF-Net: Hierarchical Context Fusion Network for Infrared Small Object Detection

Mar 16, 2024
Via arXiv