Feng Zheng

RoboReflect: Robotic Reflective Reasoning for Grasping Ambiguous-Condition Objects

Jan 16, 2025
Via arXiv

Enhancing Uncertainty Modeling with Semantic Graph for Hallucination Detection

Jan 02, 2025
Via arXiv

Image-based Multimodal Models as Intruders: Transferable Multimodal Attacks on Video-based MLLMs

Jan 02, 2025
Via arXiv

SoftPatch+: Fully Unsupervised Anomaly Classification and Segmentation

Dec 30, 2024
Via arXiv

A Self-guided Multimodal Approach to Enhancing Graph Representation Learning for Alzheimer's Diseases

Dec 09, 2024
Via arXiv

InfiniteWorld: A Unified Scalable Simulation Framework for General Visual-Language Robot Interaction

Dec 08, 2024
Via arXiv

Agri-LLaVA: Knowledge-Infused Large Multimodal Assistant on Agricultural Pests and Diseases

Dec 03, 2024
Via arXiv

LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos

Nov 29, 2024
Via arXiv

PlantCamo: Plant Camouflage Detection

Oct 23, 2024
Via arXiv

MMAD: The First-Ever Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection

Oct 12, 2024
Via arXiv