
Yun Xing

MAGIC: Mastering Physical Adversarial Generation in Context through Collaborative LLM Agents

Dec 11, 2024

SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments

Nov 28, 2024

Mitigating Object Hallucination via Concentric Causal Attention

Oct 21, 2024

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

Oct 16, 2024

Segment Anything with Multiple Modalities

Aug 17, 2024

HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras

Apr 03, 2024

CAT-SAM: Conditional Tuning Network for Few-Shot Adaptation of Segmentation Anything Model

Feb 06, 2024

Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining

Jan 16, 2024

Bridging Semantic Gaps for Language-Supervised Semantic Segmentation

Sep 27, 2023

Domain Adaptive Video Segmentation via Temporal Pseudo Supervision

Jul 06, 2022