Cihang Xie

University of California, Santa Cruz

Double Visual Defense: Adversarial Pre-training and Instruction Tuning for Improving Vision-Language Model Robustness

Jan 16, 2025
Via arXiv

Generative Image Layer Decomposition with Visual Effects

Nov 26, 2024
Via arXiv

CLIPS: An Enhanced CLIP Framework for Learning with Synthetic Captions

Nov 25, 2024
Via arXiv

M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation

Nov 15, 2024
Via arXiv

AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation

Oct 11, 2024
Via arXiv

Causal Image Modeling for Efficient Visual Understanding

Oct 10, 2024
Via arXiv

VHELM: A Holistic Evaluation of Vision Language Models

Oct 09, 2024
Via arXiv

From Pixels to Objects: A Hierarchical Approach for Part and Object Segmentation Using Local and Global Aggregation

Sep 02, 2024
Via arXiv

VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges

Sep 02, 2024
Via arXiv

MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine

Aug 06, 2024
Via arXiv