Picture for Cihang Xie

Cihang Xie

University of California, Santa Cruz

M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation

Add code
Nov 15, 2024
Figure 1 for M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation
Figure 2 for M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation
Figure 3 for M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation
Figure 4 for M-VAR: Decoupled Scale-wise Autoregressive Modeling for High-Quality Image Generation
Viaarxiv icon

AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation

Add code
Oct 11, 2024
Figure 1 for AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
Figure 2 for AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
Figure 3 for AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
Figure 4 for AttnGCG: Enhancing Jailbreaking Attacks on LLMs with Attention Manipulation
Viaarxiv icon

Causal Image Modeling for Efficient Visual Understanding

Add code
Oct 10, 2024
Viaarxiv icon

VHELM: A Holistic Evaluation of Vision Language Models

Add code
Oct 09, 2024
Figure 1 for VHELM: A Holistic Evaluation of Vision Language Models
Figure 2 for VHELM: A Holistic Evaluation of Vision Language Models
Figure 3 for VHELM: A Holistic Evaluation of Vision Language Models
Figure 4 for VHELM: A Holistic Evaluation of Vision Language Models
Viaarxiv icon

From Pixels to Objects: A Hierarchical Approach for Part and Object Segmentation Using Local and Global Aggregation

Add code
Sep 02, 2024
Viaarxiv icon

VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges

Add code
Sep 02, 2024
Figure 1 for VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges
Figure 2 for VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges
Figure 3 for VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges
Figure 4 for VideoLLaMB: Long-context Video Understanding with Recurrent Memory Bridges
Viaarxiv icon

MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine

Add code
Aug 06, 2024
Viaarxiv icon

VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models

Add code
Jun 24, 2024
Viaarxiv icon

What If We Recaption Billions of Web Images with LLaMA-3?

Add code
Jun 12, 2024
Viaarxiv icon

Autoregressive Pretraining with Mamba in Vision

Add code
Jun 11, 2024
Figure 1 for Autoregressive Pretraining with Mamba in Vision
Figure 2 for Autoregressive Pretraining with Mamba in Vision
Figure 3 for Autoregressive Pretraining with Mamba in Vision
Figure 4 for Autoregressive Pretraining with Mamba in Vision
Viaarxiv icon