Yuexian Zou

Do we really have to filter out random noise in pre-training data for language models?

Feb 10, 2025

VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model

Jan 21, 2025

Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding

Jan 19, 2025

VASparse: Towards Efficient Visual Hallucination Mitigation for Large Vision-Language Model via Visual-Aware Sparsification

Jan 11, 2025

CAR: Controllable Autoregressive Modeling for Visual Generation

Oct 07, 2024

DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval

Sep 16, 2024

Audio-text Retrieval with Transformer-based Hierarchical Alignment and Disentangled Cross-modal Representation

Sep 14, 2024

Image Conductor: Precision Control for Interactive Video Synthesis

Jun 21, 2024

On the Worst Prompt Performance of Large Language Models

Jun 08, 2024

Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning

May 31, 2024