Yicong Li

VA-$π$: Variational Policy Alignment for Pixel-Aware Autoregressive Generation

Dec 22, 2025
Via arXiv

AnchorDS: Anchoring Dynamic Sources for Semantically Consistent Text-to-3D Generation

Nov 12, 2025
Via arXiv

REAR: Rethinking Visual Autoregressive Models via Generator-Tokenizer Consistency Regularization

Oct 06, 2025
Via arXiv

VINCIE: Unlocking In-context Image Editing from Video

Jun 12, 2025
Via arXiv

MSCI: Addressing CLIP's Inherent Limitations for Compositional Zero-Shot Learning

May 15, 2025
Via arXiv

Visual Intention Grounding for Egocentric Assistants

Apr 18, 2025
Via arXiv

Predicting the Road Ahead: A Knowledge Graph based Foundation Model for Scene Understanding in Autonomous Driving

Mar 24, 2025
Via arXiv

Factor Graph-based Interpretable Neural Networks

Feb 20, 2025
Via arXiv

EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering

Feb 11, 2025
Via arXiv

Understanding Long Videos via LLM-Powered Entity Relation Graphs

Jan 27, 2025
Via arXiv