Benlin Liu

Counting Circuits: Mechanistic Interpretability of Visual Reasoning in Large Vision-Language Models

Mar 19, 2026

Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation

Dec 12, 2025

Visual Representations inside the Language Model

Oct 06, 2025

LiveVQA: Live Visual Knowledge Seeking

Apr 07, 2025

Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment

Nov 26, 2024

Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model

Aug 01, 2024

Efficient Inference of Vision Instruction-Following Models with Elastic Cache

Jul 25, 2024

Matching-based Data Valuation for Generative Model

Apr 21, 2023

TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering

Mar 28, 2023

Unleashing Text-to-Image Diffusion Models for Visual Perception

Mar 03, 2023