Zhenfang Chen

Compositional Physical Reasoning of Objects and Events from Videos

Aug 02, 2024

FlexAttention for Efficient High-Resolution Vision-Language Models

Jul 29, 2024

SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge

May 17, 2024

STAR: A Benchmark for Situated Reasoning in Real-World Videos

May 15, 2024

ContPhy: Continuum Physical Concept Learning and Reasoning from Videos

Feb 09, 2024

Improving Reinforcement Learning from Human Feedback with Efficient Reward Model Ensemble

Jan 30, 2024

GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs

Nov 08, 2023

CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding

Nov 06, 2023

Sparse Universal Transformer

Oct 11, 2023

TextPSG: Panoptic Scene Graph Generation from Textual Descriptions

Oct 10, 2023