Xiaohan Wang

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Dec 13, 2024

Targeted Learning for Variable Importance

Nov 04, 2024

Zero-shot Action Localization via the Confidence of Large Vision-Language Models

Oct 18, 2024

Innovative Thinking, Infinite Humor: Humor Research of Large Language Models through Structured Thought Leaps

Oct 14, 2024

RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment

Aug 22, 2024

MetaTool: Facilitating Large Language Models to Master Tools with Meta-task Augmentation

Jul 15, 2024

Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

Jul 08, 2024

Why are Visually-Grounded Language Models Bad at Image Classification?

May 28, 2024

Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models

Mar 19, 2024

VideoAgent: Long-form Video Understanding with Large Language Model as Agent

Mar 15, 2024