Serena Yeung-Levy

Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration

Dec 17, 2024
Via arXiv

Apollo: An Exploration of Video Understanding in Large Multimodal Models

Dec 13, 2024
Via arXiv

DeforHMR: Vision Transformer with Deformable Cross-Attention for 3D Human Mesh Recovery

Nov 18, 2024
Via arXiv

Motion Diffusion-Guided 3D Global HMR from a Dynamic Camera

Nov 15, 2024
Via arXiv

Zero-shot Action Localization via the Confidence of Large Vision-Language Models

Oct 18, 2024
Via arXiv

How to Build the Virtual Cell with Artificial Intelligence: Priorities and Opportunities

Sep 18, 2024
Via arXiv

Continuous Perception Benchmark

Aug 15, 2024
Via arXiv

Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision

Jul 08, 2024
Via arXiv

μ-Bench: A Vision-Language Benchmark for Microscopy Understanding

Jul 01, 2024
Via arXiv

Why are Visually-Grounded Language Models Bad at Image Classification?

May 28, 2024
Via arXiv