Picture for Jiahao Wang

Jiahao Wang

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

Add code
Dec 12, 2024
Figure 1 for PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Figure 2 for PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Figure 3 for PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Figure 4 for PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Viaarxiv icon

GenEx: Generating an Explorable World

Add code
Dec 12, 2024
Viaarxiv icon

VP-MEL: Visual Prompts Guided Multimodal Entity Linking

Add code
Dec 10, 2024
Viaarxiv icon

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Add code
Dec 06, 2024
Viaarxiv icon

Towards Precise Scaling Laws for Video Diffusion Transformers

Add code
Nov 25, 2024
Viaarxiv icon

TableTime: Reformulating Time Series Classification as Zero-Shot Table Understanding via Large Language Models

Add code
Nov 24, 2024
Viaarxiv icon

DMQR-RAG: Diverse Multi-Query Rewriting for RAG

Add code
Nov 20, 2024
Viaarxiv icon

The Oxford Spires Dataset: Benchmarking Large-Scale LiDAR-Visual Localisation, Reconstruction and Radiance Field Methods

Add code
Nov 15, 2024
Viaarxiv icon

LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models

Add code
Nov 11, 2024
Viaarxiv icon

Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing

Add code
Oct 24, 2024
Viaarxiv icon