Yanjie Wang

Vision as LoRA

Mar 26, 2025

EVE: Towards End-to-End Video Subtitle Extraction with Vision-Language Models

Mar 06, 2025

Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM

Dec 12, 2024

Perceptual-Distortion Balanced Image Super-Resolution is a Multi-Objective Optimization Problem

Sep 05, 2024

A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding

Jul 02, 2024

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering

May 20, 2024

Elysium: Exploring Object-level Perception in Videos via MLLM

Mar 29, 2024

PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity Recognition

Feb 15, 2024

GloTSFormer: Global Video Text Spotting Transformer

Jan 08, 2024

Rethinking Skip Connections in Encoder-decoder Networks for Monocular Depth Estimation

Aug 29, 2022