Shen Yan

MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

Feb 13, 2025

SparseFocus: Learning-based One-shot Autofocus for Microscopy with Sparse Content

Feb 10, 2025

CompCap: Improving Multimodal Large Language Models with Composite Captions

Dec 06, 2024

Autoregressive Models in Vision: A Survey

Nov 08, 2024

LoD-Loc: Aerial Visual Localization using LoD 3D Map with Neural Wireframe Alignment

Oct 16, 2024

Streaming Dense Video Captioning

Apr 01, 2024

VideoPrism: A Foundational Visual Encoder for Video Understanding

Feb 20, 2024

PaLM2-VAdapter: Progressively Aligned Language Model Makes a Strong Vision-language Adapter

Feb 16, 2024

UAVD4L: A Large-Scale Dataset for UAV 6-DoF Localization

Jan 11, 2024

Efficient Large Language Models: A Survey

Dec 23, 2023