Yue Fan

TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters

Oct 30, 2024
Via arXiv

Toward a Diffusion-Based Generalist for Dense Vision Tasks

Jun 29, 2024
Via arXiv

Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding

Jun 27, 2024
Via arXiv

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

Jun 12, 2024
Via arXiv

Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting

Mar 22, 2024
Via arXiv

VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding

Mar 18, 2024
Via arXiv

Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey

Feb 08, 2024
Via arXiv

Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA

Jan 29, 2024
Via arXiv

SSB: Simple but Strong Baseline for Boosting Performance of Open-Set Semi-Supervised Learning

Nov 17, 2023
Via arXiv

Evaluating Multi-Agent Coordination Abilities in Large Language Models

Oct 05, 2023
Via arXiv