Qi Sun

Robin: a Suite of Multi-Scale Vision-Language Models and the CHIRP Evaluation Benchmark

Jan 16, 2025

$\text{Transformer}^2$: Self-adaptive LLMs

Jan 14, 2025

Computer Vision-Driven Gesture Recognition: Toward Natural and Intuitive Human-Computer

Dec 24, 2024

Emma-X: An Embodied Multimodal Action Model with Grounded Chain of Thought and Look-ahead Spatial Reasoning

Dec 17, 2024

FovealNet: Advancing AI-Driven Gaze Tracking Solutions for Optimized Foveated Rendering System Performance in Virtual Reality

Dec 12, 2024

BudgetFusion: Perceptually-Guided Adaptive Diffusion Models

Dec 10, 2024

Detect an Object At Once without Fine-tuning

Nov 04, 2024

An Evolved Universal Transformer Memory

Oct 17, 2024

Low Latency Point Cloud Rendering with Learned Splatting

Sep 24, 2024

Can-Do! A Dataset and Neuro-Symbolic Grounded Framework for Embodied Planning with Large Multimodal Models

Sep 22, 2024