Baining Guo

LoLA: Long Horizon Latent Action Learning for General Robot Manipulation

Dec 23, 2025

VASA-3D: Lifelike Audio-Driven Gaussian Head Avatars from a Single Image

Dec 16, 2025

Scalable Vision-Language-Action Model Pretraining for Robotic Manipulation with Real-Life Human Activity Videos

Oct 24, 2025

Recycling Pretrained Checkpoints: Orthogonal Growth of Mixture-of-Experts for Efficient Large Language Model Pre-Training

Oct 09, 2025

Incorporating Pre-trained Diffusion Models in Solving the Schrödinger Bridge Problem

Aug 25, 2025

Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis

Jul 31, 2025

Phi-Ground Tech Report: Advancing Perception in GUI Grounding

Jul 31, 2025

ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation

Feb 25, 2025

Diffusion Models without Classifier-free Guidance

Feb 17, 2025

Optimizing Large Language Model Training Using FP4 Quantization

Jan 28, 2025