Picture for Chong Luo

Chong Luo

HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models

Add code
Mar 14, 2025
Viaarxiv icon

Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

Add code
Mar 03, 2025
Viaarxiv icon

Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

Add code
Feb 20, 2025
Viaarxiv icon

FloVD: Optical Flow Meets Video Diffusion Model for Enhanced Camera-Controlled Video Synthesis

Add code
Feb 12, 2025
Viaarxiv icon

MageBench: Bridging Large Multimodal Models to Agents

Add code
Dec 05, 2024
Viaarxiv icon

StableAnimator: High-Quality Identity-Preserving Human Image Animation

Add code
Nov 26, 2024
Viaarxiv icon

LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation

Add code
Nov 26, 2024
Figure 1 for LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation
Figure 2 for LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation
Figure 3 for LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation
Figure 4 for LLM2CLIP: Powerful Language Model Unlocks Richer Visual Representation
Viaarxiv icon

REDUCIO! Generating 1024$\times$1024 Video within 16 Seconds using Extremely Compressed Motion Latents

Add code
Nov 20, 2024
Viaarxiv icon

LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation

Add code
Nov 07, 2024
Figure 1 for LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation
Figure 2 for LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation
Figure 3 for LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation
Figure 4 for LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation
Viaarxiv icon

PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting

Add code
Oct 29, 2024
Viaarxiv icon