Chengzhuo Tong

Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks

Feb 02, 2026

CoF-T2I: Video Models as Pure Visual Reasoners for Text-to-Image Generation

Jan 15, 2026

Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling

Dec 14, 2025

Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO

May 22, 2025

Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step

Jan 23, 2025

SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners

Aug 29, 2024

MAVIS: Mathematical Visual Instruction Tuning

Jul 11, 2024