Cheng Da

MaskAlign: Token-Subset Representation Alignment for Efficient Diffusion Training

Jun 07, 2026
Via arXiv

Steering Visual Generation in Unified Multimodal Models with Understanding Supervision

May 07, 2026
Via arXiv

ResTok: Learning Hierarchical Residuals in 1D Visual Tokenizers for Autoregressive Image Generation

Jan 07, 2026
Via arXiv

Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization

Feb 03, 2025
Via arXiv

Vision Grid Transformer for Document Layout Analysis

Aug 29, 2023
Via arXiv

LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition

Aug 24, 2023
Via arXiv

Multi-Granularity Prediction with Learnable Fusion for Scene Text Recognition

Jul 25, 2023
Via arXiv

Levenshtein OCR

Sep 08, 2022
Via arXiv

Multi-Granularity Prediction for Scene Text Recognition

Sep 08, 2022
Via arXiv

Fashion Focus: Multi-modal Retrieval System for Video Commodity Localization in E-commerce

Feb 09, 2021
Via arXiv