
Ziyong Feng

MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space

Mar 19, 2025
Via arXiv

Mocap-2-to-3: Lifting 2D Diffusion-Based Pretrained Models for 3D Motion Capture

Mar 05, 2025
Via arXiv

RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm

Feb 18, 2025
Via arXiv

ORID: Organ-Regional Information Driven Framework for Radiology Report Generation

Nov 20, 2024
Via arXiv

Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension

Oct 18, 2024
Via arXiv

CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination

Aug 18, 2024
Via arXiv

VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling

Aug 02, 2024
Via arXiv

Multi-label Cluster Discrimination for Visual Representation Learning

Jul 24, 2024
Via arXiv

High-Fidelity Facial Albedo Estimation via Texture Quantization

Jun 19, 2024
Via arXiv

RWKV-CLIP: A Robust Vision-Language Representation Learner

Jun 11, 2024
Via arXiv