
Zongyuan Ge

ScalingNoise: Scaling Inference-Time Search for Generating Infinite Videos

Mar 20, 2025

Derm1M: A Million-scale Vision-Language Dataset Aligned with Clinical Ontology Knowledge for Dermatology

Mar 19, 2025

MSWAL: 3D Multi-class Segmentation of Whole Abdominal Lesions Dataset

Mar 17, 2025

MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models

Mar 11, 2025

Robust Multimodal Learning for Ophthalmic Disease Grading via Disentangled Representation

Mar 07, 2025

Interpretable Few-Shot Retinal Disease Diagnosis with Concept-Guided Prompting of Vision-Language Models

Mar 04, 2025

Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding

Mar 04, 2025

MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation

Feb 17, 2025

CodeBrain: Impute Any Brain MRI via Instance-specific Scalar-quantized Codes

Jan 30, 2025

Toward Modality Gap: Vision Prototype Learning for Weakly-supervised Semantic Segmentation with CLIP

Dec 27, 2024