Hongbin Zhou

Fine-grained Preference Optimization Improves Zero-shot Text-to-Speech

Feb 05, 2025

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training

Dec 16, 2024

StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching

Dec 10, 2024

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations

Dec 10, 2024

Chimera: Improving Generalist Model with Domain-Specific Experts

Dec 08, 2024

ZOPP: A Framework of Zero-shot Offboard Panoptic Perception for Autonomous Driving

Nov 08, 2024

CTEFM-VC: Zero-Shot Voice Conversion Based on Content-Aware Timbre Ensemble Modeling and Flow Matching

Nov 04, 2024

Takin-ADA: Emotion Controllable Audio-Driven Animation with Canonical and Landmark Loss Optimization

Oct 18, 2024

Takin-VC: Zero-shot Voice Conversion via Jointly Hybrid Content and Memory-Augmented Context-Aware Timbre Modeling

Oct 02, 2024

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models

Jun 17, 2024