Peng Wang

SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild

Jan 07, 2025
Via arXiv

Understanding How Nonlinear Layers Create Linearly Separable Features for Low-Dimensional Data

Jan 04, 2025
Via arXiv

Dual Diffusion for Unified Image Generation and Understanding

Dec 31, 2024
Via arXiv

TL-Training: A Task-Feature-Based Framework for Training Large Language Models in Tool Use

Dec 20, 2024
Via arXiv

Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning

Dec 10, 2024
Via arXiv

Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models

Dec 09, 2024
Via arXiv

Building a Family of Data Augmentation Models for Low-cost LLM Fine-tuning on the Cloud

Dec 06, 2024
Via arXiv

Sustainable Self-evolution Adversarial Training

Dec 03, 2024
Via arXiv

CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy

Dec 03, 2024
Via arXiv

GausSurf: Geometry-Guided 3D Gaussian Splatting for Surface Reconstruction

Dec 02, 2024
Via arXiv