Picture for Chenyu Yang

Chenyu Yang

SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor

Add code
Dec 18, 2024
Figure 1 for SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor
Figure 2 for SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor
Figure 3 for SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor
Figure 4 for SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor
Viaarxiv icon

PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models

Add code
Dec 12, 2024
Figure 1 for PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Figure 2 for PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Figure 3 for PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Figure 4 for PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Viaarxiv icon

VQ-ACE: Efficient Policy Search for Dexterous Robotic Manipulation via Action Chunking Embedding

Add code
Nov 05, 2024
Figure 1 for VQ-ACE: Efficient Policy Search for Dexterous Robotic Manipulation via Action Chunking Embedding
Figure 2 for VQ-ACE: Efficient Policy Search for Dexterous Robotic Manipulation via Action Chunking Embedding
Figure 3 for VQ-ACE: Efficient Policy Search for Dexterous Robotic Manipulation via Action Chunking Embedding
Figure 4 for VQ-ACE: Efficient Policy Search for Dexterous Robotic Manipulation via Action Chunking Embedding
Viaarxiv icon

Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning

Add code
Jun 11, 2024
Figure 1 for Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
Figure 2 for Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
Figure 3 for Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
Figure 4 for Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
Viaarxiv icon

CRAG -- Comprehensive RAG Benchmark

Add code
Jun 07, 2024
Figure 1 for CRAG -- Comprehensive RAG Benchmark
Figure 2 for CRAG -- Comprehensive RAG Benchmark
Figure 3 for CRAG -- Comprehensive RAG Benchmark
Figure 4 for CRAG -- Comprehensive RAG Benchmark
Viaarxiv icon

VerifAI: Verified Generative AI

Add code
Jul 06, 2023
Viaarxiv icon

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

Add code
Jun 01, 2023
Viaarxiv icon

Seeing Through the Grass: Semantic Pointcloud Filter for Support Surface Learning

Add code
May 13, 2023
Viaarxiv icon

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

Add code
Nov 18, 2022
Viaarxiv icon

EleGANt: Exquisite and Locally Editable GAN for Makeup Transfer

Add code
Jul 20, 2022
Figure 1 for EleGANt: Exquisite and Locally Editable GAN for Makeup Transfer
Figure 2 for EleGANt: Exquisite and Locally Editable GAN for Makeup Transfer
Figure 3 for EleGANt: Exquisite and Locally Editable GAN for Makeup Transfer
Figure 4 for EleGANt: Exquisite and Locally Editable GAN for Makeup Transfer
Viaarxiv icon