Yanxin Long

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
May 14, 2024

DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation
Mar 13, 2024

Towards Deviation-Robust Agent Navigation via Perturbation-Aware Contrastive Learning
Mar 09, 2024

CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
Mar 15, 2023

NLIP: Noise-robust Language-Image Pre-training
Jan 04, 2023

P³OVD: Fine-grained Visual-Text Prompt-Driven Self-Training for Open-Vocabulary Object Detection
Nov 02, 2022

Adversarial Reinforced Instruction Attacker for Robust Vision-Language Navigation
Jul 23, 2021