Zhi-Qi Cheng

HA-VLN: A Benchmark for Human-Aware Navigation in Discrete-Continuous Environments with Dynamic Multi-Human Interactions, Real-World Validation, and an Open Leaderboard

Mar 18, 2025
Via arXiv

MaxSup: Overcoming Representation Collapse in Label Smoothing

Feb 18, 2025
Via arXiv

A Video-grounded Dialogue Dataset and Metric for Event-driven Activities

Jan 30, 2025
Via arXiv

UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval

Dec 14, 2024
Via arXiv

StableAnimator: High-Quality Identity-Preserving Human Image Animation

Nov 26, 2024
Via arXiv

ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding

Oct 29, 2024
Via arXiv

Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios

Oct 22, 2024
Via arXiv

POPoS: Improving Efficient and Robust Facial Landmark Detection with Parallel Optimal Position Search

Oct 15, 2024
Via arXiv

DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing

Sep 02, 2024
Via arXiv

FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing

Aug 22, 2024
Via arXiv