Si Liu

Image Understanding Makes for A Good Tokenizer for Image Generation

Nov 07, 2024

Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology

Oct 10, 2024

MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More

Oct 08, 2024

FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction

Sep 26, 2024

Dynamic Prompting of Frozen Text-to-Image Diffusion Models for Panoptic Narrative Grounding

Sep 12, 2024

Knowledge Distillation via Query Selection for Detection Transformer

Sep 10, 2024

LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation

Aug 28, 2024

Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation

Aug 28, 2024

MV2DFusion: Leveraging Modality-Specific Object Semantics for Multi-Modal 3D Detection

Aug 12, 2024

LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction

Jul 16, 2024