Jiayi Ji

MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning

Mar 26, 2025
Via arXiv

QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension

Mar 11, 2025
Via arXiv

IPDN: Image-enhanced Prompt Decoding Network for 3D Referring Expression Segmentation

Jan 09, 2025
Via arXiv

RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation

Dec 03, 2024
Via arXiv

Mixed Degradation Image Restoration via Local Dynamic Optimization and Conditional Embedding

Nov 25, 2024
Via arXiv

Any-to-3D Generation via Hybrid Diffusion Supervision

Nov 22, 2024
Via arXiv

Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension

Nov 20, 2024
Via arXiv

Synergistic Dual Spatial-aware Generation of Image-to-Text and Text-to-Image

Oct 20, 2024
Via arXiv

γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models

Oct 17, 2024
Via arXiv

I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing

Aug 26, 2024
Via arXiv