Qibin Hou

AR-1-to-3: Single Image to Consistent 3D Object Generation via Next-View Prediction

Mar 17, 2025

K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs

Feb 25, 2025

Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT

Feb 10, 2025

LLaVA-Octopus: Unlocking Instruction-Driven Adaptive Projector Fusion for Video Understanding

Jan 09, 2025

Strip R-CNN: Large Strip Convolution for Remote Sensing Object Detection

Jan 08, 2025

SM3Det: A Unified Model for Multi-Modal Remote Sensing Object Detection

Dec 30, 2024

TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction

Dec 22, 2024

MaskCLIP++: A Mask-Based CLIP Fine-tuning Framework for Open-Vocabulary Image Segmentation

Dec 16, 2024

DenseVLM: A Retrieval and Decoupled Alignment Framework for Open-Vocabulary Dense Prediction

Dec 09, 2024

ClearSR: Latent Low-Resolution Image Embeddings Help Diffusion-Based Real-World Super Resolution Models See Clearer

Oct 18, 2024