Jiale Cao

VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos

Nov 07, 2024
Via arXiv

DB-SAM: Delving into High Quality Universal Medical Image Segmentation

Oct 05, 2024
Via arXiv

iSeg: An Iterative Refinement-based Framework for Training-free Segmentation

Sep 05, 2024
Via arXiv

Parameter-Efficient Fine-Tuning for Continual Learning: A Neural Tangent Kernel Perspective

Jul 24, 2024
Via arXiv

Multi-Granularity Language-Guided Multi-Object Tracking

Jun 07, 2024
Via arXiv

VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection

Apr 15, 2024
Via arXiv

Implicit and Explicit Language Guidance for Diffusion-based Visual Perception

Apr 11, 2024
Via arXiv

SGD: Street View Synthesis with Gaussian Splatting and Diffusion Prior

Mar 29, 2024
Via arXiv

CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation

Mar 19, 2024
Via arXiv

SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation

Nov 27, 2023
Via arXiv