Shijia Huang

Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding

Nov 30, 2024
Via arXiv

Enhancing Temporal Modeling of Video LLMs via Time Gating

Oct 08, 2024
Via arXiv

Towards Learning a Generalist Model for Embodied Navigation

Dec 06, 2023
Via arXiv

LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models

Dec 05, 2023
Via arXiv

CLEVA: Chinese Language Models EVAluation Platform

Aug 09, 2023
Via arXiv

MP-Former: Mask-Piloted Transformer for Image Segmentation

Mar 15, 2023
Via arXiv

DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding

Nov 30, 2022
Via arXiv

A Unified Mutual Supervision Framework for Referring Expression Segmentation and Generation

Nov 15, 2022
Via arXiv

DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors

Apr 09, 2022
Via arXiv

Multi-View Transformer for 3D Visual Grounding

Apr 05, 2022
Via arXiv