Sifei Liu

NVILA: Efficient Frontier Visual Language Models

Dec 05, 2024

NaVILA: Legged Robot Vision-Language-Action Model for Navigation

Dec 05, 2024

No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images

Oct 31, 2024

SSE: Multimodal Semantic Data Selection and Enrichment for Industrial-scale Data Assimilation

Sep 20, 2024

GroPrompt: Efficient Grounded Prompting and Adaptation for Referring Video Object Segmentation

Jun 18, 2024

CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation

Jun 04, 2024

SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model

Jun 03, 2024

Compositional Text-to-Image Generation with Dense Blob Representations

May 14, 2024

HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data

Mar 18, 2024

RegionGPT: Towards Region Understanding Vision Language Model

Mar 04, 2024