Dingkang Liang

A Unified Image-Dense Annotation Generation Model for Underwater Scenes

Mar 27, 2025

ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation

Mar 25, 2025

Seeing the Future, Perceiving the Future: A Unified Driving World Model for Future Generation and Perception

Mar 17, 2025

HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation

Jan 24, 2025

MINIMA: Modality Invariant Image Matching

Dec 27, 2024

Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning

Oct 10, 2024

Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression

Sep 01, 2024

Mini-Monkey: Multi-Scale Adaptive Cropping for Multimodal Large Language Models

Aug 09, 2024

Mini-Monkey: Alleviate the Sawtooth Effect by Multi-Scale Adaptive Cropping

Aug 04, 2024

A Unified Framework for 3D Scene Understanding

Jul 03, 2024