Teng Wang

ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination

Oct 13, 2024 (via arXiv)

Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models

Oct 10, 2024 (via arXiv)

Large Language Models are Good Multi-lingual Learners: When LLMs Meet Cross-lingual Prompts

Sep 17, 2024 (via arXiv)

Leveraging Large Language Models for Solving Rare MIP Challenges

Sep 03, 2024 (via arXiv)

Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models

Jul 16, 2024 (via arXiv)

LVLM-empowered Multi-modal Representation Learning for Visual Place Recognition

Jul 09, 2024 (via arXiv)

Window-to-Window BEV Representation Learning for Limited FoV Cross-View Geo-localization

Jul 09, 2024 (via arXiv)

Two in One Go: Single-stage Emotion Recognition with Decoupled Subject-context Transformer

Apr 29, 2024 (via arXiv)

UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization

Apr 04, 2024 (via arXiv)

Video Understanding with Large Language Models: A Survey

Jan 04, 2024 (via arXiv)