https://github.com/lrm20011/WTRP_Searcher.
Zero-Shot Object Navigation (ZSON) requires agents to navigate to objects specified via open-ended natural language without predefined categories or prior environmental knowledge. While recent methods leverage foundation models or multi-modal maps, they often rely on 2D representations and greedy strategies or require additional training or modules with high computation load, limiting performance in complex environments and real applications. We propose WTRP-Searcher, a novel framework that formulates ZSON as a Weighted Traveling Repairman Problem (WTRP), minimizing the weighted waiting time of viewpoints. Using a Vision-Language Model (VLM), we score viewpoints based on object-description similarity, projected onto a 2D map with depth information. An open-vocabulary detector identifies targets, dynamically updating goals, while a 3D embedding feature map enhances spatial awareness and environmental recall. WTRP-Searcher outperforms existing methods, offering efficient global planning and improved performance in complex ZSON tasks. Code and more demos will be avaliable on