Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yunfan Gao

CitySeeker: How Do VLMS Explore Embodied Urban Navigation With Implicit Human Needs?

Dec 18, 2025

Siqi Wang, Chao Liang, Yunfan Gao, Erxin Yu, Sen Li, Yushi Li, Jing Li, Haofen Wang

Abstract:Vision-Language Models (VLMs) have made significant progress in explicit instruction-based navigation; however, their ability to interpret implicit human needs (e.g., "I am thirsty") in dynamic urban environments remains underexplored. This paper introduces CitySeeker, a novel benchmark designed to assess VLMs' spatial reasoning and decision-making capabilities for exploring embodied urban navigation to address implicit needs. CitySeeker includes 6,440 trajectories across 8 cities, capturing diverse visual characteristics and implicit needs in 7 goal-driven scenarios. Extensive experiments reveal that even top-performing models (e.g., Qwen2.5-VL-32B-Instruct) achieve only 21.1% task completion. We find key bottlenecks in error accumulation in long-horizon reasoning, inadequate spatial cognition, and deficient experiential recall. To further analyze them, we investigate a series of exploratory strategies-Backtracking Mechanisms, Enriching Spatial Cognition, and Memory-Based Retrieval (BCR), inspired by human cognitive mapping's emphasis on iterative observation-reasoning cycles and adaptive path optimization. Our analysis provides actionable insights for developing VLMs with robust spatial intelligence required for tackling "last-mile" navigation challenges.

Via

Access Paper or Ask Questions

Semi-Infinite Programming for Collision-Avoidance in Optimal and Model Predictive Control

Aug 17, 2025

Yunfan Gao, Florian Messerer, Niels van Duijkeren, Rashmi Dabir, Moritz Diehl

Figure 1 for Semi-Infinite Programming for Collision-Avoidance in Optimal and Model Predictive Control

Figure 2 for Semi-Infinite Programming for Collision-Avoidance in Optimal and Model Predictive Control

Figure 3 for Semi-Infinite Programming for Collision-Avoidance in Optimal and Model Predictive Control

Figure 4 for Semi-Infinite Programming for Collision-Avoidance in Optimal and Model Predictive Control

Abstract:This paper presents a novel approach for collision avoidance in optimal and model predictive control, in which the environment is represented by a large number of points and the robot as a union of padded polygons. The conditions that none of the points shall collide with the robot can be written in terms of an infinite number of constraints per obstacle point. We show that the resulting semi-infinite programming (SIP) optimal control problem (OCP) can be efficiently tackled through a combination of two methods: local reduction and an external active-set method. Specifically, this involves iteratively identifying the closest point obstacles, determining the lower-level distance minimizer among all feasible robot shape parameters, and solving the upper-level finitely-constrained subproblems. In addition, this paper addresses robust collision avoidance in the presence of ellipsoidal state uncertainties. Enforcing constraint satisfaction over all possible uncertainty realizations extends the dimension of constraint infiniteness. The infinitely many constraints arising from translational uncertainty are handled by local reduction together with the robot shape parameterization, while rotational uncertainty is addressed via a backoff reformulation. A controller implemented based on the proposed method is demonstrated on a real-world robot running at 20Hz, enabling fast and collision-free navigation in tight spaces. An application to 3D collision avoidance is also demonstrated in simulation.

* 21 pages, 15 figures

Via

Access Paper or Ask Questions

Synergizing RAG and Reasoning: A Systematic Review

Apr 22, 2025

Yunfan Gao, Yun Xiong, Yijie Zhong, Yuxi Bi, Ming Xue, Haofen Wang

Figure 1 for Synergizing RAG and Reasoning: A Systematic Review

Figure 2 for Synergizing RAG and Reasoning: A Systematic Review

Figure 3 for Synergizing RAG and Reasoning: A Systematic Review

Figure 4 for Synergizing RAG and Reasoning: A Systematic Review

Abstract:Recent breakthroughs in large language models (LLMs), particularly in reasoning capabilities, have propelled Retrieval-Augmented Generation (RAG) to unprecedented levels. By synergizing retrieval mechanisms with advanced reasoning, LLMs can now tackle increasingly complex problems. This paper presents a systematic review of the collaborative interplay between RAG and reasoning, clearly defining "reasoning" within the RAG context. It construct a comprehensive taxonomy encompassing multi-dimensional collaborative objectives, representative paradigms, and technical implementations, and analyze the bidirectional synergy methods. Additionally, we critically evaluate current limitations in RAG assessment, including the absence of intermediate supervision for multi-step reasoning and practical challenges related to cost-risk trade-offs. To bridge theory and practice, we provide practical guidelines tailored to diverse real-world applications. Finally, we identify promising research directions, such as graph-based knowledge integration, hybrid model collaboration, and RL-driven optimization. Overall, this work presents a theoretical framework and practical foundation to advance RAG systems in academia and industry, fostering the next generation of RAG solutions.

Via

Access Paper or Ask Questions

StePO-Rec: Towards Personalized Outfit Styling Assistant via Knowledge-Guided Multi-Step Reasoning

Apr 14, 2025

Yuxi Bi, Yunfan Gao, Haofen Wang

Figure 1 for StePO-Rec: Towards Personalized Outfit Styling Assistant via Knowledge-Guided Multi-Step Reasoning

Figure 2 for StePO-Rec: Towards Personalized Outfit Styling Assistant via Knowledge-Guided Multi-Step Reasoning

Figure 3 for StePO-Rec: Towards Personalized Outfit Styling Assistant via Knowledge-Guided Multi-Step Reasoning

Figure 4 for StePO-Rec: Towards Personalized Outfit Styling Assistant via Knowledge-Guided Multi-Step Reasoning

Abstract:Advancements in Generative AI offers new opportunities for FashionAI, surpassing traditional recommendation systems that often lack transparency and struggle to integrate expert knowledge, leaving the potential for personalized fashion styling remain untapped. To address these challenges, we present PAFA (Principle-Aware Fashion), a multi-granular knowledge base that organizes professional styling expertise into three levels of metadata, domain principles, and semantic relationships. Using PAFA, we develop StePO-Rec, a knowledge-guided method for multi-step outfit recommendation. StePO-Rec provides structured suggestions using a scenario-dimension-attribute framework, employing recursive tree construction to align recommendations with both professional principles and individual preferences. A preference-trend re-ranking system further adapts to fashion trends while maintaining the consistency of the user's original style. Experiments on the widely used personalized outfit dataset IQON show a 28% increase in Recall@1 and 32.8% in MAP. Furthermore, case studies highlight improved explainability, traceability, result reliability, and the seamless integration of expertise and personalization.

Via

Access Paper or Ask Questions

Decoding Urban Industrial Complexity: Enhancing Knowledge-Driven Insights via IndustryScopeGPT

Nov 24, 2024

Siqi Wang, Chao Liang, Yunfan Gao, Yang Liu, Jing Li, Haofen Wang

Figure 1 for Decoding Urban Industrial Complexity: Enhancing Knowledge-Driven Insights via IndustryScopeGPT

Figure 2 for Decoding Urban Industrial Complexity: Enhancing Knowledge-Driven Insights via IndustryScopeGPT

Figure 3 for Decoding Urban Industrial Complexity: Enhancing Knowledge-Driven Insights via IndustryScopeGPT

Figure 4 for Decoding Urban Industrial Complexity: Enhancing Knowledge-Driven Insights via IndustryScopeGPT

Abstract:Industrial parks are critical to urban economic growth. Yet, their development often encounters challenges stemming from imbalances between industrial requirements and urban services, underscoring the need for strategic planning and operations. This paper introduces IndustryScopeKG, a pioneering large-scale multi-modal, multi-level industrial park knowledge graph, which integrates diverse urban data including street views, corporate, socio-economic, and geospatial information, capturing the complex relationships and semantics within industrial parks. Alongside this, we present the IndustryScopeGPT framework, which leverages Large Language Models (LLMs) with Monte Carlo Tree Search to enhance tool-augmented reasoning and decision-making in Industrial Park Planning and Operation (IPPO). Our work significantly improves site recommendation and functional planning, demonstrating the potential of combining LLMs with structured datasets to advance industrial park management. This approach sets a new benchmark for intelligent IPPO research and lays a robust foundation for advancing urban industrial development. The dataset and related code are available at https://github.com/Tongji-KGLLM/IndustryScope.

* In Proceedings of the 32nd ACM International Conference on Multimedia, pp. 4757-4765 (2024, October)
* 9 pages, 6 figures, the 32nd ACM International Conference on Multimedia

Via

Access Paper or Ask Questions

Real-Time-Feasible Collision-Free Motion Planning For Ellipsoidal Objects

Sep 18, 2024

Yunfan Gao, Florian Messerer, Niels van Duijkeren, Boris Houska, Moritz Diehl

Figure 1 for Real-Time-Feasible Collision-Free Motion Planning For Ellipsoidal Objects

Figure 2 for Real-Time-Feasible Collision-Free Motion Planning For Ellipsoidal Objects

Figure 3 for Real-Time-Feasible Collision-Free Motion Planning For Ellipsoidal Objects

Figure 4 for Real-Time-Feasible Collision-Free Motion Planning For Ellipsoidal Objects

Abstract:Online planning of collision-free trajectories is a fundamental task for robotics and self-driving car applications. This paper revisits collision avoidance between ellipsoidal objects using differentiable constraints. Two ellipsoids do not overlap if and only if the endpoint of the vector between the center points of the ellipsoids does not lie in the interior of the Minkowski sum of the ellipsoids. This condition is formulated using a parametric over-approximation of the Minkowski sum, which can be made tight in any given direction. The resulting collision avoidance constraint is included in an optimal control problem (OCP) and evaluated in comparison to the separating-hyperplane approach. Not only do we observe that the Minkowski-sum formulation is computationally more efficient in our experiments, but also that using pre-determined over-approximation parameters based on warm-start trajectories leads to a very limited increase in suboptimality. This gives rise to a novel real-time scheme for collision-free motion planning with model predictive control (MPC). Both the real-time feasibility and the effectiveness of the constraint formulation are demonstrated in challenging real-world experiments.

Via

Access Paper or Ask Questions

Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks

Jul 26, 2024

Yunfan Gao, Yun Xiong, Meng Wang, Haofen Wang

Figure 1 for Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks

Figure 2 for Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks

Figure 3 for Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks

Figure 4 for Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks

Abstract:Retrieval-augmented Generation (RAG) has markedly enhanced the capabilities of Large Language Models (LLMs) in tackling knowledge-intensive tasks. The increasing demands of application scenarios have driven the evolution of RAG, leading to the integration of advanced retrievers, LLMs and other complementary technologies, which in turn has amplified the intricacy of RAG systems. However, the rapid advancements are outpacing the foundational RAG paradigm, with many methods struggling to be unified under the process of "retrieve-then-generate". In this context, this paper examines the limitations of the existing RAG paradigm and introduces the modular RAG framework. By decomposing complex RAG systems into independent modules and specialized operators, it facilitates a highly reconfigurable framework. Modular RAG transcends the traditional linear architecture, embracing a more advanced design that integrates routing, scheduling, and fusion mechanisms. Drawing on extensive research, this paper further identifies prevalent RAG patterns-linear, conditional, branching, and looping-and offers a comprehensive analysis of their respective implementation nuances. Modular RAG presents innovative opportunities for the conceptualization and deployment of RAG systems. Finally, the paper explores the potential emergence of new operators and paradigms, establishing a solid theoretical foundation and a practical roadmap for the continued evolution and practical deployment of RAG technologies.

Via

Access Paper or Ask Questions

Stochastic Model Predictive Control with Optimal Linear Feedback for Mobile Robots in Dynamic Environments

Jul 19, 2024

Yunfan Gao, Florian Messerer, Niels van Duijkeren, Moritz Diehl

Abstract:Robot navigation around humans can be a challenging problem since human movements are hard to predict. Stochastic model predictive control (MPC) can account for such uncertainties and approximately bound the probability of a collision to take place. In this paper, to counteract the rapidly growing human motion uncertainty over time, we incorporate state feedback in the stochastic MPC. This allows the robot to more closely track reference trajectories. To this end the feedback policy is left as a degree of freedom in the optimal control problem. The stochastic MPC with feedback is validated in simulation experiments and is compared against nominal MPC and stochastic MPC without feedback. The added computation time can be limited by reducing the number of additional variables for the feedback law with a small compromise in control performance.

Via

Access Paper or Ask Questions

Retrieval-Augmented Generation for Large Language Models: A Survey

Jan 03, 2024

Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Qianyu Guo, Meng Wang(+1 more)

Figure 1 for Retrieval-Augmented Generation for Large Language Models: A Survey

Figure 2 for Retrieval-Augmented Generation for Large Language Models: A Survey

Figure 3 for Retrieval-Augmented Generation for Large Language Models: A Survey

Figure 4 for Retrieval-Augmented Generation for Large Language Models: A Survey

Abstract:Large Language Models (LLMs) demonstrate significant capabilities but face challenges such as hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This enhances the accuracy and credibility of the models, particularly for knowledge-intensive tasks, and allows for continuous knowledge updates and integration of domain-specific information. RAG synergistically merges LLMs' intrinsic knowledge with the vast, dynamic repositories of external databases. This comprehensive review paper offers a detailed examination of the progression of RAG paradigms, encompassing the Naive RAG, the Advanced RAG, and the Modular RAG. It meticulously scrutinizes the tripartite foundation of RAG frameworks, which includes the retrieval , the generation and the augmentation techniques. The paper highlights the state-of-the-art technologies embedded in each of these critical components, providing a profound understanding of the advancements in RAG systems. Furthermore, this paper introduces the metrics and benchmarks for assessing RAG models, along with the most up-to-date evaluation framework. In conclusion, the paper delineates prospective avenues for research, including the identification of challenges, the expansion of multi-modalities, and the progression of the RAG infrastructure and its ecosystem.

* Ongoing Work

Via

Access Paper or Ask Questions

Chat-REC: Towards Interactive and Explainable LLMs-Augmented Recommender System

Apr 04, 2023

Yunfan Gao, Tao Sheng, Youlin Xiang, Yun Xiong, Haofen Wang, Jiawei Zhang

Abstract:Large language models (LLMs) have demonstrated their significant potential to be applied for addressing various application tasks. However, traditional recommender systems continue to face great challenges such as poor interactivity and explainability, which actually also hinder their broad deployment in real-world systems. To address these limitations, this paper proposes a novel paradigm called Chat-Rec (ChatGPT Augmented Recommender System) that innovatively augments LLMs for building conversational recommender systems by converting user profiles and historical interactions into prompts. Chat-Rec is demonstrated to be effective in learning user preferences and establishing connections between users and products through in-context learning, which also makes the recommendation process more interactive and explainable. What's more, within the Chat-Rec framework, user's preferences can transfer to different products for cross-domain recommendations, and prompt-based injection of information into LLMs can also handle the cold-start scenarios with new items. In our experiments, Chat-Rec effectively improve the results of top-k recommendations and performs better in zero-shot rating prediction task. Chat-Rec offers a novel approach to improving recommender systems and presents new practical scenarios for the implementation of AIGC (AI generated content) in recommender system studies.

Via

Access Paper or Ask Questions