Yuchen Lu

Hierarchical Image Matching for UAV Absolute Visual Localization via Semantic and Structural Constraints

Jun 11, 2025

Xiangkai Zhang, Xiang Zhou, Mao Chen, Yuchen Lu, Xu Yang, Zhiyong Liu

Abstract: Absolute localization, which aims to determine an agent's location with respect to a global reference, is crucial for unmanned aerial vehicles (UAVs) in many applications, but it becomes challenging when global navigation satellite system (GNSS) signals are unavailable. Vision-based absolute localization methods, which locate the UAV's current view in a reference satellite map to estimate its position, have become popular in GNSS-denied scenarios. However, existing methods mostly rely on traditional, low-level image matching, which struggles with the large appearance differences caused by cross-source discrepancies and temporal variations. To overcome these limitations, we introduce a hierarchical cross-source image matching method for UAV absolute localization that integrates a semantic-aware, structure-constrained coarse matching module with a lightweight fine-grained matching module. Specifically, in the coarse matching module, semantic features derived from a vision foundation model first establish region-level correspondences under semantic and structural constraints. The fine-grained matching module then extracts fine features and establishes pixel-level correspondences. Building on this, we construct a UAV absolute visual localization pipeline that does not rely on any relative localization technique, mainly by placing an image retrieval module before the proposed hierarchical image matching modules. Experimental evaluations on public benchmark datasets and a newly introduced CS-UAV dataset demonstrate the superior accuracy and robustness of the proposed method under various challenging conditions, confirming its effectiveness.

* 8 pages, 6 figures
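
The abstract does not say how region-level correspondences are selected. A common generic stand-in is mutual nearest-neighbour (MNN) matching on descriptor similarity, sketched below in Python under the assumption that every region of the UAV view and of the satellite tile is already summarized by one feature vector (for example, pooled foundation-model features). The threshold `min_sim` and the function names are hypothetical, and the sketch does not implement the paper's semantic and structural constraints.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mutual_nn_matches(uav_desc, sat_desc, min_sim=0.5):
    """Return (i, j) pairs where UAV region i and satellite region j are each
    other's best cosine match and their similarity is at least min_sim.
    min_sim is a hypothetical threshold, not a value from the paper."""
    sim = [[cosine(u, s) for s in sat_desc] for u in uav_desc]
    # Best satellite region for every UAV region, and best UAV region for every satellite region.
    best_sat = [max(range(len(sat_desc)), key=lambda j: row[j]) for row in sim]
    best_uav = [max(range(len(uav_desc)), key=lambda i: sim[i][j]) for j in range(len(sat_desc))]
    return [(i, j) for i, j in enumerate(best_sat) if best_uav[j] == i and sim[i][j] >= min_sim]


uav = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.0]]
sat = [[0.0, 0.9, 0.1], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
print(mutual_nn_matches(uav, sat))  # [(0, 1), (1, 0)]
```

Requiring the match to be mutual discards one-sided matches (satellite region 2 above is nobody's best match), which is a cheap way to suppress ambiguous correspondences before any finer, pixel-level stage.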

Estimating quantum relative entropies on quantum computers

Jan 13, 2025

Yuchen Lu, Kun Fang

Abstract: Quantum relative entropy, a quantum generalization of the well-known Kullback-Leibler divergence, is a fundamental measure of the distinguishability between quantum states and plays a pivotal role in quantum information science. Despite its importance, efficiently estimating the quantum relative entropy between two quantum states on quantum computers remains a significant challenge. In this work, we propose the first quantum algorithm for estimating the quantum relative entropy and the Petz Rényi divergence between two unknown quantum states on quantum computers, addressing open problems highlighted in [Phys. Rev. A 109, 032431 (2024)] and [IEEE Trans. Inf. Theory 70, 5653-5680 (2024)]. This is achieved by combining quadrature approximations of relative entropies, the variational representation of quantum f-divergences, and a new technique for parameterizing Hermitian polynomial operators to estimate their traces with quantum states. Notably, the circuit size of our algorithm is at most $2n+1$, where $n$ is the number of qubits in the quantum states, and the algorithm is directly applicable to distributed scenarios in which the quantum states to be compared are hosted on cross-platform quantum computers. We validate our algorithm through numerical simulations, laying the groundwork for its future deployment on quantum hardware devices.

* 24 pages, 10 figures; comments are welcome
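
As a classical reference point, and not the paper's quantum algorithm, the two quantities being estimated can be computed exactly for tiny states by diagonalization: $D(\rho\|\sigma) = \operatorname{Tr}[\rho(\log\rho - \log\sigma)]$ and the Petz Rényi divergence $D_\alpha(\rho\|\sigma) = \frac{1}{\alpha-1}\log\operatorname{Tr}[\rho^\alpha\sigma^{1-\alpha}]$. The sketch assumes real, full-rank 2×2 density matrices so that the matrix logarithm is defined and the eigendecomposition has a closed form. This brute-force route needs the full $2^n \times 2^n$ matrices, which are unavailable for unknown states, and that is the gap a quantum estimator addresses.

```python
import math


def eig_sym2(m):
    """Closed-form eigenpairs of a 2x2 real symmetric matrix [[a, b], [b, c]]."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    mean, rad = (a + c) / 2, math.hypot((a - c) / 2, b)
    vals = [mean + rad, mean - rad]
    if b == 0.0:
        # Already diagonal: pair each eigenvalue with its standard basis vector.
        vecs = [[1.0, 0.0], [0.0, 1.0]] if a >= c else [[0.0, 1.0], [1.0, 0.0]]
    else:
        vecs = []
        for lam in vals:
            x, y = b, lam - a  # solves (a - lam) x + b y = 0
            n = math.hypot(x, y)
            vecs.append([x / n, y / n])
    return vals, vecs


def mat_fn(m, f):
    """Apply a scalar function spectrally: f(M) = sum_i f(lam_i) v_i v_i^T."""
    vals, vecs = eig_sym2(m)
    out = [[0.0, 0.0], [0.0, 0.0]]
    for lam, v in zip(vals, vecs):
        for i in range(2):
            for j in range(2):
                out[i][j] += f(lam) * v[i] * v[j]
    return out


def trace_of_product(x, y):
    """Tr[X Y] for 2x2 matrices."""
    return sum(x[i][k] * y[k][i] for i in range(2) for k in range(2))


def relative_entropy(rho, sigma):
    """Umegaki relative entropy Tr[rho (log rho - log sigma)], in nats."""
    log_rho, log_sigma = mat_fn(rho, math.log), mat_fn(sigma, math.log)
    diff = [[log_rho[i][j] - log_sigma[i][j] for j in range(2)] for i in range(2)]
    return trace_of_product(rho, diff)


def petz_renyi(rho, sigma, alpha):
    """Petz Renyi divergence (1/(alpha-1)) log Tr[rho^alpha sigma^(1-alpha)], alpha != 1."""
    rho_a = mat_fn(rho, lambda x: x ** alpha)
    sigma_b = mat_fn(sigma, lambda x: x ** (1 - alpha))
    return math.log(trace_of_product(rho_a, sigma_b)) / (alpha - 1)


rho = [[0.6, 0.2], [0.2, 0.4]]
sigma = [[0.5, 0.1], [0.1, 0.5]]
print(relative_entropy(rho, sigma))
print(petz_renyi(rho, sigma, 0.999))  # approaches the value above as alpha -> 1
```

For commuting (diagonal) states both functions reduce to their classical counterparts, so the relative entropy becomes exactly the Kullback-Leibler divergence of the eigenvalue distributions.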

Quantum Langevin Dynamics for Optimization

Nov 27, 2023

Zherui Chen, Yuchen Lu, Hao Wang, Yizhou Liu, Tongyang Li

Abstract: We initiate the study of using Quantum Langevin Dynamics (QLD) to solve optimization problems, particularly non-convex objective functions that pose substantial obstacles for traditional gradient descent algorithms. Specifically, we examine the dynamics of a system coupled with an infinite heat bath. This interaction induces both random quantum noise and a deterministic damping effect on the system, which together nudge it towards a steady state that hovers near the global minimum of the objective function. We theoretically prove the convergence of QLD in convex landscapes, demonstrating that the average energy of the system can approach zero in the low-temperature limit with an exponential decay rate correlated with the evolution time. Numerically, we first show the energy dissipation capability of QLD by tracing it back to its origins in spontaneous emission. Furthermore, we conduct a detailed discussion of the impact of each parameter. Finally, based on observations from comparing QLD with the classical Fokker-Planck-Smoluchowski equation, we propose a time-dependent QLD in which the temperature and $\hbar$ are time-dependent parameters; it can be theoretically proven to converge better than the time-independent case and also outperforms a series of state-of-the-art quantum and classical optimization algorithms in many non-convex landscapes.

* 33 pages, 1 table, 26 figures
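
For contrast with the classical side of that comparison, the sketch below simulates overdamped classical Langevin dynamics, the particle picture behind the Fokker-Planck-Smoluchowski equation, with the Euler-Maruyama update $x_{k+1} = x_k - \eta\nabla f(x_k) + \sqrt{2\eta T_k}\,\xi_k$, $\xi_k \sim \mathcal{N}(0,1)$, and a decaying temperature $T_k$. It is a classical analogue only: it does not model quantum noise, $\hbar$, or the system-bath coupling of QLD, and the schedule $T_k = T_0/(1+k)$, the step size, and the function name are hypothetical choices.

```python
import math
import random


def annealed_langevin(grad, x0, steps=5000, eta=0.01, t0=1.0, seed=0):
    """Euler-Maruyama simulation of 1-D classical overdamped Langevin dynamics
    dx = -f'(x) dt + sqrt(2 T(t)) dW with the hypothetical schedule T_k = t0 / (1 + k).
    Returns the final position."""
    rng = random.Random(seed)
    x = x0
    for k in range(steps):
        temperature = t0 / (1 + k)  # time-dependent temperature, cooling towards zero
        noise = rng.gauss(0.0, 1.0)
        x = x - eta * grad(x) + math.sqrt(2 * eta * temperature) * noise
    return x


# Convex example f(x) = x^2: as T_k -> 0 the iterate settles near the minimizer x = 0.
print(annealed_langevin(lambda x: 2 * x, x0=3.0))

# Non-convex double well f(x) = (x^2 - 1)^2 + 0.3 x, started in the shallower right-hand well.
print(annealed_langevin(lambda x: 4 * x * (x * x - 1) + 0.3, x0=1.0))
```

The noise term lets the iterate hop between basins while the temperature is high, and the decay makes the dynamics approach plain gradient descent at the end; whether the global basin is reached depends on how slowly the schedule cools.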

Hyper-Decision Transformer for Efficient Online Policy Adaptation

Apr 17, 2023

Mengdi Xu, Yuchen Lu, Yikang Shen, Shun Zhang, Ding Zhao, Chuang Gan

Abstract: Decision Transformers (DT) have demonstrated strong performance in offline reinforcement learning settings, but quickly adapting to unseen novel tasks remains challenging. To address this challenge, we propose a new framework, called Hyper-Decision Transformer (HDT), that can generalize to novel tasks from a handful of demonstrations in a data- and parameter-efficient manner. To achieve this goal, we augment the base DT with an adaptation module whose parameters are initialized by a hyper-network. When encountering unseen tasks, the hyper-network takes a handful of demonstrations as input and initializes the adaptation module accordingly. This initialization enables HDT to adapt efficiently to novel tasks by fine-tuning only the adaptation module. We validate HDT's generalization capability on object manipulation tasks. We find that with a single expert demonstration and fine-tuning only 0.5% of DT parameters, HDT adapts faster to unseen tasks than fine-tuning the whole DT model. Finally, we explore a more challenging setting where expert actions are not available, and we show that HDT outperforms state-of-the-art baselines in terms of task success rates by a large margin.

* ICLR 2023. Project page: https://sites.google.com/view/hdtforiclr2023/home

Uniform Masking Prevails in Vision-Language Pretraining

Dec 10, 2022

Siddharth Verma, Yuchen Lu, Rui Hou, Hanchao Yu, Nicolas Ballas, Madian Khabsa, Amjad Almahairi

Abstract: Masked Language Modeling (MLM) has proven to be an essential component of Vision-Language (VL) pretraining. To implement MLM, the researcher must make two design choices: the masking strategy, which determines which tokens to mask, and the masking rate, which determines how many tokens to mask. Previous work has focused primarily on the masking strategy while setting the masking rate at a default of 15%. In this paper, we show that increasing this masking rate improves downstream performance while simultaneously reducing the performance gap among different masking strategies, rendering the uniform masking strategy competitive with other, more complex ones. Surprisingly, we also discover that increasing the masking rate leads to gains in Image-Text Matching (ITM) tasks, suggesting that the role of MLM goes beyond language modeling in VL pretraining.
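
A minimal sketch of uniform masking as described above: every token position is equally likely to be masked, and the masking rate alone decides how many positions are replaced. The `[MASK]` string, masking exactly `round(rate * n)` positions, and the function name `uniform_mask` are illustrative assumptions; real VL pretraining code typically works on token IDs and may use other replacement rules.

```python
import random


def uniform_mask(tokens, rate, mask_token="[MASK]", seed=None):
    """Mask round(rate * len(tokens)) positions chosen uniformly at random.

    Returns the masked sequence and the sorted list of masked positions,
    which serve as the MLM prediction targets."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = random.Random(seed)
    n_mask = round(rate * len(tokens))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for p in positions:
        masked[p] = mask_token
    return masked, positions


caption = "a dog catches a red frisbee in the park".split()
for rate in (0.15, 0.45):  # the 15% default vs. an illustrative higher rate
    masked, targets = uniform_mask(caption, rate, seed=0)
    print(rate, targets, " ".join(masked))
```

Under this strategy the only knob is `rate`; more complex strategies would instead change which positions `positions` contains, for example by preferring content words.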

Prompting Decision Transformer for Few-Shot Policy Generalization

Jun 27, 2022

Mengdi Xu, Yikang Shen, Shun Zhang, Yuchen Lu, Ding Zhao, Joshua B. Tenenbaum, Chuang Gan

Abstract: Humans can leverage prior experience and learn novel tasks from a handful of demonstrations. In contrast to offline meta-reinforcement learning, which aims to achieve quick adaptation through better algorithm design, we investigate the effect of architectural inductive bias on few-shot learning capability. We propose a Prompt-based Decision Transformer (Prompt-DT), which leverages the sequence modeling ability of the Transformer architecture and the prompt framework to achieve few-shot adaptation in offline RL. We design the trajectory prompt, which contains segments of the few-shot demonstrations and encodes task-specific information to guide policy generation. Our experiments on five MuJoCo control benchmarks show that Prompt-DT is a strong few-shot learner without any extra fine-tuning on unseen target tasks. Prompt-DT outperforms its variants and strong meta offline RL baselines by a large margin with a trajectory prompt containing only a few timesteps. Prompt-DT is also robust to changes in prompt length and can generalize to out-of-distribution (OOD) environments.

* ICML 2022. Project page: https://mxu34.github.io/PromptDT/
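
A sketch of how a trajectory prompt could be assembled from few-shot demonstrations, following the Decision Transformer convention of representing each timestep as a (return-to-go, state, action) triple. The segment length `k`, the number of segments, uniform sampling of segment start points, and the dictionary layout of a demonstration are assumptions for illustration; the resulting prompt is prepended to the agent's recent history before both are fed to the Transformer.

```python
import random


def returns_to_go(rewards):
    """rtg[t] = sum of rewards from step t to the end of the episode."""
    rtg, running = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg


def build_trajectory_prompt(demos, k, num_segments, seed=None):
    """Sample num_segments length-k segments from few-shot demonstrations
    and flatten them into a list of (rtg, state, action) triples."""
    rng = random.Random(seed)
    prompt = []
    for _ in range(num_segments):
        demo = rng.choice(demos)
        rtg = returns_to_go(demo["rewards"])
        start = rng.randrange(len(demo["rewards"]) - k + 1)  # segment fits inside the demo
        for t in range(start, start + k):
            prompt.append((rtg[t], demo["states"][t], demo["actions"][t]))
    return prompt


# Hypothetical toy demonstration whose states are just the time indices.
demo = {"states": [0, 1, 2, 3, 4], "actions": [1, 1, 1, 1, 0], "rewards": [1.0, 0.0, 2.0, 0.0, 1.0]}
prompt = build_trajectory_prompt([demo], k=2, num_segments=2, seed=0)
recent_history = [(3.0, 2, 1)]  # hypothetical current context of the agent
model_input = prompt + recent_history
print(model_input)
```

Because the prompt is ordinary input tokens, adapting to a new task only requires swapping in that task's demonstrations, with no gradient updates to the model.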

Expressiveness and Learnability: A Unifying View for Evaluating Self-Supervised Learning

Jun 02, 2022

Yuchen Lu, Zhen Liu, Aristide Baratin, Romain Laroche, Aaron Courville, Alessandro Sordoni

Abstract: We propose a unifying view to analyze the representation quality of self-supervised learning (SSL) models without access to supervised labels, while remaining agnostic to the architecture, learning algorithm, or data manipulation used during training. We argue that representations can be evaluated through the lens of expressiveness and learnability. We propose using the Intrinsic Dimension (ID) to assess expressiveness and introduce Cluster Learnability (CL) to assess learnability. CL is measured as the learning speed of a KNN classifier trained to predict labels obtained by clustering the representations with K-means. We thus combine CL and ID into a single predictor: CLID. Through a large-scale empirical study with a diverse family of SSL algorithms, we find that CLID correlates better with in-distribution model performance than other recent competing evaluation schemes. We also benchmark CLID on out-of-domain generalization, where CLID serves as a predictor of the transfer performance of SSL models on several classification tasks, yielding improvements over the competing baselines.
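
A minimal sketch of Cluster Learnability under stated assumptions: K-means pseudo-labels are computed on the representations, and the "learning speed" of a KNN classifier is approximated by its prequential accuracy, where each point is predicted only from the points seen before it, so representations whose pseudo-labels are learnable from few examples score higher. The prequential formulation, the naive first-k-points K-means initialization, and the omission of the Intrinsic Dimension term are simplifications, not the paper's exact protocol.

```python
from collections import Counter


def dist2(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=50):
    """Plain Lloyd's K-means; centroids start at the first k points (simplification)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(coords) / len(members) for coords in zip(*members)]
    return labels


def cluster_learnability(points, num_clusters, n_neighbors=1):
    """Prequential KNN accuracy on K-means pseudo-labels (higher = learned faster)."""
    labels = kmeans(points, num_clusters)
    correct = 0
    for i in range(1, len(points)):
        # Only points seen before i are available to the classifier.
        neighbors = sorted(range(i), key=lambda j: dist2(points[i], points[j]))[:n_neighbors]
        predicted = Counter(labels[j] for j in neighbors).most_common(1)[0][0]
        correct += predicted == labels[i]
    return correct / (len(points) - 1)


# Two well-separated blobs, interleaved so the stream alternates between them.
points = [(0.1 * i, 0.0) if i % 2 == 0 else (10.0 + 0.1 * i, 10.0) for i in range(20)]
print(cluster_learnability(points, num_clusters=2))  # 18/19: only the first point of the second blob is missed
```

On cleanly clustered representations the classifier is wrong only until it has seen one example of each cluster, so the score approaches 1; entangled representations keep producing mistakes throughout the stream.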

Iterated learning for emergent systematicity in VQA

May 03, 2021

Ankit Vani, Max Schwarzer, Yuchen Lu, Eeshan Dhekane, Aaron Courville

Abstract: Although neural module networks have an architectural bias towards compositionality, in practice they require gold-standard layouts to generalize systematically. When layouts and modules are instead learned jointly, compositionality does not arise automatically, and an explicit pressure is necessary for the emergence of layouts exhibiting the right structure. We propose to address this problem using iterated learning, a cognitive science theory of the emergence of compositional languages in nature that has primarily been applied to simple referential games in machine learning. Treating the layouts of module networks as samples from an emergent language, we use iterated learning to encourage the development of structure within this language. We show that the resulting layouts support systematic generalization in neural agents solving the more complex task of visual question answering. Our regularized iterated learning method can outperform baselines without iterated learning on SHAPES-SyGeT (SHAPES Systematic Generalization Test), a new split of the SHAPES dataset that we introduce to evaluate systematic generalization, and on CLOSURE, an extension of CLEVR also designed to test systematic generalization. We demonstrate superior performance in recovering ground-truth compositional program structure with limited supervision on both SHAPES-SyGeT and CLEVR.

* Published as a conference paper at ICLR 2021. 9 pages main, 21 pages total including references and appendix

Learning Task Decomposition with Ordered Memory Policy Network

Mar 19, 2021

Yuchen Lu, Yikang Shen, Siyuan Zhou, Aaron Courville, Joshua B. Tenenbaum, Chuang Gan

Abstract: Many complex real-world tasks are composed of several levels of subtasks. Humans leverage these hierarchical structures to accelerate the learning process and achieve better generalization. In this work, we study the inductive bias and propose the Ordered Memory Policy Network (OMPN) to discover subtask hierarchy by learning from demonstration. The discovered subtask hierarchy can be used to perform task decomposition, recovering the subtask boundaries in an unstructured demonstration. Experiments on Craft and Dial demonstrate that our model can achieve higher task decomposition performance than strong baselines under both unsupervised and weakly supervised settings. OMPN can also be directly applied to partially observable environments and still achieve higher task decomposition performance. Our visualization further confirms that the subtask hierarchy can emerge in our model.

Supervised Seeded Iterated Learning for Interactive Language Learning

Oct 06, 2020

Yuchen Lu, Soumye Singhal, Florian Strub, Olivier Pietquin, Aaron Courville

Abstract: Language drift has been one of the major obstacles to training language models through interaction. When word-based conversational agents are trained towards completing a task, they tend to invent their own language rather than leveraging natural language. In recent literature, two general methods partially counter this phenomenon: Supervised Selfplay (S2P) and Seeded Iterated Learning (SIL). While S2P jointly trains interactive and supervised losses to counter the drift, SIL changes the training dynamics to prevent language drift from occurring. In this paper, we first highlight their respective weaknesses, i.e., late-stage training collapses and higher negative likelihood when evaluated on a human corpus. Given these observations, we introduce Supervised Seeded Iterated Learning (SSIL), which combines both methods to minimize their respective weaknesses. We then show the effectiveness of SSIL in the language-drift translation game.
