Abstract: Transformer-based large language models (LLMs) have displayed remarkable creative prowess and emergence capabilities. Existing empirical studies have revealed a strong connection between these LLMs' impressive emergence abilities and their in-context learning (ICL) capacity, which allows them to solve new tasks using only task-specific prompts without further fine-tuning. On the other hand, existing empirical and theoretical studies also show that there is a linear regularity in the multi-concept encoded semantic representations behind transformer-based LLMs. However, existing theoretical work fails to build an understanding of the connection between this regularity and the innovative power of ICL. Additionally, prior work often focuses on simplified scenarios involving linear transformers or unrealistic loss functions, and achieves only linear or sub-linear convergence rates. In contrast, this work provides a fine-grained mathematical analysis to show how transformers leverage the multi-concept semantics of words to enable powerful ICL and excellent out-of-distribution ICL abilities, offering insights into how transformers innovate solutions for certain unseen tasks encoded with multiple cross-concept semantics. Inspired by empirical studies on the linear latent geometry of LLMs, the analysis is based on a concept-based low-noise sparse coding prompt model. Leveraging advanced techniques, this work demonstrates exponential convergence of the 0-1 loss over the highly non-convex training dynamics, jointly addressing for the first time the challenges of softmax self-attention, ReLU-activated MLPs, and cross-entropy loss. Empirical simulations corroborate the theoretical findings.
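For illustration, a concept-based sparse coding prompt model of the kind the abstract alludes to might take the following form; the symbols below (a concept dictionary D, sparse concept codes c_t, and noise xi_t) are assumptions for exposition, not the paper's actual notation.

```latex
% Hypothetical sketch only: D, c_t, \xi_t and the sparsity level s are
% illustrative symbols, not the paper's actual notation.
\[
  x_t \;=\; D\,c_t + \xi_t,
  \qquad D = [\,d_1,\dots,d_K\,] \in \mathbb{R}^{d\times K},
  \qquad \|c_t\|_0 \le s,
  \qquad \|\xi_t\| \ \text{small},
\]
\[
  \text{prompt: } (x_1, y_1, \dots, x_N, y_N, x_{\mathrm{query}}),
  \qquad y_t \ \text{determined by the concepts active in } c_t .
\]
```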
Abstract: When tracking maneuvering targets, model-driven approaches have difficulty comprehensively describing complex real-world scenarios and are prone to model mismatch when the target maneuvers. Meanwhile, contemporary data-driven methods overlook the confidence of measurements, markedly increasing the difficulty of fitting a mapping from measurement sequences to target state sequences. To address these issues, this paper presents a deep maneuvering target tracking method based on target state space projection. The proposed method first establishes a projection from the target measurement sequence to the target state space by formulating the probability density function of the measurement error, and samples the distribution of measurement noise within the target state space as a measurement representation. Under this representation, the sequential regression task of target state estimation can be converted into the task of detecting the target location in the state space. Subsequently, a deep detection network is devised to perform target location detection in the target state space. Finally, a loss function is designed to train the network toward the desired performance. Simulation experiments suggest that the proposed method maintains satisfactory tracking performance even when the target maneuvers, and that, after a maneuver, it converges rapidly and achieves higher estimation accuracy than existing methods.
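As a loose illustration of projecting measurements into the target state space, the following minimal sketch assumes a 2-D radar geometry with Gaussian range/bearing noise and samples the measurement-noise distribution as a point cloud in Cartesian state space; the function name and parameters are illustrative, not the paper's implementation.

```python
import numpy as np

def project_measurement_to_state_space(r, theta, sigma_r, sigma_theta, n_samples=500):
    """Illustrative sketch: sample the noise distribution of a polar measurement
    (range r, bearing theta) and map the samples into Cartesian state space."""
    rng = np.random.default_rng(0)
    r_s = rng.normal(r, sigma_r, n_samples)           # perturbed ranges
    th_s = rng.normal(theta, sigma_theta, n_samples)  # perturbed bearings
    xy = np.stack([r_s * np.cos(th_s), r_s * np.sin(th_s)], axis=1)
    return xy  # point cloud in (x, y) representing the measurement's confidence

# Example: a 10 km measurement at 30 degrees with 50 m / 0.5 deg noise.
cloud = project_measurement_to_state_space(10_000.0, np.deg2rad(30.0), 50.0, np.deg2rad(0.5))
print(cloud.mean(axis=0), cloud.std(axis=0))
```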
Abstract: Multi-frame detection algorithms can effectively exploit the correlation between consecutive echoes to improve the detection performance for weak targets. Existing efficient multi-frame detection algorithms are typically based on three sequential steps: plot extraction via a relatively low primary threshold, track search, and track detection. However, such three-stage processing may incur a notable loss of detection performance and does not fully leverage the available echo information across frames. Existing applications of graph neural networks to multi-frame detection are primarily based on node classification tasks and therefore cannot directly output target tracks. In this paper, we reformulate the multi-frame detection problem as a link prediction task in graphs. First, we perform a rough association of multi-frame observations that exceed the low threshold to construct observation association graphs. Subsequently, a multi-feature link prediction network is designed based on graph neural networks, which integrates multi-dimensional information, including echo structure, Doppler information, and the spatio-temporal coupling of plots. By leveraging the principle of link prediction, we unify track search and track detection into a single step to reduce performance loss and directly output target tracks. Experimental results show that, compared with traditional single-frame and multi-frame detection algorithms, the proposed algorithm improves the detection performance for weak targets while suppressing false alarms. Additionally, an interpretability analysis indicates that the designed network effectively integrates the utilized features, enabling targets and false alarms to be accurately distinguished.
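The rough-association step described above can be pictured with a minimal sketch that links low-threshold plots in consecutive frames whenever the implied speed is kinematically feasible, producing an observation association graph whose edges would later be scored by the link-prediction network; the gating rule and the use of networkx are assumptions, not the paper's implementation.

```python
import numpy as np
import networkx as nx

def build_association_graph(frames, v_max, dt):
    """Illustrative sketch: connect plots in consecutive frames whose implied
    speed is feasible. `frames` is a list of (N_k, 2) arrays of plot positions."""
    g = nx.Graph()
    for k, plots in enumerate(frames):
        for i, p in enumerate(plots):
            g.add_node((k, i), pos=p)
    for k in range(len(frames) - 1):
        for i, p in enumerate(frames[k]):
            for j, q in enumerate(frames[k + 1]):
                if np.linalg.norm(q - p) <= v_max * dt:  # kinematic gate
                    g.add_edge((k, i), (k + 1, j))
    return g

frames = [np.random.rand(5, 2) * 1000 for _ in range(3)]
graph = build_association_graph(frames, v_max=300.0, dt=1.0)
print(graph.number_of_nodes(), graph.number_of_edges())
```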
Abstract: Symbolic regression (SR) methods have been extensively investigated to discover explicit algebraic Reynolds stress models (EARSM) for turbulence closure of the Reynolds-averaged Navier-Stokes (RANS) equations. The deduced EARSM can be readily implemented in existing computational fluid dynamics (CFD) codes and promotes the identification of physically interpretable turbulence models. Existing SR methods, such as genetic programming, sparse regression, or artificial neural networks, require user-defined functional operators, a library of candidates, or complex optimization algorithms. In this work, a novel framework using large language models (LLMs) to automatically discover algebraic expressions for correcting the Reynolds stress model is proposed. Both the direct observation of the Reynolds stress and the indirect output of the CFD simulation are involved in the training process to guarantee data consistency and avoid numerical stiffness. Constraints on functional complexity and convergence are additionally imposed in the objective function to account for the considerable flexibility of LLMs. Evolutionary search is employed for global optimization. The proposed method is applied to separated flow over periodic hills at Re = 10,595. The generalizability of the discovered model is verified on a set of 2D turbulent separated flow configurations with different Reynolds numbers and geometries. It is demonstrated that the corrective RANS model improves the prediction of both the Reynolds stress and the mean velocity field. Compared with algebraic models discovered in other works, the discovered model achieves better accuracy and generalization capability. The proposed approach provides a promising paradigm for using LLMs to improve turbulence modeling for a given class of flows.
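One plausible way to write a search objective combining direct and indirect data mismatch with complexity and convergence constraints is sketched below; the symbols and weights are assumptions, not the paper's exact formulation.

```latex
% Illustrative form of the search objective for a candidate correction model M;
% the symbols and weights are assumptions, not the paper's exact formulation.
\[
  J(\mathcal{M}) =
  \underbrace{\big\| \tau^{\mathcal{M}} - \tau^{\mathrm{ref}} \big\|^{2}}_{\text{direct Reynolds-stress mismatch}}
  + \lambda_u \underbrace{\big\| u^{\mathrm{CFD}(\mathcal{M})} - u^{\mathrm{ref}} \big\|^{2}}_{\text{indirect CFD-output mismatch}}
  + \lambda_c\, C(\mathcal{M})
  + \lambda_d\, \mathbb{1}\{\text{CFD diverges}\} .
\]
```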
Abstract: Heuristics are commonly used to tackle diverse search and optimization problems. Designing heuristics usually requires tedious manual crafting with domain knowledge. Recent works have incorporated large language models (LLMs) into automatic heuristic search, leveraging their powerful language and coding capacity. However, existing research treats optimal performance on the target problem as the sole objective, neglecting other criteria such as efficiency and scalability, which are vital in practice. To tackle this challenge, we propose to model heuristic search as a multi-objective optimization problem and to introduce practical criteria beyond optimal performance. Due to the complexity of the search space, conventional multi-objective optimization methods struggle to handle multi-objective heuristic search effectively. We propose the first LLM-based multi-objective heuristic search framework, Multi-objective Evolution of Heuristic (MEoH), which integrates LLMs in a zero-shot manner to generate a non-dominated set of heuristics that meet multiple design criteria. We design a new dominance-dissimilarity mechanism for effective population management and selection, which incorporates both code dissimilarity in the search space and dominance in the objective space. MEoH is demonstrated on two well-known combinatorial optimization problems: the online Bin Packing Problem (BPP) and the Traveling Salesman Problem (TSP). Results indicate that a variety of elite heuristics are automatically generated in a single run, offering more trade-off options than existing methods. MEoH achieves competitive or superior performance while improving efficiency by up to 10 times. Moreover, we observe that the multi-objective search introduces novel insights into heuristic design and leads to the discovery of diverse heuristics.
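A heavily simplified sketch of a dominance-dissimilarity style selection rule is given below: candidates are ranked by how many population members dominate them in objective space, tie-broken toward code that is dissimilar from the rest of the population. The scoring and the use of difflib are assumptions, not MEoH's actual mechanism.

```python
import difflib

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def code_dissimilarity(code, population_codes):
    """Average (1 - similarity) of a heuristic's code against the population."""
    sims = [difflib.SequenceMatcher(None, code, c).ratio() for c in population_codes]
    return 1.0 - sum(sims) / len(sims)

def select(candidates, k):
    """candidates: list of (code_str, objective_vector). Keep k of them."""
    codes = [c for c, _ in candidates]
    scored = []
    for code, obj in candidates:
        dom_count = sum(dominates(o, obj) for _, o in candidates)  # fewer is better
        scored.append((dom_count, -code_dissimilarity(code, codes), code, obj))
    scored.sort(key=lambda t: (t[0], t[1]))  # dominance first, then dissimilarity
    return [(c, o) for _, _, c, o in scored[:k]]

pop = [("def h(x): return sorted(x)", (1.0, 5.0)),
       ("def h(x): return x[::-1]", (2.0, 1.0)),
       ("def h(x): return sorted(x)[:1]", (1.5, 4.0))]
print(select(pop, 2))
```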
Abstract: Multiobjective optimization problems (MOPs) are prevalent in machine learning, with applications in multi-task learning, learning under fairness or robustness constraints, and more. Instead of reducing multiple objective functions to a scalar objective, MOPs aim for Pareto optimality or Pareto set learning, which involves optimizing more than one objective function simultaneously over models with millions of parameters. Existing benchmark libraries for MOPs mainly focus on evolutionary algorithms, most of which are zeroth-order methods that do not effectively utilize higher-order information from the objectives and cannot scale to models with millions of parameters. In light of this gap, this paper introduces LibMOON, the first multiobjective optimization library that supports state-of-the-art gradient-based methods, provides a fair benchmark, and is open-sourced for the community.
Abstract: Automated heuristic design (AHD) has gained considerable attention for its potential to automate the development of effective heuristics. The recent advent of large language models (LLMs) has paved a new avenue for AHD, with initial efforts focusing on framing AHD as an evolutionary program search (EPS) problem. However, inconsistent benchmark settings, inadequate baselines, and a lack of detailed component analysis have left both the necessity of integrating LLMs with search strategies and the true progress achieved by existing LLM-based EPS methods inadequately justified. This work seeks to address these questions by conducting a large-scale benchmark comprising four LLM-based EPS methods and four AHD problems across nine LLMs and five independent runs. Our extensive experiments yield meaningful insights, providing empirical grounding for the importance of evolutionary search in LLM-based AHD approaches and informing the development of future EPS algorithms. To foster accessibility and reproducibility, we have fully open-sourced our benchmark and the corresponding results.
Abstract: Neural network-based active learning (NAL) is a cost-effective data selection technique that utilizes neural networks to select and train on a small subset of samples. While existing work has successfully developed various effective or theory-justified NAL algorithms, the understanding of the two commonly used query criteria of NAL, uncertainty-based and diversity-based, remains in its infancy. In this work, we take one step forward by offering a unified explanation, from a feature learning view, for the success of NAL under both query criteria. Specifically, we consider a feature-noise data model comprising easy-to-learn or hard-to-learn features disrupted by noise, and analyze 2-layer NN-based NAL in the pool-based scenario. We provably show that uncertainty-based and diversity-based NAL are inherently amenable to one and the same principle: striving to prioritize samples that contain yet-to-be-learned features. We further prove that this shared principle is the key to their success, namely achieving a small test error with a small labeled set. In contrast, strategy-free passive learning exhibits a large test error due to inadequate learning of yet-to-be-learned features, and requires a significantly larger label complexity to sufficiently reduce the test error. Experimental results validate our findings.
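As a concrete but toy illustration of the two query criteria, the sketch below selects pool points either by prediction uncertainty or by a greedy k-center style diversity rule; the scoring functions are generic stand-ins, not the constructions analyzed in the paper.

```python
import numpy as np

def uncertainty_query(probs, k):
    """Pick the k pool points whose predicted probability is least confident
    (binary case: closer to 0.5 means more uncertain)."""
    margin = np.abs(probs - 0.5)
    return np.argsort(margin)[:k]

def diversity_query(features, labeled_idx, k):
    """Greedily pick pool points farthest from the labeled set (k-center style)."""
    chosen = list(labeled_idx)
    for _ in range(k):
        d = np.min(np.linalg.norm(features[:, None] - features[chosen][None], axis=-1), axis=1)
        d[chosen] = -np.inf          # never re-pick an already chosen point
        chosen.append(int(np.argmax(d)))
    return chosen[len(labeled_idx):]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))        # toy pool features
p = rng.uniform(size=100)            # toy predicted probabilities
print(uncertainty_query(p, 5), diversity_query(X, [0, 1], 5))
```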
Abstract: Multi-objective optimization arises in many real-world applications where conflicting objectives cannot all be optimized by a single solution. Existing optimization methods often focus on finding a set of Pareto solutions with different optimal trade-offs among the objectives. However, the number of solutions required to approximate the whole Pareto optimal set well can be exponentially large with respect to the number of objectives, which makes these methods unsuitable for handling many optimization objectives. In this work, instead of finding a dense set of Pareto solutions, we propose a novel Tchebycheff set scalarization method to find a few representative solutions (e.g., 5) that cover a large number of objectives (e.g., $>100$) in a collaborative and complementary manner. In this way, each objective is well addressed by at least one solution in the small solution set. In addition, we develop a smooth Tchebycheff set scalarization approach for efficient optimization with good theoretical guarantees. Experimental studies on problems with many optimization objectives demonstrate the effectiveness of the proposed method.
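One natural way to write a Tchebycheff set scalarization over a small solution set, together with a smooth surrogate, is sketched below; the exact formulation and smoothing used in the paper may differ.

```latex
% One natural form of a Tchebycheff set scalarization over a small solution
% set {x_1,...,x_K} for objectives f_1,...,f_m with ideal point z^*;
% the paper's exact formulation may differ.
\[
  \min_{x_1,\dots,x_K}\;
  \max_{1 \le i \le m}\;
  \min_{1 \le k \le K}\;
  \lambda_i \big( f_i(x_k) - z_i^{*} \big),
\]
% and a smooth surrogate obtained by replacing max/min with (negative)
% log-sum-exp terms at temperature \mu > 0:
\[
  \max_i a_i \;\approx\; \mu \log \sum_i e^{a_i/\mu},
  \qquad
  \min_k b_k \;\approx\; -\mu \log \sum_k e^{-b_k/\mu}.
\]
```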
Abstract: Neural combinatorial optimization (NCO) is a promising learning-based approach for solving various vehicle routing problems without much manual algorithm design. However, current NCO methods mainly focus on in-distribution performance, while real-world problem instances usually come from different distributions. Tackling out-of-distribution instances may therefore require costly fine-tuning or retraining a generalized model from scratch. Unlike existing methods, this work investigates an efficient prompt learning approach in NCO for cross-distribution adaptation. Concretely, we propose a novel prompt learning method to facilitate fast zero-shot adaptation of a pre-trained model to routing problem instances from different distributions. The proposed model learns a set of prompts across various distributions and then selects the best-matched one to prompt a pre-trained attention model for each problem instance. Extensive experiments show that the proposed prompt learning approach facilitates fast adaptation of pre-trained routing models. It also outperforms existing generalized models on both in-distribution prediction and zero-shot generalization to a diverse set of new tasks. Our code implementation is available online at https://github.com/FeiLiu36/PromptVRP.
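A schematic sketch of selecting the best-matched prompt from a learned pool for an incoming instance is shown below; the class, the cosine-similarity matching rule, and all dimensions are assumptions, not the paper's implementation.

```python
import torch

class PromptPool(torch.nn.Module):
    """Illustrative sketch: a small pool of learnable prompts with matching keys."""
    def __init__(self, n_prompts=8, key_dim=16, prompt_len=4, embed_dim=128):
        super().__init__()
        self.keys = torch.nn.Parameter(torch.randn(n_prompts, key_dim))
        self.prompts = torch.nn.Parameter(torch.randn(n_prompts, prompt_len, embed_dim))

    def forward(self, instance_feature):
        """instance_feature: (key_dim,) summary of a routing instance's distribution."""
        scores = torch.nn.functional.cosine_similarity(self.keys, instance_feature[None], dim=-1)
        best = torch.argmax(scores)   # index of the best-matched prompt
        return self.prompts[best]     # (prompt_len, embed_dim), to prepend to the model input

pool = PromptPool()
feat = torch.randn(16)                # toy instance-distribution feature
prompt = pool(feat)
print(prompt.shape)                   # torch.Size([4, 128])
```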