Abstract:Model merging, which combines multiple models into a single model, has gained increasing popularity in recent years. By efficiently integrating the capabilities of various models without their original training data, this significantly reduces the parameter count and memory usage. However, current methods can only produce one single merged model. This necessitates a performance trade-off due to conflicts among the various models, and the resultant one-size-fits-all model may not align with the preferences of different users who may prioritize certain models over others. To address this issue, we propose preference-aware model merging, and formulate this as a multi-objective optimization problem in which the performance of the merged model on each base model's task is treated as an objective. In only one merging process, the proposed parameter-efficient structure can generate the whole Pareto set of merged models, each representing the Pareto-optimal model for a given user-specified preference. Merged models can also be selected from the learned Pareto set that are tailored to different user preferences. Experimental results on a number of benchmark datasets demonstrate that the proposed preference-aware Pareto Merging can obtain a diverse set of trade-off models and outperforms state-of-the-art model merging baselines.
Abstract:Multi-task learning, which optimizes performance across multiple tasks, is inherently a multi-objective optimization problem. Various algorithms are developed to provide discrete trade-off solutions on the Pareto front. Recently, continuous Pareto front approximations using a linear combination of base networks have emerged as a compelling strategy. However, it suffers from scalability issues when the number of tasks is large. To address this issue, we propose a novel approach that integrates a main network with several low-rank matrices to efficiently learn the Pareto manifold. It significantly reduces the number of parameters and facilitates the extraction of shared features. We also introduce orthogonal regularization to further bolster performance. Extensive experimental results demonstrate that the proposed approach outperforms state-of-the-art baselines, especially on datasets with a large number of tasks.
Abstract:Translating literary works has perennially stood as an elusive dream in machine translation (MT), a journey steeped in intricate challenges. To foster progress in this domain, we hold a new shared task at WMT 2023, the first edition of the Discourse-Level Literary Translation. First, we (Tencent AI Lab and China Literature Ltd.) release a copyrighted and document-level Chinese-English web novel corpus. Furthermore, we put forth an industry-endorsed criteria to guide human evaluation process. This year, we totally received 14 submissions from 7 academia and industry teams. We employ both automatic and human evaluations to measure the performance of the submitted systems. The official ranking of the systems is based on the overall human judgments. In addition, our extensive analysis reveals a series of interesting findings on literary and discourse-aware MT. We release data, system outputs, and leaderboard at http://www2.statmt.org/wmt23/literary-translation-task.html.
Abstract:In this letter, we propose HV-Net, a new method for hypervolume approximation in evolutionary multi-objective optimization. The basic idea of HV-Net is to use DeepSets, a deep neural network with permutation invariant property, to approximate the hypervolume of a non-dominated solution set. The input of HV-Net is a non-dominated solution set in the objective space, and the output is an approximated hypervolume value of this solution set. The performance of HV-Net is evaluated through computational experiments by comparing it with two commonly-used hypervolume approximation methods (i.e., point-based method and line-based method). Our experimental results show that HV-Net outperforms the other two methods in terms of both the approximation error and the runtime, which shows the potential of using deep learning technique for hypervolume approximation.
Abstract:Subset selection is an important component in evolutionary multiobjective optimization (EMO) algorithms. Clustering, as a classic method to group similar data points together, has been used for subset selection in some fields. However, clustering-based methods have not been evaluated in the context of subset selection from solution sets obtained by EMO algorithms. In this paper, we first review some classic clustering algorithms. We also point out that another popular subset selection method, i.e., inverted generational distance (IGD)-based subset selection, can be viewed as clustering. Then, we perform a comprehensive experimental study to evaluate the performance of various clustering algorithms in different scenarios. Experimental results are analyzed in detail, and some suggestions about the use of clustering algorithms for subset selection are derived. Additionally, we demonstrate that decision maker's preference can be introduced to clustering-based subset selection.
Abstract:Hypervolume is widely used in the evolutionary multi-objective optimization (EMO) field to evaluate the quality of a solution set. For a solution set with $\mu$ solutions on a Pareto front, a larger hypervolume means a better solution set. Investigating the distribution of the solution set with the largest hypervolume is an important topic in EMO, which is the so-called hypervolume optimal $\mu$-distribution. Theoretical results have shown that the $\mu$ solutions are uniformly distributed on a linear Pareto front in two dimensions. However, the $\mu$ solutions are not always uniformly distributed on a single-line Pareto front in three dimensions. They are only uniform when the single-line Pareto front has one constant objective. In this paper, we further investigate the hypervolume optimal $\mu$-distribution in three dimensions. We consider the line- and plane-based Pareto fronts. For the line-based Pareto fronts, we extend the single-line Pareto front to two-line and three-line Pareto fronts, where each line has one constant objective. For the plane-based Pareto fronts, the linear triangular and inverted triangular Pareto fronts are considered. First, we show that the $\mu$ solutions are not always uniformly distributed on the line-based Pareto fronts. The uniformity depends on how the lines are combined. Then, we show that a uniform solution set on the plane-based Pareto front is not always optimal for hypervolume maximization. It is locally optimal with respect to a $(\mu+1)$ selection scheme. Our results can help researchers in the community to better understand and utilize the hypervolume indicator.
Abstract:Subset selection is an interesting and important topic in the field of evolutionary multi-objective optimization (EMO). Especially, in an EMO algorithm with an unbounded external archive, subset selection is an essential post-processing procedure to select a pre-specified number of solutions as the final result. In this paper, we discuss the efficiency of greedy subset selection for the hypervolume, IGD and IGD+ indicators. Greedy algorithms usually efficiently handle subset selection. However, when a large number of solutions are given (e.g., subset selection from tens of thousands of solutions in an unbounded external archive), they often become time-consuming. Our idea is to use the submodular property, which is known for the hypervolume indicator, to improve their efficiency. First, we prove that the IGD and IGD+ indicators are also submodular. Next, based on the submodular property, we propose an efficient greedy inclusion algorithm for each indicator. Then, we demonstrate through computational experiments that the proposed algorithms are much faster than the standard greedy subset selection algorithms.
Abstract:Subset selection is a popular topic in recent years and a number of subset selection methods have been proposed. Among those methods, hypervolume subset selection is widely used. Greedy hypervolume subset selection algorithms can achieve good approximations to the optimal subset. However, when the candidate set is large (e.g., an unbounded external archive with a large number of solutions), the algorithm is very time-consuming. In this paper, we propose a new lazy greedy algorithm exploiting the submodular property of the hypervolume indicator. The core idea is to avoid unnecessary hypervolume contribution calculation when finding the solution with the largest contribution. Experimental results show that the proposed algorithm is hundreds of times faster than the original greedy inclusion algorithm and several times faster than the fastest known greedy inclusion algorithm on many test problems.
Abstract:Recently, the discretization of decision and objective spaces has been discussed in the literature. In some studies, it is shown that the decision space discretization improves the performance of evolutionary multi-objective optimization (EMO) algorithms on continuous multi-objective test problems. In other studies, it is shown that the objective space discretization improves the performance on combinatorial multi-objective problems. However, the effect of the simultaneous discretization of both spaces has not been examined in the literature. In this paper, we examine the effects of the decision space discretization, objective space discretization and simultaneous discretization on the performance of NSGA-II through computational experiments on the DTLZ and WFG problems. Using various settings about the number of decision variables and the number of objectives, our experiments are performed on four types of problems: standard problems, large-scale problems, many-objective problems, and large-scale many-objective problems. We show that the decision space discretization has a positive effect for large-scale problems and the objective space discretization has a positive effect for many-objective problems. We also show the discretization of both spaces is useful for large-scale many-objective problems.
Abstract:The increasing popularity of e-learning has created demand for improving online education through techniques such as predictive analytics and content recommendations. In this paper, we study learner outcome predictions, i.e., predictions of how they will perform at the end of a course. We propose a novel Two Branch Decision Network for performance prediction that incorporates two important factors: how learners progress through the course and how the content progresses through the course. We combine clickstream features which log every action the learner takes while learning, and textual features which are generated through pre-trained GloVe word embeddings. To assess the performance of our proposed network, we collect data from a short online course designed for corporate training and evaluate both neural network and non-neural network based algorithms on it. Our proposed algorithm achieves 95.7% accuracy and 0.958 AUC score, which outperforms all other models. The results also indicate the combination of behavior features and text features are more predictive than behavior features only and neural network models are powerful in capturing the joint relationship between user behavior and course content.