Abstract: This paper studies the problem of stochastic bilevel optimization where the upper-level function is nonconvex with potentially unbounded smoothness and the lower-level function is strongly convex. This problem is motivated by meta-learning applied to sequential data, such as text classification using recurrent neural networks, where the smoothness constant of the upper-level loss function scales linearly with the gradient norm and can be potentially unbounded. The existing algorithm for this setting crucially relies on a nested-loop design, which requires significant tuning effort and is not practical. In this paper, we address this issue by proposing a Single Loop bIlevel oPtimizer (SLIP). The proposed algorithm first updates the lower-level variable by a few steps of stochastic gradient descent, and then simultaneously updates the upper-level variable by normalized stochastic gradient descent with momentum and the lower-level variable by stochastic gradient descent. Under standard assumptions, we show that our algorithm finds an $\epsilon$-stationary point within $\widetilde{O}(1/\epsilon^4)$\footnote{Here $\widetilde{O}(\cdot)$ compresses logarithmic factors of $1/\epsilon$ and $1/\delta$, where $\delta\in(0,1)$ denotes the failure probability.} oracle calls of the stochastic gradient or Hessian-vector product, both in expectation and with high probability. This complexity result is nearly optimal up to logarithmic factors in the absence of mean-square smoothness of the stochastic gradient oracle. Our proof relies on (i) a refined characterization and control of the lower-level variable and (ii) a novel connection between bilevel optimization and stochastic optimization under distributional drift. Our experiments on various tasks show that our algorithm significantly outperforms strong baselines in bilevel optimization.
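To make the single-loop update pattern concrete, here is a minimal Python sketch, assuming user-supplied stochastic oracles `hypergrad(x, y, rng)` and `grad_g_y(x, y, rng)` for the hypergradient and lower-level gradient; it only illustrates the warm-up plus simultaneous-update structure described above and is not the authors' implementation.

```python
# A minimal sketch of the SLIP update schedule (illustrative only).
import numpy as np

def slip_sketch(x0, y0, hypergrad, grad_g_y, T=1000, warmup=50,
                eta=1e-2, gamma=1e-2, beta=0.9, rng=None):
    """Single-loop bilevel sketch: warm-start y, then update (x, y) jointly."""
    rng = rng or np.random.default_rng(0)
    x, y, m = x0.copy(), y0.copy(), np.zeros_like(x0)

    # Phase 1: refine the lower-level variable with a few SGD steps.
    for _ in range(warmup):
        y -= gamma * grad_g_y(x, y, rng)

    # Phase 2: simultaneous updates of the upper- and lower-level variables.
    for _ in range(T):
        m = beta * m + (1.0 - beta) * hypergrad(x, y, rng)  # momentum on the hypergradient
        x -= eta * m / (np.linalg.norm(m) + 1e-12)          # normalized step for x
        y -= gamma * grad_g_y(x, y, rng)                    # plain SGD step for y
    return x, y
```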
Abstract: Image quality assessment (IQA) of user-generated content (UGC) is a critical technique for human quality of experience (QoE). For robot-generated content (RGC), however, is image quality consistent with the Moravec paradox and counter to human common sense? Human subjective scoring is based largely on the attractiveness of an image, whereas embodied agents must perceive and interact with their environment and ultimately perform specific tasks, so the visual images they take as input directly influence downstream tasks. In this paper, we first propose an embodied image quality assessment (EIQA) framework. We establish assessment metrics for input images based on the downstream tasks of the robot. In addition, we construct an Embodied Preference Database (EPD) containing 5,000 reference and distorted image annotations. We then evaluate the performance of mainstream IQA algorithms on the EPD dataset. The experiments demonstrate that quality assessment of embodied images differs from that of humans. We sincerely hope that the EPD can contribute to the development of embodied AI by focusing attention on image quality assessment. The benchmark is available at https://github.com/Jianbo-maker/EPD_benchmark.
Abstract: The distinct characteristics of multiomics data, including complex interactions within and across biological layers and disease heterogeneity (e.g., heterogeneity in etiology and clinical symptoms), drive us to develop novel designs that address the unique challenges of multiomics prediction. In this paper, we propose the multi-view knowledge transfer learning (MVKTrans) framework, which transfers intra- and inter-omics knowledge in an adaptive manner by accounting for data heterogeneity and suppressing bias transfer, thereby enhancing classification performance. Specifically, we design a graph contrastive module that is trained on unlabeled data to effectively learn and transfer the underlying intra-omics patterns to the supervised task. This unsupervised pretraining promotes learning general and unbiased representations for each modality, regardless of the downstream task. In light of the varying discriminative capacities of modalities across different diseases and/or samples, we introduce an adaptive and bi-directional cross-omics distillation module. This module automatically identifies the richer modalities and facilitates dynamic knowledge transfer from more informative to less informative omics, thereby enabling a more robust and generalized integration. Extensive experiments on four real biomedical datasets demonstrate the superior performance and robustness of MVKTrans compared to the state-of-the-art. Code and data are available at https://github.com/Yaolab-fantastic/MVKTrans.
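The bi-directional distillation idea can be sketched as a per-sample, confidence-weighted knowledge-distillation loss between two modality-specific classifiers. The snippet below is a hedged sketch, not the released MVKTrans code; the confidence proxy (maximum softmax probability) and the two-modality setting are illustrative assumptions.

```python
# Sketch of adaptive, bi-directional cross-omics distillation (illustrative only).
import torch
import torch.nn.functional as F

def bidirectional_distill_loss(logits_a, logits_b, temperature=2.0):
    """Distill, per sample, from the more confident modality toward the less
    confident one using softened KL divergence."""
    conf_a = logits_a.softmax(dim=1).max(dim=1).values   # proxy for informativeness of modality A
    conf_b = logits_b.softmax(dim=1).max(dim=1).values
    a_teaches_b = (conf_a > conf_b).float()              # 1 where modality A is the richer one

    log_p_a = F.log_softmax(logits_a / temperature, dim=1)
    log_p_b = F.log_softmax(logits_b / temperature, dim=1)
    # KL(teacher || student); the teacher distribution is detached.
    kl_ab = F.kl_div(log_p_b, log_p_a.exp().detach(), reduction="none").sum(dim=1)  # A -> B
    kl_ba = F.kl_div(log_p_a, log_p_b.exp().detach(), reduction="none").sum(dim=1)  # B -> A

    loss = (a_teaches_b * kl_ab + (1.0 - a_teaches_b) * kl_ba).mean()
    return (temperature ** 2) * loss
```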
Abstract: In recent years, it has become possible to perfectly replicate a speaker's voice from just a few speech samples, and malicious voice exploitation (e.g., telecom fraud for illegal financial gain) poses serious hazards in our daily lives. It is therefore crucial to protect publicly accessible speech data that contains sensitive information, such as personal voiceprints. Most previous defense methods focus on spoofing speaker verification systems with respect to timbre similarity, but the synthesized deepfake speech remains of high quality. In response to these rising hazards, we devise an effective, transferable, and robust proactive protection technique named Pivotal Objective Perturbation (POP), which applies imperceptible error-minimizing noise to original speech samples to prevent them from being effectively learned by text-to-speech (TTS) synthesis models, so that high-quality deepfake speech cannot be generated. We conduct extensive experiments on state-of-the-art (SOTA) TTS models using objective and subjective metrics to comprehensively evaluate the proposed method. The experimental results demonstrate outstanding effectiveness and transferability across various models. Compared to the speech unclarity score of 21.94% for voice synthesizers trained on unprotected samples, POP-protected samples significantly increase it to 127.31%. Moreover, our method is robust against noise reduction and data augmentation techniques, thereby greatly reducing potential hazards.
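The core error-minimizing mechanism can be sketched as a small inner optimization that drives a surrogate training loss toward zero within an imperceptibility budget. This is a hedged sketch under stated assumptions, not the POP implementation: `tts_loss` is an assumed differentiable training objective of a surrogate TTS model, and the budget `eps` and step size are illustrative.

```python
# Sketch of error-minimizing protective noise for a speech sample (illustrative only).
import torch

def error_minimizing_noise(waveform, text, tts_loss, eps=8e-3, alpha=1e-3, steps=50):
    """Craft an imperceptible perturbation that minimizes the surrogate training
    loss, so the protected sample carries little learnable signal for TTS training."""
    delta = torch.zeros_like(waveform, requires_grad=True)
    for _ in range(steps):
        loss = tts_loss(waveform + delta, text)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()   # descend: make the sample "too easy" to fit
            delta.clamp_(-eps, eps)              # keep the noise within the imperceptibility budget
        delta.grad.zero_()
    return (waveform + delta).detach()
```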
Abstract: This paper investigates a class of stochastic bilevel optimization problems where the upper-level function is nonconvex with potentially unbounded smoothness and the lower-level problem is strongly convex. These problems have significant applications in sequential data learning, such as text classification using recurrent neural networks. The unbounded smoothness is characterized by the smoothness constant of the upper-level function scaling linearly with the gradient norm, lacking a uniform upper bound. Existing state-of-the-art algorithms require $\widetilde{O}(1/\epsilon^4)$ oracle calls of the stochastic gradient or Hessian/Jacobian-vector product to find an $\epsilon$-stationary point. However, it remains unclear whether we can further improve the convergence rate when the assumptions for the function at the population level also hold for each random realization almost surely (e.g., Lipschitzness of each realization of the stochastic gradient). To address this issue, we propose a new Accelerated Bilevel Optimization algorithm named AccBO. The algorithm updates the upper-level variable by normalized stochastic gradient descent with recursive momentum and the lower-level variable by the stochastic Nesterov accelerated gradient descent algorithm with averaging. We prove that our algorithm achieves an oracle complexity of $\widetilde{O}(1/\epsilon^3)$ to find an $\epsilon$-stationary point. Our proof relies on a novel lemma characterizing the dynamics of the stochastic Nesterov accelerated gradient descent algorithm under distributional drift with high probability for the lower-level variable, which is of independent interest and also plays a crucial role in analyzing the hypergradient estimation error over time. Experimental results on various tasks confirm that our proposed algorithm achieves the predicted theoretical acceleration and significantly outperforms baselines in bilevel optimization.
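The two update rules can be sketched as follows. This is a hedged illustration of the update pattern only, not the authors' code: `hypergrad(x, y, seed)` and `grad_g_y(x, y, seed)` are assumed stochastic oracles keyed by a sample seed, and the recursive-momentum correction reuses the same seed for both of its gradient evaluations (STORM-style).

```python
# Sketch of the AccBO update pattern (illustrative only).
import numpy as np

def accbo_sketch(x0, y0, hypergrad, grad_g_y, T=1000, eta=1e-2, gamma=1e-2,
                 a=0.1, mu=0.9, rng=None):
    rng = rng or np.random.default_rng(0)
    x, y = x0.copy(), y0.copy()
    v = np.zeros_like(y0)                                  # Nesterov velocity (lower level)
    y_avg = y0.copy()                                      # running average of lower-level iterates
    x_prev = x0.copy()
    d = hypergrad(x, y_avg, int(rng.integers(2**31)))      # recursive momentum state

    for t in range(1, T + 1):
        # Lower level: stochastic Nesterov accelerated gradient descent with averaging.
        y_look = y + mu * v
        v = mu * v - gamma * grad_g_y(x, y_look, int(rng.integers(2**31)))
        y = y + v
        y_avg = ((t - 1) * y_avg + y) / t

        # Upper level: normalized step with recursive momentum; the same sample
        # seed `xi` is reused for both hypergradient evaluations.
        xi = int(rng.integers(2**31))
        d = hypergrad(x, y_avg, xi) + (1.0 - a) * (d - hypergrad(x_prev, y_avg, xi))
        x_prev = x.copy()
        x -= eta * d / (np.linalg.norm(d) + 1e-12)
    return x, y_avg
```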
Abstract: This paper proposes a weight-aware deep reinforcement learning (WADRL) approach to the multiobjective vehicle routing problem with time windows (MOVRPTW), aiming to use a single deep reinforcement learning (DRL) model to solve the entire multiobjective optimization problem. The non-dominated sorting genetic algorithm II (NSGA-II) is then employed to optimize the outcomes produced by WADRL, thereby mitigating the limitations of both approaches. First, we design an MOVRPTW model that balances minimizing travel cost and maximizing customer satisfaction. Subsequently, we present a novel DRL framework that incorporates a transformer-based policy network composed of an encoder module, a weight embedding module in which the weights of the objective functions are incorporated, and a decoder module. NSGA-II is then utilized to optimize the solutions generated by WADRL. Finally, extensive experimental results demonstrate that our method outperforms existing and traditional methods. Because of the numerous constraints in VRPTW, generating initial solutions for the NSGA-II algorithm can be time-consuming; using solutions generated by WADRL as the initial population significantly reduces this time. Meanwhile, NSGA-II enhances the quality of the solutions generated by WADRL, yielding solutions with better scalability. Notably, the weight-aware strategy significantly reduces the training time of DRL while achieving better results, enabling a single DRL model to solve the entire multiobjective optimization problem.
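The weight-aware idea, sampling a preference weight per episode and conditioning both the scalarized reward and the policy on it, can be sketched as below. This is an illustrative sketch only; `policy`, `env.rollout`, and `policy.update` are assumed interfaces of a VRPTW environment, not the paper's actual components.

```python
# Sketch of weight-aware training for a two-objective VRPTW policy (illustrative only).
import numpy as np

def weighted_reward(travel_cost, satisfaction, w):
    """Scalarize the two objectives with a preference weight w in [0, 1]."""
    return -w * travel_cost + (1.0 - w) * satisfaction

def train_step(policy, env, rng):
    w = rng.uniform(0.0, 1.0)                  # sample a preference weight per episode
    route, travel_cost, satisfaction = env.rollout(policy, weight=w)
    reward = weighted_reward(travel_cost, satisfaction, w)
    policy.update(route, reward, weight=w)     # the weight is also embedded into the policy input
    return reward
```

Solutions rolled out under a spread of weights can then seed the NSGA-II initial population, which is where the reported reduction in initialization time comes from.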
Abstract: Multi-hop question answering (MHQA) requires a model to retrieve and integrate information from multiple passages to answer a complex question. Recent systems leverage the power of large language models and integrate evidence retrieval with reasoning prompts (e.g., chain-of-thought reasoning) for the MHQA task. However, the complexity of question types (bridge vs. comparison questions) and reasoning types (sequential vs. parallel reasoning) calls for more novel and fine-grained prompting methods to enhance MHQA performance in the zero-shot setting. In this paper, we propose STOC-TOT, a stochastic tree-of-thought reasoning prompting method with constrained decoding for MHQA, and conduct a detailed comparison with other reasoning prompts across different question types and reasoning types. Specifically, we construct a tree-like reasoning structure by prompting the model to break down the original question into smaller sub-questions that form different reasoning paths. In addition, we prompt the model to provide a probability estimate for each reasoning path at each reasoning step. At answer time, we apply constrained decoding to the model to generate more grounded answers and reduce hallucination. Experiments on two MHQA datasets with five large language models show that our framework outperforms other reasoning prompts by a significant margin.
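The tree expansion with per-path probability estimates can be sketched as follows (the constrained-decoding step at answer time is omitted). The prompts and the `llm` callable are illustrative assumptions, not the paper's actual templates.

```python
# Sketch of stochastic tree-of-thought expansion with path scoring (illustrative only).
import heapq

def stoc_tot(question, llm, branch=3, depth=2):
    """Expand a small reasoning tree of sub-questions; keep paths weighted by the
    model's own probability estimates and return the highest-scoring path."""
    # Each frontier item: (negative path score, list of (sub_question, answer) steps).
    frontier = [(-1.0, [])]
    for _ in range(depth):
        next_frontier = []
        for neg_score, path in frontier:
            context = " ".join(f"{q} -> {a}" for q, a in path)
            subqs = llm(f"Question: {question}\nContext: {context}\n"
                        f"Propose {branch} next sub-questions, one per line.").splitlines()
            for subq in subqs[:branch]:
                ans = llm(f"Answer briefly: {subq}\nContext: {context}")
                prob = float(llm(f"On a scale from 0 to 1, how likely is '{ans}' correct "
                                 f"for '{subq}'? Reply with a number only."))
                next_frontier.append((neg_score * prob, path + [(subq, ans)]))
        frontier = heapq.nsmallest(branch, next_frontier)   # keep the highest-probability paths
    best_neg_score, best_path = min(frontier)
    return best_path, -best_neg_score
```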
Abstract: In light of the widespread application of Automatic Speech Recognition (ASR) systems, their security has received more attention than ever before, primarily due to the susceptibility of deep neural networks. Previous studies have shown that surreptitiously crafted adversarial perturbations can manipulate speech recognition systems into producing malicious commands. These attack methods mostly require adding noise perturbations under $\ell_p$ norm constraints, inevitably leaving behind artifacts of manual modification. Recent research has alleviated this limitation by manipulating style vectors to synthesize adversarial examples based on Text-to-Speech (TTS) synthesis audio. However, style modifications driven by optimization objectives significantly reduce the controllability and editability of audio styles. In this paper, we propose an attack on ASR systems based on user-customized style transfer. We first test the Style Transfer Attack (STA), which combines style transfer and adversarial attack in sequential order. Then, as an improvement, we propose an iterative Style Code Attack (SCA) to maintain audio quality. Experimental results show that our method can meet the need for user-customized styles and achieves an attack success rate of 82%, while preserving sound naturalness, as confirmed by our user study.
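The iterative style-code optimization can be sketched as a constrained search over the style vector of a TTS model. This is a hedged sketch under stated assumptions: `tts_decode`, `asr_loss`, and the style-code budget are illustrative interfaces, not those of the paper.

```python
# Sketch of iteratively optimizing a style code toward a target transcription (illustrative only).
import torch

def style_code_attack(style_code, text, target_cmd, tts_decode, asr_loss,
                      steps=100, lr=1e-2, budget=0.1):
    """Perturb a user-chosen style code so the synthesized audio is transcribed
    as the target command, while staying close to the original style."""
    z0 = style_code.detach()
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        audio = tts_decode(text, z)             # synthesize audio with the current style code
        loss = asr_loss(audio, target_cmd)      # e.g. CTC loss of the victim ASR toward the target
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():                   # keep the edit within the style budget
            z.copy_(z0 + (z - z0).clamp(-budget, budget))
    return z.detach()
```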
Abstract: Bilevel optimization is an important formulation for many machine learning problems. Current bilevel optimization algorithms assume that the gradient of the upper-level function is Lipschitz. However, recent studies reveal that certain neural networks such as recurrent neural networks (RNNs) and long short-term memory networks (LSTMs) exhibit potentially unbounded smoothness, rendering conventional bilevel optimization algorithms unsuitable. In this paper, we design a new bilevel optimization algorithm, namely BO-REP, to address this challenge. This algorithm updates the upper-level variable using normalized momentum and incorporates two novel techniques for updating the lower-level variable: \textit{initialization refinement} and \textit{periodic updates}. Specifically, once the upper-level variable is initialized, a subroutine is invoked to obtain a refined estimate of the corresponding optimal lower-level variable, and the lower-level variable is thereafter updated only once every specific period instead of at each iteration. When the upper-level problem is nonconvex and unbounded smooth, and the lower-level problem is strongly convex, we prove that our algorithm requires $\widetilde{\mathcal{O}}(1/\epsilon^4)$ iterations to find an $\epsilon$-stationary point in the stochastic setting, where each iteration involves calling a stochastic gradient or Hessian-vector product oracle. Notably, this result matches the state-of-the-art complexity results under the bounded smoothness setting and without mean-squared smoothness of the stochastic gradient, up to logarithmic factors. Our proof relies on novel technical lemmas for the periodically updated lower-level variable, which are of independent interest. Our experiments on hyper-representation learning, hyperparameter optimization, and data hyper-cleaning for text classification tasks demonstrate the effectiveness of our proposed algorithm.
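The initialization-refinement and periodic-update schedule can be sketched as below. This is a hedged illustration of the schedule only, not the released BO-REP code; `hypergrad(x, y, rng)` and `grad_g_y(x, y, rng)` are assumed stochastic oracles, and the period and step counts are illustrative.

```python
# Sketch of the BO-REP update schedule (illustrative only).
import numpy as np

def bo_rep_sketch(x0, y0, hypergrad, grad_g_y, T=1000, period=10, refine_steps=100,
                  inner_steps=5, eta=1e-2, gamma=1e-2, beta=0.9, rng=None):
    rng = rng or np.random.default_rng(0)
    x, y, m = x0.copy(), y0.copy(), np.zeros_like(x0)

    # Initialization refinement: estimate the optimal lower-level variable for x0.
    for _ in range(refine_steps):
        y -= gamma * grad_g_y(x, y, rng)

    for t in range(T):
        # Upper level: normalized momentum step at every iteration.
        m = beta * m + (1.0 - beta) * hypergrad(x, y, rng)
        x -= eta * m / (np.linalg.norm(m) + 1e-12)
        # Lower level: updated only once every `period` iterations.
        if (t + 1) % period == 0:
            for _ in range(inner_steps):
                y -= gamma * grad_g_y(x, y, rng)
    return x, y
```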
Abstract: Due to the shortage of water resources and increasing water demands, the joint operation of multireservoir systems to balance power generation, ecological protection, and residential water supply has become a critical issue in hydropower management. However, the numerous constraints and the nonlinearity of multiple reservoirs make solving this problem time-consuming. To address this challenge, a deep reinforcement learning approach that incorporates a transformer framework is proposed. The multihead attention mechanism of the encoder effectively extracts information from reservoirs and residential areas, and the multireservoir attention network of the decoder generates suitable operational decisions. The proposed method is applied to Lake Mead and Lake Powell in the Colorado River Basin. The experimental results demonstrate that the transformer-based deep reinforcement learning approach can produce appropriate operational outcomes. Compared to a state-of-the-art method, the operation strategies produced by the proposed approach generate 10.11% more electricity, reduce the amended annual proportional flow deviation by 39.69%, and increase water supply revenue by 4.10%. Consequently, the proposed approach offers an effective method for the multiobjective operation of multireservoir hydropower systems.
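A minimal sketch of the encoder-decoder shape described above is given below, assuming the feature dimensions, the single attention-based decoder layer, and the sigmoid release head; these are illustrative choices, not the paper's architecture.

```python
# Sketch of a transformer encoder with an attention-based decision head
# for per-reservoir release decisions (illustrative only).
import torch
import torch.nn as nn

class ReservoirPolicy(nn.Module):
    def __init__(self, feat_dim=8, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(feat_dim, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        # Decoder: attention over reservoir embeddings, then a per-reservoir release head.
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, 1)

    def forward(self, reservoir_feats):
        # reservoir_feats: (batch, n_reservoirs, feat_dim), e.g. storage, inflow, demand.
        h = self.encoder(self.embed(reservoir_feats))
        q, _ = self.attn(h, h, h)
        release = torch.sigmoid(self.head(q)).squeeze(-1)   # fraction of storage to release
        return release

# Example: decisions for a two-reservoir system (a Lake Mead / Lake Powell analogue).
policy = ReservoirPolicy()
print(policy(torch.randn(1, 2, 8)).shape)   # -> torch.Size([1, 2])
```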