Abstract:The Lipschitz bandit is a key variant of stochastic bandit problems where the expected reward function satisfies a Lipschitz condition with respect to an arm metric space. With its wide-ranging practical applications, various Lipschitz bandit algorithms have been developed, achieving the cumulative regret lower bound of order $\tilde O(T^{(d_z+1)/(d_z+2)})$ over time horizon $T$. Motivated by recent advancements in quantum computing and the demonstrated success of quantum Monte Carlo in simpler bandit settings, we introduce the first quantum Lipschitz bandit algorithms to address the challenges of continuous action spaces and non-linear reward functions. Specifically, we first leverage the elimination-based framework to propose an efficient quantum Lipschitz bandit algorithm named Q-LAE. Next, we present novel modifications to the classical Zooming algorithm, which results in a simple quantum Lipschitz bandit method, Q-Zooming. Both algorithms exploit the computational power of quantum methods to achieve an improved regret bound of $\tilde O(T^{d_z/(d_z+1)})$. Comprehensive experiments further validate our improved theoretical findings, demonstrating superior empirical performance compared to existing Lipschitz bandit methods.
Abstract:We address the challenge of offline reinforcement learning using realistic data, specifically non-expert data collected through sub-optimal behavior policies. Under such circumstance, the learned policy must be safe enough to manage distribution shift while maintaining sufficient flexibility to deal with non-expert (bad) demonstrations from offline data.To tackle this issue, we introduce a novel method called Outcome-Driven Action Flexibility (ODAF), which seeks to reduce reliance on the empirical action distribution of the behavior policy, hence reducing the negative impact of those bad demonstrations.To be specific, a new conservative reward mechanism is developed to deal with distribution shift by evaluating actions according to whether their outcomes meet safety requirements - remaining within the state support area, rather than solely depending on the actions' likelihood based on offline data.Besides theoretical justification, we provide empirical evidence on widely used MuJoCo and various maze benchmarks, demonstrating that our ODAF method, implemented using uncertainty quantification techniques, effectively tolerates unseen transitions for improved "trajectory stitching," while enhancing the agent's ability to learn from realistic non-expert data.
Abstract:There are many deep learning (DL) powered mobile and wearable applications today continuously and unobtrusively sensing the ambient surroundings to enhance all aspects of human lives.To enable robust and private mobile sensing, DL models are often deployed locally on resource-constrained mobile devices using techniques such as model compression or offloading.However, existing methods, either front-end algorithm level (i.e. DL model compression/partitioning) or back-end scheduling level (i.e. operator/resource scheduling), cannot be locally online because they require offline retraining to ensure accuracy or rely on manually pre-defined strategies, struggle with dynamic adaptability.The primary challenge lies in feeding back runtime performance from the back-end level to the front-end level optimization decision. Moreover, the adaptive mobile DL model porting middleware with cross-level co-adaptation is less explored, particularly in mobile environments with diversity and dynamics. In response, we introduce CrowdHMTware, a dynamic context-adaptive DL model deployment middleware for heterogeneous mobile devices. It establishes an automated adaptation loop between cross-level functional components, i.e. elastic inference, scalable offloading, and model-adaptive engine, enhancing scalability and adaptability. Experiments with four typical tasks across 15 platforms and a real-world case study demonstrate that CrowdHMTware can effectively scale DL model, offloading, and engine actions across diverse platforms and tasks. It hides run-time system issues from developers, reducing the required developer expertise.
Abstract:We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.8 trillion diverse and high-quality tokens, followed by Supervised Fine-Tuning and Reinforcement Learning stages to fully harness its capabilities. Comprehensive evaluations reveal that DeepSeek-V3 outperforms other open-source models and achieves performance comparable to leading closed-source models. Despite its excellent performance, DeepSeek-V3 requires only 2.788M H800 GPU hours for its full training. In addition, its training process is remarkably stable. Throughout the entire training process, we did not experience any irrecoverable loss spikes or perform any rollbacks. The model checkpoints are available at https://github.com/deepseek-ai/DeepSeek-V3.
Abstract:Deep learning-based image denoising models demonstrate remarkable performance, but their lack of robustness analysis remains a significant concern. A major issue is that these models are susceptible to adversarial attacks, where small, carefully crafted perturbations to input data can cause them to fail. Surprisingly, perturbations specifically crafted for one model can easily transfer across various models, including CNNs, Transformers, unfolding models, and plug-and-play models, leading to failures in those models as well. Such high adversarial transferability is not observed in classification models. We analyze the possible underlying reasons behind the high adversarial transferability through a series of hypotheses and validation experiments. By characterizing the manifolds of Gaussian noise and adversarial perturbations using the concept of typical set and the asymptotic equipartition property, we prove that adversarial samples deviate slightly from the typical set of the original input distribution, causing the models to fail. Based on these insights, we propose a novel adversarial defense method: the Out-of-Distribution Typical Set Sampling Training strategy (TS). TS not only significantly enhances the model's robustness but also marginally improves denoising performance compared to the original model.
Abstract:Deep learning is reshaping mobile applications, with a growing trend of deploying deep neural networks (DNNs) directly to mobile and embedded devices to address real-time performance and privacy. To accommodate local resource limitations, techniques like weight compression, convolution decomposition, and specialized layer architectures have been developed. However, the \textit{dynamic} and \textit{diverse} deployment contexts of mobile devices pose significant challenges. Adapting deep models to meet varied device-specific requirements for latency, accuracy, memory, and energy is labor-intensive. Additionally, changing processor states, fluctuating memory availability, and competing processes frequently necessitate model re-compression to preserve user experience. To address these issues, we introduce AdaScale, an elastic inference framework that automates the adaptation of deep models to dynamic contexts. AdaScale leverages a self-evolutionary model to streamline network creation, employs diverse compression operator combinations to reduce the search space and improve outcomes, and integrates a resource availability awareness block and performance profilers to establish an automated adaptation loop. Our experiments demonstrate that AdaScale significantly enhances accuracy by 5.09%, reduces training overhead by 66.89%, speeds up inference latency by 1.51 to 6.2 times, and lowers energy costs by 4.69 times.
Abstract:Multiplicative Gamma noise remove is a critical research area in the application of synthetic aperture radar (SAR) imaging, where neural networks serve as a potent tool. However, real-world data often diverges from theoretical models, exhibiting various disturbances, which makes the neural network less effective. Adversarial attacks work by finding perturbations that significantly disrupt functionality of neural networks, as the inherent instability of neural networks makes them highly susceptible. A network designed to withstand such extreme cases can more effectively mitigate general disturbances in real SAR data. In this work, the dissipative nature of diffusion equations is employed to underpin a novel approach for countering adversarial attacks and improve the resistance of real noise disturbance. We propose a tunable, regularized neural network that unrolls a denoising unit and a regularization unit into a single network for end-to-end training. In the network, the denoising unit and the regularization unit are composed of the denoising network and the simplest linear diffusion equation respectively. The regularization unit enhances network stability, allowing post-training time step adjustments to effectively mitigate the adverse impacts of adversarial attacks. The stability and convergence of our model are theoretically proven, and in the experiments, we compare our model with several state-of-the-art denoising methods on simulated images, adversarial samples, and real SAR images, yielding superior results in both quantitative and visual evaluations.
Abstract:SLAM is a fundamental capability of unmanned systems, with LiDAR-based SLAM gaining widespread adoption due to its high precision. Current SLAM systems can achieve centimeter-level accuracy within a short period. However, there are still several challenges when dealing with largescale mapping tasks including significant storage requirements and difficulty of reusing the constructed maps. To address this, we first design an elastic and lightweight map representation called CELLmap, composed of several CELLs, each representing the local map at the corresponding location. Then, we design a general backend including CELL-based bidirectional registration module and loop closure detection module to improve global map consistency. Our experiments have demonstrated that CELLmap can represent the precise geometric structure of large-scale maps of KITTI dataset using only about 60 MB. Additionally, our general backend achieves up to a 26.88% improvement over various LiDAR odometry methods.
Abstract:Image/video coding has been a remarkable research area for both academia and industry for many years. Testing datasets, especially high-quality image/video datasets are desirable for the justified evaluation of coding-related research, practical applications, and standardization activities. We put forward a test dataset namely USTC-TD, which has been successfully adopted in the practical end-to-end image/video coding challenge of the IEEE International Conference on Visual Communications and Image Processing in 2022 and 2023. USTC-TD contains 40 images at 4K spatial resolution and 10 video sequences at 1080p spatial resolution, featuring various content due to the diverse environmental factors (scene type, texture, motion, view) and the designed imaging factors (illumination, shadow, lens). We quantitatively evaluate USTC-TD on different image/video features (spatial, temporal, color, lightness), and compare it with the previous image/video test datasets, which verifies the wider coverage and more diversity of the proposed dataset. We also evaluate both classic standardized and recent learned image/video coding schemes on USTC-TD with PSNR and MS-SSIM, and provide an extensive benchmark for the evaluated schemes. Based on the characteristics and specific design of the proposed test dataset, we analyze the benchmark performance and shed light on the future research and development of image/video coding. All the data are released online: https://esakak.github.io/USTC-TD.
Abstract:Efficient and high-fidelity polarization demosaicking is critical for industrial applications of the division of focal plane (DoFP) polarization imaging systems. However, existing methods have an unsatisfactory balance of speed, accuracy, and complexity. This study introduces a novel polarization demosaicking algorithm that interpolates within a three-stage basic demosaicking framework to obtain DoFP images. Our method incorporates a DoFP low-cost edge-aware technique (DLE) to guide the interpolation process. Furthermore, the inter-channel correlation is used to calibrate the initial estimate in the polarization difference domain. The proposed algorithm is available in both a lightweight and a full version, tailored to different application requirements. Experiments on simulated and real DoFP images demonstrate that our two methods have the highest interpolation accuracy and speed, respectively, and significantly enhance the visuals. Both versions efficiently process a 1024*1024 image on an AMD Ryzen 5600X CPU in 0.1402s and 0.2693s, respectively. Additionally, since our methods only involve computational processes within a 5*5 window, the potential for parallel acceleration on GPUs or FPGAs is highly feasible.