Abstract:Current diffusion-based super-resolution (SR) approaches achieve commendable performance at the cost of high inference overhead. Therefore, distillation techniques are used to accelerate the multi-step teacher model into a one-step student model. Nevertheless, these methods significantly raise training costs and cap the performance of the student model at that of the teacher model. To overcome these challenges, we propose Consistency Trajectory Matching for Super-Resolution (CTMSR), a distillation-free strategy that is able to generate photo-realistic SR results in one step. Concretely, we first formulate a Probability Flow Ordinary Differential Equation (PF-ODE) trajectory to establish a deterministic mapping from low-resolution (LR) images with noise to high-resolution (HR) images. Then we apply the Consistency Training (CT) strategy to directly learn the mapping in one step, eliminating the need for a pre-trained diffusion model. To further enhance the performance and better leverage the ground truth during the training process, we aim to align the distribution of SR results more closely with that of natural images. To this end, we propose to minimize the discrepancy between their respective PF-ODE trajectories from the LR image distribution with our Distribution Trajectory Matching (DTM) loss, resulting in improved realism of the recovered HR images. Comprehensive experimental results demonstrate that the proposed method attains comparable or even superior performance on both synthetic and real datasets while maintaining minimal inference latency.
Abstract:Transformer-based methods have achieved remarkable results in image super-resolution tasks because they can capture non-local dependencies in low-quality input images. However, this feature-intensive modeling approach is computationally expensive because it calculates the similarities between numerous features that are irrelevant to the query features when obtaining attention weights. These unnecessary similarity calculations not only degrade the reconstruction performance but also introduce significant computational overhead. How to accurately identify the features that are important to the current query features and avoid similarity calculations between irrelevant features remains an urgent problem. To address this issue, we propose a novel and effective Progressive Focused Transformer (PFT) that links all isolated attention maps in the network through Progressive Focused Attention (PFA) to focus attention on the most important tokens. PFA not only enables the network to capture more critical similar features, but also significantly reduces the computational cost of the overall network by filtering out irrelevant features before calculating similarities. Extensive experiments demonstrate the effectiveness of the proposed method, achieving state-of-the-art performance on various single image super-resolution benchmarks.
Abstract:The sensitivity of machine learning algorithms to outliers, particularly in high-dimensional spaces, necessitates the development of robust methods. Within the framework of the $\epsilon$-contamination model, where the adversary can inspect and replace up to an $\epsilon$ fraction of the samples, a fundamental open question is determining the optimal rates for robust stochastic convex optimization (robust SCO) given samples under $\epsilon$-contamination. We develop novel algorithms that achieve minimax-optimal excess risk (up to logarithmic factors) under the $\epsilon$-contamination model. Our approach advances beyond existing algorithms, which are not only suboptimal but also constrained by stringent requirements, including Lipschitzness and smoothness conditions on sample functions. Our algorithms achieve optimal rates while removing these restrictive assumptions, and notably, remain effective for nonsmooth but Lipschitz population risks.
Abstract:The advent of ultra-massive multiple-input multiple-output systems holds great promise for next-generation communications, yet their channels exhibit the hybrid far- and near-field beam-squint (HFBS) effect. In this paper, we not only overcome but also harness the HFBS effect to propose an integrated location sensing and communication (ILSC) framework. During the uplink training stage, user terminals (UTs) transmit reference signals for simultaneous channel estimation and location sensing. This stage leverages a specially designed hybrid-field projection matrix to overcome the HFBS effect and estimate the channel in a compressive manner. Subsequently, the scatterers' locations can be sensed from the spherical wavefront based on the channel estimation results. By treating the sensed scatterers as virtual anchors, we employ a weighted least-squares approach to derive the UT's location. Moreover, we propose an iterative refinement mechanism, which utilizes the accurately estimated time difference of arrival of multipath components to enhance location sensing precision. In the following downlink data transmission stage, we leverage the acquired location information to further optimize the hybrid beamformer, which combines beam broadening and focusing to mitigate the spectral efficiency degradation resulting from the HFBS effect. Extensive simulation experiments demonstrate that the proposed ILSC scheme achieves better location sensing and communication performance than conventional methods.
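The weighted least-squares step mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's estimator: it assumes a 2-D geometry, known anchor positions, and direct range measurements to each anchor, none of which are stated in the abstract. The function name `wls_locate` and all parameters are hypothetical. The range equations are linearized by subtracting the first anchor's equation from the others, and the resulting 2x2 weighted normal equations are solved in closed form.

```python
import math


def wls_locate(anchors, ranges, weights):
    """Weighted least-squares 2-D position from anchors and ranges.

    Assumption (not from the abstract): each anchor i at (xi, yi) gives a
    range di to the unknown point. Subtracting the first anchor's equation
    yields the linear system
        2 (a_i - a_0) . p = |a_i|^2 - |a_0|^2 - d_i^2 + d_0^2.
    """
    (x0, y0), d0 = anchors[0], ranges[0]
    # Accumulators for the weighted normal equations (A^T W A) p = A^T W b.
    s11 = s12 = s22 = t1 = t2 = 0.0
    for (xi, yi), di, wi in zip(anchors[1:], ranges[1:], weights[1:]):
        a1 = 2.0 * (xi - x0)
        a2 = 2.0 * (yi - y0)
        b = (xi**2 + yi**2) - (x0**2 + y0**2) - di**2 + d0**2
        s11 += wi * a1 * a1
        s12 += wi * a1 * a2
        s22 += wi * a2 * a2
        t1 += wi * a1 * b
        t2 += wi * a2 * b
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("anchor geometry is degenerate (collinear anchors)")
    # Closed-form solution of the symmetric 2x2 system.
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


# Usage: four hypothetical anchors and noise-free ranges to one point.
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
target = (3.0, 4.0)
ranges = [math.hypot(target[0] - x, target[1] - y) for x, y in anchors]
estimate = wls_locate(anchors, ranges, [1.0] * len(anchors))
```

In this sketch the weights would typically reflect how reliable each anchor's measurement is; how the paper chooses them is not given in the abstract.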
Abstract:With more autonomous vehicles (AVs) sharing roadways with human-driven vehicles (HVs), ensuring safe and courteous maneuvers that respect HVs' behavior becomes increasingly important. To promote both safety and courtesy in AVs' behavior, this paper proposes an extension of a Control Barrier Function (CBF)-inspired risk evaluation framework that considers both noisy observed positions and velocities of surrounding vehicles. The risk perceived by the ego vehicle can be visualized as a risk map that reflects its understanding of the surrounding environment and thus shows potential for facilitating safe and courteous driving. By incorporating the risk evaluation framework into the Model Predictive Control (MPC) scheme, we propose a Courteous MPC for the ego AV to generate courteous behaviors that 1) reduce the overall risk imposed on other vehicles and 2) respect the hard safety constraints and the original objective for efficiency. We demonstrate the performance of the proposed Courteous MPC via theoretical analysis and simulation experiments.
Abstract:Along with the rise of generative artificial intelligence (AI), its potential for solving conventional challenges in wireless communications has also surfaced. Inspired by this trend, we investigate the application of advanced diffusion models (DMs), a representative class of generative AI models, to high-dimensional wireless channel estimation. By capturing the structure of multiple-input multiple-output (MIMO) wireless channels via a deep generative prior encoded by DMs, we develop a novel posterior inference method for channel reconstruction. We further adapt the proposed method to recover channel information from low-resolution quantized measurements. Additionally, to enhance the over-the-air viability, we integrate the DM with the unsupervised Stein's unbiased risk estimator to enable learning from noisy observations and circumvent the requirement for ground-truth channel data, which are rarely available in practice. Results reveal that the proposed estimator achieves high-fidelity channel recovery while reducing estimation latency by a factor of 10 compared to state-of-the-art schemes, facilitating real-time implementation. Moreover, our method outperforms existing estimators while reducing the pilot overhead by half, showcasing its scalability to ultra-massive antenna arrays.
Abstract:We present foundation language models developed to power Apple Intelligence features, including a ~3 billion parameter model designed to run efficiently on devices and a large server-based language model designed for Private Cloud Compute. These models are designed to perform a wide range of tasks efficiently, accurately, and responsibly. This report describes the model architecture, the data used to train the model, the training process, how the models are optimized for inference, and the evaluation results. We highlight our focus on Responsible AI and how the principles are applied throughout the model development.
Abstract:Massive multiple-input multiple-output (MIMO) technology has significantly enhanced spectral and power efficiency in cellular communications and is expected to further evolve towards extra-large-scale MIMO. However, centralized processing for massive MIMO faces practical obstacles, including excessive computational complexity and a substantial volume of baseband data to be exchanged. To address these challenges, decentralized baseband processing has emerged as a promising solution. This approach involves partitioning the antenna array into clusters with dedicated computing hardware for parallel processing. In this paper, we investigate the gradient-based Markov chain Monte Carlo (MCMC) method -- an advanced MIMO detection technique known for its near-optimal performance in centralized implementation -- within the context of a decentralized baseband processing architecture. This decentralized design mitigates the computation burden at a single processing unit by utilizing computational resources in a distributed and parallel manner. Additionally, we integrate the mini-batch stochastic gradient descent method into the proposed decentralized detector, achieving remarkable performance with high efficiency. Simulation results demonstrate substantial performance gains of the proposed method over existing decentralized detectors across various scenarios. Moreover, complexity analysis reveals the advantages of the proposed decentralized strategy in terms of computation delay and interconnection bandwidth when compared to conventional centralized detectors.
Abstract:We revisit the problem of federated learning (FL) with private data from people who do not trust the server or other silos/clients. In this context, every silo (e.g., a hospital) has data from several people (e.g., patients) and needs to protect the privacy of each person's data (e.g., health records), even if the server and/or other silos try to uncover this data. Inter-Silo Record-Level Differential Privacy (ISRL-DP) prevents each silo's data from being leaked by requiring that silo i's communications satisfy item-level differential privacy. Prior work (arXiv:2203.06735) characterized the optimal excess risk bounds for ISRL-DP algorithms with homogeneous (i.i.d.) silo data and convex loss functions. However, two important questions were left open: (1) Can the same excess risk bounds be achieved with heterogeneous (non-i.i.d.) silo data? (2) Can the optimal risk bounds be achieved with fewer communication rounds? In this paper, we give positive answers to both questions. We provide novel ISRL-DP FL algorithms that achieve the optimal excess risk bounds in the presence of heterogeneous silo data. Moreover, our algorithms are more communication-efficient than the prior state-of-the-art. For smooth loss functions, our algorithm achieves the optimal excess risk bound and has communication complexity that matches the non-private lower bound. Additionally, our algorithms are more computationally efficient than the previous state-of-the-art.
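To make the requirement that each silo's communications satisfy item-level differential privacy more concrete, the following minimal sketch shows the common clip-and-add-Gaussian-noise pattern applied inside one silo before anything is sent out. It is an illustration under assumptions, not the algorithm of the paper above: the function names `clip` and `noisy_silo_update` and the parameters `max_norm` and `sigma` are hypothetical, and choosing `sigma` to meet a particular privacy budget is outside the sketch.

```python
import math
import random


def clip(vec, max_norm):
    """Scale vec so its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]


def noisy_silo_update(per_record_grads, max_norm, sigma, rng):
    """Average of clipped per-record gradients plus Gaussian noise.

    Clipping bounds the influence of any single record; the noise
    (standard deviation sigma * max_norm per coordinate, added to the sum)
    masks that bounded influence. Calibrating sigma to a privacy budget
    is NOT done here and is an assumption left to the reader.
    """
    clipped = [clip(g, max_norm) for g in per_record_grads]
    n = len(clipped)
    dim = len(clipped[0])
    summed = [sum(g[j] for g in clipped) for j in range(dim)]
    return [(summed[j] + rng.gauss(0.0, sigma * max_norm)) / n
            for j in range(dim)]


# Usage: one silo holding two records, with a seeded generator.
rng = random.Random(0)
grads = [[3.0, 4.0], [0.0, 1.0]]
update = noisy_silo_update(grads, max_norm=1.0, sigma=0.5, rng=rng)
```

Only `update` would leave the silo in this sketch; the raw per-record gradients stay local.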
Abstract:The discrete nature of transmitted symbols poses challenges for achieving optimal detection in multiple-input multiple-output (MIMO) systems associated with a large number of antennas. Recently, the combination of two powerful machine learning methods, Markov chain Monte Carlo (MCMC) sampling and gradient descent, has emerged as a highly efficient solution to address this issue. However, existing gradient-based MCMC detectors are heuristically designed and thus are theoretically untenable. To bridge this gap, we introduce a novel sampling algorithm tailored for discrete spaces. This algorithm leverages gradients from the underlying continuous spaces for acceleration while maintaining the validity of probabilistic sampling. We prove the convergence of this method and also analyze its convergence rate using both MCMC theory and empirical diagnostics. On this basis, we develop a MIMO detector that precisely samples from the target discrete distribution and generates posterior Bayesian estimates using these samples, whose performance is thereby theoretically guaranteed. Furthermore, our proposed detector is highly parallelizable and scalable to large MIMO dimensions, positioning it as a compelling candidate for next-generation wireless networks. Simulation results show that our detector achieves near-optimal performance, significantly outperforms state-of-the-art baselines, and showcases resilience to various system setups.