Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wei Gong

Towards Understanding the Optimization Mechanisms in Deep Learning

Mar 29, 2025

Binchuan Qi, Wei Gong, Li Li

Abstract:In this paper, we adopt a probability distribution estimation perspective to explore the optimization mechanisms of supervised classification using deep neural networks. We demonstrate that, when employing the Fenchel-Young loss, despite the non-convex nature of the fitting error with respect to the model's parameters, global optimal solutions can be approximated by simultaneously minimizing both the gradient norm and the structural error. The former can be controlled through gradient descent algorithms. For the latter, we prove that it can be managed by increasing the number of parameters and ensuring parameter independence, thereby providing theoretical insights into mechanisms such as over-parameterization and random initialization. Ultimately, the paper validates the key conclusions of the proposed method through empirical results, illustrating its practical effectiveness.

Via

Access Paper or Ask Questions

Injecting Imbalance Sensitivity for Multi-Task Learning

Mar 11, 2025

Zhipeng Zhou, Liu Liu, Peilin Zhao, Wei Gong

Abstract:Multi-task learning (MTL) has emerged as a promising approach for deploying deep learning models in real-life applications. Recent studies have proposed optimization-based learning paradigms to establish task-shared representations in MTL. However, our paper empirically argues that these studies, specifically gradient-based ones, primarily emphasize the conflict issue while neglecting the potentially more significant impact of imbalance/dominance in MTL. In line with this perspective, we enhance the existing baseline method by injecting imbalance-sensitivity through the imposition of constraints on the projected norms. To demonstrate the effectiveness of our proposed IMbalance-sensitive Gradient (IMGrad) descent method, we evaluate it on multiple mainstream MTL benchmarks, encompassing supervised learning tasks as well as reinforcement learning. The experimental results consistently demonstrate competitive performance.

* 9 pages, 6 figures, 4 tables

Via

Access Paper or Ask Questions

LT-DARTS: An Architectural Approach to Enhance Deep Long-Tailed Learning

Nov 13, 2024

Yuhan Pan, Yanan Sun, Wei Gong

Abstract:Deep long-tailed recognition has been widely studied to address the issue of imbalanced data distributions in real-world scenarios. However, there has been insufficient focus on the design of neural architectures, despite empirical evidence suggesting that architecture can significantly impact performance. In this paper, we attempt to mitigate long-tailed issues through architectural improvements. To simplify the design process, we utilize Differential Architecture Search (DARTS) to achieve this goal. Unfortunately, existing DARTS methods struggle to perform well in long-tailed scenarios. To tackle this challenge, we introduce Long-Tailed Differential Architecture Search (LT-DARTS). Specifically, we conduct extensive experiments to explore architectural components that demonstrate better performance on long-tailed data and propose a new search space based on our observations. This ensures that the architecture obtained through our search process incorporates superior components. Additionally, we propose replacing the learnable linear classifier with an Equiangular Tight Frame (ETF) classifier to further enhance our method. This classifier effectively alleviates the biased search process and prevents performance collapse. Extensive experimental evaluations demonstrate that our approach consistently improves upon existing methods from an orthogonal perspective and achieves state-of-the-art results with simple enhancements.

Via

Access Paper or Ask Questions

Offline Inverse Constrained Reinforcement Learning for Safe-Critical Decision Making in Healthcare

Oct 10, 2024

Nan Fang, Guiliang Liu, Wei Gong

Figure 1 for Offline Inverse Constrained Reinforcement Learning for Safe-Critical Decision Making in Healthcare

Figure 2 for Offline Inverse Constrained Reinforcement Learning for Safe-Critical Decision Making in Healthcare

Figure 3 for Offline Inverse Constrained Reinforcement Learning for Safe-Critical Decision Making in Healthcare

Figure 4 for Offline Inverse Constrained Reinforcement Learning for Safe-Critical Decision Making in Healthcare

Abstract:Reinforcement Learning (RL) applied in healthcare can lead to unsafe medical decisions and treatment, such as excessive dosages or abrupt changes, often due to agents overlooking common-sense constraints. Consequently, Constrained Reinforcement Learning (CRL) is a natural choice for safe decisions. However, specifying the exact cost function is inherently difficult in healthcare. Recent Inverse Constrained Reinforcement Learning (ICRL) is a promising approach that infers constraints from expert demonstrations. ICRL algorithms model Markovian decisions in an interactive environment. These settings do not align with the practical requirement of a decision-making system in healthcare, where decisions rely on historical treatment recorded in an offline dataset. To tackle these issues, we propose the Constraint Transformer (CT). Specifically, 1) we utilize a causal attention mechanism to incorporate historical decisions and observations into the constraint modeling, while employing a Non-Markovian layer for weighted constraints to capture critical states. 2) A generative world model is used to perform exploratory data augmentation, enabling offline RL methods to simulate unsafe decision sequences. In multiple medical scenarios, empirical results demonstrate that CT can capture unsafe states and achieve strategies that approximate lower mortality rates, reducing the occurrence probability of unsafe behaviors.

Via

Access Paper or Ask Questions

Adaptive Hopping for Bluetooth Backscatter using Commodity Edges

Oct 10, 2024

Maoran Jiang, Wei Gong

Figure 1 for Adaptive Hopping for Bluetooth Backscatter using Commodity Edges

Figure 2 for Adaptive Hopping for Bluetooth Backscatter using Commodity Edges

Figure 3 for Adaptive Hopping for Bluetooth Backscatter using Commodity Edges

Figure 4 for Adaptive Hopping for Bluetooth Backscatter using Commodity Edges

Abstract:Channel hopping is essential to BLE backscatter as commodity BLE switches channels frequently during transmission to overcome interferences in busy radio environments. Existing Bluetooth backscatter systems, however, suffer from slow responses to excitation change and poor control of the target channel. To address these issues, this paper presents ChannelDance, a BLE backscatter system that utilizes a low-latency edge server to achieve fast and accurate hopping. Specifically, we show that the backscattered channel relies on the excitation channel and tag toggling frequency. By identifying excitation frequency, the tag can achieve accurate hopping with a dynamically configured clock. Further, we introduce a low-latency architecture, which is centralized, asynchronous, and equipped with high-speed interfaces. This architecture supports the tag to respond to excitation changes fastly. We prototype the ChannelDance tag with FPGA and build the low latency edge server with commodity MCU and off-the-shelf BLE and WiFi radios. Experimental results show that ChannelDance can realize 40 to 40 channel mapping with a median success rate of 93% and achieve 3.5x goodput gain with channel optimization. Moreover, with adaptive hopping, the ChannelDance tag successfully establishes a connection with commodity BLE.

Via

Access Paper or Ask Questions

ESP-PCT: Enhanced VR Semantic Performance through Efficient Compression of Temporal and Spatial Redundancies in Point Cloud Transformers

Sep 02, 2024

Luoyu Mei, Shuai Wang, Yun Cheng, Ruofeng Liu, Zhimeng Yin, Wenchao Jiang, Wei Gong

Figure 1 for ESP-PCT: Enhanced VR Semantic Performance through Efficient Compression of Temporal and Spatial Redundancies in Point Cloud Transformers

Figure 2 for ESP-PCT: Enhanced VR Semantic Performance through Efficient Compression of Temporal and Spatial Redundancies in Point Cloud Transformers

Figure 3 for ESP-PCT: Enhanced VR Semantic Performance through Efficient Compression of Temporal and Spatial Redundancies in Point Cloud Transformers

Figure 4 for ESP-PCT: Enhanced VR Semantic Performance through Efficient Compression of Temporal and Spatial Redundancies in Point Cloud Transformers

Abstract:Semantic recognition is pivotal in virtual reality (VR) applications, enabling immersive and interactive experiences. A promising approach is utilizing millimeter-wave (mmWave) signals to generate point clouds. However, the high computational and memory demands of current mmWave point cloud models hinder their efficiency and reliability. To address this limitation, our paper introduces ESP-PCT, a novel Enhanced Semantic Performance Point Cloud Transformer with a two-stage semantic recognition framework tailored for VR applications. ESP-PCT takes advantage of the accuracy of sensory point cloud data and optimizes the semantic recognition process, where the localization and focus stages are trained jointly in an end-to-end manner. We evaluate ESP-PCT on various VR semantic recognition conditions, demonstrating substantial enhancements in recognition efficiency. Notably, ESP-PCT achieves a remarkable accuracy of 93.2% while reducing the computational requirements (FLOPs) by 76.9% and memory usage by 78.2% compared to the existing Point Transformer model simultaneously. These underscore ESP-PCT's potential in VR semantic recognition by achieving high accuracy and reducing redundancy. The code and data of this project are available at \url{https://github.com/lymei-SEU/ESP-PCT}.

* Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI 2024

Via

Access Paper or Ask Questions

Double-decker: Productive Backscatter Communication Using a Single Commodity Receiver

Aug 29, 2024

Qiwei Wang, Wei Gong

Figure 1 for Double-decker: Productive Backscatter Communication Using a Single Commodity Receiver

Figure 2 for Double-decker: Productive Backscatter Communication Using a Single Commodity Receiver

Figure 3 for Double-decker: Productive Backscatter Communication Using a Single Commodity Receiver

Figure 4 for Double-decker: Productive Backscatter Communication Using a Single Commodity Receiver

Abstract:Backscatter communication has attracted significant attention for Internet-of-Things applications due to its ultra-low-power consumption. The state-of-the-art backscatter systems no longer require dedicated carrier generators and leverage ambient signals as carriers. However, there is an emerging challenge: most prior systems need dual receivers to capture the original and backscattered signals at the same time for tag data demodulation. This is not conducive to the widespread deployment of backscatter communication. To address this problem, we present double-decker, a novel backscatter system that only requires a single commercial device for backscatter communication. The key technology of double-decker is to divide the carrier OFDM symbols into two parts, which are pilot symbols and data symbols. Pilot symbols can be used as reference signals for tag data demodulation, thus getting rid of the dependence on the dual receiver structure. We have built an FPGA prototype and conducted extensive experiments. Empirical results show that when the excitation signal is 802.11g, double-decker achieves a tag data rate of 35.2kbps and a productive data rate of 38kbps, respectively. The communication range of double-decker is up to 28m in LOS deployment and 24m in NLOS deployment.

Via

Access Paper or Ask Questions

SFPrompt: Communication-Efficient Split Federated Fine-Tuning for Large Pre-Trained Models over Resource-Limited Devices

Jul 24, 2024

Linxiao Cao, Yifei Zhu, Wei Gong

Abstract:Large pre-trained models have exhibited remarkable achievements across various domains. The substantial training costs associated with these models have led to wide studies of fine-tuning for effectively harnessing their capabilities in solving downstream tasks. Yet, conventional fine-tuning approaches become infeasible when the model lacks access to downstream data due to privacy concerns. Naively integrating fine-tuning approaches with the emerging federated learning frameworks incurs substantial communication overhead and exerts high demand on local computing resources, making it impractical for common resource-limited devices. In this paper, we introduce SFPrompt, an innovative privacy-preserving fine-tuning method tailored for the federated setting where direct uploading of raw data is prohibited and local devices are resource-constrained to run a complete pre-trained model. In essence, SFPrompt judiciously combines split learning with federated learning to handle these challenges. Specifically, the pre-trained model is first partitioned into client and server components, thereby streamlining the client-side model and substantially alleviating computational demands on local resources. SFPrompt then introduces soft prompts into the federated model to enhance the fine-tuning performance. To further reduce communication costs, a novel dataset pruning algorithm and a local-loss update strategy are devised during the fine-tuning process. Extensive experiments demonstrate that SFPrompt delivers competitive performance as the federated full fine-tuning approach while consuming a mere 0.46% of local computing resources and incurring 53% less communication cost.

Via

Access Paper or Ask Questions

General Distribution Learning: A theoretical framework for Deep Learning

Jun 12, 2024

Binchuan Qi, Li Li, Wei Gong

Abstract:There remain numerous unanswered research questions on deep learning (DL) within the classical learning theory framework. These include the remarkable generalization capabilities of overparametrized neural networks (NNs), the efficient optimization performance despite non-convexity of objectives, the mechanism of flat minima for generalization, and the exceptional performance of deep architectures in solving physical problems. This paper introduces General Distribution Learning (GD Learning), a novel theoretical learning framework designed to address a comprehensive range of machine learning and statistical tasks, including classification, regression and parameter estimation. Departing from traditional statistical machine learning, GD Learning focuses on the true underlying distribution. In GD Learning, learning error, corresponding to the expected error in classical statistical learning framework, is divided into fitting errors due to models and algorithms, as well as sampling errors introduced by limited sampling data. The framework significantly incorporates prior knowledge, especially in scenarios characterized by data scarcity, thereby enhancing performance. Within the GD Learning framework, we demonstrate that the global optimal solutions in non-convex optimization can be approached by minimizing the gradient norm and the non-uniformity of the eigenvalues of the model's Jacobian matrix. This insight leads to the development of the gradient structure control algorithm. GD Learning also offers fresh insights into the questions on deep learning, including overparameterization and non-convex optimization, bias-variance trade-off, and the mechanism of flat minima.

* arXiv admin note: text overlap with arXiv:2105.04026 by other authors

Via

Access Paper or Ask Questions

Error Bounds of Supervised Classification from Information-Theoretic Perspective

Jun 07, 2024

Binchuan Qi, Wei Gong, Li Li

Abstract:There remains a list of unanswered research questions on deep learning (DL), including the remarkable generalization power of overparametrized neural networks, the efficient optimization performance despite the non-convexity, and the mechanisms behind flat minima in generalization. In this paper, we adopt an information-theoretic perspective to explore the theoretical foundations of supervised classification using deep neural networks (DNNs). Our analysis introduces the concepts of fitting error and model risk, which, together with generalization error, constitute an upper bound on the expected risk. We demonstrate that the generalization errors are bounded by the complexity, influenced by both the smoothness of distribution and the sample size. Consequently, task complexity serves as a reliable indicator of the dataset's quality, guiding the setting of regularization hyperparameters. Furthermore, the derived upper bound fitting error links the back-propagated gradient, Neural Tangent Kernel (NTK), and the model's parameter count with the fitting error. Utilizing the triangle inequality, we establish an upper bound on the expected risk. This bound offers valuable insights into the effects of overparameterization, non-convex optimization, and the flat minima in DNNs.Finally, empirical verification confirms a significant positive correlation between the derived theoretical bounds and the practical expected risk, confirming the practical relevance of the theoretical findings.

Via

Access Paper or Ask Questions