University of Toronto
Abstract: Contrastive learning is a model pre-training technique that first creates similar views of the original data and then encourages the data and its corresponding views to be close in the embedding space. Contrastive learning has achieved success on image and natural language data, thanks to domain-specific augmentation techniques that are both intuitive and effective. In the tabular domain, however, the predominant augmentation technique for creating views is corrupting tabular entries by swapping values, which is not as sound or effective. We propose a simple yet powerful improvement to this augmentation technique: corrupting tabular data conditioned on class identity. Specifically, when corrupting a tabular entry of an anchor row, instead of sampling a replacement value uniformly from the same feature column over the entire table, we sample only from rows identified as belonging to the same class as the anchor row. We assume a semi-supervised learning setting and adopt pseudo-labeling to obtain class identities for all table rows. We also explore the novel idea of selecting the features to be corrupted based on feature correlation structures. Extensive experiments show that the proposed approach consistently outperforms the conventional corruption method on tabular data classification tasks. Our code is available at https://github.com/willtop/Tabular-Class-Conditioned-SSL.
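A minimal sketch of the class-conditioned corruption step, assuming the table is a list of numeric rows, that every row already carries a (pseudo-)label, and that the columns to corrupt have been chosen beforehand (the correlation-based feature selection mentioned above is not reproduced). The helper name `class_conditioned_corrupt` is hypothetical, and this is not the code from the linked repository.

```python
import random
from typing import List, Sequence


def class_conditioned_corrupt(
    table: List[List[float]],
    labels: Sequence[int],
    anchor_idx: int,
    corrupt_cols: Sequence[int],
    rng: random.Random,
) -> List[float]:
    """Return a corrupted view of row `anchor_idx` (hypothetical helper).

    Each column in `corrupt_cols` is overwritten with the value from the same
    column of a donor row drawn uniformly from rows whose (pseudo-)label
    matches the anchor's label, rather than from the whole table.
    """
    anchor_label = labels[anchor_idx]
    # Candidate donors: every row sharing the anchor's (pseudo-)label.
    same_class = [i for i, y in enumerate(labels) if y == anchor_label]
    view = list(table[anchor_idx])  # copy so the original table is untouched
    for col in corrupt_cols:
        donor = rng.choice(same_class)  # independent donor per corrupted column
        view[col] = table[donor][col]
    return view


if __name__ == "__main__":
    table = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
    pseudo_labels = [0, 0, 1, 1]  # e.g. produced by a pseudo-labeling step
    rng = random.Random(0)
    # Column 1 of row 0 is replaced by a value taken from a class-0 row only.
    print(class_conditioned_corrupt(table, pseudo_labels, 0, [1], rng))
```

Replacing `same_class` with `range(len(table))` recovers the conventional corruption that draws donors from the entire table; the class conditioning only restricts the donor pool.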
Abstract: In most applications of neural networks to mathematical optimization, a dedicated model is trained for each specific optimization objective. However, in many scenarios, several distinct yet correlated objectives or tasks need to be optimized on the same set of problem inputs. Instead of training a separate neural network for each problem, it would be more efficient to exploit the correlations between these objectives and to train multiple neural network models with shared model parameters and feature representations. To achieve this, this paper first establishes the concept of common information, i.e., the shared knowledge required for solving the correlated tasks, and then proposes a novel approach to model training that adds to the model an additional reconstruction stage associated with a new reconstruction loss. This loss reconstructs the common information from a selected hidden layer in the model. The proposed approach encourages the learned features to be general and transferable, so they can be readily used for efficient transfer learning. Three applications are studied in numerical simulations: transfer learning for classifying MNIST handwritten digits, power allocation in device-to-device wireless networks, and downlink beamforming and localization in multiple-input-single-output networks. Simulation results suggest that the proposed approach is highly efficient in terms of data and model complexity, is resilient to overfitting, and achieves competitive performance.
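The training objective described above can be illustrated with a toy forward pass: a task loss on the model output plus a weighted reconstruction loss that maps a selected hidden layer back to the common information. This is a hedged sketch; the single ReLU hidden layer, the linear heads, the mean-squared-error losses, and the weight `lam` are illustrative assumptions rather than the paper's architecture, and all helper names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def joint_loss(
    x: Vector,
    task_target: Vector,
    common_info: Vector,
    W_enc: Matrix,
    W_task: Matrix,
    W_rec: Matrix,
    lam: float,
) -> float:
    """Task loss plus a weighted reconstruction loss from one hidden layer."""
    h = [max(0.0, v) for v in matvec(W_enc, x)]  # selected hidden layer (ReLU)
    y_hat = matvec(W_task, h)  # task-specific output head
    c_hat = matvec(W_rec, h)  # additional reconstruction stage
    return mse(y_hat, task_target) + lam * mse(c_hat, common_info)
```

For example, `joint_loss([1.0, -2.0], [1.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], 0.5)` evaluates to `1.25` (task loss 1.0 plus 0.5 times a reconstruction loss of 0.5). In training, gradients of this sum would update the shared encoder so that its features also carry the common information.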
Abstract: Current spectral compressed sensing methods via Hankel matrix completion employ symmetric factorization to characterize the low-rank property of the Hankel matrix. However, previous nonconvex gradient methods use only asymmetric factorization to achieve spectral compressed sensing. In this paper, we propose a novel nonconvex projected gradient descent method for spectral compressed sensing via symmetric factorization, named Symmetric Hankel Projected Gradient Descent (SHGD), which updates only one matrix and avoids a balancing regularization term. SHGD reduces the computation and storage costs by about half compared to the prior gradient method based on asymmetric factorization. Moreover, the symmetric factorization employed in our work differs fundamentally from prior low-rank factorization models, introducing a new factorization ambiguity under complex orthogonal transformations. Novel distance metrics are designed for our factorization method, and a linear convergence guarantee to the desired signal is established with $O(r^2\log(n))$ observations. Numerical simulations demonstrate the superior performance of the proposed SHGD method in phase transitions and computational efficiency compared to state-of-the-art methods.
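The symmetric structure that SHGD exploits can be checked numerically. For a spectrally sparse signal $x_t = \sum_{k=1}^{r} a_k z_k^t$, the square Hankel matrix $H_{ij} = x_{i+j}$ equals $Z Z^T$ with $Z_{ik} = \sqrt{a_k}\, z_k^i$, using a plain (non-conjugate) transpose, and $(ZQ)(ZQ)^T = Z Z^T$ for any complex orthogonal $Q$ ($Q Q^T = I$). The sketch below only verifies this factorization and its ambiguity; it does not implement the projected gradient iterations, the sampling model, or the distance metrics from the paper, and all function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence

CMatrix = List[List[complex]]


def spectral_signal(amps: Sequence[complex], freqs: Sequence[float], n: int) -> List[complex]:
    """x[t] = sum_k a_k * z_k**t with poles z_k = exp(2j*pi*f_k)."""
    poles = [cmath.exp(2j * math.pi * f) for f in freqs]
    return [sum(a * z ** t for a, z in zip(amps, poles)) for t in range(n)]


def hankel(x: Sequence[complex]) -> CMatrix:
    """Square Hankel lift H[i][j] = x[i + j]; requires an odd-length signal."""
    if len(x) % 2 == 0:
        raise ValueError("use an odd-length signal for a square Hankel matrix")
    m = (len(x) + 1) // 2
    return [[x[i + j] for j in range(m)] for i in range(m)]


def symmetric_factor(amps: Sequence[complex], freqs: Sequence[float], m: int) -> CMatrix:
    """m x r factor with Z[i][k] = sqrt(a_k) * z_k**i."""
    poles = [cmath.exp(2j * math.pi * f) for f in freqs]
    return [[cmath.sqrt(a) * z ** i for a, z in zip(amps, poles)] for i in range(m)]


def times_transpose(Z: CMatrix) -> CMatrix:
    """Z @ Z^T with a plain (non-conjugate) transpose."""
    m, r = len(Z), len(Z[0])
    return [[sum(Z[i][k] * Z[j][k] for k in range(r)) for j in range(m)] for i in range(m)]


def matmul(A: CMatrix, B: CMatrix) -> CMatrix:
    """Dense complex matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def complex_rotation(theta: complex) -> CMatrix:
    """2 x 2 rotation; for complex theta it satisfies Q Q^T = I but is not unitary."""
    c, s = cmath.cos(theta), cmath.sin(theta)
    return [[c, -s], [s, c]]


if __name__ == "__main__":
    amps, freqs = [1.0, 0.5 + 0.5j], [0.1, 0.35]
    H = hankel(spectral_signal(amps, freqs, 9))  # 5 x 5, rank 2
    Z = symmetric_factor(amps, freqs, 5)  # 5 x 2
    ZQ = matmul(Z, complex_rotation(0.3 + 0.7j))  # same product, different factor
    for G in (times_transpose(Z), times_transpose(ZQ)):
        print(max(abs(H[i][j] - G[i][j]) for i in range(5) for j in range(5)))
```

Both printed errors should be at rounding level. The rotation with a non-real angle is orthogonal but not unitary, which is why this ambiguity is distinct from the unitary ambiguity of a Hermitian factorization $Z Z^H$.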
Abstract: The generative adversarial network (GAN) is a widely adopted machine-learning framework for applications such as generating high-quality images, video, and audio content. However, training a GAN can become computationally expensive for large neural networks. In this work, we propose a hybrid quantum-classical architecture for improving GANs (denoted QC-GAN). Its performance was examined numerically by benchmarking against a classical GAN using MindSpore Quantum on the task of handwritten image generation. The generator of the QC-GAN consists of a quantum variational circuit together with a one-layer neural network, and the discriminator consists of a traditional neural network. Leveraging the entangling and expressive power of quantum circuits, our hybrid architecture achieved better performance (Frechet Inception Distance) than the classical GAN, with far fewer training parameters and fewer iterations to convergence. We also demonstrate the superiority of QC-GAN over an alternative quantum GAN, namely pathGAN, which can hardly generate 16$\times$16 or larger images. This work demonstrates the value of combining ideas from quantum computing with machine learning for both Quantum-for-AI and AI-for-Quantum.
Abstract: Molecular docking is an important tool for structure-based drug design that improves the efficiency of drug development. The complex and dynamic binding processes between proteins and small molecules require searching and sampling over a wide spatial range. Traditional docking, which searches for possible binding sites and conformations, is computationally complex and performs poorly in blind docking. Quantum-inspired algorithms that combine quantum properties with annealing show great advantages in solving combinatorial optimization problems. Inspired by this, we improve blind docking by combining a quantum-inspired algorithm with gradients learned by deep learning in the encoded molecular space. Numerical simulations show that our method outperforms traditional docking algorithms and deep learning-based algorithms by over 10\%. Compared to DiffDock, the current state-of-the-art deep learning-based docking algorithm, our method improves the Top-1 success rate (RMSD<2) from 33\% to 35\% under the same setup. In particular, a 6\% improvement is achieved in the high-precision region (RMSD<1) on molecular data unseen by DiffDock, which demonstrates the good generalization of our method.
Abstract: Web applications are increasingly becoming the primary platform for AI service delivery, making in-browser deep learning (DL) inference more prominent. However, current in-browser inference systems fail to effectively utilize advanced web programming techniques or to customize kernels for various client devices, leading to suboptimal performance. To address these issues, this paper presents nn-JIT.web, the first in-browser inference system that enables just-in-time (JIT) auto-generation of optimized kernels for both CPUs and GPUs during inference. The system achieves this with two novel web programming techniques that significantly reduce kernel generation time compared to other tensor compilers such as TVM, while maintaining or even improving performance. The first technique, Tensor-Web Compiling Co-Design, lowers compilation costs by unifying tensor and web compiling and eliminating redundant and ineffective compiling passes. The second technique, Web-Specific Lite Kernel Optimization Space Design, reduces kernel tuning costs by focusing on web programming requirements and efficient hardware resource utilization, limiting the optimization space to only dozens. nn-JIT.web is evaluated on modern transformer models across a range of client devices, including mainstream CPUs and GPUs from ARM, Intel, AMD, and Nvidia. Results show that nn-JIT.web achieves up to an 8.2x speedup within 30 seconds compared to the baselines across various models.
Abstract: In reinforcement learning, the objective is almost always defined as a \emph{cumulative} function over the rewards along the process. However, there are many optimal control and reinforcement learning problems in various application fields, especially in communications and networking, where the objectives are not naturally expressed as summations of the rewards. In this paper, we recognize the prevalence of non-cumulative objectives in various problems and propose a modification to existing algorithms for optimizing such objectives. Specifically, we dive into the fundamental building block of many optimal control and reinforcement learning algorithms: the Bellman optimality equation. To optimize a non-cumulative objective, we replace the original summation operation in the Bellman update rule with a generalized operation corresponding to the objective. Furthermore, we provide sufficient conditions on the form of the generalized operation, as well as assumptions on the Markov decision process, under which convergence of the generalized Bellman updates to the global optimum can be guaranteed. We demonstrate the idea experimentally with the bottleneck objective, i.e., an objective determined by the minimum reward along the process, on classical optimal control and reinforcement learning tasks, as well as on two network routing problems that maximize flow rates.
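The generalized Bellman update can be sketched as value iteration in which the usual $r + V(s')$ is replaced by a pluggable `combine` operation. With `combine=min` and a terminal value of $+\infty$, the update $V(s) \leftarrow \max_a \min(r(s,a), V(s'))$ computes the bottleneck (widest-path) value, which plays the role of a flow rate in a routing setting. This is a minimal sketch assuming deterministic transitions, no discounting, and a small hypothetical graph; it does not check the paper's sufficient conditions for convergence, and the function name is hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

# graph[s] = [(next_state, reward), ...]; deterministic transitions, no discount.
Graph = Dict[str, List[Tuple[str, float]]]


def generalized_value_iteration(
    graph: Graph,
    terminal: str,
    combine: Callable[[float, float], float],
    terminal_value: float,
    max_iters: int = 1000,
) -> Dict[str, float]:
    """Synchronous Bellman optimality updates with `combine` in place of r + V(s')."""
    V = {s: -math.inf for s in graph}
    V[terminal] = terminal_value
    for _ in range(max_iters):
        new_V = dict(V)
        for s, edges in graph.items():
            if s == terminal or not edges:
                continue
            new_V[s] = max(combine(r, V[nxt]) for nxt, r in edges)
        if new_V == V:  # fixed point reached
            break
        V = new_V
    return V


if __name__ == "__main__":
    # Hypothetical routing graph; rewards are link capacities.
    graph: Graph = {
        "A": [("B", 5.0), ("C", 3.0)],
        "B": [("D", 2.0)],
        "C": [("D", 4.0)],
        "D": [],
    }
    widest = generalized_value_iteration(graph, "D", min, math.inf)
    print(widest["A"])  # 3.0: bottleneck of the route A -> C -> D
    cumulative = generalized_value_iteration(graph, "D", lambda r, v: r + v, 0.0)
    print(cumulative["A"])  # 7.0: both routes sum to 7
```

Swapping `combine` for addition (terminal value 0) recovers ordinary cumulative value iteration, under which both routes from `A` score the same; under the bottleneck objective the route through `C` is strictly better.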
Abstract: This paper proposes a paradigm of uncertainty injection for training deep learning models to solve robust optimization problems. Most existing studies on deep learning focus on model learning capability while assuming that the quality and accuracy of the input data can be guaranteed. However, in realistic applications of deep learning to optimization problems, the accuracy of the inputs, which in this case are the problem parameters, plays a large role. This is because in many situations it is costly or sometimes impossible to obtain the problem parameters accurately; correspondingly, it is highly desirable to develop learning algorithms that can account for the uncertainties in the input and produce solutions that are robust against them. This paper presents a novel uncertainty injection scheme for training machine learning models that implicitly account for the uncertainties and produce statistically robust solutions. We further identify wireless communications as an application field where uncertainties are prevalent in problem parameters such as the channel coefficients. We show the effectiveness of the proposed training scheme in two applications: robust power loading for multiuser multiple-input-multiple-output (MIMO) downlink transmissions, and robust power control for device-to-device (D2D) networks.
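The idea of uncertainty injection can be sketched on a toy single-link problem: the policy sees only an estimated channel gain, while the training loss averages the utility over true gains sampled around that estimate. Everything here is a hypothetical stand-in chosen for illustration: the outage-style utility, the uniform multiplicative error model, the one-parameter policy `rate = theta * log2(1 + g_hat)`, and the grid search used in place of gradient-based training of a neural network.

```python
import math
import random
from typing import Sequence


def utility(rate: float, g_true: float) -> float:
    """Hypothetical outage utility: the rate counts only if the true channel supports it."""
    return rate if rate <= math.log2(1.0 + g_true) else 0.0


def injected_loss(theta: float, g_estimates: Sequence[float], errors: Sequence[float]) -> float:
    """Negative mean utility when the true gain is the estimate perturbed by sampled errors."""
    total = 0.0
    for g_hat in g_estimates:
        rate = theta * math.log2(1.0 + g_hat)  # the policy sees only the estimate
        for e in errors:
            total += utility(rate, g_hat * (1.0 + e))  # injected uncertainty sample
    return -total / (len(g_estimates) * len(errors))


def train_theta(
    g_estimates: Sequence[float],
    grid: Sequence[float],
    err: float = 0.5,
    n_samples: int = 500,
    seed: int = 0,
) -> float:
    """Pick the theta minimizing the injected loss (grid search stands in for SGD)."""
    rng = random.Random(seed)
    # The same error samples are reused for every candidate theta.
    errors = [rng.uniform(-err, err) for _ in range(n_samples)]
    return min(grid, key=lambda th: injected_loss(th, g_estimates, errors))
```

With estimates such as `[1.0, 3.0, 10.0]` and a grid of `theta` values from 0.5 to 1.0, the selected `theta` falls below 1.0: the policy learns to back off from the rate the estimate alone would suggest, because the naive choice `theta = 1.0` fails whenever the true gain falls below its estimate.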
Abstract: WiFi-based smart human sensing enabled by Channel State Information (CSI) has received great attention in recent years. However, CSI-based sensing systems suffer from performance degradation when deployed in different environments. Existing works address this problem through domain adaptation using massive amounts of unlabeled, high-quality data from the new environment, which are usually unavailable in practice. In this paper, we propose AirFi, a novel augmented, environment-invariant, robust WiFi gesture recognition system that tackles environment dependency from a new perspective. AirFi is a novel domain generalization framework that learns the critical part of CSI regardless of the environment and generalizes the model to unseen scenarios, without requiring any data collection for adaptation to the new environment. AirFi extracts common features from several training environment settings and minimizes the distribution differences among them. These features are further augmented to be more robust across environments. Moreover, the system can be further improved by few-shot learning techniques. Compared to state-of-the-art methods, AirFi is able to work in different environment settings without acquiring any CSI data from the new environment. The experimental results demonstrate that our system remains robust in the new environment and outperforms the compared systems.
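The abstract does not say how AirFi measures the distribution differences between environments. One common choice for this kind of alignment term is the squared maximum mean discrepancy (MMD) with a Gaussian kernel, averaged over pairs of training environments; the sketch below assumes that choice purely for illustration, and the function names and kernel bandwidth `sigma` are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence

Features = Sequence[Sequence[float]]


def gaussian_kernel(a: Sequence[float], b: Sequence[float], sigma: float) -> float:
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def mmd2(X: Features, Y: Features, sigma: float = 1.0) -> float:
    """Biased estimate of the squared maximum mean discrepancy between two samples."""
    kxx = sum(gaussian_kernel(a, b, sigma) for a in X for b in X) / len(X) ** 2
    kyy = sum(gaussian_kernel(a, b, sigma) for a in Y for b in Y) / len(Y) ** 2
    kxy = sum(gaussian_kernel(a, b, sigma) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2.0 * kxy


def environment_alignment_loss(envs: List[Features], sigma: float = 1.0) -> float:
    """Average pairwise MMD^2 across training environments (needs at least two)."""
    pairs = list(combinations(range(len(envs)), 2))
    return sum(mmd2(envs[i], envs[j], sigma) for i, j in pairs) / len(pairs)
```

Feeding two identical feature sets gives a loss of (numerically) zero, while sets centered far apart give a clearly positive value; adding such a term to the recognition loss would push the feature extractor toward environment-invariant representations.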
Abstract: The Generalized Complete Intersection Calabi-Yau manifold (gCICY) is a recently established construction of Calabi-Yau manifolds. However, generating new gCICYs using the standard algebraic method is very laborious. Because of this complexity, the number of gCICYs and their classification remain unknown. In this paper, we try to make progress in this direction using neural networks. The results show that our trained models achieve high precision on the existing type $(1,1)$ and type $(2,1)$ gCICYs in the literature. Moreover, they achieve $97\%$ precision in predicting new gCICYs that are generated differently from those used for training and testing. This shows that machine learning could be an effective method for classifying and generating new gCICYs.