Abstract:Face Forgery Detection (FFD), or Deepfake detection, aims to determine whether a digital face is real or fake. Because different face synthesis algorithms produce diverse forgery patterns, FFD models often overfit the specific patterns in their training datasets, resulting in poor generalization to unseen forgeries. This severe challenge requires FFD models to possess strong capabilities for representing complex facial features and extracting subtle forgery cues. Although previous FFD models directly employ existing backbones to represent and extract facial forgery cues, the critical role of the backbone is often overlooked; in particular, its knowledge and capabilities are insufficient for the FFD challenge, which inevitably limits generalization. It is therefore essential to reconsider backbone pre-training configurations and seek practical solutions by revisiting the complete FFD workflow, from backbone pre-training and fine-tuning to the inference of discriminative results. Specifically, we analyze the crucial contributions of backbones with different configurations to the FFD task and propose pre-training a ViT backbone with self-supervised learning on real-face datasets, equipping it with superior facial representation capabilities. We then build a competitive backbone fine-tuning framework that strengthens the backbone's ability to extract diverse forgery cues through a competitive learning mechanism. Moreover, we devise a threshold optimization mechanism that utilizes prediction confidence to improve inference reliability. Comprehensive experiments demonstrate that our FFD model with the elaborated backbone achieves excellent performance on FFD and additional face-related tasks, e.g., presentation attack detection. Code and models are available at https://github.com/zhenglab/FFDBackbone.
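The abstract leaves the threshold optimization mechanism unspecified; the sketch below illustrates one plausible reading, in which a validation set is scanned for the real/fake decision threshold that maximizes balanced accuracy, with mean prediction confidence (distance from the threshold) as a tie-breaker. All names are hypothetical and not taken from the released code.

```python
import numpy as np

def select_threshold(val_probs, val_labels, grid=np.linspace(0.05, 0.95, 91)):
    """Return the threshold with the best balanced accuracy on validation
    data, breaking ties in favor of higher mean prediction confidence."""
    best = (-1.0, -1.0, 0.5)  # (balanced_acc, confidence, threshold)
    for t in grid:
        preds = (val_probs >= t).astype(int)
        tpr = (preds[val_labels == 1] == 1).mean()
        tnr = (preds[val_labels == 0] == 0).mean()
        bal_acc = 0.5 * (tpr + tnr)
        # confidence: how far predictions sit from the decision boundary
        conf = np.abs(val_probs - t).mean()
        best = max(best, (bal_acc, conf, t))
    return best[2]
```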
Abstract:Group Equivariant Convolution (GConv) can effectively handle data with rotational symmetry. However, GConv assumes uniform and strict rotational symmetry across all features, corresponding to the transformations of a specific group. Real-world data rarely conforms to such strict rotational symmetry, a phenomenon commonly referred to as Rotational Symmetry-Breaking in the system or dataset, so GConv cannot adapt to it effectively. Motivated by this, we propose a simple but highly effective method to address the problem: a set of learnable biases, called $G$-Biases, defined under the group order, which break the strict group constraints and achieve \textbf{R}elaxed \textbf{R}otational \textbf{E}quivariant \textbf{Conv}olution (RREConv). We conduct extensive experiments to validate relaxed rotational equivariance on the rotational symmetry groups $\mathcal{C}_n$ (e.g., the $\mathcal{C}_2$, $\mathcal{C}_4$, and $\mathcal{C}_6$ groups). Further experiments demonstrate that our proposed RREConv-based methods achieve excellent performance compared to existing GConv-based methods on classification and detection tasks over natural image datasets.
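As a concrete illustration of the idea (a minimal sketch under our reading of the abstract, not the authors' implementation), the layer below shares one kernel across the four rotations of $\mathcal{C}_4$ but adds a learnable per-rotation perturbation playing the role of a $G$-Bias, relaxing the strict weight-tying of GConv:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelaxedC4Conv(nn.Module):
    """C4 group convolution with a learnable per-rotation perturbation
    (the G-Bias) added to each rotated copy of the shared kernel."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.1)
        # one small learnable bias tensor per group element of C4
        self.g_bias = nn.Parameter(torch.zeros(4, out_ch, in_ch, k, k))

    def forward(self, x):
        outs = []
        for g in range(4):  # the four rotations of C4
            w = torch.rot90(self.weight, g, dims=(2, 3)) + self.g_bias[g]
            outs.append(F.conv2d(x, w, padding=self.weight.shape[-1] // 2))
        return torch.stack(outs, dim=1)  # (B, |C4|, out_ch, H, W)
```

With `g_bias` frozen at zero this reduces to a standard strict $\mathcal{C}_4$ GConv, so the relaxation is controlled entirely by the learned biases.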
Abstract:Introducing Group Equivariant Convolution (GConv) empowers models to explore symmetries hidden in visual data, improving their performance. In real-world scenarios, however, objects and scenes often exhibit perturbations of a symmetric system, i.e., deviations from an exactly symmetric configuration, which can be characterized by a non-trivial action of a symmetry group and are known as Symmetry-Breaking. Traditional GConv methods are limited by strict operation rules in the group space: they only ensure that features remain strictly equivariant under a limited set of group transformations, making it difficult to adapt to Symmetry-Breaking or non-rigid transformations. Motivated by this, we introduce a novel Relaxed Rotation GConv (R2GConv) with our defined Relaxed Rotation-Equivariant group $\mathbf{R}_4$. Furthermore, we propose a Relaxed Rotation-Equivariant Network (R2Net) as a backbone and build the Symmetry-Breaking Object Detector (SBDet) for 2D object detection upon it. Experiments demonstrate the effectiveness of our proposed R2GConv on natural image classification tasks, and SBDet achieves excellent performance on object detection tasks with improved generalization capability and robustness.
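The abstract does not quantify how far a relaxed layer may deviate from strict equivariance, but a simple diagnostic (our own illustrative helper, not part of the paper) is the commutation error between rotation and the layer: strict GConv layers drive it to zero, while relaxed layers trade it against the Symmetry-Breaking present in the data.

```python
import torch

@torch.no_grad()
def c4_equivariance_error(layer, x):
    """Relative deviation from strict C4 equivariance for a layer mapping
    (B, C, H, W) -> (B, C', H, W); zero means strictly equivariant."""
    errs = []
    for g in range(1, 4):
        rotate_then_apply = layer(torch.rot90(x, g, dims=(2, 3)))
        apply_then_rotate = torch.rot90(layer(x), g, dims=(2, 3))
        diff = (rotate_then_apply - apply_then_rotate).norm()
        errs.append((diff / apply_then_rotate.norm()).item())
    return sum(errs) / len(errs)
```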
Abstract:Although Federated Learning (FL) enables collaborative learning in Artificial Intelligence of Things (AIoT) design, it fails to work on low-memory AIoT devices due to its heavy memory usage. To address this problem, various federated pruning methods have been proposed to reduce memory usage during inference, but few of them substantially mitigate the memory burden during pruning and training. As an alternative, zeroth-order or backpropagation-free (BP-Free) methods can partially alleviate memory consumption, yet they scale poorly and incur large computation overheads, since the gradient estimation error and the number of floating point operations (FLOPs) grow with the dimensionality of the model parameters. In this paper, we propose a federated foresight pruning method based on the Neural Tangent Kernel (NTK), which integrates seamlessly with federated BP-Free training frameworks. We present an approximation to the federated NTK computed from the clients' local NTK matrices, and we demonstrate that the data-free property of our method substantially reduces the approximation error in extreme data-heterogeneity scenarios. Because our approach improves the performance of the vanilla BP-Free method with fewer FLOPs and truly alleviates memory pressure during both training and inference, it makes FL friendlier to low-memory devices. Comprehensive experimental results obtained from simulation- and real test-bed-based platforms show that our federated foresight pruning method not only preserves the ability of the dense model with a memory reduction of up to 9x but also boosts the performance of the vanilla BP-Free method with dramatically fewer FLOPs.
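The abstract does not spell out the exact NTK approximation, but a common foresight-pruning score can be sketched as follows: the trace of the NTK decomposes into per-parameter squared-gradient terms, so each client can contribute a local term and the server aggregates them before pruning. The scoring rule and names below are illustrative assumptions, not the paper's method.

```python
import numpy as np

def ntk_trace_scores(local_jacobians):
    """Approximate per-parameter foresight scores from local NTK traces.
    Each Jacobian has shape (n_i, p): model outputs w.r.t. the p parameters
    on client i's data. trace(J @ J.T) = sum_j sum_i J[i, j]**2, so the
    per-parameter terms can be summed locally and aggregated server-side."""
    p = local_jacobians[0].shape[1]
    scores = np.zeros(p)
    for jac in local_jacobians:
        scores += (jac ** 2).sum(axis=0)
    return scores

def foresight_mask(scores, sparsity=0.9):
    """Keep the (1 - sparsity) fraction of parameters with largest scores."""
    k = max(1, int(len(scores) * (1.0 - sparsity)))
    keep = np.argsort(scores)[-k:]
    mask = np.zeros_like(scores, dtype=bool)
    mask[keep] = True
    return mask
```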
Abstract:In this paper, a deep learning method for solving an improved one-dimensional Poisson-Nernst-Planck ion channel (PNPic) model, called the PNPic deep learning solver, is presented. In particular, it combines a novel local neural network scheme with an effective PNPic finite element solver. Since the input data of the neural network scheme involves only a small local patch of coarse-grid solutions, which the finite element solver can produce quickly, the PNPic deep learning solver can be trained much faster than corresponding conventional global neural network solvers. Once properly trained, it outputs predicted PNPic solutions with much higher accuracy than the low-cost coarse-grid solutions and can reflect different perturbations of the model parameters, ion channel subregions, and interface and boundary values. Consequently, the PNPic deep learning solver can generate highly accurate numerical solutions for a family of PNPic models. As an initial study, two types of numerical tests were carried out by perturbing one and two parameters of the PNPic model, respectively, along with tests using a few perturbed interface positions of the model as training samples. These tests demonstrate that the PNPic deep learning solver can generate highly accurate PNPic numerical solutions.
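A minimal sketch of the local scheme as described, assuming a 1-D patch of coarse-grid values plus a small vector of perturbed model parameters as input; the patch size, parameter count, and architecture are illustrative guesses rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class LocalPatchSolver(nn.Module):
    """Small MLP mapping a local patch of coarse-grid PNPic values (plus
    any perturbed model parameters) to the fine solution at the patch
    center; widths and activations here are placeholder choices."""
    def __init__(self, patch_size=5, n_params=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(patch_size + n_params, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, coarse_patch, params):
        # concatenate the local coarse solution with the perturbed parameters
        return self.net(torch.cat([coarse_patch, params], dim=-1))
```

Because the input is only a small local patch, training pairs are cheap to generate in bulk from coarse/fine solver runs, which is what makes the local scheme faster to train than a global surrogate.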
Abstract:In recent years, surrogate models based on deep neural networks (DNNs) have been widely used to solve partial differential equations, which were traditionally handled by numerical simulations. Such surrogate models, however, focus on global interpolation of the training dataset and thus require a large network structure. The process is both time consuming and computationally costly, restricting their use for high-fidelity prediction of complex physical problems. In the present study, we develop a neural network with local converging inputs (NNLCI) for high-fidelity prediction on unstructured data. The framework takes as input the local domain of dependence with converging coarse solutions, which greatly reduces computational resources and training time. As a validation case, the NNLCI method is applied to study inviscid supersonic flows in channels with bumps. Different bump geometries and locations are considered to benchmark the effectiveness and versatility of the proposed approach. Detailed flow structures, including shock-wave interactions, are examined systematically.
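On unstructured data, the "local converging input" can be assembled by interpolating each coarse solution onto a small stencil covering the local domain of dependence of a query point. The helper below is a sketch of that step under our reading of the abstract; the stencil extent, resolution, and linear interpolation are assumptions.

```python
import numpy as np
from scipy.interpolate import griddata

def nnlci_input(query_pt, coarse_sets, half_width=0.05, n=3):
    """Build an NNLCI input vector by sampling each of two converging
    coarse solutions (given as (points, values) pairs on unstructured
    meshes) on a small stencil around `query_pt`."""
    gx, gy = np.meshgrid(
        np.linspace(query_pt[0] - half_width, query_pt[0] + half_width, n),
        np.linspace(query_pt[1] - half_width, query_pt[1] + half_width, n),
    )
    stencil = np.column_stack([gx.ravel(), gy.ravel()])
    # linear interpolation of each coarse solution onto the shared stencil
    # (points outside the mesh's convex hull come back NaN; a real pipeline
    # would guard against that)
    feats = [griddata(pts, vals, stencil, method="linear")
             for pts, vals in coarse_sets]
    return np.concatenate(feats)  # input vector for the NNLCI network
```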
Abstract:Although Deep Learning (DL) has achieved success in complex Artificial Intelligence (AI) tasks, it suffers from various notorious problems (e.g., feature redundancy and vanishing or exploding gradients), since updating parameters in Euclidean space cannot fully exploit the geometric structure of the solution space. As a promising alternative, Riemannian-based DL uses geometric optimization to update parameters on Riemannian manifolds and can leverage the underlying geometric information. Accordingly, this article presents a comprehensive survey of applying geometric optimization in DL. First, it introduces the basic procedure of geometric optimization, including various geometric optimizers and some concepts of Riemannian manifolds. Subsequently, it investigates the application of geometric optimization to different DL networks across various AI tasks, e.g., convolutional neural networks, recurrent neural networks, transfer learning, and optimal transport. Additionally, typical public toolboxes that implement optimization on manifolds are discussed. Finally, the article compares the performance of different deep geometric optimization methods in image recognition scenarios.
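For readers new to the subject, the basic procedure the survey covers, projecting the Euclidean gradient onto the tangent space and retracting back onto the manifold, can be illustrated with one step of Riemannian SGD on the orthogonal/Stiefel manifold. This is a generic textbook sketch, not tied to any particular surveyed method.

```python
import numpy as np

def riemannian_sgd_step(W, euclid_grad, lr=0.1):
    """One Riemannian SGD step on the Stiefel manifold of matrices with
    orthonormal columns: project the Euclidean gradient onto the tangent
    space at W, step, then retract via a QR decomposition."""
    # tangent-space projection: G - W * sym(W^T G)
    WtG = W.T @ euclid_grad
    rgrad = euclid_grad - W @ (WtG + WtG.T) / 2
    # QR-based retraction keeps the iterate on the manifold
    Q, R = np.linalg.qr(W - lr * rgrad)
    # fix column signs so the retraction is uniquely defined
    Q = Q @ np.diag(np.sign(np.diag(R)))
    return Q
```

Euclidean SGD would simply compute `W - lr * euclid_grad` and drift off the manifold; the projection and retraction are exactly the extra structure geometric optimizers add.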
Abstract:In this paper we apply neural networks with local converging inputs (NNLCI), originally introduced in [arXiv:2109.09316], to solve the two-dimensional Maxwell's equations around perfect electric conductors (PECs). The input to the networks consists of local patches of low-cost numerical solutions of the equations computed on two coarse grids, and the output is a more accurate solution at the center of the local patch. We apply the recently developed second-order finite difference method [arXiv:2209.00740] to generate the input and training data, which capture the scattering of electromagnetic waves off a PEC at a given terminal time. The advantage of NNLCI is that, once trained, it offers an efficient alternative to costly high-resolution conventional numerical methods; our numerical experiments indicate a computational-complexity saving by a factor of $8^3$ in the number of spatial-temporal grid points. In contrast to existing work on applying neural networks to solve PDEs directly, our method takes advantage of the local domain of dependence of Maxwell's equations in the input solution patches, and is therefore simpler yet still robust. We demonstrate that a network trained on some PECs can predict accurate solutions for PECs with geometries quite different from any of the training examples.
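Once the patch-to-center network is trained, reconstructing a full fine-grid field reduces to sliding it over every interior point. The loop below sketches this, assuming both coarse solutions have already been interpolated onto the fine-grid indexing and `model` maps a flattened pair of patches to a scalar; all details are illustrative.

```python
import numpy as np

def predict_fine_field(model, coarse_a, coarse_b, half=2):
    """Reconstruct the fine solution by applying the trained NNLCI network
    at every interior point, feeding it only the local patches of the two
    coarse solutions (here assumed pre-interpolated to the fine grid)."""
    H, W = coarse_a.shape
    out = np.zeros((H, W))
    for i in range(half, H - half):
        for j in range(half, W - half):
            pa = coarse_a[i - half:i + half + 1, j - half:j + half + 1].ravel()
            pb = coarse_b[i - half:i + half + 1, j - half:j + half + 1].ravel()
            out[i, j] = model(np.concatenate([pa, pb]))
    return out
```

Each prediction depends only on a fixed-size local patch, which is why the cost scales with the number of output points rather than with a high-resolution solve.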
Abstract:Inspired by the tremendous success of the self-attention mechanism in natural language processing, the Vision Transformer (ViT) creatively applies it to image patch sequences and achieves remarkable performance. However, the scaled dot-product self-attention of ViT introduces scale ambiguity into the structure of the original feature space. To address this problem, we propose a novel method named Orthogonal Vision Transformer (O-ViT) that optimizes ViT from a geometric perspective. O-ViT constrains the parameters of self-attention blocks to lie on the norm-preserving orthogonal manifold, which keeps the geometry of the feature space intact. Moreover, O-ViT achieves both orthogonal constraints and low optimization overhead by adopting a surjective mapping between the orthogonal group and its Lie algebra. We have conducted comparative experiments on image recognition tasks to demonstrate O-ViT's validity, and the experiments show that O-ViT can boost the performance of ViT by up to 3.6%.
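The surjective mapping the abstract refers to can be realized by the matrix exponential, which maps the Lie algebra of skew-symmetric matrices onto the special orthogonal group. Below is a minimal sketch of such a parameterization (our illustration, not the released O-ViT code): the unconstrained parameter `A` is updated freely, yet the effective weight is always orthogonal and hence norm-preserving.

```python
import torch
import torch.nn as nn

class OrthogonalLinear(nn.Module):
    """Linear layer whose weight is constrained to the orthogonal group by
    parameterizing it as exp(A - A^T); skew-symmetric matrices form the Lie
    algebra of SO(n), and the matrix exponential maps them onto SO(n)."""
    def __init__(self, dim):
        super().__init__()
        self.A = nn.Parameter(torch.zeros(dim, dim))

    def forward(self, x):
        # A - A^T is skew-symmetric, so W is orthogonal by construction
        W = torch.linalg.matrix_exp(self.A - self.A.T)
        return x @ W.T
```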
Abstract:Patient similarity assessment (PSA) is pivotal to evidence-based and personalized medicine, enabled by analyzing the increasingly available electronic health records (EHRs). However, machine learning approaches to PSA have to deal with the inherent data deficiencies of EHRs, namely missing values, noise, and small sample sizes. In this work, an end-to-end discriminative learning framework, called SparGE, is proposed to address these data challenges of EHRs for PSA. SparGE measures similarity by joint sparse coding and graph embedding. First, we use low-rank constrained sparse coding to identify similar patients and compute their weights while denoising against missing values. Then, graph embedding on the sparse representations is adopted to measure the similarity between patient pairs by preserving local relationships defined by distances. Finally, a global cost function is constructed to optimize the related parameters. Experimental results on a private and a public real-world healthcare dataset, namely SingHEART and MIMIC-III, show that the proposed SparGE significantly outperforms other machine learning patient similarity methods.
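The first stage, computing similarity weights by sparse coding over patients, can be sketched without the low-rank constraint or missing-value handling that SparGE adds: each patient is regressed on all others, and the sparse coefficients serve as affinities for the subsequent graph embedding. Hyperparameters here are placeholders.

```python
import numpy as np
from sklearn.linear_model import Lasso

def sparse_affinity(X, alpha=0.1):
    """Represent each patient (row of X) as a sparse combination of the
    other patients; the nonzero coefficients act as data-driven similarity
    weights (a simplified stand-in for SparGE's low-rank constrained
    sparse coding)."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        others = np.delete(X, i, axis=0)          # all patients except i
        fit = Lasso(alpha=alpha, max_iter=5000).fit(others.T, X[i])
        W[i, np.arange(n) != i] = fit.coef_       # sparse reconstruction weights
    return (np.abs(W) + np.abs(W.T)) / 2          # symmetric affinity graph
```

The resulting affinity matrix can then be fed to any locality-preserving graph embedding, which is where SparGE's joint optimization of both stages departs from this two-step sketch.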