Abstract:Fault diagnosis technology supports the healthy operation of mechanical equipment. However, the variations conditions during the operation of mechanical equipment lead to significant disparities in data distribution, posing challenges to fault diagnosis. Furthermore, when deploying applications, traditional methods often encounter issues such as latency and data security. Therefore, conducting fault diagnosis and deploying application methods under cross-operating conditions holds significant value. This paper proposes a domain adaptation-based lightweight fault diagnosis framework for edge computing scenarios. Incorporating the local maximum mean discrepancy into knowledge transfer aligns the feature distributions of different domains in a high-dimensional feature space, to discover a common feature space across domains. The acquired fault diagnosis expertise from the cloud-model is transferred to the lightweight edge-model using adaptation knowledge transfer methods. While ensuring real-time diagnostic capabilities, accurate fault diagnosis is achieved across working conditions. We conducted validation experiments on the NVIDIA Jetson Xavier NX kit. In terms of diagnostic performance, the proposed method significantly improved diagnostic accuracy, with average increases of 34.44% and 17.33% compared to the comparison method, respectively. Regarding lightweight effectiveness, proposed method achieved an average inference speed increase of 80.47%. Additionally, compared to the cloud-model, the parameter count of the edge-model decreased by 96.37%, while the Flops decreased by 83.08%.
Abstract:Pre-trained large language models (LLM) have emerged as a powerful tool for simulating various scenarios and generating output given specific instructions and multimodal input. In this work, we analyze the specific use of LLM to enhance a classical supervised machine learning method for classification problems. We propose a few approaches to integrate LLM into a classical machine learning estimator to further enhance the prediction performance. We examine the performance of the proposed approaches through both standard supervised learning binary classification tasks, and a transfer learning task where the test data observe distribution changes compared to the training data. Numerical experiments using four publicly available datasets are conducted and suggest that using LLM to enhance classical machine learning estimators can provide significant improvement on prediction performance.
Abstract:Deep learning methods have access to be employed for solving physical systems governed by parametric partial differential equations (PDEs) due to massive scientific data. It has been refined to operator learning that focuses on learning non-linear mapping between infinite-dimensional function spaces, offering interface from observations to solutions. However, state-of-the-art neural operators are limited to constant and uniform discretization, thereby leading to deficiency in generalization on arbitrary discretization schemes for computational domain. In this work, we propose a novel operator learning algorithm, referred to as Dynamic Gaussian Graph Operator (DGGO) that expands neural operators to learning parametric PDEs in arbitrary discrete mechanics problems. The Dynamic Gaussian Graph (DGG) kernel learns to map the observation vectors defined in general Euclidean space to metric vectors defined in high-dimensional uniform metric space. The DGG integral kernel is parameterized by Gaussian kernel weighted Riemann sum approximating and using dynamic message passing graph to depict the interrelation within the integral term. Fourier Neural Operator is selected to localize the metric vectors on spatial and frequency domains. Metric vectors are regarded as located on latent uniform domain, wherein spatial and spectral transformation offer highly regular constraints on solution space. The efficiency and robustness of DGGO are validated by applying it to solve numerical arbitrary discrete mechanics problems in comparison with mainstream neural operators. Ablation experiments are implemented to demonstrate the effectiveness of spatial transformation in the DGG kernel. The proposed method is utilized to forecast stress field of hyper-elastic material with geometrically variable void as engineering application.
Abstract:Machine learning is employed for solving physical systems governed by general nonlinear partial differential equations (PDEs). However, complex multi-physics systems such as acoustic-structure coupling are often described by a series of PDEs that incorporate variable physical quantities, which are referred to as parametric systems. There are lack of strategies for solving parametric systems governed by PDEs that involve explicit and implicit quantities. In this paper, a deep learning-based Multi Physics-Informed PointNet (MPIPN) is proposed for solving parametric acoustic-structure systems. First, the MPIPN induces an enhanced point-cloud architecture that encompasses explicit physical quantities and geometric features of computational domains. Then, the MPIPN extracts local and global features of the reconstructed point-cloud as parts of solving criteria of parametric systems, respectively. Besides, implicit physical quantities are embedded by encoding techniques as another part of solving criteria. Finally, all solving criteria that characterize parametric systems are amalgamated to form distinctive sequences as the input of the MPIPN, whose outputs are solutions of systems. The proposed framework is trained by adaptive physics-informed loss functions for corresponding computational domains. The framework is generalized to deal with new parametric conditions of systems. The effectiveness of the MPIPN is validated by applying it to solve steady parametric acoustic-structure coupling systems governed by the Helmholtz equations. An ablation experiment has been implemented to demonstrate the efficacy of physics-informed impact with a minority of supervised data. The proposed method yields reasonable precision across all computational domains under constant parametric conditions and changeable combinations of parametric conditions for acoustic-structure systems.
Abstract:For the problems of low recognition rate and slow recognition speed of traditional detection methods in IC appearance defect detection, we propose an IC appearance defect detection algo-rithm IH-ViT. Our proposed model takes advantage of the respective strengths of CNN and ViT to acquire image features from both local and global aspects, and finally fuses the two features for decision making to determine the class of defects, thus obtaining better accuracy of IC defect recognition. To address the problem that IC appearance defects are mainly reflected in the dif-ferences in details, which are difficult to identify by traditional algorithms, we improved the tra-ditional ViT by performing an additional convolution operation inside the batch. For the problem of information imbalance of samples due to diverse sources of data sets, we adopt a dual-channel image segmentation technique to further improve the accuracy of IC appearance defects. Finally, after testing, our proposed hybrid IH-ViT model achieved 72.51% accuracy, which is 2.8% and 6.06% higher than ResNet50 and ViT models alone. The proposed algorithm can quickly and accurately detect the defect status of IC appearance and effectively improve the productivity of IC packaging and testing companies.
Abstract:Natural reading orders of words are crucial for information extraction from form-like documents. Despite recent advances in Graph Convolutional Networks (GCNs) on modeling spatial layout patterns of documents, they have limited ability to capture reading orders of given word-level node representations in a graph. We propose Reading Order Equivariant Positional Encoding (ROPE), a new positional encoding technique designed to apprehend the sequential presentation of words in documents. ROPE generates unique reading order codes for neighboring words relative to the target word given a word-level graph connectivity. We study two fundamental document entity extraction tasks including word labeling and word grouping on the public FUNSD dataset and a large-scale payment dataset. We show that ROPE consistently improves existing GCNs with a margin up to 8.4% F1-score.
Abstract:Affinity graphs are widely used in deep architectures, including graph convolutional neural networks and attention networks. Thus far, the literature has focused on abstracting features from such graphs, while the learning of the affinities themselves has been overlooked. Here we propose a principled method to directly supervise the learning of weights in affinity graphs, to exploit meaningful connections between entities in the data source. Applied to a visual attention network, our affinity supervision improves relationship recovery between objects, even without the use of manually annotated relationship labels. We further show that affinity learning between objects boosts scene categorization performance and that the supervision of affinity can also be applied to graphs built from mini-batches, for neural network training. In an image classification task we demonstrate consistent improvement over the baseline, with diverse network architectures and datasets.
Abstract:View based strategies for 3D object recognition have proven to be very successful. The state-of-the-art methods now achieve over 90% correct category level recognition performance on appearance images. We improve upon these methods by introducing a view clustering and pooling layer based on dominant sets. The key idea is to pool information from views which are similar and thus belong to the same cluster. The pooled feature vectors are then fed as inputs to the same layer, in a recurrent fashion. This recurrent clustering and pooling module, when inserted in an off-the-shelf pretrained CNN, boosts performance for multi-view 3D object recognition, achieving a new state of the art test set recognition accuracy of 93.8% on the ModelNet 40 database. We also explore a fast approximate learning strategy for our cluster-pooling CNN, which, while sacrificing end-to-end learning, greatly improves its training efficiency with only a slight reduction of recognition accuracy to 93.3%. Our implementation is available at https://github.com/fate3439/dscnn.
Abstract:Attention networks show promise for both vision and language tasks, by emphasizing relationships between constituent elements through appropriate weighting functions. Such elements could be regions in an image output by a region proposal network, or words in a sentence, represented by word embedding. Thus far, however, the learning of attention weights has been driven solely by the minimization of task specific loss functions. We here introduce a method of learning attention weights to better emphasize informative pair-wise relations between entities. The key idea is to use a novel center-mass cross entropy loss, which can be applied in conjunction with the task specific ones. We then introduce a focused attention backbone to learn these attention weights for general tasks. We demonstrate that the focused attention module leads to a new state-of-the-art for the recovery of relations in a relationship proposal task. Our experiments show that it also boosts performance for diverse vision and language tasks, including object detection, scene categorization and document classification.
Abstract:ProductNet is a collection of high-quality product datasets for better product understanding. Motivated by ImageNet, ProductNet aims at supporting product representation learning by curating product datasets of high quality with properly chosen taxonomy. In this paper, the two goals of building high-quality product datasets and learning product representation support each other in an iterative fashion: the product embedding is obtained via a multi-modal deep neural network (master model) designed to leverage product image and catalog information; and in return, the embedding is utilized via active learning (local model) to vastly accelerate the annotation process. For the labeled data, the proposed master model yields high categorization accuracy (94.7% top-1 accuracy for 1240 classes), which can be used as search indices, partition keys, and input features for machine learning models. The product embedding, as well as the fined-tuned master model for a specific business task, can also be used for various transfer learning tasks.