Abstract:This study proposes a method for imbalanced data classification based on deep probabilistic graphical models (DPGMs) to solve the problem that traditional methods have insufficient learning ability for minority class samples. To address the classification bias caused by class imbalance, we introduce variational inference optimization probability modeling, which enables the model to adaptively adjust the representation ability of minority classes and combines the class-aware weight adjustment strategy to enhance the classifier's sensitivity to minority classes. In addition, we combine the adversarial learning mechanism to generate minority class samples in the latent space so that the model can better characterize the category boundary in the high-dimensional feature space. The experiment is evaluated on the Kaggle "Credit Card Fraud Detection" dataset and compared with a variety of advanced imbalanced classification methods (such as GAN-based sampling, BRF, XGBoost-Cost Sensitive, SAAD, HAN). The results show that the method in this study has achieved the best performance in AUC, Precision, Recall and F1-score indicators, effectively improving the recognition rate of minority classes and reducing the false alarm rate. This method can be widely used in imbalanced classification tasks such as financial fraud detection, medical diagnosis, and anomaly detection, providing a new solution for related research.
Abstract:This paper proposes a data privacy protection framework based on federated learning, which aims to realize effective cross-domain data collaboration under the premise of ensuring data privacy through distributed learning. Federated learning greatly reduces the risk of privacy breaches by training the model locally on each client and sharing only model parameters rather than raw data. The experiment verifies the high efficiency and privacy protection ability of federated learning under different data sources through the simulation of medical, financial, and user data. The results show that federated learning can not only maintain high model performance in a multi-domain data environment but also ensure effective protection of data privacy. The research in this paper provides a new technical path for cross-domain data collaboration and promotes the application of large-scale data analysis and machine learning while protecting privacy.
Abstract:This paper proposes an algorithm based on a staged sliding window Transformer architecture to detect abnormal behaviors in the microstructure of the foreign exchange market, focusing on high-frequency EUR/USD trading data. The method captures multi-scale temporal features through a staged sliding window, extracts global and local dependencies by combining the self-attention mechanism and weighted attention mechanism of the Transformer, and uses a classifier to identify abnormal events. Experimental results on a real high-frequency dataset containing order book depth, spread, and trading volume show that the proposed method significantly outperforms traditional machine learning (such as decision trees and random forests) and deep learning methods (such as MLP, CNN, RNN, LSTM) in terms of accuracy (0.93), F1-Score (0.91), and AUC-ROC (0.95). Ablation experiments verify the contribution of each component, and the visualization of order book depth and anomaly detection further reveals the effectiveness of the model under complex market dynamics. Despite the false positive problem, the model still provides important support for market supervision. In the future, noise processing can be optimized and extended to other markets to improve generalization and real-time performance.
Abstract:This study proposes a risk pricing anomaly detection method for social network user portraits based on graph neural networks (GNNs), aiming to improve the ability to identify abnormal users in social network environments. In view of the limitations of traditional methods in social network data modeling, this paper combines graph autoencoders (GAEs) and graph attention networks (GATs) to achieve accurate detection of abnormal users through dynamic aggregation of neighbor features and reconstruction error evaluation. The Facebook Page-Page Network dataset is used in the experiment and compared with VAE, GNN, Transformer and GAE. The results show that the proposed method achieves the best performance in AUC, F1-score, Precision and Recall, verifying its effectiveness. In addition, this paper explores the computational efficiency of the model in large-scale data and looks forward to combining self-supervised learning, federated learning, and other technologies in the future to improve the robustness and privacy protection of risk assessment. The research results can provide efficient anomaly detection solutions for financial risk control, social security management, and other fields.
Abstract:This study proposes a dynamic rule data mining algorithm based on an improved Transformer architecture, aiming to improve the accuracy and efficiency of rule mining in a dynamic data environment. With the increase in data volume and complexity, traditional data mining methods are difficult to cope with dynamic data with strong temporal and variable characteristics, so new algorithms are needed to capture the temporal regularity in the data. By improving the Transformer architecture, and introducing a dynamic weight adjustment mechanism and a temporal dependency module, we enable the model to adapt to data changes and mine more accurate rules. Experimental results show that compared with traditional rule mining algorithms, the improved Transformer model has achieved significant improvements in rule mining accuracy, coverage, and stability. The contribution of each module in the algorithm performance is further verified by ablation experiments, proving the importance of temporal dependency and dynamic weight adjustment mechanisms in improving the model effect. In addition, although the improved model has certain challenges in computational efficiency, its advantages in accuracy and coverage enable it to perform well in processing complex dynamic data. Future research will focus on optimizing computational efficiency and combining more deep learning technologies to expand the application scope of the algorithm, especially in practical applications in the fields of finance, medical care, and intelligent recommendation.
Abstract:Fine-tuning pre-trained models on targeted datasets enhances task-specific performance but often comes at the expense of generalization. Model merging techniques, which integrate multiple fine-tuned models into a single multi-task model through task arithmetic at various levels: model, layer, or parameter, offer a promising solution. However, task interference remains a fundamental challenge, leading to performance degradation and suboptimal merged models. Existing approaches largely overlook the fundamental role of individual neurons and their connectivity, resulting in a lack of interpretability in both the merging process and the merged models. In this work, we present the first study on the impact of neuronal alignment in model merging. We decompose task-specific representations into two complementary neuronal subspaces that regulate neuron sensitivity and input adaptability. Leveraging this decomposition, we introduce NeuroMerging, a novel merging framework developed to mitigate task interference within neuronal subspaces, enabling training-free model fusion across diverse tasks. Through extensive experiments, we demonstrate that NeuroMerging achieves superior performance compared to existing methods on multi-task benchmarks across both vision and natural language domains. Our findings highlight the importance of aligning neuronal mechanisms in model merging, offering new insights into mitigating task interference and improving knowledge fusion.
Abstract:Bird's-eye-view (BEV) map layout estimation requires an accurate and full understanding of the semantics for the environmental elements around the ego car to make the results coherent and realistic. Due to the challenges posed by occlusion, unfavourable imaging conditions and low resolution, \emph{generating} the BEV semantic maps corresponding to corrupted or invalid areas in the perspective view (PV) is appealing very recently. \emph{The question is how to align the PV features with the generative models to facilitate the map estimation}. In this paper, we propose to utilize a generative model similar to the Vector Quantized-Variational AutoEncoder (VQ-VAE) to acquire prior knowledge for the high-level BEV semantics in the tokenized discrete space. Thanks to the obtained BEV tokens accompanied with a codebook embedding encapsulating the semantics for different BEV elements in the groundtruth maps, we are able to directly align the sparse backbone image features with the obtained BEV tokens from the discrete representation learning based on a specialized token decoder module, and finally generate high-quality BEV maps with the BEV codebook embedding serving as a bridge between PV and BEV. We evaluate the BEV map layout estimation performance of our model, termed VQ-Map, on both the nuScenes and Argoverse benchmarks, achieving 62.2/47.6 mean IoU for surround-view/monocular evaluation on nuScenes, as well as 73.4 IoU for monocular evaluation on Argoverse, which all set a new record for this map layout estimation task. The code and models are available on \url{https://github.com/Z1zyw/VQ-Map}.
Abstract:Federated learning enables the collaborative learning of a global model on diverse data, preserving data locality and eliminating the need to transfer user data to a central server. However, data privacy remains vulnerable, as attacks can target user training data by exploiting the updates sent by users during each learning iteration. Secure aggregation protocols are designed to mask/encrypt user updates and enable a central server to aggregate the masked information. MicroSecAgg (PoPETS 2024) proposes a single server secure aggregation protocol that aims to mitigate the high communication complexity of the existing approaches by enabling a one-time setup of the secret to be re-used in multiple training iterations. In this paper, we identify a security flaw in the MicroSecAgg that undermines its privacy guarantees. We detail the security flaw and our attack, demonstrating how an adversary can exploit predictable masking values to compromise user privacy. Our findings highlight the critical need for enhanced security measures in secure aggregation protocols, particularly the implementation of dynamic and unpredictable masking strategies. We propose potential countermeasures to mitigate these vulnerabilities and ensure robust privacy protection in the secure aggregation frameworks.
Abstract:Large language models have become increasingly prominent, also signaling a shift towards multimodality as the next frontier in artificial intelligence, where their embeddings are harnessed as prompts to generate textual content. Vision-language models (VLMs) stand at the forefront of this advancement, offering innovative ways to combine visual and textual data for enhanced understanding and interaction. However, this integration also enlarges the attack surface. Patch-based adversarial attack is considered the most realistic threat model in physical vision applications, as demonstrated in many existing literature. In this paper, we propose to address patched visual prompt injection, where adversaries exploit adversarial patches to generate target content in VLMs. Our investigation reveals that patched adversarial prompts exhibit sensitivity to pixel-wise randomization, a trait that remains robust even against adaptive attacks designed to counteract such defenses. Leveraging this insight, we introduce SmoothVLM, a defense mechanism rooted in smoothing techniques, specifically tailored to protect VLMs from the threat of patched visual prompt injectors. Our framework significantly lowers the attack success rate to a range between 0% and 5.0% on two leading VLMs, while achieving around 67.3% to 95.0% context recovery of the benign images, demonstrating a balance between security and usability.
Abstract:In this paper, we propose a new image-based visual place recognition (VPR) framework by exploiting the structural cues in bird's-eye view (BEV) from a single monocular camera. The motivation arises from two key observations about VPR: 1) For the methods based on both camera and LiDAR sensors, the integration of LiDAR in robotic systems has led to increased expenses, while the alignment of data between different sensors is also a major challenge. 2) Other image-/camera-based methods, involving integrating RGB images and their derived variants (e.g., pseudo depth images, pseudo 3D point clouds), exhibit several limitations, such as the failure to effectively exploit the explicit spatial relationships between different objects. To tackle the above issues, we design a new BEV-enhanced VPR framework, nemely BEV2PR, which can generate a composite descriptor with both visual cues and spatial awareness solely based on a single camera. For the visual cues, any popular aggregation module for RGB global features can be integrated into our framework. The key points lie in: 1) We use BEV segmentation features as an explicit source of structural knowledge in constructing global features. 2) The lower layers of the pre-trained backbone from BEV map generation are shared for visual and structural streams in VPR, facilitating the learning of fine-grained local features in the visual stream. 3) The complementary visual features and structural features can jointly enhance VPR performance. Our BEV2PR framework enables consistent performance improvements over several popular camera-based VPR aggregation modules when integrating them. The experiments on our collected VPR-NuScenes dataset demonstrate an absolute gain of 2.47% on Recall@1 for the strong Conv-AP baseline to achieve the best performance in our setting, and notably, a 18.06% gain on the hard set.