Soochow University
Abstract:Predicting spatio-temporal traffic flow presents significant challenges due to complex interactions between spatial and temporal factors. Existing approaches often address these dimensions in isolation, neglecting their critical interdependencies. In this paper, we introduce the Spatio-Temporal Unitized Model (STUM), a unified framework designed to capture both spatial and temporal dependencies while addressing spatio-temporal heterogeneity through techniques such as distribution alignment and feature fusion. It also ensures both predictive accuracy and computational efficiency. Central to STUM is the Adaptive Spatio-temporal Unitized Cell (ASTUC), which utilizes low-rank matrices to seamlessly store, update, and interact with space, time, as well as their correlations. Our framework is also modular, allowing it to integrate with various spatio-temporal graph neural networks through components such as backbone models, feature extractors, residual fusion blocks, and predictive modules to collectively enhance forecasting outcomes. Experimental results across multiple real-world datasets demonstrate that STUM consistently improves prediction performance with minimal computational cost. These findings are further supported by hyperparameter optimization, pre-training analysis, and result visualization. We provide our source code for reproducibility at https://anonymous.4open.science/r/STUM-E4F0.
Abstract:Predicting single-cell perturbation responses requires mapping between two unpaired single-cell data distributions. Optimal transport (OT) theory provides a principled framework for constructing such mappings by minimizing transport cost. Recently, Wasserstein-2 ($W_2$) neural optimal transport solvers (\textit{e.g.}, CellOT) have been employed for this prediction task. However, $W_2$ OT relies on the general Kantorovich dual formulation, which involves optimizing over two conjugate functions, leading to a complex min-max optimization problem that converges slowly. To address these challenges, we propose a novel solver based on the Wasserstein-1 ($W_1$) dual formulation. Unlike $W_2$, the $W_1$ dual simplifies the optimization to a maximization problem over a single 1-Lipschitz function, thus eliminating the need for time-consuming min-max optimization. While solving the $W_1$ dual only reveals the transport direction and does not directly provide a unique optimal transport map, we incorporate an additional step using adversarial training to determine an appropriate transport step size, effectively recovering the transport map. Our experiments demonstrate that the proposed $W_1$ neural optimal transport solver can mimic the $W_2$ OT solvers in finding a unique and ``monotonic" map on 2D datasets. Moreover, the $W_1$ OT solver achieves performance on par with or surpasses $W_2$ OT solvers on real single-cell perturbation datasets. Furthermore, we show that $W_1$ OT solver achieves $25 \sim 45\times$ speedup, scales better on high dimensional transportation task, and can be directly applied on single-cell RNA-seq dataset with highly variable genes. Our implementation and experiments are open-sourced at \url{https://github.com/poseidonchan/w1ot}.
Abstract:In recent years, large language models (LLMs) have significantly advanced the field of natural language processing (NLP). By fine-tuning LLMs with data from specific scenarios, these foundation models can better adapt to various downstream tasks. However, the fine-tuning process poses privacy leakage risks, particularly in centralized data processing scenarios. To address user privacy concerns, federated learning (FL) has been introduced to mitigate the risks associated with centralized data collection from multiple sources. Nevertheless, the privacy of LLMs themselves is equally critical, as potential malicious attacks challenge their security, an issue that has received limited attention in current research. Consequently, establishing a trusted multi-party model fine-tuning environment is essential. Additionally, the local deployment of large LLMs incurs significant storage costs and high computational demands. To address these challenges, we propose for the first time a federated discrete and transferable prompt tuning, namely FedDTPT, for black-box large language models. In the client optimization phase, we adopt a token-level discrete prompt optimization method that leverages a feedback loop based on prediction accuracy to drive gradient-free prompt optimization through the MLM API. For server optimization, we employ an attention mechanism based on semantic similarity to filter all local prompt tokens, along with an embedding distance elbow detection and DBSCAN clustering strategy to enhance the filtering process. Experimental results demonstrate that, compared to state-of-the-art methods, our approach achieves higher accuracy, reduced communication overhead, and robustness to non-iid data in a black-box setting. Moreover, the optimized prompts are transferable.
Abstract:Text-to-SQL generation aims to translate natural language questions into SQL statements. In large language models (LLMs) based Text-to-SQL, schema linking is a widely adopted strategy to streamline the input for LLMs by selecting only relevant schema elements, therefore reducing noise and computational overhead. However, schema linking faces risks that requires caution, including the potential omission of necessary elements and disruption of database structural integrity. To address these challenges, we propose a novel framework called RSL-SQL that combines bidirectional schema linking, contextual information augmentation, binary selection strategy, and multi-turn self-correction. Our approach improves the recall of schema linking through forward and backward pruning and hedges the risk by voting between full schema and contextual information augmented simplified schema. Experiments on the BIRD and Spider benchmarks demonstrate that our approach achieves state-of-the-art execution accuracy among open-source solutions, with 67.2% on BIRD and 87.9% on Spider using GPT-4o. Furthermore, our approach outperforms a series of GPT-4 based Text-to-SQL systems when adopting DeepSeek (much cheaper) with same intact prompts. Extensive analysis and ablation studies confirm the effectiveness of each component in our framework. The codes are available at https://github.com/Laqcce-cao/RSL-SQL.
Abstract:Instruction-following capabilities in large language models (LLMs) have significantly progressed, enabling more complex user interactions through detailed prompts. However, retrieval systems have not matched these advances, most of them still relies on traditional lexical and semantic matching techniques that fail to fully capture user intent. Recent efforts have introduced instruction-aware retrieval models, but these primarily focus on intrinsic content relevance, which neglects the importance of customized preferences for broader document-level attributes. This study evaluates the instruction-following capabilities of various retrieval models beyond content relevance, including LLM-based dense retrieval and reranking models. We develop InfoSearch, a novel retrieval evaluation benchmark spanning six document-level attributes: Audience, Keyword, Format, Language, Length, and Source, and introduce novel metrics -- Strict Instruction Compliance Ratio (SICR) and Weighted Instruction Sensitivity Evaluation (WISE) to accurately assess the models' responsiveness to instructions. Our findings reveal that while reranking models generally surpass retrieval models in instruction following, they still face challenges in handling certain attributes. Moreover, although instruction fine-tuning and increased model size lead to better performance, most models fall short of achieving comprehensive instruction compliance as assessed by our benchmark.
Abstract:Despite ongoing efforts to defend neural classifiers from adversarial attacks, they remain vulnerable, especially to unseen attacks. In contrast, humans are difficult to be cheated by subtle manipulations, since we make judgments only based on essential factors. Inspired by this observation, we attempt to model label generation with essential label-causative factors and incorporate label-non-causative factors to assist data generation. For an adversarial example, we aim to discriminate the perturbations as non-causative factors and make predictions only based on the label-causative factors. Concretely, we propose a casual diffusion model (CausalDiff) that adapts diffusion models for conditional data generation and disentangles the two types of casual factors by learning towards a novel casual information bottleneck objective. Empirically, CausalDiff has significantly outperformed state-of-the-art defense methods on various unseen attacks, achieving an average robustness of 86.39% (+4.01%) on CIFAR-10, 56.25% (+3.13%) on CIFAR-100, and 82.62% (+4.93%) on GTSRB (German Traffic Sign Recognition Benchmark).
Abstract:As the volume of image data grows, data-oriented cloud computing in Internet of Video Things (IoVT) systems encounters latency issues. Task-oriented edge computing addresses this by shifting data analysis to the edge. However, limited computational power of edge devices poses challenges for executing visual tasks. Existing methods struggle to balance high model performance with low resource consumption; lightweight neural networks often underperform, while device-specific models designed by Neural Architecture Search (NAS) fail to adapt to heterogeneous devices. For these issues, we propose a novel co-design framework to optimize neural network architecture and deployment strategies during inference for high-throughput. Specifically, it implements a dynamic model structure based on re-parameterization, coupled with a Roofline-based model partitioning strategy to enhance the computational performance of edge devices. We also employ a multi-objective co-optimization approach to balance throughput and accuracy. Additionally, we derive mathematical consistency and convergence of partitioned models. Experimental results demonstrate significant improvements in throughput (12.05\% on MNIST, 18.83\% on ImageNet) and superior classification accuracy compared to baseline algorithms. Our method consistently achieves stable performance across different devices, underscoring its adaptability. Simulated experiments further confirm its efficacy in high-accuracy, real-time detection for small objects in IoVT systems.
Abstract:This paper proposes an unsupervised deep-learning (DL) approach by integrating transformer and Kolmogorov-Arnold networks (KAN) termed KANsformer to realize scalable beamforming for mobile communication systems. Specifically, we consider a classic multi-input-single-output energy efficiency maximization problem subject to the total power budget. The proposed KANsformer first extracts hidden features via a multi-head self-attention mechanism and then reads out the desired beamforming design via KAN. Numerical results are provided to evaluate the KANsformer in terms of generalization performance, transfer learning and ablation experiment. Overall, the KANsformer outperforms existing benchmark DL approaches, and is adaptable to the change in the number of mobile users with real-time and near-optimal inference.
Abstract:As trustworthy AI continues to advance, the fairness issue in recommendations has received increasing attention. A recommender system is considered unfair when it produces unequal outcomes for different user groups based on user-sensitive attributes (e.g., age, gender). Some researchers have proposed data augmentation-based methods aiming at alleviating user-level unfairness by altering the skewed distribution of training data among various user groups. Despite yielding promising results, they often rely on fairness-related assumptions that may not align with reality, potentially reducing the data quality and negatively affecting model effectiveness. To tackle this issue, in this paper, we study how to implement high-quality data augmentation to improve recommendation fairness. Specifically, we propose FairDgcl, a dynamic graph adversarial contrastive learning framework aiming at improving fairness in recommender system. First, FairDgcl develops an adversarial contrastive network with a view generator and a view discriminator to learn generating fair augmentation strategies in an adversarial style. Then, we propose two dynamic, learnable models to generate contrastive views within contrastive learning framework, which automatically fine-tune the augmentation strategies. Meanwhile, we theoretically show that FairDgcl can simultaneously generate enhanced representations that possess both fairness and accuracy. Lastly, comprehensive experiments conducted on four real-world datasets demonstrate the effectiveness of the proposed FairDgcl.
Abstract:The widespread deployment of sensing devices leads to a surge in data for spatio-temporal forecasting applications such as traffic flow, air quality, and wind energy. Although spatio-temporal graph neural networks have achieved success in modeling various static spatio-temporal forecasting scenarios, real-world spatio-temporal data are typically received in a streaming manner, and the network continuously expands with the installation of new sensors. Thus, spatio-temporal forecasting in streaming scenarios faces dual challenges: the inefficiency of retraining models over newly arrived data and the detrimental effects of catastrophic forgetting over long-term history. To address these challenges, we propose a novel prompt tuning-based continuous forecasting method, following two fundamental tuning principles guided by empirical and theoretical analysis: expand and compress, which effectively resolve the aforementioned problems with lightweight tuning parameters. Specifically, we integrate the base spatio-temporal graph neural network with a continuous prompt pool, utilizing stored prompts (i.e., few learnable parameters) in memory, and jointly optimize them with the base spatio-temporal graph neural network. This method ensures that the model sequentially learns from the spatio-temporal data stream to accomplish tasks for corresponding periods. Extensive experimental results on multiple real-world datasets demonstrate the multi-faceted superiority of our method over the state-of-the-art baselines, including effectiveness, efficiency, universality, etc.