Henry
Abstract:With the recent surge in interest surrounding generative paradigms, generative recommendation has increasingly attracted the attention of researchers in the recommendation community. This paradigm generally consists of two stages. In the first stage, pretrained semantic embeddings or collaborative ID embeddings are quantized to create item codes, aiming to capture and preserve rich semantic or collaborative knowledge within these codes. The second stage involves utilizing these discrete codes to perform an autoregressive sequence generation task. Existing methods often either overlook collaborative or semantic knowledge, or combine the two roughly. In this paper, we observe that naively concatenating representations from semantic and collaborative modality leads to a semantic domination issue, where the resulting representation is overly influenced by semantic information, effectively overshadowing the collaborative representation. Consequently, downstream recommendation tasks fail to fully exploit the knowledge from both modalities, resulting in suboptimal performance. To address this, we propose a progressive collaborative and semantic knowledge fusion model for generative recommendation, named PRORec, which integrates semantic and collaborative knowledge with a unified code through a two-stage framework. Specifically, in the first stage, we propose a cross-modality knowledge alignment task, which integrates semantic knowledge into collaborative embeddings, enhancing their representational capability. In the second stage, we propose an in-modality knowledge distillation task, designed to effectively capture and integrate knowledge from both semantic and collaborative modalities. Extensive experiments on three widely used benchmarks validate the effectiveness of our approach, demonstrating its superiority compared to existing methods.
Abstract:Cancer prognosis is a critical task that involves predicting patient outcomes and survival rates. To enhance prediction accuracy, previous studies have integrated diverse data modalities, such as clinical notes, medical images, and genomic data, leveraging their complementary information. However, existing approaches face two major limitations. First, they struggle to incorporate newly arrived data with varying distributions into training, such as patient records from different hospitals, thus rendering sub-optimal generalizability and limited utility in real-world applications. Second, most multimodal integration methods rely on simplistic concatenation or task-specific pipelines, which fail to capture the complex interdependencies across modalities. To address these, we propose a continually evolving multi-modal foundation model. Extensive experiments on the TCGA dataset demonstrate the effectiveness of our approach, highlighting its potential to advance cancer prognosis by enabling robust and adaptive multimodal integration.
Abstract:3D open-world classification is a challenging yet essential task in dynamic and unstructured real-world scenarios, requiring both open-category and open-pose recognition. To address these challenges, recent wisdom often takes sophisticated 2D pre-trained models to provide enriched and stable representations. However, these methods largely rely on how 3D objects can be projected into 2D space, which is unfortunately not well solved, and thus significantly limits their performance. Unlike these present efforts, in this paper we make a pioneering exploration of 3D generative models for 3D open-world classification. Drawing on abundant prior knowledge from 3D generative models, we additionally craft a rotation-invariant feature extractor. This innovative synergy endows our pipeline with the advantages of being training-free, open-category, and pose-invariant, thus well suited to 3D open-world classification. Extensive experiments on benchmark datasets demonstrate the potential of generative models in 3D open-world classification, achieving state-of-the-art performance on ModelNet10 and McGill with 32.0% and 8.7% overall accuracy improvement, respectively.
Abstract:Graph neural networks (GNNs) have exhibited superior performance in various classification tasks on graph-structured data. However, they encounter the potential vulnerability from the link stealing attacks, which can infer the presence of a link between two nodes via measuring the similarity of its incident nodes' prediction vectors produced by a GNN model. Such attacks pose severe security and privacy threats to the training graph used in GNN models. In this work, we propose a novel solution, called Graph Link Disguise (GRID), to defend against link stealing attacks with the formal guarantee of GNN model utility for retaining prediction accuracy. The key idea of GRID is to add carefully crafted noises to the nodes' prediction vectors for disguising adjacent nodes as n-hop indirect neighboring nodes. We take into account the graph topology and select only a subset of nodes (called core nodes) covering all links for adding noises, which can avert the noises offset and have the further advantages of reducing both the distortion loss and the computation cost. Our crafted noises can ensure 1) the noisy prediction vectors of any two adjacent nodes have their similarity level like that of two non-adjacent nodes and 2) the model prediction is unchanged to ensure zero utility loss. Extensive experiments on five datasets are conducted to show the effectiveness of our proposed GRID solution against different representative link-stealing attacks under transductive settings and inductive settings respectively, as well as two influence-based attacks. Meanwhile, it achieves a much better privacy-utility trade-off than existing methods when extended to GNNs.
Abstract:Federated Continual Learning (FCL) aims to enable sequentially privacy-preserving model training on streams of incoming data that vary in edge devices by preserving previous knowledge while adapting to new data. Current FCL literature focuses on restricted data privacy and access to previously seen data while imposing no constraints on the training overhead. This is unreasonable for FCL applications in real-world scenarios, where edge devices are primarily constrained by resources such as storage, computational budget, and label rate. We revisit this problem with a large-scale benchmark and analyze the performance of state-of-the-art FCL approaches under different resource-constrained settings. Various typical FCL techniques and six datasets in two incremental learning scenarios (Class-IL and Domain-IL) are involved in our experiments. Through extensive experiments amounting to a total of over 1,000+ GPU hours, we find that, under limited resource-constrained settings, existing FCL approaches, with no exception, fail to achieve the expected performance. Our conclusions are consistent in the sensitivity analysis. This suggests that most existing FCL methods are particularly too resource-dependent for real-world deployment. Moreover, we study the performance of typical FCL techniques with resource constraints and shed light on future research directions in FCL.
Abstract:In this paper, we propose an integrated sensing and communication (ISAC) system aided by the movable-antenna (MA) array, which can improve the communication and sensing performance via flexible antenna movement over conventional fixed-position antenna (FPA) array. First, we consider the downlink multiuser communication, where each user is randomly distributed within a given three-dimensional zone with local movement. To reduce the overhead of frequent antenna movement, the antenna position vector (APV) is designed based on users' statistical channel state information (CSI), so that the antennas only need to be moved in a large timescale. Then, for target sensing, the Cramer-Rao bounds (CRBs) of the estimation mean square error for different spatial angles of arrival (AoAs) are derived as functions of MAs' positions. Based on the above, we formulate an optimization problem to maximize the expected minimum achievable rate among all communication users, with given constraints on the maximum acceptable CRB thresholds for target sensing. An alternating optimization algorithm is proposed to iteratively optimize one of the horizontal and vertical APVs of the MA array with the other being fixed. Numerical results demonstrate that our proposed MA arrays can significantly enlarge the trade-off region between communication and sensing performance compared to conventional FPA arrays with different inter-antenna spacing. It is also revealed that the steering vectors of the designed MA arrays exhibit low correlation in the angular domain, thus effectively reducing channel correlation among communication users to enhance their achievable rates, while alleviating ambiguity in target angle estimation to achieve improved sensing accuracy.
Abstract:Fluid antenna system (FAS) and movable antenna (MA) have recently emerged as promising technologies to exploit new spatial degrees of freedom (DoFs), which have attracted growing attention in wireless communication. In this paper, we propose a new rotatable antenna (RA) model to improve the performance of wireless communication systems. Different from conventional fixed antennas, the proposed RA system can flexibly alter the three-dimensional (3D) boresight direction of each antenna independently by adjusting its deflection angles to achieve a desired array directional gain pattern. Specifically, we investigate an RA-enabled uplink communication system, where the receive beamforming and the deflection angles of all RAs at the base station (BS) are jointly optimized to maximize the minimum signal-to-interference-plus-noise ratio (SINR) among all the users. In the special single-user and free-space propagation setup, the optimal deflection angles of RAs are derived in closed form with the maximum-ratio combining (MRC) beamformer applied at the BS. Moreover, we analyze the asymptotic performance with an infinite number of antennas based on this solution, which theoretically proves that the RA system can achieve a higher array gain as compared to the fixed-antenna system. In the general multi-user and multi-path channel setup, we first propose an alternating optimization (AO) algorithm to alternately optimize the receive beamforming and the deflection angles of RAs in an iterative manner. Then, a two-stage algorithm that solves the formulated problem without the need for iteration is further proposed to reduce computational complexity. Simulation results are provided to validate our analytical results and demonstrate that the proposed RA system can significantly outperform other benchmark schemes.
Abstract:Music source separation and pitch estimation are two vital tasks in music information retrieval. Typically, the input of pitch estimation is obtained from the output of music source separation. Therefore, existing methods have tried to perform these two tasks simultaneously, so as to leverage the mutually beneficial relationship between both tasks. However, these methods still face two critical challenges that limit the improvement of both tasks: the lack of labeled data and joint learning optimization. To address these challenges, we propose a Model-Agnostic Joint Learning (MAJL) framework for both tasks. MAJL is a generic framework and can use variant models for each task. It includes a two-stage training method and a dynamic weighting method named Dynamic Weights on Hard Samples (DWHS), which addresses the lack of labeled data and joint learning optimization, respectively. Experimental results on public music datasets show that MAJL outperforms state-of-the-art methods on both tasks, with significant improvements of 0.92 in Signal-to-Distortion Ratio (SDR) for music source separation and 2.71% in Raw Pitch Accuracy (RPA) for pitch estimation. Furthermore, comprehensive studies not only validate the effectiveness of each component of MAJL, but also indicate the great generality of MAJL in adapting to different model architectures.
Abstract:The deployment of Large Language Models (LLMs) in recommender systems for predicting Click-Through Rates (CTR) necessitates a delicate balance between computational efficiency and predictive accuracy. This paper presents an optimization framework that combines Retrieval-Augmented Generation (RAG) with an innovative multi-head early exit architecture to concurrently enhance both aspects. By integrating Graph Convolutional Networks (GCNs) as efficient retrieval mechanisms, we are able to significantly reduce data retrieval times while maintaining high model performance. The early exit strategy employed allows for dynamic termination of model inference, utilizing real-time predictive confidence assessments across multiple heads. This not only quickens the responsiveness of LLMs but also upholds or improves their accuracy, making it ideal for real-time application scenarios. Our experiments demonstrate how this architecture effectively decreases computation time without sacrificing the accuracy needed for reliable recommendation delivery, establishing a new standard for efficient, real-time LLM deployment in commercial systems.
Abstract:In this letter, we propose a novel Movable Superdirective Pairs (MSP) approach that combines movable antennas with superdirective pair arrays to enhance the performance of millimeter-wave (mmWave) communications on the user side. By controlling the rotation angles and positions of superdirective antenna pairs, the proposed MSP approach maximizes the received signal-to-noise ratio (SNR) of multipath signals without relying on phase shifters or attenuators. This approach addresses the limitations of traditional superdirective antennas, which are typically restricted to the endfire direction and suffer from reduced scanning bandwidth and increased complexity. An efficient algorithm based on alternating optimization and the gradient projection method is developed to solve the non-convex optimization problem of antennas' joint rotating positioning. Simulation results demonstrate that the MSP approach achieves significant performance gains over fixed-position array (FPA) employing Maximum Ratio Combining (MRC), while reducing system complexity and hardware costs.