Abstract:Representation learning is essential for deep-neural-network-based recommender systems to capture user preferences and item features within fixed-dimensional user and item vectors. Unlike existing representation learning methods that either treat each user preference and item feature uniformly or categorize them into discrete clusters, we argue that in the real world, user preferences and item features are naturally expressed and organized in a hierarchical manner, leading to a new direction for representation learning. In this paper, we introduce a novel matryoshka representation learning method for recommendation (MRL4Rec), by which we restructure user and item vectors into matryoshka representations with incrementally dimensional and overlapping vector spaces to explicitly represent user preferences and item features at different hierarchical levels. We theoretically establish that constructing training triplets specific to each level is pivotal in guaranteeing accurate matryoshka representation learning. Subsequently, we propose the matryoshka negative sampling mechanism to construct training triplets, which further ensures the effectiveness of the matryoshka representation learning in capturing hierarchical user preferences and item features. The experiments demonstrate that MRL4Rec can consistently and substantially outperform a number of state-of-the-art competitors on several real-life datasets. Our code is publicly available at https://github.com/Riwei-HEU/MRL.
Abstract:Truthfulness is paramount for large language models (LLMs) as they are increasingly deployed in real-world applications. However, existing LLMs still struggle with generating truthful content, as evidenced by their modest performance on benchmarks like TruthfulQA. To address this issue, we propose GRAdual self-truTHifying (GRATH), a novel post-processing method to enhance truthfulness of LLMs. GRATH utilizes out-of-domain question prompts to generate pairwise truthfulness training data with each pair containing a question and its correct and incorrect answers, and then optimizes the model via direct preference optimization (DPO) to learn from the truthfulness difference between answer pairs. GRATH iteratively refines truthfulness data and updates the model, leading to a gradual improvement in model truthfulness in a self-supervised manner. Empirically, we evaluate GRATH using different 7B-LLMs and compare with LLMs with similar or even larger sizes on benchmark datasets. Our results show that GRATH effectively improves LLMs' truthfulness without compromising other core capabilities. Notably, GRATH achieves state-of-the-art performance on TruthfulQA, with MC1 accuracy of 54.71% and MC2 accuracy of 69.10%, which even surpass those on 70B-LLMs.
Abstract:Recently, multimodal recommendations have gained increasing attention for effectively addressing the data sparsity problem by incorporating modality-based representations. Although multimodal recommendations excel in accuracy, the introduction of different modalities (e.g., images, text, and audio) may expose more users' sensitive information (e.g., gender and age) to recommender systems, resulting in potentially more serious unfairness issues. Despite many efforts on fairness, existing fairness-aware methods are either incompatible with multimodal scenarios, or lead to suboptimal fairness performance due to neglecting sensitive information of multimodal content. To achieve counterfactual fairness in multimodal recommendations, we propose a novel fairness-aware multimodal recommendation approach (dubbed as FMMRec) to disentangle the sensitive and non-sensitive information from modal representations and leverage the disentangled modal representations to guide fairer representation learning. Specifically, we first disentangle biased and filtered modal representations by maximizing and minimizing their sensitive attribute prediction ability respectively. With the disentangled modal representations, we mine the modality-based unfair and fair (corresponding to biased and filtered) user-user structures for enhancing explicit user representation with the biased and filtered neighbors from the corresponding structures, followed by adversarially filtering out sensitive information. Experiments on two real-world public datasets demonstrate the superiority of our FMMRec relative to the state-of-the-art baselines. Our source code is available at https://anonymous.4open.science/r/FMMRec.
Abstract:Generative Pre-trained Transformer (GPT) models have exhibited exciting progress in capabilities, capturing the interest of practitioners and the public alike. Yet, while the literature on the trustworthiness of GPT models remains limited, practitioners have proposed employing capable GPT models for sensitive applications to healthcare and finance - where mistakes can be costly. To this end, this work proposes a comprehensive trustworthiness evaluation for large language models with a focus on GPT-4 and GPT-3.5, considering diverse perspectives - including toxicity, stereotype bias, adversarial robustness, out-of-distribution robustness, robustness on adversarial demonstrations, privacy, machine ethics, and fairness. Based on our evaluations, we discover previously unpublished vulnerabilities to trustworthiness threats. For instance, we find that GPT models can be easily misled to generate toxic and biased outputs and leak private information in both training data and conversation history. We also find that although GPT-4 is usually more trustworthy than GPT-3.5 on standard benchmarks, GPT-4 is more vulnerable given jailbreaking system or user prompts, potentially due to the reason that GPT-4 follows the (misleading) instructions more precisely. Our work illustrates a comprehensive trustworthiness evaluation of GPT models and sheds light on the trustworthiness gaps. Our benchmark is publicly available at https://decodingtrust.github.io/.
Abstract:Diffusion models have achieved great success in a range of tasks, such as image synthesis and molecule design. As such successes hinge on large-scale training data collected from diverse sources, the trustworthiness of these collected data is hard to control or audit. In this work, we aim to explore the vulnerabilities of diffusion models under potential training data manipulations and try to answer: How hard is it to perform Trojan attacks on well-trained diffusion models? What are the adversarial targets that such Trojan attacks can achieve? To answer these questions, we propose an effective Trojan attack against diffusion models, TrojDiff, which optimizes the Trojan diffusion and generative processes during training. In particular, we design novel transitions during the Trojan diffusion process to diffuse adversarial targets into a biased Gaussian distribution and propose a new parameterization of the Trojan generative process that leads to an effective training objective for the attack. In addition, we consider three types of adversarial targets: the Trojaned diffusion models will always output instances belonging to a certain class from the in-domain distribution (In-D2D attack), out-of-domain distribution (Out-D2D-attack), and one specific instance (D2I attack). We evaluate TrojDiff on CIFAR-10 and CelebA datasets against both DDPM and DDIM diffusion models. We show that TrojDiff always achieves high attack performance under different adversarial targets using different types of triggers, while the performance in benign environments is preserved. The code is available at https://github.com/chenweixin107/TrojDiff.
Abstract:This paper considers a scenario in which the Terahertz (THz) transmitter equipped with a linear antenna array wishes to focus its beam to a desired spatial region in the array near-field. The goal is to compute the achievable spatial region and determine how the system parameters such as the carrier frequency, the array dimension and the user's location affect its beam focusing performance. First, based on a theorem from analytic geometry, we show that the achievable focusing spatial region constitutes a rotated ellipse, with the x and y coordinates denoting the range and angle, respectively. In this way, the determination of the spatial region is reduced to a problem of deriving the coverage of an ellipse. The achievable coverage is then obtained in closed form, and the construction of carrier frequency offsets that can analytically control the beam focusing performance is provided. Numerical results validate the theoretical findings and demonstrate the performance of the proposed method.
Abstract:Few-shot learning is challenging due to the limited data and labels. Existing algorithms usually resolve this problem by pre-training the model with a considerable amount of annotated data which shares knowledge with the target domain. Nevertheless, large quantities of homogenous data samples are not always available. To tackle this issue, we develop a framework that enables the model to surf the Internet, which implies that the model can collect and annotate data without manual effort. Since the online data is virtually limitless and continues to be generated, the model can thus be empowered to constantly obtain up-to-date knowledge from the Internet. Additionally, we observe that the generalization ability of the learned representation is crucial for self-supervised learning. To present its importance, a naive yet efficient normalization strategy is proposed. Consequentially, this strategy boosts the accuracy of the model significantly (20.46% at most). We demonstrate the superiority of the proposed framework with experiments on miniImageNet, tieredImageNet and Omniglot. The results indicate that our method has surpassed previous unsupervised counterparts by a large margin (more than 10%) and obtained performance comparable with the supervised ones.