Huazhong University of Science and Technology
Abstract:Biomedical knowledge is uniquely complex and structured, requiring distinct reasoning strategies compared to other scientific disciplines like physics or chemistry. Biomedical scientists do not rely on a single approach to reasoning; instead, they use various strategies, including rule-based, prototype-based, and case-based reasoning. This diversity calls for flexible approaches that accommodate multiple reasoning strategies while leveraging in-domain knowledge. We introduce KGARevion, a knowledge graph (KG) based agent designed to address the complexity of knowledge-intensive medical queries. Upon receiving a query, KGARevion generates relevant triplets by using the knowledge base of the LLM. These triplets are then verified against a grounded KG to filter out erroneous information and ensure that only accurate, relevant data contribute to the final answer. Unlike RAG-based models, this multi-step process ensures robustness in reasoning while adapting to different models of medical reasoning. Evaluations on four gold-standard medical QA datasets show that KGARevion improves accuracy by over 5.2%, outperforming 15 models in handling complex medical questions. To test its capabilities, we curated three new medical QA datasets with varying levels of semantic complexity, where KGARevion achieved a 10.4% improvement in accuracy.
Abstract:Generating images with accurately represented text, especially in non-Latin languages, poses a significant challenge for diffusion models. Existing approaches, such as the integration of hint condition diagrams via auxiliary networks (e.g., ControlNet), have made strides towards addressing this issue. However, diffusion models often fall short in tasks requiring controlled text generation, such as specifying particular fonts or producing text in small fonts. In this paper, we introduce a novel approach for multilingual visual text creation, named JoyType, designed to maintain the font style of text during the image generation process. Our methodology begins with assembling a training dataset, JoyType-1M, comprising 1 million pairs of data. Each pair includes an image, its description, and glyph instructions corresponding to the font style within the image. We then developed a text control network, Font ControlNet, tasked with extracting font style information to steer the image generation. To further enhance our model's ability to maintain font style, notably in generating small-font text, we incorporated a multi-layer OCR-aware loss into the diffusion process. This enhancement allows JoyType to direct text rendering using low-level descriptors. Our evaluations, based on both visual and accuracy metrics, demonstrate that JoyType significantly outperforms existing state-of-the-art methods. Additionally, JoyType can function as a plugin, facilitating the creation of varied image styles in conjunction with other stable diffusion models on HuggingFace and CivitAI. Our project is open-sourced on https://jdh-algo.github.io/JoyType/.
Abstract:Recent advancements in unsupervised representation learning often leverage class information to enhance feature extraction and clustering performance. However, this reliance on class priors limits the applicability of such methods in real-world scenarios where class information is unavailable or ambiguous. In this paper, we propose Contrastive Disentangling (CD), a simple and effective framework that learns representations without any reliance on class priors. Our framework employs a multi-level contrastive learning strategy that combines instance-level and feature-level losses with a normalized entropy loss to learn semantically rich and fine-grained representations. Specifically, (1) the instance-level contrastive loss encourages the separation of feature representations for different samples, (2) the feature-level contrastive loss promotes independence among the feature head predictions, and (3) the normalized entropy loss encourages the feature heads to capture meaningful and prevalent attributes from the data. These components work together to enable CD to significantly outperform existing methods, as demonstrated by extensive experiments on benchmark datasets including CIFAR-10, CIFAR-100, STL-10, and ImageNet-10, particularly in scenarios where class priors are absent. The code is available at https://github.com/Hoper-J/Contrastive-Disentangling.
Abstract:Pretraining has been widely explored to augment the adaptability of graph learning models to transfer knowledge from large datasets to a downstream task, such as link prediction or classification. However, the gap between training objectives and the discrepancy between data distributions in pretraining and downstream tasks hinders the transfer of the pretrained knowledge. Inspired by instruction-based prompts widely used in pretrained language models, we introduce instructions into graph pretraining. In this paper, we propose a novel pretraining framework named Instruction-based Hypergraph Pretraining. To overcome the discrepancy between pretraining and downstream tasks, text-based instructions are applied to provide explicit guidance on specific tasks for representation learning. Compared to learnable prompts, whose effectiveness depends on the quality and the diversity of training data, text-based instructions intrinsically encapsulate task information and support the model to generalize beyond the structure seen during pretraining. To capture high-order relations with task information in a context-aware manner, a novel prompting hypergraph convolution layer is devised to integrate instructions into information propagation in hypergraphs. Extensive experiments conducted on three public datasets verify the superiority of IHP in various scenarios.
Abstract:In recent years, knowledge distillation methods based on contrastive learning have achieved promising results on image classification and object detection tasks. However, in this line of research, we note that less attention is paid to semantic segmentation. Existing methods heavily rely on data augmentation and memory buffer, which entail high computational resource demands when applying them to handle semantic segmentation that requires to preserve high-resolution feature maps for making dense pixel-wise predictions. In order to address this problem, we present Augmentation-free Dense Contrastive Knowledge Distillation (Af-DCD), a new contrastive distillation learning paradigm to train compact and accurate deep neural networks for semantic segmentation applications. Af-DCD leverages a masked feature mimicking strategy, and formulates a novel contrastive learning loss via taking advantage of tactful feature partitions across both channel and spatial dimensions, allowing to effectively transfer dense and structured local knowledge learnt by the teacher model to a target student model while maintaining training efficiency. Extensive experiments on five mainstream benchmarks with various teacher-student network pairs demonstrate the effectiveness of our approach. For instance, the DeepLabV3-Res18|DeepLabV3-MBV2 model trained by Af-DCD reaches 77.03%|76.38% mIOU on Cityscapes dataset when choosing DeepLabV3-Res101 as the teacher, setting new performance records. Besides that, Af-DCD achieves an absolute mIOU improvement of 3.26%|3.04%|2.75%|2.30%|1.42% compared with individually trained counterpart on Cityscapes|Pascal VOC|Camvid|ADE20K|COCO-Stuff-164K. Code is available at https://github.com/OSVAI/Af-DCD
Abstract:Social networks have become essential for people's lives. The proliferation of web services further expands social networks at an unprecedented scale, leading to immeasurable commercial value for online platforms. Recently, the group buying (GB) business mode is prevalent and also becoming more popular in E-commerce. GB explicitly forms groups of users with similar interests to secure better discounts from the merchants, often operating within social networks. It is a novel way to further unlock the commercial value by explicitly utilizing the online social network in E-commerce. Participant recommendation, a fundamental problem emerging together with GB, aims to find the participants for a launched group buying process with an initiator and a target item to increase the GB success rate. This paper proposes Multi-View Graph Convolution for Participant Recommendation (MVPRec) to tackle this problem. To differentiate the roles of users (Initiator/Participant) within the GB process, we explicitly reconstruct historical GB data into initiator-view and participant-view graphs. Together with the social graph, we obtain a multi-view user representation with graph encoders. Then MVPRec fuses the GB and social representation with an attention module to obtain the user representation and learns a matching score with the initiator's social friends via a multi-head attention mechanism. Social friends with the Top-k matching score are recommended for the corresponding GB process. Experiments on three datasets justify the effectiveness of MVPRec in the emerging participant recommendation problem.
Abstract:Personalized recommender systems aim to predict users' preferences for items. It has become an indispensable part of online services. Online social platforms enable users to form groups based on their common interests. The users' group participation on social platforms reveals their interests and can be utilized as side information to mitigate the data sparsity and cold-start problem in recommender systems. Users join different groups out of different interests. In this paper, we generate group representation from the user's interests and propose IGRec (Interest-based Group enhanced Recommendation) to utilize the group information accurately. It consists of four modules. (1) Interest disentangler via self-gating that disentangles users' interests from their initial embedding representation. (2) Interest aggregator that generates the interest-based group representation by Gumbel-Softmax aggregation on the group members' interests. (3) Interest-based group aggregation that fuses user's representation with the participated group representation. (4) A dual-trained rating prediction module to utilize both user-item and group-item interactions. We conduct extensive experiments on three publicly available datasets. Results show IGRec can effectively alleviate the data sparsity problem and enhance the recommender system with interest-based group representation. Experiments on the group recommendation task further show the informativeness of interest-based group representation.
Abstract:Although pretraining has garnered significant attention and popularity in recent years, its application in graph-based recommender systems is relatively limited. It is challenging to exploit prior knowledge by pretraining in widely used ID-dependent datasets. On one hand, user-item interaction history in one dataset can hardly be transferred to other datasets through pretraining, where IDs are different. On the other hand, pretraining and finetuning on the same dataset leads to a high risk of overfitting. In this paper, we propose a novel multitask pretraining framework named Unified Pretraining for Recommendation via Task Hypergraphs. For a unified learning pattern to handle diverse requirements and nuances of various pretext tasks, we design task hypergraphs to generalize pretext tasks to hyperedge prediction. A novel transitional attention layer is devised to discriminatively learn the relevance between each pretext task and recommendation. Experimental results on three benchmark datasets verify the superiority of UPRTH. Additional detailed investigations are conducted to demonstrate the effectiveness of the proposed framework.
Abstract:The field of Recommender Systems (RecSys) has been extensively studied to enhance accuracy by leveraging users' historical interactions. Nonetheless, this persistent pursuit of accuracy frequently engenders diminished diversity, culminating in the well-recognized "echo chamber" phenomenon. Diversified RecSys has emerged as a countermeasure, placing diversity on par with accuracy and garnering noteworthy attention from academic circles and industry practitioners. This research explores the realm of diversified RecSys within the intricate context of knowledge graphs (KG). These KGs act as repositories of interconnected information concerning entities and items, offering a propitious avenue to amplify recommendation diversity through the incorporation of insightful contextual information. Our contributions include introducing an innovative metric, Entity Coverage, and Relation Coverage, which effectively quantifies diversity within the KG domain. Additionally, we introduce the Diversified Embedding Learning (DEL) module, meticulously designed to formulate user representations that possess an innate awareness of diversity. In tandem with this, we introduce a novel technique named Conditional Alignment and Uniformity (CAU). It adeptly encodes KG item embeddings while preserving contextual integrity. Collectively, our contributions signify a substantial stride towards augmenting the panorama of recommendation diversity within the realm of KG-informed RecSys paradigms.
Abstract:Traditional recommender systems have heavily relied on identity representations (IDs) to model users and items, while the ascendancy of pre-trained language model (PLM) encoders has enriched the modeling of contextual item descriptions. However, PLMs, although effective in addressing few-shot, zero-shot, or unified modeling scenarios, often neglect the crucial collaborative filtering signal. This neglect gives rise to two pressing challenges: (1) Collaborative Contextualization, the seamless integration of collaborative signals with contextual representations. (2) the imperative to bridge the representation gap between ID-based representations and contextual representations while preserving their contextual semantics. In this paper, we propose CollabContext, a novel model that adeptly combines collaborative filtering signals with contextual representations and aligns these representations within the contextual space, preserving essential contextual semantics. Experimental results across three real-world datasets demonstrate substantial improvements. Leveraging collaborative contextualization, CollabContext can also be effectively applied to cold-start scenarios, achieving remarkable enhancements in recommendation performance. The code is available after the conference accepts the paper.