Abstract:Domain Large Language Models (LLMs) are developed for domain-specific tasks based on general LLMs. But it still requires professional knowledge to facilitate the expertise for some domain-specific tasks. In this paper, we investigate into knowledge-intensive calculation problems. We find that the math problems to be challenging for LLMs, when involving complex domain-specific rules and knowledge documents, rather than simple formulations of terminologies. Therefore, we propose a pipeline to solve the domain-specific calculation problems with Knowledge-Intensive Programs Generator more effectively, named as KIPG. It generates knowledge-intensive programs according to the domain-specific documents. For each query, key variables are extracted, then outcomes which are dependent on domain knowledge are calculated with the programs. By iterative preference alignment, the code generator learns to improve the logic consistency with the domain knowledge. Taking legal domain as an example, we have conducted experiments to prove the effectiveness of our pipeline, and extensive analysis on the modules. We also find that the code generator is also adaptable to other domains, without training on the new knowledge.
Abstract:Graph foundation models (GFMs) have recently gained significant attention. However, the unique data processing and evaluation setups employed by different studies hinder a deeper understanding of their progress. Additionally, current research tends to focus on specific subsets of graph learning tasks, such as structural tasks, node-level tasks, or classification tasks. As a result, they often incorporate specialized modules tailored to particular task types, losing their applicability to other graph learning tasks and contradicting the original intent of foundation models to be universal. Therefore, to enhance consistency, coverage, and diversity across domains, tasks, and research interests within the graph learning community in the evaluation of GFMs, we propose GFMBench-a systematic and comprehensive benchmark comprising 26 datasets. Moreover, we introduce LangGFM, a novel GFM that relies entirely on large language models. By revisiting and exploring the effective graph textualization principles, as well as repurposing successful techniques from graph augmentation and graph self-supervised learning within the language space, LangGFM achieves performance on par with or exceeding the state of the art across GFMBench, which can offer us new perspectives, experiences, and baselines to drive forward the evolution of GFMs.
Abstract:Large Language Models (LLMs) could struggle to fully understand legal theories and perform complex legal reasoning tasks. In this study, we introduce a challenging task (confusing charge prediction) to better evaluate LLMs' understanding of legal theories and reasoning capabilities. We also propose a novel framework: Multi-Agent framework for improving complex Legal Reasoning capability (MALR). MALR employs non-parametric learning, encouraging LLMs to automatically decompose complex legal tasks and mimic human learning process to extract insights from legal rules, helping LLMs better understand legal theories and enhance their legal reasoning abilities. Extensive experiments on multiple real-world datasets demonstrate that the proposed framework effectively addresses complex reasoning issues in practical scenarios, paving the way for more reliable applications in the legal domain.
Abstract:Previous studies on music style transfer have mainly focused on one-to-one style conversion, which is relatively limited. When considering the conversion between multiple styles, previous methods required designing multiple modes to disentangle the complex style of the music, resulting in large computational costs and slow audio generation. The existing music style transfer methods generate spectrograms with artifacts, leading to significant noise in the generated audio. To address these issues, this study proposes a music style transfer framework based on diffusion models (DM) and uses spectrogram-based methods to achieve multi-to-multi music style transfer. The GuideDiff method is used to restore spectrograms to high-fidelity audio, accelerating audio generation speed and reducing noise in the generated audio. Experimental results show that our model has good performance in multi-mode music style transfer compared to the baseline and can generate high-quality audio in real-time on consumer-grade GPUs.
Abstract:Convolutional neural networks (CNNs) with large kernels, drawing inspiration from the key operations of vision transformers (ViTs), have demonstrated impressive performance in various vision-based applications. To address the issue of computational efficiency degradation in existing designs for supporting large-kernel convolutions, an FPGA-based inference accelerator is proposed for the efficient deployment of CNNs with arbitrary kernel sizes. Firstly, a Z-flow method is presented to optimize the computing data flow by maximizing data reuse opportunity. Besides, the proposed design, incorporating the kernel-segmentation (Kseg) scheme, enables extended support for large-kernel convolutions, significantly reducing the storage requirements for overlapped data. Moreover, based on the analysis of typical block structures in emerging CNNs, vertical-fused (VF) and horizontal-fused (HF) methods are developed to optimize CNN deployments from both computation and transmission perspectives. The proposed hardware accelerator, evaluated on Intel Arria 10 FPGA, achieves up to 3.91 times better DSP efficiency than prior art on the same network. Particularly, it demonstrates efficient support for large-kernel CNNs, achieving throughputs of 169.68 GOPS and 244.55 GOPS for RepLKNet-31 and PyConvResNet-50, respectively, both of which are implemented on hardware for the first time.
Abstract:Extreme edge platforms, such as in-vehicle smart devices, require efficient deployment of quantized deep neural networks (DNNs) to enable intelligent applications with limited amounts of energy, memory, and computing resources. However, many edge devices struggle to boost inference throughput of various quantized DNNs due to the varying quantization levels, and these devices lack floating-point (FP) support for on-device learning, which prevents them from improving model accuracy while ensuring data privacy. To tackle the challenges above, we propose a precision-scalable RISC-V DNN processor with on-device learning capability. It facilitates diverse precision levels of fixed-point DNN inference, spanning from 2-bit to 16-bit, and enhances on-device learning through improved support with FP16 operations. Moreover, we employ multiple methods such as FP16 multiplier reuse and multi-precision integer multiplier reuse, along with balanced mapping of FPGA resources, to significantly improve hardware resource utilization. Experimental results on the Xilinx ZCU102 FPGA show that our processor significantly improves inference throughput by 1.6$\sim$14.6$\times$ and energy efficiency by 1.1$\sim$14.6$\times$ across various DNNs, compared to the prior art, XpulpNN. Additionally, our processor achieves a 16.5$\times$ higher FP throughput for on-device learning.
Abstract:Recent studies have shown that dense retrieval models, lacking dedicated training data, struggle to perform well across diverse retrieval tasks, as different retrieval tasks often entail distinct search intents. To address this challenge, in this work we introduce ControlRetriever, a generic and efficient approach with a parameter isolated architecture, capable of controlling dense retrieval models to directly perform varied retrieval tasks, harnessing the power of instructions that explicitly describe retrieval intents in natural language. Leveraging the foundation of ControlNet, which has proven powerful in text-to-image generation, ControlRetriever imbues different retrieval models with the new capacity of controllable retrieval, all while being guided by task-specific instructions. Furthermore, we propose a novel LLM guided Instruction Synthesizing and Iterative Training strategy, which iteratively tunes ControlRetriever based on extensive automatically-generated retrieval data with diverse instructions by capitalizing the advancement of large language models. Extensive experiments show that in the BEIR benchmark, with only natural language descriptions of specific retrieval intent for each task, ControlRetriever, as a unified multi-task retrieval system without task-specific tuning, significantly outperforms baseline methods designed with task-specific retrievers and also achieves state-of-the-art zero-shot performance.
Abstract:Visual relation extraction (VRE) aims to extract relations between entities from visuallyrich documents. Existing methods usually predict relations for each entity pair independently based on entity features but ignore the global structure information, i.e., dependencies between entity pairs. The absence of global structure information may make the model struggle to learn long-range relations and easily predict conflicted results. To alleviate such limitations, we propose a GlObal Structure knowledgeguided relation Extraction (GOSE) framework, which captures dependencies between entity pairs in an iterative manner. Given a scanned image of the document, GOSE firstly generates preliminary relation predictions on entity pairs. Secondly, it mines global structure knowledge based on prediction results of the previous iteration and further incorporates global structure knowledge into entity representations. This "generate-capture-incorporate" schema is performed multiple times so that entity representations and global structure knowledge can mutually reinforce each other. Extensive experiments show that GOSE not only outperforms previous methods on the standard fine-tuning setting but also shows promising superiority in cross-lingual learning; even yields stronger data-efficient performance in the low-resource setting.
Abstract:Prompt tuning is a parameter-efficient method, which freezes all PLM parameters and only prepends some additional tunable tokens called soft prompts to the input text. However, soft prompts heavily rely on a better initialization and may easily result in overfitting under few-shot settings, which causes prompt-tuning performing much worse than fine-tuning. To address the above issues, this paper proposes a novel Self-sUpervised Meta-prompt learning framework with MEtagradient Regularization for few shot generalization (SUMMER). We leverage self-supervised meta-learning to better initialize soft prompts and curriculum-based task augmentation is further proposed to enrich the meta-task distribution. Besides, a novel meta-gradient regularization method is integrated into the meta-prompt learning framework, which meta-learns to transform the raw gradient during few-shot learning into a domain-generalizable direction, thus alleviating the problem of overfitting. Extensive experiments show that SUMMER achieves better performance for different few-shot downstream tasks, and also exhibits a stronger domain generalization ability.
Abstract:With the development of deep learning and Transformer-based pre-trained models like BERT, the accuracy of many NLP tasks has been dramatically improved. However, the large number of parameters and computations also pose challenges for their deployment. For instance, using BERT can improve the predictions in the financial sentiment analysis (FSA) task but slow it down, where speed and accuracy are equally important in terms of profits. To address these issues, we first propose an efficient and lightweight BERT (ELBERT) along with a novel confidence-window-based (CWB) early exit mechanism. Based on ELBERT, an innovative method to accelerate text processing on the GPU platform is developed, solving the difficult problem of making the early exit mechanism work more effectively with a large input batch size. Afterward, a fast and high-accuracy FSA system is built. Experimental results show that the proposed CWB early exit mechanism achieves significantly higher accuracy than existing early exit methods on BERT under the same computation cost. By using this acceleration method, our FSA system can boost the processing speed by nearly 40 times to over 1000 texts per second with sufficient accuracy, which is nearly twice as fast as FastBERT, thus providing a more powerful text processing capability for modern trading systems.