Abstract:As the foundation of large language models (LLMs), self-attention module faces the challenge of quadratic time and memory complexity with respect to sequence length. FlashAttention accelerates attention computation and reduces its memory usage by leveraging the GPU memory hierarchy. A promising research direction is to integrate FlashAttention with quantization methods. This paper introduces INT-FlashAttention, the first INT8 quantization architecture compatible with the forward workflow of FlashAttention, which significantly improves the inference speed of FlashAttention on Ampere GPUs. We implement our INT-FlashAttention prototype with fully INT8 activations and general matrix-multiplication (GEMM) kernels, making it the first attention operator with fully INT8 input. As a general token-level post-training quantization framework, INT-FlashAttention is also compatible with other data formats like INT4, etc. Experimental results show INT-FlashAttention achieves 72% faster inference speed and 82% smaller quantization error compared to standard FlashAttention with FP16 and FP8 data format.
Abstract:Large language models (LLMs) have achieved impressive performance on code generation. Although prior studies enhanced LLMs with prompting techniques and code refinement, they still struggle with complex programming problems due to rigid solution plans. In this paper, we draw on pair programming practices to propose PairCoder, a novel LLM-based framework for code generation. PairCoder incorporates two collaborative LLM agents, namely a Navigator agent for high-level planning and a Driver agent for specific implementation. The Navigator is responsible for proposing promising solution plans, selecting the current optimal plan, and directing the next iteration round based on execution feedback. The Driver follows the guidance of Navigator to undertake initial code generation, code testing, and refinement. This interleaved and iterative workflow involves multi-plan exploration and feedback-based refinement, which mimics the collaboration of pair programmers. We evaluate PairCoder with both open-source and closed-source LLMs on various code generation benchmarks. Extensive experimental results demonstrate the superior accuracy of PairCoder, achieving relative pass@1 improvements of 12.00%-162.43% compared to prompting LLMs directly.
Abstract:Existing domain generalization (DG) methods for cross-person generalization tasks often face challenges in capturing intra- and inter-domain style diversity, resulting in domain gaps with the target domain. In this study, we explore a novel perspective to tackle this problem, a process conceptualized as domain padding. This proposal aims to enrich the domain diversity by synthesizing intra- and inter-domain style data while maintaining robustness to class labels. We instantiate this concept using a conditional diffusion model and introduce a style-fused sampling strategy to enhance data generation diversity. In contrast to traditional condition-guided sampling, our style-fused sampling strategy allows for the flexible use of one or more random styles to guide data synthesis. This feature presents a notable advancement: it allows for the maximum utilization of possible permutations and combinations among existing styles to generate a broad spectrum of new style instances. Empirical evaluations on a board of datasets demonstrate that our generated data achieves remarkable diversity within the domain space. Both intra- and inter-domain generated data have proven to be significant and valuable, contributing to varying degrees of performance enhancements. Notably, our approach outperforms state-of-the-art DG methods in all human activity recognition tasks.
Abstract:Recent years have witnessed the deployment of code language models (LMs) in various code intelligence tasks such as code completion. Yet, it is challenging for pre-trained LMs to generate correct completions in private repositories. Previous studies retrieve cross-file context based on import relations or text similarity, which is insufficiently relevant to completion targets. In this paper, we propose a dataflow-guided retrieval augmentation approach, called DraCo, for repository-level code completion. DraCo parses a private repository into code entities and establishes their relations through an extended dataflow analysis, forming a repo-specific context graph. Whenever triggering code completion, DraCo precisely retrieves relevant background knowledge from the repo-specific context graph and generates well-formed prompts to query code LMs. Furthermore, we construct a large Python dataset, ReccEval, with more diverse completion targets. Our experiments demonstrate the superior accuracy and applicable efficiency of DraCo, improving code exact match by 3.43% and identifier F1-score by 3.27% on average compared to the state-of-the-art approach.
Abstract:Wood-leaf classification is an essential and fundamental prerequisite in the analysis and estimation of forest attributes from terrestrial laser scanning (TLS) point clouds,including critical measurements such as diameter at breast height(DBH),above-ground biomass(AGB),wood volume.To address this,we introduce the Wood-Leaf Classification Network(WLC-Net),a deep learning model derived from PointNet++,designed to differentiate between wood and leaf points within tree point clouds.WLC-Net enhances classification accuracy,completeness,and speed by incorporating linearity as an inherent feature,refining the input-output framework,and optimizing the centroid sampling technique.WLC-Net was trained and assessed using three distinct tree species datasets,comprising a total of 102 individual tree point clouds:21 Chinese ash trees,21 willow trees,and 60 tropical trees.For comparative evaluation,five alternative methods,including PointNet++,DGCNN,Krishna Moorthy's method,LeWoS, and Sun's method,were also applied to these datasets.The classification accuracy of all six methods was quantified using three metrics:overall accuracy(OA),mean Intersection over Union(mIoU),and F1-score.Across all three datasets,WLC-Net demonstrated superior performance, achieving OA scores of 0.9778, 0.9712, and 0.9508;mIoU scores of 0.9761, 0.9693,and 0.9141;and F1-scores of 0.8628, 0.7938,and 0.9019,respectively.The time costs of WLC-Net were also recorded to evaluate the efficiency.The average processing time was 102.74s per million points for WLC-Net.In terms of visual inspect,accuracy evaluation and efficiency evaluation,the results suggest that WLC-Net presents a promising approach for wood-leaf classification,distinguished by its high accuracy. In addition,WLC-Net also exhibits strong applicability across various tree point clouds and holds promise for further optimization.
Abstract:While one-dimensional convolutional neural networks (1D-CNNs) have been empirically proven effective in time series classification tasks, we find that there remain undesirable outcomes that could arise in their application, motivating us to further investigate and understand their underlying mechanisms. In this work, we propose a Temporal Convolutional Explorer (TCE) to empirically explore the learning behavior of 1D-CNNs from the perspective of the frequency domain. Our TCE analysis highlights that deeper 1D-CNNs tend to distract the focus from the low-frequency components leading to the accuracy degradation phenomenon, and the disturbing convolution is the driving factor. Then, we leverage our findings to the practical application and propose a regulatory framework, which can easily be integrated into existing 1D-CNNs. It aims to rectify the suboptimal learning behavior by enabling the network to selectively bypass the specified disturbing convolutions. Finally, through comprehensive experiments on widely-used UCR, UEA, and UCI benchmarks, we demonstrate that 1) TCE's insight into 1D-CNN's learning behavior; 2) our regulatory framework enables state-of-the-art 1D-CNNs to get improved performances with less consumption of memory and computational overhead.
Abstract:The advent of Large Language Models (LLMs) has paved the way for complex tasks such as role-playing, which enhances user interactions by enabling models to imitate various characters. However, the closed-source nature of state-of-the-art LLMs and their general-purpose training limit role-playing optimization. In this paper, we introduce RoleLLM, a framework to benchmark, elicit, and enhance role-playing abilities in LLMs. RoleLLM comprises four stages: (1) Role Profile Construction for 100 roles; (2) Context-Based Instruction Generation (Context-Instruct) for role-specific knowledge extraction; (3) Role Prompting using GPT (RoleGPT) for speaking style imitation; and (4) Role-Conditioned Instruction Tuning (RoCIT) for fine-tuning open-source models along with role customization. By Context-Instruct and RoleGPT, we create RoleBench, the first systematic and fine-grained character-level benchmark dataset for role-playing with 168,093 samples. Moreover, RoCIT on RoleBench yields RoleLLaMA (English) and RoleGLM (Chinese), significantly enhancing role-playing abilities and even achieving comparable results with RoleGPT (using GPT-4).
Abstract:Logical query answering over Knowledge Graphs (KGs) is a fundamental yet complex task. A promising approach to achieve this is to embed queries and entities jointly into the same embedding space. Research along this line suggests that using multi-modal distribution to represent answer entities is more suitable than uni-modal distribution, as a single query may contain multiple disjoint answer subsets due to the compositional nature of multi-hop queries and the varying latent semantics of relations. However, existing methods based on multi-modal distribution roughly represent each subset without capturing its accurate cardinality, or even degenerate into uni-modal distribution learning during the reasoning process due to the lack of an effective similarity measure. To better model queries with diversified answers, we propose Query2GMM for answering logical queries over knowledge graphs. In Query2GMM, we present the GMM embedding to represent each query using a univariate Gaussian Mixture Model (GMM). Each subset of a query is encoded by its cardinality, semantic center and dispersion degree, allowing for precise representation of multiple subsets. Then we design specific neural networks for each operator to handle the inherent complexity that comes with multi-modal distribution while alleviating the cascading errors. Last, we define a new similarity measure to assess the relationships between an entity and a query's multi-answer subsets, enabling effective multi-modal distribution learning for reasoning. Comprehensive experimental results show that Query2GMM outperforms the best competitor by an absolute average of $5.5\%$. The source code is available at \url{https://anonymous.4open.science/r/Query2GMM-C42F}.
Abstract:The rapid development of aspect-based sentiment analysis (ABSA) within recent decades shows great potential for real-world society. The current ABSA works, however, are mostly limited to the scenario of a single text piece, leaving the study in dialogue contexts unexplored. In this work, we introduce a novel task of conversational aspect-based sentiment quadruple analysis, namely DiaASQ, aiming to detect the sentiment quadruple of \emph{target-aspect-opinion-sentiment} in a dialogue. DiaASQ bridges the gap between fine-grained sentiment analysis and conversational opinion mining. We manually construct a large-scale high-quality DiaASQ dataset in both Chinese and English languages. We deliberately develop a neural model to benchmark the task, which advances in effectively performing end-to-end quadruple prediction, and manages to incorporate rich dialogue-specific and discourse feature representations for better cross-utterance quadruple extraction. We finally point out several potential future works to facilitate the follow-up research of this new task.