Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kexin Wang

MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map

Nov 16, 2024

Yuhong Chou, Man Yao, Kexin Wang, Yuqi Pan, Ruijie Zhu, Yiran Zhong, Yu Qiao, Jibin Wu, Bo Xu, Guoqi Li

Figure 1 for MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map

Figure 2 for MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map

Figure 3 for MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map

Figure 4 for MetaLA: Unified Optimal Linear Approximation to Softmax Attention Map

Abstract:Various linear complexity models, such as Linear Transformer (LinFormer), State Space Model (SSM), and Linear RNN (LinRNN), have been proposed to replace the conventional softmax attention in Transformer structures. However, the optimal design of these linear models is still an open question. In this work, we attempt to answer this question by finding the best linear approximation to softmax attention from a theoretical perspective. We start by unifying existing linear complexity models as the linear attention form and then identify three conditions for the optimal linear attention design: 1) Dynamic memory ability; 2) Static approximation ability; 3) Least parameter approximation. We find that none of the current linear models meet all three conditions, resulting in suboptimal performance. Instead, we propose Meta Linear Attention (MetaLA) as a solution that satisfies these conditions. Our experiments on Multi-Query Associative Recall (MQAR) task, language modeling, image classification, and Long-Range Arena (LRA) benchmark demonstrate that MetaLA is more effective than the existing linear models.

Via

Access Paper or Ask Questions

RegNLP in Action: Facilitating Compliance Through Automated Information Retrieval and Answer Generation

Sep 09, 2024

Tuba Gokhan, Kexin Wang, Iryna Gurevych, Ted Briscoe

Figure 1 for RegNLP in Action: Facilitating Compliance Through Automated Information Retrieval and Answer Generation

Figure 2 for RegNLP in Action: Facilitating Compliance Through Automated Information Retrieval and Answer Generation

Figure 3 for RegNLP in Action: Facilitating Compliance Through Automated Information Retrieval and Answer Generation

Figure 4 for RegNLP in Action: Facilitating Compliance Through Automated Information Retrieval and Answer Generation

Abstract:Regulatory documents, issued by governmental regulatory bodies, establish rules, guidelines, and standards that organizations must adhere to for legal compliance. These documents, characterized by their length, complexity and frequent updates, are challenging to interpret, requiring significant allocation of time and expertise on the part of organizations to ensure ongoing compliance.Regulatory Natural Language Processing (RegNLP) is a multidisciplinary subfield aimed at simplifying access to and interpretation of regulatory rules and obligations. We define an Automated Question-Passage Generation task for RegNLP, create the ObliQA dataset containing 27,869 questions derived from the Abu Dhabi Global Markets (ADGM) financial regulation document collection, design a baseline Regulatory Information Retrieval and Answer Generation system, and evaluate it with RePASs, a novel evaluation metric that tests whether generated answers accurately capture all relevant obligations and avoid contradictions.

Via

Access Paper or Ask Questions

Exploring Fungal Morphology Simulation and Dynamic Light Containment from a Graphics Generation Perspective

Sep 08, 2024

Kexin Wang, Ivy He, Jinke Li, Ali Asadipour, Yitong Sun

Figure 1 for Exploring Fungal Morphology Simulation and Dynamic Light Containment from a Graphics Generation Perspective

Figure 2 for Exploring Fungal Morphology Simulation and Dynamic Light Containment from a Graphics Generation Perspective

Figure 3 for Exploring Fungal Morphology Simulation and Dynamic Light Containment from a Graphics Generation Perspective

Figure 4 for Exploring Fungal Morphology Simulation and Dynamic Light Containment from a Graphics Generation Perspective

Abstract:Fungal simulation and control are considered crucial techniques in Bio-Art creation. However, coding algorithms for reliable fungal simulations have posed significant challenges for artists. This study equates fungal morphology simulation to a two-dimensional graphic time-series generation problem. We propose a zero-coding, neural network-driven cellular automaton. Fungal spread patterns are learned through an image segmentation model and a time-series prediction model, which then supervise the training of neural network cells, enabling them to replicate real-world spreading behaviors. We further implemented dynamic containment of fungal boundaries with lasers. Synchronized with the automaton, the fungus successfully spreads into pre-designed complex shapes in reality.

* Siggraph Asia 2024 Art Paper

Via

Access Paper or Ask Questions

Scalable Autoregressive Image Generation with Mamba

Aug 22, 2024

Haopeng Li, Jinyue Yang, Kexin Wang, Xuerui Qiu, Yuhong Chou, Xin Li, Guoqi Li

Abstract:We introduce AiM, an autoregressive (AR) image generative model based on Mamba architecture. AiM employs Mamba, a novel state-space model characterized by its exceptional performance for long-sequence modeling with linear time complexity, to supplant the commonly utilized Transformers in AR image generation models, aiming to achieve both superior generation quality and enhanced inference speed. Unlike existing methods that adapt Mamba to handle two-dimensional signals via multi-directional scan, AiM directly utilizes the next-token prediction paradigm for autoregressive image generation. This approach circumvents the need for extensive modifications to enable Mamba to learn 2D spatial representations. By implementing straightforward yet strategically targeted modifications for visual generative tasks, we preserve Mamba's core structure, fully exploiting its efficient long-sequence modeling capabilities and scalability. We provide AiM models in various scales, with parameter counts ranging from 148M to 1.3B. On the ImageNet1K 256*256 benchmark, our best AiM model achieves a FID of 2.21, surpassing all existing AR models of comparable parameter counts and demonstrating significant competitiveness against diffusion models, with 2 to 10 times faster inference speed. Code is available at https://github.com/hp-l33/AiM

* 9 pages, 8 figures

Via

Access Paper or Ask Questions

Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment

Jul 20, 2024

Yongxin Huang, Kexin Wang, Goran Glavaš, Iryna Gurevych

Figure 1 for Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment

Figure 2 for Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment

Figure 3 for Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment

Figure 4 for Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment

Abstract:Multilingual sentence encoders are commonly obtained by training multilingual language models to map sentences from different languages into a shared semantic space. As such, they are subject to curse of multilinguality, a loss of monolingual representational accuracy due to parameter sharing. Another limitation of multilingual sentence encoders is the trade-off between monolingual and cross-lingual performance. Training for cross-lingual alignment of sentence embeddings distorts the optimal monolingual structure of semantic spaces of individual languages, harming the utility of sentence embeddings in monolingual tasks. In this work, we address both issues by modular training of sentence encoders, i.e., by separating monolingual specialization from cross-lingual alignment. We first efficiently train language-specific sentence encoders to avoid negative interference between languages (i.e., the curse). We then align all non-English monolingual encoders to the English encoder by training a cross-lingual alignment adapter on top of each, preventing interference with monolingual specialization from the first step. In both steps, we resort to contrastive learning on machine-translated paraphrase data. Monolingual and cross-lingual evaluations on semantic text similarity/relatedness and multiple-choice QA render our modular solution more effective than multilingual sentence encoders, especially benefiting low-resource languages.

Via

Access Paper or Ask Questions

Contrastive independent component analysis

Jul 02, 2024

Kexin Wang, Aida Maraj, Anna Seigal

Abstract:Visualizing data and finding patterns in data are ubiquitous problems in the sciences. Increasingly, applications seek signal and structure in a contrastive setting: a foreground dataset relative to a background dataset. For this purpose, we propose contrastive independent component analysis (cICA). This generalizes independent component analysis to independent latent variables across a foreground and background. We propose a hierarchical tensor decomposition algorithm for cICA. We study the identifiability of cICA and demonstrate its performance visualizing data and finding patterns in data, using synthetic and real-world datasets, comparing the approach to existing contrastive methods.

* 28 pages, 8 figures

Via

Access Paper or Ask Questions

Improving the Robustness of Knowledge-Grounded Dialogue via Contrastive Learning

Jan 09, 2024

Jiaan Wang, Jianfeng Qu, Kexin Wang, Zhixu Li, Wen Hua, Ximing Li, An Liu

Abstract:Knowledge-grounded dialogue (KGD) learns to generate an informative response based on a given dialogue context and external knowledge (\emph{e.g.}, knowledge graphs; KGs). Recently, the emergence of large language models (LLMs) and pre-training techniques has brought great success to knowledge-grounded dialogue. However, when building KGD systems in real applications, there are various real-world noises that are inevitable to face. For example, the dialogue context might involve perturbations such as misspellings and abbreviations. In addition, KGs typically suffer from incompletion and also might contain erroneous and outdated facts. Such real-world noises pose a challenge to the robustness of KGD systems and hinder their applications in the real world. In this paper, we propose an entity-based contrastive learning framework for improving the robustness of KGD. Specifically, we make use of the entity information in a KGD sample to create both its positive and negative samples which involve semantic-irrelevant and semantic-relevant perturbations, respectively. The contrastive learning framework ensures the KGD model is aware of these two types of perturbations, thus generating informative responses with the potentially noisy inputs in real applications. Experimental results on three benchmark datasets show that our method achieves new state-of-the-art performance in terms of automatic evaluation scores, verifying its effectiveness and potentiality. Furthermore, we show that our method can generate better responses than comparison models in both the noisy and the few-shot settings.

* Accepted by AAAI 2024

Via

Access Paper or Ask Questions

SVFAP: Self-supervised Video Facial Affect Perceiver

Dec 31, 2023

Licai Sun, Zheng Lian, Kexin Wang, Yu He, Mingyu Xu, Haiyang Sun, Bin Liu, Jianhua Tao

Figure 1 for SVFAP: Self-supervised Video Facial Affect Perceiver

Figure 2 for SVFAP: Self-supervised Video Facial Affect Perceiver

Figure 3 for SVFAP: Self-supervised Video Facial Affect Perceiver

Figure 4 for SVFAP: Self-supervised Video Facial Affect Perceiver

Abstract:Video-based facial affect analysis has recently attracted increasing attention owing to its critical role in human-computer interaction. Previous studies mainly focus on developing various deep learning architectures and training them in a fully supervised manner. Although significant progress has been achieved by these supervised methods, the longstanding lack of large-scale high-quality labeled data severely hinders their further improvements. Motivated by the recent success of self-supervised learning in computer vision, this paper introduces a self-supervised approach, termed Self-supervised Video Facial Affect Perceiver (SVFAP), to address the dilemma faced by supervised methods. Specifically, SVFAP leverages masked facial video autoencoding to perform self-supervised pre-training on massive unlabeled facial videos. Considering that large spatiotemporal redundancy exists in facial videos, we propose a novel temporal pyramid and spatial bottleneck Transformer as the encoder of SVFAP, which not only enjoys low computational cost but also achieves excellent performance. To verify the effectiveness of our method, we conduct experiments on nine datasets spanning three downstream tasks, including dynamic facial expression recognition, dimensional emotion recognition, and personality recognition. Comprehensive results demonstrate that SVFAP can learn powerful affect-related representations via large-scale self-supervised pre-training and it significantly outperforms previous state-of-the-art methods on all datasets. Codes will be available at https://github.com/sunlicai/SVFAP.

* Submitted to IEEE Trans. on Affective Computing (February 8, 2023)

Via

Access Paper or Ask Questions

Do we listen to what we are told? An empirical study on human behaviour during the COVID-19 pandemic: neural networks vs. regression analysis

Nov 21, 2023

Yuxi Heluo, Kexin Wang, Charles W. Robson

Figure 1 for Do we listen to what we are told? An empirical study on human behaviour during the COVID-19 pandemic: neural networks vs. regression analysis

Figure 2 for Do we listen to what we are told? An empirical study on human behaviour during the COVID-19 pandemic: neural networks vs. regression analysis

Figure 3 for Do we listen to what we are told? An empirical study on human behaviour during the COVID-19 pandemic: neural networks vs. regression analysis

Figure 4 for Do we listen to what we are told? An empirical study on human behaviour during the COVID-19 pandemic: neural networks vs. regression analysis

Abstract:In this work, we contribute the first visual open-source empirical study on human behaviour during the COVID-19 pandemic, in order to investigate how compliant a general population is to mask-wearing-related public-health policy. Object-detection-based convolutional neural networks, regression analysis and multilayer perceptrons are combined to analyse visual data of the Viennese public during 2020. We find that mask-wearing-related government regulations and public-transport announcements encouraged correct mask-wearing-behaviours during the COVID-19 pandemic. Importantly, changes in announcement and regulation contents led to heterogeneous effects on people's behaviour. Comparing the predictive power of regression analysis and neural networks, we demonstrate that the latter produces more accurate predictions of population reactions during the COVID-19 pandemic. Our use of regression modelling also allows us to unearth possible causal pathways underlying societal behaviour. Since our findings highlight the importance of appropriate communication contents, our results will facilitate more effective non-pharmaceutical interventions to be developed in future. Adding to the literature, we demonstrate that regression modelling and neural networks are not mutually exclusive but instead complement each other.

Via

Access Paper or Ask Questions

AdaSent: Efficient Domain-Adapted Sentence Embeddings for Few-Shot Classification

Nov 01, 2023

Yongxin Huang, Kexin Wang, Sourav Dutta, Raj Nath Patel, Goran Glavaš, Iryna Gurevych

Abstract:Recent work has found that few-shot sentence classification based on pre-trained Sentence Encoders (SEs) is efficient, robust, and effective. In this work, we investigate strategies for domain-specialization in the context of few-shot sentence classification with SEs. We first establish that unsupervised Domain-Adaptive Pre-Training (DAPT) of a base Pre-trained Language Model (PLM) (i.e., not an SE) substantially improves the accuracy of few-shot sentence classification by up to 8.4 points. However, applying DAPT on SEs, on the one hand, disrupts the effects of their (general-domain) Sentence Embedding Pre-Training (SEPT). On the other hand, applying general-domain SEPT on top of a domain-adapted base PLM (i.e., after DAPT) is effective but inefficient, since the computationally expensive SEPT needs to be executed on top of a DAPT-ed PLM of each domain. As a solution, we propose AdaSent, which decouples SEPT from DAPT by training a SEPT adapter on the base PLM. The adapter can be inserted into DAPT-ed PLMs from any domain. We demonstrate AdaSent's effectiveness in extensive experiments on 17 different few-shot sentence classification datasets. AdaSent matches or surpasses the performance of full SEPT on DAPT-ed PLM, while substantially reducing the training costs. The code for AdaSent is available.

* Accepted at EMNLP 2023 Main

Via

Access Paper or Ask Questions