Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Runze Wang

Bridging Molecular Graphs and Large Language Models

Mar 05, 2025

Runze Wang, Mingqi Yang, Yanming Shen

Abstract:While Large Language Models (LLMs) have shown exceptional generalization capabilities, their ability to process graph data, such as molecular structures, remains limited. To bridge this gap, this paper proposes Graph2Token, an efficient solution that aligns graph tokens to LLM tokens. The key idea is to represent a graph token with the LLM token vocabulary, without fine-tuning the LLM backbone. To achieve this goal, we first construct a molecule-text paired dataset from multisources, including CHEBI and HMDB, to train a graph structure encoder, which reduces the distance between graphs and texts representations in the feature space. Then, we propose a novel alignment strategy that associates a graph token with LLM tokens. To further unleash the potential of LLMs, we collect molecular IUPAC name identifiers, which are incorporated into the LLM prompts. By aligning molecular graphs as special tokens, we can activate LLM generalization ability to molecular few-shot learning. Extensive experiments on molecular classification and regression tasks demonstrate the effectiveness of our proposed Graph2Token.

Via

Access Paper or Ask Questions

A General-Purpose Neuromorphic Sensor based on Spiketrum Algorithm: Hardware Details and Real-life Applications

Jan 30, 2025

MHD Anas Alsakkal, Runze Wang, Jayawan Wijekoon, Piotr Dudek

Figure 1 for A General-Purpose Neuromorphic Sensor based on Spiketrum Algorithm: Hardware Details and Real-life Applications

Figure 2 for A General-Purpose Neuromorphic Sensor based on Spiketrum Algorithm: Hardware Details and Real-life Applications

Figure 3 for A General-Purpose Neuromorphic Sensor based on Spiketrum Algorithm: Hardware Details and Real-life Applications

Figure 4 for A General-Purpose Neuromorphic Sensor based on Spiketrum Algorithm: Hardware Details and Real-life Applications

Abstract:Spiking Neural Networks (SNNs) offer a biologically inspired computational paradigm, enabling energy-efficient data processing through spike-based information transmission. Despite notable advancements in hardware for SNNs, spike encoding has largely remained software-dependent, limiting efficiency. This paper addresses the need for adaptable and resource-efficient spike encoding hardware by presenting an area-optimized hardware implementation of the Spiketrum algorithm, which encodes time-varying analogue signals into spatiotemporal spike patterns. Unlike earlier performance-optimized designs, which prioritize speed, our approach focuses on reducing hardware footprint, achieving a 52% reduction in Block RAMs (BRAMs), 31% fewer Digital Signal Processing (DSP) slices, and a 6% decrease in Look-Up Tables (LUTs). The proposed implementation has been verified on an FPGA and successfully integrated into an IC using TSMC180 technology. Experimental results demonstrate the system's effectiveness in real-world applications, including sound and ECG classification. This work highlights the trade-offs between performance and resource efficiency, offering a flexible, scalable solution for neuromorphic systems in power-sensitive applications like cochlear implants and neural devices.

* Currently under review with IEEE TCAS

Via

Access Paper or Ask Questions

ObjectRelator: Enabling Cross-View Object Relation Understanding in Ego-Centric and Exo-Centric Videos

Nov 28, 2024

Yuqian Fu, Runze Wang, Yanwei Fu, Danda Pani Paudel, Xuanjing Huang, Luc Van Gool

Abstract:In this paper, we focus on the Ego-Exo Object Correspondence task, an emerging challenge in the field of computer vision that aims to map objects across ego-centric and exo-centric views. We introduce ObjectRelator, a novel method designed to tackle this task, featuring two new modules: Multimodal Condition Fusion (MCFuse) and SSL-based Cross-View Object Alignment (XObjAlign). MCFuse effectively fuses language and visual conditions to enhance target object localization, while XObjAlign enforces consistency in object representations across views through a self-supervised alignment strategy. Extensive experiments demonstrate the effectiveness of ObjectRelator, achieving state-of-the-art performance on Ego2Exo and Exo2Ego tasks with minimal additional parameters. This work provides a foundation for future research in comprehensive cross-view object relation understanding highlighting the potential of leveraging multimodal guidance and cross-view alignment. Codes and models will be released to advance further research in this direction.

Via

Access Paper or Ask Questions

Application based Evaluation of an Efficient Spike-Encoder, "Spiketrum"

May 29, 2024

MHD Anas Alsakkal, Runze Wang, Jayawan Wijekoon, Huajin Tang

Abstract:Spike-based encoders represent information as sequences of spikes or pulses, which are transmitted between neurons. A prevailing consensus suggests that spike-based approaches demonstrate exceptional capabilities in capturing the temporal dynamics of neural activity and have the potential to provide energy-efficient solutions for low-power applications. The Spiketrum encoder efficiently compresses input data using spike trains or code sets (for non-spiking applications) and is adaptable to both hardware and software implementations, with lossless signal reconstruction capability. The paper proposes and assesses Spiketrum's hardware, evaluating its output under varying spike rates and its classification performance with popular spiking and non-spiking classifiers, and also assessing the quality of information compression and hardware resource utilization. The paper extensively benchmarks both Spiketrum hardware and its software counterpart against state-of-the-art, biologically-plausible encoders. The evaluations encompass benchmarking criteria, including classification accuracy, training speed, and sparsity when using encoder outputs in pattern recognition and classification with both spiking and non-spiking classifiers. Additionally, they consider encoded output entropy and hardware resource utilization and power consumption of the hardware version of the encoders. Results demonstrate Spiketrum's superiority in most benchmarking criteria, making it a promising choice for various applications. It efficiently utilizes hardware resources with low power consumption, achieving high classification accuracy. This work also emphasizes the potential of encoders in spike-based processing to improve the efficiency and performance of neural computing systems.

* To be published at "IEEE Transactions on Cognitive and Developmental Systems"

Via

Access Paper or Ask Questions

CO3: Low-resource Contrastive Co-training for Generative Conversational Query Rewrite

Mar 18, 2024

Yifei Yuan, Chen Shi, Runze Wang, Liyi Chen, Renjun Hu, Zengming Zhang, Feijun Jiang, Wai Lam

Abstract:Generative query rewrite generates reconstructed query rewrites using the conversation history while rely heavily on gold rewrite pairs that are expensive to obtain. Recently, few-shot learning is gaining increasing popularity for this task, whereas these methods are sensitive to the inherent noise due to limited data size. Besides, both attempts face performance degradation when there exists language style shift between training and testing cases. To this end, we study low-resource generative conversational query rewrite that is robust to both noise and language style shift. The core idea is to utilize massive unlabeled data to make further improvements via a contrastive co-training paradigm. Specifically, we co-train two dual models (namely Rewriter and Simplifier) such that each of them provides extra guidance through pseudo-labeling for enhancing the other in an iterative manner. We also leverage contrastive learning with data augmentation, which enables our model pay more attention on the truly valuable information than the noise. Extensive experiments demonstrate the superiority of our model under both few-shot and zero-shot scenarios. We also verify the better generalization ability of our model when encountering language style shift.

* Accepted to COLING 2024

Via

Access Paper or Ask Questions

How Can Large Language Models Understand Spatial-Temporal Data?

Jan 25, 2024

Lei Liu, Shuo Yu, Runze Wang, Zhenxun Ma, Yanming Shen

Abstract:While Large Language Models (LLMs) dominate tasks like natural language processing and computer vision, harnessing their power for spatial-temporal forecasting remains challenging. The disparity between sequential text and complex spatial-temporal data hinders this application. To address this issue, this paper introduces STG-LLM, an innovative approach empowering LLMs for spatial-temporal forecasting. We tackle the data mismatch by proposing: 1) STG-Tokenizer: This spatial-temporal graph tokenizer transforms intricate graph data into concise tokens capturing both spatial and temporal relationships; 2) STG-Adapter: This minimalistic adapter, consisting of linear encoding and decoding layers, bridges the gap between tokenized data and LLM comprehension. By fine-tuning only a small set of parameters, it can effectively grasp the semantics of tokens generated by STG-Tokenizer, while preserving the original natural language understanding capabilities of LLMs. Extensive experiments on diverse spatial-temporal benchmark datasets show that STG-LLM successfully unlocks LLM potential for spatial-temporal forecasting. Remarkably, our approach achieves competitive performance on par with dedicated SOTA methods.

Via

Access Paper or Ask Questions

McQueen: a Benchmark for Multimodal Conversational Query Rewrite

Oct 23, 2022

Yifei Yuan, Chen Shi, Runze Wang, Liyi Chen, Feijun Jiang, Yuan You, Wai Lam

Abstract:The task of query rewrite aims to convert an in-context query to its fully-specified version where ellipsis and coreference are completed and referred-back according to the history context. Although much progress has been made, less efforts have been paid to real scenario conversations that involve drawing information from more than one modalities. In this paper, we propose the task of multimodal conversational query rewrite (McQR), which performs query rewrite under the multimodal visual conversation setting. We collect a large-scale dataset named McQueen based on manual annotation, which contains 15k visual conversations and over 80k queries where each one is associated with a fully-specified rewrite version. In addition, for entities appearing in the rewrite, we provide the corresponding image box annotation. We then use the McQueen dataset to benchmark a state-of-the-art method for effectively tackling the McQR task, which is based on a multimodal pre-trained model with pointer generator. Extensive experiments are performed to demonstrate the effectiveness of our model on this task\footnote{The dataset and code of this paper are both available in \url{https://github.com/yfyuan01/MQR}

* Accepted by EMNLP22

Via

Access Paper or Ask Questions

Can You be More Social? Injecting Politeness and Positivity into Task-Oriented Conversational Agents

Dec 29, 2020

Yi-Chia Wang, Alexandros Papangelis, Runze Wang, Zhaleh Feizollahi, Gokhan Tur, Robert Kraut

Figure 1 for Can You be More Social? Injecting Politeness and Positivity into Task-Oriented Conversational Agents

Figure 2 for Can You be More Social? Injecting Politeness and Positivity into Task-Oriented Conversational Agents

Figure 3 for Can You be More Social? Injecting Politeness and Positivity into Task-Oriented Conversational Agents

Figure 4 for Can You be More Social? Injecting Politeness and Positivity into Task-Oriented Conversational Agents

Abstract:Goal-oriented conversational agents are becoming prevalent in our daily lives. For these systems to engage users and achieve their goals, they need to exhibit appropriate social behavior as well as provide informative replies that guide users through tasks. The first component of the research in this paper applies statistical modeling techniques to understand conversations between users and human agents for customer service. Analyses show that social language used by human agents is associated with greater users' responsiveness and task completion. The second component of the research is the construction of a conversational agent model capable of injecting social language into an agent's responses while still preserving content. The model uses a sequence-to-sequence deep learning architecture, extended with a social language understanding element. Evaluation in terms of content preservation and social language level using both human judgment and automatic linguistic measures shows that the model can generate responses that enable agents to address users' issues in a more socially appropriate way.

Via

Access Paper or Ask Questions

Joint Contextual Modeling for ASR Correction and Language Understanding

Jan 28, 2020

Yue Weng, Sai Sumanth Miryala, Chandra Khatri, Runze Wang, Huaixiu Zheng, Piero Molino, Mahdi Namazifar, Alexandros Papangelis, Hugh Williams, Franziska Bell(+1 more)

Figure 1 for Joint Contextual Modeling for ASR Correction and Language Understanding

Figure 2 for Joint Contextual Modeling for ASR Correction and Language Understanding

Figure 3 for Joint Contextual Modeling for ASR Correction and Language Understanding

Abstract:The quality of automatic speech recognition (ASR) is critical to Dialogue Systems as ASR errors propagate to and directly impact downstream tasks such as language understanding (LU). In this paper, we propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with LU to improve the performance of both tasks simultaneously. To measure the effectiveness of this approach we used a public benchmark, the 2nd Dialogue State Tracking (DSTC2) corpus. As a baseline approach, we trained task-specific Statistical Language Models (SLM) and fine-tuned state-of-the-art Generalized Pre-training (GPT) Language Model to re-rank the n-best ASR hypotheses, followed by a model to identify the dialog act and slots. i) We further trained ranker models using GPT and Hierarchical CNN-RNN models with discriminatory losses to detect the best output given n-best hypotheses. We extended these ranker models to first select the best ASR output and then identify the dialogue act and slots in an end to end fashion. ii) We also proposed a novel joint ASR error correction and LU model, a word confusion pointer network (WCN-Ptr) with multi-head self-attention on top, which consumes the word confusions populated from the n-best. We show that the error rates of off the shelf ASR and following LU systems can be reduced significantly by 14% relative with joint models trained using small amounts of in-domain data.

* Accepted at IEEE ICASSP 2020

Via

Access Paper or Ask Questions