Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Haiyang Zhang

EcomScriptBench: A Multi-task Benchmark for E-commerce Script Planning via Step-wise Intention-Driven Product Association

May 21, 2025

Weiqi Wang, Limeng Cui, Xin Liu, Sreyashi Nag, Wenju Xu, Chen Luo, Sheikh Muhammad Sarwar, Yang Li, Hansu Gu, Hui Liu(+7 more)

Abstract:Goal-oriented script planning, or the ability to devise coherent sequences of actions toward specific goals, is commonly employed by humans to plan for typical activities. In e-commerce, customers increasingly seek LLM-based assistants to generate scripts and recommend products at each step, thereby facilitating convenient and efficient shopping experiences. However, this capability remains underexplored due to several challenges, including the inability of LLMs to simultaneously conduct script planning and product retrieval, difficulties in matching products caused by semantic discrepancies between planned actions and search queries, and a lack of methods and benchmark data for evaluation. In this paper, we step forward by formally defining the task of E-commerce Script Planning (EcomScript) as three sequential subtasks. We propose a novel framework that enables the scalable generation of product-enriched scripts by associating products with each step based on the semantic similarity between the actions and their purchase intentions. By applying our framework to real-world e-commerce data, we construct the very first large-scale EcomScript dataset, EcomScriptBench, which includes 605,229 scripts sourced from 2.4 million products. Human annotations are then conducted to provide gold labels for a sampled subset, forming an evaluation benchmark. Extensive experiments reveal that current (L)LMs face significant challenges with EcomScript tasks, even after fine-tuning, while injecting product purchase intentions improves their performance.

* ACL2025

Via

Access Paper or Ask Questions

Language-based Audio Retrieval with Co-Attention Networks

Dec 30, 2024

Haoran Sun, Zimu Wang, Qiuyi Chen, Jianjun Chen, Jia Wang, Haiyang Zhang

Abstract:In recent years, user-generated audio content has proliferated across various media platforms, creating a growing need for efficient retrieval methods that allow users to search for audio clips using natural language queries. This task, known as language-based audio retrieval, presents significant challenges due to the complexity of learning semantic representations from heterogeneous data across both text and audio modalities. In this work, we introduce a novel framework for the language-based audio retrieval task that leverages co-attention mechanismto jointly learn meaningful representations from both modalities. To enhance the model's ability to capture fine-grained cross-modal interactions, we propose a cascaded co-attention architecture, where co-attention modules are stacked or iterated to progressively refine the semantic alignment between text and audio. Experiments conducted on two public datasets show that the proposed method can achieve better performance than the state-of-the-art method. Specifically, our best performed co-attention model achieves a 16.6% improvement in mean Average Precision on Clotho dataset, and a 15.1% improvement on AudioCaps.

* Accepted at UIC 2024 proceedings. Accepted version

Via

Access Paper or Ask Questions

Domain-specific Guided Summarization for Mental Health Posts

Nov 03, 2024

Lu Qian, Yuqi Wang, Zimu Wang, Haiyang Zhang, Wei Wang, Ting Yu, Anh Nguyen

Figure 1 for Domain-specific Guided Summarization for Mental Health Posts

Figure 2 for Domain-specific Guided Summarization for Mental Health Posts

Figure 3 for Domain-specific Guided Summarization for Mental Health Posts

Figure 4 for Domain-specific Guided Summarization for Mental Health Posts

Abstract:In domain-specific contexts, particularly mental health, abstractive summarization requires advanced techniques adept at handling specialized content to generate domain-relevant and faithful summaries. In response to this, we introduce a guided summarizer equipped with a dual-encoder and an adapted decoder that utilizes novel domain-specific guidance signals, i.e., mental health terminologies and contextually rich sentences from the source document, to enhance its capacity to align closely with the content and context of guidance, thereby generating a domain-relevant summary. Additionally, we present a post-editing correction model to rectify errors in the generated summary, thus enhancing its consistency with the original content in detail. Evaluation on the MentSum dataset reveals that our model outperforms existing baseline models in terms of both ROUGE and FactCC scores. Although the experiments are specifically designed for mental health posts, the methodology we've developed offers broad applicability, highlighting its versatility and effectiveness in producing high-quality domain-specific summaries.

* Accepted at PACLIC 2024. Camera-ready version

Via

Access Paper or Ask Questions

Text-Guided Attention is All You Need for Zero-Shot Robustness in Vision-Language Models

Oct 29, 2024

Lu Yu, Haiyang Zhang, Changsheng Xu

Abstract:Due to the impressive zero-shot capabilities, pre-trained vision-language models (e.g. CLIP), have attracted widespread attention and adoption across various domains. Nonetheless, CLIP has been observed to be susceptible to adversarial examples. Through experimental analysis, we have observed a phenomenon wherein adversarial perturbations induce shifts in text-guided attention. Building upon this observation, we propose a simple yet effective strategy: __Text-Guided Attention for Zero-Shot Robustness (TGA-ZSR)__. This framework incorporates two components: the Attention Refinement module and the Attention-based Model Constraint module. Our goal is to maintain the generalization of the CLIP model and enhance its adversarial robustness: The Attention Refinement module aligns the text-guided attention obtained from the target model via adversarial examples with the text-guided attention acquired from the original model via clean examples. This alignment enhances the model's robustness. Additionally, the Attention-based Model Constraint module acquires text-guided attention from both the target and original models using clean examples. Its objective is to maintain model performance on clean samples while enhancing overall robustness. The experiments validate that our method yields a 9.58\% enhancement in zero-shot robust accuracy over the current state-of-the-art techniques across 16 datasets. __Our code is available at__ https://github.com/zhyblue424/TGA-ZSR.

* Accepted by NeurIPS 2024

Via

Access Paper or Ask Questions

A Dual Adaptive Assignment Approach for Robust Graph-Based Clustering

Oct 29, 2024

Yang Xiang, Li Fan, Tulika Saha, Yushan Pan, Haiyang Zhang, Chengtao Ji

Figure 1 for A Dual Adaptive Assignment Approach for Robust Graph-Based Clustering

Figure 2 for A Dual Adaptive Assignment Approach for Robust Graph-Based Clustering

Figure 3 for A Dual Adaptive Assignment Approach for Robust Graph-Based Clustering

Figure 4 for A Dual Adaptive Assignment Approach for Robust Graph-Based Clustering

Abstract:Graph clustering is an essential aspect of network analysis that involves grouping nodes into separate clusters. Recent developments in deep learning have resulted in advanced deep graph clustering techniques, which have proven effective in many applications. Nonetheless, these methods often encounter difficulties when dealing with the complexities of real-world graphs, particularly in the presence of noisy edges. Additionally, many denoising graph clustering strategies tend to suffer from lower performance compared to their non-denoised counterparts, training instability, and challenges in scaling to large datasets. To tackle these issues, we introduce a new framework called the Dual Adaptive Assignment Approach for Robust Graph-Based Clustering (RDSA). RDSA consists of three key components: (i) a node embedding module that effectively integrates the graph's topological features and node attributes; (ii) a structure-based soft assignment module that improves graph modularity by utilizing an affinity matrix for node assignments; and (iii) a node-based soft assignment module that identifies community landmarks and refines node assignments to enhance the model's robustness. We assess RDSA on various real-world datasets, demonstrating its superior performance relative to existing state-of-the-art methods. Our findings indicate that RDSA provides robust clustering across different graph types, excelling in clustering effectiveness and robustness, including adaptability to noise, stability, and scalability.

Via

Access Paper or Ask Questions

Identifying Influential nodes in Brain Networks via Self-Supervised Graph-Transformer

Sep 17, 2024

Yanqing Kang, Di Zhu, Haiyang Zhang, Enze Shi, Sigang Yu, Jinru Wu, Xuhui Wang, Xuan Liu, Geng Chen, Xi Jiang(+2 more)

Abstract:Studying influential nodes (I-nodes) in brain networks is of great significance in the field of brain imaging. Most existing studies consider brain connectivity hubs as I-nodes. However, this approach relies heavily on prior knowledge from graph theory, which may overlook the intrinsic characteristics of the brain network, especially when its architecture is not fully understood. In contrast, self-supervised deep learning can learn meaningful representations directly from the data. This approach enables the exploration of I-nodes for brain networks, which is also lacking in current studies. This paper proposes a Self-Supervised Graph Reconstruction framework based on Graph-Transformer (SSGR-GT) to identify I-nodes, which has three main characteristics. First, as a self-supervised model, SSGR-GT extracts the importance of brain nodes to the reconstruction. Second, SSGR-GT uses Graph-Transformer, which is well-suited for extracting features from brain graphs, combining both local and global characteristics. Third, multimodal analysis of I-nodes uses graph-based fusion technology, combining functional and structural brain information. The I-nodes we obtained are distributed in critical areas such as the superior frontal lobe, lateral parietal lobe, and lateral occipital lobe, with a total of 56 identified across different experiments. These I-nodes are involved in more brain networks than other regions, have longer fiber connections, and occupy more central positions in structural connectivity. They also exhibit strong connectivity and high node efficiency in both functional and structural networks. Furthermore, there is a significant overlap between the I-nodes and both the structural and functional rich-club. These findings enhance our understanding of the I-nodes within the brain network, and provide new insights for future research in further understanding the brain working mechanisms.

Via

Access Paper or Ask Questions

Exploring Query Understanding for Amazon Product Search

Aug 05, 2024

Chen Luo, Xianfeng Tang, Hanqing Lu, Yaochen Xie, Hui Liu, Zhenwei Dai, Limeng Cui, Ashutosh Joshi, Sreyashi Nag, Yang Li(+5 more)

Figure 1 for Exploring Query Understanding for Amazon Product Search

Figure 2 for Exploring Query Understanding for Amazon Product Search

Figure 3 for Exploring Query Understanding for Amazon Product Search

Figure 4 for Exploring Query Understanding for Amazon Product Search

Abstract:Online shopping platforms, such as Amazon, offer services to billions of people worldwide. Unlike web search or other search engines, product search engines have their unique characteristics, primarily featuring short queries which are mostly a combination of product attributes and structured product search space. The uniqueness of product search underscores the crucial importance of the query understanding component. However, there are limited studies focusing on exploring this impact within real-world product search engines. In this work, we aim to bridge this gap by conducting a comprehensive study and sharing our year-long journey investigating how the query understanding service impacts Amazon Product Search. Firstly, we explore how query understanding-based ranking features influence the ranking process. Next, we delve into how the query understanding system contributes to understanding the performance of a ranking model. Building on the insights gained from our study on the evaluation of the query understanding-based ranking model, we propose a query understanding-based multi-task learning framework for ranking. We present our studies and investigations using the real-world system on Amazon Search.

Via

Access Paper or Ask Questions

Illumination Design for Joint Imaging and Wireless Power Transfer Systems

Aug 01, 2024

Qianyu Yang, Haiyang Zhang, Chunguo Li, Ruiqi Liu, Baoyun Wang

Figure 1 for Illumination Design for Joint Imaging and Wireless Power Transfer Systems

Figure 2 for Illumination Design for Joint Imaging and Wireless Power Transfer Systems

Figure 3 for Illumination Design for Joint Imaging and Wireless Power Transfer Systems

Figure 4 for Illumination Design for Joint Imaging and Wireless Power Transfer Systems

Abstract:This paper presents a novel concept termed Integrated Imaging and Wireless Power Transfer (IWPT), wherein the integration of imaging and wireless power transfer functionalities is achieved on a unified hardware platform. IWPT leverages a transmitting array to efficiently illuminate a specific Region of Interest (ROI), enabling the extraction of ROI's scattering coefficients while concurrently providing wireless power to nearby users. The integration of IWPT offers compelling advantages, including notable reductions in power consumption and spectrum utilization, pivotal for the optimization of future 6G wireless networks. As an initial investigation, we explore two antenna architectures: a fully digital array and a digital/analog hybrid array. Our goal is to characterize the fundamental trade-off between imaging and wireless power transfer by optimizing the illumination signal. With imaging operating in the near-field, we formulate the illumination signal design as an optimization problem that minimizes the condition number of the equivalent channel. To address this optimization problem, we propose an semi-definite relaxation-based approach for the fully digital array and an alternating optimization algorithm for the hybrid array. Finally, numerical results verify the effectiveness of our proposed solutions and demonstrate the trade-off between imaging and wireless power transfer.

* 10 pages, 5 figures

Via

Access Paper or Ask Questions

Beam Focusing for Near-Field Multi-User Localization

Jul 24, 2024

Qianyu Yang, Anna Guerra, Francesco Guidi, Nir Shlezinger, Haiyang Zhang, Davide Dardari, Baoyun Wang, Yonina C. Eldar

Figure 1 for Beam Focusing for Near-Field Multi-User Localization

Figure 2 for Beam Focusing for Near-Field Multi-User Localization

Figure 3 for Beam Focusing for Near-Field Multi-User Localization

Figure 4 for Beam Focusing for Near-Field Multi-User Localization

Abstract:Extremely large-scale antenna arrays are poised to play a pivotal role in sixth-generation (6G) networks. Utilizing such arrays often results in a near-field spherical wave transmission environment, enabling the generation of focused beams, which introduces new degrees of freedom for wireless localization. In this paper, we consider a beam-focusing design for localizing multiple sources in the radiating near-field. Our formulation accommodates various expected types of implementations of large antenna arrays, including hybrid analog/digital architectures and dynamic metasurface antennas (DMAs). We consider a direct localization estimation method exploiting curvature-of-arrival of impinging spherical wavefront to obtain user positions. In this regard, we adopt a two-stage approach configuring the array to optimize near-field positioning. In the first step, we focus only on adjusting the array coefficients to minimize the estimation error. We obtain a closed-form approximate solution based on projection and the better one based on the Riemann gradient algorithm. We then extend this approach to simultaneously localize and focus the beams via a sub-optimal iterative approach that does not rely on such knowledge. The simulation results show that near-field localization accuracy based on a hybrid array or DMA can achieve performance close to that of fully digital arrays at a lower cost, and DMAs can attain better performance than hybrid solutions with the same aperture.

* 13 pages, 11 figures

Via

Access Paper or Ask Questions

Holographic Intelligence Surface Assisted Integrated Sensing and Communication

Jun 07, 2024

Zhuoyang Liu, Yuchen Zhang, Haiyang Zhang, Feng Xu, Yonina C. Eldar

Figure 1 for Holographic Intelligence Surface Assisted Integrated Sensing and Communication

Figure 2 for Holographic Intelligence Surface Assisted Integrated Sensing and Communication

Figure 3 for Holographic Intelligence Surface Assisted Integrated Sensing and Communication

Figure 4 for Holographic Intelligence Surface Assisted Integrated Sensing and Communication

Abstract:Traditional discrete-array-based systems fail to exploit interactions between closely spaced antennas, resulting in inadequate utilization of the aperture resource. In this paper, we propose a holographic intelligence surface (HIS) assisted integrated sensing and communication (HISAC) system, wherein both the transmitter and receiver are fabricated using a continuous-aperture array. A continuous-discrete transformation of the HIS pattern based on the Fourier transform is proposed, converting the continuous pattern design into a discrete beamforming design. We formulate a joint transmit-receive beamforming optimization problem for the HISAC system, aiming to balance the performance of multi-target sensing while fulfilling the performance requirement of multi-user communication. To solve the non-convex problem with coupled variables, an alternating optimization-based algorithm is proposed to optimize the HISAC transmit-receive beamforming in an alternate manner. Specifically, the transmit beamforming design is solved by decoupling into a series of feasibility-checking sub-problems while the receive beamforming is determined by the Rayleigh quotient-based method. Simulation results demonstrate the superiority of the proposed HISAC system over traditional discrete-array-based ISAC systems, achieving significantly higher sensing performance while guaranteeing predetermined communication performance.

Via

Access Paper or Ask Questions