Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ben Chen

OneSug: The Unified End-to-End Generative Framework for E-commerce Query Suggestion

Jun 07, 2025

Xian Guo, Ben Chen, Siyuan Wang, Ying Yang, Chenyi Lei, Yuqing Ding, Han Li

Abstract:Query suggestion plays a crucial role in enhancing user experience in e-commerce search systems by providing relevant query recommendations that align with users' initial input. This module helps users navigate towards personalized preference needs and reduces typing effort, thereby improving search experience. Traditional query suggestion modules usually adopt multi-stage cascading architectures, for making a well trade-off between system response time and business conversion. But they often suffer from inefficiencies and suboptimal performance due to inconsistent optimization objectives across stages. To address these, we propose OneSug, the first end-to-end generative framework for e-commerce query suggestion. OneSug incorporates a prefix2query representation enhancement module to enrich prefixes using semantically and interactively related queries to bridge content and business characteristics, an encoder-decoder generative model that unifies the query suggestion process, and a reward-weighted ranking strategy with behavior-level weights to capture fine-grained user preferences. Extensive evaluations on large-scale industry datasets demonstrate OneSug's ability for effective and efficient query suggestion. Furthermore, OneSug has been successfully deployed for the entire traffic on the e-commerce search engine in Kuaishou platform for over 1 month, with statistically significant improvements in user top click position (-9.33%), CTR (+2.01%), Order (+2.04%), and Revenue (+1.69%) over the online multi-stage strategy, showing great potential in e-commercial conversion.

* 11 pages, 8 figures, and 6 tables

Via

Access Paper or Ask Questions

Use Cases for Terahertz Communications: An Industrial Perspective

Jan 07, 2025

Tommaso Zugno, Cristina Ciochina, Sharad Sambhwani, Patrick Svedman, Luis M. Pessoa, Ben Chen, Per Hjalmar Lehne, Mate Boban, Thomas Kürner

Figure 1 for Use Cases for Terahertz Communications: An Industrial Perspective

Figure 2 for Use Cases for Terahertz Communications: An Industrial Perspective

Figure 3 for Use Cases for Terahertz Communications: An Industrial Perspective

Figure 4 for Use Cases for Terahertz Communications: An Industrial Perspective

Abstract:Thanks to the vast amount of available resources and unique propagation properties, terahertz (THz) frequency bands are viewed as a key enabler for achieving ultrahigh communication performance and precise sensing capabilities in future wireless systems. Recently, the European Telecommunications Standards Institute (ETSI) initiated an Industry Specification Group (ISG) on THz which aims at establishing the technical foundation for subsequent standardization of this technology, which is pivotal for its successful integration into future networks. Starting from the work recently finalized within this group, this paper provides an industrial perspective on potential use cases and frequency bands of interest for THz communication systems. We first identify promising frequency bands in the 100 GHz - 1 THz range, offering over 500 GHz of available spectrum that can be exploited to unlock the full potential of THz communications. Then, we present key use cases and application areas for THz communications, emphasizing the role of this technology and its advantages over other frequency bands. We discuss their target requirements and show that some applications demand for multi-Tbps data rates, latency below 0.5 ms, and sensing accuracy down to 0.5 cm. Additionally, we identify the main deployment scenarios and outline other enabling technologies crucial for overcoming the challenges faced by THz system. Finally, we summarize the past and ongoing standardization efforts focusing on THz communications, while also providing an outlook towards the inclusion of this technology as an integral part of the future sixth generation (6G) and beyond communication networks.

* This work has been accepted for publication in the IEEE Wireless Communications. Copyright with IEEE. For more details, see the IEEE Copyright Policy

Via

Access Paper or Ask Questions

MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning

Dec 10, 2024

Yufei Ma, Zihan Liang, Huangyu Dai, Ben Chen, Dehong Gao, Zhuoran Ran, Wang Zihan, Linbo Jin, Wen Jiang, Guannan Zhang(+2 more)

Figure 1 for MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning

Figure 2 for MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning

Figure 3 for MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning

Figure 4 for MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning

Abstract:The growing demand for larger-scale models in the development of \textbf{L}arge \textbf{L}anguage \textbf{M}odels (LLMs) poses challenges for efficient training within limited computational resources. Traditional fine-tuning methods often exhibit instability in multi-task learning and rely heavily on extensive training resources. Here, we propose MoDULA (\textbf{M}ixture \textbf{o}f \textbf{D}omain-Specific and \textbf{U}niversal \textbf{L}oR\textbf{A}), a novel \textbf{P}arameter \textbf{E}fficient \textbf{F}ine-\textbf{T}uning (PEFT) \textbf{M}ixture-\textbf{o}f-\textbf{E}xpert (MoE) paradigm for improved fine-tuning and parameter efficiency in multi-task learning. The paradigm effectively improves the multi-task capability of the model by training universal experts, domain-specific experts, and routers separately. MoDULA-Res is a new method within the MoDULA paradigm, which maintains the model's general capability by connecting universal and task-specific experts through residual connections. The experimental results demonstrate that the overall performance of the MoDULA-Flan and MoDULA-Res methods surpasses that of existing fine-tuning methods on various LLMs. Notably, MoDULA-Res achieves more significant performance improvements in multiple tasks while reducing training costs by over 80\% without losing general capability. Moreover, MoDULA displays flexible pluggability, allowing for the efficient addition of new tasks without retraining existing experts from scratch. This progressive training paradigm circumvents data balancing issues, enhancing training efficiency and model stability. Overall, MoDULA provides a scalable, cost-effective solution for fine-tuning LLMs with enhanced parameter efficiency and generalization capability.

Via

Access Paper or Ask Questions

Contrastive Token Learning with Similarity Decay for Repetition Suppression in Machine Translation

Sep 30, 2024

Huangyu Dai, Ben Chen, Kaidi Chen, Ying Han, Zihan Liang, Wen Jiang

Figure 1 for Contrastive Token Learning with Similarity Decay for Repetition Suppression in Machine Translation

Figure 2 for Contrastive Token Learning with Similarity Decay for Repetition Suppression in Machine Translation

Figure 3 for Contrastive Token Learning with Similarity Decay for Repetition Suppression in Machine Translation

Figure 4 for Contrastive Token Learning with Similarity Decay for Repetition Suppression in Machine Translation

Abstract:For crosslingual conversation and trade, Neural Machine Translation (NMT) is pivotal yet faces persistent challenges with monotony and repetition in generated content. Traditional solutions that rely on penalizing text redundancy or token reoccurrence have shown limited efficacy, particularly for lengthy article and e-commerce descriptions with inherent redundancy, even with the advent of Large Language Models (LLMs). This paper investigates the underlying causes of textual repetition through the lens of information entropy, attributing the phenomenon to the elevated uncertainty within the input text. To address this, a novel algorithm named Contrastive Token Learning with Similarity Decay (CTSD) is introduced, which modulates the suppression of tokens dynamically, informed by varying attention weights and inter-token distances. Furthermore, an e-commerce dataset comprised of title texts of online real items is compiled and released susceptible to hallucination translations to benchmark the algorithm. Extensive evaluations demonstrate that CTSD significantly outperforms existing approaches in precision and generalizability. Additional online A/B testing underscores its practical value, showing marked improvements in user engagement and conversion. Notably, this method has been implemented with full traffic on eight multilingual sites of alibaba.com, the largest B2B e-commerce platform in the world.

* Accepted by EMNLP'24 Findings. 12 pages, 4 figures, 9 tables

Via

Access Paper or Ask Questions

UV-Mamba: A DCN-Enhanced State Space Model for Urban Village Boundary Identification in High-Resolution Remote Sensing Images

Sep 05, 2024

Lulin Li, Ben Chen, Xuechao Zou, Junliang Xing, Pin Tao

Figure 1 for UV-Mamba: A DCN-Enhanced State Space Model for Urban Village Boundary Identification in High-Resolution Remote Sensing Images

Figure 2 for UV-Mamba: A DCN-Enhanced State Space Model for Urban Village Boundary Identification in High-Resolution Remote Sensing Images

Figure 3 for UV-Mamba: A DCN-Enhanced State Space Model for Urban Village Boundary Identification in High-Resolution Remote Sensing Images

Figure 4 for UV-Mamba: A DCN-Enhanced State Space Model for Urban Village Boundary Identification in High-Resolution Remote Sensing Images

Abstract:Owing to the diverse geographical environments, intricate landscapes, and high-density settlements, the automatic identification of urban village boundaries using remote sensing images is a highly challenging task. This paper proposes a novel and efficient neural network model called UV-Mamba for accurate boundary detection in high-resolution remote sensing images. UV-Mamba mitigates the memory loss problem in long sequence modeling, which arises in state space model (SSM) with increasing image size, by incorporating deformable convolutions (DCN). Its architecture utilizes an encoder-decoder framework, includes an encoder with four deformable state space augmentation (DSSA) blocks for efficient multi-level semantic extraction and a decoder to integrate the extracted semantic information. We conducted experiments on the Beijing and Xi'an datasets, and the results show that UV-Mamba achieves state-of-the-art performance. Specifically, our model achieves 73.3% and 78.1% IoU on the Beijing and Xi'an datasets, respectively, representing improvements of 1.2% and 3.4% IoU over the previous best model, while also being 6x faster in inference speed and 40x smaller in parameter count. Source code and pre-trained models are available in the supplementary material.

* 5 pages, 4 figures, 2 tables

Via

Access Paper or Ask Questions

Robust Interaction-based Relevance Modeling for Online E-Commerce and LLM-based Retrieval

Jun 04, 2024

Ben Chen, Huangyu Dai, Xiang Ma, Wen Jiang, Wei Ning

Abstract:Semantic relevance calculation is crucial for e-commerce search engines, as it ensures that the items selected closely align with customer intent. Inadequate attention to this aspect can detrimentally affect user experience and engagement. Traditional text-matching techniques are prevalent but often fail to capture the nuances of search intent accurately, so neural networks now have become a preferred solution to processing such complex text matching. Existing methods predominantly employ representation-based architectures, which strike a balance between high traffic capacity and low latency. However, they exhibit significant shortcomings in generalization and robustness when compared to interaction-based architectures. In this work, we introduce a robust interaction-based modeling paradigm to address these shortcomings. It encompasses 1) a dynamic length representation scheme for expedited inference, 2) a professional terms recognition method to identify subjects and core attributes from complex sentence structures, and 3) a contrastive adversarial training protocol to bolster the model's robustness and matching capabilities. Extensive offline evaluations demonstrate the superior robustness and effectiveness of our approach, and online A/B testing confirms its ability to improve relevance in the same exposure position, resulting in more clicks and conversions. To the best of our knowledge, this method is the first interaction-based approach for large e-commerce search relevance calculation. Notably, we have deployed it for the entire search traffic on alibaba.com, the largest B2B e-commerce platform in the world.

* Accepted by ECML-PKDD'24 as Outstanding Paper. 8 pages, 2 figures, 7 tables

Via

Access Paper or Ask Questions

A Parallel Attention Network for Cattle Face Recognition

Mar 29, 2024

Jiayu Li, Xuechao Zou, Shiying Wang, Ben Chen, Junliang Xing, Pin Tao

Figure 1 for A Parallel Attention Network for Cattle Face Recognition

Figure 2 for A Parallel Attention Network for Cattle Face Recognition

Figure 3 for A Parallel Attention Network for Cattle Face Recognition

Figure 4 for A Parallel Attention Network for Cattle Face Recognition

Abstract:Cattle face recognition holds paramount significance in domains such as animal husbandry and behavioral research. Despite significant progress in confined environments, applying these accomplishments in wild settings remains challenging. Thus, we create the first large-scale cattle face recognition dataset, ICRWE, for wild environments. It encompasses 483 cattle and 9,816 high-resolution image samples. Each sample undergoes annotation for face features, light conditions, and face orientation. Furthermore, we introduce a novel parallel attention network, PANet. Comprising several cascaded Transformer modules, each module incorporates two parallel Position Attention Modules (PAM) and Feature Mapping Modules (FMM). PAM focuses on local and global features at each image position through parallel channel attention, and FMM captures intricate feature patterns through non-linear mappings. Experimental results indicate that PANet achieves a recognition accuracy of 88.03% on the ICRWE dataset, establishing itself as the current state-of-the-art approach. The source code is available in the supplementary materials.

* Accepted by ICME 2024

Via

Access Paper or Ask Questions

General2Specialized LLMs Translation for E-commerce

Mar 06, 2024

Kaidi Chen, Ben Chen, Dehong Gao, Huangyu Dai, Wen Jiang, Wei Ning, Shanqing Yu, Libin Yang, Xiaoyan Cai

Abstract:Existing Neural Machine Translation (NMT) models mainly handle translation in the general domain, while overlooking domains with special writing formulas, such as e-commerce and legal documents. Taking e-commerce as an example, the texts usually include amounts of domain-related words and have more grammar problems, which leads to inferior performances of current NMT methods. To address these problems, we collect two domain-related resources, including a set of term pairs (aligned Chinese-English bilingual terms) and a parallel corpus annotated for the e-commerce domain. Furthermore, we propose a two-step fine-tuning paradigm (named G2ST) with self-contrastive semantic enhancement to transfer one general NMT model to the specialized NMT model for e-commerce. The paradigm can be used for the NMT models based on Large language models (LLMs). Extensive evaluations on real e-commerce titles demonstrate the superior translation quality and robustness of our G2ST approach, as compared with state-of-the-art NMT models such as LLaMA, Qwen, GPT-3.5, and even GPT-4.

* 4 pages, 1 figure, WWW2024 accepted

Via

Access Paper or Ask Questions

Accurate Leukocyte Detection Based on Deformable-DETR and Multi-Level Feature Fusion for Aiding Diagnosis of Blood Diseases

Jan 10, 2024

Yifei Chen, Chenyan Zhang, Ben Chen, Yiyu Huang, Yifei Sun, Changmiao Wang, Xianjun Fu, Yuxing Dai, Feiwei Qin, Yong Peng(+1 more)

Figure 1 for Accurate Leukocyte Detection Based on Deformable-DETR and Multi-Level Feature Fusion for Aiding Diagnosis of Blood Diseases

Figure 2 for Accurate Leukocyte Detection Based on Deformable-DETR and Multi-Level Feature Fusion for Aiding Diagnosis of Blood Diseases

Figure 3 for Accurate Leukocyte Detection Based on Deformable-DETR and Multi-Level Feature Fusion for Aiding Diagnosis of Blood Diseases

Figure 4 for Accurate Leukocyte Detection Based on Deformable-DETR and Multi-Level Feature Fusion for Aiding Diagnosis of Blood Diseases

Abstract:In standard hospital blood tests, the traditional process requires doctors to manually isolate leukocytes from microscopic images of patients' blood using microscopes. These isolated leukocytes are then categorized via automatic leukocyte classifiers to determine the proportion and volume of different types of leukocytes present in the blood samples, aiding disease diagnosis. This methodology is not only time-consuming and labor-intensive, but it also has a high propensity for errors due to factors such as image quality and environmental conditions, which could potentially lead to incorrect subsequent classifications and misdiagnosis. To address these issues, this paper proposes an innovative method of leukocyte detection: the Multi-level Feature Fusion and Deformable Self-attention DETR (MFDS-DETR). To tackle the issue of leukocyte scale disparity, we designed the High-level Screening-feature Fusion Pyramid (HS-FPN), enabling multi-level fusion. This model uses high-level features as weights to filter low-level feature information via a channel attention module and then merges the screened information with the high-level features, thus enhancing the model's feature expression capability. Further, we address the issue of leukocyte feature scarcity by incorporating a multi-scale deformable self-attention module in the encoder and using the self-attention and cross-deformable attention mechanisms in the decoder, which aids in the extraction of the global features of the leukocyte feature maps. The effectiveness, superiority, and generalizability of the proposed MFDS-DETR method are confirmed through comparisons with other cutting-edge leukocyte detection models using the private WBCDD, public LISC and BCCD datasets. Our source code and private WBCCD dataset are available at https://github.com/JustlfC03/MFDS-DETR.

* Computers in Biology and Medicine 2024
* 15 pages, 11 figures, accept Computers in Biology and Medicine 2024

Via

Access Paper or Ask Questions

High-Fidelity Lake Extraction via Two-Stage Prompt Enhancement: Establishing a Novel Baseline and Benchmark

Aug 16, 2023

Ben Chen, Xuechao Zou, Kai Li, Yu Zhang, Junliang Xing, Pin Tao

Figure 1 for High-Fidelity Lake Extraction via Two-Stage Prompt Enhancement: Establishing a Novel Baseline and Benchmark

Figure 2 for High-Fidelity Lake Extraction via Two-Stage Prompt Enhancement: Establishing a Novel Baseline and Benchmark

Figure 3 for High-Fidelity Lake Extraction via Two-Stage Prompt Enhancement: Establishing a Novel Baseline and Benchmark

Figure 4 for High-Fidelity Lake Extraction via Two-Stage Prompt Enhancement: Establishing a Novel Baseline and Benchmark

Abstract:The extraction of lakes from remote sensing images is a complex challenge due to the varied lake shapes and data noise. Current methods rely on multispectral image datasets, making it challenging to learn lake features accurately from pixel arrangements. This, in turn, affects model learning and the creation of accurate segmentation masks. This paper introduces a unified prompt-based dataset construction approach that provides approximate lake locations using point, box, and mask prompts. We also propose a two-stage prompt enhancement framework, LEPrompter, which involves prompt-based and prompt-free stages during training. The prompt-based stage employs a prompt encoder to extract prior information, integrating prompt tokens and image embeddings through self- and cross-attention in the prompt decoder. Prompts are deactivated once the model is trained to ensure independence during inference, enabling automated lake extraction. Evaluations on Surface Water and Qinghai-Tibet Plateau Lake datasets show consistent performance improvements compared to the previous state-of-the-art method. LEPrompter achieves mIoU scores of 91.48% and 97.43% on the respective datasets without introducing additional parameters or GFLOPs. Supplementary materials provide the source code, pre-trained models, and detailed user studies.

* 8 pages, 7 figures

Via

Access Paper or Ask Questions