Soichiro Murakami

AdParaphrase: Paraphrase Dataset for Analyzing Linguistic Features toward Generating Attractive Ad Texts

Feb 07, 2025

Soichiro Murakami, Peinan Zhang, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura

Abstract: Effective linguistic choices that attract potential customers play crucial roles in advertising success. This study aims to explore the linguistic features of ad texts that influence human preferences. Although the creation of attractive ad texts is an active area of research, progress in understanding the specific linguistic features that affect attractiveness is hindered by several obstacles. First, human preferences are complex and influenced by multiple factors, including the content of ad texts, such as brand names, and their linguistic styles, making analysis challenging. Second, publicly available ad text datasets that include human preferences, such as ad performance metrics and human feedback reflecting people's interests, are lacking. To address these problems, we present AdParaphrase, a paraphrase dataset that contains human preferences for pairs of ad texts that are semantically equivalent but differ in terms of wording and style. This dataset allows for preference analysis that focuses on the differences in linguistic features. Our analysis revealed that ad texts preferred by human judges have higher fluency, longer length, more nouns, and use of bracket symbols. Furthermore, we demonstrate that an ad text-generation model that considers these findings significantly improves the attractiveness of a given text. The dataset is publicly available at: https://github.com/CyberAgentAILab/AdParaphrase.

* Accepted to NAACL2025 Findings

FaithCAMERA: Construction of a Faithful Dataset for Ad Text Generation

Oct 04, 2024

Akihiko Kato, Masato Mita, Soichiro Murakami, Ukyo Honda, Sho Hoshino, Peinan Zhang

Abstract: In ad text generation (ATG), desirable ad text is both faithful and informative. That is, it should be faithful to the input document, while at the same time containing important information that appeals to potential customers. The existing evaluation data, CAMERA (arXiv:2309.12030), is suitable for evaluating informativeness, as it consists of reference ad texts created by ad creators. However, these references often include information unfaithful to the input, which is a notable obstacle in promoting ATG research. In this study, we collaborate with in-house ad creators to refine the CAMERA references and develop an alternative ATG evaluation dataset called FaithCAMERA, in which the faithfulness of references is guaranteed. Using FaithCAMERA, we can evaluate how well existing methods for improving faithfulness can generate informative ad text while maintaining faithfulness. Our experiments show that removing training data that contains unfaithful entities improves the faithfulness and informativeness at the entity level, but decreases both at the sentence level. This result suggests that for future ATG research, it is essential not only to scale the training data but also to ensure their faithfulness. Our dataset will be publicly available.

* For dataset, see https://github.com/CyberAgentAILab/FaithCAMERA

Cross-lingual Transfer or Machine Translation? On Data Augmentation for Monolingual Semantic Textual Similarity

Mar 08, 2024

Sho Hoshino, Akihiko Kato, Soichiro Murakami, Peinan Zhang

Abstract: Learning better sentence embeddings leads to improved performance for natural language understanding tasks including semantic textual similarity (STS) and natural language inference (NLI). As prior studies leverage large-scale labeled NLI datasets for fine-tuning masked language models to yield sentence embeddings, task performance for languages other than English is often left behind. In this study, we directly compared two data augmentation techniques as potential solutions for monolingual STS: (a) cross-lingual transfer that exploits English resources alone as training data to yield non-English sentence embeddings as zero-shot inference, and (b) machine translation that converts English data into pseudo non-English training data in advance. In our experiments on monolingual STS in Japanese and Korean, we find that the two data augmentation techniques perform on par. Rather, we find a superiority of the Wikipedia domain over the NLI domain for these languages, in contrast to prior studies that focused on NLI as training data. Combining our findings, we demonstrate that the cross-lingual transfer of Wikipedia data exhibits improved performance, and that native Wikipedia data can further improve performance for monolingual STS.

* LREC-COLING 2024

CAMERA: A Multimodal Dataset and Benchmark for Ad Text Generation

Sep 21, 2023

Masato Mita, Soichiro Murakami, Akihiko Kato, Peinan Zhang

Abstract: In response to the limitations of manual online ad production, significant research has been conducted in the field of automatic ad text generation (ATG). However, comparing different methods has been challenging because of the lack of benchmarks encompassing the entire field and the absence of well-defined problem sets with clear model inputs and outputs. To address these challenges, this paper aims to advance the field of ATG by introducing a redesigned task and constructing a benchmark. Specifically, we defined ATG as a cross-application task encompassing various aspects of Internet advertising. As part of our contribution, we propose the first benchmark dataset, CA Multimodal Evaluation for Ad Text GeneRAtion (CAMERA), carefully designed for ATG to be able to leverage multi-modal information and conduct an industry-wise evaluation. Furthermore, we demonstrate the usefulness of our proposed benchmark through evaluation experiments using multiple baseline models, which vary in terms of the type of pre-trained language model used and the incorporation of multi-modal information. We also discuss the current state of the task and the future challenges.

* 13 pages

Natural Language Generation for Advertising: A Survey

Jun 22, 2023

Soichiro Murakami, Sho Hoshino, Peinan Zhang

Abstract: Natural language generation methods have emerged as effective tools to help advertisers increase the number of online advertisements they produce. This survey reviews the research trends on this topic over the past decade, from template-based to extractive and abstractive approaches using neural networks. Additionally, key challenges and directions revealed through the survey, including metric optimization, faithfulness, diversity, multimodality, and the development of benchmark datasets, are discussed.

Aspect-based Analysis of Advertising Appeals for Search Engine Advertising

Apr 25, 2022

Soichiro Murakami, Peinan Zhang, Sho Hoshino, Hidetaka Kamigaito, Hiroya Takamura, Manabu Okumura

Abstract: Writing an ad text that attracts people and persuades them to click or act is essential for the success of search engine advertising. Therefore, ad creators must consider various aspects of advertising appeals (A$^3$) such as the price, product features, and quality. However, products and services exhibit unique effective A$^3$ for different industries. In this work, we focus on exploring the effective A$^3$ for different industries with the aim of assisting the ad creation process. To this end, we created a dataset of advertising appeals and used an existing model that detects various aspects of ad texts. Our experiments demonstrated that different industries have their own effective A$^3$ and that the identification of the A$^3$ contributes to the estimation of advertising performance.

* Accepted by NAACL-HLT2022 Industry track

NTT's Machine Translation Systems for WMT19 Robustness Task

Jul 09, 2019

Soichiro Murakami, Makoto Morishita, Tsutomu Hirao, Masaaki Nagata

Abstract: This paper describes NTT's submission to the WMT19 robustness task. This task mainly focuses on translating noisy text (e.g., posts on Twitter), which presents different difficulties from typical translation tasks such as news. Our submission combined techniques including utilization of a synthetic corpus, domain adaptation, and a placeholder mechanism, which significantly improved over the previous baseline. Experimental results revealed that the placeholder mechanism, which temporarily replaces non-standard tokens such as emojis and emoticons with special placeholder tokens during translation, improves translation accuracy even with noisy texts.

* submitted to WMT 2019
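
The placeholder mechanism mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the emoji pattern, the `<PHn>` token format, and the names `mask_tokens`, `unmask_tokens`, and `translate_with_placeholders` are all assumptions made for illustration, and emoticons are not handled. It only shows the general idea of masking non-standard tokens before translation and restoring them afterwards.

```python
import re
from typing import Callable, List, Tuple

# Assumption: a rough emoji pattern covering a few common Unicode blocks.
# A real system would need a more complete inventory (and emoticons).
EMOJI_PATTERN = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def mask_tokens(text: str) -> Tuple[str, List[str]]:
    """Replace each emoji with an indexed placeholder and remember the originals."""
    originals: List[str] = []

    def _swap(match):
        originals.append(match.group(0))
        return f"<PH{len(originals) - 1}>"

    return EMOJI_PATTERN.sub(_swap, text), originals


def unmask_tokens(text: str, originals: List[str]) -> str:
    """Put the original tokens back in place of their placeholders."""
    for index, token in enumerate(originals):
        text = text.replace(f"<PH{index}>", token)
    return text


def translate_with_placeholders(text: str, translate: Callable[[str], str]) -> str:
    """Mask, translate, then restore. `translate` is any text-to-text function
    that is assumed to copy placeholder tokens through unchanged."""
    masked, originals = mask_tokens(text)
    return unmask_tokens(translate(masked), originals)


if __name__ == "__main__":
    # str.upper stands in for a translation model here (hypothetical stand-in).
    print(translate_with_placeholders("good \U0001F600 day", str.upper))
```

The sketch relies on the translation step copying placeholders verbatim; if a model alters or drops them, the restore step silently leaves the output unchanged at that position.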
