
Sebastian Jäger

Automated Extraction of Fine-Grained Standardized Product Information from Unstructured Multilingual Web Data

Feb 23, 2023

Alexander Flick, Sebastian Jäger, Ivana Trajanovska, Felix Biessmann

Abstract: Extracting structured information from unstructured data is one of the key challenges in modern information retrieval applications, including e-commerce. Here, we demonstrate how recent advances in machine learning, combined with a recently published multilingual data set with standardized fine-grained product category information, enable robust product attribute extraction in challenging transfer learning settings. Our models can reliably predict product attributes across online shops, languages, or both. Furthermore, we show that our models can be used to match product taxonomies between online retailers.

* ECIR 2023 Demo Track

GreenDB -- A Dataset and Benchmark for Extraction of Sustainability Information of Consumer Goods

Jul 29, 2022

Alexander Flick, Sebastian Jäger, Jessica Adriana Sanchez Garcia, Kaspar von den Driesch, Karl Brendel, Felix Biessmann

Abstract: The production, shipping, usage, and disposal of consumer goods have a substantial impact on greenhouse gas emissions and the depletion of resources. Machine Learning (ML) can help to foster sustainable consumption patterns by accounting for sustainability aspects in product search or recommendations on modern retail platforms. However, the lack of large, high-quality, publicly available product data with trustworthy sustainability information impedes the development of ML technology that can help to reach our sustainability goals. Here we present GreenDB, a database that collects products from European online shops on a weekly basis. As a proxy for the products' sustainability, it relies on sustainability labels, which are evaluated by experts. The GreenDB schema extends the well-known schema.org Product definition and can be readily integrated into existing product catalogs. We present initial results demonstrating that ML models trained with our data can reliably predict the sustainability label of products (F1 score of 96%). These contributions can help to complement existing e-commerce experiences and ultimately encourage users to adopt more sustainable consumption patterns.

* Presented at the DataPerf Workshop at the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, 2022

GreenDB: Toward a Product-by-Product Sustainability Database

May 05, 2022

Sebastian Jäger, Jessica Greene, Max Jakob, Ruben Korenke, Tilman Santarius, Felix Biessmann

Abstract: The production, shipping, usage, and disposal of consumer goods have a substantial impact on greenhouse gas emissions and the depletion of resources. Modern retail platforms rely heavily on Machine Learning (ML) for their search and recommender systems. Thus, ML can potentially support efforts towards more sustainable consumption patterns, for example, by accounting for sustainability aspects in product search or recommendations. However, leveraging the potential of ML to reach sustainability goals requires data on sustainability. Unfortunately, no open and publicly available database integrates sustainability information on a product-by-product basis. In this work, we present GreenDB, which fills this gap. Based on the search logs of millions of users, we prioritize the products that users care about most. The GreenDB schema extends the well-known schema.org Product definition and can be readily integrated into existing product catalogs to improve the sustainability information available for search and recommendation experiences. We present our proof-of-concept implementation of a scraping system that creates the GreenDB dataset.
