Alexander Flick

Automated Extraction of Fine-Grained Standardized Product Information from Unstructured Multilingual Web Data

Feb 23, 2023

Alexander Flick, Sebastian Jäger, Ivana Trajanovska, Felix Biessmann

Abstract: Extracting structured information from unstructured data is one of the key challenges in modern information retrieval applications, including e-commerce. Here, we demonstrate how recent advances in machine learning, combined with a recently published multilingual data set with standardized fine-grained product category information, enable robust product attribute extraction in challenging transfer learning settings. Our models can reliably predict product attributes across online shops, languages, or both. Furthermore, we show that our models can be used to match product taxonomies between online retailers.

* ECIR 2023 Demo Track

GreenDB -- A Dataset and Benchmark for Extraction of Sustainability Information of Consumer Goods

Jul 29, 2022

Alexander Flick, Sebastian Jäger, Jessica Adriana Sanchez Garcia, Kaspar von den Driesch, Karl Brendel, Felix Biessmann

Abstract: The production, shipping, usage, and disposal of consumer goods have a substantial impact on greenhouse gas emissions and the depletion of resources. Machine Learning (ML) can help to foster sustainable consumption patterns by accounting for sustainability aspects in product search or recommendations of modern retail platforms. However, the lack of large, high-quality, publicly available product data with trustworthy sustainability information impedes the development of ML technology that can help to reach our sustainability goals. Here we present GreenDB, a database that collects products from European online shops on a weekly basis. As a proxy for the products' sustainability, it relies on sustainability labels, which are evaluated by experts. The GreenDB schema extends the well-known schema.org Product definition and can be readily integrated into existing product catalogs. We present initial results demonstrating that ML models trained with our data can reliably (F1 score 96%) predict the sustainability label of products. These contributions can help to complement existing e-commerce experiences and ultimately encourage users to adopt more sustainable consumption patterns.

* Presented at DataPerf Workshop at the 39th International Conference on Machine Learning, Baltimore, Maryland, USA, 2022
