Hsin Wen Liu

From VQA to Multimodal CQA: Adapting Visual QA Models for Community QA Tasks

Aug 29, 2018

Avikalp Srivastava, Hsin Wen Liu, Sumio Fujita

Abstract: In this work, we present novel methods to adapt visual QA (VQA) models for two community QA (CQA) tasks of practical significance, automated question category classification and finding experts for question answering, on questions containing both text and an image. To the best of our knowledge, this is the first work to tackle the multimodality challenge in CQA, and it is an enabling step towards basic question answering on image-based CQA. First, we analyze the differences between VQA and CQA datasets and discuss the limitations of applying VQA models directly to CQA tasks. We then propose novel augmentations to VQA-based models to address those limitations. Our model, augmented with an image-text combination method tailored for CQA and with auxiliary tasks for learning better grounding features, significantly outperforms the text-only and VQA model baselines on both tasks on real-world CQA data from Yahoo! Chiebukuro, a Japanese counterpart of Yahoo! Answers.

* Submitted for review at AAAI 2019
