
Title: MSTR: Multi-Scale Transformer for End-to-End Human-Object Interaction Detection

Mar 28, 2022

Bumsoo Kim, Jonghwan Mun, Kyoung-Woon On, Minchul Shin, Junhyun Lee, Eun-Sol Kim

Abstract: Human-Object Interaction (HOI) detection is the task of identifying a set of <human, object, interaction> triplets from an image. Recent work proposed transformer encoder-decoder architectures that successfully eliminated the need for many hand-designed components in HOI detection through end-to-end training. However, they are limited to single-scale feature resolution, providing suboptimal performance in scenes containing humans, objects and their interactions with vastly different scales and distances. To tackle this problem, we propose a Multi-Scale TRansformer (MSTR) for HOI detection powered by two novel HOI-aware deformable attention modules called Dual-Entity attention and Entity-conditioned Context attention. While existing deformable attention comes at a huge cost in HOI detection performance, our proposed attention modules of MSTR learn to effectively attend to sampling points that are essential to identify interactions. In experiments, we achieve the new state-of-the-art performance on two HOI detection benchmarks.

* CVPR 2022
