Title: METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens

Apr 05, 2023

Zhanyu Wang, Lingqiao Liu, Lei Wang, Luping Zhou

Abstract: In clinical scenarios, multi-specialist consultation can significantly benefit diagnosis, especially for intricate cases. This inspires us to explore a "multi-expert joint diagnosis" mechanism to upgrade the "single-expert" framework commonly seen in the current literature. To this end, we propose METransformer, a method that realizes this idea with a transformer-based backbone. The key design of our method is the introduction of multiple learnable "expert" tokens into both the transformer encoder and decoder. In the encoder, each expert token interacts with both vision tokens and other expert tokens to learn to attend to different image regions for image representation. These expert tokens are encouraged to capture complementary information by an orthogonal loss that minimizes their overlap. In the decoder, each attended expert token guides the cross-attention between input words and visual tokens, thus influencing the generated report. A metrics-based expert voting strategy is further developed to generate the final report. Through the multi-expert concept, our model enjoys the merits of an ensemble-based approach, but in a manner that is computationally more efficient and supports more sophisticated interactions among experts. Experimental results demonstrate the promising performance of our proposed model on two widely used benchmarks. Last but not least, the framework-level innovation makes our work ready to incorporate advances in existing "single-expert" models to further improve its performance.
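The abstract names two mechanisms without giving formulas: an orthogonal loss that reduces overlap between expert tokens, and a metrics-based vote over the experts' candidate reports. The sketch below illustrates one plausible reading of each, under stated assumptions; it is not the authors' implementation. It assumes the orthogonal loss is the sum of squared off-diagonal cosine similarities between expert tokens (equivalently, the squared Frobenius norm of the normalized Gram matrix minus the identity, for non-zero tokens), and it uses a simple unigram F1 as a hypothetical stand-in for the report metrics used in voting. The names `orthogonal_loss`, `unigram_f1`, and `vote` are illustrative only.

```python
import math
from collections import Counter
from typing import List, Sequence


def orthogonal_loss(experts: Sequence[Sequence[float]]) -> float:
    """Assumed form: sum of squared off-diagonal cosine similarities.

    For non-zero rows this equals ||E_hat E_hat^T - I||_F^2, where E_hat
    holds the L2-normalised expert tokens. It is zero when all experts
    are mutually orthogonal.
    """
    normed = []
    for e in experts:
        norm = math.sqrt(sum(x * x for x in e)) or 1.0  # avoid division by zero
        normed.append([x / norm for x in e])
    loss = 0.0
    for i, a in enumerate(normed):
        for j, b in enumerate(normed):
            if i != j:
                dot = sum(x * y for x, y in zip(a, b))
                loss += dot * dot  # penalise overlap between experts i and j
    return loss


def unigram_f1(hyp: str, ref: str) -> float:
    """Hypothetical stand-in metric: unigram F1 between two reports."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)


def vote(candidates: List[str]) -> str:
    """Pick the expert report that agrees most, on average, with the others."""
    best, best_score = candidates[0], -1.0
    for i, cand in enumerate(candidates):
        others = [o for j, o in enumerate(candidates) if j != i]
        score = sum(unigram_f1(cand, o) for o in others) / max(len(others), 1)
        if score > best_score:  # ties keep the earlier candidate
            best, best_score = cand, score
    return best


if __name__ == "__main__":
    print(orthogonal_loss([[1.0, 0.0], [0.0, 1.0]]))  # orthogonal experts
    print(orthogonal_loss([[1.0, 0.0], [2.0, 0.0]]))  # fully overlapping experts
    print(vote(["no acute findings", "no acute findings seen", "no acute findings"]))
```

Averaging pairwise agreement is one simple consensus rule: a report that many experts "agree" with is preferred over an outlier. In the actual method, the metric and the way candidates are scored may differ.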

* Accepted by CVPR 2023
