Soohyuk Jang

Superpixel Tokenization for Vision Transformers: Preserving Semantic Integrity in Visual Tokens

Dec 06, 2024

Jaihyun Lew, Soohyuk Jang, Jaehoon Lee, Seungryong Yoo, Eunji Kim, Saehyung Lee, Jisoo Mok, Siwon Kim, Sungroh Yoon

Abstract: Transformers, a groundbreaking architecture proposed for Natural Language Processing (NLP), have also achieved remarkable success in Computer Vision. A cornerstone of their success lies in the attention mechanism, which models relationships among tokens. While the tokenization process in NLP inherently ensures that a single token does not contain multiple semantics, the tokenization of Vision Transformer (ViT) utilizes tokens from uniformly partitioned square image patches, which may result in an arbitrary mixing of visual concepts in a token. In this work, we propose to substitute the grid-based tokenization in ViT with superpixel tokenization, which employs superpixels to generate a token that encapsulates a sole visual concept. Unfortunately, the diverse shapes, sizes, and locations of superpixels make integrating superpixels into ViT tokenization rather challenging. Our tokenization pipeline, comprised of pre-aggregate extraction and superpixel-aware aggregation, overcomes the challenges that arise in superpixel tokenization. Extensive experiments demonstrate that our approach, which exhibits strong compatibility with existing frameworks, enhances the accuracy and robustness of ViT on various downstream tasks.
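
The abstract names a "superpixel-aware aggregation" step but does not spell out how it works. As a rough illustration only, the sketch below assumes that per-pixel feature vectors are already available and that each superpixel token is simply the mean of the features inside that superpixel. The function name `aggregate_by_superpixel`, the nested-list data layout, and the choice of mean pooling are all assumptions made for this example, not details taken from the paper.

```python
def aggregate_by_superpixel(features, labels):
    """Mean-pool per-pixel features inside each superpixel (illustrative assumption).

    features: H x W grid (list of rows) of feature vectors (lists of floats).
    labels:   H x W grid of integer superpixel ids, one per pixel.
    Returns a dict mapping superpixel id -> averaged feature vector (one token each).
    """
    sums = {}
    counts = {}
    for feat_row, label_row in zip(features, labels):
        for vec, sp in zip(feat_row, label_row):
            if sp not in sums:
                sums[sp] = [0.0] * len(vec)
                counts[sp] = 0
            acc = sums[sp]
            for c, value in enumerate(vec):
                acc[c] += value
            counts[sp] += 1
    # Divide each accumulated vector by the number of pixels in its superpixel.
    return {sp: [s / counts[sp] for s in sums[sp]] for sp in sums}


# Toy 2 x 2 image with 2-channel features; the top row is superpixel 0,
# the bottom row is superpixel 1.
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
labels = [
    [0, 0],
    [1, 1],
]
tokens = aggregate_by_superpixel(features, labels)
```

Unlike a fixed patch grid, the number of tokens here follows the number of superpixels, and each token only mixes pixels that the segmentation grouped together.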

FedRN: Exploiting k-Reliable Neighbors Towards Robust Federated Learning

May 03, 2022

SangMook Kim, Wonyoung Shin, Soohyuk Jang, Hwanjun Song, Se-Young Yun

Abstract: Robustness is becoming another important challenge of federated learning in that the data collection process in each client is naturally accompanied by noisy labels. However, it is far more complex and challenging owing to varying levels of data heterogeneity and noise over clients, which exacerbates the client-to-client performance discrepancy. In this work, we propose a robust federated learning method called FedRN, which exploits k-reliable neighbors with high data expertise or similarity. Our method helps mitigate the gap between low- and high-performance clients by training only with a selected set of clean examples, identified by their ensembled mixture models. We demonstrate the superiority of FedRN via extensive evaluations on three real-world or synthetic benchmark datasets. Compared with existing robust training methods, the results show that FedRN significantly improves the test accuracy in the presence of noisy labels.
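
The abstract says clean examples are "identified by their ensembled mixture models" without giving the details. One plausible reading, sketched below purely as an assumption, is that each model (a client's own model plus those of its reliable neighbors) scores every example by its loss, a two-component Gaussian mixture is fitted to those losses, and the posterior of the low-loss component is averaged across models. The function names, the use of per-example losses, the plain averaging, and the 0.5 threshold are all hypothetical choices for this illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def clean_probabilities(losses, iters=50):
    """Fit a 2-component 1-D Gaussian mixture with EM and return, for each
    loss, the posterior probability of the lower-mean ("clean") component."""
    lo, hi = min(losses), max(losses)
    span = (hi - lo) or 1.0
    means = [lo, hi]
    variances = [(span / 4) ** 2, (span / 4) ** 2]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: responsibility of each component for each loss value.
        resp = []
        for x in losses:
            p = [weights[k] * gaussian_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / len(losses)
            means[k] = sum(r[k] * x for r, x in zip(resp, losses)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, losses)) / nk
            variances[k] = max(var, 1e-6)  # floor to avoid a collapsed component
    clean = 0 if means[0] <= means[1] else 1
    return [r[clean] for r in resp]


def select_clean(per_model_losses, threshold=0.5):
    """Average the clean posteriors over all models (the assumed "ensemble")
    and return the indices of examples whose average exceeds the threshold."""
    probs = [clean_probabilities(losses) for losses in per_model_losses]
    n = len(per_model_losses[0])
    avg = [sum(p[i] for p in probs) / len(probs) for i in range(n)]
    return [i for i, a in enumerate(avg) if a > threshold]


# Hypothetical losses for 7 examples under two models; examples 4-6 look noisy.
own_model = [0.10, 0.12, 0.09, 0.11, 2.0, 2.1, 1.9]
neighbor_model = [0.20, 0.15, 0.18, 0.22, 1.5, 1.7, 1.6]
clean_indices = select_clean([own_model, neighbor_model])
```

In this toy setting, training would then proceed only on the examples listed in `clean_indices`.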