Abstract: The demand for reliable AI systems has intensified the need for interpretable deep neural networks. Concept bottleneck models (CBMs) have gained attention as an effective approach by leveraging human-understandable concepts to enhance interpretability. However, existing CBMs face challenges due to deterministic concept encoding and reliance on inconsistent concepts, leading to inaccuracies. We propose EQ-CBM, a novel framework that enhances CBMs through probabilistic concept encoding using energy-based models (EBMs) with quantized concept activation vectors (qCAVs). EQ-CBM effectively captures uncertainties, thereby improving prediction reliability and accuracy. By employing qCAVs, our method selects homogeneous vectors during concept encoding, enabling more decisive task performance and facilitating higher levels of human intervention. Empirical results on benchmark datasets demonstrate that our approach outperforms the state of the art in both concept and task accuracy.
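A minimal sketch of the quantization idea behind qCAVs: continuous concept embeddings are snapped to the nearest entry of a small per-concept codebook, so each concept is represented by a homogeneous vector. The class name, shapes, and codebook layout are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class QuantizedConceptEncoder(nn.Module):
    """Hypothetical sketch: quantize per-concept activations against a small
    learnable codebook (one codebook per concept). Names and shapes are
    assumptions for illustration, not the EQ-CBM implementation."""

    def __init__(self, num_concepts: int, codebook_size: int, dim: int):
        super().__init__()
        # One codebook of candidate activation vectors per concept.
        self.codebooks = nn.Parameter(torch.randn(num_concepts, codebook_size, dim))

    def forward(self, concept_feats: torch.Tensor) -> torch.Tensor:
        # concept_feats: (batch, num_concepts, dim) continuous concept embeddings.
        # Distance to every codebook entry, then pick the nearest one per concept.
        dists = torch.cdist(concept_feats.transpose(0, 1), self.codebooks)   # (C, B, K)
        idx = dists.argmin(dim=-1)                                           # (C, B)
        quantized = torch.gather(
            self.codebooks, 1,
            idx.unsqueeze(-1).expand(-1, -1, self.codebooks.size(-1)),
        )                                                                    # (C, B, dim)
        return quantized.transpose(0, 1)                                     # (B, C, dim)
```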
Abstract: Scene graph generation (SGG) is an important task in image understanding because it represents the relationships between objects in an image as a graph structure, making it possible to understand the semantic relationships between objects intuitively. Previous SGG studies used message-passing neural networks (MPNNs) to update features, which can effectively reflect information about surrounding objects. However, these studies failed to reflect the co-occurrence of objects during scene graph generation. In addition, they only addressed the long-tail problem of the training dataset from the perspectives of sampling and learning methods. To address these two problems, we propose CooK, which reflects the Co-occurrence Knowledge between objects, together with a learnable term frequency-inverse document frequency (TF-l-IDF) to solve the long-tail problem. We applied the proposed model to the SGG benchmark dataset, and the results showed a performance improvement of up to 3.8% compared with existing state-of-the-art models in the SGGen subtask. The results also demonstrate the generalization ability of the proposed method, which yields consistent performance improvements across all MPNN-based models.
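A minimal sketch of a learnable TF-IDF-style weighting for long-tailed predicate classes: rare predicates receive larger weights derived from inverse log-frequency, and a learnable per-class scale refines them during training. The initialization and the loss form are assumptions for illustration; they are not claimed to match the paper's TF-l-IDF formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LearnableIDFLoss(nn.Module):
    """Sketch of an IDF-like, learnable class weighting for a long-tailed
    predicate classifier. Illustrative assumption, not the paper's TF-l-IDF."""

    def __init__(self, class_counts: torch.Tensor):
        super().__init__()
        total = class_counts.sum()
        # IDF-like prior: rare predicate classes get larger weights.
        idf = torch.log(total / (1.0 + class_counts.float()))
        self.register_buffer("idf", idf)
        # Learnable per-class scale, refined jointly with the model.
        self.scale = nn.Parameter(torch.ones_like(idf))

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        weights = F.softplus(self.scale) * self.idf
        return F.cross_entropy(logits, targets, weight=weights)
```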
Abstract: Along with generative AI, interest in scene graph generation (SGG), which comprehensively captures the relationships and interactions between objects in an image and creates a structured graph-based representation, has significantly increased in recent years. However, relying on object-centric and dichotomous relationships, existing SGG methods have a limited ability to accurately predict detailed relationships. To solve these problems, a new approach to modeling multi-object relationships, called edge dual scene graph generation (EdgeSGG), is proposed herein. EdgeSGG is based on an edge dual scene graph and a Dual Message Passing Neural Network (DualMPNN), which can capture rich contextual interactions between unconstrained objects. To facilitate the learning of edge dual scene graphs with a symmetric graph structure, the proposed DualMPNN learns both object- and relation-centric features for more accurately predicting relation-aware contexts and allows fine-grained relational updates between objects. A comparative experiment with state-of-the-art (SoTA) methods was conducted using two public datasets for SGG and six metrics across three subtasks. Compared with SoTA approaches, the proposed model exhibited substantial performance improvements across all SGG subtasks. Furthermore, experiments on long-tail distributions revealed that incorporating the relationships between objects effectively mitigates existing long-tail problems.
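A minimal sketch of the edge dual construction: each relation (edge) of the scene graph becomes a node, and two relation-nodes are connected when the original relations share an object. This mirrors the line-graph idea behind edge dual scene graphs; the function below is an illustrative assumption, not the DualMPNN implementation.

```python
import networkx as nx

def edge_dual_graph(scene_graph: nx.DiGraph) -> nx.Graph:
    """Build the edge dual of a scene graph: relations become nodes,
    and relations that share an object become adjacent. Sketch only."""
    dual = nx.Graph()
    edges = list(scene_graph.edges(data=True))
    for u, v, attrs in edges:
        dual.add_node((u, v), **attrs)           # each relation becomes a node
    for i, (u1, v1, _) in enumerate(edges):
        for u2, v2, _ in edges[i + 1:]:
            if {u1, v1} & {u2, v2}:              # relations share an object
                dual.add_edge((u1, v1), (u2, v2))
    return dual

# Example: person-rides-horse and horse-on-grass become adjacent
# relation-nodes because they share the object "horse".
g = nx.DiGraph()
g.add_edge("person", "horse", predicate="rides")
g.add_edge("horse", "grass", predicate="on")
print(edge_dual_graph(g).edges())
```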
Abstract: An important challenge in vision-based action recognition is the embedding of spatiotemporal features with two or more heterogeneous modalities into a single feature. In this study, we propose a new 3D deformable transformer for action recognition with adaptive spatiotemporal receptive fields and a cross-modal learning scheme. The 3D deformable transformer consists of three attention modules: 3D deformable, local joint stride, and temporal stride attention. The two cross-modal tokens are input into the 3D deformable attention module to create a cross-attention token that reflects the spatiotemporal correlation. Local joint stride attention is applied to spatially combine the attention and pose tokens. Temporal stride attention temporally reduces the number of input tokens in the attention module and supports temporal expression learning without the simultaneous use of all tokens. The deformable transformer iterates L times and combines the last cross-modal token for classification. The proposed 3D deformable transformer was tested on the NTU60, NTU120, FineGYM, and Penn Action datasets, and showed results better than or similar to pre-trained state-of-the-art methods even without a pre-training process. In addition, by visualizing important joints and correlations during action recognition through spatial joint and temporal stride attention, the possibility of achieving an explainable potential for action recognition is presented.
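A minimal sketch of the temporal-stride idea: queries keep every frame, while keys and values are subsampled along time with a fixed stride, so attention does not use all tokens simultaneously. The module name, shapes, and stride choice are assumptions for illustration rather than the paper's exact module.

```python
import torch
import torch.nn as nn

class TemporalStrideAttention(nn.Module):
    """Sketch of stride-based temporal attention: full-length queries attend
    to temporally subsampled keys/values. Illustrative assumption only."""

    def __init__(self, dim: int, heads: int = 8, stride: int = 4):
        super().__init__()
        self.stride = stride
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, frames, dim) temporal token sequence.
        strided = tokens[:, :: self.stride]      # keep every `stride`-th frame
        out, _ = self.attn(query=tokens, key=strided, value=strided)
        return out
```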
Abstract: In action recognition, although the combination of spatio-temporal video and skeleton features can improve the recognition performance, a separate model and a balanced feature representation for the cross-modal data are required. To solve these problems, we propose the Spatio-TemporAl cRoss (STAR)-transformer, which can effectively represent two cross-modal features as a recognizable vector. First, the input video frames and skeleton sequence are converted into global grid tokens and joint map tokens, respectively. These tokens are then aggregated into multi-class tokens and input into the STAR-transformer. The STAR-transformer encoder layer consists of a full self-attention (FAttn) module and a proposed zigzag spatio-temporal attention (ZAttn) module. Similarly, the decoder consists of an FAttn module and a proposed binary spatio-temporal attention (BAttn) module. The STAR-transformer learns an efficient multi-feature representation of the spatio-temporal features by properly arranging pairings of the FAttn, ZAttn, and BAttn modules. Experimental results on the Penn-Action, NTU RGB+D 60, and 120 datasets show that the proposed method achieves a promising performance improvement compared with previous state-of-the-art methods.
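A minimal sketch of aggregating the two cross-modal streams: per-frame global grid tokens (from RGB) and joint map tokens (from the skeleton) are projected to a shared width and concatenated with learnable class tokens. The dimensions, concatenation order, and class-token count are assumptions for illustration, not the STAR-transformer's tokenization.

```python
import torch
import torch.nn as nn

class CrossModalTokenizer(nn.Module):
    """Sketch: project grid and joint-map token streams to a shared width and
    prepend learnable class tokens. Illustrative assumption only."""

    def __init__(self, grid_dim: int, joint_dim: int, dim: int):
        super().__init__()
        self.grid_proj = nn.Linear(grid_dim, dim)
        self.joint_proj = nn.Linear(joint_dim, dim)
        self.cls_tokens = nn.Parameter(torch.randn(1, 2, dim))  # one per modality

    def forward(self, grid_tokens: torch.Tensor, joint_tokens: torch.Tensor) -> torch.Tensor:
        # grid_tokens: (batch, frames, grid_dim); joint_tokens: (batch, frames, joint_dim)
        g = self.grid_proj(grid_tokens)
        j = self.joint_proj(joint_tokens)
        cls = self.cls_tokens.expand(g.size(0), -1, -1)
        return torch.cat([cls, g, j], dim=1)     # (batch, 2 + 2*frames, dim)
```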
Abstract: This paper proposes a new method for interpreting and simplifying a black-box deep random forest (RF) model using a proposed rule elimination technique. In a deep RF, a large number of decision trees are connected across multiple layers, making analysis difficult. A deep RF achieves performance comparable to that of a deep neural network (DNN) while offering better generalizability. Therefore, in this study, we quantify the feature contributions and frequencies of a fully trained deep RF in the form of a decision rule set. The feature contributions provide a basis for determining how features affect the decision process in a rule set. Model simplification is achieved by eliminating unnecessary rules based on their measured feature contributions. Consequently, the simplified model has fewer parameters and rules than before. Experimental results show that a feature contribution analysis allows a black-box model to be decomposed for quantitatively interpreting a rule set. The proposed method was successfully applied to various deep RF models and benchmark datasets while maintaining robust performance despite the elimination of a large number of rules.
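A minimal sketch of the rule-set view: each root-to-leaf path of a (shallow, single-layer) random forest is treated as a decision rule, feature usage is counted across rules, and rules built only from rarely used features are flagged for elimination. The frequency-based contribution measure and the pruning threshold below are stand-in assumptions, not the paper's feature-contribution definition or its deep RF setting.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Fit a small forest; each root-to-leaf path will serve as one decision rule.
X, y = load_breast_cancer(return_X_y=True)
forest = RandomForestClassifier(n_estimators=10, max_depth=3, random_state=0).fit(X, y)

rules, feature_counts = [], np.zeros(X.shape[1], dtype=int)
for est in forest.estimators_:
    tree = est.tree_

    def walk(node, path):
        feat = tree.feature[node]
        if feat < 0:                       # leaf reached: the path is one rule
            rules.append(path)
            return
        feature_counts[feat] += 1          # simple usage-frequency "contribution"
        walk(tree.children_left[node], path + [(feat, "<=", tree.threshold[node])])
        walk(tree.children_right[node], path + [(feat, ">", tree.threshold[node])])

    walk(0, [])

# Keep only rules whose features all appear frequently across the forest.
threshold = np.percentile(feature_counts[feature_counts > 0], 25)
kept = [r for r in rules if all(feature_counts[f] >= threshold for f, _, _ in r)]
print(f"{len(kept)} of {len(rules)} rules kept after elimination")
```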