Yugyung Lee

Multi-Modal Interpretability for Enhanced Localization in Vision-Language Models

Sep 17, 2025

Muhammad Imran, Yugyung Lee

Abstract:Recent advances in vision-language models have significantly expanded the frontiers of automated image analysis. However, applying these models in safety-critical contexts remains challenging due to the complex relationships between objects, subtle visual cues, and the heightened demand for transparency and reliability. This paper presents the Multi-Modal Explainable Learning (MMEL) framework, designed to enhance the interpretability of vision-language models while maintaining high performance. Building upon prior work in gradient-based explanations for transformer architectures (Grad-eclip), MMEL introduces a novel Hierarchical Semantic Relationship Module that enhances model interpretability through multi-scale feature processing, adaptive attention weighting, and cross-modal alignment. Our approach processes features at multiple semantic levels to capture relationships between image regions at different granularities, applying learnable layer-specific weights to balance contributions across the model's depth. This results in more comprehensive visual explanations that highlight both primary objects and their contextual relationships with improved precision. Through extensive experiments on standard datasets, we demonstrate that by incorporating semantic relationship information into gradient-based attribution maps, MMEL produces more focused and contextually aware visualizations that better reflect how vision-language models process complex scenes. The MMEL framework generalizes across various domains, offering valuable insights into model decisions for applications requiring high interpretability and reliability.

* Non-Archival track - The First Workshop on Multimodal Knowledge and Language Modeling, IJCAI 2025 Workshop, August 16, 2025, Room 516B, Palais des congrès, Montreal, Canada
* 8 pages, 6 figures, 3 tables
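
The abstract's idea of weighting each layer's contribution can be illustrated with a small sketch. This is not the MMEL implementation: the softmax over layer logits, the min-max normalization, and the names `softmax` and `fuse_layer_maps` are assumptions made for illustration, and the per-layer maps are toy 2x2 grids rather than real gradient-based attributions.

```python
import math

def softmax(logits):
    """Turn raw per-layer scores into positive weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_layer_maps(layer_maps, layer_logits):
    """Weighted sum of per-layer heatmaps, rescaled to the range [0, 1].

    layer_maps: list of 2D lists with identical shape (one map per layer).
    layer_logits: one score per layer (hypothetical stand-in for learned weights).
    """
    weights = softmax(layer_logits)
    rows, cols = len(layer_maps[0]), len(layer_maps[0][0])
    fused = [
        [sum(w * m[r][c] for w, m in zip(weights, layer_maps)) for c in range(cols)]
        for r in range(rows)
    ]
    low = min(min(row) for row in fused)
    high = max(max(row) for row in fused)
    span = high - low
    if span == 0:
        # A constant map carries no localization signal.
        return [[0.0] * cols for _ in range(rows)]
    return [[(v - low) / span for v in row] for row in fused]

shallow = [[0.0, 1.0], [0.0, 0.0]]
deep = [[0.0, 0.0], [1.0, 0.0]]
fused_map = fuse_layer_maps([shallow, deep], [0.0, 0.0])
```

With equal logits both layers contribute equally; raising one logit shifts the fused map toward that layer's heatmap.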

OCU-Net: A Novel U-Net Architecture for Enhanced Oral Cancer Segmentation

Oct 03, 2023

Ahmed Albishri, Syed Jawad Hussain Shah, Yugyung Lee, Rong Wang

Abstract:Accurate detection of oral cancer is crucial for improving patient outcomes. However, the field faces two key challenges: the scarcity of deep learning-based image segmentation research specifically targeting oral cancer and the lack of annotated data. Our study proposes OCU-Net, a pioneering U-Net image segmentation architecture exclusively designed to detect oral cancer in hematoxylin and eosin (H&E) stained image datasets. OCU-Net incorporates advanced deep learning modules, such as the Channel and Spatial Attention Fusion (CSAF) module, a novel and innovative feature that emphasizes important channel and spatial areas in H&E images while exploring contextual information. In addition, OCU-Net integrates other innovative components such as Squeeze-and-Excite (SE) attention module, Atrous Spatial Pyramid Pooling (ASPP) module, residual blocks, and multi-scale fusion. The incorporation of these modules showed superior performance for oral cancer segmentation for two datasets used in this research. Furthermore, we effectively utilized the efficient ImageNet pre-trained MobileNet-V2 model as a backbone of our OCU-Net to create OCU-Netm, an enhanced version achieving state-of-the-art results. Comprehensive evaluation demonstrates that OCU-Net and OCU-Netm outperformed existing segmentation methods, highlighting their precision in identifying cancer cells in H&E images from OCDC and ORCA datasets.

EARLIN: Early Out-of-Distribution Detection for Resource-efficient Collaborative Inference

Jun 29, 2021

Sumaiya Tabassum Nimi, Md Adnan Arefeen, Md Yusuf Sarwar Uddin, Yugyung Lee

Abstract:Collaborative inference enables resource-constrained edge devices to make inferences by uploading inputs (e.g., images) to a server (i.e., cloud) where the heavy deep learning models run. While this setup works cost-effectively for successful inferences, it severely underperforms when the model faces input samples on which the model was not trained (known as Out-of-Distribution (OOD) samples). If the edge devices could, at least, detect that an input sample is an OOD, that could potentially save communication and computation resources by not uploading those inputs to the server for inference workload. In this paper, we propose a novel lightweight OOD detection approach that mines important features from the shallow layers of a pretrained CNN model and detects an input sample as ID (In-Distribution) or OOD based on a distance function defined on the reduced feature space. Our technique (a) works on pretrained models without any retraining of those models, and (b) does not expose itself to any OOD dataset (all detection parameters are obtained from the ID training dataset). To this end, we develop EARLIN (EARLy OOD detection for Collaborative INference) that takes a pretrained model and partitions the model at the OOD detection layer and deploys the considerably small OOD part on an edge device and the rest on the cloud. By experimenting using real datasets and a prototype implementation, we show that our technique achieves better results than other approaches in terms of overall accuracy and cost when tested against popular OOD datasets on top of popular deep learning models pretrained on benchmark datasets.

* To Appear in the proceedings of ECML-PKDD'2021
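
The distance-based ID/OOD decision described in the abstract can be sketched as follows. This is a minimal illustration, not the EARLIN method itself: it assumes features have already been extracted and reduced, and it swaps in a Euclidean distance to the ID centroid with a threshold set at a quantile of the ID training distances. The function names and the 0.95 quantile are hypothetical choices. As in the abstract, only ID data is used to fit the detector.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fit_detector(id_features, quantile=0.95):
    """Fit the centroid and distance threshold from ID training features only."""
    center = centroid(id_features)
    dists = sorted(euclidean(f, center) for f in id_features)
    index = min(len(dists) - 1, int(quantile * len(dists)))
    return center, dists[index]

def is_ood(feature, center, threshold):
    """Flag a sample as OOD when it lies farther from the centroid than the threshold."""
    return euclidean(feature, center) > threshold

id_features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
center, threshold = fit_detector(id_features)
near = is_ood([0.4, 0.6], center, threshold)
far = is_ood([5.0, 5.0], center, threshold)
```

On an edge device, a sample flagged as OOD would simply not be uploaded, which is where the communication savings described above would come from.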

Link Prediction for Temporally Consistent Networks

Jun 06, 2020

Mohamoud Ali, Yugyung Lee, Praveen Rao

Abstract:Dynamic networks have intrinsic structural, computational, and multidisciplinary advantages. Link prediction estimates the next relationship in dynamic networks. However, in the current link prediction approaches, only bipartite or non-bipartite but homogeneous networks are considered. The use of adjacency matrix to represent dynamically evolving networks limits the ability to analytically learn from heterogeneous, sparse, or forming networks. In the case of a heterogeneous network, modeling all network states using a binary-valued matrix can be difficult. On the other hand, sparse or currently forming networks have many missing edges, which are represented as zeros, thus introducing class imbalance or noise. We propose a time-parameterized matrix (TP-matrix) and empirically demonstrate its effectiveness in non-bipartite, heterogeneous networks. In addition, we propose a predictive influence index as a measure of a node's boosting or diminishing predictive influence using backward and forward-looking maximization over the temporal space of the n-degree neighborhood. We further propose a new method of canonically representing heterogeneous time-evolving activities as a temporally parameterized network model (TPNM). The new method robustly enables activities to be represented as a form of a network, thus potentially inspiring new link prediction applications, including intelligent business process management systems and context-aware workflow engines. We evaluated our model on four datasets of different network systems. We present results that show the proposed model is more effective in capturing and retaining temporal relationships in dynamically evolving networks. We also show that our model performed better than state-of-the-art link prediction benchmark results for networks that are sensitive to temporal evolution.

SCAT: Second Chance Autoencoder for Textual Data

May 11, 2020

Somaieh Goudarzvand, Gharib Gharibi, Yugyung Lee

Abstract:We present a k-competitive learning approach for textual autoencoders named Second Chance Autoencoder (SCAT). SCAT selects the $k$ largest and smallest positive activations as the winner neurons, which gain the activation values of the loser neurons during the learning process, and thus focus on retrieving well-representative features for topics. Our experiments show that SCAT achieves outstanding performance in classification, topic modeling, and document visualization compared to LDA, K-Sparse, NVCTM, and KATE.

CRL: Class Representative Learning for Image Classification

Feb 16, 2020

Mayanka Chandrashekar, Yugyung Lee

Abstract:Building robust, real-time classifiers with diverse datasets is one of the most significant challenges for deep learning researchers. This is because there is a considerable gap between a model built with training (seen) data and the real (unseen) data encountered in applications. Recent works, including Zero-Shot Learning (ZSL), have attempted to overcome this gap through transfer learning. In this paper, we propose a novel model, called Class Representative Learning Model (CRL), that can be especially effective in image classification influenced by ZSL. In the CRL model, the learning step first builds class representatives to represent classes in datasets by aggregating prominent features extracted from a Convolutional Neural Network (CNN). Second, the inferencing step matches new data against the class representatives. The proposed CRL model demonstrated superior performance compared to the current state-of-the-art research in ZSL and mobile deep learning. The proposed CRL model has been implemented and evaluated in a parallel environment, using Apache Spark, for both distributed learning and recognition. An extensive experimental study on the benchmark datasets ImageNet-1K, CalTech-101, CalTech-256, and CIFAR-100 shows that CRL can build a class distribution model with drastic improvement in learning and recognition performance without sacrificing accuracy compared to state-of-the-art performance in image classification.

* 15 pages, 8 tables, 6 figures
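
The two CRL steps named in the abstract, building class representatives and matching new data against them, can be sketched in a few lines. This is an illustration under stated assumptions: the abstract does not specify the aggregation or the matching function, so a per-class mean and cosine similarity are used here, and the feature vectors are toy values rather than CNN outputs. All function names are hypothetical.

```python
import math
from collections import defaultdict

def build_class_representatives(features, labels):
    """Learning step: aggregate each class's feature vectors into one mean vector."""
    grouped = defaultdict(list)
    for vec, label in zip(features, labels):
        grouped[label].append(vec)
    return {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def classify(vec, representatives):
    """Inference step: return the label whose representative is most similar."""
    return max(representatives, key=lambda label: cosine(vec, representatives[label]))

features = [[1.0, 0.0], [0.8, 0.2], [0.0, 1.0], [0.2, 0.8]]
labels = ["cat", "cat", "dog", "dog"]
reps = build_class_representatives(features, labels)
first = classify([0.7, 0.1], reps)
second = classify([0.1, 0.6], reps)
```

Because each class is summarized independently, representatives for different classes could be computed in parallel, which fits the distributed setting the abstract mentions.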

Automated Human Claustrum Segmentation using Deep Learning Technologies

Nov 18, 2019

Ahmed Awad Albishri, Syed Jawad Hussain Shah, Anthony Schmiedler, Seung Suk Kang, Yugyung Lee

Abstract:In recent years, Deep Learning (DL) has shown promising results in AI tasks such as computer vision and image segmentation. Specifically, Convolutional Neural Network (CNN) models in DL have been applied to prevention, detection, and diagnosis in predictive medicine. Image segmentation plays a significant role in disease detection and prevention. However, there are enormous challenges in performing DL-based automatic segmentation due to the nature of medical images, such as heterogeneous modalities and formats, insufficient labeled training data, and high class imbalance in the labeled data. Furthermore, automating segmentation of medical images, like magnetic resonance images (MRI), becomes a challenging task. The need for automated segmentation or annotation is what motivates our work. In this paper, we propose a fully automated approach that aims to segment the human claustrum for analytical purposes. We applied a U-Net CNN model to segment the claustrum (Cl) from an MRI dataset. With this approach, we have achieved an average Dice per case score of 0.72 for Cl segmentation, with K=5 for cross-validation. An expert in the medical domain also evaluated these results.

* 6 pages, 4 figures
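
The Dice score reported in the abstract is the standard overlap measure 2|A∩B| / (|A| + |B|) between a predicted mask A and a reference mask B. A minimal sketch of a per-case average is shown below; the flat 0/1 mask representation, the function names, and the convention that two empty masks score 1.0 are assumptions for illustration, not details taken from the paper.

```python
def dice_score(pred, truth):
    """Dice coefficient between two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(pred) + sum(truth)
    if total == 0:
        # Both masks empty: treated here as perfect agreement (an assumed convention).
        return 1.0
    return 2 * intersection / total

def mean_dice_per_case(cases):
    """Average Dice over (prediction, ground truth) pairs, one pair per case."""
    scores = [dice_score(pred, truth) for pred, truth in cases]
    return sum(scores) / len(scores)

cases = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),  # one shared voxel out of 2 + 1 labeled
    ([0, 1, 1, 0], [0, 1, 1, 0]),  # identical masks
]
average = mean_dice_per_case(cases)
```

Averaging per case, rather than pooling all voxels, gives each scan equal weight regardless of how large the structure is in that scan.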

MedTQ: Dynamic Topic Discovery and Query Generation for Medical Ontologies

Feb 12, 2018

Feichen Shen, Yugyung Lee

Abstract:Biomedical ontology refers to a shared conceptualization for a biomedical domain of interest that has vastly improved data management and data sharing through the open data movement. The rapid growth and availability of biomedical data make it impractical and computationally expensive to perform manual analysis and query processing with large-scale ontologies. The inability to analyze ontologies from such a variety of sources, and to support knowledge discovery for clinical practice and biomedical research, should be overcome with new technologies. In this study, we developed a Medical Topic discovery and Query generation framework (MedTQ), which is composed of a series of approaches and algorithms. We introduce a predicate neighborhood pattern-based approach that computes the similarity of predicates (relations) in ontologies. Given a predicate similarity metric, machine learning algorithms have been developed for automatic topic discovery and query generation. The topic discovery algorithm, called the hierarchical K-Means algorithm, was designed by extending an existing unsupervised algorithm (K-means clustering) for the construction of a topic hierarchy. In the hierarchical K-Means algorithm, a level-by-level optimization strategy was selected for consistency with the strong association between elements within a topic. Automatic query generation was provided for discovered topics to guide users in interactive query design and processing. An evaluation was conducted by generating a topic hierarchy for the DrugBank ontology as a case study. Results demonstrated that the MedTQ framework can enhance knowledge discovery by capturing underlying structures from domain-specific data and ontologies.
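
The level-by-level construction of a topic hierarchy with K-means can be sketched as a recursive split. This is not the MedTQ algorithm: it assumes plain numeric vectors with squared Euclidean distance instead of the predicate similarity metric, a deterministic evenly spaced initialization, and hypothetical names and parameters (`kmeans`, `build_hierarchy`, `max_depth`, `min_size`).

```python
def kmeans(points, k, iterations=20):
    """Plain K-means on lists of floats; returns the non-empty clusters."""
    # Deterministic init (an assumption): pick k points spread evenly through the input.
    step = len(points) / k
    centers = [points[int(i * step)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # Recompute each center as its cluster mean; keep the old center if empty.
        centers = [
            [sum(col) / len(c) for col in zip(*c)] if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return [c for c in clusters if c]

def build_hierarchy(points, k=2, max_depth=2, min_size=2):
    """Split one level at a time, recursing into each cluster to form a tree."""
    node = {"points": points, "children": []}
    if max_depth == 0 or len(points) < max(k, min_size):
        return node
    clusters = kmeans(points, k)
    if len(clusters) < 2:
        return node  # the points could not be separated further
    node["children"] = [
        build_hierarchy(c, k, max_depth - 1, min_size) for c in clusters
    ]
    return node

points = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
tree = build_hierarchy(points, k=2, max_depth=1)
```

Each node's `points` would correspond to one topic, and its `children` to the subtopics found at the next level.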
