Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Rafael M. O. Cruz

Beyond Patches: Mining Interpretable Part-Prototypes for Explainable AI

Apr 16, 2025

Mahdi Alehdaghi, Rajarshi Bhattacharya, Pourya Shamsolmoali, Rafael M. O. Cruz, Maguelonne Heritier, Eric Granger

Abstract:Deep learning has provided considerable advancements for multimedia systems, yet the interpretability of deep models remains a challenge. State-of-the-art post-hoc explainability methods, such as GradCAM, provide visual interpretation based on heatmaps but lack conceptual clarity. Prototype-based approaches, like ProtoPNet and PIPNet, offer a more structured explanation but rely on fixed patches, limiting their robustness and semantic consistency. To address these limitations, a part-prototypical concept mining network (PCMNet) is proposed that dynamically learns interpretable prototypes from meaningful regions. PCMNet clusters prototypes into concept groups, creating semantically grounded explanations without requiring additional annotations. Through a joint process of unsupervised part discovery and concept activation vector extraction, PCMNet effectively captures discriminative concepts and makes interpretable classification decisions. Our extensive experiments comparing PCMNet against state-of-the-art methods on multiple datasets show that it can provide a high level of interpretability, stability, and robustness under clean and occluded scenarios.

Via

Access Paper or Ask Questions

Multi-view autoencoders for Fake News Detection

Apr 10, 2025

Ingryd V. S. T. Pereira, George D. C. Cavalcanti, Rafael M. O. Cruz

Abstract:Given the volume and speed at which fake news spreads across social media, automatic fake news detection has become a highly important task. However, this task presents several challenges, including extracting textual features that contain relevant information about fake news. Research about fake news detection shows that no single feature extraction technique consistently outperforms the others across all scenarios. Nevertheless, different feature extraction techniques can provide complementary information about the textual data and enable a more comprehensive representation of the content. This paper proposes using multi-view autoencoders to generate a joint feature representation for fake news detection by integrating several feature extraction techniques commonly used in the literature. Experiments on fake news datasets show a significant improvement in classification performance compared to individual views (feature representations). We also observed that selecting a subset of the views instead of composing a latent space with all the views can be advantageous in terms of accuracy and computational effort. For further details, including source codes, figures, and datasets, please refer to the project's repository: https://github.com/ingrydpereira/multiview-fake-news.

* Accepted by IEEE Symposium Series on Computational Intelligence - IEEE SSCI 2025

Via

Access Paper or Ask Questions

SEREP: Semantic Facial Expression Representation for Robust In-the-Wild Capture and Retargeting

Dec 18, 2024

Arthur Josi, Luiz Gustavo Hafemann, Abdallah Dib, Emeline Got, Rafael M. O. Cruz, Marc-Andre Carbonneau

Abstract:Monocular facial performance capture in-the-wild is challenging due to varied capture conditions, face shapes, and expressions. Most current methods rely on linear 3D Morphable Models, which represent facial expressions independently of identity at the vertex displacement level. We propose SEREP (Semantic Expression Representation), a model that disentangles expression from identity at the semantic level. It first learns an expression representation from unpaired 3D facial expressions using a cycle consistency loss. Then we train a model to predict expression from monocular images using a novel semi-supervised scheme that relies on domain adaptation. In addition, we introduce MultiREX, a benchmark addressing the lack of evaluation resources for the expression capture task. Our experiments show that SEREP outperforms state-of-the-art methods, capturing challenging expressions and transferring them to novel identities.

Via

Access Paper or Ask Questions

Image Retrieval Methods in the Dissimilarity Space

Dec 11, 2024

Madhu Kiran, Kartikey Vishnu, Rafael M. O. Cruz, Eric Granger

Abstract:Image retrieval methods rely on metric learning to train backbone feature extraction models that can extract discriminant queries and reference (gallery) feature representations for similarity matching. Although state-of-the-art accuracy has improved considerably with the advent of deep learning (DL) models trained on large datasets, image retrieval remains challenging in many real-world video analytics and surveillance applications, e.g., person re-identification. Using the Euclidean space for matching limits the performance in real-world applications due to the curse of dimensionality, overfitting, and sensitivity to noisy data. We argue that the feature dissimilarity space is more suitable for similarity matching, and propose a dichotomy transformation to project query and reference embeddings into a single embedding in the dissimilarity space. We also advocate for end-to-end training of a backbone and binary classification models for pair-wise matching. As opposed to comparing the distance between queries and reference embeddings, we show the benefits of classifying the single dissimilarity space embedding (as similar or dissimilar), especially when trained end-to-end. We propose a method to train the max-margin classifier together with the backbone feature extractor by applying constraints to the L2 norm of the classifier weights along with the hinge loss. Our extensive experiments on challenging image retrieval datasets and using diverse feature extraction backbones highlight the benefits of similarity matching in the dissimilarity space. In particular, when jointly training the feature extraction backbone and regularised classifier for matching, the dissimilarity space provides a higher level of accuracy.

* 7 pages

Via

Access Paper or Ask Questions

MLRS-PDS: A Meta-learning recommendation of dynamic ensemble selection pipelines

Jul 10, 2024

Hesam Jalalian, Rafael M. O. Cruz

Abstract:Dynamic Selection (DS), where base classifiers are chosen from a classifier's pool for each new instance at test time, has shown to be highly effective in pattern recognition. However, instability and redundancy in the classifier pools can impede computational efficiency and accuracy in dynamic ensemble selection. This paper introduces a meta-learning recommendation system (MLRS) to recommend the optimal pool generation scheme for DES methods tailored to individual datasets. The system employs a meta-model built from dataset meta-features to predict the most suitable pool generation scheme and DES method for a given dataset. Through an extensive experimental study encompassing 288 datasets, we demonstrate that this meta-learning recommendation system outperforms traditional fixed pool or DES method selection strategies, highlighting the efficacy of a meta-learning approach in refining DES method selection. The source code, datasets, and supplementary results can be found in this project's GitHub repository: https://github.com/Menelau/MLRS-PDS.

* Paper published at the International Joint Conference on Neural Networks

Via

Access Paper or Ask Questions

Bidirectional Multi-Step Domain Generalization for Visible-Infrared Person Re-Identification

Mar 16, 2024

Mahdi Alehdaghi, Pourya Shamsolmoali, Rafael M. O. Cruz, Eric Granger

Figure 1 for Bidirectional Multi-Step Domain Generalization for Visible-Infrared Person Re-Identification

Figure 2 for Bidirectional Multi-Step Domain Generalization for Visible-Infrared Person Re-Identification

Figure 3 for Bidirectional Multi-Step Domain Generalization for Visible-Infrared Person Re-Identification

Figure 4 for Bidirectional Multi-Step Domain Generalization for Visible-Infrared Person Re-Identification

Abstract:A key challenge in visible-infrared person re-identification (V-I ReID) is training a backbone model capable of effectively addressing the significant discrepancies across modalities. State-of-the-art methods that generate a single intermediate bridging domain are often less effective, as this generated domain may not adequately capture sufficient common discriminant information. This paper introduces the Bidirectional Multi-step Domain Generalization (BMDG), a novel approach for unifying feature representations across diverse modalities. BMDG creates multiple virtual intermediate domains by finding and aligning body part features extracted from both I and V modalities. Indeed, BMDG aims to reduce the modality gaps in two steps. First, it aligns modalities in feature space by learning shared and modality-invariant body part prototypes from V and I images. Then, it generalizes the feature representation by applying bidirectional multi-step learning, which progressively refines feature representations in each step and incorporates more prototypes from both modalities. In particular, our method minimizes the cross-modal gap by identifying and aligning shared prototypes that capture key discriminative features across modalities, then uses multiple bridging steps based on this information to enhance the feature representation. Experiments conducted on challenging V-I ReID datasets indicate that our BMDG approach outperforms state-of-the-art part-based models or methods that generate an intermediate domain from V-I person ReID.

Via

Access Paper or Ask Questions

MoSAR: Monocular Semi-Supervised Model for Avatar Reconstruction using Differentiable Shading

Dec 22, 2023

Abdallah Dib, Luiz Gustavo Hafemann, Emeline Got, Trevor Anderson, Amin Fadaeinejad, Rafael M. O. Cruz, Marc-Andre Carbonneau

Abstract:Reconstructing an avatar from a portrait image has many applications in multimedia, but remains a challenging research problem. Extracting reflectance maps and geometry from one image is ill-posed: recovering geometry is a one-to-many mapping problem and reflectance and light are difficult to disentangle. Accurate geometry and reflectance can be captured under the controlled conditions of a light stage, but it is costly to acquire large datasets in this fashion. Moreover, training solely with this type of data leads to poor generalization with in-the-wild images. This motivates the introduction of MoSAR, a method for 3D avatar generation from monocular images. We propose a semi-supervised training scheme that improves generalization by learning from both light stage and in-the-wild datasets. This is achieved using a novel differentiable shading formulation. We show that our approach effectively disentangles the intrinsic face parameters, producing relightable avatars. As a result, MoSAR estimates a richer set of skin reflectance maps, and generates more realistic avatars than existing state-of-the-art methods. We also introduce a new dataset, named FFHQ-UV-Intrinsics, the first public dataset providing intrinsic face attributes at scale (diffuse, specular, ambient occlusion and translucency maps) for a total of 10k subjects. The project website and the dataset are available on the following link: https://ubisoft-laforge.github.io/character/mosar/

* https://ubisoft-laforge.github.io/character/mosar/

Via

Access Paper or Ask Questions

A post-selection algorithm for improving dynamic ensemble selection methods

Sep 26, 2023

Paulo R. G. Cordeiro, George D. C. Cavalcanti, Rafael M. O. Cruz

Figure 1 for A post-selection algorithm for improving dynamic ensemble selection methods

Figure 2 for A post-selection algorithm for improving dynamic ensemble selection methods

Figure 3 for A post-selection algorithm for improving dynamic ensemble selection methods

Figure 4 for A post-selection algorithm for improving dynamic ensemble selection methods

Abstract:Dynamic Ensemble Selection (DES) is a Multiple Classifier Systems (MCS) approach that aims to select an ensemble for each query sample during the selection phase. Even with the proposal of several DES approaches, no particular DES technique is the best choice for different problems. Thus, we hypothesize that selecting the best DES approach per query instance can lead to better accuracy. To evaluate this idea, we introduce the Post-Selection Dynamic Ensemble Selection (PS-DES) approach, a post-selection scheme that evaluates ensembles selected by several DES techniques using different metrics. Experimental results show that using accuracy as a metric to select the ensembles, PS-DES performs better than individual DES techniques. PS-DES source code is available in a GitHub repository

* 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC)

Via

Access Paper or Ask Questions

Adaptive Generation of Privileged Intermediate Information for Visible-Infrared Person Re-Identification

Jul 06, 2023

Mahdi Alehdaghi, Arthur Josi, Pourya Shamsolmoali, Rafael M. O. Cruz, Eric Granger

Figure 1 for Adaptive Generation of Privileged Intermediate Information for Visible-Infrared Person Re-Identification

Figure 2 for Adaptive Generation of Privileged Intermediate Information for Visible-Infrared Person Re-Identification

Figure 3 for Adaptive Generation of Privileged Intermediate Information for Visible-Infrared Person Re-Identification

Figure 4 for Adaptive Generation of Privileged Intermediate Information for Visible-Infrared Person Re-Identification

Abstract:Visible-infrared person re-identification seeks to retrieve images of the same individual captured over a distributed network of RGB and IR sensors. Several V-I ReID approaches directly integrate both V and I modalities to discriminate persons within a shared representation space. However, given the significant gap in data distributions between V and I modalities, cross-modal V-I ReID remains challenging. Some recent approaches improve generalization by leveraging intermediate spaces that can bridge V and I modalities, yet effective methods are required to select or generate data for such informative domains. In this paper, the Adaptive Generation of Privileged Intermediate Information training approach is introduced to adapt and generate a virtual domain that bridges discriminant information between the V and I modalities. The key motivation behind AGPI^2 is to enhance the training of a deep V-I ReID backbone by generating privileged images that provide additional information. These privileged images capture shared discriminative features that are not easily accessible within the original V or I modalities alone. Towards this goal, a non-linear generative module is trained with an adversarial objective, translating V images into intermediate spaces with a smaller domain shift w.r.t. the I domain. Meanwhile, the embedding module within AGPI^2 aims to produce similar features for both V and generated images, encouraging the extraction of features that are common to all modalities. In addition to these contributions, AGPI^2 employs adversarial objectives for adapting the intermediate images, which play a crucial role in creating a non-modality-specific space to address the large domain shifts between V and I domains. Experimental results conducted on challenging V-I ReID datasets indicate that AGPI^2 increases matching accuracy without extra computational resources during inference.

Via

Access Paper or Ask Questions

Distance Functions and Normalization Under Stream Scenarios

Jul 04, 2023

Eduardo V. L. Barboza, Paulo R. Lisboa de Almeida, Alceu de Souza Britto Jr, Rafael M. O. Cruz

Figure 1 for Distance Functions and Normalization Under Stream Scenarios

Figure 2 for Distance Functions and Normalization Under Stream Scenarios

Figure 3 for Distance Functions and Normalization Under Stream Scenarios

Figure 4 for Distance Functions and Normalization Under Stream Scenarios

Abstract:Data normalization is an essential task when modeling a classification system. When dealing with data streams, data normalization becomes especially challenging since we may not know in advance the properties of the features, such as their minimum/maximum values, and these properties may change over time. We compare the accuracies generated by eight well-known distance functions in data streams without normalization, normalized considering the statistics of the first batch of data received, and considering the previous batch received. We argue that experimental protocols for streams that consider the full stream as normalized are unrealistic and can lead to biased and poor results. Our results indicate that using the original data stream without applying normalization, and the Canberra distance, can be a good combination when no information about the data stream is known beforehand.

* Paper accepted to the 2023 International Joint Conference on Neural Networks

Via

Access Paper or Ask Questions