Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Nicolas Pugeault

Generative Fields: Uncovering Hierarchical Feature Control for StyleGAN via Inverted Receptive Fields

Apr 24, 2025

Zhuo He, Paul Henderson, Nicolas Pugeault

Abstract:StyleGAN has demonstrated the ability of GANs to synthesize highly-realistic faces of imaginary people from random noise. One limitation of GAN-based image generation is the difficulty of controlling the features of the generated image, due to the strong entanglement of the low-dimensional latent space. Previous work that aimed to control StyleGAN with image or text prompts modulated sampling in W latent space, which is more expressive than Z latent space. However, W space still has restricted expressivity since it does not control the feature synthesis directly; also the feature embedding in W space requires a pre-training process to reconstruct the style signal, limiting its application. This paper introduces the concept of "generative fields" to explain the hierarchical feature synthesis in StyleGAN, inspired by the receptive fields of convolution neural networks (CNNs). Additionally, we propose a new image editing pipeline for StyleGAN using generative field theory and the channel-wise style latent space S, utilizing the intrinsic structural feature of CNNs to achieve disentangled control of feature synthesis at synthesis time.

Via

Access Paper or Ask Questions

Beyond Reconstruction: A Physics Based Neural Deferred Shader for Photo-realistic Rendering

Apr 16, 2025

Zhuo He, Paul Henderson, Nicolas Pugeault

Abstract:Deep learning based rendering has demonstrated major improvements for photo-realistic image synthesis, applicable to various applications including visual effects in movies and photo-realistic scene building in video games. However, a significant limitation is the difficulty of decomposing the illumination and material parameters, which limits such methods to reconstruct an input scene, without any possibility to control these parameters. This paper introduces a novel physics based neural deferred shading pipeline to decompose the data-driven rendering process, learn a generalizable shading function to produce photo-realistic results for shading and relighting tasks, we also provide a shadow estimator to efficiently mimic shadowing effect. Our model achieves improved performance compared to classical models and a state-of-art neural shading model, and enables generalizable photo-realistic shading from arbitrary illumination input.

Via

Access Paper or Ask Questions

On the Benefits of Instance Decomposition in Video Prediction Models

Jan 17, 2025

Eliyas Suleyman, Paul Henderson, Nicolas Pugeault

Abstract:Video prediction is a crucial task for intelligent agents such as robots and autonomous vehicles, since it enables them to anticipate and act early on time-critical incidents. State-of-the-art video prediction methods typically model the dynamics of a scene jointly and implicitly, without any explicit decomposition into separate objects. This is challenging and potentially sub-optimal, as every object in a dynamic scene has their own pattern of movement, typically somewhat independent of others. In this paper, we investigate the benefit of explicitly modeling the objects in a dynamic scene separately within the context of latent-transformer video prediction models. We conduct detailed and carefully-controlled experiments on both synthetic and real-world datasets; our results show that decomposing a dynamic scene leads to higher quality predictions compared with models of a similar capacity that lack such decomposition.

Via

Access Paper or Ask Questions

Robust Federated Learning in the Face of Covariate Shift: A Magnitude Pruning with Hybrid Regularization Framework for Enhanced Model Aggregation

Dec 19, 2024

Ozgu Goksu, Nicolas Pugeault

Figure 1 for Robust Federated Learning in the Face of Covariate Shift: A Magnitude Pruning with Hybrid Regularization Framework for Enhanced Model Aggregation

Figure 2 for Robust Federated Learning in the Face of Covariate Shift: A Magnitude Pruning with Hybrid Regularization Framework for Enhanced Model Aggregation

Figure 3 for Robust Federated Learning in the Face of Covariate Shift: A Magnitude Pruning with Hybrid Regularization Framework for Enhanced Model Aggregation

Figure 4 for Robust Federated Learning in the Face of Covariate Shift: A Magnitude Pruning with Hybrid Regularization Framework for Enhanced Model Aggregation

Abstract:The development of highly sophisticated neural networks has allowed for fast progress in every field of computer vision, however, applications where annotated data is prohibited due to privacy or security concerns remain challenging. Federated Learning (FL) offers a promising framework for individuals aiming to collaboratively develop a shared model while preserving data privacy. Nevertheless, our findings reveal that variations in data distribution among clients can profoundly affect FL methodologies, primarily due to instabilities in the aggregation process. We also propose a novel FL framework to mitigate the adverse effects of covariate shifts among federated clients by combining individual parameter pruning and regularization techniques to improve the robustness of individual clients' models to aggregate. Each client's model is optimized through magnitude-based pruning and the addition of dropout and noise injection layers to build more resilient decision pathways in the networks and improve the robustness of the model's parameter aggregation step. The proposed framework is capable of extracting robust representations even in the presence of very large covariate shifts among client data distributions and in the federation of a small number of clients. Empirical findings substantiate the effectiveness of our proposed methodology across common benchmark datasets, including CIFAR10, MNIST, SVHN, and Fashion MNIST. Furthermore, we introduce the CelebA-Gender dataset, specifically designed to evaluate performance on a more realistic domain. The proposed method is capable of extracting robust representations even in the presence of both high and low covariate shifts among client data distributions.

Via

Access Paper or Ask Questions

Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach

Oct 14, 2024

Rory Young, Nicolas Pugeault

Abstract:Deep reinforcement learning agents achieve state-of-the-art performance in a wide range of simulated control tasks. However, successful applications to real-world problems remain limited. One reason for this dichotomy is because the learned policies are not robust to observation noise or adversarial attacks. In this paper, we investigate the robustness of deep RL policies to a single small state perturbation in deterministic continuous control tasks. We demonstrate that RL policies can be deterministically chaotic as small perturbations to the system state have a large impact on subsequent state and reward trajectories. This unstable non-linear behaviour has two consequences: First, inaccuracies in sensor readings, or adversarial attacks, can cause significant performance degradation; Second, even policies that show robust performance in terms of rewards may have unpredictable behaviour in practice. These two facets of chaos in RL policies drastically restrict the application of deep RL to real-world problems. To address this issue, we propose an improvement on the successful Dreamer V3 architecture, implementing a Maximal Lyapunov Exponent regularisation. This new approach reduces the chaotic state dynamics, rendering the learnt policies more resilient to sensor noise or adversarial attacks and thereby improving the suitability of Deep Reinforcement Learning for real-world applications.

Via

Access Paper or Ask Questions

Detail-Enhanced Intra- and Inter-modal Interaction for Audio-Visual Emotion Recognition

May 26, 2024

Tong Shi, Xuri Ge, Joemon M. Jose, Nicolas Pugeault, Paul Henderson

Figure 1 for Detail-Enhanced Intra- and Inter-modal Interaction for Audio-Visual Emotion Recognition

Figure 2 for Detail-Enhanced Intra- and Inter-modal Interaction for Audio-Visual Emotion Recognition

Figure 3 for Detail-Enhanced Intra- and Inter-modal Interaction for Audio-Visual Emotion Recognition

Figure 4 for Detail-Enhanced Intra- and Inter-modal Interaction for Audio-Visual Emotion Recognition

Abstract:Capturing complex temporal relationships between video and audio modalities is vital for Audio-Visual Emotion Recognition (AVER). However, existing methods lack attention to local details, such as facial state changes between video frames, which can reduce the discriminability of features and thus lower recognition accuracy. In this paper, we propose a Detail-Enhanced Intra- and Inter-modal Interaction network(DE-III) for AVER, incorporating several novel aspects. We introduce optical flow information to enrich video representations with texture details that better capture facial state changes. A fusion module integrates the optical flow estimation with the corresponding video frames to enhance the representation of facial texture variations. We also design attentive intra- and inter-modal feature enhancement modules to further improve the richness and discriminability of video and audio representations. A detailed quantitative evaluation shows that our proposed model outperforms all existing methods on three benchmark datasets for both concrete and continuous emotion recognition. To encourage further research and ensure replicability, we will release our full code upon acceptance.

* Submitted to 27th International Conference of Pattern Recognition (ICPR 2024)

Via

Access Paper or Ask Questions

The Bad Batches: Enhancing Self-Supervised Learning in Image Classification Through Representative Batch Curation

Mar 28, 2024

Ozgu Goksu, Nicolas Pugeault

Abstract:The pursuit of learning robust representations without human supervision is a longstanding challenge. The recent advancements in self-supervised contrastive learning approaches have demonstrated high performance across various representation learning challenges. However, current methods depend on the random transformation of training examples, resulting in some cases of unrepresentative positive pairs that can have a large impact on learning. This limitation not only impedes the convergence of the learning process but the robustness of the learnt representation as well as requiring larger batch sizes to improve robustness to such bad batches. This paper attempts to alleviate the influence of false positive and false negative pairs by employing pairwise similarity calculations through the Fr\'echet ResNet Distance (FRD), thereby obtaining robust representations from unlabelled data. The effectiveness of the proposed method is substantiated by empirical results, where a linear classifier trained on self-supervised contrastive representations achieved an impressive 87.74\% top-1 accuracy on STL10 and 99.31\% on the Flower102 dataset. These results emphasize the potential of the proposed approach in pushing the boundaries of the state-of-the-art in self-supervised contrastive learning, particularly for image classification tasks.

* 8 Pages, 4 figures, IEEE WCCI 2024 Conference

Via

Access Paper or Ask Questions

The role of noise in denoising models for anomaly detection in medical images

Jan 19, 2023

Antanas Kascenas, Pedro Sanchez, Patrick Schrempf, Chaoyang Wang, William Clackett, Shadia S. Mikhael, Jeremy P. Voisey, Keith Goatman, Alexander Weir, Nicolas Pugeault(+2 more)

Figure 1 for The role of noise in denoising models for anomaly detection in medical images

Figure 2 for The role of noise in denoising models for anomaly detection in medical images

Figure 3 for The role of noise in denoising models for anomaly detection in medical images

Figure 4 for The role of noise in denoising models for anomaly detection in medical images

Abstract:Pathological brain lesions exhibit diverse appearance in brain images, in terms of intensity, texture, shape, size, and location. Comprehensive sets of data and annotations are difficult to acquire. Therefore, unsupervised anomaly detection approaches have been proposed using only normal data for training, with the aim of detecting outlier anomalous voxels at test time. Denoising methods, for instance classical denoising autoencoders (DAEs) and more recently emerging diffusion models, are a promising approach, however naive application of pixelwise noise leads to poor anomaly detection performance. We show that optimization of the spatial resolution and magnitude of the noise improves the performance of different model training regimes, with similar noise parameter adjustments giving good performance for both DAEs and diffusion models. Visual inspection of the reconstructions suggests that the training noise influences the trade-off between the extent of the detail that is reconstructed and the extent of erasure of anomalies, both of which contribute to better anomaly detection performance. We validate our findings on two real-world datasets (tumor detection in brain MRI and hemorrhage/ischemia/tumor detection in brain CT), showing good detection on diverse anomaly appearances. Overall, we find that a DAE trained with coarse noise is a fast and simple method that gives state-of-the-art accuracy. Diffusion models applied to anomaly detection are as yet in their infancy and provide a promising avenue for further research.

* Submitted to Medical Image Analysis special issue for MIDL 2022

Via

Access Paper or Ask Questions

A deep mixture density network for outlier-corrected interpolation of crowd-sourced weather data

Jan 25, 2022

Charlie Kirkwood, Theo Economou, Henry Odbert, Nicolas Pugeault

Figure 1 for A deep mixture density network for outlier-corrected interpolation of crowd-sourced weather data

Figure 2 for A deep mixture density network for outlier-corrected interpolation of crowd-sourced weather data

Figure 3 for A deep mixture density network for outlier-corrected interpolation of crowd-sourced weather data

Figure 4 for A deep mixture density network for outlier-corrected interpolation of crowd-sourced weather data

Abstract:As the costs of sensors and associated IT infrastructure decreases - as exemplified by the Internet of Things - increasing volumes of observational data are becoming available for use by environmental scientists. However, as the number of available observation sites increases, so too does the opportunity for data quality issues to emerge, particularly given that many of these sensors do not have the benefit of official maintenance teams. To realise the value of crowd sourced 'Internet of Things' type observations for environmental modelling, we require approaches that can automate the detection of outliers during the data modelling process so that they do not contaminate the true distribution of the phenomena of interest. To this end, here we present a Bayesian deep learning approach for spatio-temporal modelling of environmental variables with automatic outlier detection. Our approach implements a Gaussian-uniform mixture density network whose dual purposes - modelling the phenomenon of interest, and learning to classify and ignore outliers - are achieved simultaneously, each by specifically designed branches of our neural network. For our example application, we use the Met Office's Weather Observation Website data, an archive of observations from around 1900 privately run and unofficial weather stations across the British Isles. Using data on surface air temperature, we demonstrate how our deep mixture model approach enables the modelling of a highly skilled spatio-temporal temperature distribution without contamination from spurious observations. We hope that adoption of our approach will help unlock the potential of incorporating a wider range of observation sources, including from crowd sourcing, into future environmental models.

* 20 pages, 12 figures, not yet submitted

Via

Access Paper or Ask Questions

Transformer-Encoder Detector Module: Using Context to Improve Robustness to Adversarial Attacks on Object Detection

Nov 13, 2020

Faisal Alamri, Sinan Kalkan, Nicolas Pugeault

Figure 1 for Transformer-Encoder Detector Module: Using Context to Improve Robustness to Adversarial Attacks on Object Detection

Figure 2 for Transformer-Encoder Detector Module: Using Context to Improve Robustness to Adversarial Attacks on Object Detection

Figure 3 for Transformer-Encoder Detector Module: Using Context to Improve Robustness to Adversarial Attacks on Object Detection

Figure 4 for Transformer-Encoder Detector Module: Using Context to Improve Robustness to Adversarial Attacks on Object Detection

Abstract:Deep neural network approaches have demonstrated high performance in object recognition (CNN) and detection (Faster-RCNN) tasks, but experiments have shown that such architectures are vulnerable to adversarial attacks (FFF, UAP): low amplitude perturbations, barely perceptible by the human eye, can lead to a drastic reduction in labeling performance. This article proposes a new context module, called \textit{Transformer-Encoder Detector Module}, that can be applied to an object detector to (i) improve the labeling of object instances; and (ii) improve the detector's robustness to adversarial attacks. The proposed model achieves higher mAP, F1 scores and AUC average score of up to 13\% compared to the baseline Faster-RCNN detector, and an mAP score 8 points higher on images subjected to FFF or UAP attacks due to the inclusion of both contextual and visual features extracted from scene and encoded into the model. The result demonstrates that a simple ad-hoc context module can improve the reliability of object detectors significantly.

* Accepted for the 25th International Conference on Pattern Recognition (ICPR'2020)

Via

Access Paper or Ask Questions