Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hala Djeghim

LIPADE

SAIL: Self-supervised Albedo Estimation from Real Images with a Latent Diffusion Model

May 26, 2025

Hala Djeghim, Nathan Piasco, Luis Roldão, Moussab Bennehar, Dzmitry Tsishkou, Céline Loscos, Désiré Sidibé

Abstract:Intrinsic image decomposition aims at separating an image into its underlying albedo and shading components, isolating the base color from lighting effects to enable downstream applications such as virtual relighting and scene editing. Despite the rise and success of learning-based approaches, intrinsic image decomposition from real-world images remains a significant challenging task due to the scarcity of labeled ground-truth data. Most existing solutions rely on synthetic data as supervised setups, limiting their ability to generalize to real-world scenes. Self-supervised methods, on the other hand, often produce albedo maps that contain reflections and lack consistency under different lighting conditions. To address this, we propose SAIL, an approach designed to estimate albedo-like representations from single-view real-world images. We repurpose the prior knowledge of a latent diffusion model for unconditioned scene relighting as a surrogate objective for albedo estimation. To extract the albedo, we introduce a novel intrinsic image decomposition fully formulated in the latent space. To guide the training of our latent diffusion model, we introduce regularization terms that constrain both the lighting-dependent and independent components of our latent image decomposition. SAIL predicts stable albedo under varying lighting conditions and generalizes to multiple scenes, using only unlabeled multi-illumination data available online.

Via

Access Paper or Ask Questions

CoStruction: Conjoint radiance field optimization for urban scene reconStruction with limited image overlap

Jan 07, 2025

Fusang Wang, Hala Djeghim, Nathan Piasco, Moussab Bennehar, Luis Roldão, Dzmitry Tsishkou

Figure 1 for CoStruction: Conjoint radiance field optimization for urban scene reconStruction with limited image overlap

Figure 2 for CoStruction: Conjoint radiance field optimization for urban scene reconStruction with limited image overlap

Figure 3 for CoStruction: Conjoint radiance field optimization for urban scene reconStruction with limited image overlap

Figure 4 for CoStruction: Conjoint radiance field optimization for urban scene reconStruction with limited image overlap

Abstract:Reconstructing the surrounding surface geometry from recorded driving sequences poses a significant challenge due to the limited image overlap and complex topology of urban environments. SoTA neural implicit surface reconstruction methods often struggle in such setting, either failing due to small vision overlap or exhibiting suboptimal performance in accurately reconstructing both the surface and fine structures. To address these limitations, we introduce CoStruction, a novel hybrid implicit surface reconstruction method tailored for large driving sequences with limited camera overlap. CoStruction leverages cross-representation uncertainty estimation to filter out ambiguous geometry caused by limited observations. Our method performs joint optimization of both radiance fields in addition to guided sampling achieving accurate reconstruction of large areas along with fine structures in complex urban scenarios. Extensive evaluation on major driving datasets demonstrates the superiority of our approach in reconstructing large driving sequences with limited image overlap, outperforming concurrent SoTA methods.

Via

Access Paper or Ask Questions

SCILLA: SurfaCe Implicit Learning for Large Urban Area, a volumetric hybrid solution

Mar 15, 2024

Hala Djeghim, Nathan Piasco, Moussab Bennehar, Luis Roldão, Dzmitry Tsishkou, Désiré Sidibé

Figure 1 for SCILLA: SurfaCe Implicit Learning for Large Urban Area, a volumetric hybrid solution

Figure 2 for SCILLA: SurfaCe Implicit Learning for Large Urban Area, a volumetric hybrid solution

Figure 3 for SCILLA: SurfaCe Implicit Learning for Large Urban Area, a volumetric hybrid solution

Figure 4 for SCILLA: SurfaCe Implicit Learning for Large Urban Area, a volumetric hybrid solution

Abstract:Neural implicit surface representation methods have recently shown impressive 3D reconstruction results. However, existing solutions struggle to reconstruct urban outdoor scenes due to their large, unbounded, and highly detailed nature. Hence, to achieve accurate reconstructions, additional supervision data such as LiDAR, strong geometric priors, and long training times are required. To tackle such issues, we present SCILLA, a new hybrid implicit surface learning method to reconstruct large driving scenes from 2D images. SCILLA's hybrid architecture models two separate implicit fields: one for the volumetric density and another for the signed distance to the surface. To accurately represent urban outdoor scenarios, we introduce a novel volume-rendering strategy that relies on self-supervised probabilistic density estimation to sample points near the surface and transition progressively from volumetric to surface representation. Our solution permits a proper and fast initialization of the signed distance field without relying on any geometric prior on the scene, compared to concurrent methods. By conducting extensive experiments on four outdoor driving datasets, we show that SCILLA can learn an accurate and detailed 3D surface scene representation in various urban scenarios while being two times faster to train compared to previous state-of-the-art solutions.

Via

Access Paper or Ask Questions

Learning an Adaptation Function to Assess Image Visual Similarities

Jun 03, 2022

Olivier Risser-Maroix, Amine Marzouki, Hala Djeghim, Camille Kurtz, Nicolas Lomenie

Figure 1 for Learning an Adaptation Function to Assess Image Visual Similarities

Figure 2 for Learning an Adaptation Function to Assess Image Visual Similarities

Figure 3 for Learning an Adaptation Function to Assess Image Visual Similarities

Figure 4 for Learning an Adaptation Function to Assess Image Visual Similarities

Abstract:Human perception is routinely assessing the similarity between images, both for decision making and creative thinking. But the underlying cognitive process is not really well understood yet, hence difficult to be mimicked by computer vision systems. State-of-the-art approaches using deep architectures are often based on the comparison of images described as feature vectors learned for image categorization task. As a consequence, such features are powerful to compare semantically related images but not really efficient to compare images visually similar but semantically unrelated. Inspired by previous works on neural features adaptation to psycho-cognitive representations, we focus here on the specific task of learning visual image similarities when analogy matters. We propose to compare different supervised, semi-supervised and self-supervised networks, pre-trained on distinct scales and contents datasets (such as ImageNet-21k, ImageNet-1K or VGGFace2) to conclude which model may be the best to approximate the visual cortex and learn only an adaptation function corresponding to the approximation of the the primate IT cortex through the metric learning framework. Our experiments conducted on the Totally Looks Like image dataset highlight the interest of our method, by increasing the retrieval scores of the best model @1 by 2.25x. This research work was recently accepted for publication at the ICIP 2021 international conference [1]. In this new article, we expand on this previous work by using and comparing new pre-trained feature extractors on other datasets.

* ORASIS 2021, Centre National de la Recherche Scientifique [CNRS], Sep 2021, Saint Ferr{\'e}ol, France

Via

Access Paper or Ask Questions