Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Stefan Roth

Anomaly Detection for Sensing Security

Jun 12, 2025

Stefan Roth, Aydin Sezgin

Abstract:Various approaches in the field of physical layer security involve anomaly detection, such as physical layer authentication, sensing attacks, and anti-tampering solutions. Depending on the context in which these approaches are applied, anomaly detection needs to be computationally lightweight, resilient to changes in temperature and environment, and robust against phase noise. We adapt moving average filters, autoregression filters and Kalman filters to provide predictions of feature vectors that fulfill the above criteria. Different hypothesis test designs are employed that allow omnidirectional and unidirectional outlier detection. In a case study, a sensing attack is investigated that employs the described algorithms with various channel features based on commodity WiFi devices. Thereby, various combinations of algorithms and channel features show effectiveness for motion detection by an attacker. Countermeasures only utilizing transmit power randomization are shown insufficient to mitigate such attacks if the attacker has access to channel state information (CSI) measurements, suggesting that mitigation solutions might require frequency-variant randomization.

Via

Access Paper or Ask Questions

Disentangling Polysemantic Channels in Convolutional Neural Networks

Apr 17, 2025

Robin Hesse, Jonas Fischer, Simone Schaub-Meyer, Stefan Roth

Abstract:Mechanistic interpretability is concerned with analyzing individual components in a (convolutional) neural network (CNN) and how they form larger circuits representing decision mechanisms. These investigations are challenging since CNNs frequently learn polysemantic channels that encode distinct concepts, making them hard to interpret. To address this, we propose an algorithm to disentangle a specific kind of polysemantic channel into multiple channels, each responding to a single concept. Our approach restructures weights in a CNN, utilizing that different concepts within the same channel exhibit distinct activation patterns in the previous layer. By disentangling these polysemantic features, we enhance the interpretability of CNNs, ultimately improving explanatory techniques such as feature visualizations.

* Accepted at CVPR 2025 Workshop on Mechanistic Interpretability for Vision (MIV). Code: https://github.com/visinf/disentangle-channels

Via

Access Paper or Ask Questions

Scene-Centric Unsupervised Panoptic Segmentation

Apr 02, 2025

Oliver Hahn, Christoph Reich, Nikita Araslanov, Daniel Cremers, Christian Rupprecht, Stefan Roth

Abstract:Unsupervised panoptic segmentation aims to partition an image into semantically meaningful regions and distinct object instances without training on manually annotated data. In contrast to prior work on unsupervised panoptic scene understanding, we eliminate the need for object-centric training data, enabling the unsupervised understanding of complex scenes. To that end, we present the first unsupervised panoptic method that directly trains on scene-centric imagery. In particular, we propose an approach to obtain high-resolution panoptic pseudo labels on complex scene-centric data, combining visual representations, depth, and motion cues. Utilizing both pseudo-label training and a panoptic self-training strategy yields a novel approach that accurately predicts panoptic segmentation of complex scenes without requiring any human annotations. Our approach significantly improves panoptic quality, e.g., surpassing the recent state of the art in unsupervised panoptic segmentation on Cityscapes by 9.4% points in PQ.

* To appear at CVPR 2025. Christoph Reich and Oliver Hahn - both authors contributed equally. Code: https://github.com/visinf/cups Project page: https://visinf.github.io/cups/

Via

Access Paper or Ask Questions

Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model

Mar 30, 2025

Jannik Endres, Oliver Hahn, Charles Corbière, Simone Schaub-Meyer, Stefan Roth, Alexandre Alahi

Abstract:Omnidirectional depth perception is essential for mobile robotics applications that require scene understanding across a full 360{\deg} field of view. Camera-based setups offer a cost-effective option by using stereo depth estimation to generate dense, high-resolution depth maps without relying on expensive active sensing. However, existing omnidirectional stereo matching approaches achieve only limited depth accuracy across diverse environments, depth ranges, and lighting conditions, due to the scarcity of real-world data. We present DFI-OmniStereo, a novel omnidirectional stereo matching method that leverages a large-scale pre-trained foundation model for relative monocular depth estimation within an iterative optimization-based stereo matching architecture. We introduce a dedicated two-stage training strategy to utilize the relative monocular depth features for our omnidirectional stereo matching before scale-invariant fine-tuning. DFI-OmniStereo achieves state-of-the-art results on the real-world Helvipad dataset, reducing disparity MAE by approximately 16% compared to the previous best omnidirectional stereo method.

* Project page: https://vita-epfl.github.io/DFI-OmniStereo-website/

Via

Access Paper or Ask Questions

Beyond Accuracy: What Matters in Designing Well-Behaved Models?

Mar 21, 2025

Robin Hesse, Doğukan Bağcı, Bernt Schiele, Simone Schaub-Meyer, Stefan Roth

Abstract:Deep learning has become an essential part of computer vision, with deep neural networks (DNNs) excelling in predictive performance. However, they often fall short in other critical quality dimensions, such as robustness, calibration, or fairness. While existing studies have focused on a subset of these quality dimensions, none have explored a more general form of "well-behavedness" of DNNs. With this work, we address this gap by simultaneously studying nine different quality dimensions for image classification. Through a large-scale study, we provide a bird's-eye view by analyzing 326 backbone models and how different training paradigms and model architectures affect the quality dimensions. We reveal various new insights such that (i) vision-language models exhibit high fairness on ImageNet-1k classification and strong robustness against domain changes; (ii) self-supervised learning is an effective training paradigm to improve almost all considered quality dimensions; and (iii) the training dataset size is a major driver for most of the quality dimensions. We conclude our study by introducing the QUBA score (Quality Understanding Beyond Accuracy), a novel metric that ranks models across multiple dimensions of quality, enabling tailored recommendations based on specific user needs.

* Code: https://github.com/visinf/beyond-accuracy

Via

Access Paper or Ask Questions

Covertness in the Near Field: Maximizing the Covert Region with FDA

Nov 22, 2024

Fatemeh Lotfi, Stefan Roth, Anas Chaaban, Aydin Sezgin

Abstract:Covert communication in wireless networks ensures that transmissions remain undetectable to adversaries, making it a potential enabler for privacy and security in sensitive applications. However, to meet the high performance and connectivity demands of sixth-generation (6G) networks, future wireless systems will require larger antenna arrays, higher operating frequencies, and advanced antenna architectures. This shift changes the propagation model from far-field planar-wave to near-field spherical-wave which necessitates a redesign of existing covert communication systems. Unlike far-field beamforming, which relies only on direction, near-field beamforming depends on both distance and direction, providing additional degrees of freedom for system design. In this paper, we aim to utilize those freedoms by proposing near-field Frequency Diverse Array (FDA)-based transmission strategies that manipulate the beampattern in both distance and angle, thereby establishing a non-covert region around the legitimate user. Our approach takes advantage of near-field properties and FDA technology to significantly reduce the area vulnerable to detection by adversaries while maintaining covert communication with the legitimate receiver. Numerical simulations show that our methods outperform conventional phased arrays by shrinking the non-covert region and allowing the covert region to expand as the number of antennas increases.

Via

Access Paper or Ask Questions

A Structural Analysis of the User Behavior Dynamics for Environmentally Sustainable ICT

Oct 14, 2024

Stefan Roth, Aydin Sezgin

Figure 1 for A Structural Analysis of the User Behavior Dynamics for Environmentally Sustainable ICT

Figure 2 for A Structural Analysis of the User Behavior Dynamics for Environmentally Sustainable ICT

Figure 3 for A Structural Analysis of the User Behavior Dynamics for Environmentally Sustainable ICT

Figure 4 for A Structural Analysis of the User Behavior Dynamics for Environmentally Sustainable ICT

Abstract:The sector of information and communication technology (ICT) can contribute to the fulfillment of the Paris agreement and the sustainable development goals (SDGs) through the introduction of sustainability strategies. For environmental sustainability, such strategies should contain efficiency, sufficiency, and consistency measures. To propose such, a structural analysis of ICT is undertaken in this manuscript. Thereby, key mechanisms and dynamics behind the usage of ICT and the corresponding energy and resource use are analyzed by describing ICT as a complex system. The system contains data centers, communication networks, smartphone hardware, apps, and the behavior of the users as sub-systems, between which various Morinian interactions are present. Energy and non-energy resources can be seen as inputs of the system, while e-waste is an output. Based on the system description, we propose multiple measures for efficiency, sufficiency and consistency to reduce greenhouse gas emissions and other environmental impacts.

Via

Access Paper or Ask Questions

DIAGen: Diverse Image Augmentation with Generative Models

Aug 26, 2024

Tobias Lingenberg, Markus Reuter, Gopika Sudhakaran, Dominik Gojny, Stefan Roth, Simone Schaub-Meyer

Figure 1 for DIAGen: Diverse Image Augmentation with Generative Models

Figure 2 for DIAGen: Diverse Image Augmentation with Generative Models

Figure 3 for DIAGen: Diverse Image Augmentation with Generative Models

Figure 4 for DIAGen: Diverse Image Augmentation with Generative Models

Abstract:Simple data augmentation techniques, such as rotations and flips, are widely used to enhance the generalization power of computer vision models. However, these techniques often fail to modify high-level semantic attributes of a class. To address this limitation, researchers have explored generative augmentation methods like the recently proposed DA-Fusion. Despite some progress, the variations are still largely limited to textural changes, thus falling short on aspects like varied viewpoints, environment, weather conditions, or even class-level semantic attributes (eg, variations in a dog's breed). To overcome this challenge, we propose DIAGen, building upon DA-Fusion. First, we apply Gaussian noise to the embeddings of an object learned with Textual Inversion to diversify generations using a pre-trained diffusion model's knowledge. Second, we exploit the general knowledge of a text-to-text generative model to guide the image generation of the diffusion model with varied class-specific prompts. Finally, we introduce a weighting mechanism to mitigate the impact of poorly generated samples. Experimental results across various datasets show that DIAGen not only enhances semantic diversity but also improves the performance of subsequent classifiers. The advantages of DIAGen over standard augmentations and the DA-Fusion baseline are particularly pronounced with out-of-distribution samples.

* Accepted for publication in GCPR 2024

Via

Access Paper or Ask Questions

Guided Latent Slot Diffusion for Object-Centric Learning

Jul 25, 2024

Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth

Abstract:Slot attention aims to decompose an input image into a set of meaningful object files (slots). These latent object representations enable various downstream tasks. Yet, these slots often bind to object parts, not objects themselves, especially for real-world datasets. To address this, we introduce Guided Latent Slot Diffusion - GLASS, an object-centric model that uses generated captions as a guiding signal to better align slots with objects. Our key insight is to learn the slot-attention module in the space of generated images. This allows us to repurpose the pre-trained diffusion decoder model, which reconstructs the images from the slots, as a semantic mask generator based on the generated captions. GLASS learns an object-level representation suitable for multiple tasks simultaneously, e.g., segmentation, image generation, and property prediction, outperforming previous methods. For object discovery, GLASS achieves approx. a +35% and +10% relative improvement for mIoU over the previous state-of-the-art (SOTA) method on the VOC and COCO datasets, respectively, and establishes a new SOTA FID score for conditional image generation amongst slot-attention-based methods. For the segmentation task, GLASS surpasses SOTA weakly-supervised and language-based segmentation models, which were specifically designed for the task.

* Project Page: https://guided-sa.github.io

Via

Access Paper or Ask Questions

Benchmarking the Attribution Quality of Vision Models

Jul 16, 2024

Robin Hesse, Simone Schaub-Meyer, Stefan Roth

Abstract:Attribution maps are one of the most established tools to explain the functioning of computer vision models. They assign importance scores to input features, indicating how relevant each feature is for the prediction of a deep neural network. While much research has gone into proposing new attribution methods, their proper evaluation remains a difficult challenge. In this work, we propose a novel evaluation protocol that overcomes two fundamental limitations of the widely used incremental-deletion protocol, i.e., the out-of-domain issue and lacking inter-model comparisons. This allows us to evaluate 23 attribution methods and how eight different design choices of popular vision models affect their attribution quality. We find that intrinsically explainable models outperform standard models and that raw attribution values exhibit a higher attribution quality than what is known from previous work. Further, we show consistent changes in the attribution quality when varying the network design, indicating that some standard design choices promote attribution quality.

Via

Access Paper or Ask Questions