Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Christoph Reich

Scene-Centric Unsupervised Panoptic Segmentation

Apr 02, 2025

Oliver Hahn, Christoph Reich, Nikita Araslanov, Daniel Cremers, Christian Rupprecht, Stefan Roth

Abstract:Unsupervised panoptic segmentation aims to partition an image into semantically meaningful regions and distinct object instances without training on manually annotated data. In contrast to prior work on unsupervised panoptic scene understanding, we eliminate the need for object-centric training data, enabling the unsupervised understanding of complex scenes. To that end, we present the first unsupervised panoptic method that directly trains on scene-centric imagery. In particular, we propose an approach to obtain high-resolution panoptic pseudo labels on complex scene-centric data, combining visual representations, depth, and motion cues. Utilizing both pseudo-label training and a panoptic self-training strategy yields a novel approach that accurately predicts panoptic segmentation of complex scenes without requiring any human annotations. Our approach significantly improves panoptic quality, e.g., surpassing the recent state of the art in unsupervised panoptic segmentation on Cityscapes by 9.4% points in PQ.

* To appear at CVPR 2025. Christoph Reich and Oliver Hahn - both authors contributed equally. Code: https://github.com/visinf/cups Project page: https://visinf.github.io/cups/

Via

Access Paper or Ask Questions

Arbitrary Data as Images: Fusion of Patient Data Across Modalities and Irregular Intervals with Vision Transformers

Jan 30, 2025

Malte Tölle, Mohamad Scharaf, Samantha Fischer, Christoph Reich, Silav Zeid, Christoph Dieterich, Benjamin Meder, Norbert Frey, Philipp Wild, Sandy Engelhardt

Figure 1 for Arbitrary Data as Images: Fusion of Patient Data Across Modalities and Irregular Intervals with Vision Transformers

Figure 2 for Arbitrary Data as Images: Fusion of Patient Data Across Modalities and Irregular Intervals with Vision Transformers

Figure 3 for Arbitrary Data as Images: Fusion of Patient Data Across Modalities and Irregular Intervals with Vision Transformers

Figure 4 for Arbitrary Data as Images: Fusion of Patient Data Across Modalities and Irregular Intervals with Vision Transformers

Abstract:A patient undergoes multiple examinations in each hospital stay, where each provides different facets of the health status. These assessments include temporal data with varying sampling rates, discrete single-point measurements, therapeutic interventions such as medication administration, and images. While physicians are able to process and integrate diverse modalities intuitively, neural networks need specific modeling for each modality complicating the training procedure. We demonstrate that this complexity can be significantly reduced by visualizing all information as images along with unstructured text and subsequently training a conventional vision-text transformer. Our approach, Vision Transformer for irregular sampled Multi-modal Measurements (ViTiMM), not only simplifies data preprocessing and modeling but also outperforms current state-of-the-art methods in predicting in-hospital mortality and phenotyping, as evaluated on 6,175 patients from the MIMIC-IV dataset. The modalities include patient's clinical measurements, medications, X-ray images, and electrocardiography scans. We hope our work inspires advancements in multi-modal medical AI by reducing the training complexity to (visual) prompt engineering, thus lowering entry barriers and enabling no-code solutions for training. The source code will be made publicly available.

Via

Access Paper or Ask Questions

A Perspective on Deep Vision Performance with Standard Image and Video Codecs

Apr 18, 2024

Christoph Reich, Oliver Hahn, Daniel Cremers, Stefan Roth, Biplob Debnath

Figure 1 for A Perspective on Deep Vision Performance with Standard Image and Video Codecs

Figure 2 for A Perspective on Deep Vision Performance with Standard Image and Video Codecs

Figure 3 for A Perspective on Deep Vision Performance with Standard Image and Video Codecs

Figure 4 for A Perspective on Deep Vision Performance with Standard Image and Video Codecs

Abstract:Resource-constrained hardware, such as edge devices or cell phones, often rely on cloud servers to provide the required computational resources for inference in deep vision models. However, transferring image and video data from an edge or mobile device to a cloud server requires coding to deal with network constraints. The use of standardized codecs, such as JPEG or H.264, is prevalent and required to ensure interoperability. This paper aims to examine the implications of employing standardized codecs within deep vision pipelines. We find that using JPEG and H.264 coding significantly deteriorates the accuracy across a broad range of vision tasks and models. For instance, strong compression rates reduce semantic segmentation accuracy by more than 80% in mIoU. In contrast to previous findings, our analysis extends beyond image and action classification to localization and dense prediction tasks, thus providing a more comprehensive perspective.

* Accepted at CVPR 2024 Workshop on AI for Streaming (AIS)

Via

Access Paper or Ask Questions

Deep Video Codec Control

Sep 16, 2023

Christoph Reich, Biplob Debnath, Deep Patel, Tim Prangemeier, Srimat Chakradhar

Abstract:Lossy video compression is commonly used when transmitting and storing video data. Unified video codecs (e.g., H.264 or H.265) remain the de facto standard, despite the availability of advanced (neural) compression approaches. Transmitting videos in the face of dynamic network bandwidth conditions requires video codecs to adapt to vastly different compression strengths. Rate control modules augment the codec's compression such that bandwidth constraints are satisfied and video distortion is minimized. While, both standard video codes and their rate control modules are developed to minimize video distortion w.r.t. human quality assessment, preserving the downstream performance of deep vision models is not considered. In this paper, we present the first end-to-end learnable deep video codec control considering both bandwidth constraints and downstream vision performance, while not breaking existing standardization. We demonstrate for two common vision tasks (semantic segmentation and optical flow estimation) and on two different datasets that our deep codec control better preserves downstream performance than using 2-pass average bit rate control while meeting dynamic bandwidth constraints and adhering to standardizations.

* 22 pages, 26 figures, 6 tables

Via

Access Paper or Ask Questions

Differentiable JPEG: The Devil is in the Details

Sep 13, 2023

Christoph Reich, Biplob Debnath, Deep Patel, Srimat Chakradhar

Figure 1 for Differentiable JPEG: The Devil is in the Details

Figure 2 for Differentiable JPEG: The Devil is in the Details

Figure 3 for Differentiable JPEG: The Devil is in the Details

Figure 4 for Differentiable JPEG: The Devil is in the Details

Abstract:JPEG remains one of the most widespread lossy image coding methods. However, the non-differentiable nature of JPEG restricts the application in deep learning pipelines. Several differentiable approximations of JPEG have recently been proposed to address this issue. This paper conducts a comprehensive review of existing diff. JPEG approaches and identifies critical details that have been missed by previous methods. To this end, we propose a novel diff. JPEG approach, overcoming previous limitations. Our approach is differentiable w.r.t. the input image, the JPEG quality, the quantization tables, and the color conversion parameters. We evaluate the forward and backward performance of our diff. JPEG approach against existing methods. Additionally, extensive ablations are performed to evaluate crucial design choices. Our proposed diff. JPEG resembles the (non-diff.) reference implementation best, significantly surpassing the recent-best diff. approach by $3.47$dB (PSNR) on average. For strong compression rates, we can even improve PSNR by $9.51$dB. Strong adversarial attack results are yielded by our diff. JPEG, demonstrating the effective gradient approximation. Our code is available at https://github.com/necla-ml/Diff-JPEG.

* Accepted at WACV 2024. Project page: https://christophreich1996.github.io/differentiable_jpeg/

Via

Access Paper or Ask Questions

The TYC Dataset for Understanding Instance-Level Semantics and Motions of Cells in Microstructures

Aug 23, 2023

Christoph Reich, Tim Prangemeier, Heinz Koeppl

Abstract:Segmenting cells and tracking their motion over time is a common task in biomedical applications. However, predicting accurate instance-wise segmentation and cell motions from microscopy imagery remains a challenging task. Using microstructured environments for analyzing single cells in a constant flow of media adds additional complexity. While large-scale labeled microscopy datasets are available, we are not aware of any large-scale dataset, including both cells and microstructures. In this paper, we introduce the trapped yeast cell (TYC) dataset, a novel dataset for understanding instance-level semantics and motions of cells in microstructures. We release $105$ dense annotated high-resolution brightfield microscopy images, including about $19$k instance masks. We also release $261$ curated video clips composed of $1293$ high-resolution microscopy images to facilitate unsupervised understanding of cell motions and morphology. TYC offers ten times more instance annotations than the previously largest dataset, including cells and microstructures. Our effort also exceeds previous attempts in terms of microstructure variability, resolution, complexity, and capturing device (microscopy) variability. We facilitate a unified comparison on our novel dataset by introducing a standardized evaluation strategy. TYC and evaluation code are publicly available under CC BY 4.0 license.

* Accepted at ICCV 2023 Workshop on BioImage Computing. Project page (with links to the dataset and code): https://christophreich1996.github.io/tyc_dataset/

Via

Access Paper or Ask Questions

An Instance Segmentation Dataset of Yeast Cells in Microstructures

Apr 23, 2023

Christoph Reich, Tim Prangemeier, André O. Françani, Heinz Koeppl

Abstract:Extracting single-cell information from microscopy data requires accurate instance-wise segmentations. Obtaining pixel-wise segmentations from microscopy imagery remains a challenging task, especially with the added complexity of microstructured environments. This paper presents a novel dataset for segmenting yeast cells in microstructures. We offer pixel-wise instance segmentation labels for both cells and trap microstructures. In total, we release 493 densely annotated microscopy images. To facilitate a unified comparison between novel segmentation algorithms, we propose a standardized evaluation strategy for our dataset. The aim of the dataset and evaluation strategy is to facilitate the development of new cell segmentation approaches. The dataset is publicly available at https://christophreich1996.github.io/yeast_in_microstructures_dataset/ .

* IEEE EMBC 2023 (accepted), Christoph Reich and Tim Prangemeier --- both authors contributed equally

Via

Access Paper or Ask Questions

Histopathological Image Classification based on Self-Supervised Vision Transformer and Weak Labels

Oct 17, 2022

Ahmet Gokberk Gul, Oezdemir Cetin, Christoph Reich, Tim Prangemeier, Nadine Flinner, Heinz Koeppl

Abstract:Whole Slide Image (WSI) analysis is a powerful method to facilitate the diagnosis of cancer in tissue samples. Automating this diagnosis poses various issues, most notably caused by the immense image resolution and limited annotations. WSIs commonly exhibit resolutions of 100Kx100K pixels. Annotating cancerous areas in WSIs on the pixel level is prohibitively labor-intensive and requires a high level of expert knowledge. Multiple instance learning (MIL) alleviates the need for expensive pixel-level annotations. In MIL, learning is performed on slide-level labels, in which a pathologist provides information about whether a slide includes cancerous tissue. Here, we propose Self-ViT-MIL, a novel approach for classifying and localizing cancerous areas based on slide-level annotations, eliminating the need for pixel-wise annotated training data. Self-ViT- MIL is pre-trained in a self-supervised setting to learn rich feature representation without relying on any labels. The recent Vision Transformer (ViT) architecture builds the feature extractor of Self-ViT-MIL. For localizing cancerous regions, a MIL aggregator with global attention is utilized. To the best of our knowledge, Self-ViT- MIL is the first approach to introduce self-supervised ViTs in MIL-based WSI analysis tasks. We showcase the effectiveness of our approach on the common Camelyon16 dataset. Self-ViT-MIL surpasses existing state-of-the-art MIL-based approaches in terms of accuracy and area under the curve (AUC).

* Proc. SPIE 12039, Medical Imaging 2022: Digital and Computational Pathology, 120391O (4 April 2022)

Via

Access Paper or Ask Questions

Scalable 3D Semantic Segmentation for Gun Detection in CT Scans

Dec 07, 2021

Marius Memmel, Christoph Reich, Nicolas Wagner, Faraz Saeedan

Figure 1 for Scalable 3D Semantic Segmentation for Gun Detection in CT Scans

Figure 2 for Scalable 3D Semantic Segmentation for Gun Detection in CT Scans

Figure 3 for Scalable 3D Semantic Segmentation for Gun Detection in CT Scans

Figure 4 for Scalable 3D Semantic Segmentation for Gun Detection in CT Scans

Abstract:With the increased availability of 3D data, the need for solutions processing those also increased rapidly. However, adding dimension to already reliably accurate 2D approaches leads to immense memory consumption and higher computational complexity. These issues cause current hardware to reach its limitations, with most methods forced to reduce the input resolution drastically. Our main contribution is a novel deep 3D semantic segmentation method for gun detection in baggage CT scans that enables fast training and low video memory consumption for high-resolution voxelized volumes. We introduce a moving pyramid approach that utilizes multiple forward passes at inference time for segmenting an instance.

* This work was part of the Project Lab Deep Learning in Computer Vision Winter Semester 2019/2020 at TU Darmstadt

Via

Access Paper or Ask Questions

OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data

Oct 20, 2021

Christoph Reich, Tim Prangemeier, Özdemir Cetin, Heinz Koeppl

Figure 1 for OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data

Figure 2 for OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data

Figure 3 for OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data

Figure 4 for OSS-Net: Memory Efficient High Resolution Semantic Segmentation of 3D Medical Data

Abstract:Convolutional neural networks (CNNs) are the current state-of-the-art meta-algorithm for volumetric segmentation of medical data, for example, to localize COVID-19 infected tissue on computer tomography scans or the detection of tumour volumes in magnetic resonance imaging. A key limitation of 3D CNNs on voxelised data is that the memory consumption grows cubically with the training data resolution. Occupancy networks (O-Nets) are an alternative for which the data is represented continuously in a function space and 3D shapes are learned as a continuous decision boundary. While O-Nets are significantly more memory efficient than 3D CNNs, they are limited to simple shapes, are relatively slow at inference, and have not yet been adapted for 3D semantic segmentation of medical data. Here, we propose Occupancy Networks for Semantic Segmentation (OSS-Nets) to accurately and memory-efficiently segment 3D medical data. We build upon the original O-Net with modifications for increased expressiveness leading to improved segmentation performance comparable to 3D CNNs, as well as modifications for faster inference. We leverage local observations to represent complex shapes and prior encoder predictions to expedite inference. We showcase OSS-Net's performance on 3D brain tumour and liver segmentation against a function space baseline (O-Net), a performance baseline (3D residual U-Net), and an efficiency baseline (2D residual U-Net). OSS-Net yields segmentation results similar to the performance baseline and superior to the function space and efficiency baselines. In terms of memory efficiency, OSS-Net consumes comparable amounts of memory as the function space baseline, somewhat more memory than the efficiency baseline and significantly less than the performance baseline. As such, OSS-Net enables memory-efficient and accurate 3D semantic segmentation that can scale to high resolutions.

* BMVC 2021 (accepted), https://github.com/ChristophReich1996/OSS-Net (code)

Via

Access Paper or Ask Questions