Timo Milbich

Atlas: A Novel Pathology Foundation Model by Mayo Clinic, Charité, and Aignostics

Jan 10, 2025

Maximilian Alber, Stephan Tietz, Jonas Dippel, Timo Milbich, Timothée Lesort, Panos Korfiatis, Moritz Krügener, Beatriz Perez Cancer, Neelay Shah, Alexander Möllers(+17 more)

Abstract:Recent advances in digital pathology have demonstrated the effectiveness of foundation models across diverse applications. In this report, we present Atlas, a novel vision foundation model based on the RudolfV approach. Our model was trained on a dataset comprising 1.2 million histopathology whole slide images, collected from two medical institutions: Mayo Clinic and Charité - Universitätsmedizin Berlin. Comprehensive evaluations show that Atlas achieves state-of-the-art performance across twenty-one public benchmark datasets, even though it is neither the largest model by parameter count nor by training dataset size.

Characterizing Generalization under Out-Of-Distribution Shifts in Deep Metric Learning

Jul 20, 2021

Timo Milbich, Karsten Roth, Samarth Sinha, Ludwig Schmidt, Marzyeh Ghassemi, Björn Ommer

Abstract:Deep Metric Learning (DML) aims to find representations suitable for zero-shot transfer to a priori unknown test distributions. However, common evaluation protocols only test a single, fixed data split in which train and test classes are assigned randomly. More realistic evaluations should consider a broad spectrum of distribution shifts with potentially varying degree and difficulty. In this work, we systematically construct train-test splits of increasing difficulty and present the ooDML benchmark to characterize generalization under out-of-distribution shifts in DML. ooDML is designed to probe the generalization performance on much more challenging, diverse train-to-test distribution shifts. Based on our new benchmark, we conduct a thorough empirical analysis of state-of-the-art DML methods. We find that while generalization tends to consistently degrade with difficulty, some methods are better at retaining performance as the distribution shift increases. Finally, we propose few-shot DML as an efficient way to consistently improve generalization in response to unknown test shifts presented in ooDML. Code available here: https://github.com/Confusezius/Characterizing_Generalization_in_DeepMetricLearning.
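The conventional protocol that this abstract criticizes, a single split in which classes are assigned to train and test at random, can be sketched as follows. This is only a minimal illustration of that baseline and not the ooDML construction; the function name `class_disjoint_split` and its parameters are hypothetical.

```python
import random


def class_disjoint_split(labels, train_fraction=0.5, seed=0):
    """Randomly assign whole classes to train or test (hypothetical helper).

    Returns two lists of sample indices whose class sets do not overlap,
    which mimics the zero-shot setting used in DML evaluation.
    """
    classes = sorted(set(labels))
    rng = random.Random(seed)
    rng.shuffle(classes)  # random class assignment, as in the common protocol
    n_train = int(len(classes) * train_fraction)
    train_classes = set(classes[:n_train])
    train_idx = [i for i, y in enumerate(labels) if y in train_classes]
    test_idx = [i for i, y in enumerate(labels) if y not in train_classes]
    return train_idx, test_idx


# Toy labels, invented for illustration only.
labels = ["cat", "cat", "dog", "dog", "bird", "bird", "fish", "fish"]
train_idx, test_idx = class_disjoint_split(labels)
train_classes = {labels[i] for i in train_idx}
test_classes = {labels[i] for i in test_idx}
```

Because every class lands entirely on one side, the test set contains only classes never seen during training. A benchmark with controlled difficulty would replace the random shuffle with a deliberate choice of which classes go where.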

iPOKE: Poking a Still Image for Controlled Stochastic Video Synthesis

Jul 06, 2021

Andreas Blattmann, Timo Milbich, Michael Dorkenwald, Björn Ommer

Abstract:How would a static scene react to a local poke? What are the effects on other parts of an object if you could locally push it? There will be distinctive movement, despite evident variations caused by the stochastic nature of our world. These outcomes are governed by the characteristic kinematics of objects that dictate their overall motion caused by a local interaction. Conversely, the movement of an object provides crucial information about its underlying distinctive kinematics and the interdependencies between its parts. This two-way relation motivates learning a bijective mapping between object kinematics and plausible future image sequences. Therefore, we propose iPOKE - invertible Prediction of Object Kinematics - which, conditioned on an initial frame and a local poke, allows sampling object kinematics and establishes a one-to-one correspondence to the corresponding plausible videos, thereby providing controlled stochastic video synthesis. In contrast to previous works, we do not generate arbitrary realistic videos, but provide efficient control of movements, while still capturing the stochastic nature of our environment and the diversity of plausible outcomes it entails. Moreover, our approach can transfer kinematics onto novel object instances and is not confined to particular object classes. Project page is available at https://bit.ly/3dJN4Lf

Understanding Object Dynamics for Interactive Image-to-Video Synthesis

Jun 21, 2021

Andreas Blattmann, Timo Milbich, Michael Dorkenwald, Björn Ommer

Abstract:What would be the effect of locally poking a static scene? We present an approach that learns natural-looking global articulations caused by a local manipulation at the pixel level. Training requires only videos of moving objects but no information about the underlying manipulation of the physical scene. Our generative model learns to infer natural object dynamics as a response to user interaction and learns about the interrelations between different object body regions. Given a static image of an object and a local poke at a pixel, the approach then predicts how the object would deform over time. In contrast to existing work on video prediction, we do not synthesize arbitrary realistic videos but enable local interactive control of the deformation. Our model is not restricted to particular object categories and can transfer dynamics onto novel unseen object instances. Extensive experiments on diverse objects demonstrate the effectiveness of our approach compared to common video prediction frameworks. Project page is available at https://bit.ly/3cxfA2L

* CVPR 2021, project page available at https://bit.ly/3cxfA2L

Stochastic Image-to-Video Synthesis using cINNs

May 10, 2021

Michael Dorkenwald, Timo Milbich, Andreas Blattmann, Robin Rombach, Konstantinos G. Derpanis, Björn Ommer

Abstract:Video understanding calls for a model to learn the characteristic interplay between static scene content and its dynamics: Given an image, the model must be able to predict a future progression of the portrayed scene and, conversely, a video should be explained in terms of its static image content and all the remaining characteristics not present in the initial frame. This naturally suggests a bijective mapping between the video domain and the static content as well as residual information. In contrast to common stochastic image-to-video synthesis, such a model does not merely generate arbitrary videos progressing the initial image. Given this image, it rather provides a one-to-one mapping between the residual vectors and the video with stochastic outcomes when sampling. The approach is naturally implemented using a conditional invertible neural network (cINN) that can explain videos by independently modelling static and other video characteristics, thus laying the basis for controlled video synthesis. Experiments on four diverse video datasets demonstrate the effectiveness of our approach in terms of both the quality and diversity of the synthesized results. Our project page is available at https://bit.ly/3t66bnU.

* Accepted to CVPR 2021

Behavior-Driven Synthesis of Human Dynamics

Mar 08, 2021

Andreas Blattmann, Timo Milbich, Michael Dorkenwald, Björn Ommer

Abstract:Generating and representing human behavior are of major importance for various computer vision applications. Commonly, human video synthesis represents behavior as sequences of postures while directly predicting their likely progressions or merely changing the appearance of the depicted persons, thus not being able to exercise control over their actual behavior during the synthesis process. In contrast, controlled behavior synthesis and transfer across individuals requires a deep understanding of body dynamics and calls for a representation of behavior that is independent of appearance and also of specific postures. In this work, we present a model for human behavior synthesis which learns a dedicated representation of human dynamics independent of postures. Using this representation, we are able to change the behavior of a person depicted in an arbitrary posture, or to even directly transfer behavior observed in a given video sequence. To this end, we propose a conditional variational framework which explicitly disentangles posture from behavior. We demonstrate the effectiveness of our approach on this novel task, evaluating capturing, transferring, and sampling fine-grained, diverse behavior, both quantitatively and qualitatively. Project page is available at https://cutt.ly/5l7rXEp

* Accepted to CVPR 2021 as Poster

S2SD: Simultaneous Similarity-based Self-Distillation for Deep Metric Learning

Oct 01, 2020

Karsten Roth, Timo Milbich, Björn Ommer, Joseph Paul Cohen, Marzyeh Ghassemi

Abstract:Deep Metric Learning (DML) provides a crucial tool for visual similarity and zero-shot retrieval applications by learning generalizing embedding spaces, although recent work in DML has shown strong performance saturation across training objectives. However, generalization capacity is known to scale with the embedding space dimensionality. Unfortunately, high dimensional embeddings also create higher retrieval cost for downstream applications. To remedy this, we propose S2SD - Simultaneous Similarity-based Self-distillation. S2SD extends DML with knowledge distillation from auxiliary, high-dimensional embedding and feature spaces to leverage complementary context during training while retaining test-time cost and with negligible changes to the training time. Experiments and ablations across different objectives and standard benchmarks show S2SD offering notable improvements of up to 7% in Recall@1, while also setting a new state-of-the-art. Code available at https://github.com/MLforHealth/S2SD.
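The Recall@1 metric quoted in this abstract is the fraction of query samples whose nearest neighbour in the embedding space shares their class label. The following is a minimal brute-force sketch of that metric using Euclidean distance; the function name `recall_at_1` and the toy data are assumptions made for illustration and are not taken from the S2SD code base.

```python
import math


def recall_at_1(embeddings, labels):
    """Share of samples whose nearest other sample has the same label."""
    hits = 0
    for i, query in enumerate(embeddings):
        best_j, best_d = None, math.inf
        for j, candidate in enumerate(embeddings):
            if i == j:
                continue  # a sample must not retrieve itself
            d = math.dist(query, candidate)
            if d < best_d:
                best_j, best_d = j, d
        hits += labels[best_j] == labels[i]
    return hits / len(embeddings)


# Toy 2-D embeddings, invented for illustration only.
embeddings = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0), (0.2, 0.1)]
labels = ["a", "a", "b", "b", "b"]
score = recall_at_1(embeddings, labels)
```

In this toy set the last point carries label "b" but sits next to the "a" cluster, so it is the only miss. The nested loop costs O(n²) distance computations, each of which grows with the embedding dimension, which is the retrieval cost the abstract refers to.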

DiVA: Diverse Visual Feature Aggregation for Deep Metric Learning

Apr 29, 2020

Timo Milbich, Karsten Roth, Homanga Bharadhwaj, Samarth Sinha, Yoshua Bengio, Björn Ommer, Joseph Paul Cohen

Abstract:Visual Similarity plays an important role in many computer vision applications. Deep metric learning (DML) is a powerful framework for learning such similarities which not only generalize from training data to identically distributed test distributions, but in particular also translate to unknown test classes. However, its prevailing learning paradigm is class-discriminative supervised training, which typically results in representations specialized in separating training classes. For effective generalization, however, such an image representation needs to capture a diverse range of data characteristics. To this end, we propose and study multiple complementary learning tasks, targeting conceptually different data relationships by only resorting to the available training samples and labels of a standard DML setting. Through simultaneous optimization of our tasks we learn a single model to aggregate their training signals, resulting in strong generalization and state-of-the-art performance on multiple established DML benchmark datasets.

* 18 pages

Sharing Matters for Generalization in Deep Metric Learning

Apr 12, 2020

Timo Milbich, Karsten Roth, Biagio Brattoli, Björn Ommer

Abstract:Learning the similarity between images constitutes the foundation for numerous vision tasks. The common paradigm is discriminative metric learning, which seeks an embedding that separates different training classes. However, the main challenge is to learn a metric that not only generalizes from training samples to novel but related test samples, but also transfers to different object classes. So what complementary information is missed by the discriminative paradigm? Besides finding characteristics that separate classes, we also need them to be likely to occur in novel categories, which is indicated if they are shared across training classes. This work investigates how to learn such characteristics without the need for extra annotations or training data. By formulating our approach as a novel triplet sampling strategy, it can be easily applied on top of recent ranking loss frameworks. Experiments show that, independent of the underlying network architecture and the specific ranking loss, our approach significantly improves performance in deep metric learning, leading to new state-of-the-art results on various standard benchmark datasets.

* Technical Report

PADS: Policy-Adapted Sampling for Visual Similarity Learning

Mar 28, 2020

Karsten Roth, Timo Milbich, Björn Ommer

Abstract:Learning visual similarity requires learning relations, typically between triplets of images. Although triplet approaches are powerful, their computational complexity mostly limits training to only a subset of all possible training triplets. Thus, sampling strategies that decide when to use which training sample during learning are crucial. Currently, the prominent paradigm is to use fixed or curriculum sampling strategies that are predefined before training starts. However, the problem truly calls for a sampling process that adjusts based on the actual state of the similarity representation during training. We, therefore, employ reinforcement learning and have a teacher network adjust the sampling distribution based on the current state of the learner network, which represents visual similarity. Experiments on benchmark datasets using standard triplet-based losses show that our adaptive sampling strategy significantly outperforms fixed sampling strategies. Moreover, although our adaptive sampling is only applied on top of basic triplet-learning frameworks, we reach results competitive with state-of-the-art approaches that employ diverse additional learning signals or strong ensemble architectures. Code can be found under https://github.com/Confusezius/CVPR2020_PADS.
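To make the role of the sampling distribution concrete, here is a minimal sketch of a standard triplet margin loss together with negative sampling driven by a distribution over anchor-negative distance bins. The bin weights are fixed by hand here, whereas the abstract describes a teacher network that adjusts the distribution during training; the names `triplet_loss`, `sample_negative`, `bin_edges`, and `bin_weights` are hypothetical and not taken from the PADS repository.

```python
import math
import random


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Standard triplet margin loss on Euclidean distances."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)


def sample_negative(anchor, negatives, bin_edges, bin_weights, rng):
    """Draw one negative, weighted by the bin its distance to the anchor falls in.

    bin_weights needs len(bin_edges) + 1 entries; the last bin is open-ended.
    """
    weights = []
    for negative in negatives:
        d = math.dist(anchor, negative)
        # Index of the first edge above d, or the open-ended last bin.
        b = next((k for k, edge in enumerate(bin_edges) if d < edge), len(bin_edges))
        weights.append(bin_weights[b])
    return rng.choices(negatives, weights=weights, k=1)[0]


# Toy data, invented for illustration only.
anchor, positive = (0.0, 0.0), (0.1, 0.0)
negatives = [(0.3, 0.0), (1.0, 0.0), (2.0, 0.0)]
rng = random.Random(0)

# A hand-set distribution that only ever picks mid-range negatives.
chosen = sample_negative(anchor, negatives, bin_edges=[0.5, 1.5],
                         bin_weights=[0.0, 1.0, 0.0], rng=rng)
loss_hard = triplet_loss(anchor, positive, negatives[0])
loss_easy = triplet_loss(anchor, positive, negatives[2])
```

The far negative yields zero loss and therefore no training signal, while the close one yields a positive loss. Shifting weight between distance bins changes which triplets the learner sees, and that weighting is the quantity an adaptive strategy would update.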

* Accepted to CVPR2020
