Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Benjamin Wild

Cohort-Based Active Modality Acquisition

May 22, 2025

Tillmann Rheude, Roland Eils, Benjamin Wild

Abstract:Real-world machine learning applications often involve data from multiple modalities that must be integrated effectively to make robust predictions. However, in many practical settings, not all modalities are available for every sample, and acquiring additional modalities can be costly. This raises the question: which samples should be prioritized for additional modality acquisition when resources are limited? While prior work has explored individual-level acquisition strategies and training-time active learning paradigms, test-time and cohort-based acquisition remain underexplored despite their importance in many real-world settings. We introduce Cohort-based Active Modality Acquisition (CAMA), a novel test-time setting to formalize the challenge of selecting which samples should receive additional modalities. We derive acquisition strategies that leverage a combination of generative imputation and discriminative modeling to estimate the expected benefit of acquiring missing modalities based on common evaluation metrics. We also introduce upper-bound heuristics that provide performance ceilings to benchmark acquisition strategies. Experiments on common multimodal datasets demonstrate that our proposed imputation-based strategies can more effectively guide the acquisition of new samples in comparison to those relying solely on unimodal information, entropy guidance, and random selections. Our work provides an effective solution for optimizing modality acquisition at the cohort level, enabling better utilization of resources in constrained settings.

Via

Access Paper or Ask Questions

JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model

May 22, 2025

Qihao Duan, Bingding Huang, Zhenqiao Song, Irina Lehmann, Lei Gu, Roland Eils, Benjamin Wild

Abstract:Large language models (LLMs) have revolutionized natural language processing and are increasingly applied to other sequential data types, including genetic sequences. However, adapting LLMs to genomics presents significant challenges. Capturing complex genomic interactions requires modeling long-range dependencies within DNA sequences, where interactions often span over 10,000 base pairs, even within a single gene, posing substantial computational burdens under conventional model architectures and training paradigms. Moreover, standard LLM training approaches are suboptimal for DNA: autoregressive training, while efficient, supports only unidirectional understanding. However, DNA is inherently bidirectional, e.g., bidirectional promoters regulate transcription in both directions and account for nearly 11% of human gene expression. Masked language models (MLMs) allow bidirectional understanding but are inefficient, as only masked tokens contribute to the loss per step. To address these limitations, we introduce JanusDNA, the first bidirectional DNA foundation model built upon a novel pretraining paradigm that combines the optimization efficiency of autoregressive modeling with the bidirectional comprehension of masked modeling. JanusDNA adopts a hybrid Mamba, Attention and Mixture of Experts (MoE) architecture, combining long-range modeling of Attention with efficient sequential learning of Mamba. MoE layers further scale model capacity via sparse activation while keeping computational cost low. Notably, JanusDNA processes up to 1 million base pairs at single nucleotide resolution on a single 80GB GPU. Extensive experiments and ablations show JanusDNA achieves new SOTA results on three genomic representation benchmarks, outperforming models with 250x more activated parameters. Code: https://github.com/Qihao-Duan/JanusDNA

Via

Access Paper or Ask Questions

Large Language Models are Powerful EHR Encoders

Feb 24, 2025

Stefan Hegselmann, Georg von Arnim, Tillmann Rheude, Noel Kronenberg, David Sontag, Gerhard Hindricks, Roland Eils, Benjamin Wild

Abstract:Electronic Health Records (EHRs) offer rich potential for clinical prediction, yet their inherent complexity and heterogeneity pose significant challenges for traditional machine learning approaches. Domain-specific EHR foundation models trained on large collections of unlabeled EHR data have demonstrated promising improvements in predictive accuracy and generalization; however, their training is constrained by limited access to diverse, high-quality datasets and inconsistencies in coding standards and healthcare practices. In this study, we explore the possibility of using general-purpose Large Language Models (LLMs) based embedding methods as EHR encoders. By serializing patient records into structured Markdown text, transforming codes into human-readable descriptors, we leverage the extensive generalization capabilities of LLMs pretrained on vast public corpora, thereby bypassing the need for proprietary medical datasets. We systematically evaluate two state-of-the-art LLM-embedding models, GTE-Qwen2-7B-Instruct and LLM2Vec-Llama3.1-8B-Instruct, across 15 diverse clinical prediction tasks from the EHRSHOT benchmark, comparing their performance to an EHRspecific foundation model, CLIMBR-T-Base, and traditional machine learning baselines. Our results demonstrate that LLM-based embeddings frequently match or exceed the performance of specialized models, even in few-shot settings, and that their effectiveness scales with the size of the underlying LLM and the available context window. Overall, our findings demonstrate that repurposing LLMs for EHR encoding offers a scalable and effective approach for clinical prediction, capable of overcoming the limitations of traditional EHR modeling and facilitating more interoperable and generalizable healthcare applications.

Via

Access Paper or Ask Questions

Diffsurv: Differentiable sorting for censored time-to-event data

Apr 26, 2023

Andre Vauvelle, Benjamin Wild, Aylin Cakiroglu, Roland Eils, Spiros Denaxas

Abstract:Survival analysis is a crucial semi-supervised task in machine learning with numerous real-world applications, particularly in healthcare. Currently, the most common approach to survival analysis is based on Cox's partial likelihood, which can be interpreted as a ranking model optimized on a lower bound of the concordance index. This relation between ranking models and Cox's partial likelihood considers only pairwise comparisons. Recent work has developed differentiable sorting methods which relax this pairwise independence assumption, enabling the ranking of sets of samples. However, current differentiable sorting methods cannot account for censoring, a key factor in many real-world datasets. To address this limitation, we propose a novel method called Diffsurv. We extend differentiable sorting methods to handle censored tasks by predicting matrices of possible permutations that take into account the label uncertainty introduced by censored samples. We contrast this approach with methods derived from partial likelihood and ranking losses. Our experiments show that Diffsurv outperforms established baselines in various simulated and real-world risk prediction scenarios. Additionally, we demonstrate the benefits of the algorithmic supervision enabled by Diffsurv by presenting a novel method for top-k risk prediction that outperforms current methods.

Via

Access Paper or Ask Questions

BioTracker: An Open-Source Computer Vision Framework for Visual Animal Tracking

Mar 21, 2018

Hauke Jürgen Mönck, Andreas Jörg, Tobias von Falkenhausen, Julian Tanke, Benjamin Wild, David Dormagen, Jonas Piotrowski, Claudia Winklmayr, David Bierbach, Tim Landgraf

Figure 1 for BioTracker: An Open-Source Computer Vision Framework for Visual Animal Tracking

Figure 2 for BioTracker: An Open-Source Computer Vision Framework for Visual Animal Tracking

Abstract:The study of animal behavior increasingly relies on (semi-) automatic methods for the extraction of relevant behavioral features from video or picture data. To date, several specialized software products exist to detect and track animals' positions in simple (laboratory) environments. Tracking animals in their natural environments, however, often requires substantial customization of the image processing algorithms to the problem-specific image characteristics. Here we introduce BioTracker, an open-source computer vision framework, that provides programmers with core functionalities that are essential parts of a tracking software, such as video I/O, graphics overlays and mouse and keyboard interfaces. BioTracker additionally provides a number of different tracking algorithms suitable for a variety of image recording conditions. The main feature of BioTracker is however the straightforward implementation of new problem-specific tracking modules and vision algorithms that can build upon BioTracker's core functionalities. With this open-source framework the scientific community can accelerate their research and focus on the development of new vision algorithms.

* 4 pages, 2 figures

Via

Access Paper or Ask Questions

Tracking all members of a honey bee colony over their lifetime

Mar 15, 2018

Franziska Boenisch, Benjamin Rosemann, Benjamin Wild, Fernando Wario, David Dormagen, Tim Landgraf

Figure 1 for Tracking all members of a honey bee colony over their lifetime

Figure 2 for Tracking all members of a honey bee colony over their lifetime

Figure 3 for Tracking all members of a honey bee colony over their lifetime

Figure 4 for Tracking all members of a honey bee colony over their lifetime

Abstract:Computational approaches to the analysis of collective behavior in social insects increasingly rely on motion paths as an intermediate data layer from which one can infer individual behaviors or social interactions. Honey bees are a popular model for learning and memory. Previous experience has been shown to affect and modulate future social interactions. So far, no lifetime history observations have been reported for all bees of a colony. In a previous work we introduced a tracking system customized to track up to $4000$ bees over several weeks. In this contribution we present an in-depth description of the underlying multi-step algorithm which both produces the motion paths, and also improves the marker decoding accuracy significantly. We automatically tracked ${\sim}2000$ marked honey bees over 10 weeks with inexpensive recording hardware using markers without any error correction bits. We found that the proposed two-step tracking reduced incorrect ID decodings from initially ${\sim}13\%$ to around $2\%$ post-tracking. Alongside this paper, we publish the first trajectory dataset for all bees in a colony, extracted from ${\sim} 4$ million images. We invite researchers to join the collective scientific effort to investigate this intriguing animal system. All components of our system are open-source.

Via

Access Paper or Ask Questions

Automatic localization and decoding of honeybee markers using deep convolutional neural networks

Feb 14, 2018

Benjamin Wild, Leon Sixt, Tim Landgraf

Figure 1 for Automatic localization and decoding of honeybee markers using deep convolutional neural networks

Figure 2 for Automatic localization and decoding of honeybee markers using deep convolutional neural networks

Figure 3 for Automatic localization and decoding of honeybee markers using deep convolutional neural networks

Figure 4 for Automatic localization and decoding of honeybee markers using deep convolutional neural networks

Abstract:The honeybee is a fascinating model animal to investigate how collective behavior emerges from (inter-)actions of thousands of individuals. Bees may acquire unique memories throughout their lives. These experiences affect social interactions even over large time frames. Tracking and identifying all bees in the colony over their lifetimes therefore may likely shed light on the interplay of individual differences and colony behavior. This paper proposes a software pipeline based on two deep convolutional neural networks for the localization and decoding of custom binary markers that honeybees carry from their first to the last day in their life. We show that this approach outperforms similar systems proposed in recent literature. By opening this software for the public, we hope that the resulting datasets will help advancing the understanding of honeybee collective intelligence.

Via

Access Paper or Ask Questions

Automatic detection and decoding of honey bee waggle dances

Dec 05, 2017

Fernando Wario, Benjamin Wild, Raúl Rojas, Tim Landgraf

Figure 1 for Automatic detection and decoding of honey bee waggle dances

Figure 2 for Automatic detection and decoding of honey bee waggle dances

Figure 3 for Automatic detection and decoding of honey bee waggle dances

Figure 4 for Automatic detection and decoding of honey bee waggle dances

Abstract:The waggle dance is one of the most popular examples of animal communication. Forager bees direct their nestmates to profitable resources via a complex motor display. Essentially, the dance encodes the polar coordinates to the resource in the field. Unemployed foragers follow the dancer's movements and then search for the advertised spots in the field. Throughout the last decades, biologists have employed different techniques to measure key characteristics of the waggle dance and decode the information it conveys. Early techniques involved the use of protractors and stopwatches to measure the dance orientation and duration directly from the observation hive. Recent approaches employ digital video recordings and manual measurements on screen. However, manual approaches are very time-consuming. Most studies, therefore, regard only small numbers of animals in short periods of time. We have developed a system capable of automatically detecting, decoding and mapping communication dances in real-time. In this paper, we describe our recording setup, the image processing steps performed for dance detection and decoding and an algorithm to map dances to the field. The proposed system performs with a detection accuracy of 90.07\%. The decoded waggle orientation has an average error of -2.92{\deg} ($\pm$ 7.37{\deg} ), well within the range of human error. To evaluate and exemplify the system's performance, a group of bees was trained to an artificial feeder, and all dances in the colony were automatically detected, decoded and mapped. The system presented here is the first of this kind made publicly available, including source code and hardware specifications. We hope this will foster quantitative analyses of the honey bee waggle dance.

* 16 pages, LaTeX; a new value for the ratio distance-waggle run duration was computed. Figure 2 has been updated and discussion section was improved

Via

Access Paper or Ask Questions

RenderGAN: Generating Realistic Labeled Data

Jan 12, 2017

Leon Sixt, Benjamin Wild, Tim Landgraf

Figure 1 for RenderGAN: Generating Realistic Labeled Data

Figure 2 for RenderGAN: Generating Realistic Labeled Data

Figure 3 for RenderGAN: Generating Realistic Labeled Data

Figure 4 for RenderGAN: Generating Realistic Labeled Data

Abstract:Deep Convolutional Neuronal Networks (DCNNs) are showing remarkable performance on many computer vision tasks. Due to their large parameter space, they require many labeled samples when trained in a supervised setting. The costs of annotating data manually can render the use of DCNNs infeasible. We present a novel framework called RenderGAN that can generate large amounts of realistic, labeled images by combining a 3D model and the Generative Adversarial Network framework. In our approach, image augmentations (e.g. lighting, background, and detail) are learned from unlabeled data such that the generated images are strikingly realistic while preserving the labels known from the 3D model. We apply the RenderGAN framework to generate images of barcode-like markers that are attached to honeybees. Training a DCNN on data generated by the RenderGAN yields considerably better performance than training it on various baselines.

Via

Access Paper or Ask Questions