André Martin

Heterogeneous Datasets for Federated Survival Analysis Simulation

Jan 28, 2023

Alberto Archetti, Eugenio Lomurno, Francesco Lattari, André Martin, Matteo Matteucci

Abstract: Survival analysis studies techniques for modeling the time until an event of interest occurs in a population. It has found widespread applications in healthcare, engineering, and social sciences. However, the data needed to train survival models are often distributed, incomplete, censored, and confidential. In this context, federated learning can be exploited to tremendously improve the quality of models trained on distributed data while preserving user privacy. Federated survival analysis is still in its early development, though, and there is no common benchmarking dataset for testing federated survival models. This work proposes a novel technique for constructing realistic heterogeneous datasets from existing non-federated datasets in a reproducible way. Specifically, we provide two novel dataset-splitting algorithms based on the Dirichlet distribution that assign each data sample to a carefully chosen client: quantity-skewed splitting and label-skewed splitting. These algorithms yield different levels of heterogeneity by changing a single hyperparameter. Finally, numerical experiments provide a quantitative evaluation of the heterogeneity level using log-rank tests and a qualitative analysis of the generated splits. The implementation of the proposed methods is publicly available to support reproducibility and to encourage common practices for simulating federated environments for survival analysis.
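To make the idea of a Dirichlet-based quantity-skewed split concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names `dirichlet` and `quantity_skewed_split`, the parameter `alpha`, and the rounding scheme are all assumptions made for illustration. The sketch draws per-client proportions from a symmetric Dirichlet distribution and hands each client a matching share of shuffled sample indices, so a smaller `alpha` tends to give more uneven client sizes.

```python
import random


def dirichlet(alpha, k, rng):
    """Draw from a symmetric Dirichlet(alpha) over k categories.

    Uses the standard construction: normalize k independent Gamma(alpha, 1) draws.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def quantity_skewed_split(n_samples, n_clients, alpha, seed=0):
    """Assign sample indices to clients with Dirichlet-distributed sizes.

    Hypothetical helper for illustration only; `alpha` plays the role of the
    single heterogeneity hyperparameter (smaller alpha -> more skew).
    """
    rng = random.Random(seed)
    proportions = dirichlet(alpha, n_clients, rng)

    # Shuffle so that each client's share is a random subset of the data.
    indices = list(range(n_samples))
    rng.shuffle(indices)

    splits = []
    start = 0
    cumulative = 0.0
    for client, p in enumerate(proportions):
        cumulative += p
        # The last client takes whatever remains, so every sample is assigned.
        if client == n_clients - 1:
            end = n_samples
        else:
            end = max(start, round(cumulative * n_samples))
        splits.append(indices[start:end])
        start = end
    return splits


if __name__ == "__main__":
    for alpha in (0.5, 100.0):
        sizes = [len(s) for s in quantity_skewed_split(1000, 5, alpha, seed=42)]
        print(f"alpha={alpha}: client sizes = {sizes}")
```

Every sample lands on exactly one client, and some clients may receive no samples when `alpha` is small. A label-skewed variant would presumably apply a similar draw per label or time bin, but the abstract does not spell out those details, so it is left out here.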

Grand Challenge: Real-time Destination and ETA Prediction for Maritime Traffic

Oct 12, 2018

Oleh Bodunov, Florian Schmidt, André Martin, Andrey Brito, Christof Fetzer

Abstract: In this paper, we present our approach to solving the DEBS Grand Challenge 2018. The challenge asks for predictions of (i) the destination and (ii) the arrival time of ships in a streaming fashion, using geospatial data in the maritime context. Novel aspects of our approach include the use of ensemble learning based on Random Forest, Gradient Boosting Decision Trees (GBDT), XGBoost Trees, and Extremely Randomized Trees (ERT) to predict the destination, while for the arrival time we propose the use of feed-forward neural networks. In our evaluation, we achieved an accuracy of 97% for the port destination classification problem and 90% (in mins) for the ETA prediction.
