Abstract:Random forests and, more generally, (decision\nobreakdash-)tree ensembles are widely used methods for classification and regression. Recent algorithmic advances allow to compute decision trees that are optimal for various measures such as their size or depth. We are not aware of such research for tree ensembles and aim to contribute to this area. Mainly, we provide two novel algorithms and corresponding lower bounds. First, we are able to carry over and substantially improve on tractability results for decision trees, obtaining a $(6\delta D S)^S \cdot poly$-time algorithm, where $S$ is the number of cuts in the tree ensemble, $D$ the largest domain size, and $\delta$ is the largest number of features in which two examples differ. To achieve this, we introduce the witness-tree technique which also seems promising for practice. Second, we show that dynamic programming, which has been successful for decision trees, may also be viable for tree ensembles, providing an $\ell^n \cdot poly$-time algorithm, where $\ell$ is the number of trees and $n$ the number of examples. Finally, we compare the number of cuts necessary to classify training data sets for decision trees and tree ensembles, showing that ensembles may need exponentially fewer cuts for increasing number of trees.
Abstract:We study the network untangling problem introduced by Rozenshtein, Tatti, and Gionis [DMKD 2021], which is a variant of Vertex Cover on temporal graphs -- graphs whose edge set changes over discrete time steps. They introduce two problem variants. The goal is to select at most $k$ time intervals for each vertex such that all time-edges are covered and (depending on the problem variant) either the maximum interval length or the total sum of interval lengths is minimized. This problem has data mining applications in finding activity timelines that explain the interactions of entities in complex networks. Both variants of the problem are NP-hard. In this paper, we initiate a multivariate complexity analysis involving the following parameters: number of vertices, lifetime of the temporal graph, number of intervals per vertex, and the interval length bound. For both problem versions, we (almost) completely settle the parameterized complexity for all combinations of those four parameters, thereby delineating the border of fixed-parameter tractability.