We propose two-sample tests with data-dependent test statistics based on a one-dimensional witness function, which we call witness two-sample tests (WiTS tests). We first optimize the witness function by maximizing an asymptotic test-power objective, and then use as the test statistic the difference between the means of the witness evaluated on two held-out samples. When the witness function belongs to a reproducing kernel Hilbert space, we show that the optimal witness is given by kernel Fisher discriminant analysis, which admits a closed-form solution. We show that the WiTS test based on a characteristic kernel is consistent against any fixed alternative. Our experiments demonstrate that the WiTS test can achieve higher test power than existing two-sample tests with optimized kernels, suggesting that learning a high- or infinite-dimensional representation of the data may not be necessary for two-sample testing. The procedure is not restricted to kernel methods, so practitioners can apply it within their preferred machine learning framework.
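To make the two-stage procedure concrete, here is a minimal NumPy sketch under stated assumptions; it is an illustration, not the authors' implementation. It fits a regularized kernel Fisher discriminant witness f(z) = sum_i alpha_i k(z_i, z) on a training split, with alpha solving (N + reg I) alpha = m_X - m_Y, where N is the within-class scatter in kernel form. It then evaluates the mean difference of f on the held-out halves. The Gaussian kernel with a fixed `bandwidth`, the ridge term `reg`, the 50/50 split, the one-sided permutation p-value, and the function names `kfda_witness` and `wits_test` are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth=1.0):
    # Gaussian (RBF) kernel matrix; kernel and bandwidth are illustrative assumptions.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth**2))


def kfda_witness(X, Y, bandwidth=1.0, reg=1e-1):
    """Regularized kernel Fisher discriminant witness f(z) = sum_i alpha_i k(z_i, z)."""
    Z = np.vstack([X, Y])
    n, m = len(X), len(Y)
    K = gaussian_kernel(Z, Z, bandwidth)
    Kx, Ky = K[:, :n], K[:, n:]
    mean_diff = Kx.mean(axis=1) - Ky.mean(axis=1)  # m_X - m_Y in kernel coordinates
    # Within-class scatter N = sum_c K_c (I - 11^T / n_c) K_c^T
    Hx = np.eye(n) - np.full((n, n), 1.0 / n)
    Hy = np.eye(m) - np.full((m, m), 1.0 / m)
    N = Kx @ Hx @ Kx.T + Ky @ Hy @ Ky.T
    # Closed-form solution; the ridge term reg is an assumed regularizer.
    alpha = np.linalg.solve(N + reg * np.eye(n + m), mean_diff)
    return lambda W: gaussian_kernel(W, Z, bandwidth) @ alpha


def wits_test(X, Y, train_frac=0.5, n_perm=500, seed=0, **kernel_kw):
    """Fit the witness on a training split and test on held-out witness values."""
    rng = np.random.default_rng(seed)
    X = X[rng.permutation(len(X))]
    Y = Y[rng.permutation(len(Y))]
    nx, ny = int(train_frac * len(X)), int(train_frac * len(Y))
    f = kfda_witness(X[:nx], Y[:ny], **kernel_kw)
    fx, fy = f(X[nx:]), f(Y[ny:])  # one-dimensional witness values on held-out data
    stat = fx.mean() - fy.mean()   # difference in means of the witness
    # Assumed calibration: one-sided permutation test on the held-out witness values
    # (f is trained to be larger on X, so large positive values indicate a difference).
    pooled, k = np.concatenate([fx, fy]), len(fx)
    perm_stats = np.empty(n_perm)
    for b in range(n_perm):
        p = rng.permutation(pooled)
        perm_stats[b] = p[:k].mean() - p[k:].mean()
    p_value = (1 + np.sum(perm_stats >= stat)) / (1 + n_perm)
    return stat, p_value
```

Because the witness is fixed before the held-out samples are touched, the held-out witness values are exchangeable under the null hypothesis, which is why a simple permutation test on one-dimensional values suffices in this sketch. A usage example with a mean-shifted alternative: `wits_test(np.random.default_rng(1).normal(size=(200, 2)), np.random.default_rng(2).normal(size=(200, 2)) + 1.0)` returns a positive statistic and a small p-value.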