Abstract: Distributions of unseen data have typically all been treated as out-of-distribution (OOD), making generalization to them a significant challenge. Much evidence suggests that increasing the size of the training data monotonically decreases the generalization error on test data. However, other observations and analyses indicate that this is not always true. In particular, when the training data comprise multiple source domains and the test data exhibit distribution drift, not every generalization error on the test data decreases monotonically as the training set grows. We formally investigate this non-decreasing phenomenon in a linear setting and verify it empirically on a variety of visual benchmarks. Motivated by these results, we redefine OOD data as data lying outside the convex hull of the training domains and prove a new generalization bound based on this definition. The bound implies that the effectiveness of a well-trained model can be guaranteed for unseen data within the convex hull of the training domains; for some data beyond the convex hull, however, a non-decreasing error trend can occur. We therefore investigate how well popular strategies such as data augmentation and pre-training overcome this issue. Moreover, we propose a novel reinforcement learning selection algorithm that operates only on the source domains and delivers superior performance over the baseline methods.
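The redefinition of OOD above hinges on whether a test domain lies inside the convex hull of the training domains. The following is a minimal, self-contained sketch, not the paper's algorithm, of how such a membership test could be phrased as a small linear program, assuming each domain is summarized by a fixed-length statistic vector (e.g., a mean feature embedding); the function and variable names are illustrative only.

```python
import numpy as np
from scipy.optimize import linprog

def convex_hull_membership(domain_stats, test_stat, tol=1e-6):
    """Check whether `test_stat` lies (approximately) in the convex hull of
    the rows of `domain_stats` by solving a small linear program.

    domain_stats: (K, d) array, one summary vector per training domain.
    test_stat:    (d,) array, summary vector of the test domain.
    Returns (inside, weights, residual): `inside` is True if some convex
    combination of the training-domain summaries matches `test_stat` within
    `tol` in L1 norm; `weights` are the mixture coefficients found.
    """
    K, d = domain_stats.shape
    M = domain_stats.T                      # (d, K)

    # Variables: mixture weights w (K) and slacks s (d) bounding |M w - test_stat|.
    # Objective: minimize sum(s), i.e. the L1 distance to the hull.
    c = np.concatenate([np.zeros(K), np.ones(d)])

    # Inequalities:  M w - s <= test_stat   and   -M w - s <= -test_stat
    A_ub = np.block([[M, -np.eye(d)],
                     [-M, -np.eye(d)]])
    b_ub = np.concatenate([test_stat, -test_stat])

    # Equality: mixture weights sum to one.
    A_eq = np.concatenate([np.ones(K), np.zeros(d)]).reshape(1, -1)
    b_eq = np.array([1.0])

    bounds = [(0, None)] * (K + d)          # w >= 0, s >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    weights, residual = res.x[:K], res.fun
    return residual <= tol, weights, residual


# Toy usage: three training domains summarized in 2-D; the first test point
# is a mixture of them (in-hull), the second is not (OOD in this sense).
domains = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
print(convex_hull_membership(domains, np.array([0.3, 0.3]))[0])   # True
print(convex_hull_membership(domains, np.array([1.5, 1.5]))[0])   # False
```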