Abstract:The goal of domain generalization is to learn from multiple source domains to generalize to unseen target domains under distribution discrepancy. Current state-of-the-art methods in this area are fully supervised, but for many real-world problems it is hardly possible to obtain enough labeled samples. In this paper, we propose the first method of domain generalization to leverage unlabeled samples, combining of meta learning's episodic training and semi-supervised learning, called DGSML. DGSML employs an entropy-based pseudo-labeling approach to assign labels to unlabeled samples and then utilizes a novel discrepancy loss to ensure that class centroids before and after labeling unlabeled samples are close to each other. To learn a domain-invariant representation, it also utilizes a novel alignment loss to ensure that the distance between pairs of class centroids, computed after adding the unlabeled samples, is preserved across different domains. DGSML is trained by a meta learning approach to mimic the distribution shift between the input source domains and unseen target domains. Experimental results on benchmark datasets indicate that DGSML outperforms state-of-the-art domain generalization and semi-supervised learning methods.
Abstract:Event sequences can be modeled by temporal point processes (TPPs) to capture their asynchronous and probabilistic nature. We propose an intensity-free framework that directly models the point process distribution by utilizing normalizing flows. This approach is capable of capturing highly complex temporal distributions and does not rely on restrictive parametric forms. Comparisons with state-of-the-art baseline models on both synthetic and challenging real-life datasets show that the proposed framework is effective at modeling the stochasticity of discrete event sequences.
Abstract:We propose a novel probabilistic generative model for action sequences. The model is termed the Action Point Process VAE (APP-VAE), a variational auto-encoder that can capture the distribution over the times and categories of action sequences. Modeling the variety of possible action sequences is a challenge, which we show can be addressed via the APP-VAE's use of latent representations and non-linear functions to parameterize distributions over which event is likely to occur next in a sequence and at what time. We empirically validate the efficacy of APP-VAE for modeling action sequences on the MultiTHUMOS and Breakfast datasets.
Abstract:Deep ConvNets have shown great performance for single-label image classification (e.g. ImageNet), but it is necessary to move beyond the single-label classification task because pictures of everyday life are inherently multi-label. Multi-label classification is a more difficult task than single-label classification because both the input images and output label spaces are more complex. Furthermore, collecting clean multi-label annotations is more difficult to scale-up than single-label annotations. To reduce the annotation cost, we propose to train a model with partial labels i.e. only some labels are known per image. We first empirically compare different labeling strategies to show the potential for using partial labels on multi-label datasets. Then to learn with partial labels, we introduce a new classification loss that exploits the proportion of known labels per example. Our approach allows the use of the same training settings as when learning with all the annotations. We further explore several curriculum learning based strategies to predict missing labels. Experiments are performed on three large-scale multi-label datasets: MS COCO, NUS-WIDE and Open Images.
Abstract:Activity analysis in which multiple people interact across a large space is challenging due to the interplay of individual actions and collective group dynamics. We propose an end-to-end approach for learning person trajectory representations for group activity analysis. The learned representations encode rich spatio-temporal dependencies and capture useful motion patterns for recognizing individual events, as well as characteristic group dynamics that can be used to identify groups from their trajectories alone. We develop our deep learning approach in the context of team sports, which provide well-defined sets of events (e.g. pass, shot) and groups of people (teams). Analysis of events and team formations using NHL hockey and NBA basketball datasets demonstrate the generality of our approach.