Abstract: Collaborative learning is an educational approach that enhances learning through shared goals and working together. Interaction and regulation are two essential factors in the success of collaborative learning. Since information from various modalities can reflect the quality of collaboration, this paper introduces a new multimodal dataset with cognitive and emotional triggers to explore how regulation affects interactions during the collaborative process. Specifically, a learning task with intentional interventions is designed and assigned to high school students (N=81) with an average age of 15 years. Multimodal signals, including video, Kinect, audio, and physiological data, are collected and exploited to study regulation in collaborative learning at three levels: individual-participant-single-modality, individual-participant-multiple-modality, and multiple-participant-multiple-modality. Analysis of annotated emotions, body gestures, and their interactions indicates that our multimodal dataset with designed treatments can effectively examine moments of regulation in collaborative learning. In addition, preliminary experiments with baseline models suggest that the dataset provides a challenging in-the-wild scenario, which could further contribute to the fields of education and affective computing.
Abstract: With the strength of deep generative models, 3D pose transfer has regained intensive research interest in recent years. Existing methods mainly rely on a variety of constraints to achieve pose transfer over 3D meshes, e.g., the need for manual encoding for shape and pose disentanglement. In this paper, we present an unsupervised approach to pose transfer between arbitrary given 3D meshes. Specifically, a novel Intrinsic-Extrinsic Preserved Generative Adversarial Network (IEP-GAN) is presented to preserve both intrinsic (i.e., shape) and extrinsic (i.e., pose) information. Extrinsically, we propose a co-occurrence discriminator to capture the structural/pose invariance from distinct Laplacians of the mesh. Meanwhile, intrinsically, a local intrinsic-preserved loss is introduced to preserve the geodesic priors while avoiding heavy computations. Finally, we show the possibility of using IEP-GAN to manipulate 3D human meshes in various ways, including pose transfer, identity swapping, and pose interpolation with latent code vector arithmetic. Extensive experiments on various 3D datasets of humans, animals, and hands qualitatively and quantitatively demonstrate the generality of our approach. Our proposed model produces better results and is substantially more efficient than recent state-of-the-art methods. Code is available at https://github.com/mikecheninoulu/Unsupervised_IEPGAN
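As a rough illustration of what "pose interpolation with latent code vector arithmetic" can look like, the sketch below linearly blends two latent vectors. It is a generic example under stated assumptions, not the IEP-GAN implementation: the function name `interpolate_latents`, the latent dimensionality, and the idea that each blended code would be passed to some decoder are all hypothetical.

```python
import numpy as np

def interpolate_latents(z_start, z_end, num_steps):
    """Linearly blend two latent codes (hypothetical helper, not from IEP-GAN).

    Returns an array of shape (num_steps, latent_dim) whose first row is
    z_start and whose last row is z_end.
    """
    z_start = np.asarray(z_start, dtype=float)
    z_end = np.asarray(z_end, dtype=float)
    # Blend weights evenly spaced from 0 (start pose) to 1 (end pose).
    alphas = np.linspace(0.0, 1.0, num_steps)
    return np.stack([(1.0 - a) * z_start + a * z_end for a in alphas])

# Toy 2-D latent codes standing in for two encoded poses (made-up values).
path = interpolate_latents([0.0, 0.0], [2.0, 4.0], num_steps=3)
```

In a generative model, each row of `path` would be decoded into a mesh to produce an intermediate pose; whether linear blending is the arithmetic used in the paper is an assumption here.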
Abstract: We introduce a new dataset for emotional artificial intelligence research: the identity-free video dataset for Micro-Gesture Understanding and Emotion analysis (iMiGUE). Unlike existing public datasets, iMiGUE focuses on nonverbal body gestures without using any identity information, whereas the predominant research on emotion analysis concerns sensitive biometric data, such as face and speech. Most importantly, iMiGUE focuses on micro-gestures, i.e., unintentional behaviors driven by inner feelings, which differ from the ordinary gestures in other gesture datasets that are mostly performed intentionally for illustrative purposes. Furthermore, iMiGUE is designed to evaluate the ability of models to analyze emotional states by integrating information from recognized micro-gestures, rather than just recognizing prototypes in the sequences separately (or in isolation). This is because the real need for emotion AI is to understand the emotional states behind gestures in a holistic way. Moreover, to counter the challenge of the imbalanced sample distribution of this dataset, an unsupervised learning method is proposed to capture latent representations from the micro-gesture sequences themselves. We systematically investigate representative methods on this dataset, and comprehensive experimental results reveal several interesting insights from iMiGUE, e.g., micro-gesture-based analysis can promote emotion understanding. We confirm that the new iMiGUE dataset could advance studies of micro-gestures and emotion AI.
Abstract: Recently, there has been a rising surge of momentum for deep representation learning in hyperbolic spaces due to their high capacity for modeling data that possess hierarchical structure, such as knowledge graphs or synonym hierarchies. We refer to such a model as a hyperbolic deep neural network in this paper. Such a hyperbolic neural architecture potentially leads to drastically more compact models with much more physical interpretability than its counterpart in Euclidean space. To stimulate future research, this paper presents a coherent and comprehensive review of the literature around the neural components in the construction of hyperbolic deep neural networks, as well as the generalization of the leading deep approaches to hyperbolic space. It also presents current applications across various machine learning tasks on several publicly available datasets, together with insightful observations, open questions, and promising future directions.
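To make the appeal of hyperbolic space for hierarchical data a little more concrete, the sketch below computes the geodesic distance in the Poincaré ball, one common model of hyperbolic space. The abstract does not say which model the survey emphasizes, so the choice of the Poincaré ball and the function name `poincare_distance` are assumptions made for illustration.

```python
import numpy as np

def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball.

    Uses d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2))).
    Both points must have Euclidean norm strictly less than 1.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    sq_diff = np.sum((u - v) ** 2)
    denom = (1.0 - np.sum(u ** 2)) * (1.0 - np.sum(v ** 2))
    return float(np.arccosh(1.0 + 2.0 * sq_diff / denom))

# The same Euclidean gap (0.05) measured near the origin and near the boundary.
near_origin = poincare_distance([0.0, 0.0], [0.05, 0.0])
near_boundary = poincare_distance([0.9, 0.0], [0.95, 0.0])
```

Here `near_boundary` comes out larger than `near_origin` even though the Euclidean gap is identical. This growth of distance toward the boundary is what gives the ball room to embed tree-like structures with many leaves.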
Abstract: Sparse coding has achieved great success in various image processing studies. However, there is no benchmark for measuring the sparsity of an image patch/group because sparse discriminant conditions cannot remain unchanged. This paper analyzes the sparsity of a group based on the strategy of rank minimization. First, an adaptive dictionary for each group is designed. Then, we prove that group-based sparse coding is equivalent to the rank minimization problem, and thus the sparse coefficients of each group are measured by estimating the singular values of each group. Based on that measurement, weighted Schatten $p$-norm minimization (WSNM) is found to give the closest solution to the real singular values of each group. Thus, WSNM can be equivalently transformed into a non-convex $\ell_p$-norm minimization problem in group-based sparse coding. To make the proposed scheme tractable and robust, the alternating direction method of multipliers (ADMM) is used to solve the $\ell_p$-norm minimization problem. Experimental results on two applications, image inpainting and image compressive sensing (CS) recovery, show that the proposed scheme outperforms many state-of-the-art methods.
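For readers unfamiliar with the weighted Schatten $p$-norm, the sketch below evaluates it from a matrix's singular values, using the commonly cited definition $\|X\|_{w,S_p} = \left(\sum_i w_i \sigma_i^p\right)^{1/p}$. It only computes the norm; it does not perform the minimization or the ADMM solver from the paper. The function name and the example weights are hypothetical.

```python
import numpy as np

def weighted_schatten_p_norm(X, weights, p):
    """Weighted Schatten p-norm: (sum_i w_i * sigma_i**p) ** (1/p).

    `weights` must be non-negative and aligned with the singular values,
    which numpy returns in descending order.
    """
    sigma = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * sigma ** p) ** (1.0 / p))

# Toy "group" matrix with singular values 4 and 3 (made-up data).
X = np.diag([3.0, 4.0])
nuclear = weighted_schatten_p_norm(X, np.ones(2), p=1.0)    # unit weights, p = 1
frobenius = weighted_schatten_p_norm(X, np.ones(2), p=2.0)  # unit weights, p = 2
```

With unit weights, $p = 1$ recovers the nuclear norm and $p = 2$ the Frobenius norm. Since the expression inside the root is a weighted sum of $p$-th powers of the singular values, it can be read as a weighted $\ell_p$ penalty on the singular-value vector, which is the link to $\ell_p$-norm minimization that the abstract describes.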