Sony Computer Science Laboratory Inc.
Abstract:This work introduces SMARTe-VR, a platform for student monitoring in an immersive virtual reality environment designed for online education. SMARTe-VR is aimed to gather data for adaptive learning, focusing on facial biometrics and learning metadata. The platform allows instructors to create tailored learning sessions with video lectures, featuring an interface with an Auto QA system to evaluate understanding, interaction tools (e.g., textbook highlighting and lecture tagging), and real-time feedback. Additionally, we release a dataset containing 5 research challenges with data from 10 users in VR-based TOEIC sessions. This dataset, spanning over 25 hours, includes facial features, learning metadata, 450 responses, question difficulty levels, concept tags, and understanding labels. Alongside the database, we present preliminary experiments using Item Response Theory models, adapted for understanding detection using facial features. Two architectures were explored: a Temporal Convolutional Network for local features and a Multilayer Perceptron for global features.
Abstract:How can we edit or transform the geometric or color property of a point cloud? In this study, we propose a neural style transfer method for point clouds which allows us to transfer the style of geometry or color from one point cloud either independently or simultaneously to another. This transfer is achieved by manipulating the content representations and Gram-based style representations extracted from a pre-trained PointNet-based classification network for colored point clouds. As Gram-based style representation is invariant to the number or the order of points, the same method can be extended to transfer the style extracted from an image to the color expression of a point cloud by merely treating the image as a set of pixels. Experimental results demonstrate the capability of the proposed method for transferring style from either an image or a point cloud to another point cloud of a single object or even an indoor scene.
Abstract:This paper introduces DensePoint, a densely sampled and annotated point cloud dataset containing over 10,000 single objects across 16 categories, by merging different kind of information from two existing datasets. Each point cloud in DensePoint contains 40,000 points, and each point is associated with two sorts of information: RGB value and part annotation. In addition, we propose a method for point cloud colorization by utilizing Generative Adversarial Networks (GANs). The network makes it possible to generate colours for point clouds of single objects by only giving the point cloud itself. Experiments on DensePoint show that there exist clear boundaries in point clouds between different parts of an object, suggesting that the proposed network is able to generate reasonably good colours. Our dataset is publicly available on the project page.
Abstract:Augmented reality is a research area that tries to embody an electronic information space within the real world, through computational devices. A crucial issue within this area, is the recognition of real world objects or situations. In natural language processing, it is much easier to determine interpretations of utterances, even if they are ill-formed, when the context or situation is fixed. We therefore introduce robust, natural language processing into a system of augmented reality with situation awareness. Based on this idea, we have developed a portable system, called the Ubiquitous Talker. This consists of an LCD display that reflects the scene at which a user is looking as if it is a transparent glass, a CCD camera for recognizing real world objects with color-bar ID codes, a microphone for recognizing a human voice and a speaker which outputs a synthesized voice. The Ubiquitous Talker provides its user with some information related to a recognized object, by using the display and voice. It also accepts requests or questions as voice inputs. The user feels as if he/she is talking with the object itself through the system.
Abstract:Human face-to-face conversation is an ideal model for human-computer dialogue. One of the major features of face-to-face communication is its multiplicity of communication channels that act on multiple modalities. To realize a natural multimodal dialogue, it is necessary to study how humans perceive information and determine the information to which humans are sensitive. A face is an independent communication channel that conveys emotional and conversational signals, encoded as facial expressions. We have developed an experimental system that integrates speech dialogue and facial animation, to investigate the effect of introducing communicative facial expressions as a new modality in human-computer conversation. Our experiments have shown that facial expressions are helpful, especially upon first contact with the system. We have also discovered that featuring facial expressions at an early stage improves subsequent interaction.