Abstract: The objective of this work is to develop an AI foundation model for physical signals that can generalize across diverse phenomena, domains, applications, and sensing apparatuses. We propose a phenomenological approach and framework for creating and validating such AI foundation models. Based on this framework, we developed a model and trained it on 0.59 billion samples of cross-modal sensor measurements, ranging from electrical current to fluid flow to optical sensors. Notably, no prior knowledge of physical laws and no inductive biases were introduced into the model. Through several real-world experiments, we demonstrate that a single foundation model can effectively encode and predict physical behaviors, such as mechanical motion and thermodynamics, including phenomena not seen in training. The model also scales across physical processes of varying complexity, from tracking the trajectory of a simple spring-mass system to forecasting large electrical grid dynamics. This work highlights the potential of building a unified AI foundation model for diverse physical world processes.
Abstract: We present a large-scale study exploring the capability of temporal deep neural networks to interpret natural human kinematics and introduce the first method for active biometric authentication with mobile inertial sensors. At Google, we have created a first-of-its-kind dataset of human movements, passively collected from 1,500 volunteers who used their smartphones daily over several months. We (1) compare several neural architectures for efficient learning of temporal multi-modal data representations, (2) propose an optimized shift-invariant dense convolutional mechanism (DCWRNN), and (3) incorporate the discriminatively trained dynamic features into a probabilistic generative framework that takes temporal characteristics into account. Our results demonstrate that human kinematics convey important information about user identity and can serve as a valuable component of multi-modal authentication systems.
Abstract: In this paper, a part-based technique for real-time detection of users' faces on mobile devices is proposed. This method is specifically designed for detecting partially cropped and occluded faces captured using a smartphone's front-facing camera for continuous authentication. The key idea is to detect facial segments in the frame and cluster the results to obtain the region most likely to contain a face. Extensive experimentation on a mobile dataset of 50 users shows that our method outperforms many state-of-the-art face detection methods in both accuracy and processing speed.
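To make the "detect segments, then cluster" idea concrete, here is a minimal sketch under stated assumptions. It is not the authors' method: the `Segment` type, the center-distance threshold `max_dist`, the single-linkage clustering, and the rule of picking the cluster with the highest summed detection score are all hypothetical choices made for illustration. The segment detector itself is assumed to exist upstream and is not shown.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    # Hypothetical detection of one facial part, as a box plus a confidence score.
    x1: float
    y1: float
    x2: float
    y2: float
    score: float

    def center(self) -> Tuple[float, float]:
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)


def cluster_segments(segments: List[Segment], max_dist: float) -> List[List[Segment]]:
    """Single-linkage clustering (assumed): segments are grouped together when a
    chain of segments with center distance <= max_dist connects them."""
    parent = list(range(len(segments)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if math.dist(segments[i].center(), segments[j].center()) <= max_dist:
                parent[find(i)] = find(j)

    clusters = {}
    for i, seg in enumerate(segments):
        clusters.setdefault(find(i), []).append(seg)
    return list(clusters.values())


def most_likely_face(segments: List[Segment], max_dist: float) -> Optional[Tuple[float, float, float, float]]:
    """Return the bounding box of the cluster with the highest total score, or None."""
    if not segments:
        return None
    best = max(cluster_segments(segments, max_dist), key=lambda c: sum(s.score for s in c))
    # The face region is taken to be the union of the boxes in the winning cluster.
    return (min(s.x1 for s in best), min(s.y1 for s in best),
            max(s.x2 for s in best), max(s.y2 for s in best))


segs = [Segment(10, 10, 50, 40, 0.9), Segment(20, 30, 60, 70, 0.8), Segment(200, 200, 230, 230, 0.95)]
print(most_likely_face(segs, max_dist=40.0))  # (10, 10, 60, 70)
```

Here the two overlapping part detections merge into one cluster whose combined score (1.7) beats the isolated high-scoring box (0.95), which is the intuition behind clustering: agreement among several partial detections is treated as stronger evidence than one strong but isolated detection.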