Abstract:We propose a novel unsupervised method to learn the pose and part-segmentation of articulated objects with rigid parts. Given two observations of an object in different articulation states, our method learns the geometry and appearance of object parts by using an implicit model from the first observation, and distils the part segmentation and articulation from the second observation while rendering the latter observation. Additionally, to tackle the complexities in the joint optimization of part segmentation and articulation, we propose a voxel grid-based initialization strategy and a decoupled optimization procedure. Compared to the prior unsupervised work, our model obtains significantly better performance and generalizes to objects with multiple parts, while it can be efficiently learned from few views for the latter observation.
Abstract:Recent developments in large-scale machine learning models for general-purpose understanding, translation and generation of language are driving impact across a variety of sectors including medicine, robotics, and scientific discovery. The strength of such Large Language Models (LLMs) stems from the large corpora that they are trained with. While this imbues them with a breadth of capabilities, they have been found unsuitable for some specific types of problems such as advanced mathematics. In this paper, we highlight the inability of LLMs to reason about physics tasks. We demonstrate that their ability to infer parameters of physical systems can be improved, without retraining, by augmenting their context with feedback from physical simulation.
Abstract:Artistic authoring of 3D environments is a laborious enterprise that also requires skilled content creators. There have been impressive improvements in using machine learning to address different aspects of generating 3D content, such as generating meshes, arranging geometry, synthesizing textures, etc. In this paper we develop a model to generate Bidirectional Reflectance Distribution Functions (BRDFs) from descriptive textual prompts. BRDFs are four dimensional probability distributions that characterize the interaction of light with surface materials. They are either represented parametrically, or by tabulating the probability density associated with every pair of incident and outgoing angles. The former lends itself to artistic editing while the latter is used when measuring the appearance of real materials. Numerous works have focused on hypothesizing BRDF models from images of materials. We learn a mapping from textual descriptions of materials to parametric BRDFs. Our model is first trained using a semi-supervised approach before being tuned via an unsupervised scheme. Although our model is general, in this paper we specifically generate parameters for MDL materials, conditioned on natural language descriptions, within NVIDIA's Omniverse platform. This enables use cases such as real-time text prompts to change materials of objects in 3D environments such as "dull plastic" or "shiny iron". Since the output of our model is a parametric BRDF, rather than an image of the material, it may be used to render materials using any shape under arbitrarily specified viewing and lighting conditions.
Abstract:Simplicial complexes can be viewed as high dimensional generalizations of graphs that explicitly encode multi-way ordered relations between vertices at different resolutions, all at once. This concept is central to the detection of higher dimensional topological features of data, features to which graphs, encoding only pairwise relationships, remain oblivious. While attempts have been made to extend Graph Neural Networks (GNNs) to a simplicial complex setting, the methods do not inherently exploit, or reason about, the underlying topological structure of the network. We propose a graph convolutional model for learning functions parametrized by the $k$-homological features of simplicial complexes. By spectrally manipulating their combinatorial $k$-dimensional Hodge Laplacians, the proposed model enables learning topological features of the underlying simplicial complexes, specifically, the distance of each $k$-simplex from the nearest "optimal" $k$-th homology generator, effectively providing an alternative to homology localization.
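As a small illustration of the Hodge Laplacian mentioned above (not the proposed model), the sketch below builds the standard combinatorial Laplacian $L_k = B_k^\top B_k + B_{k+1} B_{k+1}^\top$ from boundary matrices and reads off the $k$-th Betti number as the dimension of its kernel. The helper names (`hodge_laplacian`, `betti_number`) and the two toy complexes (a hollow and a filled triangle) are assumptions chosen for illustration.

```python
from fractions import Fraction


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def rank(matrix):
    # Exact rank via Gaussian elimination over the rationals.
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def hodge_laplacian(b_k, b_k_plus_1=None):
    # L_k = B_k^T B_k + B_{k+1} B_{k+1}^T; the second term is dropped
    # when the complex has no (k+1)-simplices.
    lower = matmul(transpose(b_k), b_k)
    if b_k_plus_1 is None:
        return lower
    return matadd(lower, matmul(b_k_plus_1, transpose(b_k_plus_1)))


def betti_number(laplacian):
    # dim ker(L_k) equals the k-th Betti number.
    return len(laplacian) - rank(laplacian)


# Toy complex (illustrative): vertices 0, 1, 2 and edges (0,1), (0,2), (1,2),
# each edge oriented from the lower to the higher vertex index.
B1 = [
    [-1, -1, 0],  # vertex 0
    [1, 0, -1],   # vertex 1
    [0, 1, 1],    # vertex 2
]
# Boundary of the triangle (0,1,2) is (1,2) - (0,2) + (0,1).
B2 = [[1], [-1], [1]]

betti_hollow = betti_number(hodge_laplacian(B1))      # triangle without its face
betti_filled = betti_number(hodge_laplacian(B1, B2))  # triangle with its face
print(betti_hollow, betti_filled)
```

The hollow triangle has one independent 1-cycle, which disappears once the 2-simplex is added; this is the kind of homological information that a pairwise graph Laplacian alone does not expose.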
Abstract:Proteins perform critical processes in all living systems: converting solar energy into chemical energy, replicating DNA, forming the basis of highly performant materials, sensing and much more. While an incredible range of functionality has been sampled in nature, it accounts for a tiny fraction of the possible protein universe. If we could tap into this pool of unexplored protein structures, we could search for novel proteins with useful properties that we could apply to tackle the environmental and medical challenges facing humanity. This is the purpose of protein design. Sequence design is an important aspect of protein design, and many successful methods to do this have been developed. Recently, deep-learning methods that frame it as a classification problem have emerged as a powerful approach. Beyond their reported improvement in performance, their primary advantage over physics-based methods is that the computational burden is shifted from the user to the developers, thereby increasing accessibility to the design method. Despite this trend, the tools for assessment and comparison of such models remain quite generic. The goal of this paper is both to address the timely problem of evaluation and to shine a spotlight, within the Machine Learning community, on specific assessment criteria that will accelerate impact. We present a carefully curated benchmark set of proteins and propose a number of standard tests to assess the performance of deep learning based methods. Our robust benchmark provides biological insight into the behaviour of design methods, which is essential for evaluating their performance and utility. We compare five existing models with two novel models for sequence prediction. Finally, we test the designs produced by these models with AlphaFold2, a state-of-the-art structure-prediction algorithm, to determine if they are likely to fold into the intended 3D shapes.
Abstract:Model-free reinforcement learning (RL) is a powerful tool to learn a broad range of robot skills and policies. However, a lack of policy interpretability can inhibit the successful deployment of these policies in downstream applications, particularly when differences in environmental conditions may result in unpredictable behaviour or generalisation failures. As a result, there has been a growing emphasis in machine learning around the inclusion of stronger inductive biases in models to improve generalisation. This paper proposes an alternative strategy, inverse value estimation for interpretable policy certificates (IV-Posterior), which seeks to identify the inductive biases or idealised conditions of operation already held by pre-trained policies, and then use this information to guide their deployment. IV-Posterior uses Masked Autoregressive Flows to fit distributions over the set of conditions or environmental parameters in which a policy is likely to be effective. This distribution can then be used as a policy certificate in downstream applications. We illustrate the use of IV-Posterior across two environments, and show that substantial performance gains can be obtained when policy selection incorporates knowledge of the inductive biases that these policies hold.
Abstract:Humans can easily reason about the sequence of high level actions needed to complete tasks, but it is particularly difficult to instil this ability in robots trained from relatively few examples. This work considers the task of neural action sequencing conditioned on a single reference visual state. This task is extremely challenging as it is not only subject to the significant combinatorial complexity that arises from large action sets, but also requires a model that can perform some form of symbol grounding, mapping high dimensional input data to actions, while reasoning about action relationships. Drawing on human cognitive abilities to rearrange objects in scenes to create new configurations, we take a permutation perspective and argue that action sequencing benefits from the ability to reason about both permutations and ordering concepts. Empirical analysis shows that neural models trained with latent permutations outperform standard neural architectures in constrained action sequencing tasks. Results also show that action sequencing using visual permutations is an effective mechanism to initialise and speed up traditional planning techniques and successfully scales to far greater action set sizes than models considered previously.
Abstract:Numerical integration is a computational procedure that is widely encountered across disciplines when reasoning about data. We derive a formula in closed form to calculate the multidimensional integral of functions $f_w$ that are representable using a shallow feed-forward neural network with weights $w$ and a sigmoid activation function. We demonstrate its applicability in estimating numerical integration of arbitrary functions $f$ over hyper-rectangular domains in the absence of a prior. To achieve this, we first train the network to learn $f_w \approx f$ using point-samples of the integrand. We then use our formula to calculate the exact integral of the learned function $f_w$. Our formula operates on the weights $w$ of the trained approximator network. We show that this formula can itself be expressed as a shallow feed-forward network, which we call a Q-NET, with $w$ as its inputs. Although the Q-NET does not have any learnable parameters, we use this abstraction to derive a family of elegant parametric formulae that represent the marginal distributions of the input function over arbitrary subsets of input dimensions in functional form. We perform empirical evaluations of Q-NETs for integrating smooth functions as well as functions with discontinuities.
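To make the idea of integrating a trained network in closed form concrete, here is a minimal one-dimensional sketch. It is not the multidimensional formula or the Q-NET construction from the paper; it only uses the elementary fact that $\log(1 + e^{z})$ (softplus) is an antiderivative of the sigmoid, so a shallow sigmoid network can be integrated exactly over an interval directly from its weights. The weight values and function names below are hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z):
    # Stable log(1 + exp(z)); its derivative is sigmoid(z).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def shallow_net(x, w1, b1, w2, b2):
    # f_w(x) = b2 + sum_j w2[j] * sigmoid(w1[j] * x + b1[j])
    return b2 + sum(v * sigmoid(a * x + c) for a, c, v in zip(w1, b1, w2))


def integrate_shallow_net(lo, hi, w1, b1, w2, b2):
    # Exact integral of f_w over [lo, hi], using
    # d/dx softplus(a * x + c) = a * sigmoid(a * x + c) for a != 0.
    total = b2 * (hi - lo)
    for a, c, v in zip(w1, b1, w2):
        if a == 0.0:
            total += v * sigmoid(c) * (hi - lo)  # constant hidden unit
        else:
            total += (v / a) * (softplus(a * hi + c) - softplus(a * lo + c))
    return total


def midpoint_rule(f, lo, hi, n=20000):
    # Simple quadrature, used only as a sanity check.
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


# Hypothetical "trained" weights of a network with three hidden units.
W1, B1 = [1.5, -2.0, 0.7], [0.2, 1.0, -0.5]
W2, B2 = [0.8, -1.2, 2.0], 0.1

exact = integrate_shallow_net(-1.0, 2.0, W1, B1, W2, B2)
approx = midpoint_rule(lambda x: shallow_net(x, W1, B1, W2, B2), -1.0, 2.0)
print(exact, approx)
```

The closed-form value depends only on the weights, which mirrors the abstract's point that the integral is computed from $w$ rather than from further samples of the integrand.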
Abstract:Photographs of wild animals in their natural habitats can be recorded unobtrusively via cameras that are triggered by motion nearby. The installation of such camera traps is becoming increasingly common across the world. Although this is a convenient source of invaluable data for biologists, ecologists and conservationists, the arduous task of poring through potentially millions of pictures each season introduces prohibitive costs and frustrating delays. We develop automatic algorithms that are able to detect animals, identify their species and recognize individual animals for two species. We propose the first fully-automatic tool that can recognize specific individuals of leopard and tiger by their characteristic body markings. We adopt a supervised learning approach in which a Deep Convolutional Neural Network (DCNN) is trained using several instances of manually-labelled images for each of the three classification tasks. We demonstrate the effectiveness of our approach on a data set of camera-trap images recorded in the jungles of Southern India.
Abstract:This paper addresses a common class of problems where a robot learns to perform a discovery task based on example solutions, or human demonstrations. For example, consider the problem of ultrasound scanning, where the demonstration requires that an expert adaptively searches for a satisfactory view of internal organs, vessels or tissue and potential anomalies while maintaining optimal contact between the probe and surface tissue. Such problems are currently solved by inferring notional rewards that, when optimised for, result in a plan that mimics demonstrations. A pivotal assumption, that plans with higher reward should be exponentially more likely, leads to the de facto approach for reward inference in robotics. While this approach of maximum entropy inverse reinforcement learning leads to a general and elegant formulation, it struggles to cope with frequently encountered sub-optimal demonstrations. In this paper, we propose an alternative approach to cope with the class of problems where sub-optimal demonstrations occur frequently. We hypothesise that, in tasks which require discovery, successive states of any demonstration are progressively more likely to be associated with a higher reward. We formalise this temporal ranking approach and show that it improves upon maximum-entropy approaches to perform reward inference for autonomous ultrasound scanning, a novel application of learning from demonstration in medical imaging.
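One way to picture the temporal ranking hypothesis is as a pairwise ranking objective: for any two states of a demonstration, the later one should receive the higher reward. The sketch below is an assumption about how such an objective could look, not the paper's formulation. It uses a linear reward, a logistic (Bradley-Terry style) pairwise loss, plain gradient descent, and a made-up five-step demonstration with two features per state.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z):
    # Stable log(1 + exp(z)); note that -log(sigmoid(d)) = softplus(-d).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def reward(theta, features):
    # Linear reward model (an illustrative assumption).
    return sum(t * f for t, f in zip(theta, features))


def ranking_loss_and_grad(theta, demo):
    # Average of -log sigmoid(r(s_j) - r(s_i)) over all pairs with i earlier than j.
    loss, grad, pairs = 0.0, [0.0] * len(theta), 0
    for i in range(len(demo)):
        for j in range(i + 1, len(demo)):
            diff = reward(theta, demo[j]) - reward(theta, demo[i])
            loss += softplus(-diff)
            weight = sigmoid(-diff)  # magnitude of d(loss)/d(diff)
            for k in range(len(theta)):
                grad[k] -= weight * (demo[j][k] - demo[i][k])
            pairs += 1
    return loss / pairs, [g / pairs for g in grad]


# Made-up demonstration: one feature vector per time step. The first feature
# tends to improve over time; the second is uninformative noise.
demo = [[0.1, 0.9], [0.3, 0.2], [0.5, 0.7], [0.8, 0.4], [1.0, 0.6]]

theta = [0.0, 0.0]
initial_loss, _ = ranking_loss_and_grad(theta, demo)
for _ in range(200):
    _, grad = ranking_loss_and_grad(theta, demo)
    theta = [t - 0.5 * g for t, g in zip(theta, grad)]
final_loss, _ = ranking_loss_and_grad(theta, demo)
print(theta, initial_loss, final_loss)
```

Because the objective only compares states within a demonstration, it does not require the demonstration as a whole to be near-optimal, which is the property the abstract contrasts with maximum-entropy reward inference.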