Abstract:Bayes' rule describes how to infer posterior beliefs about latent variables given observations, and inference is a critical step in learning algorithms for latent variable models (LVMs). Although there are exact algorithms for inference and learning for certain LVMs such as linear Gaussian models and mixture models, researchers must typically develop approximate inference and learning algorithms when applying novel LVMs. In this paper we study the line that separates LVMs that rely on approximation schemes from those that do not, and develop a general theory of exponential family, latent variable models for which inference and learning may be implemented exactly. Firstly, under mild assumptions about the exponential family form of a given LVM, we derive necessary and sufficient conditions under which the LVM prior is in the same exponential family as its posterior, such that the prior is conjugate to the posterior. We show that all models that satisfy these conditions are constrained forms of a particular class of exponential family graphical model. We then derive general inference and learning algorithms, and demonstrate them on a variety of example models. Finally, we show how to compose our models into graphical models that retain tractable inference and learning. In addition to our theoretical work, we have implemented our algorithms in a collection of libraries with which we provide numerous demonstrations of our theory, and with which researchers may apply our theory in novel statistical settings.
Abstract:Animal vision is thought to optimize various objectives from metabolic efficiency to discrimination performance, yet its ultimate objective is to facilitate the survival of the animal within its ecological niche. However, modeling animal behavior in complex environments has been challenging. To study how environments shape and constrain visual processing, we developed a deep reinforcement learning framework in which an agent moves through a 3-d environment that it perceives through a vision model, where its only goal is to survive. Within this framework we developed a foraging task where the agent must gather food that sustains it, and avoid food that harms it. We first established that the complexity of the vision model required for survival on this task scaled with the variety and visual complexity of the food in the environment. Moreover, we showed that a recurrent network architecture was necessary to fully exploit complex vision models on the most visually demanding tasks. Finally, we showed how different network architectures learned distinct representations of the environment and task, and lead the agent to exhibit distinct behavioural strategies. In summary, this paper lays the foundation for a computational approach to visual ecology, provides extensive benchmarks for future work, and demonstrates how representations and behaviour emerge from an agent's drive for survival.
Abstract:To avoid the curse of dimensionality, a common approach to clustering high-dimensional data is to first project the data into a space of reduced dimension, and then cluster the projected data. Although effective, this two-stage approach prevents joint optimization of the dimensionality-reduction and clustering models, and obscures how well the complete model describes the data. Here, we show how a family of such two-stage models can be combined into a single, hierarchical model that we call a hierarchical mixture of Gaussians (HMoG). An HMoG simultaneously captures both dimensionality-reduction and clustering, and its performance is quantified in closed-form by the likelihood function. By formulating and extending existing models with exponential family theory, we show how to maximize the likelihood of HMoGs with expectation-maximization. We apply HMoGs to synthetic data and RNA sequencing data, and demonstrate how they exceed the limitations of two-stage models. Ultimately, HMoGs are a rigorous generalization of a common statistical framework, and provide researchers with a method to improve model performance when clustering high-dimensional data.
Abstract:Parallel recordings of neural spike counts have revealed the existence of context-dependent noise correlations in neural populations. Theories of population coding have also shown that such correlations can impact the information encoded by neural populations about external stimuli. Although studies have shown that these correlations often have a low-dimensional structure, it has proven difficult to capture this structure in a model that is compatible with theories of rate coding in correlated populations. To address this difficulty we develop a novel model based on conditional finite mixtures of independent Poisson distributions. The model can be conditioned on context variables (e.g. stimuli or task variables), and the number of mixture components in the model can be cross-validated to estimate the dimensionality of the target correlations. We derive an expectation-maximization algorithm to efficiently fit the model to realistic amounts of data from large neural populations. We then demonstrate that the model successfully captures stimulus-dependent correlations in the responses of macaque V1 neurons to oriented gratings. Our model incorporates arbitrary nonlinear context-dependence, and can thus be applied to improve predictions of neural activity based on deep neural networks.
Abstract:In order to interact intelligently with objects in the world, animals must first transform neural population responses into estimates of the dynamic, unknown stimuli which caused them. The Bayesian solution to this problem is known as a Bayes filter, which applies Bayes' rule to combine population responses with the predictions of an internal model. In this paper we present a method for learning to approximate a Bayes filter when the stimulus dynamics are unknown. To do this we use the inferential properties of probabilistic population codes to compute Bayes' rule, and train a neural network to compute approximate predictions by the method of maximum likelihood. In particular, we perform stochastic gradient descent on the negative log-likelihood with a novel approximation of the gradient. We demonstrate our methods on a finite-state, a linear, and a nonlinear filtering problem, and show how the hidden layer of the neural network develops tuning curves which are consistent with findings in experimental neuroscience.
Abstract:The posterior parietal cortex is believed to direct eye movements, especially in regards to target tracking tasks, and a number of debates exist over the precise nature of the computations performed by the parietal cortex, with each side supported by different sets of biological evidence. In this paper I will present my model which navigates a course between some of these debates, towards the end of presenting a model which can explain some of the competing interpretations among the data sets. In particular, rather than assuming that proprioception or efference copies form the key source of information for computing eye position information, I use a biological plausible implementation of a Kalman filter to optimally combine the two signals, and a simple gain control mechanism in order to accommodate the latency of the proprioceptive signal. Fitting within the Bayesian brain hypothesis, the result is a Bayes optimal solution to the eye control problem, with a range of data supporting claims of biological plausibility.