Abstract: The algorithms used to train neural networks, such as stochastic gradient descent (SGD), have close parallels to natural processes that navigate a high-dimensional parameter space -- for example, protein folding or evolution. Our study uses a Fokker-Planck approach, adapted from statistical physics, to explore these parallels in a single, unified framework. We focus in particular on the stationary state of the system in the long-time limit, which in conventional SGD is out of equilibrium, exhibiting persistent currents in the space of network parameters. As in its physical analogues, the current is associated with an entropy production rate for any given training trajectory. The stationary distribution of these rates obeys the integral and detailed fluctuation theorems -- nonequilibrium generalizations of the second law of thermodynamics. We validate these relations in two numerical examples: a nonlinear regression network and MNIST digit classification. While the fluctuation theorems are universal, other aspects of the stationary state are highly sensitive to the training details. Surprisingly, the effective loss landscape and diffusion matrix that determine the shape of the stationary distribution depend on the simple choice of whether minibatching is done with or without replacement. We can take advantage of this nonequilibrium sensitivity to engineer an equilibrium stationary state for a particular application: sampling from a posterior distribution of network weights in Bayesian machine learning. We propose a new variation of stochastic gradient Langevin dynamics (SGLD) that harnesses without-replacement minibatching. In an example system where the posterior is exactly known, this SGWORLD algorithm outperforms SGLD, converging to the posterior orders of magnitude faster as a function of the learning rate.
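To make the with-replacement versus without-replacement distinction concrete, the following is a minimal sketch of standard SGLD on a toy problem with an exactly known posterior (the mean of a unit-variance Gaussian under a Gaussian prior). It is not the SGWORLD algorithm, whose details the abstract does not give; the only thing it illustrates is the minibatch sampling switch. The function names, the toy model, and all parameter values (`prior_var`, `eta`, `m`) are assumptions chosen for illustration.

```python
import numpy as np

def minibatches(n, m, steps, replace, rng):
    """Yield `steps` index arrays of size m drawn from range(n)."""
    assert m <= n
    if replace:
        # With replacement: every step draws indices independently.
        for _ in range(steps):
            yield rng.integers(0, n, size=m)
    else:
        # Without replacement: shuffle once per epoch, then walk through
        # the permutation in disjoint chunks of size m.
        done = 0
        while done < steps:
            perm = rng.permutation(n)
            for start in range(0, n - m + 1, m):
                if done == steps:
                    return
                yield perm[start:start + m]
                done += 1

def sgld_gaussian_mean(x, m, eta, steps, replace, prior_var=10.0, seed=0):
    """Standard SGLD for theta in x_i ~ N(theta, 1), prior theta ~ N(0, prior_var)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    theta = 0.0
    samples = np.empty(steps)
    for t, idx in enumerate(minibatches(n, m, steps, replace, rng)):
        grad_log_prior = -theta / prior_var
        # Minibatch estimate of the full-data log-likelihood gradient.
        grad_log_lik = (n / m) * np.sum(x[idx] - theta)
        # Langevin step: half-step along the gradient plus injected noise.
        theta += 0.5 * eta * (grad_log_prior + grad_log_lik)
        theta += np.sqrt(eta) * rng.normal()
        samples[t] = theta
    return samples
```

For this toy model the exact posterior mean is `x.sum() / (1 / prior_var + len(x))`, so the sample mean of either chain (after discarding a burn-in) can be compared against it directly. Comparing how the sample variance of the two chains deviates from the exact posterior variance as `eta` changes would be one way to probe the sensitivity the abstract describes, though this sketch makes no claim about reproducing the paper's results.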
Abstract: Causal inference in networks should account for interference, which occurs when a unit's outcome is influenced by the treatments or outcomes of its peers. Peer influence is heterogeneous when a unit's outcome is subject to varying influence from different peers based on their attributes and relationships, or when each unit has a different susceptibility to peer influence. Existing solutions to causal inference under interference consider either homogeneous influence from peers or specific heterogeneous influence mechanisms (e.g., based on local neighborhood structure). This paper presents a methodology for estimating individual causal effects in the presence of heterogeneous peer influence due to arbitrary mechanisms. We propose a structural causal model for networks that can capture arbitrary assumptions about network structure, interference conditions, and causal dependence. We identify potential heterogeneous contexts using the causal model and propose a novel graph neural network-based estimator to estimate individual causal effects. We show that existing state-of-the-art methods for individual causal effect estimation produce biased results in the presence of heterogeneous peer influence, and that our proposed estimator is robust.
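As a rough illustration of what heterogeneous peer influence means, the sketch below simulates a random network in which a treated peer's influence depends on whether it shares a binary attribute with the unit, and then compares two exposure summaries: the plain fraction of treated neighbors (homogeneous) and an attribute-weighted version (heterogeneous). The data-generating process, the weights 0.2 and 1.0, the effect sizes, and all function names are hypothetical; this is neither the paper's structural causal model nor its graph neural network estimator.

```python
import numpy as np

def simulate(n=200, p_edge=0.05, seed=0):
    """Simulate outcomes under a hypothetical heterogeneous-influence model."""
    rng = np.random.default_rng(seed)
    # Undirected random graph without self-loops.
    upper = np.triu(rng.random((n, n)) < p_edge, k=1)
    adj = (upper | upper.T).astype(float)
    attr = rng.integers(0, 2, size=n)                  # binary unit attribute
    treat = rng.integers(0, 2, size=n).astype(float)   # randomized treatment
    # Assumption: peers sharing the unit's attribute carry weight 1.0, others 0.2.
    same = (attr[:, None] == attr[None, :]).astype(float)
    weights = adj * (0.2 + 0.8 * same)
    deg = np.maximum(adj.sum(axis=1), 1.0)             # avoid division by zero
    homog_exposure = adj @ treat / deg                 # fraction of treated peers
    hetero_exposure = weights @ treat / deg            # attribute-weighted exposure
    # Direct effect 1.0, peer effect 2.0 acting through the weighted exposure.
    y = 1.0 * treat + 2.0 * hetero_exposure + rng.normal(0.0, 0.1, size=n)
    return treat, homog_exposure, hetero_exposure, y

def ols_rss(features, y):
    """Least-squares fit with intercept; returns coefficients and residual sum of squares."""
    X = np.column_stack([np.ones(len(y))] + list(features))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return coef, float(resid @ resid)

treat, homog, hetero, y = simulate()
coef_hetero, rss_hetero = ols_rss([treat, hetero], y)
coef_homog, rss_homog = ols_rss([treat, homog], y)
print("RSS with weighted exposure:  ", rss_hetero)
print("RSS with unweighted exposure:", rss_homog)
```

In this toy setting, a model that summarizes peers by the unweighted treated fraction leaves structured variation in the outcome unexplained, which is the kind of misspecification the abstract attributes to methods that assume homogeneous influence.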
Abstract: Cannabis legalization has been welcomed by many U.S. states, but its role in escalation from tobacco e-cigarette use to cannabis vaping is unclear. Meanwhile, cannabis vaping has been associated with new lung diseases and rising adolescent use. To understand the impact of cannabis legalization on escalation, we design an observational study to estimate the causal effect of recreational cannabis legalization on the development of pro-cannabis attitudes among e-cigarette users. We collect and analyze Twitter data containing opinions about cannabis and JUUL, a very popular e-cigarette brand. We use weakly supervised learning for personal tweet filtering and classification for stance detection. We find that recreational cannabis legalization increases the development of pro-cannabis attitudes among users already in favor of e-cigarettes.