Abstract:Drought is one of the most destructive and expensive natural disasters, severely impacting natural resources and risks by depleting water resources and diminishing agricultural yields. Under climate change, accurately predicting drought is critical for mitigating drought-induced risks. However, the intricate interplay among the physical and biological drivers that regulate droughts limits the predictability and understanding of drought, particularly at a subseasonal to seasonal (S2S) time scale. While deep learning has been demonstrated with potential in addressing climate forecasting challenges, its application to drought prediction has received relatively less attention. In this work, we propose a new dataset, DroughtSet, which integrates relevant predictive features and three drought indices from multiple remote sensing and reanalysis datasets across the contiguous United States (CONUS). DroughtSet specifically provides the machine learning community with a new real-world dataset to benchmark drought prediction models and more generally, time-series forecasting methods. Furthermore, we propose a spatial-temporal model SPDrought to predict and interpret S2S droughts. Our model learns from the spatial and temporal information of physical and biological features to predict three types of droughts simultaneously. Multiple strategies are employed to quantify the importance of physical and biological features for drought prediction. Our results provide insights for researchers to better understand the predictability and sensitivity of drought to biological and physical conditions. We aim to contribute to the climate field by proposing a new tool to predict and understand the occurrence of droughts and provide the AI community with a new benchmark to study deep learning applications in climate science.
Abstract:As machine learning (ML) algorithms are used in applications that involve humans, concerns have arisen that these algorithms may be biased against certain social groups. \textit{Counterfactual fairness} (CF) is a fairness notion proposed in Kusner et al. (2017) that measures the unfairness of ML predictions; it requires that the prediction perceived by an individual in the real world has the same marginal distribution as it would be in a counterfactual world, in which the individual belongs to a different group. Although CF ensures fair ML predictions, it fails to consider the downstream effects of ML predictions on individuals. Since humans are strategic and often adapt their behaviors in response to the ML system, predictions that satisfy CF may not lead to a fair future outcome for the individuals. In this paper, we introduce \textit{lookahead counterfactual fairness} (LCF), a fairness notion accounting for the downstream effects of ML models which requires the individual \textit{future status} to be counterfactually fair. We theoretically identify conditions under which LCF can be satisfied and propose an algorithm based on the theorems. We also extend the concept to path-dependent fairness. Experiments on both synthetic and real data validate the proposed method.
Abstract:Performative prediction (PP) is a framework that captures distribution shifts that occur during the training of machine learning models due to their deployment. As the trained model is used, its generated data could cause the model to evolve, leading to deviations from the original data distribution. The impact of such model-induced distribution shifts in the federated learning (FL) setup remains unexplored despite being increasingly likely to transpire in real-life use cases. Although Jin et al. (2024) recently extended PP to FL in a straightforward manner, the resulting model only converges to a performative stable point, which may be far from optimal. The methods in Izzo et al. (2021); Miller et al. (2021) can find a performative optimal point in centralized settings, but they require the performative risk to be convex and the training data to be noiseless, assumptions often violated in realistic FL systems. This paper overcomes all of these shortcomings and proposes Performative robust optimal Federated Learning (ProFL), an algorithm that finds performative optimal points in FL from noisy and contaminated data. We present the convergence analysis under the Polyak-Lojasiewicz condition, which applies to non-convex objectives. Extensive experiments on multiple datasets validate our proposed algorithms' efficiency.
Abstract:This paper studies algorithmic decision-making under human's strategic behavior, where a decision maker uses an algorithm to make decisions about human agents, and the latter with information about the algorithm may exert effort strategically and improve to receive favorable decisions. Unlike prior works that assume agents benefit from their efforts immediately, we consider realistic scenarios where the impacts of these efforts are persistent and agents benefit from efforts by making improvements gradually. We first develop a dynamic model to characterize persistent improvements and based on this construct a Stackelberg game to model the interplay between agents and the decision-maker. We analytically characterize the equilibrium strategies and identify conditions under which agents have incentives to improve. With the dynamics, we then study how the decision-maker can design an optimal policy to incentivize the largest improvements inside the agent population. We also extend the model to settings where 1) agents may be dishonest and game the algorithm into making favorable but erroneous decisions; 2) honest efforts are forgettable and not sufficient to guarantee persistent improvements. With the extended models, we further examine conditions under which agents prefer honest efforts over dishonest behavior and the impacts of forgettable efforts.
Abstract:Enforcing orthonormal or isometric property for the weight matrices has been shown to enhance the training of deep neural networks by mitigating gradient exploding/vanishing and increasing the robustness of the learned networks. However, despite its practical performance, the theoretical analysis of orthonormality in neural networks is still lacking; for example, how orthonormality affects the convergence of the training process. In this letter, we aim to bridge this gap by providing convergence analysis for training orthonormal deep linear neural networks. Specifically, we show that Riemannian gradient descent with an appropriate initialization converges at a linear rate for training orthonormal deep linear neural networks with a class of loss functions. Unlike existing works that enforce orthonormal weight matrices for all the layers, our approach excludes this requirement for one layer, which is crucial to establish the convergence guarantee. Our results shed light on how increasing the number of hidden layers can impact the convergence speed. Experimental results validate our theoretical analysis.
Abstract:Semi-supervised Learning (SSL) has been proven vulnerable to out-of-distribution (OOD) samples in realistic large-scale unsupervised datasets due to over-confident pseudo-labeling OODs as in-distribution (ID). A key underlying problem is class-wise latent space spreading from closed seen space to open unseen space, and the bias is further magnified in SSL's self-training loops. To close the ID distribution set so that OODs are better rejected for safe SSL, we propose Prototype Fission(PF) to divide class-wise latent spaces into compact sub-spaces by automatic fine-grained latent space mining, driven by coarse-grained labels only. Specifically, we form multiple unique learnable sub-class prototypes for each class, optimized towards both diversity and consistency. The Diversity Modeling term encourages samples to be clustered by one of the multiple sub-class prototypes, while the Consistency Modeling term clusters all samples of the same class to a global prototype. Instead of "opening set", i.e., modeling OOD distribution, Prototype Fission "closes set" and makes it hard for OOD samples to fit in sub-class latent space. Therefore, PF is compatible with existing methods for further performance gains. Extensive experiments validate the effectiveness of our method in open-set SSL settings in terms of successfully forming sub-classes, discriminating OODs from IDs and improving overall accuracy. Codes will be released.
Abstract:Detecting critical nodes in sparse networks is important in a variety of application domains. A Critical Node Problem (CNP) aims to find a set of critical nodes from a network whose deletion maximally degrades the pairwise connectivity of the residual network. Due to its general NP-hard nature, state-of-the-art CNP solutions are based on heuristic approaches. Domain knowledge and trial-and-error are usually required when designing such approaches, thus consuming considerable effort and time. This work proposes a feature importance-aware graph attention network for node representation and combines it with dueling double deep Q-network to create an end-to-end algorithm to solve CNP for the first time. It does not need any problem-specific knowledge or labeled datasets as required by most of existing methods. Once the model is trained, it can be generalized to cope with various types of CNPs (with different sizes and topological structures) without re-training. Extensive experiments on 28 real-world networks show that the proposed method is highly comparable to state-of-the-art methods. It does not require any problem-specific knowledge and, hence, can be applicable to many applications including those impossible ones by using the existing approaches. It can be combined with some local search methods to further improve its solution quality. Extensive comparison results are given to show its effectiveness in solving CNP.