Abstract:The problem of recovering a signal $\boldsymbol{x} \in \mathbb{R}^n$ from a quadratic system $\{y_i=\boldsymbol{x}^\top\boldsymbol{A}_i\boldsymbol{x},\ i=1,\ldots,m\}$ with full-rank matrices $\boldsymbol{A}_i$ frequently arises in applications such as unassigned distance geometry and sub-wavelength imaging. With i.i.d. standard Gaussian matrices $\boldsymbol{A}_i$, this paper addresses the high-dimensional case where $m\ll n$ by incorporating prior knowledge of $\boldsymbol{x}$. First, we consider a $k$-sparse $\boldsymbol{x}$ and introduce the thresholded Wirtinger flow (TWF) algorithm that does not require the sparsity level $k$. TWF comprises two steps: the spectral initialization that identifies a point sufficiently close to $\boldsymbol{x}$ (up to a sign flip) when $m=O(k^2\log n)$, and the thresholded gradient descent (with a good initialization) that produces a sequence linearly converging to $\boldsymbol{x}$ with $m=O(k\log n)$ measurements. Second, we explore the generative prior, assuming that $\boldsymbol{x}$ lies in the range of an $L$-Lipschitz continuous generative model with $k$-dimensional inputs in an $\ell_2$-ball of radius $r$. We develop the projected gradient descent (PGD) algorithm that also comprises two steps: the projected power method that provides an initial vector with $O\big(\sqrt{\frac{k \log L}{m}}\big)$ $\ell_2$-error given $m=O(k\log(Lnr))$ measurements, and the projected gradient descent that refines the $\ell_2$-error to $O(\delta)$ at a geometric rate when $m=O(k\log\frac{Lrn}{\delta^2})$. Experimental results corroborate our theoretical findings and show that: (i) our approach for the sparse case notably outperforms the existing provable algorithm sparse power factorization; (ii) leveraging the generative prior allows for precise image recovery in the MNIST dataset from a small number of quadratic measurements.
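The thresholded gradient step described above can be illustrated with a minimal sketch. It assumes a least-squares loss $f(\boldsymbol{z})=\frac{1}{2m}\sum_i(\boldsymbol{z}^\top\boldsymbol{A}_i\boldsymbol{z}-y_i)^2$ and a fixed hard threshold `tau`; both are illustrative assumptions, and the paper's actual loss, step size, and thresholding rule may differ. All function names are hypothetical.

```python
def quad_form(A, z):
    """Return z^T A z for a square matrix A (list of rows) and a vector z."""
    n = len(z)
    return sum(z[i] * A[i][j] * z[j] for i in range(n) for j in range(n))


def loss_gradient(As, y, z):
    """Gradient of the assumed loss f(z) = (1/(2m)) * sum_i (z^T A_i z - y_i)^2.

    Each measurement contributes (z^T A_i z - y_i) * (A_i + A_i^T) z / m.
    """
    m, n = len(As), len(z)
    grad = [0.0] * n
    for A, yi in zip(As, y):
        residual = quad_form(A, z) - yi
        for p in range(n):
            sym_row = sum((A[p][q] + A[q][p]) * z[q] for q in range(n))
            grad[p] += residual * sym_row / m
    return grad


def hard_threshold(v, tau):
    """Zero out entries whose magnitude is at most tau (assumed rule)."""
    return [vi if abs(vi) > tau else 0.0 for vi in v]


def thresholded_step(As, y, z, step, tau):
    """One gradient step followed by hard thresholding."""
    g = loss_gradient(As, y, z)
    return hard_threshold([zi - step * gi for zi, gi in zip(z, g)], tau)
```

With noiseless measurements the true signal has zero residuals, so it is a fixed point of `thresholded_step` whenever `tau` is below its smallest nonzero magnitude.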
Abstract:Driver distraction has become a significant cause of severe traffic accidents over the past decade. Despite the growing development of vision-driven driver monitoring systems, the lack of comprehensive perception datasets restricts road safety and traffic security. In this paper, we present an AssIstive Driving pErception dataset (AIDE) that considers context information both inside and outside the vehicle in naturalistic scenarios. AIDE facilitates holistic driver monitoring through three distinctive characteristics, including multi-view settings of driver and scene, multi-modal annotations of face, body, posture, and gesture, and four pragmatic task designs for driving understanding. To thoroughly explore AIDE, we provide experimental benchmarks on three kinds of baseline frameworks via extensive methods. Moreover, two fusion strategies are introduced to give new insights into learning effective multi-stream/modal representations. We also systematically investigate the importance and rationality of the key components in AIDE and benchmarks. The project link is https://github.com/ydk122024/AIDE.
Abstract:Purpose: To improve the quality of quantitative MR images recovered from undersampled measurements, we incorporate the signal model of the variable-flip-angle (VFA) multi-echo 3D gradient-echo (GRE) method into the reconstruction of $T_1$, $T_2^*$ and proton density (PD) maps. Additionally, we investigate the use of complementary undersampling patterns to determine optimal undersampling schemes for quantitative MRI. Theory: We propose a probabilistic Bayesian formulation of the recovery problem. Our proposed approach, approximate message passing with built-in parameter estimation (AMP-PE), enables the joint recovery of distribution parameters, VFA multi-echo images, and $T_1$, $T_2^*$, and PD maps without the need for hyperparameter tuning. Methods: We conducted both retrospective and prospective undersampling to obtain Fourier measurements using variable-density and Poisson-disk patterns. We investigated a variety of undersampling schemes, adopting complementary patterns across different flip angles and/or echo times. Results: AMP-PE, which adopts a joint recovery strategy, outperforms the state-of-the-art $\ell_1$-norm minimization approach, which follows a decoupled recovery strategy. For $T_1$ mapping, employing fixed sampling patterns across different echo times produced the best performance, whereas for $T_2^*$ and PD mapping, using complementary sampling patterns across different flip angles yielded the best performance. Conclusion: AMP-PE achieves better performance by combining information from both the MR signal model and the sparse prior on VFA multi-echo images. It is equipped with automatic and adaptive parameter estimation, and works naturally with the clinical prospective undersampling scheme.
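As a rough illustration of the kind of signal model referred to above, the sketch below evaluates the standard spoiled gradient-echo steady-state equation with mono-exponential $T_2^*$ decay. It assumes ideal spoiling and a magnitude signal; the paper's exact parameterization may differ, and the function names are hypothetical.

```python
import math


def vfa_gre_signal(pd, t1, t2_star, flip_angle_deg, tr, te):
    """Spoiled GRE magnitude signal for one flip angle and one echo time.

    S = PD * sin(a) * (1 - E1) / (1 - E1 * cos(a)) * exp(-TE / T2*),
    with E1 = exp(-TR / T1). Times share one unit (e.g. ms).
    """
    alpha = math.radians(flip_angle_deg)
    e1 = math.exp(-tr / t1)
    steady_state = math.sin(alpha) * (1.0 - e1) / (1.0 - e1 * math.cos(alpha))
    return pd * steady_state * math.exp(-te / t2_star)


def ernst_angle_deg(t1, tr):
    """Flip angle that maximizes the steady-state signal: cos(a) = E1."""
    return math.degrees(math.acos(math.exp(-tr / t1)))
```

Sampling this function over several flip angles and echo times gives the VFA multi-echo signal from which $T_1$, $T_2^*$, and PD are estimated.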
Abstract:Context-Aware Emotion Recognition (CAER) is a crucial and challenging task that aims to perceive the emotional states of the target person with contextual information. Recent approaches invariably focus on designing sophisticated architectures or mechanisms to extract seemingly meaningful representations from subjects and contexts. However, a long-overlooked issue is that a context bias in existing datasets leads to a significantly unbalanced distribution of emotional states among different context scenarios. Concretely, the harmful bias is a confounder that misleads existing models to learn spurious correlations based on conventional likelihood estimation, significantly limiting the models' performance. To tackle the issue, this paper provides a causality-based perspective to disentangle the models from the impact of such bias, and formulates the causalities among variables in the CAER task via a tailored causal graph. Then, we propose a Contextual Causal Intervention Module (CCIM) based on the backdoor adjustment to de-confound the confounder and exploit the true causal effect for model training. CCIM is a plug-in, model-agnostic module that improves diverse state-of-the-art approaches by considerable margins. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our CCIM and the significance of causal insight.
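The backdoor adjustment mentioned above replaces the observational quantity $\sum_z P(y\mid x,z)P(z\mid x)$ with the interventional one, $P(y\mid \mathrm{do}(x))=\sum_z P(y\mid x,z)P(z)$. The toy sketch below contrasts the two on a discrete confounder; the context names and probabilities are made-up illustrative values, not results from the paper, and this is not the CCIM implementation.

```python
def backdoor_adjust(p_y_given_x_z, p_z):
    """P(y | do(x)) = sum_z P(y | x, z) * P(z) for a discrete confounder z."""
    return sum(p_y_given_x_z[z] * p_z[z] for z in p_z)


def observational(p_y_given_x_z, p_z_given_x):
    """P(y | x) = sum_z P(y | x, z) * P(z | x), which inherits the context bias."""
    return sum(p_y_given_x_z[z] * p_z_given_x[z] for z in p_z_given_x)


# Hypothetical numbers: probability of the label "happy" for a given subject x.
p_happy_given_x_z = {"party": 0.8, "funeral": 0.2}
p_context = {"party": 0.5, "funeral": 0.5}            # prior over contexts
p_context_given_x = {"party": 0.9, "funeral": 0.1}    # biased co-occurrence

p_do = backdoor_adjust(p_happy_given_x_z, p_context)
p_obs = observational(p_happy_given_x_z, p_context_given_x)
```

In this toy setting the observational estimate is inflated by the skewed context distribution, while the adjusted estimate weights each context by its prior.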
Abstract:In wearable sensing applications, data are inevitably irregularly sampled or partially missing, which poses challenges for any downstream application. A unique aspect of wearable data is that they are time series in which each channel can be correlated with another, such as the x, y, and z axes of an accelerometer. We argue that traditional methods have rarely made use of both the time-series dynamics of the data and the relatedness of features from different sensors. We propose a model, termed DynImp, that handles missingness at different time points with nearest neighbors along the feature axis and then feeds the data into an LSTM-based denoising autoencoder that reconstructs missingness along the time axis. We evaluate the model in the extreme missingness scenario ($>50\%$ missing rate), which has not been widely tested on wearable data. Our experiments on activity recognition show that the method can exploit the multi-modality features from related sensors and also learn from historical time-series dynamics to reconstruct the data under extreme missingness.
Abstract:Medical events of interest, such as mortality, often happen at a low rate in electronic medical records, as most admitted patients survive. Training models with this imbalance rate (class density discrepancy) may lead to suboptimal prediction. Traditionally, this problem is addressed through ad hoc methods such as resampling or reweighting, but performance in many cases is still limited. We propose a framework for training models under this imbalance: 1) we first decouple the feature extraction and classification process, adjusting training batches separately for each component to mitigate bias caused by class density discrepancy; 2) we train the network with both a density-aware loss and a learnable cost matrix for misclassifications. We demonstrate our model's performance on real-world medical datasets (TOPCAT and MIMIC-III), showing improved AUC-ROC, AUC-PRC, and Brier Skill Score compared with the baselines in the domain.
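Of the metrics listed above, the Brier Skill Score is the least commonly spelled out. The sketch below uses the usual definition, $\mathrm{BSS}=1-\mathrm{BS}/\mathrm{BS}_{\mathrm{ref}}$, with the reference forecast taken to be the empirical event rate; that choice of reference is an assumption, since the abstract does not state which one the paper uses.

```python
def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def brier_skill_score(probs, labels):
    """1 - BS / BS_ref, where the reference always predicts the base rate.

    Undefined (division by zero) when all labels are identical.
    """
    base_rate = sum(labels) / len(labels)
    reference = brier_score([base_rate] * len(labels), labels)
    return 1.0 - brier_score(probs, labels) / reference
```

A score of 1 corresponds to perfect probabilistic predictions, 0 to no improvement over predicting the base rate, and negative values to predictions worse than the base rate, which makes the metric informative under heavy class imbalance.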
Abstract:Purpose: It has been challenging to recover QSM in the presence of phase errors, which can be caused by noise or by strong local susceptibility shifts in cases of brain hemorrhage and calcification. We propose a Bayesian formulation for QSM where a two-component Gaussian-mixture distribution is used to model the long-tailed noise (error) distribution, and design an approximate message passing (AMP) algorithm with automatic and adaptive parameter estimation. Theory: Wavelet coefficients of the susceptibility map follow the Laplace distribution. The measurement noise follows a two-component Gaussian-mixture distribution where the second Gaussian component models the noise outliers. The distribution parameters are treated as unknown variables and jointly recovered with the susceptibility using AMP. Methods: The proposed AMP with parameter estimation (AMP-PE) is compared with the state-of-the-art nonlinear L1-QSM and MEDI approaches that adopt the L1-norm and L2-norm data-fidelity terms respectively. The three approaches are tested on the Sim2Snr1 data from QSM challenge 2.0 and on in vivo data from both healthy and hemorrhage scans. Results: On the simulated Sim2Snr1 dataset, AMP-PE achieved the lowest NRMSE and SSIM, MEDI achieved the lowest HFEN, and each approach also has its own strong suit when it comes to various local evaluation metrics. On the in vivo dataset, AMP-PE is better at preserving structural details and removing streaking artifacts than L1-QSM and MEDI. Conclusion: By leveraging a customized Gaussian-mixture noise prior, AMP-PE achieves better performance on the challenging QSM cases involving hemorrhage and calcification. It is equipped with built-in parameter estimation, which avoids subjective bias from the usual visual fine-tuning step of in vivo reconstruction.
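To make the two-component noise model concrete, the following sketch evaluates a Gaussian-mixture density and the posterior probability that a given error came from the outlier component. It assumes both components are zero-mean and real-valued; the parameter names and example values are illustrative, and in the paper these parameters are estimated jointly by AMP rather than fixed by hand.

```python
import math


def gaussian_pdf(e, sigma):
    """Zero-mean Gaussian density with standard deviation sigma."""
    return math.exp(-0.5 * (e / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(e, weight_outlier, sigma_inlier, sigma_outlier):
    """Two-component mixture: a narrow inlier and a wide outlier component."""
    return ((1.0 - weight_outlier) * gaussian_pdf(e, sigma_inlier)
            + weight_outlier * gaussian_pdf(e, sigma_outlier))


def outlier_responsibility(e, weight_outlier, sigma_inlier, sigma_outlier):
    """Posterior probability that error e was drawn from the outlier component."""
    numerator = weight_outlier * gaussian_pdf(e, sigma_outlier)
    return numerator / mixture_pdf(e, weight_outlier, sigma_inlier, sigma_outlier)
```

Small errors are attributed almost entirely to the inlier component, while large errors are attributed to the wide component, which is what lets the mixture absorb long-tailed phase errors without inflating the inlier variance.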
Abstract:Unknown-view tomography (UVT) reconstructs a 3D density map from its 2D projections at unknown, random orientations. A line of work starting with Kam (1980) employs the method of moments (MoM) with rotation-invariant Fourier features to solve UVT in the frequency domain, assuming that the orientations are uniformly distributed. This line of work includes the recent orthogonal matrix retrieval (OMR) approaches based on matrix factorization, which, while elegant, either require side information about the density that is not available, or fail to be sufficiently robust. In order for OMR to break free from those restrictions, we propose to jointly recover the density map and the orthogonal matrices by requiring that they be mutually consistent. We regularize the resulting non-convex optimization problem by a denoised reference projection and a nonnegativity constraint. This is enabled by the new closed-form expressions for spatial autocorrelation features. Further, we design an easy-to-compute initial density map which effectively mitigates the non-convexity of the reconstruction problem. Experimental results show that the proposed OMR with spatial consensus is more robust and performs significantly better than the previous state-of-the-art OMR approach in the typical low-SNR scenario of 3D UVT.
Abstract:Designing efficient sparse recovery algorithms that can handle noisy quantized measurements is important in a variety of applications -- from radar to source localization, spectrum sensing and wireless networking. We take advantage of the approximate message passing (AMP) framework to achieve this goal given its high computational efficiency and state-of-the-art performance. In AMP, the signal of interest is assumed to follow a certain prior distribution with unknown parameters. Previous works focused on finding the parameters that maximize the measurement likelihood via expectation maximization -- an increasingly difficult problem to solve in cases involving complicated probability models. In this paper, we treat the parameters as unknown variables and compute their posteriors via AMP. The parameters and signal of interest can then be jointly recovered. Compared to previous methods, the proposed approach leads to a simple and elegant parameter estimation scheme, allowing us to work directly with the 1-bit quantization noise model. We then further extend our approach to the general multi-bit quantization noise model. Experimental results show that the proposed framework provides significant improvement over state-of-the-art methods across a wide range of sparsity and noise levels.
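A minimal sketch of a 1-bit quantization noise model is given below. It assumes additive Gaussian noise applied before the sign quantizer, $y=\operatorname{sign}(z+w)$ with $w\sim\mathcal{N}(0,\sigma^2)$, so that $P(y\mid z)=\Phi(yz/\sigma)$; the paper's exact noise model may differ, and the function names are hypothetical.

```python
import math


def std_normal_cdf(t):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def quantize_one_bit(v):
    """Sign quantizer mapping a real value to +1 or -1 (0 maps to +1)."""
    return 1 if v >= 0 else -1


def one_bit_likelihood(y, z, sigma):
    """P(y | z) for y = sign(z + w), w ~ N(0, sigma^2), with y in {-1, +1}."""
    return std_normal_cdf(y * z / sigma)
```

This per-measurement likelihood is the quantity an AMP-style output step would combine with its Gaussian estimate of the noiseless measurement $z$.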
Abstract:In many machine learning tasks, input features with varying degrees of predictive capability are acquired at varying costs. In order to optimize the performance-cost trade-off, one would select features to observe a priori. However, given the changing context with previous observations, the subset of predictive features to select may change dynamically. Therefore, we face the challenging new problem of foresight dynamic selection (FDS): finding a dynamic and light-weight policy to decide which features to observe next, before actually observing them, for overall performance-cost trade-offs. To tackle FDS, this paper proposes a Bayesian learning framework of Variational Foresight Dynamic Selection (VFDS). VFDS learns a policy that selects the next feature subset to observe, by optimizing a variational Bayesian objective that characterizes the trade-off between model performance and feature cost. At its core is an implicit variational distribution on binary gates that are dependent on previous observations, which will select the next subset of features to observe. We apply VFDS to the Human Activity Recognition (HAR) task, where the performance-cost trade-off is critical in practice. Extensive results demonstrate that VFDS selects different features under changing contexts, notably saving sensory costs while maintaining or improving the HAR accuracy. Moreover, the features that VFDS dynamically selects are shown to be interpretable and associated with the different activity types. We will release the code.