Abstract:Optics-guided Thermal UAV image Super-Resolution (OTUAV-SR) has attracted significant research interest due to its potential applications in security inspection, agricultural measurement, and object detection. Existing methods often employ a single guidance model to generate guidance features from optical images to assist thermal UAV image super-resolution. However, a single guidance model struggles to generate effective guidance features under both favorable and adverse conditions in UAV scenarios, which limits the performance of OTUAV-SR. To address this issue, we propose a novel Guidance Disentanglement network (GDNet), which disentangles the optical image representation according to typical UAV scenario attributes to form guidance features under both favorable and adverse conditions, for robust OTUAV-SR. Moreover, we design an attribute-aware fusion module that combines all attribute-based optical guidance features, forming a more discriminative representation that fits the attribute-agnostic guidance process. To facilitate OTUAV-SR research in complex UAV scenarios, we introduce VGTSR2.0, a large-scale benchmark dataset containing 3,500 aligned optical-thermal image pairs captured under diverse conditions and scenes. Extensive experiments on VGTSR2.0 demonstrate that GDNet significantly improves OTUAV-SR performance over state-of-the-art methods, especially in the challenging low-light and foggy environments commonly encountered in UAV scenarios. The dataset and code will be publicly available at https://github.com/Jocelyney/GDNet.
Abstract:Table-based Question Answering (TQA) involves answering questions based on tabular data. The complexity of table structures and question logic makes this task difficult even for Large Language Models (LLMs). This paper improves TQA performance by leveraging LLMs' reasoning capabilities. Inspired by how humans solve TQA tasks, we propose a Seek-and-Solve pipeline that instructs the LLM to first seek relevant information and then answer questions. The two stages are integrated at the reasoning level, and their Chain of Thought (CoT) paths are integrated into a coherent Seek-and-Solve CoT (SS-CoT). Furthermore, we present a compact single-stage TQA-solving prompt distilled from the pipeline. Experiments demonstrate that under In-Context Learning settings, using samples with SS-CoT paths as demonstrations, the TQA-solving prompt can effectively guide the LLM to solve complex TQA tasks, resulting in improved performance and reliability. Our results highlight the importance of properly eliciting LLMs' reasoning capabilities in solving complex TQA tasks.
Abstract:This paper investigates the capabilities and effectiveness of backward sensing centered on reconfigurable intelligent surfaces (RISs). We demonstrate that the direction of arrival (DoA) estimation of incident waves in the far-field regime can be accomplished using a single RIS by leveraging configurational diversity. Furthermore, we identify that the spatial diversity achieved through deploying multiple RISs enables accurate localization of multiple power sources. Physically accurate and mathematically concise models are introduced to characterize forward signal aggregations via RISs. By employing linearized approximations inherent in the far-field region, the measurement process for various configurations can be expressed as a system of linear equations. The mathematical essence of backward sensing lies in solving this system. A theoretical framework for determining key performance indicators is established through condition number analysis of the sensing operators. In the context of localization using multiple RISs, we examine relationships among the rank of sensing operators, the size of the region of interest (RoI), and the number of elements and measurements. For DoA estimations, we provide an upper bound for the relative error of the least squares reconstruction algorithm. These quantitative analyses offer essential insights for system design and optimization. Numerical experiments validate our findings. To demonstrate the practicality of our proposed RIS-centric sensing approach, we develop a proof-of-concept prototype using universal software radio peripherals (USRP) and employ a magnitude-only reconstruction algorithm tailored for this system. To our knowledge, this represents the first trial of its kind.
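The abstract above reduces backward sensing to solving a linear system and bounds the least-squares error through the condition number of the sensing operator. The following is a minimal sketch of that idea only, not the paper's model or its bound: it assumes a small real-valued toy operator `A` (real RIS measurements would be complex-valued), hypothetical values for `x_true` and `noise`, and the textbook bound that holds for a consistent full-rank system, relative error <= cond(A) * ||noise|| / ||y||.

```python
import math

def gram(A):
    """Entries (a, b, d) of the 2x2 Gram matrix A^T A for an m x 2 matrix A."""
    a = sum(r[0] * r[0] for r in A)
    b = sum(r[0] * r[1] for r in A)
    d = sum(r[1] * r[1] for r in A)
    return a, b, d

def condition_number(A):
    """2-norm condition number of A, from the eigenvalues of A^T A."""
    a, b, d = gram(A)
    half_trace = (a + d) / 2.0
    root = math.sqrt(max(half_trace ** 2 - (a * d - b * b), 0.0))
    return math.sqrt((half_trace + root) / (half_trace - root))

def least_squares(A, y):
    """Solve min ||A x - y|| via the normal equations (fine for a tiny toy case)."""
    a, b, d = gram(A)
    r0 = sum(row[0] * yi for row, yi in zip(A, y))
    r1 = sum(row[1] * yi for row, yi in zip(A, y))
    det = a * d - b * b
    return [(d * r0 - b * r1) / det, (a * r1 - b * r0) / det]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

# Hypothetical sensing operator: 3 measurements (configurations), 2 unknowns.
A = [[1.0, 0.2], [0.3, 1.0], [0.8, 0.9]]
x_true = [2.0, -1.0]
y = [row[0] * x_true[0] + row[1] * x_true[1] for row in A]

# Hypothetical measurement noise.
noise = [0.01, -0.02, 0.015]
y_noisy = [yi + ni for yi, ni in zip(y, noise)]

x_hat = least_squares(A, y_noisy)
rel_err = norm([xh - xt for xh, xt in zip(x_hat, x_true)]) / norm(x_true)
bound = condition_number(A) * norm(noise) / norm(y)
print(f"relative error = {rel_err:.4f}, bound = {bound:.4f}")
```

A poorly conditioned operator inflates the bound, which is one way to see why the choice of configurations and the number of measurements matter for system design.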
Abstract:Reconfigurable intelligent surfaces (RISs) have emerged as a promising auxiliary technology for radio frequency imaging. However, existing works face challenges of faint and intricate back-scattered waves and the restricted field-of-view (FoV), both resulting from complex target structures and a limited number of antennas. The synergistic benefits of multi-RIS-aided imaging hold promise for addressing these challenges. Here, we propose a dual-RIS-aided imaging system, Dreamer, which operates collaboratively in complementary modes (reflection-mode and transmission-mode). Dreamer significantly expands the FoV and enhances perception by deploying dual-RIS across various spatial and measurement patterns. Specifically, we perform a fine-grained analysis of how radio-frequency (RF) signals encode scene information in the scattered object modeling. Based on this modeling, we design illumination strategies to balance spatial resolution and observation scale, and implement a prototype system in a typical indoor environment. Moreover, we design a novel artificial neural network with a CNN-external-attention mechanism to translate RF signals into high-resolution images of human contours. Our approach achieves an impressive SSIM score exceeding 0.83, validating its effectiveness in broadening perception modes and enhancing imaging capabilities. The code to reproduce our results is available at https://github.com/fuhaiwang/Dreamer.
Abstract:In contextual optimization, a decision-maker observes historical samples of uncertain variables and associated concurrent covariates, without knowing their joint distribution. Given an additional covariate observation, the goal is to choose a decision that minimizes some operational costs. A prevalent issue here is covariate shift, where the marginal distribution of the new covariate differs from that of the historical samples, leading to variations in decision performance with nonparametric or parametric estimators. To address this, we propose a distributionally robust approach that uses an ambiguity set defined as the intersection of two Wasserstein balls, each centered on a typical nonparametric or parametric distribution estimator. Computationally, we establish a tractable reformulation of this distributionally robust optimization problem. Statistically, we provide guarantees for our Wasserstein ball intersection approach under covariate shift by analyzing the measure concentration of the estimators. Furthermore, to reduce computational complexity, we employ a surrogate objective that maintains similar generalization guarantees. Through synthetic and empirical case studies on income prediction and portfolio optimization, we demonstrate the strong empirical performance of our proposed models.
Abstract:In this paper, we propose a 3D multi-RIS-aided wireless imaging framework for the distributed placement of multi-sensor networks. The system creates a randomized reflection pattern by adjusting the RIS phase shift, enabling the receiver to capture signals within the designated space of interest (SoI). First, we propose a multi-RIS-aided linear imaging channel model. We then introduce a theoretical framework of computational imaging to recover the signal strength distribution of the SoI. For the RIS-aided imaging system, we analyze the impact of multiple parameters on imaging performance. The simulation results verify the correctness of the proposal. Furthermore, we propose an amplitude-only imaging algorithm for the RIS-aided imaging system to mitigate the problem of phase unpredictability. Finally, we verify the performance of the imaging algorithm through proof-of-concept experiments under reasonable parameter settings.
Abstract:In real Mobility-on-Demand (MoD) systems, demand is subject to high and dynamic volatility, which is difficult to predict with conventional time-series forecasting approaches. Most existing forecasting approaches yield a point value as the prediction result, which ignores the uncertainty in the forecast. Given the high volatility of demand, this can cause the forecast to deviate severely from the true demand value. To fill the gap, we propose an extended recurrent mixture density network (XRMDN), which extends the weight and mean neural networks to recurrent neural networks. The recurrent neurons for mean and variance can capture the trend of the historical data series, which enables better forecasting under dynamic, high volatility. We conduct comprehensive experiments on two real MoD data sets, one of taxi trip records and one of bike sharing, to validate the performance of XRMDN. Specifically, we compare our model to three types of benchmark models (statistical, machine learning, and deep learning models) on three evaluation metrics. The validation results show that XRMDN outperforms the three groups of benchmark models in terms of the evaluation metrics. Most importantly, XRMDN substantially improves forecasting accuracy for demand with strong volatility. Finally, this probabilistic demand forecasting model contributes not only to demand prediction in MoD systems but also to other optimization problems in MoD applications, especially optimization under uncertainty.
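To illustrate the difference between a point forecast and the probabilistic output that a mixture density network produces, the sketch below evaluates a Gaussian mixture directly. It is not XRMDN itself: the recurrent networks that would produce the mixture parameters are omitted, and the function names and the example weights, means, and standard deviations are hypothetical.

```python
import math

def gaussian_pdf(y, mean, std):
    """Density of a univariate normal distribution at y."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def mixture_nll(y, weights, means, stds):
    """Negative log-likelihood of y under a Gaussian mixture.

    This is the usual training loss of a mixture density network; here the
    parameters are given directly instead of being predicted by a network.
    """
    density = sum(w * gaussian_pdf(y, m, s)
                  for w, m, s in zip(weights, means, stds))
    return -math.log(density)

def mixture_mean_and_variance(weights, means, stds):
    """Point forecast (mean) and its uncertainty (variance) for the mixture."""
    mean = sum(w * m for w, m in zip(weights, means))
    second_moment = sum(w * (s * s + m * m)
                        for w, m, s in zip(weights, means, stds))
    return mean, second_moment - mean * mean

# Hypothetical two-regime demand forecast: a calm regime and a surge regime.
weights, means, stds = [0.5, 0.5], [10.0, 30.0], [2.0, 2.0]
mean, variance = mixture_mean_and_variance(weights, means, stds)
print(f"mean demand = {mean:.1f}, variance = {variance:.1f}")
print(f"NLL of observing 28 trips = {mixture_nll(28.0, weights, means, stds):.3f}")
```

A point forecast would report only the mean of 20, a value that neither regime actually predicts; the mixture keeps both modes and quantifies the spread.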
Abstract:We present a method for pretraining a recurrent mixture density network (RMDN). We also propose a slight modification to the architecture of the RMDN-GARCH proposed by Nikolaev et al. [2012]. The pretraining method helps the RMDN avoid bad local minima during training and improves its robustness to the persistent NaN problem, as defined by Guillaumes [2017], which is often encountered with mixture density networks. This problem consists of frequently obtaining "Not a number" (NaN) values during training. The proposed pretraining method resolves these issues by training the linear nodes in the hidden layer of the RMDN before starting to include non-linear node updates. This approach improves the performance of the RMDN and ensures that it surpasses that of the GARCH model, which is the RMDN's linear counterpart.
Abstract:Lip movement information is critical for many audio-visual tasks. However, extracting lip movement information from videos is challenging, as it is easily perturbed by factors such as personal identity and head pose. This paper proposes using a parametric 3D face model to disentangle lip movement information explicitly. Building on recent advances in 3D face reconstruction, we first offer a method that can consistently disentangle expression information, where the lip movement information lies. We then demonstrate that once the influence of perturbing factors is alleviated by synthesizing faces with the disentangled lip movement information, the lip-sync task can be performed better with much less data. Finally, we show the method's effectiveness in the wild by testing it on an unseen dataset for the active speaker detection task and achieving competitive performance.
Abstract:Momentive offers solutions in market research, customer experience, and enterprise feedback. The technology is gleaned from the billions of real responses to questions asked on the platform. However, people may create biased questions. A double-barreled question (DBQ) is a common type of biased question that asks about two aspects in one question. For example: "Do you agree with the statement: The food is yummy, and the service is great." This DBQ confuses survey respondents because the question has two parts. DBQs impact both the survey respondents and the survey owners. Momentive aims to detect DBQs and recommend that survey creators revise them, in order to gather high-quality, unbiased survey data. Previous research has suggested detecting DBQs by checking for the existence of a grammatical conjunction. While this is a simple rule-based approach, it is error-prone because conjunctions can also exist in properly constructed questions. We present an end-to-end machine learning approach for DBQ classification in this work. We handled the imbalanced data using active learning, and compared state-of-the-art embedding algorithms for transforming text data into vectors. Furthermore, we proposed a model interpretation technique that propagates the vector-level SHAP values to a SHAP value for each word in the questions. We concluded that the word2vec subword embedding with maximum pooling is the optimal word embedding representation in terms of precision and running time in the offline experiments using the survey data at Momentive. The A/B test and production metrics indicate that this model brings a positive change to the business. To the best of our knowledge, this is the first machine learning framework for DBQ detection, and it successfully differentiates Momentive from its competitors. We hope our work sheds light on machine learning approaches for biased question detection.
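The abstract above contrasts a conjunction-based rule with a learned classifier over max-pooled word embeddings. The sketch below shows, under stated assumptions, why the rule alone is error-prone and what maximum pooling does to a set of word vectors. The conjunction list, the second example question, and the toy 2-dimensional vectors are all hypothetical; this is not Momentive's model or data.

```python
import re

# Assumed, deliberately small list of coordinating conjunctions.
CONJUNCTIONS = {"and", "or"}

def has_conjunction(question: str) -> bool:
    """Rule-based baseline: flag a question if it contains a conjunction."""
    tokens = re.findall(r"[a-z']+", question.lower())
    return any(token in CONJUNCTIONS for token in tokens)

def max_pool(word_vectors):
    """Element-wise maximum over word vectors, giving one question vector."""
    return [max(dimension) for dimension in zip(*word_vectors)]

# The double-barreled example quoted in the abstract.
dbq = "Do you agree with the statement: The food is yummy, and the service is great."
# Hypothetical well-formed question that still contains a conjunction.
single = "How satisfied are you with our research and development team?"

print(has_conjunction(dbq))     # flagged, correctly
print(has_conjunction(single))  # also flagged: a false positive of the rule

# Toy embeddings (hypothetical values) pooled into a fixed-size vector that
# a downstream classifier could consume regardless of question length.
print(max_pool([[0.1, -0.5], [0.3, -0.9], [0.2, -0.7]]))
```

Because both questions trip the rule, a classifier needs features beyond the mere presence of a conjunction, which is the motivation for the embedding-based approach.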