Abstract:Augmented Reality (AR) assistance systems are increasingly popular for supporting users in tasks such as assembly and cooking. However, current systems typically provide reactive responses initiated by user requests, without considering rich contextual and user-specific information. To address this limitation, we propose a novel AR assistance system, Satori, that models both user states and environmental contexts to deliver proactive guidance. Our system combines the Belief-Desire-Intention (BDI) model with a state-of-the-art multi-modal large language model (LLM) to infer contextually appropriate guidance. The design is informed by two formative studies involving twelve experts. A sixteen-participant within-subjects study finds that Satori achieves performance comparable to a designer-created Wizard-of-Oz (WoZ) system without relying on manual configurations or heuristics, thereby enhancing generalizability and reusability and opening up new possibilities for AR assistance.
Abstract:Text presented in augmented reality provides in-situ, real-time information for users. However, this content can be challenging to apprehend quickly when engaging in cognitively demanding AR tasks, especially when it is presented on a head-mounted display. We propose ARTiST, an automatic text simplification system that uses a few-shot prompt and GPT-3 models to specifically optimize the text length and semantic content for augmented reality. Developed out of a formative study that included seven users and three experts, our system combines a customized error calibration model with a few-shot prompt to integrate the syntactic, lexical, elaborative, and content simplification techniques, and generate simplified AR text for head-worn displays. Results from a 16-user empirical study showed that ARTiST lightens the cognitive load and improves performance significantly over both unmodified text and text modified via traditional methods. Our work constitutes a step towards automating the optimization of batch text data for readability and performance in augmented reality.
Abstract:Deep learning (DL) based channel estimation (CE) and multiple-input multiple-output detection (MIMODet), as two separate research topics, have provided convincing evidence of the effectiveness and robustness of artificial intelligence (AI) for receiver design. However, the problem remains of how to unify CE and MIMODet by optimizing the AI architecture to achieve near-optimal detection performance, such as that of the widely considered QR decomposition with M-algorithm (QRM), which can perform close to the maximum likelihood (ML) detector. In this paper, we propose an AI receiver that connects CE and MIMODet in a unified architecture. As a merit, CE and MIMODet adopt only structural input features and conventional neural networks (NNs) and perform end-to-end (E2E) training offline. Numerical results show that, by adopting a simple super-resolution based convolutional neural network (SRCNN) as the channel estimator and a domain-knowledge-enhanced graph neural network (GNN) as the detector, the proposed QRM-enhanced GNN receiver (QRMNet) achieves block error rate (BLER) performance comparable to near-optimal baseline detectors.
Abstract:In this paper, we consider the interference rejection combining (IRC) receiver, which improves cell-edge user throughput by suppressing inter-cell interference and requires accurately estimating the covariance matrix that includes the inter-cell interference. To solve the problem of sample covariance matrix estimation with limited samples, we develop a regularization parameter optimization based on a minimum eigenvalue criterion. Unlike traditional methods that aim to minimize the mean squared error, it directly targets the final performance of the IRC receiver. A lower bound on the minimum eigenvalue that is easier to calculate is also derived. Simulation results demonstrate that the proposed approach is effective and can approach the performance of the oracle estimator in terms of the mutual information metric.
Abstract:To reduce toxic degeneration in a pretrained Language Model (LM), previous work on language model detoxification has focused on reducing the toxicity of the generation itself (self-toxicity) without considering the context. As a result, a type of implicit offensive language, where the generation supports the offensive language in the context, is ignored. Unlike the LM control tasks in previous work, where the desired attributes are fixed for generation, the desired stance of the generation depends on the offensiveness of the context. We therefore propose a novel control method that performs context-dependent detoxification with the stance taken into consideration. We introduce meta prefixes to learn a contextualized stance control strategy and to generate the stance control prefix according to the input context. The generated stance prefix is then combined with the toxicity control prefix to guide the response generation. Experimental results show that our proposed method can effectively learn context-dependent stance control strategies while keeping the self-toxicity of the underlying LM low.
Abstract:Integrating free-text explanations into the in-context learning of large language models (LLMs) has been shown to elicit strong reasoning capabilities along with reasonable explanations. In this paper, we consider the problem of leveraging explanations generated by LLMs to improve the training of small reasoners, which are more favorable for real-world production deployment due to their low cost. We systematically explore three approaches to generating explanations from LLMs and utilize a multi-task learning framework to help small models acquire strong reasoning power together with explanation generation capabilities. Experiments on multiple reasoning tasks show that our method consistently and significantly outperforms finetuning baselines across different settings, and even performs better than finetuning/prompting a 60x larger GPT-3 (175B) model by up to 9.5% in accuracy. As a side benefit, human evaluation further shows that our method can generate high-quality explanations to justify its predictions, moving towards the goal of explainable AI.
Abstract:Building dialogue systems requires a large corpus of annotated dialogues. Such datasets are usually created via crowdsourcing, which is expensive and time-consuming. In this paper, we propose a novel method for dialogue simulation based on language model in-context learning, dubbed \textsc{Dialogic}. Seeded with a few annotated dialogues, \textsc{Dialogic} automatically selects in-context examples for demonstration and prompts GPT-3 to generate new dialogues and their annotations in a controllable way. Leveraging the strong in-context learning ability of GPT-3, our method can be used to rapidly expand a small set of dialogue data without requiring \textit{human involvement} or \textit{parameter update}, and is thus much more cost-efficient and time-saving than crowdsourcing. Experimental results on the MultiWOZ dataset demonstrate that training a model on the simulated dialogues leads to even better performance than using the same amount of human-generated dialogues in low-resource settings, with as few as 85 dialogues as the seed data. Human evaluation results also show that our simulated dialogues have high language fluency and annotation accuracy. The code and data are available at \href{https://github.com/Leezekun/dialogic}{https://github.com/Leezekun/dialogic}.
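The seeding-and-demonstration idea above can be sketched as assembling a few-shot prompt from seed dialogues. This is a minimal illustration only; the dialogue format, instruction text, and selection rule (first-k) are assumptions, not the actual \textsc{Dialogic} implementation.

```python
# Minimal sketch: build a few-shot prompt from annotated seed dialogues.
# Format and instruction wording are illustrative assumptions.

def build_prompt(seed_dialogues, instruction, k=2):
    """Use the first k seeds as in-context demonstrations, then append
    an instruction cueing the model to generate a new annotated dialogue."""
    demos = "\n\n".join(
        f"Dialogue:\n{d['text']}\nAnnotation: {d['annotation']}"
        for d in seed_dialogues[:k]
    )
    return f"{demos}\n\n{instruction}\nDialogue:"

seeds = [
    {"text": "User: Book a table.\nSystem: For how many people?",
     "annotation": "domain=restaurant"},
    {"text": "User: Find me a train to Cambridge.\nSystem: What day?",
     "annotation": "domain=train"},
]
prompt = build_prompt(
    seeds, "Generate a new annotated dialogue in the same format.")
```

The resulting string would be sent to the LM; smarter example selection (e.g., by similarity to a target domain) is where a real system would differ from this first-k placeholder.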
Abstract:Soft demodulation of received symbols into bit log-likelihood ratios (LLRs) is at the very heart of multiple-input multiple-output (MIMO) detection. However, the optimal maximum a posteriori (MAP) detector is complicated and infeasible for use in a practical system. In this paper, we propose a soft MIMO detection algorithm based on marginal posterior probability statistics (MPPS). With the help of optimal transport theory and order statistics theory, we transform the posterior probability distribution of each layer into a Gaussian distribution. The full sampling paths can then be implicitly restored from the first- and second-order moment statistics of the transformed distribution. A lightweight network is designed to learn to recover the log-MAP LLRs from the moment statistics with low complexity. Simulation results show that the proposed algorithm significantly improves performance with fewer samples under fading and correlated channels.
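The moment-statistics step above can be illustrated by plain moment matching: approximating an empirical distribution by the Gaussian with the same first- and second-order moments. This toy sketch makes no attempt to reproduce the paper's optimal-transport or order-statistics machinery.

```python
# Toy sketch: fit a Gaussian to samples by matching mean and variance
# (first- and second-order moments). Illustrative only.
import math

def moment_match(samples):
    """Return (mean, std) of the moment-matched Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n  # population variance
    return mean, math.sqrt(var)

mu, sigma = moment_match([1.0, 2.0, 3.0, 4.0])
# mu = 2.5, sigma = sqrt(1.25)
```

In the paper's setting, such per-layer moments are what the lightweight network consumes to recover log-MAP LLRs, rather than the full set of sampling paths.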
Abstract:Recent work has shown that large pretrained Language Models (LMs) not only perform remarkably well on a range of Natural Language Processing (NLP) tasks but also begin to improve on reasoning tasks such as arithmetic induction, symbolic manipulation, and commonsense reasoning as model size increases. However, it is still unclear what the underlying capabilities of these LMs are. Surprisingly, we find that these models have limitations on certain basic symbolic manipulation tasks such as copy, reverse, and addition. When the total number of symbols or of repeating symbols increases, model performance drops quickly. We investigate the potential causes of this phenomenon and examine a set of possible methods, including explicit positional markers, fine-grained computation steps, and LMs with callable programs. Experimental results show that none of these techniques can completely solve the simplest addition induction problem. Finally, we introduce LMs with tutor, which demonstrates every single step of teaching. LMs with tutor deliver 100% accuracy on out-of-distribution (OOD) inputs and repeating symbols, shedding new light on the boundaries of large LMs in induction.
Abstract:To guide the generation of large pretrained language models (LM), previous work has focused on directly fine-tuning the language model or utilizing an attribute discriminator. In this work, we propose a novel lightweight framework for controllable GPT2 generation, which utilizes a set of small attribute-specific vectors, called prefixes, to steer natural language generation. Different from prefix-tuning, where each prefix is trained independently, we take the relationship among prefixes into consideration and train multiple prefixes simultaneously. We propose a novel supervised method and also an unsupervised method to train the prefixes for single-aspect control while the combination of these two methods can achieve multi-aspect control. Experimental results on both single-aspect and multi-aspect control show that our methods can guide generation towards the desired attributes while keeping high linguistic quality.
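The prefix mechanism above amounts to prepending small trainable vector sequences ("virtual tokens") to the input embeddings, with multi-aspect control combining several prefixes. The following sketch shows only that concatenation structure; the vectors, dimensions, and combination rule are illustrative assumptions, not the paper's trained prefixes.

```python
# Toy sketch of prefix-based steering: attribute-specific prefix vectors
# are prepended to the token embedding sequence before it enters the LM.
# All values here are illustrative placeholders.

def prepend_prefixes(input_embeds, *prefixes):
    """Concatenate one or more prefixes (lists of vectors) in front of
    the token embeddings; passing several prefixes models multi-aspect
    control."""
    steered = []
    for p in prefixes:
        steered.extend(p)
    steered.extend(input_embeds)
    return steered

sentiment_prefix = [[0.1, 0.2]] * 3   # 3 virtual tokens for sentiment
topic_prefix = [[0.3, 0.0]] * 2       # 2 virtual tokens for topic
tokens = [[1.0, 0.0], [0.0, 1.0]]     # embeddings of 2 real tokens

seq = prepend_prefixes(tokens, sentiment_prefix, topic_prefix)
# sequence length: 3 + 2 + 2 = 7
```

Training would optimize only the prefix vectors while the LM stays frozen; the paper's contribution is training multiple such prefixes jointly so their combination steers generation along several attributes at once.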