Abstract:While Large Language Models (LLMs) possess significant capabilities in open-world agent tasks, they struggle to adapt rapidly to new, specialized tasks because they rely on static pre-trained knowledge. Traditional methods such as fine-tuning are often costly, data-intensive, and may lead to "catastrophic forgetting." We therefore present KnowMap, a novel approach that dynamically constructs a knowledge base from environmental and experiential data. KnowMap fine-tunes a small knowledge-embedding model to equip a larger LLM with valuable task-specific knowledge. Our experiments on the ScienceWorld benchmark demonstrate a 17.71% performance improvement for the gpt-4-turbo model. KnowMap not only provides an efficient and effective means of adapting LLMs to new tasks, but also highlights how integrating environmental and experiential knowledge can enhance LLMs' reasoning capabilities.
Abstract:Knowledge distillation is a model compression technique in which a compact "student" network is trained to replicate the predictive behavior of a larger "teacher" network. In logit-based knowledge distillation, augmenting cross-entropy with a distillation term has become the de facto approach. Typically, this term is either a KL divergence matching marginal probabilities or a correlation-based loss capturing intra- and inter-class relationships, but in every case it sits as an add-on to cross-entropy with its own weight that must be carefully tuned. In this paper, we adopt a choice-theoretic perspective and recast knowledge distillation under the Plackett-Luce model by interpreting teacher logits as "worth" scores. We introduce Plackett-Luce Distillation (PLD), a weighted list-wise ranking loss in which the teacher model transfers knowledge of its full ranking of classes, weighting each ranked choice by its own confidence. PLD directly optimizes a single teacher-optimal ranking that places the true label first, followed by the remaining classes in descending teacher confidence, yielding a convex, translation-invariant surrogate that subsumes weighted cross-entropy. Empirically, on standard image classification benchmarks, PLD improves Top-1 accuracy by an average of +0.42% over DIST (arXiv:2205.10536) and +1.04% over KD (arXiv:1503.02531) in homogeneous settings, and by +0.48% and +1.09% over DIST and KD, respectively, in heterogeneous settings.
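The list-wise loss described in the Plackett-Luce Distillation abstract can be illustrated with a minimal sketch for a single example. It builds the teacher-optimal ranking (true label first, remaining classes in descending teacher logit) and sums the Plackett-Luce negative log-likelihood terms of the student along that ranking. The function names are hypothetical, and the choice of teacher softmax probabilities as the per-position weights is an assumption made here to reflect "weighting each ranked choice by its own confidence"; the paper may define the weights differently.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def teacher_optimal_ranking(teacher_logits, label):
    """True label first, then the other classes by descending teacher logit."""
    rest = sorted(
        (c for c in range(len(teacher_logits)) if c != label),
        key=lambda c: -teacher_logits[c],
    )
    return [label] + rest


def pld_loss(student_logits, teacher_logits, label):
    """Weighted Plackett-Luce ranking loss for one example (illustrative).

    ASSUMPTION: the weight of the k-th ranked choice is the teacher's
    softmax probability of the class placed at position k.
    """
    order = teacher_optimal_ranking(teacher_logits, label)
    weights = softmax(teacher_logits)
    loss = 0.0
    for k, c in enumerate(order):
        # Plackett-Luce term: choose class c among the classes not yet ranked.
        remaining = [student_logits[j] for j in order[k:]]
        loss += weights[c] * (logsumexp(remaining) - student_logits[c])
    return loss
```

Under these assumptions, adding a constant to every student logit leaves the loss unchanged, and when the teacher puts essentially all of its probability mass on the true label the loss approaches ordinary cross-entropy, which is consistent with the translation invariance and the weighted cross-entropy special case mentioned in the abstract.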
Abstract:The item cold-start problem is crucial for online recommender systems, as the success of the cold-start phase determines whether items can transition into popular ones. Prompt learning, a powerful technique used in natural language processing (NLP) to address zero- or few-shot problems, has been adapted for recommender systems to tackle similar challenges. However, existing methods typically rely on content-based properties or text descriptions for prompting, which we argue may be suboptimal for cold-start recommendations due to 1) semantic gaps with recommender tasks, and 2) model bias that arises because warm-up items contribute most of the positive feedback to the model, which is the core of the cold-start problem and hinders recommendation quality on cold-start items. We propose to leverage high-value positive feedback, termed pinnacle feedback, as prompt information to resolve both problems simultaneously. We show experimentally that, compared with the content descriptions used in existing works, positive feedback is better suited to serve as prompt information because it bridges the semantic gaps. In addition, we propose item-wise personalized prompt networks that encode pinnacle feedback to relieve the model bias caused by the positive-feedback dominance problem. Extensive experiments on four real-world datasets demonstrate the superiority of our model over state-of-the-art methods. Moreover, PROMO has been successfully deployed on a popular short-video sharing platform, a commercial application at billion-user scale, achieving remarkable performance gains across various commercial metrics in cold-start scenarios.
Abstract:With the rise of e-commerce and short videos, online recommender systems that can capture users' interests and incorporate new items in real time play an increasingly important role. In both online and offline recommendation, the cold-start problem caused by interaction sparsity degrades recommendation quality for cold-start items; it is also known as the long-tail problem of item distribution. Many cold-start schemes based on fine-tuning or knowledge transfer show excellent performance on offline recommendation. Yet these schemes are infeasible for online recommendation on streaming data pipelines because of differences in training methods, computational overhead, and time constraints. Motivated by these issues, we propose a model-agnostic recommendation algorithm called Popularity-Aware Meta-learning (PAM) to address the item cold-start problem under streaming data settings. PAM divides the incoming data into different meta-learning tasks by predefined item popularity thresholds. The model can distinguish and reweight behavior-related features and content-related features in each task based on their different roles at different popularity levels, thus adapting to recommendations for cold-start samples. This fixed-task design significantly reduces additional computation and storage costs compared to offline methods. Furthermore, PAM introduces data augmentation and an additional self-supervised loss specifically designed for low-popularity tasks, leveraging insights from high-popularity samples. This approach effectively mitigates the issue of inadequate supervision due to the scarcity of cold-start samples. Experimental results across multiple public datasets demonstrate the superiority of our approach over baseline methods in addressing cold-start challenges in online streaming data scenarios.
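The task-division step in the PAM abstract (splitting incoming data into meta-learning tasks by predefined item popularity thresholds) could be sketched as follows. This is only an illustration: the function names, the use of a running interaction count as the popularity measure, and the threshold values are assumptions, not details taken from the paper.

```python
from bisect import bisect_right
from collections import defaultdict


def assign_task(popularity, thresholds):
    """Return the task index for an item with the given popularity.

    `thresholds` must be sorted ascending. An item whose popularity is
    below thresholds[0] falls into task 0 (the coldest task); an item at
    or above thresholds[i] falls into task i + 1 or higher.
    """
    return bisect_right(thresholds, popularity)


def split_batch(batch, item_counts, thresholds):
    """Group (user, item) interactions of a streaming batch by task index.

    ASSUMPTION: popularity is the number of interactions seen so far for
    the item, stored in `item_counts`; unseen items count as 0.
    """
    tasks = defaultdict(list)
    for user, item in batch:
        task_id = assign_task(item_counts.get(item, 0), thresholds)
        tasks[task_id].append((user, item))
    return dict(tasks)


# Hypothetical thresholds: fewer than 10 interactions is "cold",
# 10 to 99 is "warm", 100 or more is "popular".
example_thresholds = [10, 100]
```

Because the thresholds are fixed in advance, assigning a sample to a task is a single binary search, which fits the abstract's point that the fixed-task design adds little computation or storage in a streaming pipeline.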
Abstract:Smartphones have significantly enhanced our daily learning, communication, and entertainment, becoming an essential component of modern life. However, certain populations, including the elderly and individuals with disabilities, encounter challenges in using smartphones, thus necessitating mobile app operation assistants, a.k.a. mobile app agents. With privacy, permission, and cross-platform compatibility issues in mind, we design and develop PeriGuru, a peripheral robotic mobile app operation assistant based on GUI image understanding and prompting with a Large Language Model (LLM). PeriGuru leverages a suite of computer vision techniques to analyze GUI screenshot images and employs an LLM to inform action decisions, which are then executed by robotic arms. PeriGuru achieves a success rate of 81.94% on the test task set, more than double that of the method without PeriGuru's GUI image interpretation and prompting design. Our code is available at https://github.com/Z2sJ4t/PeriGuru.
Abstract:As a widely used localization and sensing technique, radar will play an important role in future wireless networks. However, traditional radars passively accept the wireless channels between the radar and the targets, which limits target detection performance. To address this issue, we propose to use a reconfigurable intelligent surface (RIS), which can customize channel conditions by adjusting its phase shifts, to improve the detection accuracy of radar systems; we refer to the resulting system as MetaRadar. In such a system, it is challenging to jointly optimize radar waveforms and RIS phase shifts in order to improve multi-target detection performance. To tackle this challenge, we design a waveform and phase shift optimization (WPSO) algorithm to effectively solve the multi-target detection problem, and we also analyze the performance of the proposed MetaRadar scheme theoretically. Simulation results show that the detection performance of the MetaRadar scheme is significantly better than that of traditional radar schemes.
Abstract:Semantic segmentation is the process of partitioning an image into multiple segments for recognizing humans and objects, and it can be widely applied in scenarios such as healthcare and safety monitoring. To avoid privacy violations, using RF signals instead of images for human and object recognition has gained increasing attention. However, human and object recognition using RF signals is usually a passive signal collection and analysis process that does not change the radio environment, and the recognition accuracy is significantly restricted by unwanted multi-path fading and/or the limited number of independent channels between RF transceivers in uncontrollable radio environments. This paper introduces MetaSketch, a novel RF-sensing system that performs semantic recognition and segmentation for humans and objects by making the radio environment reconfigurable. A metamaterial surface is incorporated into MetaSketch and diversifies the information carried by RF signals. Using compressive sensing techniques, MetaSketch reconstructs a point cloud consisting of the reflection coefficients of humans and objects at different spatial points, and recognizes the semantic meaning of the points by using symmetric multilayer perceptron groups. Our evaluation results show that MetaSketch is capable of generating favorable radio environments, extracting exact point clouds, and labeling the semantic meaning of the points with an average error rate of less than 1% in an indoor space.
Abstract:Indoor wireless simultaneous localization and mapping (SLAM) is considered a promising technique for providing positioning services in future 6G systems. However, the accuracy of traditional wireless SLAM systems relies heavily on the quality of propagation paths, which is limited by the uncontrollable wireless environment. In this paper, we propose a novel SLAM system assisted by a reconfigurable intelligent surface (RIS) to address this issue. By configuring the phase shifts of the RIS, the strength of received signals can be enhanced to resist the disturbance of noise. However, the selection of phase shifts heavily influences the localization and mapping phase, which makes the design very challenging. To tackle this challenge, we formulate the RIS-assisted indoor SLAM optimization problem and design an error minimization algorithm for it. Simulations show that the RIS-assisted SLAM system can decrease the positioning error by at least 31% compared with benchmark schemes.
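The claim in the RIS-assisted SLAM abstract that configuring RIS phase shifts can strengthen the received signal can be illustrated with a textbook narrowband model, in which the signal reflected by the RIS is a sum of per-element cascaded channel coefficients, each rotated by that element's phase shift. The sketch below is not the paper's error minimization algorithm; the function names and the assumption that the cascaded coefficients are known are introduced here purely for illustration.

```python
import cmath


def received_amplitude(cascaded, phases):
    """Amplitude of the RIS-reflected signal for given element phase shifts.

    `cascaded[n]` is the (assumed known) complex product of the
    transmitter-to-element and element-to-receiver channel coefficients
    for element n; `phases[n]` is that element's phase shift in radians.
    """
    return abs(sum(c * cmath.exp(1j * p) for c, p in zip(cascaded, phases)))


def co_phase(cascaded):
    """Phase shifts that rotate every reflected path onto the same angle.

    With these shifts the per-element contributions add coherently, so
    the amplitude equals the sum of the individual magnitudes.
    """
    return [-cmath.phase(c) for c in cascaded]


# Hypothetical cascaded coefficients for a three-element surface.
example_channel = [1 + 1j, -2 + 0.5j, 0.3 - 0.7j]
unconfigured = received_amplitude(example_channel, [0.0, 0.0, 0.0])
aligned = received_amplitude(example_channel, co_phase(example_channel))
```

In this toy model `aligned` is never smaller than `unconfigured`, which shows why phase configuration helps against noise. In the SLAM setting of the abstract the channels depend on unknown positions, which is why phase selection is coupled with localization and mapping and cannot simply be set by co-phasing.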
Abstract:In the coming 6G communications, the internet of things (IoT) serves as a key enabler for collecting environmental information and is expected to achieve ubiquitous deployment. However, it is challenging for traditional IoT sensors to meet this demand because their sense-then-transmit working principle requires power supplies and frequent maintenance. To address this challenge, we propose a meta-IoT sensing system, where the IoT sensors are based on specially designed meta-materials. The meta-IoT sensors achieve simultaneous sensing and transmission and thus require no power supplies. In order to design a meta-IoT sensing system with optimal sensing accuracy, we jointly consider the sensing and transmission of meta-IoT sensors and propose an efficient algorithm to jointly optimize the meta-IoT structure and the sensing function at the receiver of the system. As an example, we apply the proposed system and algorithm to sensing environmental temperature and humidity levels. Simulation results show that the proposed algorithm significantly increases the sensing accuracy.
Abstract:Traffic flow forecasting is of great significance for improving the efficiency of transportation systems and preventing emergencies. Due to the high non-linearity and intricate evolutionary patterns of short-term and long-term traffic flow, existing methods often fail to take full advantage of spatial-temporal information, especially the various temporal patterns with different period shifts and the characteristics of road segments. Besides, the globality, which represents the absolute value of traffic status indicators, and the locality, which represents the relative value, have not been considered simultaneously. This paper proposes a neural network model, TSSRGCN, that focuses on the globality and locality of traffic networks as well as the temporal patterns of traffic data. A cycle-based dilated deformable convolution block is designed to accurately capture different time-varying trends on each node. Our model can extract both global and local spatial information because it combines two graph convolutional network methods to learn the representations of nodes and edges. Experiments on two real-world datasets show that the model captures the spatial-temporal correlation of traffic data and outperforms the compared state-of-the-art methods. Further analysis indicates that the locality and globality of traffic networks are critical to traffic flow prediction and that the proposed TSSRGCN model can adapt to various temporal traffic patterns.