Abstract:Modern AI assistants have made significant progress in natural language understanding and API/tool integration, with emerging efforts to incorporate diverse interfaces (such as Web interfaces) for enhanced scalability and functionality. However, current approaches that heavily rely on repeated LLM-driven HTML parsing are computationally expensive and error-prone, particularly when handling dynamic web interfaces and multi-step tasks. To overcome these challenges, we introduce PAFFA (Premeditated Actions For Fast Agents), a framework designed to enhance web interaction capabilities through an Action API Library of reusable, verified browser interaction functions. By pre-computing interaction patterns and employing two core methodologies - "Dist-Map" for task-agnostic element distillation and "Unravel" for incremental page-wise exploration - PAFFA reduces inference calls by 87% while maintaining robust performance even as website structures evolve. This framework accelerates multi-page task execution and offers a scalable solution to advance autonomous web agent research.
Abstract:Active reconfigurable intelligent surfaces (RISs) can improve the performance of integrated sensing and communication (ISAC), and therefore enable simultaneous data transmission and target sensing. However, when the line-of-sight (LoS) link between the base station and the sensing target is blocked, the sensing signals suffer from severe path loss, resulting in inferior sensing performance. To address this issue, this paper employs a sensor-aided active RIS to enhance ISAC system performance. The goal is to maximize the signal-to-noise ratio of the echo signal from the target at the sensor array while meeting constraints on communication signal quality, power budgets, and RIS amplification limits. The optimization problem is challenging due to its non-convex nature and the coupling between the optimization variables. We propose a closed-form solution for receive beamforming, and a successive convex approximation based iterative method for transmit and reflection beamforming design. Simulation results demonstrate the advantage of the proposed sensor-aided active RIS-assisted system model over its non-sensor-aided counterpart.
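As a rough illustration of what a closed-form receive beamformer can look like, the sketch below uses the textbook matched filter w = h / ||h||, which maximizes the output SNR when the noise is spatially white and the echo arrives through a single effective channel vector h. Both conditions are assumptions made here for illustration; the abstract does not state the paper's actual closed-form expression, and the channel values and function names are hypothetical.

```python
import math

def matched_filter(h):
    """Return the unit-norm combiner w = h / ||h||."""
    norm = math.sqrt(sum(abs(x) ** 2 for x in h))
    return [x / norm for x in h]

def output_snr(w, h, noise_power):
    """SNR after combining: |w^H h|^2 / (noise_power * ||w||^2)."""
    signal = abs(sum(wi.conjugate() * hi for wi, hi in zip(w, h))) ** 2
    noise = noise_power * sum(abs(wi) ** 2 for wi in w)
    return signal / noise

# Hypothetical 4-element echo channel at the sensor array (illustrative values).
h = [1 + 1j, 0.5 - 0.2j, -0.3 + 0.8j, 0.1 + 0.1j]
noise_power = 0.1

w_mf = matched_filter(h)
w_naive = [0.5, 0.5, 0.5, 0.5]  # equal-weight combiner, for comparison only

snr_mf = output_snr(w_mf, h, noise_power)
snr_naive = output_snr(w_naive, h, noise_power)
```

By the Cauchy-Schwarz inequality, no other combiner can exceed the matched-filter SNR of ||h||^2 / noise_power under these assumptions, so `snr_mf` is an upper bound on `snr_naive`.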
Abstract:Communication networks are evolving from solely emphasizing communication to facilitating multiple functionalities. In this regard, integrated sensing, communication, and powering (ISCAP) provides an efficient way of enabling data transmission, radar sensing, and wireless power transfer simultaneously. Such a multi-functional network requires a multi-functional architectural solution. Toward this end, sensor-aided zero-energy reconfigurable intelligent surfaces (SAZE-RISs) offer an energy-efficient solution for ISCAP by meeting the requirements of the end users as well as supplying power for the RIS. This paper explores the use of SAZE-RIS within the ISCAP framework. First, we present the general system architecture, operational protocols, and main application scenarios for employing SAZE-RIS in ISCAP. Next, we discuss methods for managing the conflicting requirements of communication, sensing, and powering within ISCAP and the role of SAZE-RIS in this process. We then provide a detailed case study complete with simulation results, offering valuable insights into the design choices and tradeoffs that come into play when adopting this technology. Furthermore, we discuss the related challenges and open research avenues, highlighting areas that require further exploration to fully realize the potential of SAZE-RIS within this ISCAP framework.
Abstract:Large Language Models (LLMs) have become a cornerstone in the field of Natural Language Processing (NLP), offering transformative capabilities in understanding and generating human-like text. However, with their rising prominence, the security and vulnerability aspects of these models have garnered significant attention. This paper presents a comprehensive survey of the various forms of attacks targeting LLMs, discussing the nature and mechanisms of these attacks, their potential impacts, and current defense strategies. We delve into topics such as adversarial attacks that aim to manipulate model outputs, data poisoning that affects model training, and privacy concerns related to training data exploitation. The paper also explores the effectiveness of different attack methodologies, the resilience of LLMs against these attacks, and the implications for model integrity and user trust. By examining the latest research, we provide insights into the current landscape of LLM vulnerabilities and defense mechanisms. Our objective is to offer a nuanced understanding of LLM attacks, foster awareness within the AI community, and inspire robust solutions to mitigate these risks in future developments.
Abstract:Semantic Segmentation (SS) of LiDAR point clouds is essential for many applications, such as urban planning and autonomous driving. While much progress has been made in interpreting SS predictions for images, interpreting point cloud SS predictions remains a challenge. This paper introduces pGS-CAM, a novel gradient-based method for generating saliency maps in neural network activation layers. Inspired by Grad-CAM, which uses gradients to highlight local importance, pGS-CAM is robust and effective on a variety of datasets (SemanticKITTI, Paris-Lille3D, DALES) and 3D deep learning architectures (KPConv, RandLANet). Our experiments show that pGS-CAM effectively accentuates the feature learning in intermediate activations of SS architectures by highlighting the contribution of each point. This allows us to better understand how SS models make their predictions and identify potential areas for improvement. The code is available at https://github.com/geoai4cities/pGS-CAM.
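The Grad-CAM idea that pGS-CAM builds on can be sketched for point-wise features as follows. This is a minimal analogue of standard Grad-CAM applied to per-point activations, not the pGS-CAM implementation from the repository above; the function name, the input layout (N points by K channels), and the toy values are assumptions.

```python
def pointwise_grad_cam(activations, gradients):
    """Grad-CAM-style saliency for N points with K feature channels.

    activations[i][k]: activation of channel k at point i.
    gradients[i][k]: gradient of the class score w.r.t. that activation.
    """
    n_points = len(activations)
    n_channels = len(activations[0])
    # Channel weights: gradients averaged over all points (Grad-CAM's pooling step).
    alpha = [sum(gradients[i][k] for i in range(n_points)) / n_points
             for k in range(n_channels)]
    # Weighted channel sum per point, followed by ReLU.
    raw = [max(0.0, sum(alpha[k] * activations[i][k] for k in range(n_channels)))
           for i in range(n_points)]
    # Normalize to [0, 1] for display; leave an all-zero map unchanged.
    peak = max(raw)
    return [r / peak for r in raw] if peak > 0 else raw

# Toy example: 3 points, 2 channels (illustrative values only).
acts = [[1.0, 0.0], [0.5, 2.0], [0.0, 1.0]]
grads = [[0.2, -0.1], [0.4, -0.3], [0.0, -0.2]]
saliency = pointwise_grad_cam(acts, grads)
```

In this toy case only the first point receives a positive score, because the second channel has a negative average gradient and the ReLU suppresses points dominated by it.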
Abstract:The emergence of various technologies demanding both high data rates and precise sensing performance, such as autonomous vehicles and internet of things devices, has driven the growing popularity of integrated sensing and communication (ISAC) in recent years. ISAC offers an efficient framework for communication and sensing where both functionalities are carried out in a shared spectrum, utilizing the same hardware, beamformer and waveform design. At the same time, intelligent metasurfaces have been identified as an architectural enabler for the upcoming sixth-generation (6G) of wireless communication due to their ability to control the propagation environment in an energy-efficient manner. Due to the potential of metasurfaces to enhance both communication and sensing performance, numerous papers have explored the performance gains of using metasurfaces to improve ISAC. This survey reviews the existing literature on metasurface-assisted ISAC, detailing the associated challenges and opportunities. To provide a comprehensive overview, we begin with relevant background information on standalone metasurface-assisted communication and metasurface-assisted sensing systems, followed by a discussion on the fundamentals of ISAC. The core part of the paper then summarizes the state-of-the-art studies on metasurface-assisted ISAC with metasurfaces employed as separate entities placed between the transmitter and receiver, also known as reconfigurable intelligent surfaces, with an emphasis on its two levels of integration: radio-communications co-existence and dual-function radar-communications. We also review current work on holographic ISAC, where metasurfaces form part of the ISAC transmitter. Within each category, the challenges, opportunities and future research directions are also highlighted.
Abstract:Natural Language Generation (NLG) typically involves evaluating the generated text in various aspects (e.g., consistency and naturalness) to obtain a comprehensive assessment. However, multi-aspect evaluation remains challenging as it may require the evaluator to generalize to any given evaluation aspect, even one that was absent during training. In this paper, we introduce X-Eval, a two-stage instruction tuning framework to evaluate the text in both seen and unseen aspects customized by end users. X-Eval consists of two learning stages: the vanilla instruction tuning stage that improves the model's ability to follow evaluation instructions, and an enhanced instruction tuning stage that exploits the connections between fine-grained evaluation aspects to better assess text quality. To support the training of X-Eval, we collect AspectInstruct, the first instruction tuning dataset tailored for multi-aspect NLG evaluation spanning 27 diverse evaluation aspects with 65 tasks. To enhance task diversity, we devise an augmentation strategy that converts human rating annotations into diverse forms of NLG evaluation tasks, including scoring, comparison, ranking, and Boolean question answering. Extensive experiments across three essential categories of NLG tasks (dialogue generation, summarization, and data-to-text), coupled with 21 aspects in meta-evaluation, demonstrate that X-Eval enables even a lightweight language model to achieve a comparable if not higher correlation with human judgments compared to state-of-the-art NLG evaluators, such as GPT-4.
Abstract:LiDAR-generated point clouds are crucial for perceiving outdoor environments. The segmentation of point clouds is also essential for many applications. Previous research has focused on using self-attention and convolution (local attention) mechanisms individually in semantic segmentation architectures. However, there is limited work on combining the learned representations of these attention mechanisms to improve performance. Additionally, existing research that combines convolution with self-attention relies on global attention, which is not practical for processing large point clouds. To address these challenges, this study proposes a new architecture, pCTFusion, which combines kernel-based convolutions and self-attention mechanisms for better feature learning and capturing local and global dependencies in segmentation. The proposed architecture employs two types of self-attention mechanisms, local and global, based on the hierarchical positions of the encoder blocks. Furthermore, the existing loss functions do not consider the semantic and position-wise importance of the points, resulting in reduced accuracy, particularly at sharp class boundaries. To overcome this, the study models a novel attention-based loss function called Pointwise Geometric Anisotropy (PGA), which assigns weights based on the semantic distribution of points in a neighborhood. The proposed architecture is evaluated on the SemanticKITTI outdoor dataset and shows a 5-7% improvement in performance compared to the state-of-the-art architectures. The results are particularly encouraging for minor classes, which are often misclassified due to class imbalance and a lack of space- and neighbor-aware feature encoding. The developed methods can be leveraged for the segmentation of complex datasets and can drive real-world applications of LiDAR point clouds.
Abstract:The use of reconfigurable intelligent surfaces (RISs) has been proposed in the past few years to achieve a better communication system performance by creating a programmable wireless propagation environment. In this paper, we target maximizing both energy efficiency and user fairness in RIS-assisted millimeter-wave systems with imperfect channel state information. We formulate the energy efficiency and fairness maximization problem as a multi-objective optimization problem. We split the corresponding multi-objective optimization problem into two stages using a lexicographic approach. In the first stage, the energy efficiency is maximized; then in the second stage, the fairness is maximized subject to a maximum reduction in the optimal value of the energy efficiency. We propose a projected gradient ascent based alternating optimization procedure to solve the optimization problem in each stage. We further employ the penalty dual decomposition method to address the challenging energy efficiency constraint in the second stage. Simulation results show that the proposed algorithm can achieve a better trade-off between energy efficiency and fairness compared to the methods that target only one of those metrics.
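The two-stage lexicographic approach described above can be sketched on a toy two-user power-allocation problem. This is only an illustration of the staging logic: it swaps the paper's projected gradient ascent and penalty dual decomposition for a plain grid search, and the rate model, channel gains, circuit power, and the use of Jain's index as the fairness metric are all assumptions introduced here.

```python
import math

def rates(p, gains):
    """Per-user rates log2(1 + g_i * p_i) for a toy two-user link."""
    return [math.log2(1.0 + g * pi) for g, pi in zip(gains, p)]

def energy_efficiency(p, gains, circuit_power):
    """Sum rate divided by total consumed power (transmit + circuit)."""
    return sum(rates(p, gains)) / (sum(p) + circuit_power)

def jain_fairness(p, gains):
    """Jain's fairness index of the user rates (1.0 means equal rates)."""
    r = rates(p, gains)
    total_sq = sum(x * x for x in r)
    return sum(r) ** 2 / (len(r) * total_sq) if total_sq > 0 else 0.0

def lexicographic_search(gains, circuit_power, max_loss, steps=21, p_max=2.0):
    """Stage 1: maximize EE. Stage 2: maximize fairness with EE >= (1 - max_loss) * EE*."""
    grid = [p_max * i / (steps - 1) for i in range(steps)]
    candidates = [(p1, p2) for p1 in grid for p2 in grid]
    # Stage 1: best energy efficiency over all candidate allocations.
    stage1 = max(candidates, key=lambda p: energy_efficiency(p, gains, circuit_power))
    ee_star = energy_efficiency(stage1, gains, circuit_power)
    # Stage 2: keep allocations within the allowed EE reduction, then maximize fairness.
    feasible = [p for p in candidates
                if energy_efficiency(p, gains, circuit_power) >= (1.0 - max_loss) * ee_star]
    stage2 = max(feasible, key=lambda p: jain_fairness(p, gains))
    return stage1, stage2, ee_star

gains = [4.0, 1.0]      # hypothetical channel gains
circuit_power = 1.0     # hypothetical static circuit power
stage1, stage2, ee_star = lexicographic_search(gains, circuit_power, max_loss=0.1)
```

Because the stage-1 optimum always satisfies the stage-2 constraint, the second stage can never return a less fair allocation than the first; `max_loss` controls how much energy efficiency may be traded for that gain.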
Abstract:We propose CHRT (Control Hidden Representation Transformation) - a controlled language generation framework that steers large language models to generate text pertaining to certain attributes (such as toxicity). CHRT gains attribute control by modifying the hidden representation of the base model through learned transformations. We employ a contrastive-learning framework to learn these transformations, which can be combined to gain multi-attribute control. The effectiveness of CHRT is shown experimentally by comparing it with seven baselines over three attributes. CHRT outperforms all the baselines in the tasks of detoxification, positive sentiment steering, and text simplification while minimizing the loss in linguistic qualities. Further, our approach has the lowest inference latency among the compared methods, adding only 0.01 seconds over the base model, which makes it the most suitable for high-performance production environments. We open-source our code and release two novel datasets to further propel controlled language generation research.