Abstract:We propose LightLLM, a model that fine-tunes pre-trained large language models (LLMs) for light-based sensing tasks. It integrates a sensor data encoder to extract key features, a contextual prompt to provide environmental information, and a fusion layer to combine these inputs into a unified representation. This combined input is then processed by the pre-trained LLM, whose original parameters remain frozen; fine-tuning happens through added lightweight, trainable components, allowing the model to adapt to new tasks without altering those parameters. This approach enables flexible adaptation of LLMs to specialized light sensing tasks with minimal computational overhead and retraining effort. We have implemented LightLLM for three light sensing tasks: light-based localization, outdoor solar forecasting, and indoor solar estimation. Using real-world experimental datasets, we demonstrate that LightLLM significantly outperforms state-of-the-art methods, achieving a 4.4x improvement in localization accuracy and a 3.4x improvement in indoor solar estimation when tested in previously unseen environments. We further demonstrate that LightLLM outperforms ChatGPT-4 with direct prompting, highlighting the advantages of LightLLM's specialized architecture for sensor data fusion with textual prompts.
Abstract:Post-training quantization (PTQ) for vision transformers (ViTs) has garnered significant attention due to its efficiency in compressing models. However, existing methods typically overlook the intricate interdependence between quantized weights and activations, leading to considerable quantization error. In this paper, we propose ERQ, a two-step PTQ approach designed to sequentially reduce the quantization error arising from activation and weight quantization. ERQ first introduces Activation quantization error reduction (Aqer), which formulates the minimization of activation quantization error as a Ridge Regression problem and tackles it by updating the weights in full precision. Subsequently, ERQ introduces Weight quantization error reduction (Wqer), which adopts an iterative approach to mitigate the quantization error induced by weight quantization. In each iteration, an empirically derived, efficient proxy is employed to refine the rounding directions of quantized weights, coupled with a Ridge Regression solver to curtail weight quantization error. Experimental results attest to the effectiveness of our approach. Notably, ERQ surpasses the state-of-the-art GPTQ by 22.36% in accuracy for W3A4 ViT-S.
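As a rough illustration of the Ridge Regression idea mentioned above, the following minimal sketch compensates activation quantization error with a closed-form weight update. It is not the authors' implementation: the objective (match the full-precision output `W @ X` using quantized activations `Xq`, with an L2 penalty `lam` on the weight change), the min-max quantizer, and all function names are assumptions made for this example.

```python
import numpy as np

def uniform_quantize(x, bits):
    # Hypothetical min-max uniform quantizer, used only to create Xq.
    levels = 2 ** bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / levels
    return np.round((x - lo) / scale) * scale + lo

def ridge_weight_update(W, X, Xq, lam=1e-2):
    # Assumed objective: min_dW ||(W + dW) Xq - W X||_F^2 + lam * ||dW||_F^2
    # Closed form:       dW = -W (dX Xq^T) (Xq Xq^T + lam I)^{-1}, dX = Xq - X
    dX = Xq - X
    gram = Xq @ Xq.T + lam * np.eye(X.shape[0])   # symmetric positive definite
    cross = dX @ Xq.T
    delta = -np.linalg.solve(gram, (W @ cross).T).T
    return W + delta

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))      # (out_features, in_features)
X = rng.normal(size=(16, 256))    # (in_features, samples)
Xq = uniform_quantize(X, bits=4)

W_new = ridge_weight_update(W, X, Xq)
err_before = np.linalg.norm(W @ Xq - W @ X) ** 2
err_after = np.linalg.norm(W_new @ Xq - W @ X) ** 2
print(err_after <= err_before)
```

Because a zero update is always feasible, the compensated output error under this objective cannot exceed the uncompensated one; the penalty `lam` trades off error reduction against how far the weights move.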
Abstract:Multiple robot systems are favored for object manipulation and transportation, especially for large objects. However, in more complex manipulation such as flipping, these systems encounter a new challenge, configuration disconnectivity of manipulators. Grasping objects by manipulators will impose closed-chain constraints on the system, which in turn limits the feasible motions of manipulators and further compromises the configuration connectivity. Multiple mobile manipulator systems show much more flexibility in object manipulation with the mobility of the mobile platform and have the potential to address the above problem. In this paper, a novel planning framework is proposed for complex flipping manipulation by incorporating platform motions and regrasping. Firstly, two types of trajectories, mobile manipulator planning and regrasping planning, are classified and can be assigned different priorities for different tasks. Secondly, corresponding planning methods are designed for each type of trajectory. Specifically, in mobile manipulator planning, the configuration of the platform is determined through optimization to ensure connectivity when the manipulator approaches configuration boundaries. In regrasping planning, closed-chain constraints are temporarily disregarded and the manipulation capabilities are prioritized to facilitate subsequent planning. Finally, the structure of the overall planning framework is provided. Experimental results demonstrate that the proposed planner efficiently plans the motions of the system to accomplish flipping manipulation. Additionally, a comprehensive experiment emphasizes the significance of our planner in extending the capabilities of multiple mobile manipulator systems in complex tasks.
Abstract:Despite the scalable performance of vision transformers (ViTs), their dense computational costs (training & inference) undermine their position in industrial applications. Post-training quantization (PTQ), which tunes ViTs with a tiny dataset and runs them in a low-bit format, addresses the cost issue well but unfortunately suffers larger performance drops in lower-bit cases. In this paper, we introduce I&S-ViT, a novel method that regulates the PTQ of ViTs in an inclusive and stable fashion. I&S-ViT first identifies two issues in the PTQ of ViTs: (1) Quantization inefficiency in the prevalent log2 quantizer for post-Softmax activations; (2) Rugged and magnified loss landscape in coarse-grained quantization granularity for post-LayerNorm activations. Then, I&S-ViT addresses these issues by introducing: (1) A novel shift-uniform-log2 quantizer (SULQ) that incorporates a shift mechanism followed by uniform quantization to achieve both an inclusive domain representation and accurate distribution approximation; (2) A three-stage smooth optimization strategy (SOS) that amalgamates the strengths of channel-wise and layer-wise quantization to enable stable learning. Comprehensive evaluations across diverse vision tasks validate I&S-ViT's superiority over existing PTQ methods for ViTs, particularly in low-bit scenarios. For instance, I&S-ViT elevates the performance of 3-bit ViT-B by an impressive 50.68%.
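To make the "shift followed by uniform quantization" idea concrete, here is a minimal sketch of one way such a quantizer could look for post-Softmax values. The abstract does not give the exact formulation, so the order of operations (add a shift `eta`, take `-log2`, then quantize uniformly over the min-max range), the default `eta`, and the function name are all assumptions for illustration rather than the SULQ definition.

```python
import numpy as np

def shift_log2_uniform_quantize(x, bits=3, eta=2 ** -6):
    # Assumed pipeline: shift -> -log2 -> uniform quantization -> inverse map.
    y = -np.log2(x + eta)                    # shift keeps log2 finite near zero
    y_min, y_max = y.min(), y.max()
    levels = 2 ** bits - 1
    scale = (y_max - y_min) / levels         # uniform step in the log domain
    codes = np.round((y - y_min) / scale).astype(np.int64)
    y_hat = codes * scale + y_min            # dequantized log-domain value
    x_hat = 2.0 ** (-y_hat) - eta            # undo log2 and the shift
    return codes, x_hat

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 10))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)  # softmax

codes, probs_hat = shift_log2_uniform_quantize(probs, bits=3)
print(codes.min(), codes.max())
print(np.abs(probs - probs_hat).max())
```

In this sketch the integer codes are what would be stored at low bit-width, while `probs_hat` shows the values a downstream layer would see after dequantization.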
Abstract:We present object handling and transport by a multi-robot team with a deformable sheet as a carrier. Due to the deformability of the sheet and the high dimension of the whole system, it is challenging to clearly describe all the possible positions of the object on the sheet for a given formation of the multi-robot system. A complete forward kinematics (FK) method is proposed in this paper for object handling by an $N$-mobile robot team with a deformable sheet. Based on the virtual variable cables model, a constrained quadratic problem (CQP) is formulated by combining the form closure and minimum potential energy conditions of the system. Analytical solutions to the CQP are presented and then further verified with the force closure condition. With the proposed FK method, all possible solutions are obtained with the given initial sheet shape and the robot team formation. We demonstrate the effectiveness, completeness, and efficiency of the FK method with simulation and experimental results.
Abstract:Planning coverage paths for multiple robots in a decentralized way makes coverage tasks more robust to unexpected malfunctions. To achieve high efficiency in a distributed manner for each single robot, a comprehensive understanding of both the complicated environment and the cooperative agents' intent is crucial. Unfortunately, existing works commonly consider only part of these factors, resulting in imbalanced subareas or unnecessary overlaps. To tackle this issue, we introduce a decentralized reinforcement learning framework with dual guidance that trains each agent to solve the decentralized multiple coverage path planning problem directly from the environment states. As distributed robots need others' intentions to cover more efficiently, we use two guidance methods, artificial potential fields and heuristic guidance, to integrate others' intentions into each robot's observations. With this framework, our agents successfully learn to determine their own subareas while achieving full coverage, balanced subareas, and low overlap rates. We then implement spanning tree cover within those subareas to construct actual routes for each robot and complete the given coverage tasks. We also compare our performance with the state-of-the-art decentralized method, showing up to 10 percent lower overlap rates while maintaining high efficiency in similar environments.
Abstract:Multi-mobile robot systems show great advantages over a single robot in many applications. However, the robots are required to form desired task-specific formations, which significantly reduces their feasible motions. Thus, it is challenging to determine whether the robots can pass through an obstructed environment under formation constraints, especially in an obstacle-rich environment. Furthermore, is there an optimal path for the robots? To deal with these two problems, a novel graph-based motion planner is proposed in this paper. A mapping between the workspace and the configuration space of multi-mobile robot systems is first built, from which valid configurations that satisfy both formation constraints and collision avoidance can be acquired. Then, an undirected graph is generated by verifying connectivity between valid configurations. The breadth-first search method is employed to answer the question of whether there is a feasible path on the graph. Finally, an optimal path is planned on the updated graph, considering the cost of path length and formation preference. Simulation results show that the planner can be applied to obtain optimal motions of robots under formation constraints in obstacle-rich environments. Additionally, different constraints are considered.
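The feasibility check described above can be sketched as a breadth-first search over an undirected graph of valid configurations. This is a generic illustration, not the planner from the paper: the node labels (`"q0"`, `"q1"`, ...), the edge list, and both function names are hypothetical, and the construction of valid configurations is assumed to have happened already.

```python
from collections import deque

def build_undirected_graph(edges):
    # Each edge (a, b) means configurations a and b were verified as connected.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def has_feasible_path(adjacency, start, goal):
    # Breadth-first search: returns True if goal is reachable from start.
    if start == goal:
        return True
    visited = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == goal:
                return True
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return False

# Hypothetical configurations: q0-q1-q2 are connected, q3-q4 form a separate component.
graph = build_undirected_graph([("q0", "q1"), ("q1", "q2"), ("q3", "q4")])
print(has_feasible_path(graph, "q0", "q2"))  # True
print(has_feasible_path(graph, "q0", "q4"))  # False
```

A search of this kind only answers whether any path exists; choosing an optimal path by length and formation preference would need a weighted search on top of the same graph.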
Abstract:Multi-robot transportation (MRT) is the transport of an object to its destination through the cooperation of multiple robots. In the process of object transportation, obstacle avoidance is an indispensable feature. In traditional local planners, obstacles are usually considered insurmountable, so the robot team bypasses the obstacles as a whole. However, many obstacles can be crossed in real situations. Studying the obstacle-crossing ability of a robot team can improve the efficiency of MRT and increase the planning success rate in complex environments. Inspired by patient transfer using a bed sheet, this paper focuses on object transportation by multiple mobile robots with a deformable sheet. A new local planner with obstacle-crossing capability is proposed, which consists of three parts: deformable sheet modeling, formation optimization, and local path generation. It can successfully find an obstacle-crossing path in complex scenarios where other planners fail. The effectiveness and versatility of the planner are verified by an experimental case study with three mobile robots and a simulation with four robots.
Abstract:Although neural machine translation with the encoder-decoder framework has achieved great success recently, it still suffers from two drawbacks: forgetting distant information, which is an inherent disadvantage of the recurrent neural network structure, and disregarding relationships between source words during the encoding step. In practice, however, earlier information and these relationships are often useful at the current step. We aim to solve these problems and thus introduce relation networks to learn better representations of the source. The relation networks strengthen the memorization capability of the recurrent neural network by associating source words with each other, which also helps retain their relationships. Then the source representations and all the relations are fed into the attention component together while decoding, with the main encoder-decoder framework unchanged. Experiments on several datasets show that our method can improve the translation performance significantly over the conventional encoder-decoder model and even outperform the approach involving supervised syntactic knowledge.
Abstract:Although sequence-to-sequence neural machine translation (NMT) models have achieved state-of-the-art performance in recent years, it is a widely shared concern that recurrent neural network (RNN) units struggle to capture long-distance state information, which means an RNN can hardly find features with long-term dependencies as the sequence becomes longer. Similarly, convolutional neural networks (CNNs) have recently been introduced into NMT for speed; however, CNNs focus on capturing local features of the sequence. To relieve this issue, we incorporate a relation network into the standard encoder-decoder framework to enhance information propagation in the neural network, ensuring that the information of the source sentence can flow into the decoder adequately. Experiments show that the proposed framework significantly outperforms the statistical MT model and the state-of-the-art NMT model on two data sets of different scales.