Abstract:Accurate trajectory tracking is an essential characteristic for the safe navigation of a quadrotor in cluttered or disturbed environments. In this paper, we present in detail two state-of-the-art model-based control frameworks for trajectory tracking: the Linear Model Predictive Controller (LMPC) and the Nonlinear Model Predictive Controller (NMPC). Additionally, the kinematic and dynamic models of the quadrotor are comprehensively described. Finally, a simulation system is implemented to verify feasibility, demonstrating the effectiveness of both controllers.
Abstract:Intelligent aerial platforms such as Unmanned Aerial Vehicles (UAVs) are expected to revolutionize various fields, including transportation, traffic management, field monitoring, industrial production, and agricultural management. Among these, precise control is a critical task that determines the performance and capabilities of UAV systems. However, current research primarily focuses on trajectory tracking and minimizing flight errors, with limited attention to improving flight time. In this paper, we propose a Model Predictive Control (MPC) approach aimed at minimizing flight time while addressing the limitations of the commonly used classical MPC controllers. Furthermore, the MPC method and its application for UAV control are presented in detail. Finally, the results demonstrate that the proposed controller outperforms the standard MPC in terms of efficiency. Moreover, this approach shows potential to become a foundation for integrating intelligent algorithms into basic controllers.
Abstract:Depth information plays a crucial role in autonomous systems for environmental perception and robot state estimation. With the rapid development of deep neural network technology, depth estimation has been extensively studied and shown potential for practical applications. However, in particularly challenging environments such as low-light and noisy underwater conditions, direct application of machine learning models may not yield the desired results. Therefore, in this paper, we present an approach to enhance underwater image quality to improve depth estimation effectiveness. First, underwater images are processed through methods such as color compensation, brightness equalization, and enhancement of contrast and sharpness of objects in the image. Next, we perform depth estimation using the Udepth model on the enhanced images. Finally, the results are evaluated and presented to verify the effectiveness and accuracy of the enhanced depth image quality approach for underwater robots.
Abstract:Localization and navigation are two crucial issues for mobile robots. In this paper, we propose an approach for localization and navigation systems for a differential-drive robot based on monocular SLAM. The system is implemented on the Robot Operating System (ROS). The hardware includes a differential-drive robot with an embedded computing platform (Jetson Xavier AGX), a 2D camera, and a LiDAR sensor for collecting external environmental information. The A* algorithm and Dynamic Window Approach (DWA) are used for path planning based on a 2D grid map. The ORB_SLAM3 algorithm is utilized to extract environmental features, providing the robot's pose for the localization and navigation processes. Finally, the system is tested in the Gazebo simulation environment and visualized through Rviz, demonstrating the efficiency and potential of the system for indoor localization and navigation of mobile robots.
Abstract:In this paper, we propose the development of an interactive platform between humans and a dual-arm robotic system based on the Robot Operating System (ROS) and a multimodal artificial intelligence model. Our proposed platform consists of two main components: a dual-arm robotic hardware system and software that includes image processing tasks and natural language processing using a 3D camera and embedded computing. First, we designed and developed a dual-arm robotic system with a positional accuracy of less than 2 cm, capable of operating independently, performing industrial and service tasks while simultaneously simulating and modeling the robot in the ROS environment. Second, artificial intelligence models for image processing are integrated to execute object picking and classification tasks with an accuracy of over 90%. Finally, we developed remote control software using voice commands through a natural language processing model. Experimental results demonstrate the accuracy of the multimodal artificial intelligence model and the flexibility of the dual-arm robotic system in interactive human environments.
Abstract:Navigating safely in dynamic human environments is crucial for mobile service robots, and social navigation is a key aspect of this process. In this paper, we proposed an integrative approach that combines motion prediction and trajectory planning to enable safe and socially-aware robot navigation. The main idea of the proposed method is to leverage the advantages of Socially Acceptable trajectory prediction and Timed Elastic Band (TEB) by incorporating human interactive information including position, orientation, and motion into the objective function of the TEB algorithms. In addition, we designed social constraints to ensure the safety of robot navigation. The proposed system is evaluated through physical simulation using both quantitative and qualitative metrics, demonstrating its superior performance in avoiding human and dynamic obstacles, thereby ensuring safe navigation. The implementations are open source at: \url{https://github.com/thanhnguyencanh/SGan-TEB.git}
Abstract:Localization is one of the most crucial tasks for Unmanned Aerial Vehicle systems (UAVs) directly impacting overall performance, which can be achieved with various sensors and applied to numerous tasks related to search and rescue operations, object tracking, construction, etc. However, due to the negative effects of challenging environments, UAVs may lose signals for localization. In this paper, we present an effective path-planning system leveraging semantic segmentation information to navigate around texture-less and problematic areas like lakes, oceans, and high-rise buildings using a monocular camera. We introduce a real-time semantic segmentation architecture and a novel keyframe decision pipeline to optimize image inputs based on pixel distribution, reducing processing time. A hierarchical planner based on the Dynamic Window Approach (DWA) algorithm, integrated with a cost map, is designed to facilitate efficient path planning. The system is implemented in a photo-realistic simulation environment using Unity, aligning with segmentation model parameters. Comprehensive qualitative and quantitative evaluations validate the effectiveness of our approach, showing significant improvements in the reliability and efficiency of UAV localization in challenging environments.
Abstract:Most learned B-frame codecs with hierarchical temporal prediction suffer from the domain shift issue caused by the discrepancy in the Group-of-Pictures (GOP) size used for training and test. As such, the motion estimation network may fail to predict large motion properly. One effective strategy to mitigate this domain shift issue is to downsample video frames for motion estimation. However, finding the optimal downsampling factor involves a time-consuming rate-distortion optimization process. This work introduces lightweight classifiers to determine the downsampling factor. To strike a good rate-distortion-complexity trade-off, our classifiers observe simple state signals, including only the coding and reference frames, to predict the best downsampling factor. We present two variants that adopt binary and multi-class classifiers, respectively. The binary classifier adopts the Focal Loss for training, classifying between motion estimation at high and low resolutions. Our multi-class classifier is trained with novel soft labels incorporating the knowledge of the rate-distortion costs of different downsampling factors. Both variants operate as add-on modules without the need to re-train the B-frame codec. Experimental results confirm that they achieve comparable coding performance to the brute-force search methods while greatly reducing computational complexity.
Abstract:In the realm of robotics, achieving simultaneous localization and mapping (SLAM) is paramount for autonomous navigation, especially in challenging environments like texture-less structures. This paper proposed a factor-graph-based model that tightly integrates IMU and encoder sensors to enhance positioning in such environments. The system operates by meticulously evaluating the data from each sensor. Based on these evaluations, weights are dynamically adjusted to prioritize the more reliable source of information at any given moment. The robot's state is initialized using IMU data, while the encoder aids motion estimation in long corridors. Discrepancies between the two states are used to correct IMU drift. The effectiveness of this method is demonstrably validated through experimentation. Compared to Karto SLAM, a widely used SLAM algorithm, this approach achieves an improvement of 26.98% in rotation angle error and 67.68% reduction in position error. These results convincingly demonstrate the method's superior accuracy and robustness in texture-less environments.
Abstract:Learned hierarchical B-frame coding aims to leverage bi-directional reference frames for better coding efficiency. However, the domain shift between training and test scenarios due to dataset limitations poses a challenge. This issue arises from training the codec with small groups of pictures (GOP) but testing it on large GOPs. Specifically, the motion estimation network, when trained on small GOPs, is unable to handle large motion at test time, incurring a negative impact on compression performance. To mitigate the domain shift, we present an online motion resolution adaptation (OMRA) method. It adapts the spatial resolution of video frames on a per-frame basis to suit the capability of the motion estimation network in a pre-trained B-frame codec. Our OMRA is an online, inference technique. It need not re-train the codec and is readily applicable to existing B-frame codecs that adopt hierarchical bi-directional prediction. Experimental results show that OMRA significantly enhances the compression performance of two state-of-the-art learned B-frame codecs on commonly used datasets.