Nak Young Chong

Quality-focused Active Adversarial Policy for Safe Grasping in Human-Robot Interaction

Mar 25, 2025

Chenghao Li, Razvan Beuran, Nak Young Chong

Abstract:Vision-guided robot grasping methods based on Deep Neural Networks (DNNs) have achieved remarkable success in handling unknown objects, attributable to their powerful generalizability. However, these methods with this generalizability tend to recognize the human hand and its adjacent objects as graspable targets, compromising safety during Human-Robot Interaction (HRI). In this work, we propose the Quality-focused Active Adversarial Policy (QFAAP) to solve this problem. Specifically, the first part is the Adversarial Quality Patch (AQP), wherein we design the adversarial quality patch loss and leverage the grasp dataset to optimize a patch with high quality scores. Next, we construct the Projected Quality Gradient Descent (PQGD) and integrate it with the AQP, which contains only the hand region within each real-time frame, endowing the AQP with fast adaptability to the human hand shape. Through AQP and PQGD, the hand can be actively adversarial with the surrounding objects, lowering their quality scores. Therefore, further setting the quality score of the hand to zero will reduce the grasping priority of both the hand and its adjacent objects, enabling the robot to grasp other objects away from the hand without emergency stops. We conduct extensive experiments on the benchmark datasets and a cobot, showing the effectiveness of QFAAP. Our code and demo videos are available here: https://github.com/clee-jaist/QFAAP.
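
The abstract describes a projected gradient update that is confined to the hand region of each frame. The sketch below is not the authors' PQGD; it is a generic masked projected-gradient step under stated assumptions: the image is a flat list of pixel intensities in [0, 1], the gradient of some loss (for example, a negative quality score) is supplied by the caller, and the names `masked_pgd_step`, `step`, and `eps` are hypothetical.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def masked_pgd_step(pixels, original, grad, mask, step=0.01, eps=0.03):
    """One signed projected-gradient step restricted to masked pixels.

    pixels   -- current pixel values in [0, 1]
    original -- unperturbed pixel values (centre of the L-infinity ball)
    grad     -- gradient of the loss w.r.t. each pixel (assumed given)
    mask     -- True where the pixel belongs to the region being updated
    """
    updated = []
    for x, x0, g, m in zip(pixels, original, grad, mask):
        if not m:
            # Pixels outside the mask are left untouched.
            updated.append(x)
            continue
        x_new = x - step * sign(g)                    # descend the loss
        x_new = min(max(x_new, x0 - eps), x0 + eps)   # project onto the eps-ball
        x_new = min(max(x_new, 0.0), 1.0)             # keep a valid intensity
        updated.append(x_new)
    return updated


if __name__ == "__main__":
    img = [0.5, 0.5, 0.5]
    out = masked_pgd_step(img, img, grad=[1.0, -1.0, 1.0],
                          mask=[True, True, False], step=0.02)
    print(out)
```

Restricting the update to the mask leaves the rest of the frame unmodified, and the projection bounds how far each masked pixel can drift from its original value.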

Enhancing Social Robot Navigation with Integrated Motion Prediction and Trajectory Planning in Dynamic Human Environments

Nov 04, 2024

Thanh Nguyen Canh, Xiem HoangVan, Nak Young Chong

Abstract:Navigating safely in dynamic human environments is crucial for mobile service robots, and social navigation is a key aspect of this process. In this paper, we propose an integrative approach that combines motion prediction and trajectory planning to enable safe and socially aware robot navigation. The main idea of the proposed method is to leverage the advantages of Socially Acceptable trajectory prediction and Timed Elastic Band (TEB) by incorporating human interactive information, including position, orientation, and motion, into the objective function of the TEB algorithm. In addition, we design social constraints to ensure the safety of robot navigation. The proposed system is evaluated through physical simulation using both quantitative and qualitative metrics, demonstrating its superior performance in avoiding humans and dynamic obstacles, thereby ensuring safe navigation. The implementation is open source at: https://github.com/thanhnguyencanh/SGan-TEB.git

* In the 24th International Conference on Control, Automation, and Systems (ICCAS 2024), Jeju, Korea

Toward Integrating Semantic-aware Path Planning and Reliable Localization for UAV Operations

Nov 04, 2024

Thanh Nguyen Canh, Huy-Hoang Ngo, Xiem HoangVan, Nak Young Chong

Abstract:Localization is one of the most crucial tasks for Unmanned Aerial Vehicles (UAVs) and directly impacts overall performance. It can be achieved with various sensors and supports numerous tasks such as search and rescue operations, object tracking, and construction. However, due to the negative effects of challenging environments, UAVs may lose signals for localization. In this paper, we present an effective path-planning system leveraging semantic segmentation information to navigate around texture-less and problematic areas like lakes, oceans, and high-rise buildings using a monocular camera. We introduce a real-time semantic segmentation architecture and a novel keyframe decision pipeline to optimize image inputs based on pixel distribution, reducing processing time. A hierarchical planner based on the Dynamic Window Approach (DWA) algorithm, integrated with a cost map, is designed to facilitate efficient path planning. The system is implemented in a photo-realistic simulation environment using Unity, aligning with segmentation model parameters. Comprehensive qualitative and quantitative evaluations validate the effectiveness of our approach, showing significant improvements in the reliability and efficiency of UAV localization in challenging environments.

* In The 24th International Conference on Control, Automation, and Systems (ICCAS 2024), Jeju, Korea

Pyramid-Monozone Synergistic Grasping Policy in Dense Clutter

Sep 11, 2024

Chenghao Li, Nak Young Chong

Abstract:Grasping a diverse range of novel objects from dense clutter poses a great challenge to robots because of the occlusion among these objects. In this work, we propose the Pyramid-Monozone Synergistic Grasping Policy (PMSGP) that enables robots to cleverly avoid most occlusions during grasping. Specifically, we first construct the Pyramid Sequencing Policy (PSP) to sequence each object in the scene into a pyramid structure. By isolating objects layer by layer, the grasp candidates focus on a single layer during each grasp. Then, we devise the Monozone Sampling Policy (MSP) to sample the grasp candidates in the top layer. In this manner, each grasp targets the topmost object, thereby effectively avoiding most occlusions. We perform more than 7000 real-world grasps on 300 novel objects in dense clutter scenes, demonstrating that PMSGP significantly outperforms seven competitive grasping methods. All grasping videos are available at: https://www.youtube.com/@chenghaoli4532/playlists.

S-CycleGAN: Semantic Segmentation Enhanced CT-Ultrasound Image-to-Image Translation for Robotic Ultrasonography

Jun 03, 2024

Yuhan Song, Nak Young Chong

Abstract:Ultrasound imaging is pivotal in various medical diagnoses due to its non-invasive nature and safety. In clinical practice, the accuracy and precision of ultrasound image analysis are critical. Recent advancements in deep learning have shown great capacity for processing medical images. However, the data-hungry nature of deep learning and the shortage of high-quality ultrasound image training data hinder the development of deep-learning-based ultrasound analysis methods. To address these challenges, we introduce an advanced deep learning model, dubbed S-CycleGAN, which generates high-quality synthetic ultrasound images from computed tomography (CT) data. This model incorporates semantic discriminators within a CycleGAN framework to ensure that critical anatomical details are preserved during the style transfer process. The synthetic images produced are used to augment training datasets for semantic segmentation models and robot-assisted ultrasound scanning system development, enhancing their ability to accurately parse real ultrasound imagery.

* This paper has been submitted to the 2024 IEEE International Conference on Cyborg and Bionic Systems and is still under review

Object-Oriented Semantic Mapping for Reliable UAVs Navigation

Jan 16, 2024

Thanh Nguyen Canh, Armagan Elibol, Nak Young Chong, Xiem HoangVan

Abstract:To autonomously navigate in real-world environments, especially in search and rescue operations, Unmanned Aerial Vehicles (UAVs) require comprehensive maps to ensure safety. However, the prevalent metric map often lacks semantic information crucial for holistic scene comprehension. In this paper, we propose a system to construct a probabilistic metric map enriched with object information extracted from RGB-D images of the environment. Our approach combines a state-of-the-art YOLOv8-based object detection framework at the front end and a 2D SLAM method, Cartographer, at the back end. To effectively track and position semantic object classes extracted from the front-end interface, we employ the BoT-SORT tracking method. A novel association method is introduced to extract the position of objects and then project it onto the metric map. Unlike previous research, our approach takes into account reliable navigation in environments with various hollow-bottom objects. The output of our system is a probabilistic map, which significantly enhances the map's representation by incorporating object-specific attributes, encompassing class distinctions, accurate positioning, and object heights. A number of experiments have been conducted to evaluate our proposed approach. The results show that the robot can effectively produce augmented semantic maps containing several objects (notably chairs and desks). Furthermore, our system is evaluated on an embedded computer, a Jetson Xavier AGX unit, to demonstrate the use case in real-world applications.

* In the 12th International Conference on Control, Automation and Information Sciences (ICCAIS 2023), Hanoi, Vietnam

S3M: Semantic Segmentation Sparse Mapping for UAVs with RGB-D Camera

Jan 16, 2024

Thanh Nguyen Canh, Van-Truong Nguyen, Xiem HoangVan, Armagan Elibol, Nak Young Chong

Abstract:Unmanned Aerial Vehicles (UAVs) hold immense potential for critical applications, such as search and rescue operations, where accurate perception of indoor environments is paramount. However, the concurrent amalgamation of localization, 3D reconstruction, and semantic segmentation presents a notable hurdle, especially in the context of UAVs equipped with constrained power and computational resources. This paper presents a novel approach to address challenges in semantic information extraction and utilization within UAV operations. Our system integrates state-of-the-art visual SLAM to estimate a comprehensive 6-DoF pose and advanced object segmentation methods at the back end. To improve the computational and storage efficiency of the framework, we adopt a streamlined voxel-based 3D map representation - OctoMap to build a working system. Furthermore, the fusion algorithm is incorporated to obtain the semantic information of each frame from the front-end SLAM task, and the corresponding point. By leveraging semantic information, our framework enhances the UAV's ability to perceive and navigate through indoor spaces, addressing challenges in pose estimation accuracy and uncertainty reduction. Through Gazebo simulations, we validate the efficacy of our proposed system and successfully embed our approach into a Jetson Xavier AGX unit for real-world applications.

* In The 2024 IEEE/SICE International Symposium on System Integration (SII2024), Ha Long, Vietnam

Abdominal Multi-Organ Segmentation Based on Feature Pyramid Network and Spatial Recurrent Neural Network

Aug 29, 2023

Yuhan Song, Armagan Elibol, Nak Young Chong

Abstract:As recent advances in AI are causing the decline of conventional diagnostic methods, the realization of end-to-end diagnosis is fast approaching. Ultrasound image segmentation is an important step in the diagnostic process. An accurate and robust segmentation model accelerates the process and reduces the burden on sonographers. In contrast to previous research, we take two inherent features of ultrasound images into consideration: (1) different organs and tissues vary in spatial size, and (2) the anatomical structures inside the human body form a relatively constant spatial relationship. Based on these two ideas, we propose a new image segmentation model combining a Feature Pyramid Network (FPN) and a Spatial Recurrent Neural Network (SRNN). We discuss why we use FPN to extract anatomical structures of different scales and how SRNN is implemented to extract the spatial context features in abdominal ultrasound images.

* IFAC World Congress 2023 paper

Does Deep Learning REALLY Outperform Non-deep Machine Learning for Clinical Prediction on Physiological Time Series?

Nov 11, 2022

Ke Liao, Wei Wang, Armagan Elibol, Lingzhong Meng, Xu Zhao, Nak Young Chong

Abstract:Machine learning has been widely used in healthcare applications to approximate complex models for clinical diagnosis, prognosis, and treatment. Although deep learning has an outstanding ability to extract information from time series, its true capabilities on sparse, irregularly sampled, multivariate, and imbalanced physiological data are not yet fully explored. In this paper, we systematically examine the performance of machine learning models for the clinical prediction task based on the EHR, especially physiological time series. We choose the public PhysioNet 2019 Challenge dataset to predict sepsis outcomes in ICUs. Ten baseline machine learning models commonly used in the clinical prediction domain are compared, including 3 deep learning methods and 7 non-deep learning methods. Nine evaluation metrics with specific clinical implications are used to assess the performance of the models. In addition, we sub-sample the training dataset and use learning curve fitting to investigate the impact of training dataset size on the performance of the machine learning models. We also propose a general pre-processing method for physiological time-series data and use Dice Loss to deal with the dataset imbalance problem. The results show that deep learning indeed outperforms non-deep learning, but only under certain conditions: first, when evaluated with some particular metrics (AUROC, AUPRC, Sensitivity, and FNR), but not others; second, when the training dataset is large enough (estimated to be on the order of thousands).
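
The abstract mentions Dice Loss as a remedy for class imbalance but does not give a formula. As a hedged illustration only, the following sketch uses the common soft Dice form for binary labels, 1 - (2 * sum(p * y) + s) / (sum(p) + sum(y) + s); the function name `dice_loss` and the smoothing constant `smooth` are assumptions, and the paper's exact variant may differ.

```python
def dice_loss(probs, labels, smooth=1.0):
    """Soft Dice loss for binary classification.

    probs  -- predicted probabilities of the positive class, each in [0, 1]
    labels -- ground-truth labels, each 0 or 1
    smooth -- additive constant that avoids division by zero (assumed value)
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    # Overlap between predictions and positive labels.
    intersection = sum(p * y for p, y in zip(probs, labels))
    # Total predicted mass plus total positive labels.
    denominator = sum(probs) + sum(labels)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


if __name__ == "__main__":
    labels = [1, 0, 0, 0]                              # one positive in four samples
    print(dice_loss([1.0, 0.0, 0.0, 0.0], labels))     # perfect prediction
    print(dice_loss([0.0, 0.0, 0.0, 0.0], labels))     # positive sample missed
```

Because the loss is driven by the overlap with the positive class, predicting every sample as negative is still penalized even when negatives dominate the dataset, which is the usual motivation for using it on imbalanced data.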

Mean Spectral Normalization of Deep Neural Networks for Embedded Automation

Jul 09, 2019

Anand Krishnamoorthy Subramanian, Nak Young Chong

Abstract:Deep Neural Networks (DNNs) have begun to thrive in the field of automation systems, owing to recent advancements in standardizing various aspects such as architecture, optimization techniques, and regularization. In this paper, we take a step towards a better understanding of Spectral Normalization (SN) and its potential for standardizing regularization of a wider range of Deep Learning models, following an empirical approach. We conduct several experiments to study their training dynamics in comparison with the ubiquitous Batch Normalization (BN) and show that SN increases the gradient sparsity and controls the gradient variance. Furthermore, we show that SN suffers from a phenomenon we call the mean-drift effect, which degrades its performance. We then propose a weight reparameterization called Mean Spectral Normalization (MSN) to resolve the mean drift, thereby significantly improving the network's performance. Our model performs ~16% faster than BN in practice and has fewer trainable parameters. We also show the performance of our MSN for small, medium, and large CNNs (a 3-layer CNN, VGG7, and DenseNet-BC, respectively) and for unsupervised image generation tasks using Generative Adversarial Networks (GANs) to evaluate its applicability for a broad range of embedded automation tasks.
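
For readers unfamiliar with Spectral Normalization, the sketch below shows the standard technique the abstract builds on: estimate the largest singular value of a weight matrix by power iteration and divide the weights by it. It does not implement the mean-drift correction of MSN, which the abstract does not specify. The helper names (`spectral_norm`, `spectrally_normalize`) and the fixed iteration count are assumptions made for illustration.

```python
import math


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def transpose(W):
    """Return the transpose of W as a list of rows."""
    return [list(col) for col in zip(*W)]


def l2_normalize(v, eps=1e-12):
    """Scale v to (approximately) unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def spectral_norm(W, n_iters=50):
    """Estimate the largest singular value of W by power iteration."""
    if n_iters < 1:
        raise ValueError("n_iters must be at least 1")
    Wt = transpose(W)
    # Starting vector; it must not be orthogonal to the top singular vector.
    u = l2_normalize([1.0] * len(W))
    for _ in range(n_iters):
        v = l2_normalize(matvec(Wt, u))
        u = l2_normalize(matvec(W, v))
    # sigma is approximately u^T W v.
    return sum(ui * wi for ui, wi in zip(u, matvec(W, v)))


def spectrally_normalize(W, n_iters=50):
    """Return W divided by its estimated largest singular value."""
    sigma = spectral_norm(W, n_iters)
    return [[w / sigma for w in row] for row in W]


if __name__ == "__main__":
    W = [[3.0, 0.0], [0.0, 1.0]]
    print(spectral_norm(W))
    print(spectrally_normalize(W))
```

In training code the iteration is usually run for only one step per update, reusing `u` from the previous step; the fixed 50 iterations here simply keep the standalone example accurate.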

* 8 pages, IEEE CASE 2019
