An Vuong

On Symbol Error Probability-based Beamforming in MIMO Gaussian Wiretap Channels

Apr 04, 2025

Nam Nguyen, An Vuong, Thuan Nguyen, Thinh Nguyen

Abstract: This paper investigates beamforming schemes designed to minimize the symbol error probability (SEP) for an authorized user while guaranteeing that the likelihood of an eavesdropper correctly recovering symbols remains below a predefined threshold. Unlike previous works that focus on maximizing secrecy capacity, our work centers on finding an optimal beamforming vector for binary antipodal signal detection in multiple-input multiple-output (MIMO) Gaussian wiretap channels. Finding the optimal beamforming vector in this setting is challenging: computationally efficient algorithms such as convex techniques cannot be applied to find the optimal solution. Instead, our proposed algorithm relies on Karush-Kuhn-Tucker (KKT) conditions and a generalized eigen-decomposition method to find the exact solution. In addition, we develop an approximate, practical algorithm to find a good beamforming matrix when using M-ary detection schemes. Numerical results are presented to assess the performance of the proposed methods across various scenarios.
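
To make the objective and the constraint in this abstract concrete, here is a minimal sketch under simplifying assumptions that are not taken from the paper: real-valued channels, a single receive antenna at each receiver, symbols of +1 or -1, and Gaussian noise. The function names (`q_function`, `sep_binary_antipodal`, `is_feasible`) and the example channel values are hypothetical. The sketch only evaluates a given beamforming vector; it does not implement the paper's KKT-based or eigen-decomposition-based solver.

```python
import math

def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P(N(0, 1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def received_amplitude(channel, w):
    """|h^T w| for a real-valued channel vector h and beamforming vector w."""
    return abs(sum(h * wi for h, wi in zip(channel, w)))

def sep_binary_antipodal(channel, w, noise_std):
    """SEP when +1/-1 symbols pass through `channel` with beamformer `w`
    and are detected by the sign of the noisy received value."""
    return q_function(received_amplitude(channel, w) / noise_std)

def is_feasible(w, h_eve, noise_std_eve, max_correct_prob):
    """True if the eavesdropper's probability of a correct decision
    stays at or below the chosen threshold."""
    p_correct = 1.0 - sep_binary_antipodal(h_eve, w, noise_std_eve)
    return p_correct <= max_correct_prob

# Hypothetical channels: authorized user and eavesdropper.
h_user = [1.0, 0.5]
h_eve = [0.2, -0.4]
w = [1.0, 0.5]  # candidate beamformer; here it happens to be orthogonal to h_eve

user_sep = sep_binary_antipodal(h_user, w, noise_std=1.0)
feasible = is_feasible(w, h_eve, noise_std_eve=1.0, max_correct_prob=0.6)
```

In this toy case the candidate vector is orthogonal to the eavesdropper's channel, so the eavesdropper can do no better than guessing, while the authorized user sees a nonzero amplitude. A design procedure would search over `w` (typically under a power constraint) for the lowest `user_sep` among feasible vectors.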

GraspMamba: A Mamba-based Language-driven Grasp Detection Framework with Hierarchical Feature Learning

Sep 22, 2024

Huy Hoang Nguyen, An Vuong, Anh Nguyen, Ian Reid, Minh Nhat Vu

Abstract: Grasp detection is a fundamental robotic task critical to the success of many industrial applications. However, current language-driven models for this task often struggle with cluttered images, lengthy textual descriptions, or slow inference speed. We introduce GraspMamba, a new language-driven grasp detection method that employs hierarchical feature fusion with Mamba vision to tackle these challenges. By leveraging rich visual features of the Mamba-based backbone alongside textual information, our approach effectively enhances the fusion of multimodal features. GraspMamba represents the first Mamba-based grasp detection model to extract vision and language features at multiple scales, delivering robust performance and rapid inference time. Intensive experiments show that GraspMamba outperforms recent methods by a clear margin. We validate our approach through real-world robotic experiments, highlighting its fast inference speed.

* 8 pages. Project page: https://airvlab.github.io/grasp-anything/

SDE-based Multiplicative Noise Removal

Aug 19, 2024

An Vuong, Thinh Nguyen

Abstract: Multiplicative noise, also known as speckle or pepper noise, commonly affects images produced by synthetic aperture radar (SAR), lasers, or optical lenses. Unlike additive noise, which typically arises from thermal processes or external factors, multiplicative noise is inherent to the system, originating from the fluctuation in diffuse reflections. These fluctuations result in multiple copies of the same signal with varying magnitudes being combined. Consequently, despeckling, or removing multiplicative noise, requires different techniques from those used for additive noise removal. In this paper, we propose a novel approach that uses stochastic differential equation (SDE)-based diffusion models to address multiplicative noise. We demonstrate that multiplicative noise can be effectively modeled as a Geometric Brownian Motion process in the logarithmic domain. Utilizing the Fokker-Planck equation, we derive the corresponding reverse process for image denoising. To validate our method, we conduct extensive experiments on two different datasets, comparing our approach to both classical signal processing techniques and contemporary CNN-based noise removal models. Our results indicate that the proposed method significantly outperforms existing methods on perception-based metrics such as FID and LPIPS, while maintaining competitive performance on traditional metrics like PSNR and SSIM.
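
The link between multiplicative noise and the logarithmic domain can be illustrated with a small sketch. It assumes a drift-free Geometric Brownian Motion, dx = sigma * x * dW, applied to a single pixel intensity; the function name `gbm_forward` and the values of `sigma` and `t` are hypothetical, and the paper's actual forward process and its parameters may differ.

```python
import math
import random

def gbm_forward(x0: float, sigma: float, t: float, rng: random.Random):
    """Sample x_t of the drift-free GBM dx = sigma * x * dW in closed form.

    Returns (x_t, w_t), where w_t ~ N(0, t) is the Brownian value used.
    """
    w_t = rng.gauss(0.0, math.sqrt(t))
    # Ito solution: x_t = x_0 * exp(-sigma^2 * t / 2 + sigma * W_t)
    x_t = x0 * math.exp(-0.5 * sigma**2 * t + sigma * w_t)
    return x_t, w_t

rng = random.Random(0)        # fixed seed so the run is repeatable
x0, sigma, t = 0.8, 0.5, 1.0  # hypothetical clean pixel value and noise level
x_t, w_t = gbm_forward(x0, sigma, t, rng)

# In the pixel domain the noise multiplies the clean value ...
noise_factor = x_t / x0
# ... while in the log domain it is an additive Gaussian shift.
log_shift = math.log(x_t) - math.log(x0)
```

Here `log_shift` equals `-0.5 * sigma**2 * t + sigma * w_t`, a Gaussian quantity that does not depend on `x0`. That is why working in the log domain turns a multiplicative corruption into an additive one.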

* 9 pages, 4 figures

Language-driven Grasp Detection with Mask-guided Attention

Jul 29, 2024

Tuan Van Vo, Minh Nhat Vu, Baoru Huang, An Vuong, Ngan Le, Thieu Vo, Anh Nguyen

Abstract: Grasp detection is an essential task in robotics with various industrial applications. However, traditional methods often struggle with occlusions and do not utilize language for grasping. Incorporating natural language into grasp detection remains challenging and largely unexplored. To address this gap, we propose a new method for language-driven grasp detection with mask-guided attention by utilizing the transformer attention mechanism with semantic segmentation features. Our approach integrates visual data, segmentation mask features, and natural language instructions, significantly improving grasp detection accuracy. Our work introduces a new framework for language-driven grasp detection, paving the way for language-driven robotic applications. Intensive experiments show that our method outperforms other recent baselines by a clear margin, with a 10.0% success score improvement. We further validate our method in real-world robotic experiments, confirming the effectiveness of our approach.

* Accepted at IROS 2024

Lightweight Language-driven Grasp Detection using Conditional Consistency Model

Jul 25, 2024

Nghia Nguyen, Minh Nhat Vu, Baoru Huang, An Vuong, Ngan Le, Thieu Vo, Anh Nguyen

Abstract: Language-driven grasp detection is a fundamental yet challenging task in robotics with various industrial applications. In this work, we present a new approach for language-driven grasp detection that leverages the concept of lightweight diffusion models to achieve fast inference time. By integrating diffusion processes with grasping prompts in natural language, our method can effectively encode visual and textual information, enabling more accurate and versatile grasp positioning that aligns well with the text query. To overcome the long inference time problem in diffusion models, we leverage the image and text features as the condition in the consistency model to reduce the number of denoising timesteps during inference. Intensive experimental results show that our method outperforms other recent grasp detection methods and lightweight diffusion models by a clear margin. We further validate our method in real-world robotic experiments to demonstrate its fast inference time.

* Accepted at IROS 2024

Language-Driven 6-DoF Grasp Detection Using Negative Prompt Guidance

Jul 18, 2024

Toan Nguyen, Minh Nhat Vu, Baoru Huang, An Vuong, Quan Vuong, Ngan Le, Thieu Vo, Anh Nguyen

Abstract: 6-DoF grasp detection has been a fundamental and challenging problem in robotic vision. While previous works have focused on ensuring grasp stability, they often do not consider human intention conveyed through natural language, hindering effective collaboration between robots and users in complex 3D environments. In this paper, we present a new approach for language-driven 6-DoF grasp detection in cluttered point clouds. We first introduce Grasp-Anything-6D, a large-scale dataset for the language-driven 6-DoF grasp detection task with 1M point cloud scenes and more than 200M language-associated 3D grasp poses. We further introduce a novel diffusion model that incorporates a new negative prompt guidance learning strategy. The proposed negative prompt strategy directs the detection process toward the desired object while steering away from unwanted ones given the language input. Our method enables an end-to-end framework where humans can command the robot to grasp desired objects in a cluttered scene using natural language. Intensive experimental results show the effectiveness of our method in both benchmarking experiments and real-world scenarios, surpassing other baselines. In addition, we demonstrate the practicality of our approach in real-world robotic applications. Our project is available at https://airvlab.github.io/grasp-anything.

* Accepted at ECCV 2024

Language-driven Scene Synthesis using Multi-conditional Diffusion Model

Oct 24, 2023

An Vuong, Minh Nhat Vu, Toan Tien Nguyen, Baoru Huang, Dzung Nguyen, Thieu Vo, Anh Nguyen

Abstract: Scene synthesis is a challenging problem with several industrial applications. Recently, substantial efforts have been directed to synthesize the scene using human motions, room layouts, or spatial graphs as the input. However, few studies have addressed this problem from multiple modalities, especially combining text prompts. In this paper, we propose a language-driven scene synthesis task, which is a new task that integrates text prompts, human motion, and existing objects for scene synthesis. Unlike other single-condition synthesis tasks, our problem involves multiple conditions and requires a strategy for processing and encoding them into a unified space. To address the challenge, we present a multi-conditional diffusion model, which differs from the implicit unification approach of other diffusion literature by explicitly predicting the guiding points for the original data distribution. We demonstrate that our approach is theoretically supported. Intensive experimental results show that our method outperforms state-of-the-art benchmarks and enables natural scene editing applications. The source code and dataset can be accessed at https://lang-scene-synth.github.io/.

* Accepted to NeurIPS 2023

Open-Vocabulary Affordance Detection in 3D Point Clouds

Mar 04, 2023

Toan Nguyen, Minh Nhat Vu, An Vuong, Dzung Nguyen, Thieu Vo, Ngan Le, Anh Nguyen

Abstract: Affordance detection is a challenging problem with a wide variety of robotic applications. Traditional affordance detection methods are limited to a predefined set of affordance labels, hence potentially restricting the adaptability of intelligent robots in complex and dynamic environments. In this paper, we present the Open-Vocabulary Affordance Detection (OpenAD) method, which is capable of detecting an unbounded number of affordances in 3D point clouds. By simultaneously learning the affordance text and the point feature, OpenAD successfully exploits the semantic relationships between affordances. Therefore, our proposed method enables zero-shot detection and can detect previously unseen affordances without a single annotation example. Intensive experimental results show that OpenAD works effectively on a wide range of affordance detection setups and outperforms other baselines by a large margin. Additionally, we demonstrate the practicality of the proposed OpenAD in real-world robotic applications with a fast inference speed (~100 ms).
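
One common way to get open-vocabulary behavior of the kind this abstract describes is to place per-point features and label text embeddings in a shared space and assign each point the label with the highest cosine similarity. The sketch below illustrates that general idea only. It is not the OpenAD architecture, and the function names and hand-made three-dimensional vectors are hypothetical stand-ins for the outputs of a point network and a text encoder.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def label_points(point_features, text_embeddings):
    """Assign each point the label whose text embedding is most similar."""
    names = list(text_embeddings)
    return [
        max(names, key=lambda name: cosine(feat, text_embeddings[name]))
        for feat in point_features
    ]

# Toy embeddings standing in for a text encoder's output (hypothetical values).
text_embeddings = {"grasp": [1.0, 0.0, 0.0], "pour": [0.0, 1.0, 0.0]}
# Toy per-point features standing in for a point network's output.
point_features = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.95]]

seen_only = label_points(point_features, text_embeddings)

# A new label is added at query time by adding its text embedding only.
text_embeddings["cut"] = [0.0, 0.0, 1.0]
with_new_label = label_points(point_features, text_embeddings)
```

Because the label set enters only through its text embeddings, extending the vocabulary does not require changing the point features, which is the property that makes zero-shot labels possible in this kind of design.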
