Abstract:The deployment of in-air acoustic sensors for industrial monitoring and autonomous robotics has grown significantly, often drawing inspiration from biological echolocation. However, developing and validating these systems in existing simulation frameworks remains challenging due to the computational cost of simulating high-frequency wave propagation in large, dynamic, and complex environments. While wave-based methods offer high accuracy, they scale poorly with frequency and volume. Conversely, existing geometric acoustic solvers often lack support for dynamic scenes, complex diffraction, or closed-loop robotic integration. In this work, we introduce SonoTraceUE, a high-fidelity acoustic simulation framework built as a plugin for Unreal Engine. By using a hardware-accelerated ray tracing-based specular reflection model, and a curvature-based Monte Carlo diffraction model, the system enables near real-time simulation of active and passive acoustic sensing in dynamic, multi-material environments. We validate the framework through two distinct experimental domains: a bioacoustic study and a robotics experiment. Our results demonstrate that SonoTraceUE achieves high correlation with real-world spectral and spatial data. The framework provides a versatile platform for synthetic data generation, hypothesis testing in bioacoustics, and the rapid prototyping of closed-loop robotic systems that use acoustic sensing.
Abstract:Procedural generation techniques in 3D rendering engines have revolutionized the creation of complex environments, reducing reliance on manual design. Recent approaches using Large Language Models (LLMs) for 3D scene generation show promise but often lack domain-specific reasoning, verification mechanisms, and modular design. These limitations lead to reduced control and poor scalability. This paper investigates the use of LLMs to generate agricultural synthetic simulation environments from natural language prompts, specifically to address the limitations of lacking domain-specific reasoning, verification mechanisms, and modular design. A modular multi-LLM pipeline was developed, integrating 3D asset retrieval, domain knowledge injection, and code generation for the Unreal rendering engine using its API. This results in a 3D environment with realistic planting layouts and environmental context, all based on the input prompt and the domain knowledge. To enhance accuracy and scalability, the system employs a hybrid strategy combining LLM optimization techniques such as few-shot prompting, Retrieval-Augmented Generation (RAG), finetuning, and validation. Unlike monolithic models, the modular architecture enables structured data handling, intermediate verification, and flexible expansion. The system was evaluated using structured prompts and semantic accuracy metrics. A user study assessed realism and familiarity against real-world images, while an expert comparison demonstrated significant time savings over manual scene design. The results confirm the effectiveness of multi-LLM pipelines in automating domain-specific 3D scene generation with improved reliability and precision. Future work will explore expanding the asset hierarchy, incorporating real-time generation, and adapting the pipeline to other simulation domains beyond agriculture.
Abstract:In-air acoustic imaging systems demand beamforming techniques that offer a high dynamic range and spatial resolution while also remaining robust. Conventional Delay-and-Sum (DAS) beamforming fails to meet these quality demands due to high sidelobes, a wide main lobe and the resulting low contrast, whereas advanced adaptive methods are typically precluded by the computational cost and the single-snapshot constraint of real-time field operation. To overcome this trade-off, we propose and detail the implementation of higher-order non-linear beamforming methods using the Delay-Multiply-and-Sum technique, coupled with Coherence Factor weighting, specifically adapted for ultrasonic in-air microphone arrays. Our efficient implementation allows for enabling GPU-accelerated, real-time performance on embedded computing platforms. Through validation against the DAS baseline using simulated and real-world acoustic data, we demonstrate that the proposed method provides significant improvements in image contrast, establishing higher-order non-linear beamforming as a practical, high-performance solution for in-air acoustic imaging.
Abstract:This paper presents a novel software-based approach to stabilizing the acoustic images for in-air 3D sonars. Due to uneven terrain, traditional static beamforming techniques can be misaligned, causing inaccurate measurements and imaging artifacts. Furthermore, mechanical stabilization can be more costly and prone to failure. We propose using an adaptive conventional beamforming approach by fusing it with real-time IMU data to adjust the sonar array's steering matrix dynamically based on the elevation tilt angle caused by the uneven ground. Additionally, we propose gaining compensation to offset emission energy loss due to the transducer's directivity pattern and validate our approach through various experiments, which show significant improvements in temporal consistency in the acoustic images. We implemented a GPU-accelerated software system that operates in real-time with an average execution time of 210ms, meeting autonomous navigation requirements.
Abstract:In challenging environments where traditional sensing modalities struggle, in-air sonar offers resilience to optical interference. Placing a priori known landmarks in these environments can eliminate accumulated errors in autonomous mobile systems such as Simultaneous Localization and Mapping (SLAM) and autonomous navigation. We present a novel approach using a convolutional neural network to detect and classify ten different reflector landmarks with varying radii using in-air 3D sonar. Additionally, the network predicts the orientation angle of the detected landmarks. The neural network is trained on cochleograms, representing echoes received by the sensor in a time-frequency domain. Experimental results in cluttered indoor settings show promising performance. The CNN achieves a 97.3% classification accuracy on the test dataset, accurately detecting both the presence and absence of landmarks. Moreover, the network predicts landmark orientation angles with an RMSE lower than 10 degrees, enhancing the utility in SLAM and autonomous navigation applications. This advancement improves the robustness and accuracy of autonomous systems in challenging environments.




Abstract:The predictive brain hypothesis suggests that perception can be interpreted as the process of minimizing the error between predicted perception tokens generated by an internal world model and actual sensory input tokens. When implementing working examples of this hypothesis in the context of in-air sonar, significant difficulties arise due to the sparse nature of the reflection model that governs ultrasonic sensing. Despite these challenges, creating consistent world models using sonar data is crucial for implementing predictive processing of ultrasound data in robotics. In an effort to enable robust robot behavior using ultrasound as the sole exteroceptive sensor modality, this paper introduces EchoPT, a pretrained transformer architecture designed to predict 2D sonar images from previous sensory data and robot ego-motion information. We detail the transformer architecture that drives EchoPT and compare the performance of our model to several state-of-the-art techniques. In addition to presenting and evaluating our EchoPT model, we demonstrate the effectiveness of this predictive perception approach in two robotic tasks.
Abstract:Echolocation is the prime sensing modality for many species of bats, who show the intricate ability to perform a plethora of tasks in complex and unstructured environments. Understanding this exceptional feat of sensorimotor interaction is a key aspect into building more robust and performant man-made sonar sensors. In order to better understand the underlying perception mechanisms it is important to get a good insight into the nature of the reflected signals that the bat perceives. While ensonification experiments are in important way to better understand the nature of these signals, they are as time-consuming to perform as they are informative. In this paper we present SonoTraceLab, an open-source software package for simulating both technical as well as biological sonar systems in complex scenes. Using simulation approaches can drastically increase insights into the nature of biological echolocation systems, while reducing the time- and material complexity of performing them.
Abstract:Various autonomous applications rely on recognizing specific known landmarks in their environment. For example, Simultaneous Localization And Mapping (SLAM) is an important technique that lays the foundation for many common tasks, such as navigation and long-term object tracking. This entails building a map on the go based on sensory inputs which are prone to accumulating errors. Recognizing landmarks in the environment plays a vital role in correcting these errors and further improving the accuracy of SLAM. The most popular choice of sensors for conducting SLAM today is optical sensors such as cameras or LiDAR sensors. These can use landmarks such as QR codes as a prerequisite. However, such sensors become unreliable in certain conditions, e.g., foggy, dusty, reflective, or glass-rich environments. Sonar has proven to be a viable alternative to manage such situations better. However, acoustic sensors also require a different type of landmark. In this paper, we put forward a method to detect the presence of bio-mimetic acoustic landmarks using support vector machines trained on the frequency bands of the reflecting acoustic echoes using an embedded real-time imaging sonar.




Abstract:Within academia and industry, there has been a need for expansive simulation frameworks that include model-based simulation of sensors, mobile vehicles, and the environment around them. To this end, the modular, real-time, and open-source AirSim framework has been a popular community-built system that fulfills some of those needs. However, the framework required adding systems to serve some complex industrial applications, including designing and testing new sensor modalities, Simultaneous Localization And Mapping (SLAM), autonomous navigation algorithms, and transfer learning with machine learning models. In this work, we discuss the modification and additions to our open-source version of the AirSim simulation framework, including new sensor modalities, vehicle types, and methods to generate realistic environments with changeable objects procedurally. Furthermore, we show the various applications and use cases the framework can serve.
Abstract:For autonomous navigation and robotic applications, sensing the environment correctly is crucial. Many sensing modalities for this purpose exist. In recent years, one such modality that is being used is in-air imaging sonar. It is ideal in complex environments with rough conditions such as dust or fog. However, like with most sensing modalities, to sense the full environment around the mobile platform, multiple such sensors are needed to capture the full 360-degree range. Currently the processing algorithms used to create this data are insufficient to do so for multiple sensors at a reasonably fast update rate. Furthermore, a flexible and robust framework is needed to easily implement multiple imaging sonar sensors into any setup and serve multiple application types for the data. In this paper we present a sensor network framework designed for this novel sensing modality. Furthermore, an implementation of the processing algorithm on a Graphics Processing Unit is proposed to potentially decrease the computing time to allow for real-time processing of one or more imaging sonar sensors at a sufficiently high update rate.