Jiahao Ma

SOAF: Scene Occlusion-aware Neural Acoustic Field

Jul 02, 2024

Huiyu Gao, Jiahao Ma, David Ahmedt-Aristizabal, Chuong Nguyen, Miaomiao Liu

Abstract: This paper tackles the problem of novel view audio-visual synthesis along an arbitrary trajectory in an indoor scene, given the audio-video recordings from other known trajectories of the scene. Existing methods often overlook the effect of room geometry on sound propagation, particularly wall occlusion, making them less accurate in multi-room environments. In this work, we propose a new approach called Scene Occlusion-aware Acoustic Field (SOAF) for accurate sound generation. Our approach derives a prior for the sound energy field using distance-aware parametric sound-propagation modelling and then transforms it based on scene transmittance learned from the input video. We extract features from the local acoustic field centred around the receiver using a Fibonacci Sphere to generate binaural audio for novel views with a direction-aware attention mechanism. Extensive experiments on the real dataset RWAVS and the synthetic dataset SoundSpaces demonstrate that our method outperforms previous state-of-the-art techniques in audio generation. Project page: https://github.com/huiyu-gao/SOAF/.
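The Fibonacci Sphere mentioned in the abstract is a standard way to place near-uniform sample directions around a point. The sketch below is not taken from the SOAF code; the function names `fibonacci_sphere` and `sample_positions`, the radius, and the sample count are illustrative assumptions.

```python
import math


def fibonacci_sphere(n):
    """Return n approximately evenly spaced unit vectors (x, y, z)."""
    if n < 1:
        raise ValueError("n must be positive")
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        # z steps through n equal-height bands, which have equal area on a sphere.
        z = 1.0 - (2.0 * i + 1.0) / n
        ring_radius = math.sqrt(max(0.0, 1.0 - z * z))
        # Rotating by the golden angle spreads points evenly in azimuth.
        theta = golden_angle * i
        points.append((ring_radius * math.cos(theta),
                       ring_radius * math.sin(theta),
                       z))
    return points


def sample_positions(receiver, radius, n):
    """Hypothetical helper: query positions on a sphere centred at the receiver."""
    rx, ry, rz = receiver
    return [(rx + radius * x, ry + radius * y, rz + radius * z)
            for x, y, z in fibonacci_sphere(n)]
```

Each returned position could then be used to query a local acoustic field around the receiver; how SOAF actually consumes these samples is described in the paper, not here.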


HashPoint: Accelerated Point Searching and Sampling for Neural Rendering

Apr 22, 2024

Jiahao Ma, Miaomiao Liu, David Ahmedt-Aristizabal, Chuong Nguyen

Abstract: In this paper, we address the problem of efficient point searching and sampling for volume neural rendering. Within this realm, two typical approaches are employed: rasterization and ray tracing. Rasterization-based methods enable real-time rendering at the cost of increased memory and lower fidelity. In contrast, ray-tracing-based methods yield superior quality but demand longer rendering time. Our HashPoint method addresses this trade-off by combining the two strategies, leveraging rasterization for efficient point searching and sampling, and ray marching for rendering. Our method optimizes point searching by rasterizing points within the camera's view, organizing them in a hash table, and facilitating rapid searches. Notably, we accelerate the rendering process by adaptive sampling on the primary surface encountered by the ray. Our approach yields substantial speed-up for a range of state-of-the-art ray-tracing-based methods, maintaining equivalent or superior accuracy across synthetic and real test datasets. The code will be available at https://jiahao-ma.github.io/hashpoint/.
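The idea of rasterizing points and organizing them in a hash table can be illustrated with a small sketch. This is not the HashPoint implementation: it assumes a simple pinhole camera with points already in camera space, and the names `rasterize_to_hash` and `points_for_pixel` are hypothetical.

```python
import math
from collections import defaultdict


def rasterize_to_hash(points, focal, width, height):
    """Bucket camera-space points by the pixel they project to.

    Returns a dict mapping (u, v) -> list of (depth, point_index),
    sorted front to back.
    """
    table = defaultdict(list)
    for idx, (x, y, z) in enumerate(points):
        if z <= 0:
            continue  # behind the camera
        u = math.floor(focal * x / z + width / 2)
        v = math.floor(focal * y / z + height / 2)
        if 0 <= u < width and 0 <= v < height:
            table[(u, v)].append((z, idx))
    for bucket in table.values():
        bucket.sort()  # nearest point first
    return table


def points_for_pixel(table, u, v):
    """Look up the points along the ray through pixel (u, v), nearest first."""
    return [idx for _, idx in table.get((u, v), [])]


cloud = [(0.0, 0.0, 2.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
table = rasterize_to_hash(cloud, focal=10.0, width=32, height=32)
print(points_for_pixel(table, 16, 16))  # → [1, 0]
```

Because each bucket is sorted by depth, the first entry approximates the primary surface hit by the ray, which is where the abstract says sampling is concentrated.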

* The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2024
* CVPR2024 Highlight


Simulation and application of COVID-19 compartment model using physic-informed neural network

Aug 04, 2022

Jinhuan Ke, Jiahao Ma, Xiyu Yin

Abstract: In this work, the SVEIDR model and its variants (Aged and Vaccination-structured models) are introduced to encode the effect of social contact for different age groups and vaccination statuses. We then apply a physics-informed neural network to both simulated and real-world data. The paper presents the resulting analysis of the spread of COVID-19 and forecasts learned by the neural network.


Multiview Detection with Cardboard Human Modeling

Jul 10, 2022

Jiahao Ma, Zicheng Duan, Yunzhong Hou, Liang Zheng, Chuong Nguyen

Abstract: Multiview detection uses multiple calibrated cameras with overlapping fields of view to locate occluded pedestrians. In this field, existing methods typically adopt a "human modeling - aggregation" strategy. To find robust pedestrian representations, some intuitively use locations of detected 2D bounding boxes, while others use entire frame features projected to the ground plane. However, the former does not consider human appearance and leads to many ambiguities, and the latter suffers from projection errors due to the lack of accurate height of the human torso and head. In this paper, we propose a new pedestrian representation scheme based on human point cloud modeling. Specifically, using ray tracing for holistic human depth estimation, we model pedestrians as upright, thin cardboard point clouds on the ground. Then, we aggregate the point clouds of the pedestrian cardboard across multiple views for a final decision. Compared with existing representations, the proposed method explicitly leverages human appearance and reduces projection errors significantly by relatively accurate height estimation. On two standard evaluation benchmarks, the proposed method achieves very competitive results.

* The thesis is not perfect enough


Voxelized 3D Feature Aggregation for Multiview Detection

Dec 07, 2021

Jiahao Ma, Jinguang Tong, Shan Wang, Wei Zhao, Liang Zheng, Chuong Nguyen

Abstract: Multi-view detection incorporates multiple camera views to alleviate occlusion in crowded scenes, where the state-of-the-art approaches adopt homography transformations to project multi-view features to the ground plane. However, we find that these 2D transformations do not take into account the object's height, and as a result features along the vertical direction of the same object are likely not projected onto the same ground plane point, leading to impure ground-plane features. To solve this problem, we propose VFA, voxelized 3D feature aggregation, for feature transformation and aggregation in multi-view detection. Specifically, we voxelize the 3D space, project the voxels onto each camera view, and associate 2D features with these projected voxels. This allows us to identify and then aggregate 2D features along the same vertical line, alleviating projection distortions to a large extent. Additionally, because different kinds of objects (human vs. cattle) have different shapes on the ground plane, we introduce the oriented Gaussian encoding to match such shapes, leading to increased accuracy and efficiency. We perform experiments on multiview 2D detection and multiview 3D detection problems. Results on four datasets (including a newly introduced MultiviewC dataset) show that our system is very competitive compared with the state-of-the-art approaches. Code and MultiviewC are released at https://github.com/Robert-Mar/VFA.
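The projection step described in the abstract can be sketched for a single camera and a single vertical column of voxels. This is an illustration under stated assumptions, not the released VFA code: the 3x4 matrix `P`, the scalar feature map, the plain averaging used for aggregation, and the function names are all hypothetical.

```python
def project(P, point):
    """Apply a 3x4 projection matrix to a 3D point; return (u, v) or None."""
    x, y, z = point
    h = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in P]
    if h[2] <= 0:
        return None  # point is behind the camera
    return (h[0] / h[2], h[1] / h[2])


def voxel_column(gx, gy, heights):
    """Voxel centres stacked above one ground-plane location."""
    return [(gx, gy, h) for h in heights]


def aggregate_column(P, feature_map, column):
    """Average the 2D features found where each voxel in the column projects.

    Averaging is an assumption made for this sketch.
    """
    rows, cols = len(feature_map), len(feature_map[0])
    samples = []
    for voxel in column:
        uv = project(P, voxel)
        if uv is None:
            continue
        u, v = int(round(uv[0])), int(round(uv[1]))
        if 0 <= v < rows and 0 <= u < cols:
            samples.append(feature_map[v][u])
    return sum(samples) / len(samples) if samples else 0.0


# Toy camera at the origin looking along +x, with z pointing up:
# u = f*y/x + cx, v = -f*z/x + cy, with f = 10 and cx = cy = 8.
P = [[8.0, 10.0, 0.0, 0.0],
     [8.0, 0.0, -10.0, 0.0],
     [1.0, 0.0, 0.0, 0.0]]
print(project(P, (10.0, 0.0, 0.0)))  # → (8.0, 8.0)
print(project(P, (10.0, 0.0, 2.0)))  # → (8.0, 6.0)
```

The two printed pixels differ even though both voxels sit above the same ground location, which is the height effect a ground-plane homography alone cannot account for.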
