Peilin Li

GS-Net: Generalizable Plug-and-Play 3D Gaussian Splatting Module

Sep 17, 2024

Yichen Zhang, Zihan Wang, Jiali Han, Peilin Li, Jiaxun Zhang, Jianqiang Wang, Lei He, Keqiang Li

Abstract: 3D Gaussian Splatting (3DGS) integrates the strengths of primitive-based representations and volumetric rendering techniques, enabling real-time, high-quality rendering. However, 3DGS models typically overfit to single-scene training and are highly sensitive to the initialization of Gaussian ellipsoids, which is heuristically derived from Structure from Motion (SfM) point clouds; this limits both generalization and practicality. To address these limitations, we propose GS-Net, a generalizable, plug-and-play 3DGS module that densifies Gaussian ellipsoids from sparse SfM point clouds, enhancing geometric structure representation. To the best of our knowledge, GS-Net is the first plug-and-play 3DGS module with cross-scene generalization capabilities. Additionally, we introduce the CARLA-NVS dataset, which incorporates additional camera viewpoints to thoroughly evaluate reconstruction and rendering quality. Extensive experiments demonstrate that applying GS-Net to 3DGS yields a PSNR improvement of 2.08 dB for conventional viewpoints and 1.86 dB for novel viewpoints, confirming the method's effectiveness and robustness.
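
To put the reported PSNR gains in context, the following is a minimal sketch of the standard PSNR definition, 10 * log10(MAX^2 / MSE), together with the mean-squared-error ratio implied by a given dB gain. The function names `psnr` and `mse_ratio_for_gain` are hypothetical and chosen for illustration; this is not the paper's evaluation code, and images are assumed to be flat lists of pixel values in [0, 1].

```python
import math


def psnr(reference, rendered, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def mse_ratio_for_gain(gain_db):
    """Factor by which MSE shrinks when PSNR rises by gain_db decibels."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    # Gains quoted in the abstract above.
    for gain in (2.08, 1.86):
        print(f"{gain} dB gain -> MSE multiplied by {mse_ratio_for_gain(gain):.3f}")
```

Under this definition, a gain of about 2 dB corresponds to the mean squared error dropping to roughly 60 to 65 percent of its previous value.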

HPPNet: Modeling the Harmonic Structure and Pitch Invariance in Piano Transcription

Aug 31, 2022

Weixing Wei, Peilin Li, Yi Yu, Wei Li

Abstract: While neural network models are making significant progress in piano transcription, they are becoming more resource-intensive because they require larger model sizes and more computing power. In this paper, we attempt to apply more prior knowledge about the piano to reduce model size and improve transcription performance. The sound of a piano note contains various overtones, and the pitch of a key does not change over time. To make full use of such latent information, we propose HPPNet, which uses the Harmonic Dilated Convolution to capture harmonic structures and the Frequency Grouped Recurrent Neural Network to model pitch invariance over time. Experimental results on the MAESTRO dataset show that our piano transcription system achieves state-of-the-art performance in both frame and note scores (frame F1 93.15%, note F1 97.18%). Moreover, the model is much smaller than previous state-of-the-art deep learning models.

* Accepted to ISMIR 2022
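
The idea of matching convolution dilations to overtones can be illustrated with a small sketch. On a log-frequency axis with Q bins per octave, the k-th harmonic of a tone sits about Q * log2(k) bins above its fundamental, so those offsets are natural dilation distances. The helper name `harmonic_bin_offsets` and the bins-per-octave values used below are assumptions for illustration, not settings taken from the paper.

```python
import math


def harmonic_bin_offsets(bins_per_octave, num_harmonics):
    """Bin distance from the fundamental to each of the first num_harmonics
    harmonics on a log-frequency axis (harmonic 1 is the fundamental)."""
    return [
        round(bins_per_octave * math.log2(k))
        for k in range(1, num_harmonics + 1)
    ]


if __name__ == "__main__":
    # One bin per semitone: octave, octave plus a fifth, two octaves.
    print(harmonic_bin_offsets(12, 4))
    # A finer, hypothetical resolution of four bins per semitone.
    print(harmonic_bin_offsets(48, 5))
```

Because the offsets depend only on the harmonic index and not on the fundamental's position, the same set of dilations applies to every pitch, which is what makes a shared convolution kernel plausible here.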

HarmoF0: Logarithmic Scale Dilated Convolution For Pitch Estimation

May 02, 2022

Weixing Wei, Peilin Li, Yi Yu, Wei Li

Abstract: Sounds, especially music, contain various harmonic components scattered along the frequency dimension. It is difficult for ordinary convolutional neural networks to observe these overtones. This paper introduces a multiple-rate dilated causal convolution (MRDC-Conv) method to efficiently capture the harmonic structure in logarithmic-scale spectrograms. The harmonic structure is helpful for pitch estimation, which is important for many sound processing applications. We propose HarmoF0, a fully convolutional network, to evaluate MRDC-Conv and other dilated convolutions in pitch estimation. The results show that this model outperforms DeepF0, yields state-of-the-art performance on three datasets, and simultaneously reduces the number of parameters by more than 90%. We also find that it has stronger noise resistance and fewer octave errors.

* Accepted to ICME 2022
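
As a rough illustration of why several dilation rates along the frequency axis help expose overtones, the sketch below sums one spectrogram frame at a set of bin offsets. It is a simplified stand-in, not the MRDC-Conv layer itself, whose exact form the abstract does not give. The function name `multi_rate_dilated_sum`, the unit weights, and the synthetic frame are all assumptions.

```python
def multi_rate_dilated_sum(frame, offsets, weights=None):
    """For every bin f, return sum_i weights[i] * frame[f + offsets[i]].

    Bins that fall outside the frame are treated as zero (zero padding).
    """
    if weights is None:
        weights = [1.0] * len(offsets)
    if len(weights) != len(offsets):
        raise ValueError("weights and offsets must have the same length")
    n = len(frame)
    out = []
    for f in range(n):
        total = 0.0
        for w, d in zip(weights, offsets):
            idx = f + d
            if 0 <= idx < n:
                total += w * frame[idx]
        out.append(total)
    return out


if __name__ == "__main__":
    # Synthetic frame with 12 bins per octave: a tone at bin 5 plus its
    # 2nd, 3rd and 4th harmonics at offsets 12, 19 and 24 bins.
    offsets = [0, 12, 19, 24]
    frame = [0.0] * 40
    for d in offsets:
        frame[5 + d] = 1.0
    response = multi_rate_dilated_sum(frame, offsets)
    print(response.index(max(response)))  # bin with the strongest response
```

In this toy frame every partial has the same energy, so the raw spectrum alone cannot tell the fundamental from its overtones; the summed response peaks only where all offsets line up with energy, which is at the fundamental bin.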
