Abstract:Packet loss concealment (PLC) is a tool for enhancing speech degradation caused by poor network conditions or underflow/overflow in audio processing pipelines. We propose a real-time recurrent method that leverages previous outputs to mitigate artefact of lost packets without the prior knowledge of loss mask. The proposed full-band recurrent network (FRN) model operates at 48 kHz, which is suitable for high-quality telecommunication applications. Experiment results highlight the superiority of FRN over an offline non-causal baseline and a top performer in a recent PLC challenge.
Abstract:We introduce a block-online variant of the temporal feature-wise linear modulation (TFiLM) model to achieve bandwidth extension. The proposed architecture simplifies the UNet backbone of the TFiLM to reduce inference time and employs an efficient transformer at the bottleneck to alleviate performance degradation. We also utilize self-supervised pretraining and data augmentation to enhance the quality of bandwidth extended signals and reduce the sensitivity with respect to downsampling methods. Experiment results on the VCTK dataset show that the proposed method outperforms several recent baselines in terms of spectral distance and source-to-distortion ratio. Pretraining and filter augmentation also help stabilize and enhance the overall performance.
Abstract:In this paper, we categorize fine-grained images without using any object / part annotation neither in the training nor in the testing stage, a step towards making it suitable for deployments. Fine-grained image categorization aims to classify objects with subtle distinctions. Most existing works heavily rely on object / part detectors to build the correspondence between object parts by using object or object part annotations inside training images. The need for expensive object annotations prevents the wide usage of these methods. Instead, we propose to select useful parts from multi-scale part proposals in objects, and use them to compute a global image representation for categorization. This is specially designed for the annotation-free fine-grained categorization task, because useful parts have shown to play an important role in existing annotation-dependent works but accurate part detectors can be hardly acquired. With the proposed image representation, we can further detect and visualize the key (most discriminative) parts in objects of different classes. In the experiment, the proposed annotation-free method achieves better accuracy than that of state-of-the-art annotation-free and most existing annotation-dependent methods on two challenging datasets, which shows that it is not always necessary to use accurate object / part annotations in fine-grained image categorization.