Abstract: We describe an end-to-end speech synthesis system that uses generative adversarial training. We train our vocoder for raw phoneme-to-audio conversion, using explicit phonetic, pitch and duration modeling. We experiment with several pre-trained models for contextualized and decontextualized word embeddings, and we introduce a new method for highly expressive character voice matching, based on discrete style tokens.
Abstract: Rectifying the orientation of images is a routine task for every photographer. This task can be difficult even for the human eye, especially when the horizon or other horizontal and vertical lines are missing from the image. In this paper we address this problem and propose a new deep learning network specially adapted for image rotation correction: we introduce rectangle-shaped depthwise convolutions, which specialize in detecting long lines in the image, and a new loss function adapted to orientation errors. Compared to other methods, which can detect rotation errors only on a few image categories such as man-made structures, the proposed method can be used on a larger variety of photographs, e.g., portraits, landscapes, sports, night photos, etc. Moreover, the model is adapted to mobile devices and can run in real time, both for pictures and for videos. An extensive evaluation of our model on different datasets shows that it generalizes well and does not depend on any particular type of image. Finally, our model significantly outperforms state-of-the-art methods.
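As an illustration of the rectangle-shaped depthwise convolutions mentioned in the abstract above, the following is a minimal sketch in plain Python. It assumes a generic depthwise convolution (one kernel per channel, no mixing across channels) with non-square kernels. The function name `depthwise_conv2d`, the kernel sizes, and the toy input are hypothetical and are not taken from the paper, whose actual layer configuration is not given here.

```python
def depthwise_conv2d(image, kernels):
    """Depthwise 2D cross-correlation, stride 1, no padding.

    image:   list of C channels, each an H x W nested list of numbers.
    kernels: list of C kernels, each a kh x kw nested list (one per channel).
    Each channel is filtered only by its own kernel.
    """
    if len(image) != len(kernels):
        raise ValueError("need exactly one kernel per channel")
    output = []
    for channel, kernel in zip(image, kernels):
        h, w = len(channel), len(channel[0])
        kh, kw = len(kernel), len(kernel[0])
        out_channel = []
        for i in range(h - kh + 1):
            row = []
            for j in range(w - kw + 1):
                acc = 0.0
                for di in range(kh):
                    for dj in range(kw):
                        acc += channel[i + di][j + dj] * kernel[di][dj]
                row.append(acc)
            out_channel.append(row)
        output.append(out_channel)
    return output


# Toy input (assumption): a 5x5 image whose middle row is a horizontal line,
# copied into two channels.
line = [[1.0 if r == 2 else 0.0 for _ in range(5)] for r in range(5)]
image = [line, line]

# Hypothetical rectangular kernels: 1x5 (wide) for channel 0, 5x1 (tall) for channel 1.
wide = [[1.0] * 5]
tall = [[1.0] for _ in range(5)]

response = depthwise_conv2d(image, [wide, tall])
```

In this toy case the wide kernel produces one strong response on the row containing the line, while the tall kernel sees only one line pixel per column and responds weakly everywhere. This is the intuition behind using elongated kernels to pick up long lines.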
Abstract: Taking pictures through glass windows almost always produces undesired reflections that degrade the quality of the photo. The ill-posed nature of the reflection removal problem has attracted the attention of many researchers for decades. The main challenge of this problem is the lack of real training data and the resulting need to generate realistic synthetic data. In this paper, we propose a single-image reflection removal method based on context understanding modules and adversarial training to efficiently restore the transmission layer without reflection. We also propose a complex data generation model in order to create a large training set with various types of reflections. Our proposed reflection removal method outperforms state-of-the-art methods in terms of PSNR and SSIM on the SIR benchmark dataset.
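For readers unfamiliar with the PSNR metric cited in the abstract above, the following sketch computes it using the standard definition, PSNR = 10 · log10(MAX² / MSE). The function name `psnr`, the nested-list image representation, and the 8-bit peak value of 255 are assumptions made for illustration. The sketch does not reproduce the evaluation protocol of the SIR benchmark.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are nested lists of pixel rows; max_value is the peak pixel
    value (255 for 8-bit images, an assumption in this sketch).
    """
    if len(reference) != len(estimate):
        raise ValueError("images must have the same number of rows")
    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        if len(ref_row) != len(est_row):
            raise ValueError("rows must have the same length")
        for r, e in zip(ref_row, est_row):
            squared_error += (r - e) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example (made-up pixel values): every pixel is off by 10 grey levels,
# so the MSE is 100.
score = psnr([[100, 100]], [[110, 90]])
```

Higher values indicate that the restored image is closer to the ground-truth transmission layer. Identical images give an infinite PSNR.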