Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Vivian Hu

ReactIE: Enhancing Chemical Reaction Extraction with Weak Supervision

Jul 04, 2023

Ming Zhong, Siru Ouyang, Minhao Jiang, Vivian Hu, Yizhu Jiao, Xuan Wang, Jiawei Han

Abstract:Structured chemical reaction information plays a vital role for chemists engaged in laboratory work and advanced endeavors such as computer-aided drug design. Despite the importance of extracting structured reactions from scientific literature, data annotation for this purpose is cost-prohibitive due to the significant labor required from domain experts. Consequently, the scarcity of sufficient training data poses an obstacle to the progress of related models in this domain. In this paper, we propose ReactIE, which combines two weakly supervised approaches for pre-training. Our method utilizes frequent patterns within the text as linguistic cues to identify specific characteristics of chemical reactions. Additionally, we adopt synthetic data from patent records as distant supervision to incorporate domain knowledge into the model. Experiments demonstrate that ReactIE achieves substantial improvements and outperforms all existing baselines.

* Findings of ACL 2023, Short Paper

Via

Access Paper or Ask Questions

Ensemble prosody prediction for expressive speech synthesis

Apr 03, 2023

Tian Huey Teh, Vivian Hu, Devang S Ram Mohan, Zack Hodari, Christopher G. R. Wallis, Tomás Gomez Ibarrondo, Alexandra Torresquintero, James Leoni, Mark Gales, Simon King

Figure 1 for Ensemble prosody prediction for expressive speech synthesis

Figure 2 for Ensemble prosody prediction for expressive speech synthesis

Figure 3 for Ensemble prosody prediction for expressive speech synthesis

Figure 4 for Ensemble prosody prediction for expressive speech synthesis

Abstract:Generating expressive speech with rich and varied prosody continues to be a challenge for Text-to-Speech. Most efforts have focused on sophisticated neural architectures intended to better model the data distribution. Yet, in evaluations it is generally found that no single model is preferred for all input texts. This suggests an approach that has rarely been used before for Text-to-Speech: an ensemble of models. We apply ensemble learning to prosody prediction. We construct simple ensembles of prosody predictors by varying either model architecture or model parameter values. To automatically select amongst the models in the ensemble when performing Text-to-Speech, we propose a novel, and computationally trivial, variance-based criterion. We demonstrate that even a small ensemble of prosody predictors yields useful diversity, which, combined with the proposed selection criterion, outperforms any individual model from the ensemble.

* ICASSP 2023

Via

Access Paper or Ask Questions

Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis

Jun 15, 2021

Devang S Ram Mohan, Vivian Hu, Tian Huey Teh, Alexandra Torresquintero, Christopher G. R. Wallis, Marlene Staib, Lorenzo Foglianti, Jiameng Gao, Simon King

Figure 1 for Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis

Figure 2 for Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis

Figure 3 for Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis

Figure 4 for Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis

Abstract:Text does not fully specify the spoken form, so text-to-speech models must be able to learn from speech data that vary in ways not explained by the corresponding text. One way to reduce the amount of unexplained variation in training data is to provide acoustic information as an additional learning signal. When generating speech, modifying this acoustic information enables multiple distinct renditions of a text to be produced. Since much of the unexplained variation is in the prosody, we propose a model that generates speech explicitly conditioned on the three primary acoustic correlates of prosody: $F_{0}$, energy and duration. The model is flexible about how the values of these features are specified: they can be externally provided, or predicted from text, or predicted then subsequently modified. Compared to a model that employs a variational auto-encoder to learn unsupervised latent features, our model provides more interpretable, temporally-precise, and disentangled control. When automatically predicting the acoustic features from text, it generates speech that is more natural than that from a Tacotron 2 model with reference encoder. Subsequent human-in-the-loop modification of the predicted acoustic features can significantly further increase naturalness.

* To be published in Interspeech 2021. 5 pages, 4 figures

Via

Access Paper or Ask Questions

ADEPT: A Dataset for Evaluating Prosody Transfer

Jun 15, 2021

Alexandra Torresquintero, Tian Huey Teh, Christopher G. R. Wallis, Marlene Staib, Devang S Ram Mohan, Vivian Hu, Lorenzo Foglianti, Jiameng Gao, Simon King

Figure 1 for ADEPT: A Dataset for Evaluating Prosody Transfer

Figure 2 for ADEPT: A Dataset for Evaluating Prosody Transfer

Figure 3 for ADEPT: A Dataset for Evaluating Prosody Transfer

Abstract:Text-to-speech is now able to achieve near-human naturalness and research focus has shifted to increasing expressivity. One popular method is to transfer the prosody from a reference speech sample. There have been considerable advances in using prosody transfer to generate more expressive speech, but the field lacks a clear definition of what successful prosody transfer means and a method for measuring it. We introduce a dataset of prosodically-varied reference natural speech samples for evaluating prosody transfer. The samples include global variations reflecting emotion and interpersonal attitude, and local variations reflecting topical emphasis, propositional attitude, syntactic phrasing and marked tonicity. The corpus only includes prosodic variations that listeners are able to distinguish with reasonable accuracy, and we report these figures as a benchmark against which text-to-speech prosody transfer can be compared. We conclude the paper with a demonstration of our proposed evaluation methodology, using the corpus to evaluate two text-to-speech models that perform prosody transfer.

* 5 pages, 1 figure, accepted to Interspeech 2021

Via

Access Paper or Ask Questions