Yaoming Zhu

Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation

Dec 20, 2022

Yaoming Zhu, Zewei Sun, Shanbo Cheng, Yuyang Huang, Liwei Wu, Mingxuan Wang

Abstract:Multimodal machine translation (MMT) aims to improve translation quality by incorporating information from other modalities, such as vision. Previous MMT systems mainly focus on better access and use of visual information and tend to validate their methods on image-related datasets. These studies face two challenges. First, they can only utilize triple data (bilingual texts with images), which is scarce; second, current benchmarks are relatively restricted and do not correspond to realistic scenarios. Therefore, this paper establishes new methods and new datasets for MMT. First, we propose a framework, 2/3-Triplet, with two new approaches to enhance MMT by utilizing large-scale non-triple data: monolingual image-text data and parallel text-only data. Second, we construct an English-Chinese e-commercial multimodal translation dataset (including training and testing), named EMMT, whose test set is carefully selected so that it contains ambiguous words that would be mistranslated without the help of images. Experiments show that our method is more suitable for real-world scenarios and can significantly improve translation performance by using more non-triple data. In addition, our model also rivals various SOTA models in conventional multimodal translation benchmarks.

* 8 pages

Unified Multimodal Punctuation Restoration Framework for Mixed-Modality Corpus

Jan 24, 2022

Yaoming Zhu, Liwei Wu, Shanbo Cheng, Mingxuan Wang

Abstract:The punctuation restoration task aims to correctly punctuate the output transcriptions of automatic speech recognition systems. Previous punctuation models, either using text only or demanding the corresponding audio, tend to be constrained by real scenes, where unpunctuated sentences are a mixture of those with and without audio. This paper proposes a unified multimodal punctuation restoration framework, named UniPunc, to punctuate the mixed sentences with a single model. UniPunc jointly represents audio and non-audio samples in a shared latent space, based on which the model learns a hybrid representation and punctuates both kinds of samples. We validate the effectiveness of UniPunc on real-world datasets, where it outperforms various strong baselines (e.g., BERT, MuSe) by at least 0.8 overall F1 points, setting a new state of the art. Extensive experiments show that UniPunc's design is a pervasive solution: by grafting onto previous models, UniPunc enables them to punctuate on the mixed corpus. Our code is available at github.com/Yaoming95/UniPunc

* 5 pages, accepted by ICASSP'2022

The Volctrans GLAT System: Non-autoregressive Translation Meets WMT21

Sep 24, 2021

Lihua Qian, Yi Zhou, Zaixiang Zheng, Yaoming Zhu, Zehui Lin, Jiangtao Feng, Shanbo Cheng, Lei Li, Mingxuan Wang, Hao Zhou

Abstract:This paper describes the Volctrans submission to the WMT21 news translation shared task for German->English translation. We build a parallel (i.e., non-autoregressive) translation system using the Glancing Transformer, which enables fast and accurate parallel decoding in contrast to the currently prevailing autoregressive models. To the best of our knowledge, this is the first parallel translation system that can be scaled to a practical scenario such as the WMT competition. More importantly, our parallel translation system achieves the best BLEU score (35.0) on the German->English translation task, outperforming all strong autoregressive counterparts.

* 10 pages, 5 figures, WMT2021

UniST: Unified End-to-end Model for Streaming and Non-streaming Speech Translation

Sep 15, 2021

Qianqian Dong, Yaoming Zhu, Mingxuan Wang, Lei Li

Abstract:This paper presents a unified end-to-end framework for both streaming and non-streaming speech translation. While the training recipes for non-streaming speech translation have been mature, the recipes for streaming speech translation are yet to be built. In this work, we focus on developing a unified model (UniST) which supports streaming and non-streaming ST from the perspective of fundamental components, including training objective, attention mechanism and decoding policy. Experiments on the most popular speech-to-text translation benchmark dataset, MuST-C, show that UniST achieves significant improvement for non-streaming ST, and a better-learned trade-off for BLEU score and latency metrics for streaming ST, compared with end-to-end baselines and the cascaded models. We will make our codes and evaluation tools publicly available.

Serial or Parallel? Plug-able Adapter for multilingual machine translation

Apr 16, 2021

Yaoming Zhu, Jiangtao Feng, Chengqi Zhao, Mingxuan Wang, Lei Li

Abstract:Developing a unified multilingual translation model is a key topic in machine translation research. However, existing approaches suffer from performance degradation: multilingual models yield inferior performance compared to the ones trained separately on rich bilingual data. We attribute the performance degradation to two issues: multilingual embedding conflation and multilingual fusion effects. To address the two issues, we propose PAM, a Transformer model augmented with defusion adaptation for multilingual machine translation. Specifically, PAM consists of embedding and layer adapters to shift the word and intermediate representations towards language-specific ones. Extensive experiment results on IWSLT, OPUS-100, and WMT benchmarks show that PAM outperforms several strong competitors, including series adapter and multilingual knowledge distillation.

* 13 pages

The Volctrans Machine Translation System for WMT20

Oct 28, 2020

Liwei Wu, Xiao Pan, Zehui Lin, Yaoming Zhu, Mingxuan Wang, Lei Li

Abstract:This paper describes our VolcTrans system for the WMT20 shared news translation task. We participated in 8 translation directions. Our basic systems are based on Transformer, with several variants (wider or deeper Transformers, dynamic convolutions). The final system includes text preprocessing, data selection, synthetic data generation, advanced model ensemble, and multilingual pre-training.

GIKT: A Graph-based Interaction Model for Knowledge Tracing

Sep 13, 2020

Yang Yang, Jian Shen, Yanru Qu, Yunfei Liu, Kerong Wang, Yaoming Zhu, Weinan Zhang, Yong Yu

Abstract:With the rapid development in online education, knowledge tracing (KT) has become a fundamental problem which traces students' knowledge status and predicts their performance on new questions. Questions are often numerous in online education systems, and are always associated with much fewer skills. However, the previous literature fails to involve question information together with high-order question-skill correlations, which is mostly limited by data sparsity and multi-skill problems. From the model perspective, previous models can hardly capture the long-term dependency of student exercise history, and cannot model the interactions between student-questions, and student-skills in a consistent way. In this paper, we propose a Graph-based Interaction model for Knowledge Tracing (GIKT) to tackle the above problems. More specifically, GIKT utilizes a graph convolutional network (GCN) to substantially incorporate question-skill correlations via embedding propagation. Besides, considering that relevant questions are usually scattered throughout the exercise history, and that question and skill are just different instantiations of knowledge, GIKT generalizes the degree of students' mastery of the question to the interactions between the student's current state, the student's history related exercises, the target question, and related skills. Experiments on three datasets demonstrate that GIKT achieves the new state-of-the-art performance, with at least 1% absolute AUC improvement.

* 16 pages, 2 figures, ECMLPKDD2020

Signal Instructed Coordination in Team Competition

Sep 10, 2019

Liheng Chen, Hongyi Guo, Haifeng Zhang, Fei Fang, Yaoming Zhu, Ming Zhou, Weinan Zhang, Qing Wang, Yong Yu

Abstract:Most existing models of multi-agent reinforcement learning (MARL) adopt the centralized training with decentralized execution framework. We demonstrate that the decentralized execution scheme restricts agents' capacity to find a better joint policy in team competition games, where each team of agents shares the common rewards and cooperates to compete against other teams. To resolve this problem, we propose Signal Instructed Coordination (SIC), a novel coordination module that can be integrated with most existing models. SIC casts a common signal sampled from a pre-defined distribution to team members, and adopts an information-theoretic regularization to encourage agents to exploit in learning the instruction of centralized signals. Our experiments show that SIC can consistently improve team performance over well-recognized MARL models on matrix games and predator-prey games.

* 13 pages, 9 figures

Triple-to-Text: Converting RDF Triples into High-Quality Natural Languages via Optimizing an Inverse KL Divergence

May 25, 2019

Yaoming Zhu, Juncheng Wan, Zhiming Zhou, Liheng Chen, Lin Qiu, Weinan Zhang, Xin Jiang, Yong Yu

Abstract:A knowledge base is one of the main forms to represent information in a structured way. A knowledge base typically consists of Resource Description Framework (RDF) triples which describe the entities and their relations. Generating natural language description of the knowledge base is an important task in NLP, which has been formulated as a conditional language generation task and tackled using the sequence-to-sequence framework. Current works mostly train the language models by maximum likelihood estimation, which tends to generate lousy sentences. In this paper, we argue that such a problem of maximum likelihood estimation is intrinsic, which is generally irrevocable via changing network structures. Accordingly, we propose a novel Triple-to-Text (T2T) framework, which approximately optimizes the inverse Kullback-Leibler (KL) divergence between the distributions of the real and generated sentences. Because inverse KL imposes a large penalty on fake-looking samples, the proposed method can significantly reduce the probability of generating low-quality sentences. Our experiments on three real-world datasets demonstrate that T2T can generate higher-quality sentences and outperform baseline models in several evaluation metrics.
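
The contrast the abstract draws between maximum likelihood (forward KL) and inverse KL can be illustrated with a toy discrete example. This is only a sketch under stated assumptions, not the T2T training procedure: the three-outcome distributions, the names `real`, `gen_clean`, and `gen_leaky`, and the `eps` clamp (which stands in for an otherwise infinite penalty) are all hypothetical choices made for illustration.

```python
import math


def kl(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions given as equal-length lists.

    Assumption: q is clamped at eps so the result stays finite when q
    assigns zero probability to an outcome that p can produce.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


# Toy "real" distribution: the third outcome stands for a fake-looking
# sentence that never occurs in the data.
real = [0.5, 0.5, 0.0]

# Two hypothetical generators.
gen_clean = [0.7, 0.3, 0.0]  # imperfect fit, but never emits the fake outcome
gen_leaky = [0.4, 0.4, 0.2]  # closer on the real outcomes, but leaks mass to the fake one

# Forward KL, KL(real || gen), is what maximum likelihood minimizes.
forward_clean = kl(real, gen_clean)
forward_leaky = kl(real, gen_leaky)

# Inverse KL, KL(gen || real), charges the generator for every sample it
# produces where the real distribution has (almost) no mass.
inverse_clean = kl(gen_clean, real)
inverse_leaky = kl(gen_leaky, real)

print(f"forward KL  clean={forward_clean:.4f}  leaky={forward_leaky:.4f}")
print(f"inverse KL  clean={inverse_clean:.4f}  leaky={inverse_leaky:.4f}")
```

In this toy setup the forward KL values of the two generators stay within the same order of magnitude, while the inverse KL of the leaky generator is far larger, which is the "large penalty on fake-looking samples" the abstract refers to.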

Neural Text Generation: Past, Present and Beyond

Mar 15, 2018

Sidi Lu, Yaoming Zhu, Weinan Zhang, Jun Wang, Yong Yu

Abstract:This paper presents a systematic survey on recent developments in neural text generation models. Specifically, we start from recurrent neural network language models with the traditional maximum likelihood estimation training scheme and point out its shortcoming for text generation. We thus introduce the recently proposed methods for text generation based on reinforcement learning, re-parametrization tricks and generative adversarial nets (GAN) techniques. We compare different properties of these models and the corresponding techniques to handle their common problems such as gradient vanishing and generation diversity. Finally, we conduct a benchmarking experiment with different types of neural text generation models on two well-known datasets and discuss the empirical results along with the aforementioned model properties.
