Abstract:Back-translation, a technique for extending a dataset, is widely used in low-resource machine translation tasks. It typically translates from the target into the source language to ensure high-quality translation results. This paper proposes a novel way of utilizing a monolingual corpus on the source side to assist Neural Machine Translation (NMT) in low-resource settings. We realize this concept with a Generative Adversarial Network (GAN), which augments the training data for the discriminator while keeping low-quality synthetic translations of the monolingual data from interfering with the generator. Additionally, this paper integrates Translation Memory (TM) with NMT, increasing the amount of data available to the generator. Moreover, we propose a novel procedure to filter the synthetic sentence pairs during the augmentation process, ensuring the high quality of the data.
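For illustration only (the abstract does not specify the filtering criterion), one common way to realize such a filtering step is to score each synthetic pair with a trained model and keep only pairs above a threshold; `score_pair` and the threshold below are assumed placeholders, not the procedure proposed in the paper:

```python
def filter_synthetic_pairs(pairs, score_pair, threshold=-1.5):
    """Keep synthetic (source, target) pairs whose score clears a quality threshold.

    pairs      : iterable of (source_sentence, synthetic_target_sentence)
    score_pair : assumed callable, e.g. the length-normalized log-probability of the
                 pair under a trained NMT model, or a discriminator's confidence
    """
    return [(src, tgt) for src, tgt in pairs if score_pair(src, tgt) >= threshold]
```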
Abstract:This work presents a formalization of analogy between numbers that relies on generalized means. It is motivated by recent advances in artificial intelligence and applications of machine learning, where the notion of analogy is used to infer results, to create data, and even to assess object representations, or embeddings, which are essentially collections of numbers (vectors, matrices, tensors). This extended use of analogy calls for mathematical foundations and a clear understanding of the notion of analogy between numbers. We propose a unifying view of analogies that relies on generalized means defined in terms of a power parameter. In particular, we show that any four increasing positive real numbers form an analogy for a unique suitable power. In addition, we show that any such analogy can be reduced to an equivalent arithmetic analogy, and that any analogical equation on increasing numbers has a solution, a result that generalizes without restriction to complex numbers. These foundational results provide a better understanding of analogies in areas where representations are numerical.
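A minimal numerical sketch of the central claim, assuming the analogy a:b::c:d in power p is formalized as the power-mean identity M_p(a, d) = M_p(b, c), with the geometric mean as the p = 0 limit; the pairing, the bracketing interval, and the bisection search are illustrative assumptions rather than the paper's construction:

```python
import math

def power_mean(x, y, p):
    """Generalized (power) mean of two positive numbers; geometric mean at p = 0."""
    if p == 0:
        return math.sqrt(x * y)
    return ((x ** p + y ** p) / 2) ** (1 / p)

def analogy_power(a, b, c, d, lo=-50.0, hi=50.0, tol=1e-12):
    """For increasing positive a < b <= c < d, find p with M_p(a, d) = M_p(b, c).
    The abstract states such a p exists and is unique; bisection locates it, since the
    difference is negative near p -> -inf (min) and positive near p -> +inf (max)."""
    f = lambda p: power_mean(a, d, p) - power_mean(b, c, p)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(analogy_power(1, 2, 3, 4))  # ~1.0: arithmetic analogy, since 1 + 4 = 2 + 3
print(analogy_power(2, 3, 4, 6))  # ~0.0: geometric analogy, since 2 * 6 = 3 * 4
```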
Abstract:In this paper, we propose a novel BTG-forest-based alignment method. Starting from a fast unsupervised initialization of parameters using variational IBM models, we synchronously parse parallel sentences top-down and align them hierarchically under the BTG constraint. Our two-step method achieves the same run-time as fast_align and comparable translation performance, while yielding smaller phrase tables. Final SMT results show that our method even outperforms fast_align on distantly related language pairs, e.g., English-Japanese.
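A toy sketch of top-down, BTG-constrained alignment, for illustration only: the paper's parsing model, scoring, and variational IBM initialization are not reproduced here, and the greedy splitting plus the lexical table `lex` are assumptions.

```python
def span_score(src, tgt, lex):
    """Crude span affinity: for each source word, its best lexical probability in the target span."""
    return sum(max((lex.get((s, t), 1e-9) for t in tgt), default=1e-9) for s in src)

def btg_align(src, tgt, lex):
    """Recursively split both spans, trying the straight and inverted BTG orientations."""
    if len(src) <= 1 or len(tgt) <= 1:
        return [(tuple(src), tuple(tgt))]
    best = None
    for i in range(1, len(src)):          # source split point
        for j in range(1, len(tgt)):      # target split point
            for inverted in (False, True):
                t1, t2 = (tgt[j:], tgt[:j]) if inverted else (tgt[:j], tgt[j:])
                score = span_score(src[:i], t1, lex) + span_score(src[i:], t2, lex)
                if best is None or score > best[0]:
                    best = (score, i, t1, t2)
    _, i, t1, t2 = best
    return btg_align(src[:i], t1, lex) + btg_align(src[i:], t2, lex)

# Hypothetical lexical table; the inverted rule recovers "black cat" <-> "chat noir".
lex = {("the", "le"): 0.9, ("black", "noir"): 0.7, ("cat", "chat"): 0.8}
print(btg_align("the black cat".split(), "le chat noir".split(), lex))
```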
Abstract:By investigating the distribution of phrase pairs in phrase translation tables, this paper describes an approach to increasing the number of n-gram alignments in the phrase translation tables output by a sampling-based alignment method. The approach consists in enforcing the alignment of n-grams in distinct translation subtables so as to increase their number. A standard normal distribution is used to allot alignment time among the translation subtables, which adjusts the distribution of n-grams. This leads to better evaluation results on statistical machine translation tasks than the original sampling-based alignment approach. Furthermore, we examine the translation quality obtained by merging the phrase translation tables computed by the sampling-based alignment method and by MGIZA++.
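One plausible reading of the time-allotment step, for illustration only (the exact mapping from n-gram length to the standard normal is not given in the abstract): place the n-gram lengths at evenly spaced points of N(0, 1) and use the density values as relative shares of the total alignment time.

```python
import math

def time_shares(max_n, total_time, half_width=2.0):
    """Allot total_time among subtables for n = 1..max_n using standard normal densities.
    half_width is an assumed parameter: n-gram lengths are spread over [-half_width, half_width]."""
    zs = [-half_width + 2 * half_width * (n - 1) / (max_n - 1) for n in range(1, max_n + 1)]
    dens = [math.exp(-z * z / 2) / math.sqrt(2 * math.pi) for z in zs]
    total = sum(dens)
    return {n: total_time * d / total for n, d in zip(range(1, max_n + 1), dens)}

print(time_shares(max_n=7, total_time=3600))  # seconds allotted to each n-gram subtable
```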