Ona De Gibert

MAMMOTH: Massively Multilingual Modular Open Translation @ Helsinki

Mar 12, 2024

Timothee Mickus, Stig-Arne Grönroos, Joseph Attieh, Michele Boggia, Ona De Gibert, Shaoxiong Ji, Niki Andreas Lopi, Alessandro Raganato, Raúl Vázquez, Jörg Tiedemann

Abstract: NLP in the age of monolithic large language models is approaching its limits in terms of size and information that can be handled. The trend goes to modularization, a necessary step into the direction of designing smaller sub-networks and components with specialized functionality. In this paper, we present the MAMMOTH toolkit: a framework designed for training massively multilingual modular machine translation systems at scale, initially derived from OpenNMT-py and then adapted to ensure efficient training across computation clusters. We showcase its efficiency across clusters of A100 and V100 NVIDIA GPUs, and discuss our design philosophy and plans for future information. The toolkit is publicly available online.

* Presented as a demo at EACL 2024

Spanish Biomedical and Clinical Language Embeddings

Feb 25, 2021

Asier Gutiérrez-Fandiño, Jordi Armengol-Estapé, Casimiro Pio Carrino, Ona De Gibert, Aitor Gonzalez-Agirre, Marta Villegas

Abstract: We computed both Word and Sub-word Embeddings using FastText. For Sub-word embeddings, we selected the Byte Pair Encoding (BPE) algorithm to represent the sub-words. We evaluated the Biomedical Word Embeddings and obtained better results than previous versions, which suggests that more data yields better representations.
