Abe Ittycheriah

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023

Gemini Team, Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth (+930 more)

Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of Gemini models in cross-modal reasoning and language understanding will enable a wide variety of use cases and we discuss our approach toward deploying them responsibly to users.

PaLM 2 Technical Report

May 17, 2023

Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen (+118 more)

Abstract:We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction. PaLM 2 demonstrates robust reasoning capabilities exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities. When discussing the PaLM 2 family, it is important to distinguish between pre-trained models (of various sizes), fine-tuned variants of these models, and the user-facing products that use these models. In particular, user-facing products typically include additional pre- and post-processing steps. Additionally, the underlying models may evolve over time. Therefore, one should not expect the performance of user-facing products to exactly match the results reported in this report.

A Generative Approach to Titling and Clustering Wikipedia Sections

May 22, 2020

Anjalie Field, Sascha Rothe, Simon Baumgartner, Cong Yu, Abe Ittycheriah

Abstract:We evaluate the performance of transformer encoders with various decoders for information organization through a new task: generation of section headings for Wikipedia articles. Our analysis shows that decoders containing attention mechanisms over the encoder output achieve high-scoring results by generating extractive text. In contrast, a decoder without attention better facilitates semantic encoding and can be used to generate section embeddings. We additionally introduce a new loss function, which further encourages the decoder to generate high-quality embeddings.

* Accepted to WNGT Workshop at ACL 2020

Coverage Embedding Models for Neural Machine Translation

Aug 29, 2016

Haitao Mi, Baskaran Sankaran, Zhiguo Wang, Abe Ittycheriah

Abstract: In this paper, we enhance attention-based neural machine translation (NMT) by adding explicit coverage embedding models to alleviate the problems of repeated and dropped translations in NMT. For each source word, our model starts with a full coverage embedding vector to track the coverage status, and then keeps updating it with neural networks as the translation proceeds. Experiments on the large-scale Chinese-to-English task show that our enhanced model improves translation quality significantly on various test sets over a strong large-vocabulary NMT system.

* 6 pages; In Proceedings of EMNLP 2016

Temporal Attention Model for Neural Machine Translation

Aug 09, 2016

Baskaran Sankaran, Haitao Mi, Yaser Al-Onaizan, Abe Ittycheriah

Abstract:Attention-based Neural Machine Translation (NMT) models suffer from attention deficiency issues as has been observed in recent research. We propose a novel mechanism to address some of these limitations and improve the NMT attention. Specifically, our approach memorizes the alignments temporally (within each sentence) and modulates the attention with the accumulated temporal memory, as the decoder generates the candidate translation. We compare our approach against the baseline NMT model and two other related approaches that address this issue either explicitly or implicitly. Large-scale experiments on two language pairs show that our approach achieves better and robust gains over the baseline and related NMT approaches. Our model further outperforms strong SMT baselines in some settings even without using ensembles.

* 8 pages
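
The abstract describes modulating attention with an accumulated, within-sentence memory of earlier alignments but does not give the formula. The following is only a minimal sketch of one way such a modulation could look: it assumes the memory for each source position is the running sum of that position's exponentiated scores at earlier decoder steps, and that the current score is divided by this sum before normalization. The function name `temporal_attention` and the input layout are hypothetical.

```python
import math


def temporal_attention(raw_scores):
    """Return history-modulated attention weights.

    raw_scores[i][j] is the unnormalized attention score of source
    position j at decoder step i (hypothetical input layout).
    Assumption: each exponentiated score is divided by the sum of the
    exponentiated scores that the same source position received at
    earlier steps; the first step has no history and is a plain softmax.
    """
    if not raw_scores:
        return []
    history = [0.0] * len(raw_scores[0])
    weights = []
    for step, scores in enumerate(raw_scores):
        exps = [math.exp(s) for s in scores]
        if step == 0:
            modulated = exps
        else:
            # Source positions that were attended to strongly before get damped.
            modulated = [e / h for e, h in zip(exps, history)]
        total = sum(modulated)
        weights.append([m / total for m in modulated])
        # Accumulate the temporal memory for the next decoder step.
        history = [h + e for h, e in zip(history, exps)]
    return weights


# Two decoder steps with identical raw scores over two source words.
demo_weights = temporal_attention([[2.0, 0.0], [2.0, 0.0]])
```

In this toy input the first source word dominates the first step, and the accumulated memory then pulls the second step back to an even split, which is the kind of behavior that would discourage attending to the same word repeatedly.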

Supervised Attentions for Neural Machine Translation

Jul 30, 2016

Haitao Mi, Zhiguo Wang, Abe Ittycheriah

Abstract: In this paper, we improve the attention or alignment accuracy of neural machine translation by utilizing the alignments of training sentence pairs. We simply compute the distance between the machine attentions and the "true" alignments, and minimize this cost in the training procedure. Our experiments on a large-scale Chinese-to-English task show that our model improves both translation and alignment quality significantly over the large-vocabulary neural machine translation system, and even beats a state-of-the-art traditional syntax-based system.

* 6 pages. In Proceedings of EMNLP 2016. arXiv admin note: text overlap with arXiv:1605.03148
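
The idea of penalizing the distance between model attentions and reference alignments can be sketched as an extra loss term. The abstract does not say which distance is used, so this sketch assumes a squared Euclidean distance and a simple weighted sum with the translation loss; the names `attention_distance`, `joint_loss`, and `weight` are hypothetical.

```python
def attention_distance(attention, alignment):
    """Half the squared Euclidean distance between two matrices.

    attention[i][j]: model attention on source word j for target word i.
    alignment[i][j]: reference alignment weight for the same pair, with
    each row assumed to be normalized to sum to 1.
    The choice of squared distance is an assumption of this sketch.
    """
    total = 0.0
    for att_row, ali_row in zip(attention, alignment):
        total += sum((a - g) ** 2 for a, g in zip(att_row, ali_row))
    return 0.5 * total


def joint_loss(translation_loss, attention, alignment, weight=1.0):
    """Translation loss plus a weighted attention-supervision penalty."""
    return translation_loss + weight * attention_distance(attention, alignment)


# Toy case: the first target word spreads its attention, the second is exact.
toy_attention = [[0.5, 0.5], [0.0, 1.0]]
toy_alignment = [[1.0, 0.0], [0.0, 1.0]]
toy_penalty = attention_distance(toy_attention, toy_alignment)
toy_total = joint_loss(2.0, toy_attention, toy_alignment)
```

Only the first row contributes to the penalty here, so minimizing the joint loss would push that row toward the reference alignment while leaving the already-correct row alone.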

Vocabulary Manipulation for Neural Machine Translation

May 10, 2016

Haitao Mi, Zhiguo Wang, Abe Ittycheriah

Abstract: In order to capture rich language phenomena, neural machine translation models have to use a large vocabulary size, which requires high computing time and large memory usage. In this paper, we alleviate this issue by introducing a sentence-level or batch-level vocabulary, which is only a very small subset of the full output vocabulary. For each sentence or batch, we only predict the target words in its sentence-level or batch-level vocabulary. Thus, we reduce both the computing time and the memory usage. Our method simply takes into account the translation options of each word or phrase in the source sentence, and picks a very small target vocabulary for each sentence based on a word-to-word translation model or a bilingual phrase library learned from a traditional machine translation model. Experimental results on the large-scale English-to-French task show that our method achieves better translation performance by 1 BLEU point over the large vocabulary neural machine translation system of Jean et al. (2015).

* 6 pages
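
As a rough illustration of the sentence-level vocabulary described in the abstract, the sketch below collects the translation options of each source word from a word-to-word table and then normalizes output scores over that small candidate set only. The table contents, the `always_include` argument, and both function names are hypothetical, and phrase-level lookup is left out.

```python
import math


def build_sentence_vocab(source_tokens, translation_table, always_include=()):
    """Collect candidate target words for one source sentence.

    translation_table maps a source word to its translation options
    (a toy stand-in for a word-to-word translation model).
    always_include is an assumed extra set, e.g. an end-of-sentence symbol.
    """
    candidates = set(always_include)
    for token in source_tokens:
        candidates.update(translation_table.get(token, ()))
    return sorted(candidates)


def restricted_softmax(scores, candidates):
    """Softmax over the candidate words only, ignoring the rest of the vocabulary."""
    max_score = max(scores[word] for word in candidates)
    exps = {word: math.exp(scores[word] - max_score) for word in candidates}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}


toy_table = {"the": ["le", "la"], "cat": ["chat"]}
toy_vocab = build_sentence_vocab(["the", "cat"], toy_table, always_include=["</s>"])

# Scores for a larger output vocabulary; only the candidates are normalized.
toy_scores = {"</s>": 0.0, "chat": 2.0, "la": 1.0, "le": 1.0, "chien": 5.0}
toy_probs = restricted_softmax(toy_scores, toy_vocab)
```

Because the normalization runs over four candidates instead of the full vocabulary, the cost of the output layer scales with the sentence-level vocabulary, which is the source of the time and memory savings the abstract reports.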
