Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Michael Fink

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Mar 08, 2024

Machel Reid, Nikolay Savinov, Denis Teplyashin, Dmitry Lepikhin, Timothy Lillicrap, Jean-baptiste Alayrac, Radu Soricut, Angeliki Lazaridou, Orhan Firat, Julian Schrittwieser(+659 more)

Abstract:In this report, we present the latest model of the Gemini family, Gemini 1.5 Pro, a highly compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. Gemini 1.5 Pro achieves near-perfect recall on long-context retrieval tasks across modalities, improves the state-of-the-art in long-document QA, long-video QA and long-context ASR, and matches or surpasses Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5 Pro's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 2.1 (200k) and GPT-4 Turbo (128k). Finally, we highlight surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Via

Access Paper or Ask Questions

Gemini: A Family of Highly Capable Multimodal Models

Dec 19, 2023

Gemini Team, Rohan Anil, Sebastian Borgeaud, Yonghui Wu, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth(+930 more)

Abstract:This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of Gemini models in cross-modal reasoning and language understanding will enable a wide variety of use cases and we discuss our approach toward deploying them responsibly to users.

Via

Access Paper or Ask Questions

Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning

Jul 25, 2022

Deborah Cohen, Moonkyung Ryu, Yinlam Chow, Orgad Keller, Ido Greenberg, Avinatan Hassidim, Michael Fink, Yossi Matias, Idan Szpektor, Craig Boutilier(+1 more)

Figure 1 for Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning

Figure 2 for Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning

Figure 3 for Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning

Figure 4 for Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning

Abstract:Despite recent advances in natural language understanding and generation, and decades of research on the development of conversational bots, building automated agents that can carry on rich open-ended conversations with humans "in the wild" remains a formidable challenge. In this work we develop a real-time, open-ended dialogue system that uses reinforcement learning (RL) to power a bot's conversational skill at scale. Our work pairs the succinct embedding of the conversation state generated using SOTA (supervised) language models with RL techniques that are particularly suited to a dynamic action space that changes as the conversation progresses. Trained using crowd-sourced data, our novel system is able to substantially exceeds the (strong) baseline supervised model with respect to several metrics of interest in a live experiment with real users of the Google Assistant.

Via

Access Paper or Ask Questions

Baseline Detection in Historical Documents using Convolutional U-Nets

Oct 22, 2018

Michael Fink, Thomas Layer, Georg Mackenbrock, Michael Sprinzl

Figure 1 for Baseline Detection in Historical Documents using Convolutional U-Nets

Figure 2 for Baseline Detection in Historical Documents using Convolutional U-Nets

Figure 3 for Baseline Detection in Historical Documents using Convolutional U-Nets

Figure 4 for Baseline Detection in Historical Documents using Convolutional U-Nets

Abstract:Baseline detection is still a challenging task for heterogeneous collections of historical documents. We present a novel approach to baseline extraction in such settings, turning out the winning entry to the ICDAR 2017 Competition on Baseline detection (cBAD). It utilizes deep convolutional nets (CNNs) for both, the actual extraction of baselines, as well as for a simple form of layout analysis in a pre-processing step. To the best of our knowledge it is the first CNN-based system for baseline extraction applying a U-net architecture and sliding window detection, profiting from a high local accuracy of the candidate lines extracted. Final baseline post-processing complements our approach, compensating for inaccuracies mainly due to missing context information during sliding window detection. We experimentally evaluate the components of our system individually on the cBAD dataset. Moreover, we investigate how it generalizes to different data by means of the dataset used for the baseline extraction task of the ICDAR 2017 Competition on Layout Analysis for Challenging Medieval Manuscripts (HisDoc). A comparison with the results reported for HisDoc shows that it also outperforms the contestants of the latter.

* Proc. of the 13th IAPR Int. Workshop on Document Analysis Systems (DAS 2018), IEEE Computer Society, pp. 37-42, 2018
* 6 pages, accepted to DAS 2018

Via

Access Paper or Ask Questions

A model building framework for Answer Set Programming with external computations

Jul 11, 2015

Thomas Eiter, Michael Fink, Giovambattista Ianni, Thomas Krennwallner, Christoph Redl, Peter Schüller

Figure 1 for A model building framework for Answer Set Programming with external computations

Figure 2 for A model building framework for Answer Set Programming with external computations

Figure 3 for A model building framework for Answer Set Programming with external computations

Figure 4 for A model building framework for Answer Set Programming with external computations

Abstract:As software systems are getting increasingly connected, there is a need for equipping nonmonotonic logic programs with access to external sources that are possibly remote and may contain information in heterogeneous formats. To cater for this need, HEX programs were designed as a generalization of answer set programs with an API style interface that allows to access arbitrary external sources, providing great flexibility. Efficient evaluation of such programs however is challenging, and it requires to interleave external computation and model building; to decide when to switch between these tasks is difficult, and existing approaches have limited scalability in many real-world application scenarios. We present a new approach for the evaluation of logic programs with external source access, which is based on a configurable framework for dividing the non-ground program into possibly overlapping smaller parts called evaluation units. The latter will be processed by interleaving external evaluation and model building using an evaluation graph and a model graph, respectively, and by combining intermediate results. Experiments with our prototype implementation show a significant improvement compared to previous approaches. While designed for HEX-programs, the new evaluation approach may be deployed to related rule-based formalisms as well.

* Theory and Practice of Logic Programming 16 (2015) 418-464
* 57 pages, 9 figures, 3 tables, 6 algorithms, to appear in Theory and Practice of Logic Programming (accepted in June 2015)

Via

Access Paper or Ask Questions

Towards Ideal Semantics for Analyzing Stream Reasoning

May 20, 2015

Harald Beck, Minh Dao-Tran, Thomas Eiter, Michael Fink

Figure 1 for Towards Ideal Semantics for Analyzing Stream Reasoning

Figure 2 for Towards Ideal Semantics for Analyzing Stream Reasoning

Figure 3 for Towards Ideal Semantics for Analyzing Stream Reasoning

Abstract:The rise of smart applications has drawn interest to logical reasoning over data streams. Recently, different query languages and stream processing/reasoning engines were proposed in different communities. However, due to a lack of theoretical foundations, the expressivity and semantics of these diverse approaches are given only informally. Towards clear specifications and means for analytic study, a formal framework is needed to define their semantics in precise terms. To this end, we present a first step towards an ideal semantics that allows for exact descriptions and comparisons of stream reasoning systems.

* International Workshop on Reactive Concepts in Knowledge Representation (ReactKnow 2014), co-located with the 21st European Conference on Artificial Intelligence (ECAI 2014). Proceedings of the International Workshop on Reactive Concepts in Knowledge Representation (ReactKnow 2014), pages 17-22, technical report, ISSN 1430-3701, Leipzig University, 2014. http://nbn-resolving.de/urn:nbn:de:bsz:15-qucosa-150562 2014,1

Via

Access Paper or Ask Questions

Workshop Notes of the 6th International Workshop on Acquisition, Representation and Reasoning about Context with Logic (ARCOE-Logic 2014)

Dec 30, 2014

Michael Fink, Martin Homola, Alessandra Mileo

Abstract:ARCOE-Logic 2014, the 6th International Workshop on Acquisition, Representation and Reasoning about Context with Logic, was held in co-location with the 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2014) on November 25, 2014 in Link\"oping, Sweden. These notes contain the five papers which were accepted and presented at the workshop.

* ARCOE-Logic 2014, 5 papers

Via

Access Paper or Ask Questions

Causal Graph Justifications of Logic Programs

Sep 25, 2014

Pedro Cabalar, Jorge Fandinno, Michael Fink

Figure 1 for Causal Graph Justifications of Logic Programs

Figure 2 for Causal Graph Justifications of Logic Programs

Figure 3 for Causal Graph Justifications of Logic Programs

Figure 4 for Causal Graph Justifications of Logic Programs

Abstract:In this work we propose a multi-valued extension of logic programs under the stable models semantics where each true atom in a model is associated with a set of justifications. These justifications are expressed in terms of causal graphs formed by rule labels and edges that represent their application ordering. For positive programs, we show that the causal justifications obtained for a given atom have a direct correspon- dence to (relevant) syntactic proofs of that atom using the program rules involved in the graphs. The most interesting contribution is that this causal information is obtained in a purely semantic way, by algebraic op- erations (product, sum and application) on a lattice of causal values whose ordering relation expresses when a justification is stronger than another. Finally, for programs with negation, we define the concept of causal stable model by introducing an analogous transformation to Gelfond and Lifschitz's program reduct. As a result, default negation behaves as "absence of proof" and no justification is derived from negative liter

* Theory and Practice of Logic Programming (2014), volume 14, issue 4-5, pp. 603-618

Via

Access Paper or Ask Questions

Proceedings of Answer Set Programming and Other Computing Paradigms (ASPOCP 2013), 6th International Workshop, August 25, 2013, Istanbul, Turkey

Dec 28, 2013

Michael Fink, Yuliya Lierler

Abstract:This volume contains the papers presented at the sixth workshop on Answer Set Programming and Other Computing Paradigms (ASPOCP 2013) held on August 25th, 2013 in Istanbul, co-located with the 29th International Conference on Logic Programming (ICLP 2013). It thus continues a series of previous events co-located with ICLP, aiming at facilitating the discussion about crossing the boundaries of current ASP techniques in theory, solving, and applications, in combination with or inspired by other computing paradigms.

* Proceedings of Answer Set Programming and Other Computing Paradigms (ASPOCP 2013), 6th International Workshop, August 25, 2013, Istanbul, Turkey

Via

Access Paper or Ask Questions

Proceedings of Answer Set Programming and Other Computing Paradigms , 5th International Workshop, September 4, 2012, Budapest, Hungary

Jan 10, 2013

Michael Fink, Yuliya Lierler

Abstract:This volume contains the papers presented at the fifth workshop on Answer Set Programming and Other Computing Paradigms (ASPOCP 2012) held on September 4th, 2012 in Budapest, co-located with the 28th International Conference on Logic Programming (ICLP 2012). It thus continues a series of previous events co-located with ICLP, aiming at facilitating the discussion about crossing the boundaries of current ASP techniques in theory, solving, and applications, in combination with or inspired by other computing paradigms.

Via

Access Paper or Ask Questions