Abstract:In this report, we present the latest model of the Gemini family, Gemini 1.5 Pro, a highly compute-efficient multimodal mixture-of-experts model capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. Gemini 1.5 Pro achieves near-perfect recall on long-context retrieval tasks across modalities, improves the state-of-the-art in long-document QA, long-video QA and long-context ASR, and matches or surpasses Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5 Pro's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 2.1 (200k) and GPT-4 Turbo (128k). Finally, we highlight surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.
Abstract:This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultra model advances the state of the art in 30 of 32 of these benchmarks - notably being the first model to achieve human-expert performance on the well-studied exam benchmark MMLU, and improving the state of the art in every one of the 20 multimodal benchmarks we examined. We believe that the new capabilities of Gemini models in cross-modal reasoning and language understanding will enable a wide variety of use cases and we discuss our approach toward deploying them responsibly to users.
Abstract:Despite recent advances in natural language understanding and generation, and decades of research on the development of conversational bots, building automated agents that can carry on rich open-ended conversations with humans "in the wild" remains a formidable challenge. In this work we develop a real-time, open-ended dialogue system that uses reinforcement learning (RL) to power a bot's conversational skill at scale. Our work pairs the succinct embedding of the conversation state generated using SOTA (supervised) language models with RL techniques that are particularly suited to a dynamic action space that changes as the conversation progresses. Trained using crowd-sourced data, our novel system is able to substantially exceeds the (strong) baseline supervised model with respect to several metrics of interest in a live experiment with real users of the Google Assistant.
Abstract:Baseline detection is still a challenging task for heterogeneous collections of historical documents. We present a novel approach to baseline extraction in such settings, turning out the winning entry to the ICDAR 2017 Competition on Baseline detection (cBAD). It utilizes deep convolutional nets (CNNs) for both, the actual extraction of baselines, as well as for a simple form of layout analysis in a pre-processing step. To the best of our knowledge it is the first CNN-based system for baseline extraction applying a U-net architecture and sliding window detection, profiting from a high local accuracy of the candidate lines extracted. Final baseline post-processing complements our approach, compensating for inaccuracies mainly due to missing context information during sliding window detection. We experimentally evaluate the components of our system individually on the cBAD dataset. Moreover, we investigate how it generalizes to different data by means of the dataset used for the baseline extraction task of the ICDAR 2017 Competition on Layout Analysis for Challenging Medieval Manuscripts (HisDoc). A comparison with the results reported for HisDoc shows that it also outperforms the contestants of the latter.
Abstract:As software systems are getting increasingly connected, there is a need for equipping nonmonotonic logic programs with access to external sources that are possibly remote and may contain information in heterogeneous formats. To cater for this need, HEX programs were designed as a generalization of answer set programs with an API style interface that allows to access arbitrary external sources, providing great flexibility. Efficient evaluation of such programs however is challenging, and it requires to interleave external computation and model building; to decide when to switch between these tasks is difficult, and existing approaches have limited scalability in many real-world application scenarios. We present a new approach for the evaluation of logic programs with external source access, which is based on a configurable framework for dividing the non-ground program into possibly overlapping smaller parts called evaluation units. The latter will be processed by interleaving external evaluation and model building using an evaluation graph and a model graph, respectively, and by combining intermediate results. Experiments with our prototype implementation show a significant improvement compared to previous approaches. While designed for HEX-programs, the new evaluation approach may be deployed to related rule-based formalisms as well.
Abstract:The rise of smart applications has drawn interest to logical reasoning over data streams. Recently, different query languages and stream processing/reasoning engines were proposed in different communities. However, due to a lack of theoretical foundations, the expressivity and semantics of these diverse approaches are given only informally. Towards clear specifications and means for analytic study, a formal framework is needed to define their semantics in precise terms. To this end, we present a first step towards an ideal semantics that allows for exact descriptions and comparisons of stream reasoning systems.
Abstract:ARCOE-Logic 2014, the 6th International Workshop on Acquisition, Representation and Reasoning about Context with Logic, was held in co-location with the 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2014) on November 25, 2014 in Link\"oping, Sweden. These notes contain the five papers which were accepted and presented at the workshop.
Abstract:In this work we propose a multi-valued extension of logic programs under the stable models semantics where each true atom in a model is associated with a set of justifications. These justifications are expressed in terms of causal graphs formed by rule labels and edges that represent their application ordering. For positive programs, we show that the causal justifications obtained for a given atom have a direct correspon- dence to (relevant) syntactic proofs of that atom using the program rules involved in the graphs. The most interesting contribution is that this causal information is obtained in a purely semantic way, by algebraic op- erations (product, sum and application) on a lattice of causal values whose ordering relation expresses when a justification is stronger than another. Finally, for programs with negation, we define the concept of causal stable model by introducing an analogous transformation to Gelfond and Lifschitz's program reduct. As a result, default negation behaves as "absence of proof" and no justification is derived from negative liter
Abstract:This volume contains the papers presented at the sixth workshop on Answer Set Programming and Other Computing Paradigms (ASPOCP 2013) held on August 25th, 2013 in Istanbul, co-located with the 29th International Conference on Logic Programming (ICLP 2013). It thus continues a series of previous events co-located with ICLP, aiming at facilitating the discussion about crossing the boundaries of current ASP techniques in theory, solving, and applications, in combination with or inspired by other computing paradigms.
Abstract:This volume contains the papers presented at the fifth workshop on Answer Set Programming and Other Computing Paradigms (ASPOCP 2012) held on September 4th, 2012 in Budapest, co-located with the 28th International Conference on Logic Programming (ICLP 2012). It thus continues a series of previous events co-located with ICLP, aiming at facilitating the discussion about crossing the boundaries of current ASP techniques in theory, solving, and applications, in combination with or inspired by other computing paradigms.