Abstract: We present Text-to-OverpassQL, a task designed to facilitate a natural language interface for querying geodata from OpenStreetMap (OSM). The Overpass Query Language (OverpassQL) allows users to formulate complex database queries and is widely adopted in the OSM ecosystem. Generating Overpass queries from natural language input serves multiple use cases: it enables novice users to utilize OverpassQL without prior knowledge, assists experienced users with crafting advanced queries, and allows tool-augmented large language models to access information stored in the OSM database. In order to assess the performance of current sequence generation models on this task, we propose OverpassNL, a dataset of 8,352 queries with corresponding natural language inputs. We further introduce task-specific evaluation metrics and ground the evaluation of the Text-to-OverpassQL task by executing the queries against the OSM database. We establish strong baselines by finetuning sequence-to-sequence models and adapting large language models with in-context examples. The detailed evaluation reveals strengths and weaknesses of the considered learning strategies, laying the foundations for further research into the Text-to-OverpassQL task.
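The execution-grounded evaluation can be illustrated with a small sketch: a generated query is sent to an Overpass API endpoint and its result set is compared to that of the gold query. The endpoint choice, result comparison, and example query below are illustrative assumptions, not the paper's evaluation code.

import requests

OVERPASS_URL = "https://overpass-api.de/api/interpreter"  # public endpoint (assumption: any Overpass instance works)

def execute_overpass(query: str) -> set:
    """Run an OverpassQL query and return the set of OSM elements it yields."""
    response = requests.post(OVERPASS_URL, data={"data": query})
    response.raise_for_status()
    elements = response.json().get("elements", [])
    return {(e["type"], e["id"]) for e in elements}

def execution_match(predicted_query: str, gold_query: str) -> bool:
    """Execution grounding: compare query results instead of query strings."""
    return execute_overpass(predicted_query) == execute_overpass(gold_query)

# A natural language input such as "all drinking water fountains in Heidelberg"
# could correspond to a query like this hypothetical example:
example_query = """
[out:json][timeout:25];
area[name="Heidelberg"]->.a;
node(area.a)[amenity=drinking_water];
out;
"""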
Abstract: Incremental decision making in real-world environments is one of the most challenging tasks in embodied artificial intelligence. One particularly demanding scenario is Vision and Language Navigation (VLN), which requires visual and natural language understanding as well as spatial and temporal reasoning capabilities. The embodied agent needs to ground its understanding of navigation instructions in observations of a real-world environment like Street View. Despite the impressive results of LLMs in other research areas, how to best connect them with an interactive visual environment remains an open problem. In this work, we propose VELMA, an embodied LLM agent that uses a verbalization of the trajectory and of visual environment observations as contextual prompt for the next action. Visual information is verbalized by a pipeline that extracts landmarks from the human-written navigation instructions and uses CLIP to determine their visibility in the current panorama view. We show that VELMA is able to successfully follow navigation instructions in Street View with only two in-context examples. We further finetune the LLM agent on a few thousand examples and achieve a 25-30% relative improvement in task completion over the previous state of the art on two datasets.
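A minimal sketch of the verbalization step, assuming the CLIP model from HuggingFace Transformers and an illustrative similarity threshold; the prompt wording and scoring details are not the paper's exact pipeline.

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def visible_landmarks(panorama: Image.Image, landmarks: list, threshold: float = 0.25) -> list:
    """Return the landmarks whose CLIP image-text similarity exceeds a threshold."""
    texts = ["a photo of " + l for l in landmarks]
    inputs = processor(text=texts, images=panorama, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # Cosine similarity between the panorama embedding and each landmark text embedding.
    img = outputs.image_embeds / outputs.image_embeds.norm(dim=-1, keepdim=True)
    txt = outputs.text_embeds / outputs.text_embeds.norm(dim=-1, keepdim=True)
    sims = (img @ txt.T).squeeze(0)
    return [l for l, s in zip(landmarks, sims.tolist()) if s > threshold]

def verbalize_observation(landmarks_in_view: list) -> str:
    """Turn the visibility check into text that is appended to the LLM prompt."""
    if not landmarks_in_view:
        return "There are no landmarks visible from here."
    return "You see " + " and ".join(landmarks_in_view) + "."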
Abstract: Backward compatibility of model predictions is a desired property when updating a machine learning driven application. It allows the underlying model to be improved seamlessly without introducing regression bugs. In classification tasks, these bugs occur in the form of negative flips: instances that were correctly classified by the old model are classified incorrectly by the updated model. This has a direct negative impact on the user experience of such systems, e.g., when a frequently used voice assistant query is suddenly misclassified. A common reason to update the model is that new training data becomes available and needs to be incorporated. Simply retraining the model on the updated data introduces unwanted negative flips. We study the problem of regression during data updates and propose Backward Compatible Weight Interpolation (BCWI). This method interpolates between the weights of the old and new model, and we show in extensive experiments that it reduces negative flips without sacrificing the improved accuracy of the new model. BCWI is straightforward to implement and does not increase inference cost. We also explore the use of importance weighting during interpolation and averaging the weights of multiple new models in order to further reduce negative flips.
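A minimal sketch of the interpolation itself, assuming the uniform (non-importance-weighted) variant and two PyTorch models with identical architectures; the coefficient alpha is a hyperparameter.

import copy
import torch

def bcwi(old_model: torch.nn.Module, new_model: torch.nn.Module, alpha: float = 0.5) -> torch.nn.Module:
    """Return a model whose parameters are alpha * old + (1 - alpha) * new."""
    merged = copy.deepcopy(new_model)
    old_params = dict(old_model.named_parameters())
    with torch.no_grad():
        for name, param in merged.named_parameters():
            param.copy_(alpha * old_params[name] + (1.0 - alpha) * param)
    return merged

# The multi-model variant mentioned in the abstract would replace `new_model`
# by the parameter-wise average of several new models before interpolating.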
Abstract: Vision and language navigation (VLN) is a challenging visually grounded language understanding task. Given a natural language navigation instruction, a visual agent interacts with a graph-based environment equipped with panorama images and tries to follow the described route. Most prior work has been conducted in indoor scenarios, where the best results were obtained for navigation on routes that are similar to the training routes, with sharp drops in performance when testing on unseen environments. We focus on VLN in outdoor scenarios and find that, in contrast to indoor VLN, most of the gain in outdoor VLN on unseen data is due to features like junction type embeddings or heading deltas that are specific to the respective environment graph, while image information plays a very minor role in generalizing VLN to unseen outdoor areas. These findings indicate a bias towards specifics of graph representations of urban environments, demanding that VLN tasks grow in scale and diversity of geographical environments.
Abstract: Car-focused navigation services are based on turns and distances of named streets, whereas navigation instructions naturally used by humans are centered around physical objects called landmarks. We present a neural model that takes OpenStreetMap representations as input and learns from human natural language instructions to generate navigation instructions that contain visible and salient landmarks. Routes on the map are encoded in a location- and rotation-invariant graph representation that is decoded into natural language instructions. Our work is based on a novel dataset of 7,672 crowd-sourced instances that have been verified by human navigation in Street View. Our evaluation shows that the navigation instructions generated by our system have properties similar to human-generated instructions and lead to successful human navigation in Street View.
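A minimal sketch of what a location- and rotation-invariant route encoding could look like: each node is described by the turn angle relative to the incoming direction and by nearby OpenStreetMap tags rather than by absolute coordinates. The field names and the flat-earth heading approximation are illustrative assumptions, not the paper's graph encoder.

import math
from dataclasses import dataclass, field

@dataclass
class RouteNode:
    lat: float
    lon: float
    nearby_tags: list = field(default_factory=list)  # e.g. [{"amenity": "cafe"}]

def heading(a: RouteNode, b: RouteNode) -> float:
    """Approximate compass heading from a to b in degrees (flat-earth assumption)."""
    return math.degrees(math.atan2(b.lon - a.lon, b.lat - a.lat))

def invariant_features(route: list) -> list:
    """Per-node features that are unchanged if the whole route is shifted or rotated."""
    feats = []
    for prev, cur, nxt in zip(route, route[1:], route[2:]):
        # Turn angle in (-180, 180]: difference of consecutive segment headings.
        turn = (heading(cur, nxt) - heading(prev, cur) + 180.0) % 360.0 - 180.0
        feats.append({"turn_angle": turn, "landmarks": cur.nearby_tags})
    return feats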
Abstract: Automatic sentence summarization produces a shorter version of a sentence while preserving its most important information. A good summary is characterized by language fluency and high information overlap with the source sentence. We model these two aspects in an unsupervised objective function, consisting of language modeling and semantic similarity metrics. We search for a high-scoring summary by discrete optimization. Our proposed method achieves a new state of the art for unsupervised sentence summarization according to ROUGE scores. Additionally, we demonstrate that the commonly reported ROUGE F1 metric is sensitive to summary length. Since this is inadvertently exploited in recent work, we emphasize that future evaluation should explicitly group summarization systems by output length brackets.
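A minimal sketch of the search idea: candidate summaries are scored by a weighted combination of a fluency term and a similarity term, and the score is maximized by a simple discrete hill-climbing loop over which source words to keep. The concrete scoring functions and the swap-based search are illustrative assumptions.

import random

def objective(summary, source, lm_score, similarity, gamma=1.0):
    """Higher is better: fluency of the summary plus semantic overlap with the source."""
    return lm_score(summary) + gamma * similarity(summary, source)

def hill_climb(source, target_len, lm_score, similarity, steps=500):
    """Search over which source words to keep, swapping one word per step."""
    keep = sorted(random.sample(range(len(source)), target_len))
    best = objective([source[i] for i in keep], source, lm_score, similarity)
    for _ in range(steps):
        out_idx = random.choice(keep)
        in_idx = random.choice([i for i in range(len(source)) if i not in keep])
        cand = sorted((set(keep) - {out_idx}) | {in_idx})
        score = objective([source[i] for i in cand], source, lm_score, similarity)
        if score > best:
            keep, best = cand, score
    return [source[i] for i in keep]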
Abstract: In this work we present an unsupervised approach to abstractive sentence summarization using a Variational Autoencoder (VAE). VAEs are known to learn a semantically rich latent variable that represents the high-dimensional input. They are trained by learning to reconstruct the input from the probabilistic latent variable. Explicitly providing information about the output length during training discourages the VAE from encoding this information in the latent variable, so that the length can be manipulated during inference. Instructing the decoder to produce a shorter output sequence leads to expressing the input sentence with fewer words. We show on different summarization datasets that these shorter sentences cannot beat a simple baseline, but yield higher ROUGE scores than attempting to reconstruct the whole sentence.
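A minimal sketch of length-conditioned decoding with a sentence VAE, assuming a GRU encoder/decoder and a length embedding added to the latent code; module sizes and the way the length signal is injected are illustrative assumptions rather than the paper's architecture.

import torch
import torch.nn as nn

class LengthConditionedVAE(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden=256, latent=64, max_len=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent)
        self.to_logvar = nn.Linear(hidden, latent)
        self.len_embed = nn.Embedding(max_len, latent)  # explicit output-length signal
        self.decoder = nn.GRU(emb_dim, latent, batch_first=True)
        self.out = nn.Linear(latent, vocab_size)

    def forward(self, tokens, target_len):
        _, h = self.encoder(self.embed(tokens))
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        # Conditioning the decoder on the desired length frees the latent variable
        # from encoding it; at inference a shorter target_len compresses the input.
        h0 = (z + self.len_embed(target_len)).unsqueeze(0)
        dec, _ = self.decoder(self.embed(tokens), h0)
        return self.out(dec), mu, logvar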