Petra Poklukar

I Know What I Don't Know: Improving Model Cascades Through Confidence Tuning

Feb 26, 2025

Stephan Rabanser, Nathalie Rauschmayr, Achin Kulshrestha, Petra Poklukar, Wittawat Jitkrittum, Sean Augenstein, Congchao Wang, Federico Tombari

Abstract: Large-scale machine learning models deliver strong performance across a wide range of tasks but come with significant computational and resource constraints. To mitigate these challenges, local smaller models are often deployed alongside larger models, relying on routing and deferral mechanisms to offload complex tasks. However, existing approaches inadequately balance the capabilities of these models, often resulting in unnecessary deferrals or sub-optimal resource usage. In this work we introduce a novel loss function called Gatekeeper for calibrating smaller models in cascade setups. Our approach fine-tunes the smaller model to confidently handle tasks it can perform correctly while deferring complex tasks to the larger model. Moreover, it incorporates a mechanism for managing the trade-off between model performance and deferral accuracy, and is broadly applicable across various tasks and domains without any architectural changes. We evaluate our method on encoder-only, decoder-only, and encoder-decoder architectures. Experiments across image classification, language modeling, and vision-language tasks show that our approach substantially improves deferral performance.
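
The deferral mechanism described in the abstract can be illustrated with a minimal sketch. It is not the Gatekeeper loss itself, which the abstract does not specify; it only shows a generic confidence-threshold cascade at inference time. The names `cascade_predict`, `large_model` and `threshold` are hypothetical and chosen for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cascade_predict(small_logits, large_model, x, threshold=0.8):
    """Answer with the small model when it is confident, else defer.

    small_logits: class scores produced by the small model for input x.
    large_model:  callable standing in for the larger model (assumption).
    threshold:    hypothetical confidence cut-off for deferral.
    """
    probs = softmax(small_logits)
    confidence = max(probs)
    if confidence >= threshold:
        # The small model is confident enough to answer on its own.
        return probs.index(confidence), "small"
    # Otherwise the input is handed to the larger model.
    return large_model(x), "large"
```

Under this scheme, a small model that is confident exactly on the inputs it gets right defers only when deferral is useful, which is the behaviour the abstract says the fine-tuning aims for.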

Hyperbolic Delaunay Geometric Alignment

Apr 12, 2024

Aniss Aiman Medbouhi, Giovanni Luca Marchetti, Vladislav Polianskii, Alexander Kravberg, Petra Poklukar, Anastasia Varava, Danica Kragic

Abstract: Hyperbolic machine learning is an emerging field aimed at representing data with a hierarchical structure. However, there is a lack of tools for evaluation and analysis of the resulting hyperbolic data representations. To this end, we propose Hyperbolic Delaunay Geometric Alignment (HyperDGA) -- a similarity score for comparing datasets in a hyperbolic space. The core idea is counting the edges of the hyperbolic Delaunay graph connecting datapoints across the given sets. We provide an empirical investigation on synthetic and real-life biological data and demonstrate that HyperDGA outperforms the hyperbolic version of classical distances between sets. Furthermore, we showcase the potential of HyperDGA for evaluating latent representations inferred by a Hyperbolic Variational Auto-Encoder.
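
The edge-counting idea can be sketched in a heavily simplified setting. The example below is an assumption-laden toy, not HyperDGA: it works on one-dimensional points, where the Delaunay graph reduces to edges between consecutive points in sorted order, and it reports the fraction of edges that connect the two sets. The function name `cross_edge_fraction` is hypothetical.

```python
def cross_edge_fraction(set_a, set_b):
    """Fraction of 1D Delaunay edges joining a point of A to a point of B.

    In one dimension the Delaunay graph connects each point to its
    neighbours in sorted order, so no triangulation library is needed.
    Duplicate coordinates are a degenerate case this toy does not handle.
    """
    labelled = sorted([(x, "a") for x in set_a] + [(x, "b") for x in set_b])
    edges = list(zip(labelled, labelled[1:]))
    if not edges:
        return 0.0
    cross = sum(1 for (_, la), (_, lb) in edges if la != lb)
    return cross / len(edges)


# Interleaved sets share every edge; well-separated sets share only one.
mixed = cross_edge_fraction([0.0, 2.0, 4.0], [1.0, 3.0, 5.0])
separated = cross_edge_fraction([0.0, 1.0, 2.0], [10.0, 11.0, 12.0])
```

A high fraction indicates that the two sets occupy the same region, a low one that they are separated; the actual score in the paper is defined on the hyperbolic Delaunay graph in higher dimensions.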

BRAVE: Broadening the visual encoding of vision-language models

Apr 10, 2024

Oğuzhan Fatih Kar, Alessio Tonioni, Petra Poklukar, Achin Kulshrestha, Amir Zamir, Federico Tombari

Abstract: Vision-language models (VLMs) are typically composed of a vision encoder, e.g. CLIP, and a language model (LM) that interprets the encoded features to solve downstream tasks. Despite remarkable progress, VLMs are subject to several shortcomings due to the limited capabilities of vision encoders, e.g. "blindness" to certain image features, visual hallucination, etc. To address these issues, we study broadening the visual encoding capabilities of VLMs. We first comprehensively benchmark several vision encoders with different inductive biases for solving VLM tasks. We observe that there is no single encoding configuration that consistently achieves top performance across different tasks, and encoders with different biases can perform surprisingly similarly. Motivated by this, we introduce a method, named BRAVE, that consolidates features from multiple frozen encoders into a more versatile representation that can be directly fed as the input to a frozen LM. BRAVE achieves state-of-the-art performance on a broad range of captioning and VQA benchmarks and significantly reduces the aforementioned issues of VLMs, while requiring a smaller number of trainable parameters than existing methods and having a more compressed representation. Our results highlight the potential of incorporating different visual biases for a broader and more contextualized visual understanding in VLMs.

* Project page at https://brave-vlms.epfl.ch/

Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models

Apr 18, 2022

Ali Ghadirzadeh, Petra Poklukar, Karol Arndt, Chelsea Finn, Ville Kyrki, Danica Kragic, Mårten Björkman

Abstract: We present a data-efficient framework for solving sequential decision-making problems which exploits the combination of reinforcement learning (RL) and latent variable generative models. The framework, called GenRL, trains deep policies by introducing an action latent variable such that the feed-forward policy search can be divided into two parts: (i) training a sub-policy that outputs a distribution over the action latent variable given a state of the system, and (ii) unsupervised training of a generative model that outputs a sequence of motor actions conditioned on the latent action variable. GenRL enables safe exploration and alleviates the data-inefficiency problem as it exploits prior knowledge about valid sequences of motor actions. Moreover, we provide a set of measures for evaluation of generative models such that we are able to predict the performance of the RL policy training prior to the actual training on a physical robot. We experimentally determine the characteristics of generative models that have the most influence on the performance of the final policy training on two robotics tasks: shooting a hockey puck and throwing a basketball. Furthermore, we empirically demonstrate that GenRL is the only method that can safely and efficiently solve the robotics tasks when compared with two state-of-the-art RL methods.

* arXiv admin note: substantial text overlap with arXiv:2007.13134

Augment-Connect-Explore: a Paradigm for Visual Action Planning with Data Scarcity

Mar 24, 2022

Martina Lippi, Michael C. Welle, Petra Poklukar, Alessandro Marino, Danica Kragic

Abstract: Visual action planning particularly excels in applications where the state of the system cannot be computed explicitly, such as manipulation of deformable objects, as it enables planning directly from raw images. Even though the field has been significantly accelerated by deep learning techniques, a crucial requirement for their success is the availability of a large amount of data. In this work, we propose the Augment-Connect-Explore (ACE) paradigm to enable visual action planning in cases of data scarcity. We build upon the Latent Space Roadmap (LSR) framework, which performs planning with a graph built in a low-dimensional latent space. In particular, ACE is used to i) Augment the available training dataset by autonomously creating new pairs of datapoints, ii) create new unobserved Connections among representations of states in the latent graph, and iii) Explore new regions of the latent space in a targeted manner. We validate the proposed approach on both a simulated box stacking task and a real-world folding task, showing its applicability to rigid and deformable object manipulation, respectively.
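
Planning with a graph built in a latent space can be illustrated with a small sketch. It assumes the roadmap is already available as an adjacency dictionary over hypothetical node labels (`s0`, `s1`, ...) and uses plain breadth-first search; the abstract does not state which search procedure LSR uses, so this is an illustration rather than the authors' method.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search over an adjacency dict; returns a node list or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the parent links back to the start to recover the plan.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical roadmap: s3 is unreachable until a new connection is added.
roadmap = {"s0": ["s1"], "s1": ["s0", "s2"], "s2": ["s1"], "s3": []}
before = shortest_path(roadmap, "s0", "s3")
roadmap["s2"].append("s3")  # mirrors the "Connect" step: a new latent edge
after = shortest_path(roadmap, "s0", "s3")
```

The example shows why creating unobserved connections matters when data is scarce: a single added edge turns an unreachable goal into a valid plan.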

Delaunay Component Analysis for Evaluation of Data Representations

Feb 14, 2022

Petra Poklukar, Vladislav Polianskii, Anastasia Varava, Florian Pokorny, Danica Kragic

Abstract: Advanced representation learning techniques require reliable and general evaluation methods. Recently, several algorithms based on the common idea of geometric and topological analysis of a manifold approximated from the learned data representations have been proposed. In this work, we introduce Delaunay Component Analysis (DCA), an evaluation algorithm which approximates the data manifold using a more suitable neighbourhood graph called the Delaunay graph. This provides a reliable manifold estimation even for challenging geometric arrangements of representations such as clusters with varying shape and density as well as outliers, which is where existing methods often fail. Furthermore, we exploit the nature of Delaunay graphs and introduce a framework for assessing the quality of individual novel data representations. We experimentally validate the proposed DCA method on representations obtained from neural networks trained with a contrastive objective, supervised and generative models, and demonstrate various use cases of our extended single point evaluation framework.

* ICLR 2022 camera ready

GraphDCA -- a Framework for Node Distribution Comparison in Real and Synthetic Graphs

Feb 09, 2022

Ciwan Ceylan, Petra Poklukar, Hanna Hultin, Alexander Kravchenko, Anastasia Varava, Danica Kragic

Abstract: We argue that when comparing two graphs, the distribution of node structural features is more informative than global graph statistics, which are often used in practice, especially to evaluate graph generative models. Thus, we present GraphDCA, a framework for evaluating similarity between graphs based on the alignment of their respective node representation sets. The sets are compared using a recently proposed method for comparing representation spaces, called Delaunay Component Analysis (DCA), which we extend to graph data. To evaluate our framework, we generate a benchmark dataset of graphs exhibiting different structural patterns and show, using three node structure feature extractors, that GraphDCA recognizes graphs with both similar and dissimilar local structure. We then apply our framework to evaluate three publicly available real-world graph datasets and demonstrate, using gradual edge perturbations, that GraphDCA satisfactorily captures gradually decreasing similarity, unlike global statistics. Finally, we use GraphDCA to evaluate two state-of-the-art graph generative models, NetGAN and CELL, and conclude that further improvements are needed for these models to adequately reproduce local structural features.

GMC -- Geometric Multimodal Contrastive Representation Learning

Feb 08, 2022

Petra Poklukar, Miguel Vasco, Hang Yin, Francisco S. Melo, Ana Paiva, Danica Kragic

Abstract: Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem due to the inherent heterogeneity of data obtained from different channels. To address it, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method comprising two main components: i) a two-level architecture consisting of modality-specific base encoders, which map an arbitrary number of modalities to intermediate representations of fixed dimensionality, and a shared projection head, which maps the intermediate representations to a latent representation space; ii) a multimodal contrastive loss function that encourages the geometric alignment of the learned representations. We experimentally demonstrate that GMC representations are semantically rich and achieve state-of-the-art performance with missing modality information on three different learning problems including prediction and reinforcement learning tasks.
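
A contrastive loss that encourages geometric alignment can be sketched as follows. This is a generic InfoNCE-style objective in plain Python, given as an illustration only; the exact form of the GMC loss is not stated in the abstract, and the names `info_nce` and `temperature` are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.1):
    """Average InfoNCE loss: anchors[i] should match positives[i].

    Every other positives[j] in the batch acts as a negative for anchors[i].
    """
    n = len(anchors)
    loss = 0.0
    for i in range(n):
        logits = [cosine(anchors[i], positives[j]) / temperature for j in range(n)]
        m = max(logits)
        # Stable log-sum-exp over all candidates in the batch.
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denominator - logits[i]
    return loss / n


# Representations of the same sample from two modalities (toy vectors).
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
misaligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss is small when matching pairs point in the same direction and large when they do not, which is the sense in which such objectives pull representations of the same sample together in the shared latent space.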

Batch Curation for Unsupervised Contrastive Representation Learning

Aug 19, 2021

Michael C. Welle, Petra Poklukar, Danica Kragic

Abstract: The state-of-the-art unsupervised contrastive visual representation learning methods that have emerged recently (SimCLR, MoCo, SwAV) all make use of data augmentations in order to construct a pretext task of instance discrimination consisting of similar and dissimilar pairs of images. Similar pairs are constructed by randomly extracting patches from the same image and applying several other transformations such as color jittering or blurring, while transformed patches from different image instances in a given batch are regarded as dissimilar pairs. We argue that this approach can result in similar pairs that are semantically dissimilar. In this work, we address this problem by introducing a batch curation scheme that selects batches during the training process that are more in line with the underlying contrastive objective. We provide insights into what constitutes beneficial similar and dissimilar pairs and validate batch curation on CIFAR10 by integrating it in the SimCLR model.

GeomCA: Geometric Evaluation of Data Representations

May 26, 2021

Petra Poklukar, Anastasia Varava, Danica Kragic

Abstract: Evaluating the quality of learned representations without relying on a downstream task remains one of the challenges in representation learning. In this work, we present the Geometric Component Analysis (GeomCA) algorithm, which evaluates representation spaces based on their geometric and topological properties. GeomCA can be applied to representations of any dimension, independently of the model that generated them. We demonstrate its applicability by analyzing representations obtained from a variety of scenarios, such as contrastive learning models, generative models and supervised learning models.

* ICML 2021 camera-ready version
