Arnas Uselis

Kaunas University of Technology

CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally

Feb 05, 2025

Darina Koishigarina, Arnas Uselis, Seong Joon Oh

Abstract: CLIP (Contrastive Language-Image Pretraining) has become a popular choice for various downstream tasks. However, recent studies have questioned its ability to represent compositional concepts effectively. These works suggest that CLIP often acts like a bag-of-words (BoW) model, interpreting images and text as sets of individual concepts without grasping the structural relationships between them. In particular, CLIP struggles to bind attributes to their corresponding objects when multiple objects are present in an image or text. In this work, we investigate why CLIP exhibits this BoW-like behavior. We find that the correct attribute-object binding information is already present within the individual text and image modalities. The issue lies instead in the cross-modal alignment, which relies on cosine similarity. To address this, we propose Linear Attribute Binding CLIP (LABCLIP), which applies a linear transformation to text embeddings before computing cosine similarity. This approach significantly improves CLIP's ability to bind attributes to the correct objects, thereby enhancing its compositional understanding.
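As a rough illustration of the scoring change described in the abstract, the sketch below applies a linear map to text embeddings before taking cosine similarity. The function names, the matrix `W`, and the random stand-in embeddings are all assumptions made for this example; the abstract does not say how the transformation is trained, so that step is left out.

```python
import numpy as np

def cosine_similarity_matrix(image_emb, text_emb):
    """Pairwise cosine similarity between rows of two embedding matrices."""
    image_norm = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_norm = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return image_norm @ text_norm.T

def lab_similarity(image_emb, text_emb, W):
    """Apply a linear map W to the text embeddings, then compare by cosine similarity."""
    return cosine_similarity_matrix(image_emb, text_emb @ W.T)

rng = np.random.default_rng(0)
image_emb = rng.normal(size=(4, 8))  # stand-in for frozen CLIP image embeddings
text_emb = rng.normal(size=(3, 8))   # stand-in for frozen CLIP text embeddings
W = np.eye(8)                        # hypothetical map; identity recovers plain CLIP scoring

baseline = cosine_similarity_matrix(image_emb, text_emb)
transformed = lab_similarity(image_emb, text_emb, W)
```

With `W` set to the identity the two score matrices coincide, so any change in matching behavior would come entirely from the learned map.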

Time-Adaptive Recurrent Neural Networks

Apr 11, 2022

Mantas Lukoševičius, Arnas Uselis

Abstract: Data are often sampled irregularly in time. Dealing with this using Recurrent Neural Networks (RNNs) has traditionally involved ignoring the fact, feeding the time differences as additional inputs, or resampling the data. All these methods have their shortcomings. We propose an elegant alternative approach in which the RNN itself is, in effect, resampled in time to match the timing of the data. We use the Echo State Network (ESN) and the Gated Recurrent Unit (GRU) as the basis for our solution. Such RNNs can be seen as discretizations of continuous-time dynamical systems, which gives a solid theoretical grounding for our approach. Similar observations have recently been made for feed-forward neural networks in the form of neural ordinary differential equations. Our Time-Adaptive ESN (TAESN) and GRU (TAGRU) models allow the model time to be set directly and require no additional training, parameter tuning, or computation compared to their regular counterparts, thus retaining the original efficiency. We confirm empirically that our models can effectively compensate for the time non-uniformity of the data, and demonstrate on several real-world nonuniform-time datasets that they compare favorably to data resampling, classical RNN methods, and alternative RNN models proposed to deal with time irregularities.

* Originally written in May 2019
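The idea of resampling the network rather than the data can be sketched with a leaky-integrator ESN whose leaking rate is scaled by the time elapsed since the previous sample. This is only one plausible reading of the abstract, not the paper's stated formulation: the scaling rule, the clipping at 1, and every name and parameter value below are assumptions chosen for illustration.

```python
import numpy as np

def time_adaptive_esn_states(inputs, time_deltas, W_in, W, base_leak=0.3):
    """Run a leaky-integrator ESN whose effective leak grows with the time gap."""
    x = np.zeros(W.shape[0])
    states = []
    for u, dt in zip(inputs, time_deltas):
        # Assumed rule: leak scales linearly with dt and is clipped to 1.
        leak = min(base_leak * dt, 1.0)
        pre = W_in @ np.concatenate(([1.0], np.atleast_1d(u))) + W @ x
        x = (1.0 - leak) * x + leak * np.tanh(pre)
        states.append(x.copy())
    return np.array(states)

rng = np.random.default_rng(0)
n_units, n_inputs = 20, 1
W_in = rng.uniform(-0.5, 0.5, size=(n_units, n_inputs + 1))  # first column is the bias
W = rng.uniform(-0.5, 0.5, size=(n_units, n_units))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # scale spectral radius to 0.9

inputs = rng.normal(size=(50, n_inputs))
time_deltas = rng.uniform(0.2, 2.0, size=50)  # irregular gaps between samples
states = time_adaptive_esn_states(inputs, time_deltas, W_in, W)
```

Setting every entry of `time_deltas` to 1 reduces this to an ordinary leaky ESN update, which matches the abstract's point that no extra training or parameters are involved.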

Efficient implementations of echo state network cross-validation

Jun 19, 2020

Mantas Lukoševičius, Arnas Uselis

Abstract: Background/introduction: Cross-validation is still uncommon in time series modeling. Echo State Networks (ESNs), a prime example of Reservoir Computing (RC) models, are known for their fast and precise one-shot learning, which often benefits from good hyper-parameter tuning. This makes them ideal for changing the status quo. Methods: We suggest several schemes for cross-validating ESNs and introduce an efficient algorithm for implementing them. The algorithm is presented as two levels of optimization of $k$-fold cross-validation. Training an RC model typically consists of two stages: (i) running the reservoir with the data and (ii) computing the optimal readouts. The first level of our proposed optimization addresses the most computationally expensive part, (i), and makes its cost constant irrespective of $k$. It dramatically reduces reservoir computations in any type of RC system and is sufficient if $k$ is small. The second level of optimization also makes the cost of part (ii) constant irrespective of $k$, even when $k$ is large, as long as the dimension of the output is low. We discuss when the proposed validation schemes for ESNs could be beneficial, present three options for producing the final model, and investigate them empirically on six different real-world datasets; we also run empirical computation time experiments. We provide the code in an online repository. Results: The proposed cross-validation schemes give better and more stable test performance on all six real-world datasets, which cover three task types. Empirical run times confirm our complexity analysis. Conclusions: In most situations, $k$-fold cross-validation of ESNs and many other RC models can be done with virtually the same time complexity as a simple single-split validation. Space complexity can also remain the same in all cases. This enables cross-validation to become a standard practice in reservoir computing.

* arXiv admin note: substantial text overlap with arXiv:1908.08450
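A minimal sketch of the first-level idea, assuming a ridge-regression readout: the reservoir is run once, the per-fold products of the collected states are cached, and each fold's training matrices are obtained by subtracting that fold's share from the totals. The function name, the ridge parameter, and the random matrix standing in for reservoir states are assumptions for this example; the second-level optimization mentioned in the abstract is not shown.

```python
import numpy as np

def kfold_ridge_errors(X, Y, k=5, ridge=1e-6):
    """k-fold validation MSE of a ridge readout.

    X: (T, N) states collected in a single reservoir run; Y: (T, D) targets.
    """
    T, N = X.shape
    folds = np.array_split(np.arange(T), k)  # contiguous blocks in time
    # Per-fold sufficient statistics, each computed exactly once.
    XtX = [X[idx].T @ X[idx] for idx in folds]
    XtY = [X[idx].T @ Y[idx] for idx in folds]
    XtX_total, XtY_total = sum(XtX), sum(XtY)
    errors = []
    for i, idx in enumerate(folds):
        # Training statistics are the totals minus the held-out fold's share.
        A = XtX_total - XtX[i] + ridge * np.eye(N)
        B = XtY_total - XtY[i]
        W_out = np.linalg.solve(A, B)  # readout weights, shape (N, D)
        errors.append(np.mean((X[idx] @ W_out - Y[idx]) ** 2))
    return np.array(errors)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))  # stand-in for reservoir states
Y = X @ rng.normal(size=(10, 2)) + 0.01 * rng.normal(size=(200, 2))
errors = kfold_ridge_errors(X, Y, k=5)
```

The state matrix is traversed once to build the cached products, so the expensive reservoir run does not repeat per fold; only the small N-by-N solves scale with `k`.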

Localized convolutional neural networks for geospatial wind forecasting

May 13, 2020

Arnas Uselis, Mantas Lukoševičius, Lukas Stasytis

Abstract: Convolutional Neural Networks (CNNs) possess many positive qualities when it comes to spatial raster data. Translation invariance enables CNNs to detect features regardless of their position in the scene. But in some domains, such as geospatial ones, not all locations are exactly equal. In this work we propose localized convolutional neural networks that enable convolutional architectures to learn local features in addition to the global ones. We investigate their instantiations in the form of learnable inputs, local weights, and a more general form. They can be added to any convolutional layer, are easily trained end-to-end, introduce minimal additional complexity, and let CNNs retain most of their benefits to the extent that they are needed. We address spatio-temporal prediction: we test the effectiveness of our methods on a synthetic benchmark dataset and tackle three real-world wind prediction datasets. For one of them we propose a method to spatially order the unordered data. We compare against recent state-of-the-art spatio-temporal prediction models on the same data. Models that use convolutional layers can be, and are, extended with our localizations. In all these cases our extensions improve the results, and thus often the state of the art. We share all the code in a public repository.
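One way to picture the "learnable inputs" variant named in the abstract is to concatenate location-specific maps to the data as extra channels before an ordinary shared-kernel convolution. The sketch below is a hypothetical reading under that assumption: the function names, channel counts, and random values are illustrative only, and the maps would be trained jointly with the kernels rather than drawn at random.

```python
import numpy as np

def conv2d_same(x, kernels):
    """Naive 'same' cross-correlation. x: (C, H, W); kernels: (C, kh, kw) -> (H, W)."""
    C, H, W = x.shape
    _, kh, kw = kernels.shape  # odd kernel sizes assumed
    ph, pw = kh // 2, kw // 2
    padded = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[:, i:i + kh, j:j + kw] * kernels)
    return out

def localized_conv(x, local_maps, kernels):
    """Concatenate per-location maps to the input as extra channels, then convolve."""
    return conv2d_same(np.concatenate([x, local_maps], axis=0), kernels)

rng = np.random.default_rng(0)
H, W = 6, 6
x = rng.normal(size=(1, H, W))           # one data channel
local_maps = rng.normal(size=(2, H, W))  # two hypothetical location-specific channels
kernels = rng.normal(size=(3, 3, 3))     # one shared kernel over 1 + 2 channels

y = localized_conv(x, local_maps, kernels)
```

Because the extra channels are tied to grid positions instead of to the data, the same input pattern can produce different responses at different locations, while the convolution kernel itself stays shared.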

Efficient Cross-Validation of Echo State Networks

Aug 22, 2019

Mantas Lukoševičius, Arnas Uselis

Abstract: Echo State Networks (ESNs) are known for their fast and precise one-shot learning of time series, but they often need good hyper-parameter tuning for best performance. For this, good validation is key, yet usually only a single validation split is used. In this rather practical contribution we suggest several schemes for cross-validating ESNs and introduce an efficient algorithm for implementing them. In our proposed method of $k$-fold cross-validation, the component that dominates the time complexity of the already quite fast ESN training remains constant (it does not scale up with $k$). The component that does scale linearly with $k$ starts dominating only in some uncommon situations. Thus, in many situations $k$-fold cross-validation of ESNs can be done with virtually the same time complexity as a simple single-split validation. Space complexity can also remain the same. We also discuss when the proposed validation schemes for ESNs could be beneficial and investigate them empirically on several different real-world datasets.

* Accepted in ICANN'19 Workshop on Reservoir Computing
