Abstract:The Catalan Language Understanding Benchmark (CLUB) encompasses various datasets representative of different NLU tasks that enable accurate evaluations of language models, following the General Language Understanding Evaluation (GLUE) example. It is part of AINA and PlanTL, two public funding initiatives to empower the Catalan language in the Artificial Intelligence era.
Abstract:The amount and variety of data is increasing drastically for several years. These data are often represented as networks, which are then explored with approaches arising from network theory. Recent years have witnessed the extension of network exploration methods to leverage more complex and richer network frameworks. Random walks, for instance, have been extended to explore multilayer networks. However, current random walk approaches are limited in the combination and heterogeneity of network layers they can handle. New analytical and numerical random walk methods are needed to cope with the increasing diversity and complexity of multilayer networks. We propose here MultiXrank, a Python package that enables Random Walk with Restart (RWR) on any kind of multilayer network with an optimized implementation. This package is supported by a universal mathematical formulation of the RWR. We evaluated MultiXrank with leave-one-out cross-validation and link prediction, and introduced protocols to measure the impact of the addition or removal of multilayer network data on prediction performances. We further measured the sensitivity of MultiXrank to input parameters by in-depth exploration of the parameter space. Finally, we illustrate the versatility of MultiXrank with different use-cases of unsupervised node prioritization and supervised classification in the context of human genetic diseases.