Marius Leordeanu

Institute of Mathematics of the Romanian Academy, University "Politehnica" of Bucharest

Learning from Random Subspace Exploration: Generalized Test-Time Augmentation with Self-supervised Distillation

Jul 02, 2025

Andrei Jelea, Ahmed Nabil Belbachir, Marius Leordeanu

Abstract: We introduce Generalized Test-Time Augmentation (GTTA), a highly effective method for improving the performance of a trained model. Unlike existing Test-Time Augmentation approaches in the literature, it is general enough to be used off-the-shelf for many vision and non-vision tasks, such as classification, regression, image segmentation and object detection. By applying a new general data transformation that randomly perturbs the PCA subspace projection of a test input multiple times, GTTA forms robust ensembles at test time in which, due to sound statistical properties, the structural and systematic noise in the initial input data is filtered out and final estimator errors are reduced. Unlike other existing methods, we also propose a final self-supervised learning stage in which the ensemble output, acting as an unsupervised teacher, is used to train the initial single student model, thus significantly reducing the test-time computational cost at no loss in accuracy. Our tests and comparisons to strong TTA approaches and SoTA models on various well-known vision and non-vision datasets and tasks, such as image classification and segmentation, speech recognition and house price prediction, validate the generality of the proposed GTTA. Furthermore, we also prove its effectiveness on the more specific real-world task of salmon segmentation and detection in low-visibility underwater videos, for which we introduce DeepSalmon, the largest dataset of its kind in the literature.
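
As a rough illustration of the idea in this abstract, the sketch below projects a test input onto a PCA subspace, perturbs the projection coefficients several times, maps each perturbed version back to input space, and averages the model outputs. The abstract does not specify the noise model, the aggregation rule, or how the component orthogonal to the subspace is handled, so the Gaussian noise scaled per component, the mean aggregation, and the names `fit_pca`, `gtta_predict`, `n_aug` and `scale` are all assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def fit_pca(X, k):
    """Return the data mean, the top-k principal directions (as rows),
    and the standard deviation of the data along each direction."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions of the centered data.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    std = s[:k] / np.sqrt(max(len(X) - 1, 1))
    return mean, Vt[:k], std

def gtta_predict(model, x, mean, components, std, n_aug=16, scale=0.1, rng=None):
    """Average model outputs over randomly perturbed PCA reconstructions of x.

    Assumption: Gaussian noise with standard deviation scale * std[i] is
    added to the i-th PCA coefficient, and predictions are averaged.
    """
    rng = np.random.default_rng() if rng is None else rng
    coeffs = components @ (x - mean)                      # shape (k,)
    noise = rng.normal(0.0, scale, size=(n_aug, coeffs.size)) * std
    perturbed = coeffs + noise                            # shape (n_aug, k)
    # Map back to input space; the part of x outside the subspace is dropped.
    x_aug = perturbed @ components + mean                 # shape (n_aug, d)
    preds = np.stack([model(xa) for xa in x_aug])
    return preds.mean(axis=0)
```

Here `model` is any callable that maps one input vector to a NumPy array of predictions. The distillation stage mentioned in the abstract would then train the single model on these ensemble outputs; that step is not shown.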

Closer to Ground Truth: Realistic Shape and Appearance Labeled Data Generation for Unsupervised Underwater Image Segmentation

Mar 20, 2025

Andrei Jelea, Ahmed Nabil Belbachir, Marius Leordeanu

Abstract: Solving fish segmentation in underwater videos, a real-world problem of great practical value in the marine and aquaculture industries, is a challenging task due to the difficulty of the filming environment, poor visibility and limited existing annotated underwater fish data. To overcome these obstacles, we introduce a novel two-stage unsupervised segmentation approach that requires no human annotations and combines artificially created and real images. Our method generates challenging synthetic training data by placing virtual fish in real-world underwater habitats after performing fish transformations such as Thin Plate Spline shape warping and color Histogram Matching. These transformations realistically integrate the synthetic fish into the backgrounds, making the generated images progressively closer to the real-world data with every stage of our approach. While we validate our unsupervised method on the popular DeepFish dataset, obtaining a performance close to a fully-supervised SoTA model, we further show its effectiveness on the specific case of salmon segmentation in underwater videos, for which we introduce DeepSalmon, the largest dataset of its kind in the literature (30 GB). Moreover, on both datasets we prove the capability of our approach to boost the performance of the fully-supervised SoTA model.

* Proceedings of ECCVW 2024
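
The color Histogram Matching step named in this abstract can be sketched with the standard CDF-based technique: each channel of a source image is remapped so that its empirical distribution follows that of a reference image. Which image acts as source and which as reference in the paper (for instance a synthetic fish patch matched to a real background) is an assumption here, and `match_histogram` and `match_color` are hypothetical helper names for this example.

```python
import numpy as np

def match_histogram(source, reference):
    """Remap values of `source` so its empirical CDF follows that of `reference`."""
    src_vals, src_idx, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)
    # Empirical cumulative distributions of both images.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size
    # For each source quantile, look up the reference value at that quantile.
    mapped = np.interp(src_cdf, ref_cdf, ref_vals)
    return mapped[src_idx].reshape(source.shape)

def match_color(source_rgb, reference_rgb):
    """Apply histogram matching independently to each color channel."""
    channels = [
        match_histogram(source_rgb[..., c], reference_rgb[..., c])
        for c in range(source_rgb.shape[-1])
    ]
    return np.stack(channels, axis=-1)
```

Matching channels independently is the simplest choice and can shift hues; it is used here only to keep the sketch short.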

A self-supervised cyclic neural-analytic approach for novel view synthesis and 3D reconstruction

Mar 05, 2025

Dragos Costea, Alina Marcu, Marius Leordeanu

Abstract: Generating novel views from recorded videos is crucial for enabling autonomous UAV navigation. Recent advancements in neural rendering have facilitated the rapid development of methods capable of rendering new trajectories. However, these methods often fail to generalize well to regions far from the training data without an optimized flight path, leading to suboptimal reconstructions. We propose a self-supervised cyclic neural-analytic pipeline that combines high-quality neural rendering outputs with precise geometric insights from analytical methods. Our solution improves RGB and mesh reconstructions for novel view synthesis, especially in undersampled areas and regions that are completely different from the training dataset. We use an effective transformer-based architecture for image reconstruction to refine and adapt the synthesis process, enabling effective handling of novel, unseen poses without relying on extensive labeled datasets. Our findings demonstrate substantial improvements in both novel view rendering and 3D reconstruction, which to the best of our knowledge is a first, setting a new standard for autonomous navigation in complex outdoor environments.

* British Machine Vision Conference (BMVC), 2024
* Published in BMVC 2024, 10 pages, 4 figures

Towards Zero-Shot & Explainable Video Description by Reasoning over Graphs of Events in Space and Time

Jan 14, 2025

Mihai Masala, Marius Leordeanu

Abstract: In the current era of Machine Learning, Transformers have become the de facto approach across a variety of domains, such as computer vision and natural language processing. Transformer-based solutions are the backbone of current state-of-the-art methods for language generation, image and video classification, segmentation, action and object recognition, among many others. Interestingly enough, while these state-of-the-art methods produce impressive results in their respective domains, the problem of understanding the relationship between vision and language is still beyond our reach. In this work, we propose a common ground between vision and language based on events in space and time, in an explainable and programmatic way, to connect learning-based state-of-the-art vision and language models and provide a solution to the long-standing problem of describing videos in natural language. We validate that our algorithmic approach is able to generate coherent, rich and relevant textual descriptions on videos collected from a variety of datasets, using both standard metrics (e.g. BLEU, ROUGE) and the modern LLM-as-a-Jury approach.

Label up: Learning Pulmonary Embolism Segmentation from Image Level Annotation through Model Explainability

Dec 10, 2024

Florin Condrea, Saikiran Rapaka, Marius Leordeanu

Abstract: Pulmonary Embolisms (PE) are a leading cause of cardiovascular death. Computed tomographic pulmonary angiography (CTPA) stands as the gold standard for diagnosing PE, and there has been a lot of interest in developing AI-based models for assisting in PE diagnosis. Performance of these algorithms has been hindered by the scarcity of annotated data, especially data with fine-grained delineation of the thromboembolic burden. In this paper we attempt to address this issue by introducing a weakly supervised learning pipeline that leverages model explainability to generate fine-grained (pixel-level) masks for embolisms starting from more coarse-grained (binary, image-level) PE annotations. Furthermore, we show that training models using the automatically generated pixel annotations yields good PE localization performance. We demonstrate the effectiveness of our pipeline on the large-scale, multi-center RSPECT augmented dataset for PE detection and localization.

"Vorbeşti Româneşte?" A Recipe to Train Powerful Romanian LLMs with English Instructions

Jun 26, 2024

Mihai Masala, Denis C. Ilie-Ablachim, Alexandru Dima, Dragos Corlatescu, Miruna Zavelca, Ovio Olaru, Simina Terian-Dan, Andrei Terian-Dan, Marius Leordeanu, Horia Velicu(+3 more)

Abstract: In recent years, Large Language Models (LLMs) have achieved almost human-like performance on various tasks. While some LLMs have been trained on multilingual data, most of the training data is in English; hence, their performance in English greatly exceeds that in other languages. To our knowledge, we are the first to collect and translate a large collection of texts, instructions, and benchmarks and to train, evaluate, and release open-source LLMs tailored for Romanian. We evaluate our methods on four different categories, including academic benchmarks, MT-Bench (manually translated), and a professionally built historical, cultural, and social benchmark adapted to Romanian. We argue for the usefulness and high performance of RoLLMs by obtaining state-of-the-art results across the board. We publicly release all resources (i.e., data, training and evaluation code, models) to support and encourage research on Romanian LLMs while concurrently creating a generalizable recipe, adequate for other low or less-resourced languages.

* arXiv admin note: text overlap with arXiv:2405.07703

OpenLLM-Ro -- Technical Report on Open-source Romanian LLMs

May 17, 2024

Mihai Masala, Denis C. Ilie-Ablachim, Dragos Corlatescu, Miruna Zavelca, Marius Leordeanu, Horia Velicu, Marius Popescu, Mihai Dascalu, Traian Rebedea

Multiple Random Masking Autoencoder Ensembles for Robust Multimodal Semi-supervised Learning

Feb 12, 2024

Alexandru-Raul Todoran, Marius Leordeanu

Abstract: An increasing number of real-world problems in computer vision and machine learning require taking into consideration multiple interpretation layers (modalities or views) of the world and learning how they relate to each other. For example, in the case of Earth Observations from satellite data, it is important to be able to predict one observation layer (e.g. vegetation index) from other layers (e.g. water vapor, snow cover, temperature, etc.), in order to best understand how the Earth System functions and also to be able to reliably predict information for one layer when the data is missing (e.g. due to measurement failure or error).

* 17 pages, 11 figures

Maia: A Real-time Non-Verbal Chat for Human-AI Interaction

Feb 09, 2024

Dragos Costea, Alina Marcu, Cristina Lazar, Marius Leordeanu

Abstract: Face-to-face communication modeling in computer vision is an area of research focusing on developing algorithms that can recognize and analyze non-verbal cues and behaviors during face-to-face interactions. We propose an alternative to text chats for Human-AI interaction, based on non-verbal visual communication only, using facial expressions and head movements that mirror, but also improvise over, those of the human user, to efficiently engage with users and capture their attention in a low-cost and real-time fashion. Our goal is to track and analyze facial expressions and other non-verbal cues in real time, and to use this information to build models that can predict and understand human behavior. We offer three different complementary approaches, based on retrieval, statistical, and deep learning techniques. We provide human as well as automatic evaluations and discuss the advantages and disadvantages of each direction.

* 5 pages, 3 figures

Multi-Task Hypergraphs for Semi-supervised Learning using Earth Observations

Aug 21, 2023

Mihai Pirvu, Alina Marcu, Alexandra Dobrescu, Nabil Belbachir, Marius Leordeanu

Abstract: There are many ways of interpreting the world and they are highly interdependent. We exploit such complex dependencies and introduce a powerful multi-task hypergraph, in which every node is a task and different paths through the hypergraph reaching a given task become unsupervised teachers, by forming ensembles that learn to generate reliable pseudolabels for that task. Each hyperedge is part of an ensemble teacher for a given task and is also a student of the self-supervised hypergraph system. We apply our model to one of the most important problems of our times, that of Earth Observation, which is highly multi-task and often suffers from missing ground-truth data. By performing extensive experiments on the NASA NEO Dataset, spanning a period of 22 years, we demonstrate the value of our multi-task semi-supervised approach through consistent improvements over strong baselines and recent work. We also show that the hypergraph can adapt unsupervised to gradual data distribution shifts and reliably recover, through its multi-task self-supervision process, the missing data for several observational layers for up to seven years.

* Accepted in ICCV 2023 Workshops
