Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Gennaro Vessio

I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting

Nov 28, 2024

Nicola Fanelli, Gennaro Vessio, Giovanna Castellano

Figure 1 for I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting

Figure 2 for I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting

Figure 3 for I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting

Figure 4 for I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting

Abstract:Inpainting focuses on filling missing or corrupted regions of an image to blend seamlessly with its surrounding content and style. While conditional diffusion models have proven effective for text-guided inpainting, we introduce the novel task of multi-mask inpainting, where multiple regions are simultaneously inpainted using distinct prompts. Furthermore, we design a fine-tuning procedure for multimodal LLMs, such as LLaVA, to generate multi-mask prompts automatically using corrupted images as inputs. These models can generate helpful and detailed prompt suggestions for filling the masked regions. The generated prompts are then fed to Stable Diffusion, which is fine-tuned for the multi-mask inpainting problem using rectified cross-attention, enforcing prompts onto their designated regions for filling. Experiments on digitized paintings from WikiArt and the Densely Captioned Images dataset demonstrate that our pipeline delivers creative and accurate inpainting results. Our code, data, and trained models are available at https://cilabuniba.github.io/i-dream-my-painting.

* Accepted at WACV 2025

Via

Access Paper or Ask Questions

Neural network modelling of kinematic and dynamic features for signature verification

Nov 26, 2024

Moises Diaz, Miguel A. Ferrer, Jose Juan Quintana, Adam Wolniakowski, Roman Trochimczuk, Konstantsin Miatliuk, Giovanna Castellano, Gennaro Vessio

Abstract:Online signature parameters, which are based on human characteristics, broaden the applicability of an automatic signature verifier. Although kinematic and dynamic features have previously been suggested, accurately measuring features such as arm and forearm torques remains challenging. We present two approaches for estimating angular velocities, angular positions, and force torques. The first approach involves using a physical UR5e robotic arm to reproduce a signature while capturing those parameters over time. The second method, a cost effective approach, uses a neural network to estimate the same parameters. Our findings demonstrate that a simple neural network model can extract effective parameters for signature verification. Training the neural network with the MCYT300 dataset and cross validating with other databases, namely, BiosecurID, Visual, Blind, OnOffSigDevanagari 75 and OnOffSigBengali 75 confirm the models generalization capability.

* Procedia Computer Science, Volume 3, 2011, Pages 155-161

Via

Access Paper or Ask Questions

Art2Mus: Bridging Visual Arts and Music through Cross-Modal Generation

Oct 07, 2024

Ivan Rinaldi, Nicola Fanelli, Giovanna Castellano, Gennaro Vessio

Figure 1 for Art2Mus: Bridging Visual Arts and Music through Cross-Modal Generation

Figure 2 for Art2Mus: Bridging Visual Arts and Music through Cross-Modal Generation

Figure 3 for Art2Mus: Bridging Visual Arts and Music through Cross-Modal Generation

Figure 4 for Art2Mus: Bridging Visual Arts and Music through Cross-Modal Generation

Abstract:Artificial Intelligence and generative models have revolutionized music creation, with many models leveraging textual or visual prompts for guidance. However, existing image-to-music models are limited to simple images, lacking the capability to generate music from complex digitized artworks. To address this gap, we introduce $\mathcal{A}\textit{rt2}\mathcal{M}\textit{us}$, a novel model designed to create music from digitized artworks or text inputs. $\mathcal{A}\textit{rt2}\mathcal{M}\textit{us}$ extends the AudioLDM~2 architecture, a text-to-audio model, and employs our newly curated datasets, created via ImageBind, which pair digitized artworks with music. Experimental results demonstrate that $\mathcal{A}\textit{rt2}\mathcal{M}\textit{us}$ can generate music that resonates with the input stimuli. These findings suggest promising applications in multimedia art, interactive installations, and AI-driven creative tools.

* Presented at the AI for Visual Arts (AI4VA) workshop at ECCV 2024

Via

Access Paper or Ask Questions

What is the Relationship between Tensor Factorizations and Circuits (and How Can We Exploit it)?

Sep 12, 2024

Lorenzo Loconte, Antonio Mari, Gennaro Gala, Robert Peharz, Cassio de Campos, Erik Quaeghebeur, Gennaro Vessio, Antonio Vergari

Figure 1 for What is the Relationship between Tensor Factorizations and Circuits (and How Can We Exploit it)?

Figure 2 for What is the Relationship between Tensor Factorizations and Circuits (and How Can We Exploit it)?

Figure 3 for What is the Relationship between Tensor Factorizations and Circuits (and How Can We Exploit it)?

Figure 4 for What is the Relationship between Tensor Factorizations and Circuits (and How Can We Exploit it)?

Abstract:This paper establishes a rigorous connection between circuit representations and tensor factorizations, two seemingly distinct yet fundamentally related areas. By connecting these fields, we highlight a series of opportunities that can benefit both communities. Our work generalizes popular tensor factorizations within the circuit language, and unifies various circuit learning algorithms under a single, generalized hierarchical factorization framework. Specifically, we introduce a modular "Lego block" approach to build tensorized circuit architectures. This, in turn, allows us to systematically construct and explore various circuit and tensor factorization models while maintaining tractability. This connection not only clarifies similarities and differences in existing models, but also enables the development of a comprehensive pipeline for building and optimizing new circuit/tensor factorization architectures. We show the effectiveness of our framework through extensive empirical evaluations, and highlight new research opportunities for tensor factorizations in probabilistic modeling.

Via

Access Paper or Ask Questions

Label Anything: Multi-Class Few-Shot Semantic Segmentation with Visual Prompts

Jul 02, 2024

Pasquale De Marinis, Nicola Fanelli, Raffaele Scaringi, Emanuele Colonna, Giuseppe Fiameni, Gennaro Vessio, Giovanna Castellano

Figure 1 for Label Anything: Multi-Class Few-Shot Semantic Segmentation with Visual Prompts

Figure 2 for Label Anything: Multi-Class Few-Shot Semantic Segmentation with Visual Prompts

Figure 3 for Label Anything: Multi-Class Few-Shot Semantic Segmentation with Visual Prompts

Figure 4 for Label Anything: Multi-Class Few-Shot Semantic Segmentation with Visual Prompts

Abstract:We present Label Anything, an innovative neural network architecture designed for few-shot semantic segmentation (FSS) that demonstrates remarkable generalizability across multiple classes with minimal examples required per class. Diverging from traditional FSS methods that predominantly rely on masks for annotating support images, Label Anything introduces varied visual prompts -- points, bounding boxes, and masks -- thereby enhancing the framework's versatility and adaptability. Unique to our approach, Label Anything is engineered for end-to-end training across multi-class FSS scenarios, efficiently learning from diverse support set configurations without retraining. This approach enables a "universal" application to various FSS challenges, ranging from $1$-way $1$-shot to complex $N$-way $K$-shot configurations while remaining agnostic to the specific number of class examples. This innovative training strategy reduces computational requirements and substantially improves the model's adaptability and generalization across diverse segmentation tasks. Our comprehensive experimental validation, particularly achieving state-of-the-art results on the COCO-$20^i$ benchmark, underscores Label Anything's robust generalization and flexibility. The source code is publicly available at: https://github.com/pasqualedem/LabelAnything.

Via

Access Paper or Ask Questions

Dynamically enhanced static handwriting representation for Parkinson's disease detection

May 22, 2024

Moises Diaz, Miguel Angel Ferrer, Donato Impedovo, Giuseppe Pirlo, Gennaro Vessio

Abstract:Computer aided diagnosis systems can provide non-invasive, low-cost tools to support clinicians. These systems have the potential to assist the diagnosis and monitoring of neurodegenerative disorders, in particular Parkinson's disease (PD). Handwriting plays a special role in the context of PD assessment. In this paper, the discriminating power of "dynamically enhanced" static images of handwriting is investigated. The enhanced images are synthetically generated by exploiting simultaneously the static and dynamic properties of handwriting. Specifically, we propose a static representation that embeds dynamic information based on: (i) drawing the points of the samples, instead of linking them, so as to retain temporal/velocity information; and (ii) adding pen-ups for the same purpose. To evaluate the effectiveness of the new handwriting representation, a fair comparison between this approach and state-of-the-art methods based on static and dynamic handwriting is conducted on the same dataset, i.e. PaHaW. The classification workflow employs transfer learning to extract meaningful features from multiple representations of the input data. An ensemble of different classifiers is used to achieve the final predictions. Dynamically enhanced static handwriting is able to outperform the results obtained by using static and dynamic handwriting separately.

* Pattern Recognition Letters, vol. 128, pp. 204-210 (2019)

Via

Access Paper or Ask Questions

Explainable offline automatic signature verifier to support forensic handwriting examiners

May 21, 2024

Moises Diaz, Miguel A. Ferrer, Gennaro Vessio

Abstract:Signature verification is a critical task in many applications, including forensic science, legal judgments, and financial markets. However, current signature verification systems are often difficult to explain, which can limit their acceptance in these applications. In this paper, we propose a novel explainable offline automatic signature verifier (ASV) to support forensic handwriting examiners. Our ASV is based on a universal background model (UBM) constructed from offline signature images. It allows us to assign a questioned signature to the UBM and to a reference set of known signatures using simple distance measures. This makes it possible to explain the verifier's decision in a way that is understandable to non experts. We evaluated our ASV on publicly available databases and found that it achieves competitive performance with state of the art ASVs, even when challenging 1 versus 1 comparison are considered. Our results demonstrate that it is possible to develop an explainable ASV that is also competitive in terms of performance. We believe that our ASV has the potential to improve the acceptance of signature verification in critical applications such as forensic science and legal judgments.

* Neural Computing and Applications, Volume 36, pages 2411 to 2427 (2024)

Via

Access Paper or Ask Questions

Density-based clustering with fully-convolutional networks for crowd flow detection from drones

Jan 12, 2023

Giovanna Castellano, Eugenio Cotardo, Corrado Mencar, Gennaro Vessio

Figure 1 for Density-based clustering with fully-convolutional networks for crowd flow detection from drones

Figure 2 for Density-based clustering with fully-convolutional networks for crowd flow detection from drones

Figure 3 for Density-based clustering with fully-convolutional networks for crowd flow detection from drones

Figure 4 for Density-based clustering with fully-convolutional networks for crowd flow detection from drones

Abstract:Crowd analysis from drones has attracted increasing attention in recent times due to the ease of use and affordable cost of these devices. However, how this technology can provide a solution to crowd flow detection is still an unexplored research question. To this end, we propose a crowd flow detection method for video sequences shot by a drone. The method is based on a fully-convolutional network that learns to perform crowd clustering in order to detect the centroids of crowd-dense areas and track their movement in consecutive frames. The proposed method proved effective and efficient when tested on the Crowd Counting datasets of the VisDrone challenge, characterized by video sequences rather than still images. The encouraging results show that the proposed method could open up new ways of analyzing high-level crowd behavior from drones.

* Neurocomputing (2023)
* Accepted manuscript

Via

Access Paper or Ask Questions

VisDrone-CC2020: The Vision Meets Drone Crowd Counting Challenge Results

Jul 19, 2021

Dawei Du, Longyin Wen, Pengfei Zhu, Heng Fan, Qinghua Hu, Haibin Ling, Mubarak Shah, Junwen Pan, Ali Al-Ali, Amr Mohamed(+45 more)

Figure 1 for VisDrone-CC2020: The Vision Meets Drone Crowd Counting Challenge Results

Figure 2 for VisDrone-CC2020: The Vision Meets Drone Crowd Counting Challenge Results

Figure 3 for VisDrone-CC2020: The Vision Meets Drone Crowd Counting Challenge Results

Figure 4 for VisDrone-CC2020: The Vision Meets Drone Crowd Counting Challenge Results

Abstract:Crowd counting on the drone platform is an interesting topic in computer vision, which brings new challenges such as small object inference, background clutter and wide viewpoint. However, there are few algorithms focusing on crowd counting on the drone-captured data due to the lack of comprehensive datasets. To this end, we collect a large-scale dataset and organize the Vision Meets Drone Crowd Counting Challenge (VisDrone-CC2020) in conjunction with the 16th European Conference on Computer Vision (ECCV 2020) to promote the developments in the related fields. The collected dataset is formed by $3,360$ images, including $2,460$ images for training, and $900$ images for testing. Specifically, we manually annotate persons with points in each video frame. There are $14$ algorithms from $15$ institutes submitted to the VisDrone-CC2020 Challenge. We provide a detailed analysis of the evaluation results and conclude the challenge. More information can be found at the website: \url{http://www.aiskyeye.com/}.

* European Conference on Computer Vision. Springer, Cham, 2020: 675-691
* The method description of A7 Mutil-Scale Aware based SFANet (M-SFANet) is updated and missing references are added

Via

Access Paper or Ask Questions

A deep learning approach to clustering visual arts

Jun 11, 2021

Giovanna Castellano, Gennaro Vessio

Figure 1 for A deep learning approach to clustering visual arts

Figure 2 for A deep learning approach to clustering visual arts

Figure 3 for A deep learning approach to clustering visual arts

Figure 4 for A deep learning approach to clustering visual arts

Abstract:Clustering artworks is difficult for several reasons. On the one hand, recognizing meaningful patterns based on domain knowledge and visual perception is extremely hard. On the other hand, applying traditional clustering and feature reduction techniques to the highly dimensional pixel space can be ineffective. To address these issues, in this paper we propose DELIUS: a DEep learning approach to cLustering vIsUal artS. The method uses a pre-trained convolutional network to extract features and then feeds these features into a deep embedded clustering model, where the task of mapping the raw input data to a latent space is jointly optimized with the task of finding a set of cluster centroids in this latent space. Quantitative and qualitative experimental results show the effectiveness of the proposed method. DELIUS can be useful for several tasks related to art analysis, in particular visual link retrieval and historical knowledge discovery in painting datasets.

* Submitted to IJCV

Via

Access Paper or Ask Questions