Giovanna Castellano

I Dream My Painting: Connecting MLLMs and Diffusion Models via Prompt Generation for Text-Guided Multi-Mask Inpainting

Nov 28, 2024

Nicola Fanelli, Gennaro Vessio, Giovanna Castellano

Abstract: Inpainting focuses on filling missing or corrupted regions of an image to blend seamlessly with its surrounding content and style. While conditional diffusion models have proven effective for text-guided inpainting, we introduce the novel task of multi-mask inpainting, where multiple regions are simultaneously inpainted using distinct prompts. Furthermore, we design a fine-tuning procedure for multimodal LLMs, such as LLaVA, to generate multi-mask prompts automatically using corrupted images as inputs. These models can generate helpful and detailed prompt suggestions for filling the masked regions. The generated prompts are then fed to Stable Diffusion, which is fine-tuned for the multi-mask inpainting problem using rectified cross-attention, enforcing prompts onto their designated regions for filling. Experiments on digitized paintings from WikiArt and the Densely Captioned Images dataset demonstrate that our pipeline delivers creative and accurate inpainting results. Our code, data, and trained models are available at https://cilabuniba.github.io/i-dream-my-painting.

* Accepted at WACV 2025

Neural network modelling of kinematic and dynamic features for signature verification

Nov 26, 2024

Moises Diaz, Miguel A. Ferrer, Jose Juan Quintana, Adam Wolniakowski, Roman Trochimczuk, Konstantsin Miatliuk, Giovanna Castellano, Gennaro Vessio

Abstract: Online signature parameters, which are based on human characteristics, broaden the applicability of an automatic signature verifier. Although kinematic and dynamic features have previously been suggested, accurately measuring features such as arm and forearm torques remains challenging. We present two approaches for estimating angular velocities, angular positions, and force torques. The first approach involves using a physical UR5e robotic arm to reproduce a signature while capturing those parameters over time. The second, a cost-effective approach, uses a neural network to estimate the same parameters. Our findings demonstrate that a simple neural network model can extract effective parameters for signature verification. Training the neural network with the MCYT300 dataset and cross-validating with other databases, namely BiosecurID, Visual, Blind, OnOffSigDevanagari 75, and OnOffSigBengali 75, confirms the model's generalization capability.

* Procedia Computer Science, Volume 3, 2011, Pages 155-161

Art2Mus: Bridging Visual Arts and Music through Cross-Modal Generation

Oct 07, 2024

Ivan Rinaldi, Nicola Fanelli, Giovanna Castellano, Gennaro Vessio

Abstract: Artificial Intelligence and generative models have revolutionized music creation, with many models leveraging textual or visual prompts for guidance. However, existing image-to-music models are limited to simple images, lacking the capability to generate music from complex digitized artworks. To address this gap, we introduce Art2Mus, a novel model designed to create music from digitized artworks or text inputs. Art2Mus extends the AudioLDM 2 architecture, a text-to-audio model, and employs our newly curated datasets, created via ImageBind, which pair digitized artworks with music. Experimental results demonstrate that Art2Mus can generate music that resonates with the input stimuli. These findings suggest promising applications in multimedia art, interactive installations, and AI-driven creative tools.

* Presented at the AI for Visual Arts (AI4VA) workshop at ECCV 2024

RoWeeder: Unsupervised Weed Mapping through Crop-Row Detection

Oct 07, 2024

Pasquale De Marinis, Gennaro Vessio, Giovanna Castellano

Abstract: Precision agriculture relies heavily on effective weed management to ensure robust crop yields. This study presents RoWeeder, an innovative framework for unsupervised weed mapping that combines crop-row detection with a noise-resilient deep learning model. By leveraging crop-row information to create a pseudo-ground truth, our method trains a lightweight deep learning model capable of distinguishing between crops and weeds, even in the presence of noisy data. Evaluated on the WeedMap dataset, RoWeeder achieves an F1 score of 75.3, outperforming several baselines. Comprehensive ablation studies further validated the model's performance. By integrating RoWeeder with drone technology, farmers can conduct real-time aerial surveys, enabling precise weed management across large fields. The code is available at: https://github.com/pasqualedem/RoWeeder.

* Computer Vision for Plant Phenotyping and Agriculture (CVPPA) workshop at ECCV 2024

Label Anything: Multi-Class Few-Shot Semantic Segmentation with Visual Prompts

Jul 02, 2024

Pasquale De Marinis, Nicola Fanelli, Raffaele Scaringi, Emanuele Colonna, Giuseppe Fiameni, Gennaro Vessio, Giovanna Castellano

Abstract: We present Label Anything, an innovative neural network architecture designed for few-shot semantic segmentation (FSS) that demonstrates remarkable generalizability across multiple classes with minimal examples required per class. Diverging from traditional FSS methods that predominantly rely on masks for annotating support images, Label Anything introduces varied visual prompts -- points, bounding boxes, and masks -- thereby enhancing the framework's versatility and adaptability. Unique to our approach, Label Anything is engineered for end-to-end training across multi-class FSS scenarios, efficiently learning from diverse support set configurations without retraining. This approach enables a "universal" application to various FSS challenges, ranging from 1-way 1-shot to complex N-way K-shot configurations while remaining agnostic to the specific number of class examples. This innovative training strategy reduces computational requirements and substantially improves the model's adaptability and generalization across diverse segmentation tasks. Our comprehensive experimental validation, particularly achieving state-of-the-art results on the COCO-$20^i$ benchmark, underscores Label Anything's robust generalization and flexibility. The source code is publicly available at: https://github.com/pasqualedem/LabelAnything.

Density-based clustering with fully-convolutional networks for crowd flow detection from drones

Jan 12, 2023

Giovanna Castellano, Eugenio Cotardo, Corrado Mencar, Gennaro Vessio

Abstract: Crowd analysis from drones has attracted increasing attention in recent times due to the ease of use and affordable cost of these devices. However, how this technology can provide a solution to crowd flow detection is still an unexplored research question. To this end, we propose a crowd flow detection method for video sequences shot by a drone. The method is based on a fully-convolutional network that learns to perform crowd clustering in order to detect the centroids of crowd-dense areas and track their movement in consecutive frames. The proposed method proved effective and efficient when tested on the Crowd Counting datasets of the VisDrone challenge, characterized by video sequences rather than still images. The encouraging results show that the proposed method could open up new ways of analyzing high-level crowd behavior from drones.

* Neurocomputing (2023)
* Accepted manuscript

VisDrone-CC2020: The Vision Meets Drone Crowd Counting Challenge Results

Jul 19, 2021

Dawei Du, Longyin Wen, Pengfei Zhu, Heng Fan, Qinghua Hu, Haibin Ling, Mubarak Shah, Junwen Pan, Ali Al-Ali, Amr Mohamed(+45 more)

Abstract: Crowd counting on the drone platform is an interesting topic in computer vision, which brings new challenges such as small object inference, background clutter and wide viewpoint. However, few algorithms focus on crowd counting on drone-captured data due to the lack of comprehensive datasets. To this end, we collect a large-scale dataset and organize the Vision Meets Drone Crowd Counting Challenge (VisDrone-CC2020) in conjunction with the 16th European Conference on Computer Vision (ECCV 2020) to promote developments in related fields. The collected dataset comprises 3,360 images, including 2,460 images for training and 900 images for testing. Specifically, we manually annotate persons with points in each video frame. In total, 14 algorithms from 15 institutes were submitted to the VisDrone-CC2020 Challenge. We provide a detailed analysis of the evaluation results and conclude the challenge. More information can be found at the website: http://www.aiskyeye.com/.

* European Conference on Computer Vision. Springer, Cham, 2020: 675-691
* The method description of A7 Multi-Scale Aware based SFANet (M-SFANet) is updated and missing references are added

A deep learning approach to clustering visual arts

Jun 11, 2021

Giovanna Castellano, Gennaro Vessio

Abstract: Clustering artworks is difficult for several reasons. On the one hand, recognizing meaningful patterns based on domain knowledge and visual perception is extremely hard. On the other hand, applying traditional clustering and feature reduction techniques to the high-dimensional pixel space can be ineffective. To address these issues, in this paper we propose DELIUS: a DEep learning approach to cLustering vIsUal artS. The method uses a pre-trained convolutional network to extract features and then feeds these features into a deep embedded clustering model, where the task of mapping the raw input data to a latent space is jointly optimized with the task of finding a set of cluster centroids in this latent space. Quantitative and qualitative experimental results show the effectiveness of the proposed method. DELIUS can be useful for several tasks related to art analysis, in particular visual link retrieval and historical knowledge discovery in painting datasets.

* Submitted to IJCV
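
The abstract above does not spell out the clustering objective. As a rough illustration only, the following sketch shows the standard deep embedded clustering formulation (Student's t soft assignments, a sharpened target distribution, and a KL-divergence loss), assuming DELIUS follows it, which the abstract does not confirm. The feature vectors, centroids, and all function names here are hypothetical stand-ins; in the described method the features would come from a pre-trained convolutional network.

```python
import numpy as np

def soft_assign(z, centroids, alpha=1.0):
    """Student's t similarity between latent points and centroids, row-normalized."""
    # Squared Euclidean distances, shape (n_points, n_clusters).
    d2 = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpen q by squaring and dividing by per-cluster soft frequency."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def kl_divergence(p, q):
    """Clustering loss KL(P || Q), summed over all points and clusters."""
    return float(np.sum(p * np.log(p / q)))

# Toy latent features (assumption: two well-separated groups in 2-D).
rng = np.random.default_rng(0)
z = np.vstack([rng.normal(0.0, 0.1, (5, 2)), rng.normal(3.0, 0.1, (5, 2))])
centroids = np.array([[0.0, 0.0], [3.0, 3.0]])  # hypothetical initial centroids

q = soft_assign(z, centroids)
p = target_distribution(q)
loss = kl_divergence(p, q)
```

In a full training loop, the gradient of this loss would update both the encoder weights and the centroids, which is what "jointly optimized" refers to; the sketch stops at computing the loss.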

ArtGraph: Towards an Artistic Knowledge Graph

May 31, 2021

Giovanna Castellano, Giovanni Sansaro, Gennaro Vessio

Abstract: This paper presents our ongoing work towards ArtGraph: an artistic knowledge graph based on WikiArt and DBpedia. Automatic art analysis has seen an ever-increasing interest from the pattern recognition and computer vision community. However, most current work relies solely on digitized artwork images, sometimes supplemented with some metadata and textual comments. A knowledge graph that integrates a rich body of information about artworks, artists, painting schools, etc., in a unified structured framework can provide a valuable resource for more powerful information retrieval and knowledge discovery tools in the artistic domain.

* Submitted to DS2021

Deep convolutional embedding for digitized painting clustering

Mar 19, 2020

Giovanna Castellano, Gennaro Vessio

Abstract: Clustering artworks is difficult for several reasons. On the one hand, recognizing meaningful patterns in accordance with domain knowledge and visual perception is extremely hard. On the other hand, the application of traditional clustering and feature reduction techniques to the high-dimensional pixel space can be ineffective. To address these issues, we propose a deep convolutional embedding model for clustering digital paintings, in which the task of mapping the raw input data to an abstract, latent space is optimized jointly with the task of finding a set of cluster centroids in this latent feature space. Quantitative and qualitative experimental results show the effectiveness of the proposed method. The model also outperforms other state-of-the-art deep clustering approaches to the same problem. The proposed method may be beneficial to several art-related tasks, particularly visual link retrieval and historical knowledge discovery in painting datasets.
