Subhadeep Koley

Doodle Your Keypoints: Sketch-Based Few-Shot Keypoint Detection

Jul 10, 2025

Subhajit Maity, Ayan Kumar Bhunia, Subhadeep Koley, Pinaki Nath Chowdhury, Aneeshan Sain, Yi-Zhe Song

Abstract: Keypoint detection, integral to modern machine perception, faces challenges in few-shot learning, particularly when source data from the same distribution as the query is unavailable. This gap is addressed by leveraging sketches, a popular form of human expression, providing a source-free alternative. However, challenges arise in mastering cross-modal embeddings and handling user-specific sketch styles. Our proposed framework overcomes these hurdles with a prototypical setup, combined with a grid-based locator and prototypical domain adaptation. We also demonstrate success in few-shot convergence across novel keypoints and classes through extensive experiments.

* Accepted at ICCV 2025. Project Page: https://subhajitmaity.me/DYKp

Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch

May 29, 2025

Aneeshan Sain, Subhajit Maity, Pinaki Nath Chowdhury, Subhadeep Koley, Ayan Kumar Bhunia, Yi-Zhe Song

Abstract: As sketch research has collectively matured over time, its adaptation for at-mass commercialisation emerges on the immediate horizon. Despite an already mature research endeavour for photos, there is no research on efficient inference designed specifically for sketch data. In this paper, we first demonstrate that existing state-of-the-art efficient light-weight models designed for photos do not work on sketches. We then propose two sketch-specific components which work in a plug-n-play manner on any photo efficient network to adapt it to work on sketch data. We specifically chose fine-grained sketch-based image retrieval (FG-SBIR) as a demonstrator, as the most recognised sketch problem with immediate commercial value. Technically speaking, we first propose a cross-modal knowledge distillation network to transfer existing photo efficient networks to be compatible with sketch, which brings down the number of FLOPs and model parameters by 97.96% and 84.89% respectively. We then exploit the abstract trait of sketch to introduce an RL-based canvas selector that dynamically adjusts to the abstraction level, which further cuts down the number of FLOPs by two thirds. The end result is an overall reduction of 99.37% of FLOPs (from 40.18G to 0.254G) when compared with a full network, while retaining the accuracy (33.03% vs 32.77%) -- finally making an efficient network for the sparse sketch data that exhibits even fewer FLOPs than the best photo counterpart.

* Accepted at CVPR 2025, Project Page: https://subhajitmaity.me/SketchDownTheFLOPs
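
The FLOPs figures quoted in the abstract above can be cross-checked with a few lines of arithmetic. This is only a sanity-check sketch, not code from the paper: the variable names are illustrative, and it assumes the 97.96% distillation figure is measured against the same 40.18G full-network baseline, which the abstract does not state explicitly.

```python
# Figures quoted in the abstract (FLOPs in units of 1e9, i.e. GFLOPs).
full_gflops = 40.18    # full network
final_gflops = 0.254   # after both proposed components
kd_reduction = 0.9796  # FLOPs reduction attributed to knowledge distillation

# Overall reduction relative to the full network, in percent.
overall_reduction_pct = (1 - final_gflops / full_gflops) * 100

# Assumption: the 97.96% figure uses the same 40.18G baseline.
after_kd_gflops = full_gflops * (1 - kd_reduction)

# Implied further cut from the canvas selector, as a fraction of the
# post-distillation cost.
selector_cut = 1 - final_gflops / after_kd_gflops

print(f"overall reduction: {overall_reduction_pct:.2f}%")
print(f"after distillation: {after_kd_gflops:.3f} GFLOPs")
print(f"implied selector cut: {selector_cut:.1%}")
```

Under that assumption, the overall reduction matches the quoted 99.37%, and the implied cut from the canvas selector comes out at roughly 69%, which is consistent with the "two thirds" stated in the abstract.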

SketchFusion: Learning Universal Sketch Features through Fusing Foundation Models

Mar 18, 2025

Subhadeep Koley, Tapas Kumar Dutta, Aneeshan Sain, Pinaki Nath Chowdhury, Ayan Kumar Bhunia, Yi-Zhe Song

Abstract: While foundation models have revolutionised computer vision, their effectiveness for sketch understanding remains limited by the unique challenges of abstract, sparse visual inputs. Through systematic analysis, we uncover two fundamental limitations: Stable Diffusion (SD) struggles to extract meaningful features from abstract sketches (unlike its success with photos), and exhibits a pronounced frequency-domain bias that suppresses essential low-frequency components needed for sketch understanding. Rather than costly retraining, we address these limitations by strategically combining SD with CLIP, whose strong semantic understanding naturally compensates for SD's spatial-frequency biases. By dynamically injecting CLIP features into SD's denoising process and adaptively aggregating features across semantic levels, our method achieves state-of-the-art performance in sketch retrieval (+3.35%), recognition (+1.06%), segmentation (+29.42%), and correspondence learning (+21.22%), demonstrating the first truly universal sketch feature representation in the era of foundation models.

* Accepted in CVPR 2025. Project page available at https://subhadeepkoley.github.io/SketchFusion/

Freestyle Sketch-in-the-Loop Image Segmentation

Jan 27, 2025

Subhadeep Koley, Viswanatha Reddy Gajjala, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Ayan Kumar Bhunia, Yi-Zhe Song

Abstract: In this paper, we expand the domain of sketch research into the field of image segmentation, aiming to establish freehand sketches as a query modality for subjective image segmentation. Our innovative approach introduces a "sketch-in-the-loop" image segmentation framework, enabling the segmentation of visual concepts partially, completely, or in groupings - a truly "freestyle" approach - without the need for a purpose-made dataset (i.e., mask-free). This framework capitalises on the synergy between sketch-based image retrieval (SBIR) models and large-scale pre-trained models (CLIP or DINOv2). The former provides an effective training signal, while fine-tuned versions of the latter execute the subjective segmentation. Additionally, our purpose-made augmentation strategy enhances the versatility of our sketch-guided mask generation, allowing segmentation at multiple granularity levels. Extensive evaluations across diverse benchmark datasets underscore the superior performance of our method in comparison to existing approaches across various evaluation scenarios.

DreamColour: Controllable Video Colour Editing without Training

Dec 06, 2024

Chaitat Utintu, Pinaki Nath Chowdhury, Aneeshan Sain, Subhadeep Koley, Ayan Kumar Bhunia, Yi-Zhe Song

Abstract: Video colour editing is a crucial task for content creation, yet existing solutions either require painstaking frame-by-frame manipulation or produce unrealistic results with temporal artefacts. We present a practical, training-free framework that makes precise video colour editing accessible through an intuitive interface while maintaining professional-quality output. Our key insight is that by decoupling spatial and temporal aspects of colour editing, we can better align with users' natural workflow -- allowing them to focus on precise colour selection in key frames before automatically propagating changes across time. We achieve this through a novel technical framework that combines: (i) a simple point-and-click interface merging grid-based colour selection with automatic instance segmentation for precise spatial control, (ii) bidirectional colour propagation that leverages inherent video motion patterns, and (iii) motion-aware blending that ensures smooth transitions even with complex object movements. Through extensive evaluation on diverse scenarios, we demonstrate that our approach matches or exceeds state-of-the-art methods while eliminating the need for training or specialised hardware, making professional-quality video colour editing accessible to everyone.

* Project page available at https://chaitron.github.io/DreamColour-demo

Do Generalised Classifiers really work on Human Drawn Sketches?

Jul 04, 2024

Hmrishav Bandyopadhyay, Pinaki Nath Chowdhury, Aneeshan Sain, Subhadeep Koley, Tao Xiang, Ayan Kumar Bhunia, Yi-Zhe Song

Abstract: This paper, for the first time, marries large foundation models with human sketch understanding. We demonstrate what this brings -- a paradigm shift in terms of generalised sketch representation learning (e.g., classification). This generalisation happens on two fronts: (i) generalisation across unknown categories (i.e., open-set), and (ii) generalisation traversing abstraction levels (i.e., good and bad sketches), both being timely challenges that remain unsolved in the sketch literature. Our design is intuitive and centred around transferring the already stellar generalisation ability of CLIP to benefit generalised learning for sketches. We first "condition" the vanilla CLIP model by learning sketch-specific prompts using a novel auxiliary head of raster-to-vector sketch conversion. This importantly makes CLIP "sketch-aware". We then make CLIP acute to the inherently different sketch abstraction levels. This is achieved by learning a codebook of abstraction-specific prompt biases, a weighted combination of which facilitates the representation of sketches across abstraction levels -- low abstract edge-maps, medium abstract sketches in TU-Berlin, and highly abstract doodles in QuickDraw. Our framework surpasses popular sketch representation learning algorithms in both zero-shot and few-shot setups and in novel settings across different abstraction boundaries.

* ECCV 2024

Freeview Sketching: View-Aware Fine-Grained Sketch-Based Image Retrieval

Jul 01, 2024

Aneeshan Sain, Pinaki Nath Chowdhury, Subhadeep Koley, Ayan Kumar Bhunia, Yi-Zhe Song

Abstract: In this paper, we delve into the intricate dynamics of Fine-Grained Sketch-Based Image Retrieval (FG-SBIR) by addressing a critical yet overlooked aspect -- the choice of viewpoint during sketch creation. Unlike photo systems that seamlessly handle diverse views through extensive datasets, sketch systems, with limited data collected from fixed perspectives, face challenges. Our pilot study, employing a pre-trained FG-SBIR model, highlights the system's struggle when query-sketches differ in viewpoint from target instances. Interestingly, however, a questionnaire shows that users desire autonomy, with a significant percentage favouring view-specific retrieval. To reconcile this, we advocate for a view-aware system, seamlessly accommodating both view-agnostic and view-specific tasks. Overcoming dataset limitations, our first contribution leverages multi-view 2D projections of 3D objects, instilling cross-modal view awareness. The second contribution introduces a customisable cross-modal feature through disentanglement, allowing effortless mode switching. Extensive experiments on standard datasets validate the effectiveness of our method.

* Accepted in European Conference on Computer Vision (ECCV) 2024

SketchDeco: Decorating B&W Sketches with Colour

May 29, 2024

Chaitat Utintu, Pinaki Nath Chowdhury, Aneeshan Sain, Subhadeep Koley, Ayan Kumar Bhunia, Yi-Zhe Song

Abstract: This paper introduces a novel approach to sketch colourisation, inspired by the universal childhood activity of colouring and its professional applications in design and story-boarding. Striking a balance between precision and convenience, our method utilises region masks and colour palettes to allow intuitive user control, steering clear of the meticulousness of manual colour assignments or the limitations of textual prompts. By strategically combining ControlNet and staged generation, incorporating Stable Diffusion v1.5, and leveraging BLIP-2 text prompts, our methodology facilitates faithful image generation and user-directed colourisation. Addressing challenges of local and global consistency, we employ inventive solutions such as an inversion scheme, guided sampling, and a self-attention mechanism with a scaling factor. The resulting tool is not only fast and training-free but also compatible with consumer-grade Nvidia RTX 4090 Super GPUs, making it a valuable asset for both creative professionals and enthusiasts in various fields. Project Page: https://chaitron.github.io/SketchDeco/

You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval

Mar 20, 2024

Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song

Abstract: Two primary input modalities prevail in image retrieval: sketch and text. While text is widely used for inter-category retrieval tasks, sketches have been established as the sole preferred modality for fine-grained image retrieval due to their ability to capture intricate visual details. In this paper, we question the reliance on sketches alone for fine-grained image retrieval by simultaneously exploring the fine-grained representation capabilities of both sketch and text, orchestrating a duet between the two. The end result enables precise retrievals previously unattainable, allowing users to pose ever-finer queries and incorporate attributes like colour and contextual cues from text. For this purpose, we introduce a novel compositionality framework, effectively combining sketches and text using pre-trained CLIP models, while eliminating the need for extensive fine-grained textual descriptions. Last but not least, our system extends to novel applications in composed image retrieval, domain attribute transfer, and fine-grained generation, providing solutions for various real-world scenarios.

* Accepted in CVPR 2024. Project page available at https://subhadeepkoley.github.io/Sketch2Word

Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers

Mar 20, 2024

Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain, Pinaki Nath Chowdhury, Tao Xiang, Yi-Zhe Song

Abstract: This paper, for the first time, explores text-to-image diffusion models for Zero-Shot Sketch-based Image Retrieval (ZS-SBIR). We highlight a pivotal discovery: the capacity of text-to-image diffusion models to seamlessly bridge the gap between sketches and photos. This proficiency is underpinned by their robust cross-modal capabilities and shape bias, findings that are substantiated through our pilot studies. In order to harness pre-trained diffusion models effectively, we introduce a straightforward yet powerful strategy focused on two key aspects: selecting optimal feature layers and utilising visual and textual prompts. For the former, we identify which layers are most enriched with information and are best suited for the specific retrieval requirements (category-level or fine-grained). Then we employ visual and textual prompts to guide the model's feature extraction process, enabling it to generate more discriminative and contextually relevant cross-modal representations. Extensive experiments on several benchmark datasets validate significant performance improvements.

* Accepted in CVPR 2024. Project page available at https://subhadeepkoley.github.io/DiffusionZSSBIR
