Abstract:Traditional Chinese Painting (TCP) is an invaluable cultural heritage resource and a unique visual art style. In recent years, increasing interest has been placed on digitalizing TCPs to preserve and revive the culture. The resulting digital copies have enabled the advancement of computational methods for structured and systematic understanding of TCPs. To explore this topic, we conducted an in-depth analysis of 92 pieces of literature. We examined the current use of computer technologies on TCPs from three perspectives, based on numerous conversations with specialists. First, in light of the "Six Principles of Painting" theory, we categorized the articles according to their research focus on artistic elements. Second, we created a four-stage framework to illustrate the purposes of TCP applications. Third, we summarized the popular computational techniques applied to TCPs. The framework also provides insights into potential applications and future prospects, with professional opinion. The list of surveyed publications and related information is available online at https://ca4tcp.com.
Abstract:Digital humanities research has flourished due to the diverse artifacts available in cultural heritage databases. However, over-reliance on a single artifact type can result in poor contextualization and a constrained understanding of historical context. We collaborated with art historians to examine handscrolls, a form of traditional Chinese painting which offers a wealth of data for historical analysis and provides a unique opportunity for understanding history through artwork. We propose ScrollTimes, a visual analysis system for tracing handscroll historic context by linking multiple data sources. Specifically, a unique layout is developed for efficiently viewing long handscrolls. Using image processing techniques and language models, we extract, verify, and supplement elements in handscrolls with different cultural heritage databases. Furthermore, interactive biographies are constructed for handscrolls to uncover their historical narratives, provenance trajectories, and artistic legacies. Validated through case studies and expert interviews, our approach offers a window into history, fostering a holistic understanding of handscroll provenance and historical significance.
Abstract:Few-shot segmentation (FSS) aims at performing semantic segmentation on novel classes given a few annotated support samples. With a rethink of recent advances, we find that the current FSS framework has deviated far from the supervised segmentation framework: Given the deep features, FSS methods typically use an intricate decoder to perform sophisticated pixel-wise matching, while the supervised segmentation methods use a simple linear classification head. Due to the intricacy of the decoder and its matching pipeline, it is not easy to follow such an FSS framework. This paper revives the straightforward framework of "feature extractor $+$ linear classification head" and proposes a novel Feature-Proxy Transformer (FPTrans) method, in which the "proxy" is the vector representing a semantic class in the linear classification head. FPTrans has two keypoints for learning discriminative features and representative proxies: 1) To better utilize the limited support samples, the feature extractor makes the query interact with the support features from the bottom to top layers using a novel prompting strategy. 2) FPTrans uses multiple local background proxies (instead of a single one) because the background is not homogeneous and may contain some novel foreground regions. These two keypoints are easily integrated into the vision transformer backbone with the prompting mechanism in the transformer. Given the learned features and proxies, FPTrans directly compares their cosine similarity for segmentation. Although the framework is straightforward, we show that FPTrans achieves competitive FSS accuracy on par with state-of-the-art decoder-based methods.
Abstract:Neurofibromatosis type 1 (NF1) is an autosomal dominant tumor predisposition syndrome that involves the central and peripheral nervous systems. Accurate detection and segmentation of neurofibromas are essential for assessing tumor burden and longitudinal tumor size changes. Automatic convolutional neural networks (CNNs) are sensitive and vulnerable as tumors' variable anatomical location and heterogeneous appearance on MRI. In this study, we propose deep interactive networks (DINs) to address the above limitations. User interactions guide the model to recognize complicated tumors and quickly adapt to heterogeneous tumors. We introduce a simple but effective Exponential Distance Transform (ExpDT) that converts user interactions into guide maps regarded as the spatial and appearance prior. Comparing with popular Euclidean and geodesic distances, ExpDT is more robust to various image sizes, which reserves the distribution of interactive inputs. Furthermore, to enhance the tumor-related features, we design a deep interactive module to propagate the guides into deeper layers. We train and evaluate DINs on three MRI data sets from NF1 patients. The experiment results yield significant improvements of 44% and 14% in DSC comparing with automated and other interactive methods, respectively. We also experimentally demonstrate the efficiency of DINs in reducing user burden when comparing with conventional interactive methods. The source code of our method is available at \url{https://github.com/Jarvis73/DINs}.
Abstract:Few-shot segmentation~(FSS) performance has been extensively promoted by introducing episodic training and class-wise prototypes. However, the FSS problem remains challenging due to three limitations: (1) Models are distracted by task-unrelated information; (2) The representation ability of a single prototype is limited; (3) Class-related prototypes ignore the prior knowledge of base classes. We propose the Prior-Enhanced network with Meta-Prototypes to tackle these limitations. The prior-enhanced network leverages the support and query (pseudo-) labels in feature extraction, which guides the model to focus on the task-related features of the foreground objects, and suppress much noise due to the lack of supervised knowledge. Moreover, we introduce multiple meta-prototypes to encode hierarchical features and learn class-agnostic structural information. The hierarchical features help the model highlight the decision boundary and focus on hard pixels, and the structural information learned from base classes is treated as the prior knowledge for novel classes. Experiments show that our method achieves the mean-IoU scores of 60.79% and 41.16% on PASCAL-$5^i$ and COCO-$20^i$, outperforming the state-of-the-art method by 3.49% and 5.64% in the 5-shot setting. Moreover, comparing with 1-shot results, our method promotes 5-shot accuracy by 3.73% and 10.32% on the above two benchmarks. The source code of our method is available at https://github.com/Jarvis73/PEMP.