Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chengze Li

NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes

Apr 15, 2025

Tianyang Xu, Haojie Zheng, Chengze Li, Haoxiang Chen, Yixin Liu, Ruoxi Chen, Lichao Sun

Abstract:Retrieval-augmented generation (RAG) empowers large language models to access external and private corpus, enabling factually consistent responses in specific domains. By exploiting the inherent structure of the corpus, graph-based RAG methods further enrich this process by building a knowledge graph index and leveraging the structural nature of graphs. However, current graph-based RAG approaches seldom prioritize the design of graph structures. Inadequately designed graph not only impede the seamless integration of diverse graph algorithms but also result in workflow inconsistencies and degraded performance. To further unleash the potential of graph for RAG, we propose NodeRAG, a graph-centric framework introducing heterogeneous graph structures that enable the seamless and holistic integration of graph-based methodologies into the RAG workflow. By aligning closely with the capabilities of LLMs, this framework ensures a fully cohesive and efficient end-to-end process. Through extensive experiments, we demonstrate that NodeRAG exhibits performance advantages over previous methods, including GraphRAG and LightRAG, not only in indexing time, query time, and storage efficiency but also in delivering superior question-answering performance on multi-hop benchmarks and open-ended head-to-head evaluations with minimal retrieval tokens. Our GitHub repository could be seen at https://github.com/Terry-Xu-666/NodeRAG.

Via

Access Paper or Ask Questions

Hyperstroke: A Novel High-quality Stroke Representation for Assistive Artistic Drawing

Aug 18, 2024

Haoyun Qin, Jian Lin, Hanyuan Liu, Xueting Liu, Chengze Li

Abstract:Assistive drawing aims to facilitate the creative process by providing intelligent guidance to artists. Existing solutions often fail to effectively model intricate stroke details or adequately address the temporal aspects of drawing. We introduce hyperstroke, a novel stroke representation designed to capture precise fine stroke details, including RGB appearance and alpha-channel opacity. Using a Vector Quantization approach, hyperstroke learns compact tokenized representations of strokes from real-life drawing videos of artistic drawing. With hyperstroke, we propose to model assistive drawing via a transformer-based architecture, to enable intuitive and user-friendly drawing applications, which are experimented in our exploratory evaluation.

* 11 pages, 10 figures

Via

Access Paper or Ask Questions

Sketch2Manga: Shaded Manga Screening from Sketch with Diffusion Models

Mar 13, 2024

Jian Lin, Xueting Liu, Chengze Li, Minshan Xie, Tien-Tsin Wong

Abstract:While manga is a popular entertainment form, creating manga is tedious, especially adding screentones to the created sketch, namely manga screening. Unfortunately, there is no existing method that tailors for automatic manga screening, probably due to the difficulty of generating high-quality shaded high-frequency screentones. The classic manga screening approaches generally require user input to provide screentone exemplars or a reference manga image. The recent deep learning models enables the automatic generation by learning from a large-scale dataset. However, the state-of-the-art models still fail to generate high-quality shaded screentones due to the lack of a tailored model and high-quality manga training data. In this paper, we propose a novel sketch-to-manga framework that first generates a color illustration from the sketch and then generates a screentoned manga based on the intensity guidance. Our method significantly outperforms existing methods in generating high-quality manga with shaded high-frequency screentones.

* 7 pages, 6 figures

Via

Access Paper or Ask Questions

Instance-guided Cartoon Editing with a Large-scale Dataset

Dec 04, 2023

Jian Lin, Chengze Li, Xueting Liu, Zhongping Ge

Abstract:Cartoon editing, appreciated by both professional illustrators and hobbyists, allows extensive creative freedom and the development of original narratives within the cartoon domain. However, the existing literature on cartoon editing is complex and leans heavily on manual operations, owing to the challenge of automatic identification of individual character instances. Therefore, an automated segmentation of these elements becomes imperative to facilitate a variety of cartoon editing applications such as visual style editing, motion decomposition and transfer, and the computation of stereoscopic depths for an enriched visual experience. Unfortunately, most current segmentation methods are designed for natural photographs, failing to recognize from the intricate aesthetics of cartoon subjects, thus lowering segmentation quality. The major challenge stems from two key shortcomings: the rarity of high-quality cartoon dedicated datasets and the absence of competent models for high-resolution instance extraction on cartoons. To address this, we introduce a high-quality dataset of over 100k paired high-resolution cartoon images and their instance labeling masks. We also present an instance-aware image segmentation model that can generate accurate, high-resolution segmentation masks for characters in cartoon images. We present that the proposed approach enables a range of segmentation-dependent cartoon editing applications like 3D Ken Burns parallax effects, text-guided cartoon style editing, and puppet animation from illustrations and manga.

* Project page: https://cartoonsegmentation.github.io/ 10 pages, 10 figures

Via

Access Paper or Ask Questions

Highly Detailed and Temporal Consistent Video Stylization via Synchronized Multi-Frame Diffusion

Nov 24, 2023

Minshan Xie, Hanyuan Liu, Chengze Li, Tien-Tsin Wong

Figure 1 for Highly Detailed and Temporal Consistent Video Stylization via Synchronized Multi-Frame Diffusion

Figure 2 for Highly Detailed and Temporal Consistent Video Stylization via Synchronized Multi-Frame Diffusion

Figure 3 for Highly Detailed and Temporal Consistent Video Stylization via Synchronized Multi-Frame Diffusion

Figure 4 for Highly Detailed and Temporal Consistent Video Stylization via Synchronized Multi-Frame Diffusion

Abstract:Text-guided video-to-video stylization transforms the visual appearance of a source video to a different appearance guided on textual prompts. Existing text-guided image diffusion models can be extended for stylized video synthesis. However, they struggle to generate videos with both highly detailed appearance and temporal consistency. In this paper, we propose a synchronized multi-frame diffusion framework to maintain both the visual details and the temporal consistency. Frames are denoised in a synchronous fashion, and more importantly, information of different frames is shared since the beginning of the denoising process. Such information sharing ensures that a consensus, in terms of the overall structure and color distribution, among frames can be reached in the early stage of the denoising process before it is too late. The optical flow from the original video serves as the connection, and hence the venue for information sharing, among frames. We demonstrate the effectiveness of our method in generating high-quality and diverse results in extensive experiments. Our method shows superior qualitative and quantitative results compared to state-of-the-art video editing methods.

* 11 pages, 11 figures

Via

Access Paper or Ask Questions

Manga Rescreening with Interpretable Screentone Representation

Jun 07, 2023

Minshan Xie, Chengze Li, Tien-Tsin Wong

Abstract:The process of adapting or repurposing manga pages is a time-consuming task that requires manga artists to manually work on every single screentone region and apply new patterns to create novel screentones across multiple panels. To address this issue, we propose an automatic manga rescreening pipeline that aims to minimize the human effort involved in manga adaptation. Our pipeline automatically recognizes screentone regions and generates novel screentones with newly specified characteristics (e.g., intensity or type). Existing manga generation methods have limitations in understanding and synthesizing complex tone- or intensity-varying regions. To overcome these limitations, we propose a novel interpretable representation of screentones that disentangles their intensity and type features, enabling better recognition and synthesis of screentones. This interpretable screentone representation reduces ambiguity in recognizing intensity-varying regions and provides fine-grained controls during screentone synthesis by decoupling and anchoring the type or the intensity feature. Our proposed method is demonstrated to be effective and convenient through various experiments, showcasing the superiority of the newly proposed pipeline with the interpretable screentone representations.

* 10 pages, 11 figures

Via

Access Paper or Ask Questions

Video Colorization with Pre-trained Text-to-Image Diffusion Models

Jun 02, 2023

Hanyuan Liu, Minshan Xie, Jinbo Xing, Chengze Li, Tien-Tsin Wong

Figure 1 for Video Colorization with Pre-trained Text-to-Image Diffusion Models

Figure 2 for Video Colorization with Pre-trained Text-to-Image Diffusion Models

Figure 3 for Video Colorization with Pre-trained Text-to-Image Diffusion Models

Figure 4 for Video Colorization with Pre-trained Text-to-Image Diffusion Models

Abstract:Video colorization is a challenging task that involves inferring plausible and temporally consistent colors for grayscale frames. In this paper, we present ColorDiffuser, an adaptation of a pre-trained text-to-image latent diffusion model for video colorization. With the proposed adapter-based approach, we repropose the pre-trained text-to-image model to accept input grayscale video frames, with the optional text description, for video colorization. To enhance the temporal coherence and maintain the vividness of colorization across frames, we propose two novel techniques: the Color Propagation Attention and Alternated Sampling Strategy. Color Propagation Attention enables the model to refine its colorization decision based on a reference latent frame, while Alternated Sampling Strategy captures spatiotemporal dependencies by using the next and previous adjacent latent frames alternatively as reference during the generative diffusion sampling steps. This encourages bidirectional color information propagation between adjacent video frames, leading to improved color consistency across frames. We conduct extensive experiments on benchmark datasets, and the results demonstrate the effectiveness of our proposed framework. The evaluations show that ColorDiffuser achieves state-of-the-art performance in video colorization, surpassing existing methods in terms of color fidelity, temporal consistency, and visual quality.

* project page: https://colordiffuser.github.io/

Via

Access Paper or Ask Questions

Improved Diffusion-based Image Colorization via Piggybacked Models

Apr 21, 2023

Hanyuan Liu, Jinbo Xing, Minshan Xie, Chengze Li, Tien-Tsin Wong

Figure 1 for Improved Diffusion-based Image Colorization via Piggybacked Models

Figure 2 for Improved Diffusion-based Image Colorization via Piggybacked Models

Figure 3 for Improved Diffusion-based Image Colorization via Piggybacked Models

Figure 4 for Improved Diffusion-based Image Colorization via Piggybacked Models

Abstract:Image colorization has been attracting the research interests of the community for decades. However, existing methods still struggle to provide satisfactory colorized results given grayscale images due to a lack of human-like global understanding of colors. Recently, large-scale Text-to-Image (T2I) models have been exploited to transfer the semantic information from the text prompts to the image domain, where text provides a global control for semantic objects in the image. In this work, we introduce a colorization model piggybacking on the existing powerful T2I diffusion model. Our key idea is to exploit the color prior knowledge in the pre-trained T2I diffusion model for realistic and diverse colorization. A diffusion guider is designed to incorporate the pre-trained weights of the latent diffusion model to output a latent color prior that conforms to the visual semantics of the grayscale input. A lightness-aware VQVAE will then generate the colorized result with pixel-perfect alignment to the given grayscale image. Our model can also achieve conditional colorization with additional inputs (e.g. user hints and texts). Extensive experiments show that our method achieves state-of-the-art performance in terms of perceptual quality.

* project page: https://piggyback-color.github.io/

Via

Access Paper or Ask Questions

Invertible Tone Mapping with Selectable Styles

Oct 09, 2021

Zhuming Zhang, Menghan Xia, Xueting Liu, Chengze Li, Tien-Tsin Wong

Figure 1 for Invertible Tone Mapping with Selectable Styles

Figure 2 for Invertible Tone Mapping with Selectable Styles

Figure 3 for Invertible Tone Mapping with Selectable Styles

Figure 4 for Invertible Tone Mapping with Selectable Styles

Abstract:Although digital cameras can acquire high-dynamic range (HDR) images, the captured HDR information are mostly quantized to low-dynamic range (LDR) images for display compatibility and compact storage. In this paper, we propose an invertible tone mapping method that converts the multi-exposure HDR to a true LDR (8-bit per color channel) and reserves the capability to accurately restore the original HDR from this {\em invertible LDR}. Our invertible LDR can mimic the appearance of a user-selected tone mapping style. It can be shared over any existing social network platforms that may re-encode or format-convert the uploaded images, without much hurting the accuracy of the restored HDR counterpart. To achieve this, we regard the tone mapping and the restoration as coupled processes, and formulate them as an encoding-and-decoding problem through convolutional neural networks. Particularly, our model supports pluggable style modulators, each of which bakes a specific tone mapping style, and thus favors the application flexibility. Our method is evaluated with a rich variety of HDR images and multiple tone mapping operators, which shows the superiority over relevant state-of-the-art methods. Also, we conduct ablation study to justify our design and discuss the robustness and generality toward real applications.

Via

Access Paper or Ask Questions