Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ruiyu Li

Automatic Image Colorization with Convolutional Neural Networks and Generative Adversarial Networks

Aug 07, 2025

Ruiyu Li, Changyuan Qiu, Hangrui Cao, Qihan Ren, Yuqing Qiu

Abstract:Image colorization, the task of adding colors to grayscale images, has been the focus of significant research efforts in computer vision in recent years for its various application areas such as color restoration and automatic animation colorization [15, 1]. The colorization problem is challenging as it is highly ill-posed with two out of three image dimensions lost, resulting in large degrees of freedom. However, semantics of the scene as well as the surface texture could provide important cues for colors: the sky is typically blue, the clouds are typically white and the grass is typically green, and there are huge amounts of training data available for learning such priors since any colored image could serve as a training data point [20]. Colorization is initially formulated as a regression task[5], which ignores the multi-modal nature of color prediction. In this project, we explore automatic image colorization via classification and adversarial learning. We will build our models on prior works, apply modifications for our specific scenario and make comparisons.

* 5 pages, 4 figures

Via

Access Paper or Ask Questions

Online Parallel Multi-Task Relationship Learning via Alternating Direction Method of Multipliers

Nov 09, 2024

Ruiyu Li, Peilin Zhao, Guangxia Li, Zhiqiang Xu, Xuewei Li

Figure 1 for Online Parallel Multi-Task Relationship Learning via Alternating Direction Method of Multipliers

Figure 2 for Online Parallel Multi-Task Relationship Learning via Alternating Direction Method of Multipliers

Figure 3 for Online Parallel Multi-Task Relationship Learning via Alternating Direction Method of Multipliers

Figure 4 for Online Parallel Multi-Task Relationship Learning via Alternating Direction Method of Multipliers

Abstract:Online multi-task learning (OMTL) enhances streaming data processing by leveraging the inherent relations among multiple tasks. It can be described as an optimization problem in which a single loss function is defined for multiple tasks. Existing gradient-descent-based methods for this problem might suffer from gradient vanishing and poor conditioning issues. Furthermore, the centralized setting hinders their application to online parallel optimization, which is vital to big data analytics. Therefore, this study proposes a novel OMTL framework based on the alternating direction multiplier method (ADMM), a recent breakthrough in optimization suitable for the distributed computing environment because of its decomposable and easy-to-implement nature. The relations among multiple tasks are modeled dynamically to fit the constant changes in an online scenario. In a classical distributed computing architecture with a central server, the proposed OMTL algorithm with the ADMM optimizer outperforms SGD-based approaches in terms of accuracy and efficiency. Because the central server might become a bottleneck when the data scale grows, we further tailor the algorithm to a decentralized setting, so that each node can work by only exchanging information with local neighbors. Experimental results on a synthetic and several real-world datasets demonstrate the efficiency of our methods.

* Accpeted by Neurocomputing

Via

Access Paper or Ask Questions

CCUP: A Controllable Synthetic Data Generation Pipeline for Pretraining Cloth-Changing Person Re-Identification Models

Oct 17, 2024

Yujian Zhao, Chengru Wu, Yinong Xu, Xuanzheng Du, Ruiyu Li, Guanglin Niu

Figure 1 for CCUP: A Controllable Synthetic Data Generation Pipeline for Pretraining Cloth-Changing Person Re-Identification Models

Figure 2 for CCUP: A Controllable Synthetic Data Generation Pipeline for Pretraining Cloth-Changing Person Re-Identification Models

Figure 3 for CCUP: A Controllable Synthetic Data Generation Pipeline for Pretraining Cloth-Changing Person Re-Identification Models

Figure 4 for CCUP: A Controllable Synthetic Data Generation Pipeline for Pretraining Cloth-Changing Person Re-Identification Models

Abstract:Cloth-changing person re-identification (CC-ReID), also known as Long-Term Person Re-Identification (LT-ReID) is a critical and challenging research topic in computer vision that has recently garnered significant attention. However, due to the high cost of constructing CC-ReID data, the existing data-driven models are hard to train efficiently on limited data, causing overfitting issue. To address this challenge, we propose a low-cost and efficient pipeline for generating controllable and high-quality synthetic data simulating the surveillance of real scenarios specific to the CC-ReID task. Particularly, we construct a new self-annotated CC-ReID dataset named Cloth-Changing Unreal Person (CCUP), containing 6,000 IDs, 1,179,976 images, 100 cameras, and 26.5 outfits per individual. Based on this large-scale dataset, we introduce an effective and scalable pretrain-finetune framework for enhancing the generalization capabilities of the traditional CC-ReID models. The extensive experiments demonstrate that two typical models namely TransReID and FIRe^2, when integrated into our framework, outperform other state-of-the-art models after pretraining on CCUP and finetuning on the benchmarks such as PRCC, VC-Clothes and NKUP. The CCUP is available at: https://github.com/yjzhao1019/CCUP.

Via

Access Paper or Ask Questions

Explainable Action Advising for Multi-Agent Reinforcement Learning

Nov 15, 2022

Yue Guo, Joseph Campbell, Simon Stepputtis, Ruiyu Li, Dana Hughes, Fei Fang, Katia Sycara

Figure 1 for Explainable Action Advising for Multi-Agent Reinforcement Learning

Figure 2 for Explainable Action Advising for Multi-Agent Reinforcement Learning

Figure 3 for Explainable Action Advising for Multi-Agent Reinforcement Learning

Figure 4 for Explainable Action Advising for Multi-Agent Reinforcement Learning

Abstract:Action advising is a knowledge transfer technique for reinforcement learning based on the teacher-student paradigm. An expert teacher provides advice to a student during training in order to improve the student's sample efficiency and policy performance. Such advice is commonly given in the form of state-action pairs. However, it makes it difficult for the student to reason with and apply to novel states. We introduce Explainable Action Advising, in which the teacher provides action advice as well as associated explanations indicating why the action was chosen. This allows the student to self-reflect on what it has learned, enabling advice generalization and leading to improved sample efficiency and learning performance - even in environments where the teacher is sub-optimal. We empirically show that our framework is effective in both single-agent and multi-agent scenarios, yielding improved policy returns and convergence rates when compared to state-of-the-art methods.

Via

Access Paper or Ask Questions

Convolutional Neural Network Dynamics: A Graph Perspective

Nov 09, 2021

Fatemeh Vahedian, Ruiyu Li, Puja Trivedi, Di Jin, Danai Koutra

Figure 1 for Convolutional Neural Network Dynamics: A Graph Perspective

Figure 2 for Convolutional Neural Network Dynamics: A Graph Perspective

Figure 3 for Convolutional Neural Network Dynamics: A Graph Perspective

Figure 4 for Convolutional Neural Network Dynamics: A Graph Perspective

Abstract:The success of neural networks (NNs) in a wide range of applications has led to increased interest in understanding the underlying learning dynamics of these models. In this paper, we go beyond mere descriptions of the learning dynamics by taking a graph perspective and investigating the relationship between the graph structure of NNs and their performance. Specifically, we propose (1) representing the neural network learning process as a time-evolving graph (i.e., a series of static graph snapshots over epochs), (2) capturing the structural changes of the NN during the training phase in a simple temporal summary, and (3) leveraging the structural summary to predict the accuracy of the underlying NN in a classification or regression task. For the dynamic graph representation of NNs, we explore structural representations for fully-connected and convolutional layers, which are key components of powerful NN models. Our analysis shows that a simple summary of graph statistics, such as weighted degree and eigenvector centrality, over just a few epochs can be used to accurately predict the performance of NNs. For example, a weighted degree-based summary of the time-evolving graph that is constructed based on 5 training epochs of the LeNet architecture achieves classification accuracy of over 93%. Our findings are consistent for different NN architectures, including LeNet, VGG, AlexNet and ResNet.

Via

Access Paper or Ask Questions

Prior Guided Feature Enrichment Network for Few-Shot Segmentation

Aug 04, 2020

Zhuotao Tian, Hengshuang Zhao, Michelle Shu, Zhicheng Yang, Ruiyu Li, Jiaya Jia

Figure 1 for Prior Guided Feature Enrichment Network for Few-Shot Segmentation

Figure 2 for Prior Guided Feature Enrichment Network for Few-Shot Segmentation

Figure 3 for Prior Guided Feature Enrichment Network for Few-Shot Segmentation

Figure 4 for Prior Guided Feature Enrichment Network for Few-Shot Segmentation

Abstract:State-of-the-art semantic segmentation methods require sufficient labeled data to achieve good results and hardly work on unseen classes without fine-tuning. Few-shot segmentation is thus proposed to tackle this problem by learning a model that quickly adapts to new classes with a few labeled support samples. Theses frameworks still face the challenge of generalization ability reduction on unseen classes due to inappropriate use of high-level semantic information of training classes and spatial inconsistency between query and support targets. To alleviate these issues, we propose the Prior Guided Feature Enrichment Network (PFENet). It consists of novel designs of (1) a training-free prior mask generation method that not only retains generalization power but also improves model performance and (2) Feature Enrichment Module (FEM) that overcomes spatial inconsistency by adaptively enriching query features with support features and prior masks. Extensive experiments on PASCAL-5$^i$ and COCO prove that the proposed prior generation method and FEM both improve the baseline method significantly. Our PFENet also outperforms state-of-the-art methods by a large margin without efficiency loss. It is surprising that our model even generalizes to cases without labeled support samples. Our code is available at https://github.com/Jia-Research-Lab/PFENet/.

* 16 pages. To appear in TPAMI

Via

Access Paper or Ask Questions

Tensor Low-Rank Reconstruction for Semantic Segmentation

Aug 02, 2020

Wanli Chen, Xinge Zhu, Ruoqi Sun, Junjun He, Ruiyu Li, Xiaoyong Shen, Bei Yu

Figure 1 for Tensor Low-Rank Reconstruction for Semantic Segmentation

Figure 2 for Tensor Low-Rank Reconstruction for Semantic Segmentation

Figure 3 for Tensor Low-Rank Reconstruction for Semantic Segmentation

Figure 4 for Tensor Low-Rank Reconstruction for Semantic Segmentation

Abstract:Context information plays an indispensable role in the success of semantic segmentation. Recently, non-local self-attention based methods are proved to be effective for context information collection. Since the desired context consists of spatial-wise and channel-wise attentions, 3D representation is an appropriate formulation. However, these non-local methods describe 3D context information based on a 2D similarity matrix, where space compression may lead to channel-wise attention missing. An alternative is to model the contextual information directly without compression. However, this effort confronts a fundamental difficulty, namely the high-rank property of context information. In this paper, we propose a new approach to model the 3D context representations, which not only avoids the space compression but also tackles the high-rank difficulty. Here, inspired by tensor canonical-polyadic decomposition theory (i.e, a high-rank tensor can be expressed as a combination of rank-1 tensors.), we design a low-rank-to-high-rank context reconstruction framework (i.e, RecoNet). Specifically, we first introduce the tensor generation module (TGM), which generates a number of rank-1 tensors to capture fragments of context feature. Then we use these rank-1 tensors to recover the high-rank context features through our proposed tensor reconstruction module (TRM). Extensive experiments show that our method achieves state-of-the-art on various public datasets. Additionally, our proposed method has more than 100 times less computational cost compared with conventional non-local-based methods.

* ECCV2020. Top-1 performance on PASCAL-VOC12; Source code at https://github.com/CWanli/RecoNet.git

Via

Access Paper or Ask Questions

An Adversarial Perturbation Oriented Domain Adaptation Approach for Semantic Segmentation

Dec 18, 2019

Jihan Yang, Ruijia Xu, Ruiyu Li, Xiaojuan Qi, Xiaoyong Shen, Guanbin Li, Liang Lin

Figure 1 for An Adversarial Perturbation Oriented Domain Adaptation Approach for Semantic Segmentation

Figure 2 for An Adversarial Perturbation Oriented Domain Adaptation Approach for Semantic Segmentation

Figure 3 for An Adversarial Perturbation Oriented Domain Adaptation Approach for Semantic Segmentation

Figure 4 for An Adversarial Perturbation Oriented Domain Adaptation Approach for Semantic Segmentation

Abstract:We focus on Unsupervised Domain Adaptation (UDA) for the task of semantic segmentation. Recently, adversarial alignment has been widely adopted to match the marginal distribution of feature representations across two domains globally. However, this strategy fails in adapting the representations of the tail classes or small objects for semantic segmentation since the alignment objective is dominated by head categories or large objects. In contrast to adversarial alignment, we propose to explicitly train a domain-invariant classifier by generating and defensing against pointwise feature space adversarial perturbations. Specifically, we firstly perturb the intermediate feature maps with several attack objectives (i.e., discriminator and classifier) on each individual position for both domains, and then the classifier is trained to be invariant to the perturbations. By perturbing each position individually, our model treats each location evenly regardless of the category or object size and thus circumvents the aforementioned issue. Moreover, the domain gap in feature space is reduced by extrapolating source and target perturbed features towards each other with attack on the domain discriminator. Our approach achieves the state-of-the-art performance on two challenging domain adaptation tasks for semantic segmentation: GTA5 -> Cityscapes and SYNTHIA -> Cityscapes.

* To Appear in AAAI2020

Via

Access Paper or Ask Questions

Federated Learning with Unbiased Gradient Aggregation and Controllable Meta Updating

Oct 22, 2019

Xin Yao, Tianchi Huang, Rui-Xiao Zhang, Ruiyu Li, Lifeng Sun

Figure 1 for Federated Learning with Unbiased Gradient Aggregation and Controllable Meta Updating

Figure 2 for Federated Learning with Unbiased Gradient Aggregation and Controllable Meta Updating

Figure 3 for Federated Learning with Unbiased Gradient Aggregation and Controllable Meta Updating

Figure 4 for Federated Learning with Unbiased Gradient Aggregation and Controllable Meta Updating

Abstract:Federated Averaging (FedAvg) serves as the fundamental framework in Federated Learning (FL) settings. However, we argue that 1) the multiple steps of local updating will result in gradient biases and 2) there is an inconsistency between the target distribution and the optimization objectives following the training paradigm in FedAvg. To tackle these problems, we first propose an unbiased gradient aggregation algorithm with the keep-trace gradient descent and gradient evaluation strategy. Then we introduce a meta updating procedure with a controllable meta training set to provide a clear and consistent optimization objective. Experimental results demonstrate that the proposed methods outperform compared ones with various network architectures in both the IID and non-IID FL settings.

* This manuscript has been accepted to the Workshop on Federated Learning for Data Privacy and Confidentiality (FL - NeurIPS 2019, in Conjunction with NeurIPS 2019)

Via

Access Paper or Ask Questions

Reflective Decoding Network for Image Captioning

Aug 30, 2019

Lei Ke, Wenjie Pei, Ruiyu Li, Xiaoyong Shen, Yu-Wing Tai

Figure 1 for Reflective Decoding Network for Image Captioning

Figure 2 for Reflective Decoding Network for Image Captioning

Figure 3 for Reflective Decoding Network for Image Captioning

Figure 4 for Reflective Decoding Network for Image Captioning

Abstract:State-of-the-art image captioning methods mostly focus on improving visual features, less attention has been paid to utilizing the inherent properties of language to boost captioning performance. In this paper, we show that vocabulary coherence between words and syntactic paradigm of sentences are also important to generate high-quality image caption. Following the conventional encoder-decoder framework, we propose the Reflective Decoding Network (RDN) for image captioning, which enhances both the long-sequence dependency and position perception of words in a caption decoder. Our model learns to collaboratively attend on both visual and textual features and meanwhile perceive each word's relative position in the sentence to maximize the information delivered in the generated caption. We evaluate the effectiveness of our RDN on the COCO image captioning datasets and achieve superior performance over the previous methods. Further experiments reveal that our approach is particularly advantageous for hard cases with complex scenes to describe by captions.

* ICCV 2019

Via

Access Paper or Ask Questions