Zhiyi Huang

RWKV-X: A Linear Complexity Hybrid Language Model

Apr 30, 2025

Haowen Hou, Zhiyi Huang, Kaifeng Tan, Rongchang Lu, Fei Richard Yu

Abstract: In this paper, we introduce RWKV-X, a novel hybrid architecture that combines the efficiency of RWKV for short-range modeling with a sparse attention mechanism designed to capture long-range context. Unlike previous hybrid approaches that rely on full attention layers and retain quadratic complexity, RWKV-X achieves linear-time complexity in training and constant-time complexity in inference decoding. We demonstrate that RWKV-X, when continually pretrained on 64K-token sequences, achieves near-perfect accuracy on the 64K passkey retrieval benchmark. It consistently outperforms prior RWKV-7 models on long-context benchmarks, while maintaining strong performance on short-context tasks. These results highlight RWKV-X as a scalable and efficient backbone for general-purpose language modeling, capable of decoding sequences up to 1 million tokens with stable speed and memory usage. To facilitate further research and analysis, we have made the checkpoints and the associated code publicly accessible at: https://github.com/howard-hou/RWKV-X.

* 12 pages

Integrated Image-Text Based on Semi-supervised Learning for Small Sample Instance Segmentation

Oct 21, 2024

Ruting Chi, Zhiyi Huang, Yuexing Han

Abstract: Small sample instance segmentation is a very challenging task, and many existing methods follow the meta-learning training strategy, which pre-trains models on a support set and fine-tunes them on a query set. The pre-training phase, which is highly task-related, requires a significant amount of additional training time and the selection of closely related datasets to ensure effectiveness. This article proposes a novel small sample instance segmentation solution that maximizes the use of existing information without increasing the annotation burden or training costs. The proposed method designs two modules to address the problems encountered in small sample instance segmentation. First, it helps the model fully utilize unlabeled data by learning to generate pseudo labels, increasing the number of available samples. Second, by integrating text and image features, it obtains more accurate classification results. These two modules are suitable for both box-free and box-dependent frameworks. In this way, the proposed method not only improves the performance of small sample instance segmentation but also greatly reduces reliance on pre-training. We have conducted experiments on three datasets from different scenes: on land, underwater, and under the microscope. As evidenced by our experiments, integrating image and text corrects the classification confidence, and pseudo labels help the model obtain more precise masks. All the results demonstrate the effectiveness and superiority of our method.

Are Bounded Contracts Learnable and Approximately Optimal?

Feb 22, 2024

Yurong Chen, Zhaohua Chen, Xiaotie Deng, Zhiyi Huang

Abstract: This paper considers the hidden-action model of the principal-agent problem, in which a principal incentivizes an agent to work on a project using a contract. We investigate whether contracts with bounded payments are learnable and approximately optimal. Our main results are two learning algorithms that can find a nearly optimal bounded contract using a polynomial number of queries, under two standard assumptions in the literature: a costlier action for the agent leads to a better outcome distribution for the principal, and the agent's cost/effort has diminishing returns. Our polynomial query complexity upper bound shows that standard assumptions are sufficient for achieving an exponential improvement upon the known lower bound for general instances. Unlike the existing algorithms, which rely on discretizing the contract space, our algorithms directly learn the underlying outcome distributions. As for the approximate optimality of bounded contracts, we find that they could be far from optimal in terms of multiplicative or additive approximation, but satisfy a notion of mixed approximation.
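
To make the hidden-action setting concrete, the following minimal Python sketch evaluates a single contract: the agent picks the action that maximizes expected payment minus cost, and the principal keeps the expected reward minus the expected payment. It illustrates the model only and is not the paper's learning algorithms. The toy instance (COSTS, DISTS, REWARDS), the function names, and the lowest-index tie-breaking rule are all assumptions made for this example.

```python
# Toy hidden-action instance (hypothetical numbers, chosen for illustration).
COSTS = [0.0, 1.0, 3.0]            # cost of each agent action
DISTS = [[0.9, 0.1],               # outcome distribution per action:
         [0.5, 0.5],               # [P(failure), P(success)]
         [0.1, 0.9]]
REWARDS = [0.0, 10.0]              # principal's reward per outcome


def agent_best_response(contract, costs, dists):
    """Index of the action maximizing expected payment minus cost.

    contract[j] is the payment for outcome j. Ties go to the lowest
    index, which is an assumption of this sketch.
    """
    def agent_utility(i):
        expected_payment = sum(p * t for p, t in zip(dists[i], contract))
        return expected_payment - costs[i]

    return max(range(len(costs)), key=agent_utility)


def principal_utility(contract, costs, dists, rewards):
    """Expected reward minus expected payment under the agent's best response."""
    i = agent_best_response(contract, costs, dists)
    return sum(p * (r - t) for p, r, t in zip(dists[i], rewards, contract))


if __name__ == "__main__":
    for success_payment in (0.0, 4.0, 6.0):
        contract = [0.0, success_payment]
        action = agent_best_response(contract, COSTS, DISTS)
        value = principal_utility(contract, COSTS, DISTS, REWARDS)
        print(success_payment, action, round(value, 3))
```

In this toy instance, raising the payment for success moves the agent to costlier actions, which is the kind of monotone structure the abstract's first assumption describes.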

Identification of Causal Structure with Latent Variables Based on Higher Order Cumulants

Dec 19, 2023

Wei Chen, Zhiyi Huang, Ruichu Cai, Zhifeng Hao, Kun Zhang

Abstract: Causal discovery with latent variables is a crucial but challenging task. Despite the emergence of numerous methods aimed at addressing this challenge, they cannot fully identify the structure in which two observed variables are influenced by one latent variable and there may be a directed edge between them. Interestingly, we notice that this structure can be identified by using higher-order cumulants. By leveraging the higher-order cumulants of non-Gaussian data, we provide an analytical solution for estimating the causal coefficients or their ratios. With the estimated (ratios of) causal coefficients, we propose a novel approach to identify the existence of a causal edge between two observed variables subject to latent variable influence. When such a causal edge exists, we introduce an asymmetry criterion to determine the causal direction. The experimental results demonstrate the effectiveness of our proposed method.

* Accepted by AAAI 2024

Causal Discovery with Latent Confounders Based on Higher-Order Cumulants

May 31, 2023

Ruichu Cai, Zhiyi Huang, Wei Chen, Zhifeng Hao, Kun Zhang

Abstract: Causal discovery with latent confounders is an important but challenging task in many scientific areas. Despite the success of some overcomplete independent component analysis (OICA) based methods in certain domains, they are computationally expensive and can easily get stuck in local optima. Interestingly, we notice that by making use of higher-order cumulants, there exists a closed-form solution to OICA in specific cases, e.g., when the mixing procedure follows the One-Latent-Component structure. In light of the power of the closed-form solution to OICA corresponding to the One-Latent-Component structure, we formulate a way to estimate the mixing matrix using the higher-order cumulants, and further propose the testable One-Latent-Component condition to identify the latent variables and determine causal orders. By iteratively removing the shared identified latent components, we successfully extend the results on the One-Latent-Component structure to the Multi-Latent-Component structure and finally provide a practical and asymptotically correct algorithm to learn the causal structure with latent variables. Experimental results illustrate the asymptotic correctness and effectiveness of the proposed method.

* Accepted by ICML 2023
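
As a rough illustration of why higher-order cumulants can yield closed-form estimates, consider the simplest case assumed here: two observed variables X1 = L + e1 and X2 = b·L + e2 that share one skewed latent variable L, with mutually independent noise. By multilinearity of cumulants, cum(X1, X1, X2) = b·κ3(L) and cum(X1, X2, X2) = b²·κ3(L), so their ratio recovers b. The sketch below checks this numerically. It is not the paper's algorithm, and the data-generating process, function names, and parameter values are assumptions for this example.

```python
import random


def joint_cumulant3(x, y, z):
    """Sample third-order joint cumulant: mean product of the centered samples."""
    n = len(x)
    mx, my, mz = sum(x) / n, sum(y) / n, sum(z) / n
    return sum((a - mx) * (b - my) * (c - mz) for a, b, c in zip(x, y, z)) / n


def simulate(b, n, seed=0):
    """Draw n samples of X1 = L + e1 and X2 = b*L + e2 (hypothetical model).

    L takes value 2 with probability 1/3 and -1 otherwise, so it has zero
    mean and a nonzero third cumulant (non-Gaussian, skewed).
    """
    rng = random.Random(seed)
    x1, x2 = [], []
    for _ in range(n):
        latent = 2.0 if rng.random() < 1.0 / 3.0 else -1.0
        x1.append(latent + rng.uniform(-1.0, 1.0))
        x2.append(b * latent + rng.uniform(-1.0, 1.0))
    return x1, x2


def estimate_loading_ratio(x1, x2):
    """Estimate b as cum(X1, X2, X2) / cum(X1, X1, X2)."""
    return joint_cumulant3(x1, x2, x2) / joint_cumulant3(x1, x1, x2)


if __name__ == "__main__":
    x1, x2 = simulate(b=1.5, n=50000)
    print(estimate_loading_ratio(x1, x2))  # should be close to 1.5
```

The ratio is only defined when the latent variable has a nonzero third cumulant, which is one reason methods of this kind rely on non-Gaussian data.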

E2CoPre: Energy Efficient and Cooperative Collision Avoidance for UAV Swarms with Trajectory Prediction

Mar 11, 2023

Shuangyao Huang, Haibo Zhang, Zhiyi Huang

Abstract: This paper addresses the collision avoidance problem of UAV swarms in three-dimensional (3D) space. The key challenges are energy efficiency and cooperation of swarm members. We propose to combine Artificial Potential Field (APF) with Particle Swarm Planning (PSO). APF provides environmental awareness and implicit coordination to UAVs. PSO searches for the optimal trajectories for each UAV in terms of safety and energy efficiency by minimizing a fitness function. The fitness function exploits the advantages of the Active Contour Model in image processing for trajectory planning. Lastly, vehicle-to-vehicle collisions are detected in advance based on trajectory prediction and are resolved by cooperatively adjusting the altitude of UAVs. Simulation results demonstrate that our method can save up to 80% of energy compared to state-of-the-art schemes.

WRHT: Efficient All-reduce for Distributed DNN Training in Optical Interconnect System

Jul 22, 2022

Fei Dai, Yawen Chen, Zhiyi Huang, Haibo Zhang, Fangfang Zhang

Abstract: Communication efficiency plays an important role in accelerating the distributed training of Deep Neural Networks (DNN). All-reduce is the key communication primitive to reduce model parameters in distributed DNN training. Most existing all-reduce algorithms are designed for traditional electrical interconnect systems, which cannot meet the communication requirements for distributed training of large DNNs. One of the promising alternatives to electrical interconnect is optical interconnect, which can provide high bandwidth, low transmission delay, and low power cost. We propose an efficient scheme called WRHT (Wavelength Reused Hierarchical Tree) for implementing the all-reduce operation in optical interconnect systems, which can take advantage of WDM (Wavelength Division Multiplexing) to reduce the communication time of distributed data-parallel DNN training. We further derive the minimum number of communication steps and communication time to realize the all-reduce using WRHT. Simulation results show that the communication time of WRHT is reduced by 75.59%, 49.25%, and 70.1%, respectively, compared with three traditional all-reduce algorithms simulated in an optical interconnect system. Simulation results also show that WRHT can reduce the communication time for the all-reduce operation by 86.69% and 84.71% in comparison with two existing all-reduce algorithms in an electrical interconnect system.

* This paper is under submission to GLOBECOM 2022
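
For readers unfamiliar with the all-reduce primitive, the sketch below simulates a plain binary-tree all-reduce in Python: partial sums flow up a tree to worker 0, then the total is broadcast back down. This is a generic textbook variant, not WRHT; it does not model wavelengths, optical links, or timing, and the function name and step counting are assumptions made for illustration.

```python
def tree_allreduce(grads):
    """Simulate a binary-tree all-reduce over per-worker gradient lists.

    Returns (buffers, steps): every worker's buffer after the operation
    and the number of communication rounds used (reduce plus broadcast).
    """
    n = len(grads)
    bufs = [list(g) for g in grads]
    steps = 0

    # Reduce phase: in each round, worker i absorbs the buffer of i + stride.
    stride = 1
    while stride < n:
        for i in range(0, n, 2 * stride):
            j = i + stride
            if j < n:
                bufs[i] = [a + b for a, b in zip(bufs[i], bufs[j])]
        stride *= 2
        steps += 1

    # Broadcast phase: replay the tree in reverse, copying the total downward.
    stride //= 2
    while stride >= 1:
        for i in range(0, n, 2 * stride):
            j = i + stride
            if j < n:
                bufs[j] = list(bufs[i])
        stride //= 2
        steps += 1

    return bufs, steps


if __name__ == "__main__":
    workers = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
    result, rounds = tree_allreduce(workers)
    print(result[0], rounds)
```

With n workers this variant needs 2·ceil(log2 n) rounds, which gives a baseline against which schemes that exploit parallel channels can be compared.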

Multi-UAV Collision Avoidance using Multi-Agent Reinforcement Learning with Counterfactual Credit Assignment

Apr 19, 2022

Shuangyao Huang, Haibo Zhang, Zhiyi Huang

Abstract: Multi-UAV collision avoidance is a challenging task for UAV swarm applications due to the need for tight cooperation among swarm members for collision-free path planning. Centralized Training with Decentralized Execution (CTDE) in Multi-Agent Reinforcement Learning is a promising method for multi-UAV collision avoidance, in which the key challenge is to effectively learn decentralized policies that can maximize a global reward cooperatively. We propose a new multi-agent critic-actor learning scheme called MACA for UAV swarm collision avoidance. MACA uses a centralized critic to maximize the discounted global reward that considers both safety and energy efficiency, and an actor per UAV to find decentralized policies to avoid collisions. To solve the credit assignment problem in CTDE, we design a counterfactual baseline that marginalizes both an agent's state and action, making it possible to evaluate the importance of an agent in the joint observation-action space. To train and evaluate MACA, we design our own simulation environment, MACAEnv, to closely mimic the realistic behaviors of a UAV swarm. Simulation results show that MACA achieves more than 16% higher average reward than two state-of-the-art MARL algorithms and reduces failure rate by 90% and response time by over 99% compared to a conventional UAV swarm collision avoidance algorithm in all test scenarios.
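
The idea of a counterfactual baseline can be sketched with a toy example. The Python code below implements the simpler, COMA-style variant that marginalizes only one agent's action while holding the other agents' actions fixed; MACA's baseline, as described above, also marginalizes the agent's state, which this sketch does not attempt. The Q-table, policies, and function name are hypothetical.

```python
def counterfactual_advantage(q_table, joint_action, agent, policy):
    """Advantage of one agent's action against an action-marginalized baseline.

    q_table: dict mapping joint-action tuples to centralized critic values.
    joint_action: tuple of the actions actually taken by all agents.
    agent: index of the agent being credited.
    policy: probabilities over that agent's own actions.
    """
    baseline = 0.0
    for alternative, prob in enumerate(policy):
        # Swap in each alternative action for this agent only.
        counterfactual = list(joint_action)
        counterfactual[agent] = alternative
        baseline += prob * q_table[tuple(counterfactual)]
    return q_table[tuple(joint_action)] - baseline


if __name__ == "__main__":
    # Two agents with two actions each (made-up critic values).
    q = {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 4.0}
    print(counterfactual_advantage(q, (1, 1), agent=0, policy=[0.5, 0.5]))
    print(counterfactual_advantage(q, (1, 1), agent=1, policy=[0.25, 0.75]))
```

A positive value indicates that the chosen action did better than what the agent's own policy would have achieved on average in the same joint context, which is the per-agent credit signal the critic passes to each actor.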

Adversarial Deep Learning for Online Resource Allocation

Nov 19, 2021

Bingqian Du, Zhiyi Huang, Chuan Wu

Abstract: Online algorithms are an important branch of algorithm design. Designing online algorithms with a bounded competitive ratio (in terms of worst-case performance) can be hard and usually relies on problem-specific assumptions. Inspired by adversarial training from Generative Adversarial Net (GAN) and the fact that the competitive ratio of an online algorithm is based on worst-case input, we adopt deep neural networks to learn an online algorithm for a resource allocation and pricing problem from scratch, with the goal that the performance gap between the offline optimum and the learned online algorithm can be minimized for worst-case input. Specifically, we leverage two neural networks as algorithm and adversary, respectively, and let them play a zero-sum game, with the adversary being responsible for generating worst-case input while the algorithm learns the best strategy based on the input provided by the adversary. To ensure better convergence of the algorithm network (to the desired online algorithm), we propose a novel per-round update method to handle sequential decision making, breaking the complex dependency among different rounds so that updates can be done for every possible action, instead of only sampled actions. To the best of our knowledge, our work is the first to use deep neural networks to design an online algorithm from the perspective of worst-case performance guarantees. Empirical studies show that our updating methods ensure convergence to a Nash equilibrium and the learned algorithm outperforms state-of-the-art online algorithms under various settings.

Accelerating Fully Connected Neural Network on Optical Network-on-Chip (ONoC)

Sep 30, 2021

Fei Dai, Yawen Chen, Haibo Zhang, Zhiyi Huang

Abstract: Fully Connected Neural Network (FCNN) is a class of Artificial Neural Networks widely used in computer science and engineering, but its training process can take a long time with large datasets on existing many-core systems. Optical Network-on-Chip (ONoC), an emerging chip-scale optical interconnection technology, has great potential to accelerate the training of FCNN with low transmission delay, low power consumption, and high throughput. However, existing methods based on Electrical Network-on-Chip (ENoC) cannot fit in ONoC because of the unique properties of ONoC. In this paper, we propose a fine-grained parallel computing model for accelerating FCNN training on ONoC and derive the optimal number of cores for each execution stage with the objective of minimizing the total amount of time to complete one epoch of FCNN training. To allocate the optimal number of cores for each execution stage, we present three mapping strategies and compare their advantages and disadvantages in terms of hotspot level, memory requirement, and state transitions. Simulation results show that the average prediction error for the optimal number of cores in NN benchmarks is within 2.3%. We further carry out extensive simulations which demonstrate that FCNN training time can be reduced by 22.28% and 4.91% on average using our proposed scheme, compared with traditional parallel computing methods that either allocate a fixed number of cores or allocate as many cores as possible, respectively. Compared with ENoC, simulation results show that under batch sizes of 64 and 128, ONoC reduces training time by 21.02% and 12.95% on average, with energy savings of 47.85% and 39.27%, respectively.

* 14 pages, 10 figures. This paper is under second-round review at IEEE Transactions on Computers
