Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Leyi Zhao

Large Language Model-Powered Evolutionary Code Optimization on a Phylogenetic Tree

Jan 20, 2026

Leyi Zhao, Weijie Huang, Yitong Guo, Jiang Bian, Chenghong Wang, Xuhong Zhang

Abstract:Optimizing scientific computing algorithms for modern GPUs is a labor-intensive and iterative process involving repeated code modification, benchmarking, and tuning across complex hardware and software stacks. Recent work has explored large language model (LLM)-assisted evolutionary methods for automated code optimization, but these approaches primarily rely on outcome-based selection and random mutation, underutilizing the rich trajectory information generated during iterative optimization. We propose PhyloEvolve, an LLM-agent system that reframes GPU-oriented algorithm optimization as an In-Context Reinforcement Learning (ICRL) problem. This formulation enables trajectory-conditioned reuse of optimization experience without model retraining. PhyloEvolve integrates Algorithm Distillation and prompt-based Decision Transformers into an iterative workflow, treating sequences of algorithm modifications and performance feedback as first-class learning signals. To organize optimization history, we introduce a phylogenetic tree representation that captures inheritance, divergence, and recombination among algorithm variants, enabling backtracking, cross-lineage transfer, and reproducibility. The system combines elite trajectory pooling, multi-island parallel exploration, and containerized execution to balance exploration and exploitation across heterogeneous hardware. We evaluate PhyloEvolve on scientific computing workloads including PDE solvers, manifold learning, and spectral graph algorithms, demonstrating consistent improvements in runtime, memory efficiency, and correctness over baseline and evolutionary methods. Code is published at: https://github.com/annihi1ation/phylo_evolve

Via

Access Paper or Ask Questions

AutoSurvey2: Empowering Researchers with Next Level Automated Literature Surveys

Oct 29, 2025

Siyi Wu, Chiaxin Liang, Ziqian Bi, Leyi Zhao, Tianyang Wang, Junhao Song, Yichao Zhang, Keyu Chen, Xinyuan Song

Abstract:The rapid growth of research literature, particularly in large language models (LLMs), has made producing comprehensive and current survey papers increasingly difficult. This paper introduces autosurvey2, a multi-stage pipeline that automates survey generation through retrieval-augmented synthesis and structured evaluation. The system integrates parallel section generation, iterative refinement, and real-time retrieval of recent publications to ensure both topical completeness and factual accuracy. Quality is assessed using a multi-LLM evaluation framework that measures coverage, structure, and relevance in alignment with expert review standards. Experimental results demonstrate that autosurvey2 consistently outperforms existing retrieval-based and automated baselines, achieving higher scores in structural coherence and topical relevance while maintaining strong citation fidelity. By combining retrieval, reasoning, and automated evaluation into a unified framework, autosurvey2 provides a scalable and reproducible solution for generating long-form academic surveys and contributes a solid foundation for future research on automated scholarly writing. All code and resources are available at https://github.com/annihi1ation/auto_research.

* TKDD 2025

Via

Access Paper or Ask Questions

CopyQNN: Quantum Neural Network Extraction Attack under Varying Quantum Noise

Apr 01, 2025

Zhenxiao Fu, Leyi Zhao, Xuhong Zhang, Yilun Xu, Gang Huang, Fan Chen

Abstract:Quantum Neural Networks (QNNs) have shown significant value across domains, with well-trained QNNs representing critical intellectual property often deployed via cloud-based QNN-as-a-Service (QNNaaS) platforms. Recent work has examined QNN model extraction attacks using classical and emerging quantum strategies. These attacks involve adversaries querying QNNaaS platforms to obtain labeled data for training local substitute QNNs that replicate the functionality of cloud-based models. However, existing approaches have largely overlooked the impact of varying quantum noise inherent in noisy intermediate-scale quantum (NISQ) computers, limiting their effectiveness in real-world settings. To address this limitation, we propose the CopyQNN framework, which employs a three-step data cleaning method to eliminate noisy data based on its noise sensitivity. This is followed by the integration of contrastive and transfer learning within the quantum domain, enabling efficient training of substitute QNNs using a limited but cleaned set of queried data. Experimental results on NISQ computers demonstrate that a practical implementation of CopyQNN significantly outperforms state-of-the-art QNN extraction attacks, achieving an average performance improvement of 8.73% across all tasks while reducing the number of required queries by 90x, with only a modest increase in hardware overhead.

Via

Access Paper or Ask Questions

SDF-TopoNet: A Two-Stage Framework for Tubular Structure Segmentation via SDF Pre-training and Topology-Aware Fine-Tuning

Mar 20, 2025

Siyi Wu, Leyi Zhao, Haotian Ma, Xinyuan Song

Figure 1 for SDF-TopoNet: A Two-Stage Framework for Tubular Structure Segmentation via SDF Pre-training and Topology-Aware Fine-Tuning

Figure 2 for SDF-TopoNet: A Two-Stage Framework for Tubular Structure Segmentation via SDF Pre-training and Topology-Aware Fine-Tuning

Figure 3 for SDF-TopoNet: A Two-Stage Framework for Tubular Structure Segmentation via SDF Pre-training and Topology-Aware Fine-Tuning

Figure 4 for SDF-TopoNet: A Two-Stage Framework for Tubular Structure Segmentation via SDF Pre-training and Topology-Aware Fine-Tuning

Abstract:Accurate segmentation of tubular and curvilinear structures, such as blood vessels, neurons, and road networks, is crucial in various applications. A key challenge is ensuring topological correctness while maintaining computational efficiency. Existing approaches often employ topological loss functions based on persistent homology, such as Betti error, to enforce structural consistency. However, these methods suffer from high computational costs and are insensitive to pixel-level accuracy, often requiring additional loss terms like Dice or MSE to compensate. To address these limitations, we propose \textbf{SDF-TopoNet}, an improved topology-aware segmentation framework that enhances both segmentation accuracy and training efficiency. Our approach introduces a novel two-stage training strategy. In the pre-training phase, we utilize the signed distance function (SDF) as an auxiliary learning target, allowing the model to encode topological information without directly relying on computationally expensive topological loss functions. In the fine-tuning phase, we incorporate a dynamic adapter alongside a refined topological loss to ensure topological correctness while mitigating overfitting and computational overhead. We evaluate our method on five benchmark datasets. Experimental results demonstrate that SDF-TopoNet outperforms existing methods in both topological accuracy and quantitative segmentation metrics, while significantly reducing training complexity.

Via

Access Paper or Ask Questions

Spectrograms Are Sequences of Patches

Oct 28, 2022

Leyi Zhao, Yi Li

Abstract:Self-supervised pre-training models have been used successfully in several machine learning domains. However, only a tiny amount of work is related to music. In our work, we treat a spectrogram of music as a series of patches and design a self-supervised model that captures the features of these sequential patches: Patchifier, which makes good use of self-supervised learning methods from both NLP and CV domains. We do not use labeled data for the pre-training process, only a subset of the MTAT dataset containing 16k music clips. After pre-training, we apply the model to several downstream tasks. Our model achieves a considerably acceptable result compared to other audio representation models. Meanwhile, our work demonstrates that it makes sense to consider audio as a series of patch segments.

Via

Access Paper or Ask Questions