Abstract: Training Large Language Models (LLMs) for reasoning tasks is increasingly driven by Reinforcement Learning with Verifiable Rewards (RLVR), in which Proximal Policy Optimization (PPO) provides a principled framework for stable policy updates. However, the practical application of PPO is hindered by unreliable advantage estimation in the sparse-reward RLVR regime. Sparse rewards lead to inaccurate intermediate value predictions, and Generalized Advantage Estimation (GAE) introduces significant bias when it aggregates these predictions at every token. To address this, we introduce Segmental Advantage Estimation (SAE), which mitigates the bias that GAE can incur in RLVR. Our key insight is that aggregating $n$-step advantages at every token (as in GAE) is unnecessary and often introduces excessive bias, since individual tokens carry minimal information. Instead, SAE first partitions the generated sequence into coherent sub-segments, using low-probability tokens as heuristic boundaries. It then computes variance-reduced advantage estimates only from these information-rich segment transitions, filtering out noise from intermediate tokens. Our experiments show that SAE achieves superior performance, with marked improvements in final scores, training stability, and sample efficiency. These gains are consistent across multiple model sizes, and a correlation analysis confirms that SAE's advantage estimates correlate more strongly with an approximate ground-truth advantage, which accounts for its superior performance.
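To make the contrast between per-token GAE and segment-level aggregation concrete, the following Python sketch computes standard GAE alongside a segmental variant. The segmental part rests on assumptions that are ours rather than the abstract's: a segment boundary is any token whose log-probability falls below a hypothetical `threshold`, and the segmental estimate is taken to be the $\lambda$-weighted average of $n$-step advantages that bootstrap from the value function only at segment boundaries, with the leftover weight on the terminal Monte Carlo return. The helper names (`gae`, `segment_boundaries`, `n_step_advantage`, `segmental_advantages`) are illustrative; this is not the authors' SAE implementation.

```python
from typing import List, Sequence


def gae(rewards: Sequence[float], values: Sequence[float],
        gamma: float = 1.0, lam: float = 0.95) -> List[float]:
    """Standard GAE: A_t = sum_l (gamma*lam)^l * delta_{t+l}, with V(s_T) = 0 at the end."""
    T = len(rewards)
    adv = [0.0] * T
    running = 0.0
    for t in reversed(range(T)):
        next_v = values[t + 1] if t + 1 < T else 0.0
        delta = rewards[t] + gamma * next_v - values[t]  # one-step TD residual
        running = delta + gamma * lam * running
        adv[t] = running
    return adv


def segment_boundaries(logprobs: Sequence[float], threshold: float) -> List[int]:
    """Hypothetical heuristic: token k (k >= 1) opens a new segment if its log-prob < threshold."""
    return [k for k in range(1, len(logprobs)) if logprobs[k] < threshold]


def n_step_advantage(rewards, values, t, n, gamma):
    """A_t^(n) = sum_{k<n} gamma^k r_{t+k} + gamma^n V(s_{t+n}) - V(s_t), with V(s_T) = 0."""
    T = len(rewards)
    ret = sum(gamma ** k * rewards[t + k] for k in range(n))
    bootstrap = values[t + n] if t + n < T else 0.0
    return ret + gamma ** n * bootstrap - values[t]


def segmental_advantages(rewards, values, boundaries, gamma=1.0, lam=0.95):
    """Assumed segmental estimator: lambda-weighted average of n-step advantages that
    bootstrap only at segment boundaries, with the remaining weight on the full return."""
    T = len(rewards)
    adv = [0.0] * T
    for t in range(T):
        later = sorted(b for b in boundaries if t < b < T)
        total, weight = 0.0, 1.0
        for b in later:
            # Bootstrap from V only at boundary b (an n-step advantage with n = b - t).
            total += weight * (1.0 - lam) * n_step_advantage(rewards, values, t, b - t, gamma)
            weight *= lam
        # Leftover weight lam^m goes to the Monte Carlo advantage (no bootstrapping).
        total += weight * n_step_advantage(rewards, values, t, T - t, gamma)
        adv[t] = total
    return adv
```

In this sketch, a boundary at every token makes `segmental_advantages` coincide with ordinary GAE, while an empty boundary list gives the Monte Carlo advantage $G_t - V(s_t)$. Boundary placement therefore controls how many (possibly inaccurate) intermediate value predictions enter each estimate.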
Abstract: We introduce, for the first time, a cohomology-based Gromov-Hausdorff ultrametric method for analyzing 1-dimensional and higher-dimensional (co)homology groups, focusing on loops, voids, and higher-dimensional cavities in simplicial complexes, to address typical clustering questions in molecular data analysis. The Gromov-Hausdorff distance quantifies the dissimilarity between two metric spaces. In this framework, molecules are represented as simplicial complexes, and their cohomology vector spaces are computed to capture intrinsic topological invariants that encode loop and cavity structures. These vector spaces are equipped with a suitable distance, which enables computation of the Gromov-Hausdorff ultrametric to evaluate structural dissimilarities. We demonstrate the methodology on organic-inorganic halide perovskite (OIHP) structures. The results highlight the effectiveness of this approach for clustering a variety of molecular structures. By incorporating geometric information, our method provides deeper insights than traditional persistent homology techniques.
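The step of computing (co)homology vector spaces can be illustrated on a small scale. Over a field such as GF(2), the dimension of the $k$-th cohomology group of a finite simplicial complex equals its $k$-th Betti number, so counting loops ($k=1$) and voids ($k=2$) reduces to ranks of boundary matrices: $\beta_k = \dim C_k - \operatorname{rank}\partial_k - \operatorname{rank}\partial_{k+1}$. The Python sketch below computes these numbers for a complex given by its maximal simplices. It covers only this dimension count, not the distance on cohomology spaces or the Gromov-Hausdorff ultrametric, and the function names are hypothetical.

```python
from itertools import combinations


def closure(maximal_simplices):
    """All faces of the given simplices (a simplicial complex is closed under taking faces)."""
    faces = set()
    for s in maximal_simplices:
        s = tuple(sorted(s))
        for k in range(1, len(s) + 1):
            faces.update(combinations(s, k))
    return faces


def gf2_rank(rows):
    """Rank over GF(2) of a matrix whose rows are encoded as integer bitmasks."""
    pivots = {}  # leading bit -> basis row with that leading bit
    for r in rows:
        while r:
            lead = r.bit_length() - 1
            if lead in pivots:
                r ^= pivots[lead]  # eliminate the leading bit (XOR = addition mod 2)
            else:
                pivots[lead] = r
                break
    return len(pivots)


def betti_numbers(maximal_simplices):
    """Betti numbers beta_0..beta_top over GF(2) (equal to cohomology dimensions over GF(2))."""
    faces = closure(maximal_simplices)
    top = max(len(f) for f in faces) - 1
    by_dim = {d: sorted(f for f in faces if len(f) == d + 1) for d in range(top + 1)}
    index = {d: {f: i for i, f in enumerate(by_dim[d])} for d in by_dim}

    def boundary_rank(d):
        # Rank of the boundary map from d-simplices to (d-1)-simplices; zero outside 1..top.
        if d <= 0 or d > top:
            return 0
        rows = []
        for f in by_dim[d]:
            mask = 0
            for face in combinations(f, d):  # the d+1 codimension-1 faces of f
                mask |= 1 << index[d - 1][face]
            rows.append(mask)
        return gf2_rank(rows)

    return [len(by_dim[d]) - boundary_rank(d) - boundary_rank(d + 1) for d in range(top + 1)]
```

For example, the hollow triangle `betti_numbers([(0, 1), (1, 2), (0, 2)])` returns `[1, 1]` (one component, one loop), and the hollow tetrahedron `betti_numbers(list(combinations(range(4), 3)))` returns `[1, 0, 1]` (one component, one void).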
Abstract: Recently, therapeutic peptides have shown great promise for cancer treatment. To identify potent anticancer peptides, artificial intelligence (AI)-based approaches have been developed to systematically screen potential candidates. However, the lack of efficient peptide featurization has become a bottleneck for these machine-learning models. In this paper, we propose a topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction. Top-ML employs topological features derived from each peptide's sequence "connection" information, characterized by vector and spectral descriptors. Top-ML has been validated on two widely used AntiCP 2.0 benchmark datasets, on which it achieved state-of-the-art performance. Our results highlight the potential of novel topology-based featurization to accelerate the identification of anticancer peptides.
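The abstract does not specify how sequence "connection" information becomes vector and spectral descriptors. The following Python sketch shows one plausible, entirely hypothetical construction, not the Top-ML featurization: it counts adjacent residue pairs in the sequence (a 210-entry vector descriptor over the 20 standard amino acids) and summarizes the eigenvalues of a graph Laplacian built from those connections (a spectral descriptor). It assumes NumPy and sequences written in the 20 one-letter amino-acid codes; the names `connection_matrix`, `vector_descriptor`, and `spectral_descriptor` are illustrative.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
IDX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def connection_matrix(seq):
    """Hypothetical 'connection' info: symmetric 20x20 counts of adjacent residue pairs."""
    m = np.zeros((20, 20))
    for a, b in zip(seq, seq[1:]):
        i, j = IDX[a], IDX[b]  # raises KeyError for non-standard residues
        m[i, j] += 1
        if i != j:
            m[j, i] += 1
    return m


def vector_descriptor(seq):
    """Upper triangle (including diagonal) of the connection matrix: 210 pair counts."""
    return connection_matrix(seq)[np.triu_indices(20)]


def spectral_descriptor(seq):
    """Summary statistics of the weighted graph Laplacian on residue types present in seq."""
    m = connection_matrix(seq)
    present = sorted({IDX[a] for a in seq})
    adj = m[np.ix_(present, present)]
    np.fill_diagonal(adj, 0.0)  # self-connections do not change the Laplacian spectrum
    lap = np.diag(adj.sum(axis=1)) - adj
    eig = np.linalg.eigvalsh(lap)  # real eigenvalues of the symmetric Laplacian, ascending
    nonzero = eig[eig > 1e-9]
    return np.array([
        float(np.sum(eig <= 1e-9)),                      # number of connected components
        float(nonzero.min()) if nonzero.size else 0.0,   # algebraic connectivity
        float(eig.max()),                                # largest Laplacian eigenvalue
        float(eig.mean()),                               # mean eigenvalue
    ])
```

For the tripeptide `"ACD"`, the connection graph is the unit-weight path A, C, D, whose Laplacian eigenvalues are 0, 1, and 3, so `spectral_descriptor("ACD")` returns `[1, 1, 3, 4/3]`.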