Ye Wei

Machine learning enhanced atom probe tomography analysis: a snapshot review

Apr 19, 2025

Yue Li, Ye Wei, Alaukik Saxena, Markus Kühbach, Christoph Freysoldt, Baptiste Gault

Abstract: Atom probe tomography (APT) is a burgeoning characterization technique that provides compositional mapping of materials in three dimensions at near-atomic scale. Since its significant expansion in the past 30 years, we estimate that one million APT datasets have been collected, each containing millions to billions of individual ions. Their analysis and the extraction of microstructural information have largely relied upon individual users whose varied levels of expertise cause clear and documented bias. Current practices hinder efficient data processing and make it challenging to standardize and deploy data analysis workflows that would be compliant with FAIR data principles. Over the past decade, building upon the long-standing expertise of the APT community in the development of advanced data processing or data mining techniques, there has been a surge of novel machine learning (ML) approaches that aim for user-independence and that are efficient, reproducible, and robust from a statistics perspective. Here, we provide a snapshot review of this rapidly evolving field. We begin with a brief introduction to APT and the nature of the APT data. This is followed by an overview of relevant ML algorithms and a comprehensive review of their applications to APT. We also discuss how ML can enable discoveries beyond human capability, offering new insights into the mechanisms within materials. Finally, we provide guidance for future directions in this domain.

From Understanding to Excelling: Template-Free Algorithm Design through Structural-Functional Co-Evolution

Mar 13, 2025

Zhe Zhao, Haibin Wen, Pengkun Wang, Ye Wei, Zaixi Zhang, Xi Lin, Fei Liu, Bo An, Hui Xiong, Yang Wang(+1 more)

Abstract: Large language models (LLMs) have greatly accelerated the automation of algorithm generation and optimization. However, current methods such as EoH and FunSearch mainly rely on predefined templates and expert-specified functions that focus solely on the local evolution of key functionalities. Consequently, they fail to fully leverage the synergistic benefits of the overall architecture and the potential of global optimization. In this paper, we introduce an end-to-end algorithm generation and optimization framework based on LLMs. Our approach utilizes the deep semantic understanding of LLMs to convert natural language requirements or human-authored papers into code solutions, and employs a two-dimensional co-evolution strategy to optimize both functional and structural aspects. This closed-loop process spans problem analysis, code generation, and global optimization, automatically identifying key algorithm modules for multi-level joint optimization and continually enhancing performance and design innovation. Extensive experiments demonstrate that our method outperforms traditional local optimization approaches in both performance and innovation, while also exhibiting strong adaptability to unknown environments and breakthrough potential in structural design. By building on human research, our framework generates and optimizes novel algorithms that surpass those designed by human experts, broadening the applicability of LLMs for algorithm design and providing a novel solution pathway for automated algorithm development.

MotionPCM: Real-Time Motion Synthesis with Phased Consistency Model

Jan 31, 2025

Lei Jiang, Ye Wei, Hao Ni

Abstract: Diffusion models have become a popular choice for human motion synthesis due to their powerful generative capabilities. However, their high computational complexity and large number of sampling steps pose challenges for real-time applications. Fortunately, the Consistency Model (CM) provides a solution that greatly reduces the number of sampling steps from hundreds to a few, typically fewer than four, significantly accelerating the synthesis of diffusion models. However, its application to text-conditioned human motion synthesis in latent space remains challenging. In this paper, we introduce MotionPCM, a phased consistency model-based approach designed to improve the quality and efficiency of real-time motion synthesis in latent space.

Derivative-free tree optimization for complex systems

Apr 05, 2024

Ye Wei, Bo Peng, Ruiwen Xie, Yangtao Chen, Yu Qin, Peng Wen, Stefan Bauer, Po-Yen Tung

Abstract: A tremendous range of design tasks in materials, physics, and biology can be formulated as finding the optimum of an objective function depending on many parameters without knowing its closed-form expression or the derivative. Traditional derivative-free optimization techniques often rely on strong assumptions about objective functions, thereby failing at optimizing non-convex systems beyond 100 dimensions. Here, we present a tree search method for derivative-free optimization that enables accelerated optimal design of high-dimensional complex systems. Specifically, we introduce stochastic tree expansion, a dynamic upper confidence bound, and a short-range backpropagation mechanism to evade local optima, iteratively approximating the global optimum using machine learning models. This development effectively confronts dimensionally challenging problems, achieving convergence to global optima across various benchmark functions up to 2,000 dimensions, surpassing the existing methods by 10- to 20-fold. Our method demonstrates applicability to a wide range of real-world complex systems spanning materials, physics, and biology, considerably outperforming state-of-the-art algorithms. This enables efficient autonomous knowledge discovery and facilitates self-driving virtual laboratories. Although we focus on problems within the realm of natural science, the advancements in optimization techniques achieved herein are applicable to a broader spectrum of challenges across all quantitative disciplines.

* 39 pages, 3 figures
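
The abstract above names a dynamic upper confidence bound but does not define it. As a point of reference only, the following is a minimal sketch of the classic UCB1 selection rule used in standard tree search; it is the textbook form, not the authors' dynamic variant, and the names `ucb1`, `select_child`, and the constant `c` are illustrative assumptions.

```python
import math


def ucb1(mean_value, visits, parent_visits, c=math.sqrt(2)):
    """Textbook UCB1 score: exploitation term plus exploration bonus.

    This is the standard rule, NOT the paper's dynamic upper confidence bound.
    """
    if visits == 0:
        # Unvisited children are always tried first.
        return math.inf
    return mean_value + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, parent_visits, c=math.sqrt(2)):
    """Return the index of the child with the highest UCB1 score.

    `children` is a list of (mean_value, visits) pairs (hypothetical layout).
    """
    return max(
        range(len(children)),
        key=lambda i: ucb1(children[i][0], children[i][1], parent_visits, c),
    )


if __name__ == "__main__":
    # A well-explored, high-value child versus a barely explored one.
    children = [(0.9, 100), (0.5, 1)]
    print(select_child(children, parent_visits=101))
```

In this toy call the rarely visited child receives the larger exploration bonus, which illustrates how a confidence-bound rule trades off exploiting known good branches against exploring uncertain ones.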

Traffic4cast at NeurIPS 2022 -- Predict Dynamics along Graph Edges from Sparse Node Data: Whole City Traffic and ETA from Stationary Vehicle Detectors

Mar 14, 2023

Moritz Neun, Christian Eichenberger, Henry Martin, Markus Spanring, Rahul Siripurapu, Daniel Springer, Leyan Deng, Chenwang Wu, Defu Lian, Min Zhou(+20 more)

Abstract: The global trends of urbanization and increased personal mobility force us to rethink the way we live and use urban space. The Traffic4cast competition series tackles this problem in a data-driven way, advancing the latest methods in machine learning for modeling complex spatial systems over time. In this edition, our dynamic road graph data combine information from road maps, $10^{12}$ probe data points, and stationary vehicle detectors in three cities over the span of two years. While stationary vehicle detectors are the most accurate way to capture traffic volume, they are only available in a few locations. Traffic4cast 2022 explores models that have the ability to generalize loosely related temporal vertex data on just a few nodes to predict dynamic future traffic states on the edges of the entire road graph. In the core challenge, participants are invited to predict the likelihoods of three congestion classes derived from the speed levels in the GPS data for the entire road graph in three cities 15 min into the future. We only provide vehicle count data from spatially sparse stationary vehicle detectors in these three cities as model input for this task. The data are aggregated in 15 min time bins for one hour prior to the prediction time. For the extended challenge, participants are tasked to predict the average travel times on super-segments 15 min into the future; super-segments are longer sequences of road segments in the graph. The competition results provide an important advance in the prediction of complex city-wide traffic states just from publicly available sparse vehicle data and without the need for large amounts of real-time floating vehicle data.

* Pre-print under review, submitted to Proceedings of Machine Learning Research

Predicting the protein-ligand affinity from molecular dynamics trajectories

Aug 19, 2022

Yaosen Min, Ye Wei, Peizhuo Wang, Nian Wu, Stefan Bauer, Shuxin Zheng, Yu Shi, Yingheng Wang, Dan Zhao, Ji Wu(+1 more)

Abstract: Accurate prediction of protein-ligand binding affinity is essential in drug design and many other molecular recognition problems. Despite many advances in affinity prediction based on machine learning techniques, these methods are still limited since protein-ligand binding is determined by the dynamics of atoms and molecules. To this end, we curated an MD dataset containing 3,218 dynamic protein-ligand complexes and further developed Dynaformer, a graph-based deep learning framework. Dynaformer can fully capture the dynamic binding rules by considering various geometric characteristics of the interaction. Our method shows superior performance over the methods hitherto reported. Moreover, we performed virtual screening on heat shock protein 90 (HSP90) by integrating our model with structure-based docking. We benchmarked our performance against other baselines, demonstrating that our method can identify the molecule with the highest experimental potency. We anticipate that large-scale MD datasets and machine learning models will form a new synergy, providing a new route towards accelerated drug discovery and optimization.

* initial version

Machine-learning-enhanced time-of-flight mass spectrometry analysis

Oct 02, 2020

Ye Wei, Rama Srinivas Varanasi, Torsten Schwarz, Leonie Gomell, Huan Zhao, David J. Larson, Binhan Sun, Geng Liu, Hao Chen, Dierk Raabe(+1 more)

Abstract: Mass spectrometry is a widespread approach for determining the constituents of a material. Atoms and molecules are removed from the material and collected, and subsequently, a critical step is to infer their correct identities based on patterns formed in their mass-to-charge ratios and relative isotopic abundances. However, this identification step still mainly relies on the individual user's expertise, making its standardization challenging and hindering efficient data processing. Here, we introduce an approach that leverages modern machine learning techniques to identify peak patterns in time-of-flight mass spectra within microseconds, outperforming human users without loss of accuracy. Our approach is cross-validated on mass spectra generated from different time-of-flight mass spectrometry (ToF-MS) techniques, offering the ToF-MS community an open-source, intelligent mass spectra analysis.

* 20 pages, 15 figures

A Dynamic Boosted Ensemble Learning Method Based on Random Forest

Apr 24, 2018

Xingzhang Ren, Chen Long, Leilei Zhang, Ye Wei, Dongdong Du, Jingxi Liang, Shikun Zhang, Weiping Li

Abstract: We propose a dynamic boosted ensemble learning method based on random forest (DBRF), a novel ensemble algorithm that incorporates the notion of hard example mining into Random Forest (RF) and thus combines the high accuracy of Boosting algorithms with the strong generalization of Bagging algorithms. Specifically, we propose to measure the quality of each leaf node of every decision tree in the random forest to determine hard examples. By iteratively training and then removing easy examples from the training data, we evolve the random forest to focus on hard examples dynamically so as to learn decision boundaries better. Data can be cascaded in sequence through the random forests learned in each iteration to generate predictions, thus making RF deep. We also propose to use an evolution mechanism and a smart iteration mechanism to improve the performance of the model. DBRF outperforms RF on three UCI datasets and achieves state-of-the-art results compared to other deep models. Moreover, we show that DBRF is also a new way of sampling and can be very useful when learning from imbalanced data.
