Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Haixu Wu

Transolver-3: Scaling Up Transformer Solvers to Industrial-Scale Geometries

Feb 04, 2026

Hang Zhou, Haixu Wu, Haonan Shangguan, Yuezhou Ma, Huikun Weng, Jianmin Wang, Mingsheng Long

Abstract:Deep learning has emerged as a transformative tool for the neural surrogate modeling of partial differential equations (PDEs), known as neural PDE solvers. However, scaling these solvers to industrial-scale geometries with over $10^8$ cells remains a fundamental challenge due to the prohibitive memory complexity of processing high-resolution meshes. We present Transolver-3, a new member of the Transolver family as a highly scalable framework designed for high-fidelity physics simulations. To bridge the gap between limited GPU capacity and the resolution requirements of complex engineering tasks, we introduce two key architectural optimizations: faster slice and deslice by exploiting matrix multiplication associative property and geometry slice tiling to partition the computation of physical states. Combined with an amortized training strategy by learning on random subsets of original high-resolution meshes and a physical state caching technique during inference, Transolver-3 enables high-fidelity field prediction on industrial-scale meshes. Extensive experiments demonstrate that Transolver-3 is capable of handling meshes with over 160 million cells, achieving impressive performance across three challenging simulation benchmarks, including aircraft and automotive design tasks.

Via

Access Paper or Ask Questions

Neural Modular Physics for Elastic Simulation

Dec 17, 2025

Yifei Li, Haixu Wu, Zeyi Xu, Tuur Stuyck, Wojciech Matusik

Figure 1 for Neural Modular Physics for Elastic Simulation

Figure 2 for Neural Modular Physics for Elastic Simulation

Figure 3 for Neural Modular Physics for Elastic Simulation

Figure 4 for Neural Modular Physics for Elastic Simulation

Abstract:Learning-based methods have made significant progress in physics simulation, typically approximating dynamics with a monolithic end-to-end optimized neural network. Although these models offer an effective way to simulation, they may lose essential features compared to traditional numerical simulators, such as physical interpretability and reliability. Drawing inspiration from classical simulators that operate in a modular fashion, this paper presents Neural Modular Physics (NMP) for elastic simulation, which combines the approximation capacity of neural networks with the physical reliability of traditional simulators. Beyond the previous monolithic learning paradigm, NMP enables direct supervision of intermediate quantities and physical constraints by decomposing elastic dynamics into physically meaningful neural modules connected through intermediate physical quantities. With a specialized architecture and training strategy, our method transforms the numerical computation flow into a modular neural simulator, achieving improved physical consistency and generalizability. Experimentally, NMP demonstrates superior generalization to unseen initial conditions and resolutions, stable long-horizon simulation, better preservation of physical properties compared to other neural simulators, and greater feasibility in scenarios with unknown underlying dynamics than traditional simulators.

Via

Access Paper or Ask Questions

FlashBias: Fast Computation of Attention with Bias

May 17, 2025

Haixu Wu, Minghao Guo, Yuezhou Ma, Yuanxu Sun, Jianmin Wang, Wojciech Matusik, Mingsheng Long

Abstract:Attention mechanism has emerged as a foundation module of modern deep learning models and has also empowered many milestones in various domains. Moreover, FlashAttention with IO-aware speedup resolves the efficiency issue of standard attention, further promoting its practicality. Beyond canonical attention, attention with bias also widely exists, such as relative position bias in vision and language models and pair representation bias in AlphaFold. In these works, prior knowledge is introduced as an additive bias term of attention weights to guide the learning process, which has been proven essential for model performance. Surprisingly, despite the common usage of attention with bias, its targeted efficiency optimization is still absent, which seriously hinders its wide applications in complex tasks. Diving into the computation of FlashAttention, we prove that its optimal efficiency is determined by the rank of the attention weight matrix. Inspired by this theoretical result, this paper presents FlashBias based on the low-rank compressed sensing theory, which can provide fast-exact computation for many widely used attention biases and a fast-accurate approximation for biases in general formalization. FlashBias can fully take advantage of the extremely optimized matrix multiplication operation in modern GPUs, achieving 1.5$\times$ speedup for AlphaFold, and over 2$\times$ speedup for attention with bias in vision and language models without loss of accuracy.

Via

Access Paper or Ask Questions

Transolver++: An Accurate Neural Solver for PDEs on Million-Scale Geometries

Feb 04, 2025

Huakun Luo, Haixu Wu, Hang Zhou, Lanxiang Xing, Yichen Di, Jianmin Wang, Mingsheng Long

Abstract:Although deep models have been widely explored in solving partial differential equations (PDEs), previous works are primarily limited to data only with up to tens of thousands of mesh points, far from the million-point scale required by industrial simulations that involve complex geometries. In the spirit of advancing neural PDE solvers to real industrial applications, we present Transolver++, a highly parallel and efficient neural solver that can accurately solve PDEs on million-scale geometries. Building upon previous advancements in solving PDEs by learning physical states via Transolver, Transolver++ is further equipped with an extremely optimized parallelism framework and a local adaptive mechanism to efficiently capture eidetic physical states from massive mesh points, successfully tackling the thorny challenges in computation and physics learning when scaling up input mesh size. Transolver++ increases the single-GPU input capacity to million-scale points for the first time and is capable of continuously scaling input size in linear complexity by increasing GPUs. Experimentally, Transolver++ yields 13% relative promotion across six standard PDE benchmarks and achieves over 20% performance gain in million-scale high-fidelity industrial simulations, whose sizes are 100$\times$ larger than previous benchmarks, covering car and 3D aircraft designs.

Via

Access Paper or Ask Questions

ProPINN: Demystifying Propagation Failures in Physics-Informed Neural Networks

Feb 02, 2025

Haixu Wu, Yuezhou Ma, Hang Zhou, Huikun Weng, Jianmin Wang, Mingsheng Long

Figure 1 for ProPINN: Demystifying Propagation Failures in Physics-Informed Neural Networks

Figure 2 for ProPINN: Demystifying Propagation Failures in Physics-Informed Neural Networks

Figure 3 for ProPINN: Demystifying Propagation Failures in Physics-Informed Neural Networks

Figure 4 for ProPINN: Demystifying Propagation Failures in Physics-Informed Neural Networks

Abstract:Physics-informed neural networks (PINNs) have earned high expectations in solving partial differential equations (PDEs), but their optimization usually faces thorny challenges due to the unique derivative-dependent loss function. By analyzing the loss distribution, previous research observed the propagation failure phenomenon of PINNs, intuitively described as the correct supervision for model outputs cannot ``propagate'' from initial states or boundaries to the interior domain. Going beyond intuitive understanding, this paper provides the first formal and in-depth study of propagation failure and its root cause. Based on a detailed comparison with classical finite element methods, we ascribe the failure to the conventional single-point-processing architecture of PINNs and further prove that propagation failure is essentially caused by the lower gradient correlation of PINN models on nearby collocation points. Compared to superficial loss maps, this new perspective provides a more precise quantitative criterion to identify where and why PINN fails. The theoretical finding also inspires us to present a new PINN architecture, named ProPINN, which can effectively unite the gradient of region points for better propagation. ProPINN can reliably resolve PINN failure modes and significantly surpass advanced Transformer-based models with 46% relative promotion.

Via

Access Paper or Ask Questions

Metadata Matters for Time Series: Informative Forecasting with Transformers

Oct 04, 2024

Jiaxiang Dong, Haixu Wu, Yuxuan Wang, Li Zhang, Jianmin Wang, Mingsheng Long

Figure 1 for Metadata Matters for Time Series: Informative Forecasting with Transformers

Figure 2 for Metadata Matters for Time Series: Informative Forecasting with Transformers

Figure 3 for Metadata Matters for Time Series: Informative Forecasting with Transformers

Figure 4 for Metadata Matters for Time Series: Informative Forecasting with Transformers

Abstract:Time series forecasting is prevalent in extensive real-world applications, such as financial analysis and energy planning. Previous studies primarily focus on time series modality, endeavoring to capture the intricate variations and dependencies inherent in time series. Beyond numerical time series data, we notice that metadata (e.g.~dataset and variate descriptions) also carries valuable information essential for forecasting, which can be used to identify the application scenario and provide more interpretable knowledge than digit sequences. Inspired by this observation, we propose a Metadata-informed Time Series Transformer (MetaTST), which incorporates multiple levels of context-specific metadata into Transformer forecasting models to enable informative time series forecasting. To tackle the unstructured nature of metadata, MetaTST formalizes them into natural languages by pre-designed templates and leverages large language models (LLMs) to encode these texts into metadata tokens as a supplement to classic series tokens, resulting in an informative embedding. Further, a Transformer encoder is employed to communicate series and metadata tokens, which can extend series representations by metadata information for more accurate forecasting. This design also allows the model to adaptively learn context-specific patterns across various scenarios, which is particularly effective in handling large-scale, diverse-scenario forecasting tasks. Experimentally, MetaTST achieves state-of-the-art compared to advanced time series models and LLM-based methods on widely acknowledged short- and long-term forecasting benchmarks, covering both single-dataset individual and multi-dataset joint training settings.

Via

Access Paper or Ask Questions

Deep Time Series Models: A Comprehensive Survey and Benchmark

Jul 18, 2024

Yuxuan Wang, Haixu Wu, Jiaxiang Dong, Yong Liu, Mingsheng Long, Jianmin Wang

Figure 1 for Deep Time Series Models: A Comprehensive Survey and Benchmark

Figure 2 for Deep Time Series Models: A Comprehensive Survey and Benchmark

Figure 3 for Deep Time Series Models: A Comprehensive Survey and Benchmark

Figure 4 for Deep Time Series Models: A Comprehensive Survey and Benchmark

Abstract:Time series, characterized by a sequence of data points arranged in a discrete-time order, are ubiquitous in real-world applications. Different from other modalities, time series present unique challenges due to their complex and dynamic nature, including the entanglement of nonlinear patterns and time-variant trends. Analyzing time series data is of great significance in real-world scenarios and has been widely studied over centuries. Recent years have witnessed remarkable breakthroughs in the time series community, with techniques shifting from traditional statistical methods to advanced deep learning models. In this paper, we delve into the design of deep time series models across various analysis tasks and review the existing literature from two perspectives: basic modules and model architectures. Further, we develop and release Time Series Library (TSLib) as a fair benchmark of deep time series models for diverse analysis tasks, which implements 24 mainstream models, covers 30 datasets from different domains, and supports five prevalent analysis tasks. Based on TSLib, we thoroughly evaluate 12 advanced deep time series models on different tasks. Empirical results indicate that models with specific structures are well-suited for distinct analytical tasks, which offers insights for research and adoption of deep time series models. Code is available at https://github.com/thuml/Time-Series-Library.

* \

Via

Access Paper or Ask Questions

Unisolver: PDE-Conditional Transformers Are Universal PDE Solvers

May 27, 2024

Zhou Hang, Yuezhou Ma, Haixu Wu, Haowen Wang, Mingsheng Long

Figure 1 for Unisolver: PDE-Conditional Transformers Are Universal PDE Solvers

Figure 2 for Unisolver: PDE-Conditional Transformers Are Universal PDE Solvers

Figure 3 for Unisolver: PDE-Conditional Transformers Are Universal PDE Solvers

Figure 4 for Unisolver: PDE-Conditional Transformers Are Universal PDE Solvers

Abstract:Deep models have recently emerged as a promising tool to solve partial differential equations (PDEs), known as neural PDE solvers. While neural solvers trained from either simulation data or physics-informed loss can solve the PDEs reasonably well, they are mainly restricted to a specific set of PDEs, e.g. a certain equation or a finite set of coefficients. This bottleneck limits the generalizability of neural solvers, which is widely recognized as its major advantage over numerical solvers. In this paper, we present the Universal PDE solver (Unisolver) capable of solving a wide scope of PDEs by leveraging a Transformer pre-trained on diverse data and conditioned on diverse PDEs. Instead of simply scaling up data and parameters, Unisolver stems from the theoretical analysis of the PDE-solving process. Our key finding is that a PDE solution is fundamentally under the control of a series of PDE components, e.g. equation symbols, coefficients, and initial and boundary conditions. Inspired by the mathematical structure of PDEs, we define a complete set of PDE components and correspondingly embed them as domain-wise (e.g. equation symbols) and point-wise (e.g. boundaries) conditions for Transformer PDE solvers. Integrating physical insights with recent Transformer advances, Unisolver achieves consistent state-of-the-art results on three challenging large-scale benchmarks, showing impressive gains and endowing favorable generalizability and scalability.

Via

Access Paper or Ask Questions

RoPINN: Region Optimized Physics-Informed Neural Networks

May 23, 2024

Haixu Wu, Huakun Luo, Yuezhou Ma, Jianmin Wang, Mingsheng Long

Abstract:Physics-informed neural networks (PINNs) have been widely applied to solve partial differential equations (PDEs) by enforcing outputs and gradients of deep models to satisfy target equations. Due to the limitation of numerical computation, PINNs are conventionally optimized on finite selected points. However, since PDEs are usually defined on continuous domains, solely optimizing models on scattered points may be insufficient to obtain an accurate solution for the whole domain. To mitigate this inherent deficiency of the default scatter-point optimization, this paper proposes and theoretically studies a new training paradigm as region optimization. Concretely, we propose to extend the optimization process of PINNs from isolated points to their continuous neighborhood regions, which can theoretically decrease the generalization error, especially for hidden high-order constraints of PDEs. A practical training algorithm, Region Optimized PINN (RoPINN), is seamlessly derived from this new paradigm, which is implemented by a straightforward but effective Monte Carlo sampling method. By calibrating the sampling process into trust regions, RoPINN finely balances sampling efficiency and generalization error. Experimentally, RoPINN consistently boosts the performance of diverse PINNs on a wide range of PDEs without extra backpropagation or gradient calculation.

Via

Access Paper or Ask Questions

TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting

May 23, 2024

Shiyu Wang, Haixu Wu, Xiaoming Shi, Tengge Hu, Huakun Luo, Lintao Ma, James Y. Zhang, Jun Zhou

Figure 1 for TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting

Figure 2 for TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting

Figure 3 for TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting

Figure 4 for TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting

Abstract:Time series forecasting is widely used in extensive applications, such as traffic planning and weather forecasting. However, real-world time series usually present intricate temporal variations, making forecasting extremely challenging. Going beyond the mainstream paradigms of plain decomposition and multiperiodicity analysis, we analyze temporal variations in a novel view of multiscale-mixing, which is based on an intuitive but important observation that time series present distinct patterns in different sampling scales. The microscopic and the macroscopic information are reflected in fine and coarse scales respectively, and thereby complex variations can be inherently disentangled. Based on this observation, we propose TimeMixer as a fully MLP-based architecture with Past-Decomposable-Mixing (PDM) and Future-Multipredictor-Mixing (FMM) blocks to take full advantage of disentangled multiscale series in both past extraction and future prediction phases. Concretely, PDM applies the decomposition to multiscale series and further mixes the decomposed seasonal and trend components in fine-to-coarse and coarse-to-fine directions separately, which successively aggregates the microscopic seasonal and macroscopic trend information. FMM further ensembles multiple predictors to utilize complementary forecasting capabilities in multiscale observations. Consequently, TimeMixer is able to achieve consistent state-of-the-art performances in both long-term and short-term forecasting tasks with favorable run-time efficiency.

Via

Access Paper or Ask Questions