Guannan Zhang

Unify Graph Learning with Text: Unleashing LLM Potentials for Session Search

May 20, 2025

Songhao Wu, Quan Tu, Hong Liu, Jia Xu, Zhongyi Liu, Guannan Zhang, Ran Wang, Xiuying Chen, Rui Yan

Abstract:Session search involves a series of interactive queries and actions to fulfill a user's complex information need. Current strategies typically prioritize sequential modeling for deep semantic understanding, overlooking the graph structure in interactions. While some approaches focus on capturing structural information, they use a generalized representation for documents, neglecting word-level semantic modeling. In this paper, we propose Symbolic Graph Ranker (SGR), which aims to take advantage of both text-based and graph-based approaches by leveraging the power of recent Large Language Models (LLMs). Concretely, we first introduce a set of symbolic grammar rules to convert the session graph into text. This allows integrating session history, interaction process, and task instruction seamlessly as inputs for the LLM. Moreover, given the natural discrepancy between LLMs pre-trained on textual corpora and the symbolic language we produce using our graph-to-text grammar, our objective is to enhance LLMs' ability to capture graph structures within a textual format. To achieve this, we introduce a set of self-supervised symbolic learning tasks, including link prediction, node content generation, and generative contrastive learning, to enable LLMs to capture the topological information from coarse-grained to fine-grained. Experimental results and comprehensive analysis on two benchmark datasets, AOL and Tiangong-ST, confirm the superiority of our approach. Our paradigm also offers a novel and effective methodology that bridges the gap between traditional search strategies and modern LLMs.

Multi-fidelity Parameter Estimation Using Conditional Diffusion Models

Apr 02, 2025

Caroline Tatsuoka, Minglei Yang, Dongbin Xiu, Guannan Zhang

Abstract:We present a multi-fidelity method for uncertainty quantification of parameter estimates in complex systems, leveraging generative models trained to sample the target conditional distribution. In the Bayesian inference setting, traditional parameter estimation methods rely on repeated simulations of potentially expensive forward models to determine the posterior distribution of the parameter values, which may result in computationally intractable workflows. Furthermore, methods such as Markov Chain Monte Carlo (MCMC) necessitate rerunning the entire algorithm for each new data observation, further increasing the computational burden. Hence, we propose a novel method for efficiently obtaining posterior distributions of parameter estimates for high-fidelity models given data observations of interest. The method first constructs a low-fidelity, conditional generative model capable of amortized Bayesian inference and hence rapid posterior density approximation over a wide range of data observations. When higher accuracy is needed for a specific data observation, the method employs adaptive refinement of the density approximation. It uses outputs from the low-fidelity generative model to refine the parameter sampling space, ensuring efficient use of the computationally expensive high-fidelity solver. Subsequently, a high-fidelity, unconditional generative model is trained to achieve greater accuracy in the target posterior distribution. Both low- and high-fidelity generative models enable efficient sampling from the target posterior and do not require repeated simulation of the high-fidelity forward model. We demonstrate the effectiveness of the proposed method on several numerical examples, including cases with multi-modal densities, as well as an application in plasma physics for a runaway electron simulation model.

MoDULA: Mixture of Domain-Specific and Universal LoRA for Multi-Task Learning

Dec 10, 2024

Yufei Ma, Zihan Liang, Huangyu Dai, Ben Chen, Dehong Gao, Zhuoran Ran, Wang Zihan, Linbo Jin, Wen Jiang, Guannan Zhang (+2 more)

Abstract:The growing demand for larger-scale models in the development of Large Language Models (LLMs) poses challenges for efficient training within limited computational resources. Traditional fine-tuning methods often exhibit instability in multi-task learning and rely heavily on extensive training resources. Here, we propose MoDULA (Mixture of Domain-Specific and Universal LoRA), a novel Parameter Efficient Fine-Tuning (PEFT) Mixture-of-Expert (MoE) paradigm for improved fine-tuning and parameter efficiency in multi-task learning. The paradigm effectively improves the multi-task capability of the model by training universal experts, domain-specific experts, and routers separately. MoDULA-Res is a new method within the MoDULA paradigm, which maintains the model's general capability by connecting universal and task-specific experts through residual connections. The experimental results demonstrate that the overall performance of the MoDULA-Flan and MoDULA-Res methods surpasses that of existing fine-tuning methods on various LLMs. Notably, MoDULA-Res achieves more significant performance improvements in multiple tasks while reducing training costs by over 80% without losing general capability. Moreover, MoDULA displays flexible pluggability, allowing for the efficient addition of new tasks without retraining existing experts from scratch. This progressive training paradigm circumvents data balancing issues, enhancing training efficiency and model stability. Overall, MoDULA provides a scalable, cost-effective solution for fine-tuning LLMs with enhanced parameter efficiency and generalization capability.

GenAI4UQ: A Software for Inverse Uncertainty Quantification Using Conditional Generative Models

Dec 09, 2024

Ming Fan, Zezhong Zhang, Dan Lu, Guannan Zhang

Abstract:We introduce GenAI4UQ, a software package for inverse uncertainty quantification in model calibration, parameter estimation, and ensemble forecasting in scientific applications. GenAI4UQ leverages a generative artificial intelligence (AI) based conditional modeling framework to address the limitations of traditional inverse modeling techniques, such as Markov Chain Monte Carlo methods. By replacing computationally intensive iterative processes with a direct, learned mapping, GenAI4UQ enables efficient calibration of model input parameters and generation of output predictions directly from observations. The software's design allows for rapid ensemble forecasting with robust uncertainty quantification, while maintaining high computational and storage efficiency. GenAI4UQ simplifies the model training process through built-in auto-tuning of hyperparameters, making it accessible to users with varying levels of expertise. Its conditional generative framework ensures versatility, enabling applicability across a wide range of scientific domains. At its core, GenAI4UQ transforms the paradigm of inverse modeling by providing a fast, reliable, and user-friendly solution. It empowers researchers and practitioners to quickly estimate parameter distributions and generate model predictions for new observations, facilitating efficient decision-making and advancing the state of uncertainty quantification in computational modeling. (The code and data are available at https://github.com/patrickfan/GenAI4UQ).

An End-to-End Deep Learning Method for Solving Nonlocal Allen-Cahn and Cahn-Hilliard Phase-Field Models

Oct 11, 2024

Yuwei Geng, Olena Burkovska, Lili Ju, Guannan Zhang, Max Gunzburger

Abstract:We propose an efficient end-to-end deep learning method for solving nonlocal Allen-Cahn (AC) and Cahn-Hilliard (CH) phase-field models. One motivation for this effort emanates from the fact that discretized partial differential equation-based AC or CH phase-field models result in diffuse interfaces between phases, and the only recourse for remediation is to severely refine the spatial grids in the vicinity of the true moving sharp interface, whose width is determined by a grid-independent parameter that is substantially larger than the local grid size. In this work, we introduce non-mass conserving nonlocal AC or CH phase-field models with regular, logarithmic, or obstacle double-well potentials. Because of non-locality, some of these models feature totally sharp interfaces separating phases. The discretization of such models can lead to a transition between phases that is only a single grid cell wide. Another motivation is to use deep learning approaches to ameliorate the otherwise high cost of solving discretized nonlocal phase-field models. To this end, loss functions of the customized neural networks are defined using the residual of the fully discrete approximations of the AC or CH models, which results from applying a Fourier collocation method and a temporal semi-implicit approximation. To address the long-range interactions in the models, we tailor the architecture of the neural network by incorporating a nonlocal kernel as an input channel to the neural network model. We then provide the results of extensive computational experiments to illustrate the accuracy, structure-preserving properties, predictive capabilities, and cost reductions of the proposed method.

A Training-Free Conditional Diffusion Model for Learning Stochastic Dynamical Systems

Oct 04, 2024

Yanfang Liu, Yuan Chen, Dongbin Xiu, Guannan Zhang

Abstract:This study introduces a training-free conditional diffusion model for learning unknown stochastic differential equations (SDEs) using data. The proposed approach addresses key challenges in computational efficiency and accuracy for modeling SDEs by utilizing a score-based diffusion model to approximate their stochastic flow map. Unlike existing methods, this technique is based on an analytically derived closed-form exact score function, which can be efficiently estimated by a Monte Carlo method using the trajectory data, and eliminates the need for neural network training to learn the score function. By generating labeled data through solving the corresponding reverse ordinary differential equation, the approach enables supervised learning of the flow map. Extensive numerical experiments across various SDE types, including linear, nonlinear, and multi-dimensional systems, demonstrate the versatility and effectiveness of the method. The learned models exhibit significant improvements in predicting both short-term and long-term behaviors of unknown stochastic systems, often surpassing baseline methods like GANs in estimating drift and diffusion coefficients.
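
To make the idea of a closed-form score estimated from samples more concrete, here is a minimal one-dimensional sketch. It is not the paper's conditional estimator, whose exact form and noise schedule are not stated in the abstract. It uses the standard identity for an empirical distribution perturbed by Gaussian noise: if x_t = alpha * x_0 + sigma * eps, the score of the perturbed density is a softmax-weighted average of (alpha * x_i - x) / sigma^2 over the data points x_i. The function name and the parameters `alpha` and `sigma` are assumptions made for illustration.

```python
import math


def empirical_score(x, samples, alpha, sigma):
    """Score d/dx log p_t(x) for p_t = (1/N) * sum_i Normal(alpha * x_i, sigma^2).

    No network is trained: the score is computed directly from the samples.
    """
    # Unnormalized log-weight of each sample's Gaussian component at x.
    logits = [-((x - alpha * s) ** 2) / (2.0 * sigma ** 2) for s in samples]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    # Weighted average of the per-component Gaussian scores.
    weighted = sum(w * (alpha * s - x) for w, s in zip(weights, samples))
    return weighted / (total * sigma ** 2)


# With a single data point the result reduces to the Gaussian score.
single = empirical_score(0.5, [2.0], alpha=0.8, sigma=0.5)
# With symmetric data the score vanishes at the midpoint.
symmetric = empirical_score(0.0, [-1.0, 1.0], alpha=0.8, sigma=0.5)
print(single, symmetric)
```

In a full method, a score of this kind would drive the reverse-time ODE that produces the labeled data mentioned in the abstract; that solver is omitted here.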

Nonuniform random feature models using derivative information

Oct 03, 2024

Konstantin Pieper, Zezhong Zhang, Guannan Zhang

Abstract:We propose nonuniform data-driven parameter distributions for neural network initialization based on derivative data of the function to be approximated. These parameter distributions are developed in the context of non-parametric regression models based on shallow neural networks, and compare favorably to well-established uniform random feature models based on conventional weight initialization. We address the cases of Heaviside and ReLU activation functions, and their smooth approximations (sigmoid and softplus), and use recent results on the harmonic analysis and sparse representation of neural networks resulting from fully trained optimal networks. Extending analytic results that give exact representation, we obtain densities that concentrate in regions of the parameter space corresponding to neurons that are well suited to model the local derivatives of the unknown function. Based on these results, we suggest simplifications of these exact densities based on approximate derivative data in the input points that allow for very efficient sampling and lead to performance of random feature models close to optimal networks in several scenarios.

MLoRA: Multi-Domain Low-Rank Adaptive Network for CTR Prediction

Aug 14, 2024

Zhiming Yang, Haining Gao, Dehong Gao, Luwei Yang, Libin Yang, Xiaoyan Cai, Wei Ning, Guannan Zhang

Abstract:Click-through rate (CTR) prediction is one of the fundamental tasks in the industry, especially in e-commerce, social media, and streaming media. It directly impacts website revenues, user satisfaction, and user retention. However, real-world production platforms often encompass various domains to cater for diverse customer needs. Traditional CTR prediction models struggle in multi-domain recommendation scenarios, facing challenges of data sparsity and disparate data distributions across domains. Existing multi-domain recommendation approaches introduce specific-domain modules for each domain, which partially address these issues but often significantly increase model parameters and lead to insufficient training. In this paper, we propose a Multi-domain Low-Rank Adaptive network (MLoRA) for CTR prediction, where we introduce a specialized LoRA module for each domain. This approach enhances the model's performance in multi-domain CTR prediction tasks and can be applied to various deep-learning models. We evaluate the proposed method on several multi-domain datasets. Experimental results demonstrate that our MLoRA approach achieves a significant improvement compared with state-of-the-art baselines. Furthermore, we deploy it in the production environment of Alibaba.COM. The online A/B testing results indicate its superiority and flexibility in real-world production environments. The code of our MLoRA is publicly available.

* 11 pages. Accepted by RecSys'2024, full paper
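
The per-domain LoRA idea in the MLoRA abstract above can be sketched as a shared linear layer plus one low-rank update per domain. The abstract does not say how the modules are attached or trained, so this follows the standard LoRA form y = W x + scale * B_d (A_d x); the class name, the `scale` parameter, and the domain names are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


class MultiDomainLoRALinear:
    """Shared weight W plus a low-rank adapter (B_d, A_d) for each domain d."""

    def __init__(self, weight, adapters, scale=1.0):
        self.weight = weight      # shared W, shape (out, in)
        self.adapters = adapters  # domain -> (B, A), B: (out, r), A: (r, in)
        self.scale = scale

    def forward(self, x, domain):
        base = matvec(self.weight, x)
        b_mat, a_mat = self.adapters[domain]
        # Project down to rank r with A, then back up with B.
        delta = matvec(b_mat, matvec(a_mat, x))
        return [b + self.scale * d for b, d in zip(base, delta)]


layer = MultiDomainLoRALinear(
    weight=[[1.0, 0.0], [0.0, 1.0]],
    adapters={
        "ads": ([[1.0], [0.0]], [[1.0, 1.0]]),     # rank-1 adapter
        "search": ([[0.0], [0.0]], [[1.0, 1.0]]),  # zero B: no change to W
    },
)
ads_out = layer.forward([1.0, 2.0], "ads")
search_out = layer.forward([1.0, 2.0], "search")
print(ads_out, search_out)
```

Each adapter adds only r * (in + out) parameters per domain, which is the usual argument for low-rank updates over full domain-specific modules.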

Modeling User Intent Beyond Trigger: Incorporating Uncertainty for Trigger-Induced Recommendation

Aug 07, 2024

Jianxing Ma, Zhibo Xiao, Luwei Yang, Hansheng Xue, Xuanzhou Liu, Wen Jiang, Wei Ning, Guannan Zhang

Abstract:To cater to users' desire for an immersive browsing experience, numerous e-commerce platforms provide various recommendation scenarios, with a focus on Trigger-Induced Recommendation (TIR) tasks. However, the majority of current TIR methods heavily rely on the trigger item to understand user intent, lacking a higher-level exploration and exploitation of user intent (e.g., popular items and complementary items), which may result in an overly convergent understanding of users' short-term intent and can be detrimental to users' long-term purchasing experiences. Moreover, users' short-term intent shows uncertainty and is affected by various factors such as browsing context and historical behaviors, which poses challenges to user intent modeling. To address these challenges, we propose a novel model called Deep Uncertainty Intent Network (DUIN), comprising three essential modules: i) Explicit Intent Exploit Module extracting explicit user intent using the contrastive learning paradigm; ii) Latent Intent Explore Module exploring latent user intent by leveraging the multi-view relationships between items; iii) Intent Uncertainty Measurement Module offering a distributional estimation and capturing the uncertainty associated with user intent. Experiments on three real-world datasets demonstrate the superior performance of DUIN compared to existing baselines. Notably, DUIN has been deployed across all TIR scenarios in our e-commerce platform, with online A/B testing results conclusively validating its superiority.

* Accepted at CIKM 2024

A Scalable Real-Time Data Assimilation Framework for Predicting Turbulent Atmosphere Dynamics

Jul 16, 2024

Junqi Yin, Siming Liang, Siyan Liu, Feng Bao, Hristo G. Chipilski, Dan Lu, Guannan Zhang

Abstract:The weather and climate domains are undergoing a significant transformation thanks to advances in AI-based foundation models such as FourCastNet, GraphCast, ClimaX and Pangu-Weather. While these models show considerable potential, they are not ready yet for operational use in weather forecasting or climate prediction. This is due to the lack of a data assimilation method as part of their workflow to enable the assimilation of incoming Earth system observations in real time. This limitation affects their effectiveness in predicting complex atmospheric phenomena such as tropical cyclones and atmospheric rivers. To overcome these obstacles, we introduce a generic real-time data assimilation framework and demonstrate its end-to-end performance on the Frontier supercomputer. This framework comprises two primary modules: an ensemble score filter (EnSF), which significantly outperforms the state-of-the-art data assimilation method, namely, the Local Ensemble Transform Kalman Filter (LETKF); and a vision transformer-based surrogate capable of real-time adaptation through the integration of observational data. The ViT surrogate can represent either physics-based models or AI-based foundation models. We demonstrate both the strong and weak scaling of our framework up to 1024 GPUs on the Exascale supercomputer, Frontier. Our results not only illustrate the framework's exceptional scalability on high-performance computing systems, but also demonstrate the importance of supercomputers in real-time data assimilation for weather and climate predictions. Even though the proposed framework is tested only on a benchmark surface quasi-geostrophic (SQG) turbulence system, it has the potential to be combined with existing AI-based foundation models, making it suitable for future operational implementations.
