Tianze Yang

OUSAC: Optimized Guidance Scheduling with Adaptive Caching for DiT Acceleration

Dec 16, 2025

Ruitong Sun, Tianze Yang, Wei Niu, Jin Sun

Abstract: Diffusion models have emerged as the dominant paradigm for high-quality image generation, yet their computational expense remains substantial due to iterative denoising. Classifier-Free Guidance (CFG) significantly enhances generation quality and controllability but doubles the computation by requiring both conditional and unconditional forward passes at every timestep. We present OUSAC (Optimized gUidance Scheduling with Adaptive Caching), a framework that accelerates diffusion transformers (DiT) through systematic optimization. Our key insight is that variable guidance scales enable sparse computation: adjusting scales at certain timesteps can compensate for skipping CFG at others, enabling both fewer total sampling steps and fewer CFG steps while maintaining quality. However, variable guidance patterns introduce denoising deviations that undermine standard caching methods, which assume constant CFG scales across steps. Moreover, different transformer blocks are affected at different levels under dynamic conditions. This paper develops a two-stage approach leveraging these insights. Stage-1 employs evolutionary algorithms to jointly optimize which timesteps to skip and what guidance scale to use, eliminating up to 82% of unconditional passes. Stage-2 introduces adaptive rank allocation that tailors calibration efforts per transformer block, maintaining caching effectiveness under variable guidance. Experiments demonstrate that OUSAC significantly outperforms state-of-the-art acceleration methods, achieving 53% computational savings with 15% quality improvement on DiT-XL/2 (ImageNet 512x512), 60% savings with 16.1% improvement on PixArt-alpha (MSCOCO), and 5x speedup on FLUX while improving CLIP Score over the 50-step baseline.

* 29 pages

From Task-Specific Models to Unified Systems: A Review of Model Merging Approaches

Mar 12, 2025

Wei Ruan, Tianze Yang, Yifan Zhou, Tianming Liu, Jin Lu

Abstract: Model merging has achieved significant success, with numerous innovative methods proposed to enhance capabilities by combining multiple models. However, challenges persist due to the lack of a unified framework for classification and systematic comparative analysis, leading to inconsistencies in terminologies and categorizations. Meanwhile, as an increasing number of fine-tuned models are publicly available, their original training data often remain inaccessible due to privacy concerns or intellectual property restrictions. This makes traditional multi-task learning based on shared training data impractical. In scenarios where direct access to training data is infeasible, merging model parameters to create a unified model with broad generalization across multiple domains becomes crucial, further underscoring the importance of model merging techniques. Despite the rapid progress in this field, a comprehensive taxonomy and survey summarizing recent advances and predicting future directions are still lacking. This paper addresses these gaps by establishing a new taxonomy of model merging methods, systematically comparing different approaches, and providing an overview of key developments. By offering a structured perspective on this evolving area, we aim to help newcomers quickly grasp the field's landscape and inspire further innovations.

* 9 pages, 3 figures

SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering?

Feb 18, 2025

Yucheng Shi, Tianze Yang, Canyu Chen, Quanzheng Li, Tianming Liu, Xiang Li, Ninghao Liu

Abstract: Large Language Models (LLMs) have shown remarkable capabilities in general domains but often struggle with tasks requiring specialized knowledge. Conventional Retrieval-Augmented Generation (RAG) techniques typically retrieve external information from static knowledge bases, which can be outdated or incomplete, missing fine-grained clinical details essential for accurate medical question answering. In this work, we propose SearchRAG, a novel framework that overcomes these limitations by leveraging real-time search engines. Our method employs synthetic query generation to convert complex medical questions into search-engine-friendly queries and utilizes uncertainty-based knowledge selection to filter and incorporate the most relevant and informative medical knowledge into the LLM's input. Experimental results demonstrate that our method significantly improves response accuracy in medical question answering tasks, particularly for complex questions requiring detailed and up-to-date knowledge.

* 8 pages, three figures

Large Language Models for Manufacturing

Oct 28, 2024

Yiwei Li, Huaqin Zhao, Hanqi Jiang, Yi Pan, Zhengliang Liu, Zihao Wu, Peng Shu, Jie Tian, Tianze Yang, Shaochen Xu (+14 more)

Abstract: The rapid advances in Large Language Models (LLMs) have the potential to transform the manufacturing industry, offering new opportunities to optimize processes, improve efficiency, and drive innovation. This paper provides a comprehensive exploration of the integration of LLMs into the manufacturing domain, focusing on their potential to automate and enhance various aspects of manufacturing, from product design and development to quality control, supply chain optimization, and talent management. Through extensive evaluations across multiple manufacturing tasks, we demonstrate the remarkable capabilities of state-of-the-art LLMs, such as GPT-4V, in understanding and executing complex instructions, extracting valuable insights from vast amounts of data, and facilitating knowledge sharing. We also delve into the transformative potential of LLMs in reshaping manufacturing education, automating coding processes, enhancing robot control systems, and enabling the creation of immersive, data-rich virtual environments through the industrial metaverse. By highlighting the practical applications and emerging use cases of LLMs in manufacturing, this paper aims to provide a valuable resource for professionals, researchers, and decision-makers seeking to harness the power of these technologies to address real-world challenges, drive operational excellence, and unlock sustainable growth in an increasingly competitive landscape.

ECHOPulse: ECG controlled echocardio-grams video generation

Oct 04, 2024

Yiwei Li, Sekeun Kim, Zihao Wu, Hanqi Jiang, Yi Pan, Pengfei Jin, Sifan Song, Yucheng Shi, Tianze Yang, Tianming Liu (+2 more)

Abstract: Echocardiography (ECHO) is essential for cardiac assessments, but its video quality and interpretation rely heavily on manual expertise, leading to inconsistent results from clinical and portable devices. ECHO video generation offers a solution by improving automated monitoring through synthetic data and generating high-quality videos from routine health data. However, existing models often face high computational costs, slow inference, and rely on complex conditional prompts that require experts' annotations. To address these challenges, we propose ECHOPULSE, an ECG-conditioned ECHO video generation model. ECHOPULSE introduces two key advancements: (1) it accelerates ECHO video generation by leveraging VQ-VAE tokenization and masked visual token modeling for fast decoding, and (2) it conditions on readily accessible ECG signals, which are highly coherent with ECHO videos, bypassing complex conditional prompts. To the best of our knowledge, this is the first work to use time-series prompts like ECG signals for ECHO video generation. ECHOPULSE not only enables controllable synthetic ECHO data generation but also provides updated cardiac function information for disease monitoring and prediction beyond ECG alone. Evaluations on three public and private datasets demonstrate state-of-the-art performance in ECHO video generation across both qualitative and quantitative measures. Additionally, ECHOPULSE can be easily generalized to other modality generation tasks, such as cardiac MRI, fMRI, and 3D CT generation. A demo can be seen at https://github.com/levyisthebest/ECHOPulse_Prelease.

ALDM-Grasping: Diffusion-aided Zero-Shot Sim-to-Real Transfer for Robot Grasping

Mar 18, 2024

Yiwei Li, Zihao Wu, Huaqin Zhao, Tianze Yang, Zhengliang Liu, Peng Shu, Jin Sun, Ramviyas Parasuraman, Tianming Liu

Abstract: To tackle the "reality gap" encountered in Sim-to-Real transfer, this study proposes a diffusion-based framework that minimizes inconsistencies in grasping actions between the simulation settings and realistic environments. The process begins by training an adversarial supervision layout-to-image diffusion model (ALDM). The ALDM is then used to enhance the simulation environment, rendering it with photorealistic fidelity and thereby optimizing robotic grasp task training. Experimental results indicate that this framework outperforms existing models in both success rates and adaptability to new environments through improvements in the accuracy and reliability of visual grasping actions under a variety of conditions. Specifically, it achieves a 75% success rate in grasping tasks under plain backgrounds and maintains a 65% success rate in more complex scenarios. This performance demonstrates that the framework excels at generating controlled image content based on text descriptions, identifying object grasp points, and demonstrating zero-shot learning in complex, unseen scenarios.

ICE-SEARCH: A Language Model-Driven Feature Selection Approach

Mar 09, 2024

Tianze Yang, Tianyi Yang, Shaoshan Liu, Fuyuan Lvu, Xue Liu

Abstract: This study unveils the In-Context Evolutionary Search (ICE-SEARCH) method, the first work that melds language models (LMs) with evolutionary algorithms for feature selection (FS) tasks and demonstrates its effectiveness in Medical Predictive Analytics (MPA) applications. ICE-SEARCH harnesses the crossover and mutation capabilities inherent in LMs within an evolutionary framework, significantly improving FS through the model's comprehensive world knowledge and its adaptability to a variety of roles. Our evaluation of this methodology spans three crucial MPA tasks: stroke, cardiovascular disease, and diabetes, where ICE-SEARCH outperforms traditional FS methods in pinpointing essential features for medical applications. ICE-SEARCH achieves State-of-the-Art (SOTA) performance in stroke prediction and diabetes prediction; the Decision-Randomized ICE-SEARCH ranks as SOTA in cardiovascular disease prediction. Our results not only demonstrate the efficacy of ICE-SEARCH in medical FS but also underscore the versatility, efficiency, and scalability of integrating LMs in FS tasks. The study emphasizes the critical role of incorporating domain-specific insights, illustrating ICE-SEARCH's robustness, generalizability, and swift convergence. This opens avenues for further research into comprehensive and intricate FS landscapes, marking a significant stride in the application of artificial intelligence in medical predictive analytics.

Revolutionizing Finance with LLMs: An Overview of Applications and Insights

Jan 22, 2024

Huaqin Zhao, Zhengliang Liu, Zihao Wu, Yiwei Li, Tianze Yang, Peng Shu, Shaochen Xu, Haixing Dai, Lin Zhao, Gengchen Mai (+2 more)

Abstract: In recent years, Large Language Models (LLMs) like ChatGPT have seen considerable advancements and have been applied in diverse fields. Built on the Transformer architecture, these models are trained on extensive datasets, enabling them to understand and generate human language effectively. In the financial domain, the deployment of LLMs is gaining momentum. These models are being utilized for automating financial report generation, forecasting market trends, analyzing investor sentiment, and offering personalized financial advice. Leveraging their natural language processing capabilities, LLMs can distill key insights from vast financial data, aiding institutions in making informed investment choices and enhancing both operational efficiency and customer satisfaction. In this study, we provide a comprehensive overview of the emerging integration of LLMs into various financial tasks. Additionally, we conducted holistic tests on multiple financial tasks through the combination of natural language instructions. Our findings show that GPT-4 effectively follows prompt instructions across various financial tasks. This survey and evaluation of LLMs in the financial domain aim to deepen the understanding of LLMs' current role in finance for both financial practitioners and LLM researchers, identify new research and application prospects, and highlight how these technologies can be leveraged to solve practical challenges in the finance industry.

Transformation vs Tradition: Artificial General Intelligence (AGI) for Arts and Humanities

Oct 30, 2023

Zhengliang Liu, Yiwei Li, Qian Cao, Junwen Chen, Tianze Yang, Zihao Wu, John Hale, John Gibbs, Khaled Rasheed, Ninghao Liu (+2 more)

Abstract: Recent advances in artificial general intelligence (AGI), particularly large language models and creative image generation systems, have demonstrated impressive capabilities on diverse tasks spanning the arts and humanities. However, the swift evolution of AGI has also raised critical questions about its responsible deployment in these culturally significant domains traditionally seen as profoundly human. This paper provides a comprehensive analysis of the applications and implications of AGI for text, graphics, audio, and video pertaining to arts and the humanities. We survey cutting-edge systems and their usage in areas ranging from poetry to history, marketing to film, and communication to classical art. We outline substantial concerns pertaining to factuality, toxicity, biases, and public safety in AGI systems, and propose mitigation strategies. The paper argues for multi-stakeholder collaboration to ensure AGI promotes creativity, knowledge, and cultural values without undermining truth or human dignity. Our timely contribution summarizes a rapidly developing field, highlighting promising directions while advocating for responsible progress centering on human flourishing. The analysis lays the groundwork for further research on aligning AGI's technological capacities with enduring social goods.

ALPHA: Attention-based Long-horizon Pathfinding in Highly-structured Areas

Oct 12, 2023

Chengyang He, Tianze Yang, Tanishq Duhan, Yutong Wang, Guillaume Sartoretti

Abstract: The multi-agent pathfinding (MAPF) problem seeks collision-free paths for a team of agents from their current positions to their pre-set goals in a known environment, and is an essential problem found at the core of many logistics, transportation, and general robotics applications. Existing learning-based MAPF approaches typically only let each agent make decisions based on a limited field-of-view (FOV) around its position, as a natural means to fix the input dimensions of its policy network. However, this often makes policies short-sighted, since agents lack the ability to perceive and plan for obstacles/agents beyond their FOV. To address this challenge, we propose ALPHA, a new framework combining the use of ground truth proximal (local) information and fuzzy distal (global) information to let agents sequence local decisions based on the full current state of the system, and avoid such myopia. We further allow agents to make short-term predictions about each other's paths, as a means to reason about each other's path intentions, thereby enhancing the level of cooperation among agents at the whole system level. Our neural structure relies on a Graph Transformer architecture to allow agents to selectively combine these different sources of information and reason about their inter-dependencies at different spatial scales. Our simulation experiments demonstrate that ALPHA outperforms both globally-guided MAPF solvers and communication-learning based ones, showcasing its potential towards scalability in realistic deployments.

* Submitted to the IEEE International Conference on Robotics and Automation (ICRA 2024)
