Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yong Huang

NTIRE 2025 Challenge on HR Depth from Images of Specular and Transparent Surfaces

Jun 06, 2025

Pierluigi Zama Ramirez, Fabio Tosi, Luigi Di Stefano, Radu Timofte, Alex Costanzino, Matteo Poggi, Samuele Salti, Stefano Mattoccia, Zhe Zhang, Yang Yang(+29 more)

Abstract:This paper reports on the NTIRE 2025 challenge on HR Depth From images of Specular and Transparent surfaces, held in conjunction with the New Trends in Image Restoration and Enhancement (NTIRE) workshop at CVPR 2025. This challenge aims to advance the research on depth estimation, specifically to address two of the main open issues in the field: high-resolution and non-Lambertian surfaces. The challenge proposes two tracks on stereo and single-image depth estimation, attracting about 177 registered participants. In the final testing stage, 4 and 4 participating teams submitted their models and fact sheets for the two tracks.

* NTIRE Workshop Challenge Report, CVPR 2025

Via

Access Paper or Ask Questions

SDIGLM: Leveraging Large Language Models and Multi-Modal Chain of Thought for Structural Damage Identification

Apr 12, 2025

Yunkai Zhang, Shiyin Wei, Yong Huang, Yawu Su, Shanshan Lu, Hui Li

Abstract:Existing computer vision(CV)-based structural damage identification models demonstrate notable accuracy in categorizing and localizing damage. However, these models present several critical limitations that hinder their practical application in civil engineering(CE). Primarily, their ability to recognize damage types remains constrained, preventing comprehensive analysis of the highly varied and complex conditions encountered in real-world CE structures. Second, these models lack linguistic capabilities, rendering them unable to articulate structural damage characteristics through natural language descriptions. With the continuous advancement of artificial intelligence(AI), large multi-modal models(LMMs) have emerged as a transformative solution, enabling the unified encoding and alignment of textual and visual data. These models can autonomously generate detailed descriptive narratives of structural damage while demonstrating robust generalization across diverse scenarios and tasks. This study introduces SDIGLM, an innovative LMM for structural damage identification, developed based on the open-source VisualGLM-6B architecture. To address the challenge of adapting LMMs to the intricate and varied operating conditions in CE, this work integrates a U-Net-based semantic segmentation module to generate defect segmentation maps as visual Chain of Thought(CoT). Additionally, a multi-round dialogue fine-tuning dataset is constructed to enhance logical reasoning, complemented by a language CoT formed through prompt engineering. By leveraging this multi-modal CoT, SDIGLM surpasses general-purpose LMMs in structural damage identification, achieving an accuracy of 95.24% across various infrastructure types. Moreover, the model effectively describes damage characteristics such as hole size, crack direction, and corrosion severity.

Via

Access Paper or Ask Questions

Towards Large-scale Chemical Reaction Image Parsing via a Multimodal Large Language Model

Mar 11, 2025

Yufan Chen, Ching Ting Leung, Jianwei Sun, Yong Huang, Linyan Li, Hao Chen, Hanyu Gao

Abstract:Artificial intelligence (AI) has demonstrated significant promise in advancing organic chemistry research; however, its effectiveness depends on the availability of high-quality chemical reaction data. Currently, most published chemical reactions are not available in machine-readable form, limiting the broader application of AI in this field. The extraction of published chemical reactions into structured databases still relies heavily on manual curation, and robust automatic parsing of chemical reaction images into machine-readable data remains a significant challenge. To address this, we introduce the Reaction Image Multimodal large language model (RxnIM), the first multimodal large language model specifically designed to parse chemical reaction images into machine-readable reaction data. RxnIM not only extracts key chemical components from reaction images but also interprets the textual content that describes reaction conditions. Together with specially designed large-scale dataset generation method to support model training, our approach achieves excellent performance, with an average F1 score of 88% on various benchmarks, surpassing literature methods by 5%. This represents a crucial step toward the automatic construction of large databases of machine-readable reaction data parsed from images in the chemistry literature, providing essential data resources for AI research in chemistry. The source code, model checkpoints, and datasets developed in this work are released under permissive licenses. An instance of the RxnIM web application can be accessed at https://huggingface.co/spaces/CYF200127/RxnIM.

Via

Access Paper or Ask Questions

Molecule Generation for Target Protein Binding with Hierarchical Consistency Diffusion Model

Mar 02, 2025

Guanlue Li, Chenran Jiang, Ziqi Gao, Yu Liu, Chenyang Liu, Jiean Chen, Yong Huang, Jia Li

Abstract:Effective generation of molecular structures, or new chemical entities, that bind to target proteins is crucial for lead identification and optimization in drug discovery. Despite advancements in atom- and motif-wise deep learning models for 3D molecular generation, current methods often struggle with validity and reliability. To address these issues, we develop the Atom-Motif Consistency Diffusion Model (AMDiff), utilizing a joint-training paradigm for multi-view learning. This model features a hierarchical diffusion architecture that integrates both atom- and motif-level views of molecules, allowing for comprehensive exploration of complementary information. By leveraging classifier-free guidance and incorporating binding site features as conditional inputs, AMDiff ensures robust molecule generation across diverse targets. Compared to existing approaches, AMDiff exhibits superior validity and novelty in generating molecules tailored to fit various protein pockets. Case studies targeting protein kinases, including Anaplastic Lymphoma Kinase (ALK) and Cyclin-dependent kinase 4 (CDK4), demonstrate the model's capability in structure-based de novo drug design. Overall, AMDiff bridges the gap between atom-view and motif-view drug discovery and speeds up the process of target-aware molecular generation.

* 24 pages, 5 figures, 2 tables

Via

Access Paper or Ask Questions

MoR: Mixture of Ranks for Low-Rank Adaptation Tuning

Oct 17, 2024

Chuanyu Tang, Yilong Chen, Zhenyu Zhang, Junyuan Shang, Wenyuan Zhang, Yong Huang, Tingwen Liu

Figure 1 for MoR: Mixture of Ranks for Low-Rank Adaptation Tuning

Figure 2 for MoR: Mixture of Ranks for Low-Rank Adaptation Tuning

Figure 3 for MoR: Mixture of Ranks for Low-Rank Adaptation Tuning

Figure 4 for MoR: Mixture of Ranks for Low-Rank Adaptation Tuning

Abstract:Low-Rank Adaptation (LoRA) drives research to align its performance with full fine-tuning. However, significant challenges remain: (1) Simply increasing the rank size of LoRA does not effectively capture high-rank information, which leads to a performance bottleneck.(2) MoE-style LoRA methods substantially increase parameters and inference latency, contradicting the goals of efficient fine-tuning and ease of application. To address these challenges, we introduce Mixture of Ranks (MoR), which learns rank-specific information for different tasks based on input and efficiently integrates multi-rank information. We firstly propose a new framework that equates the integration of multiple LoRAs to expanding the rank of LoRA. Moreover, we hypothesize that low-rank LoRA already captures sufficient intrinsic information, and MoR can derive high-rank information through mathematical transformations of the low-rank components. Thus, MoR can reduces the learning difficulty of LoRA and enhances its multi-task capabilities. MoR achieves impressive results, with MoR delivering a 1.31\% performance improvement while using only 93.93\% of the parameters compared to baseline methods.

* 11 pages, 7 figures

Via

Access Paper or Ask Questions

MolNexTR: A Generalized Deep Learning Model for Molecular Image Recognition

Mar 08, 2024

Yufan Chen, Ching Ting Leung, Yong Huang, Jianwei Sun, Hao Chen, Hanyu Gao

Figure 1 for MolNexTR: A Generalized Deep Learning Model for Molecular Image Recognition

Figure 2 for MolNexTR: A Generalized Deep Learning Model for Molecular Image Recognition

Figure 3 for MolNexTR: A Generalized Deep Learning Model for Molecular Image Recognition

Figure 4 for MolNexTR: A Generalized Deep Learning Model for Molecular Image Recognition

Abstract:In the field of chemical structure recognition, the task of converting molecular images into graph structures and SMILES string stands as a significant challenge, primarily due to the varied drawing styles and conventions prevalent in chemical literature. To bridge this gap, we proposed MolNexTR, a novel image-to-graph deep learning model that collaborates to fuse the strengths of ConvNext, a powerful Convolutional Neural Network variant, and Vision-TRansformer. This integration facilitates a more nuanced extraction of both local and global features from molecular images. MolNexTR can predict atoms and bonds simultaneously and understand their layout rules. It also excels at flexibly integrating symbolic chemistry principles to discern chirality and decipher abbreviated structures. We further incorporate a series of advanced algorithms, including improved data augmentation module, image contamination module, and a post-processing module to get the final SMILES output. These modules synergistically enhance the model's robustness against the diverse styles of molecular imagery found in real literature. In our test sets, MolNexTR has demonstrated superior performance, achieving an accuracy rate of 81-97%, marking a significant advancement in the domain of molecular structure recognition. Scientific contribution: MolNexTR is a novel image-to-graph model that incorporates a unique dual-stream encoder to extract complex molecular image features, and combines chemical rules to predict atoms and bonds while understanding atom and bond layout rules. In addition, it employs a series of novel augmentation algorithms to significantly enhance the robustness and performance of the model.

* Submitted to the Journal of Cheminformatics

Via

Access Paper or Ask Questions

ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition

Feb 23, 2024

Lu Ye, Ze Tao, Yong Huang, Yang Li

Abstract:Self-attention is an essential component of large language models(LLMs) but a significant source of inference latency for long sequences. In multi-tenant LLMs serving scenarios, the compute and memory operation cost of self-attention can be optimized by using the probability that multiple LLM requests have shared system prompts in prefixes. In this paper, we introduce ChunkAttention, a prefix-aware self-attention module that can detect matching prompt prefixes across multiple requests and share their key/value tensors in memory at runtime to improve the memory utilization of KV cache. This is achieved by breaking monolithic key/value tensors into smaller chunks and structuring them into the auxiliary prefix tree. Consequently, on top of the prefix-tree based KV cache, we design an efficient self-attention kernel, where a two-phase partition algorithm is implemented to improve the data locality during self-attention computation in the presence of shared system prompts. Experiments show that ChunkAttention can speed up the self-attention kernel by 3.2-4.8$\times$ compared to the start-of-the-art implementation, with the length of the system prompt ranging from 1024 to 4096.

Via

Access Paper or Ask Questions

Optimizing Warfarin Dosing Using Contextual Bandit: An Offline Policy Learning and Evaluation Method

Feb 16, 2024

Yong Huang, Charles A. Downs, Amir M. Rahmani

Abstract:Warfarin, an anticoagulant medication, is formulated to prevent and address conditions associated with abnormal blood clotting, making it one of the most prescribed drugs globally. However, determining the suitable dosage remains challenging due to individual response variations, and prescribing an incorrect dosage may lead to severe consequences. Contextual bandit and reinforcement learning have shown promise in addressing this issue. Given the wide availability of observational data and safety concerns of decision-making in healthcare, we focused on using exclusively observational data from historical policies as demonstrations to derive new policies; we utilized offline policy learning and evaluation in a contextual bandit setting to establish the optimal personalized dosage strategy. Our learned policies surpassed these baseline approaches without genotype inputs, even when given a suboptimal demonstration, showcasing promising application potential.

Via

Access Paper or Ask Questions

Low-Resource Multi-Granularity Academic Function Recognition Based on Multiple Prompt Knowledge

May 05, 2023

Jiawei Liu, Zi Xiong, Yi Jiang, Yongqiang Ma, Wei Lu, Yong Huang, Qikai Cheng

Abstract:Fine-tuning pre-trained language models (PLMs), e.g., SciBERT, generally requires large numbers of annotated data to achieve state-of-the-art performance on a range of NLP tasks in the scientific domain. However, obtaining the fine-tune data for scientific NLP task is still challenging and expensive. Inspired by recent advancement in prompt learning, in this paper, we propose the Mix Prompt Tuning (MPT), which is a semi-supervised method to alleviate the dependence on annotated data and improve the performance of multi-granularity academic function recognition tasks with a small number of labeled examples. Specifically, the proposed method provides multi-perspective representations by combining manual prompt templates with automatically learned continuous prompt templates to help the given academic function recognition task take full advantage of knowledge in PLMs. Based on these prompt templates and the fine-tuned PLM, a large number of pseudo labels are assigned to the unlabeled examples. Finally, we fine-tune the PLM using the pseudo training set. We evaluate our method on three academic function recognition tasks of different granularity including the citation function, the abstract sentence function, and the keyword function, with datasets from computer science domain and biomedical domain. Extensive experiments demonstrate the effectiveness of our method and statistically significant improvements against strong baselines. In particular, it achieves an average increase of 5% in Macro-F1 score compared with fine-tuning, and 6% in Macro-F1 score compared with other semi-supervised method under low-resource settings. In addition, MPT is a general method that can be easily applied to other low-resource scientific classification tasks.

* 22 pages, 5 figures

Via

Access Paper or Ask Questions

Resolution Improvement for OpticalCoherence Tomography based on Sparse Continuous Deconvolution

Apr 10, 2023

Zhengyu Qiao, Yong Huang, Qun Hao

Abstract:We propose an image resolution improvement method for optical coherence tomography (OCT) based on sparse continuous deconvolution. Traditional deconvolution techniques such as Lucy-Richardson deconvolution suffers from the artifact convergence problem after a small number of iterations, which brings limitation to practical applications. In this work, we take advantage of the prior knowledge about the sample sparsity and continuity to constrain the deconvolution iteration. Sparsity is used to achieve the resolution improvement through the resolution preserving regularization term. And the continuity based on the correlation of the grayscale values in different directions is introduced to mitigate excessive image sparsity and noise reduction through the continuity regularization term. The Bregman splitting technique is then used to solve the resulting optimization problem. Both the numerical simulation study and experimental study on phantoms and biological samples show that our method can suppress artefacts of traditional deconvolution techniques effectively. Meanwhile, clear resolution improvement is demonstrated. It achieved nearly twofold resolution improvement for phantom beads image that can be quantitatively evaluated

Via

Access Paper or Ask Questions