Yuan Liang

SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement

Apr 04, 2025

Runnan Fang, Xiaobin Wang, Yuan Liang, Shuofei Qiao, Jialong Wu, Zekun Xi, Ningyu Zhang, Yong Jiang, Pengjun Xie, Fei Huang(+1 more)

Abstract:In the interaction between agents and their environments, agents expand their capabilities by planning and executing actions. However, LLM-based agents face substantial challenges when deployed in novel environments or required to navigate unconventional action spaces. To empower agents to autonomously explore environments, optimize workflows, and enhance their understanding of actions, we propose SynWorld, a framework that allows agents to synthesize possible scenarios with multi-step action invocation within the action space and perform Monte Carlo Tree Search (MCTS) exploration to effectively refine their action knowledge in the current environment. Our experiments demonstrate that SynWorld is an effective and general approach to learning action knowledge in new environments. Code is available at https://github.com/zjunlp/SynWorld.

* Work in progress

TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models

Feb 10, 2025

Yangguang Li, Zi-Xin Zou, Zexiang Liu, Dehu Wang, Yuan Liang, Zhipeng Yu, Xingchao Liu, Yuan-Chen Guo, Ding Liang, Wanli Ouyang(+1 more)

Abstract:Recent advancements in diffusion techniques have propelled image and video generation to unprecedented levels of quality, significantly accelerating the deployment and application of generative AI. However, 3D shape generation technology has so far lagged behind, constrained by limitations in 3D data scale, complexity of 3D data processing, and insufficient exploration of advanced techniques in the 3D domain. Current approaches to 3D shape generation face substantial challenges in terms of output quality, generalization capability, and alignment with input conditions. We present TripoSG, a new streamlined shape diffusion paradigm capable of generating high-fidelity 3D meshes with precise correspondence to input images. Specifically, we propose: 1) A large-scale rectified flow transformer for 3D shape generation, achieving state-of-the-art fidelity through training on extensive, high-quality data. 2) A hybrid supervised training strategy combining SDF, normal, and eikonal losses for 3D VAE, achieving high-quality 3D reconstruction performance. 3) A data processing pipeline to generate 2 million high-quality 3D samples, highlighting the crucial rules for data quality and quantity in training 3D generative models. Through comprehensive experiments, we have validated the effectiveness of each component in our new framework. The seamless integration of these parts has enabled TripoSG to achieve state-of-the-art performance in 3D shape generation. The resulting 3D shapes exhibit enhanced detail due to high-resolution capabilities and demonstrate exceptional fidelity to input images. Moreover, TripoSG demonstrates improved versatility in generating 3D models from diverse image styles and contents, showcasing strong generalization capabilities. To foster progress and innovation in the field of 3D generation, we will make our model publicly available.
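
For readers unfamiliar with the three loss terms named in the abstract, one common way to combine SDF, normal, and eikonal supervision is sketched below. This is a generic form, not the paper's stated objective: the symbols $f_\theta$ (predicted signed distance), $s$ (ground-truth signed distance), $n$ (ground-truth surface normal), $\partial\Omega$ (the surface), the weights $\lambda_n$ and $\lambda_e$, and the choice of norms are all assumptions. The eikonal term itself rests on a known property: a true signed distance function has unit gradient norm almost everywhere.

```latex
\mathcal{L}
  = \underbrace{\mathbb{E}_{x}\,\bigl\lvert f_\theta(x) - s(x) \bigr\rvert}_{\text{SDF}}
  + \lambda_{n}\,\underbrace{\mathbb{E}_{x \in \partial\Omega}\,\bigl\lVert \nabla_x f_\theta(x) - n(x) \bigr\rVert_2}_{\text{normal}}
  + \lambda_{e}\,\underbrace{\mathbb{E}_{x}\,\bigl( \lVert \nabla_x f_\theta(x) \rVert_2 - 1 \bigr)^2}_{\text{eikonal}}
```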

ConCL: Concept Contrastive Learning for Dense Prediction Pre-training in Pathology Images

Jul 14, 2022

Jiawei Yang, Hanbo Chen, Yuan Liang, Junzhou Huang, Lei He, Jianhua Yao

Abstract:Detecting and segmenting objects within whole slide images is essential in the computational pathology workflow. Self-supervised learning (SSL) is appealing for such annotation-heavy tasks. Despite the extensive benchmarks in natural images for dense tasks, such studies are, unfortunately, absent in current works for pathology. Our paper intends to narrow this gap. We first benchmark representative SSL methods for dense prediction tasks in pathology images. Then, we propose concept contrastive learning (ConCL), an SSL framework for dense pre-training. We explore how ConCL performs with concepts provided by different sources and end up proposing a simple dependency-free concept generating method that does not rely on external segmentation algorithms or saliency detection models. Extensive experiments demonstrate the superiority of ConCL over previous state-of-the-art SSL methods across different settings. Along our exploration, we distill several important and intriguing components contributing to the success of dense pre-training for pathology images. We hope this work could provide useful data points and encourage the community to conduct ConCL pre-training for problems of interest. Code is available.

* Accepted as an ECCV 2022 paper. Code is available at https://github.com/Jiawei-Yang/ConCL or https://github.com/TencentAILabHealthcare/ConCL

RAAT: Relation-Augmented Attention Transformer for Relation Modeling in Document-Level Event Extraction

Jun 07, 2022

Yuan Liang, Zhuoxuan Jiang, Di Yin, Bo Ren

Abstract:In the document-level event extraction (DEE) task, event arguments always scatter across sentences (across-sentence issue) and multiple events may lie in one document (multi-event issue). In this paper, we argue that the relation information of event arguments is of great significance for addressing the above two issues, and propose a new DEE framework which can model the relation dependencies, called Relation-augmented Document-level Event Extraction (ReDEE). More specifically, this framework features a novel and tailored transformer, named Relation-augmented Attention Transformer (RAAT). RAAT is scalable to capture multi-scale and multi-amount argument relations. To further leverage relation information, we introduce a separate event relation prediction task and adopt a multi-task learning method to explicitly enhance event extraction performance. Extensive experiments demonstrate the effectiveness of the proposed method, which can achieve state-of-the-art performance on two public datasets. Our code is available at https://github.com/TencentYoutuResearch/RAAT.

* Accepted by NAACL 2022

LW-GCN: A Lightweight FPGA-based Graph Convolutional Network Accelerator

Nov 04, 2021

Zhuofu Tao, Chen Wu, Yuan Liang, Lei He

Abstract:Graph convolutional networks (GCNs) have been introduced to effectively process non-Euclidean graph data. However, GCNs incur large amounts of irregularity in computation and memory access, which prevents efficient use of traditional neural network accelerators. Moreover, existing dedicated GCN accelerators demand high memory volumes and are difficult to implement on resource-limited edge devices. In this work, we propose LW-GCN, a lightweight FPGA-based accelerator with a software-hardware co-designed process to tackle irregularity in computation and memory access in GCN inference. LW-GCN decomposes the main GCN operations into sparse-dense matrix multiplication (SDMM) and dense matrix multiplication (DMM). We propose a novel compression format to balance workload across PEs and prevent data hazards. Moreover, we apply data quantization and workload tiling, and map both SDMM and DMM of GCN inference onto a uniform architecture on resource-limited hardware. Evaluations on GCN and GraphSAGE are performed on a Xilinx Kintex-7 FPGA with three popular datasets. Compared to existing CPU, GPU, and state-of-the-art FPGA-based accelerators, LW-GCN reduces latency by up to 60x, 12x and 1.7x and increases power efficiency by up to 912x, 511x and 3.87x, respectively. Furthermore, compared with NVIDIA's latest edge GPU Jetson Xavier NX, LW-GCN achieves speedup and energy savings of 32x and 84x, respectively.
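
The SDMM/DMM decomposition mentioned in the abstract can be illustrated in software with a minimal sketch. It assumes the usual GCN layer form ReLU(A(XW)) with dense features X, and it stores the adjacency matrix in plain CSR rather than the paper's own compression format. The function names are hypothetical, and nothing here models quantization, tiling, or the hardware.

```python
from typing import List, Tuple

Matrix = List[List[float]]
# Standard CSR triple (row_ptr, col_idx, values); NOT the paper's format.
CSR = Tuple[List[int], List[int], List[float]]


def dense_matmul(x: Matrix, w: Matrix) -> Matrix:
    """DMM: dense features X (n x f) times dense weights W (f x h)."""
    cols = len(w[0])
    return [
        [sum(row[k] * w[k][j] for k in range(len(w))) for j in range(cols)]
        for row in x
    ]


def sparse_dense_matmul(a: CSR, b: Matrix) -> Matrix:
    """SDMM: sparse adjacency A (CSR, n x n) times dense B (n x h)."""
    row_ptr, col_idx, values = a
    cols = len(b[0])
    out = []
    for i in range(len(row_ptr) - 1):
        acc = [0.0] * cols
        # Only the nonzeros of row i contribute to output row i.
        for p in range(row_ptr[i], row_ptr[i + 1]):
            src = b[col_idx[p]]
            for j in range(cols):
                acc[j] += values[p] * src[j]
        out.append(acc)
    return out


def gcn_layer(a: CSR, x: Matrix, w: Matrix) -> Matrix:
    """One GCN layer, ReLU(A (X W)), computed as a DMM followed by an SDMM."""
    xw = dense_matmul(x, w)
    axw = sparse_dense_matmul(a, xw)
    return [[max(0.0, v) for v in row] for row in axw]


# Toy 2-node graph: A = [[1, 1], [0, 1]] stored in CSR.
adjacency: CSR = ([0, 2, 3], [0, 1, 1], [1.0, 1.0, 1.0])
features = [[1.0, 2.0], [3.0, 4.0]]
weights = [[1.0, 0.0], [0.0, -1.0]]
print(gcn_layer(adjacency, features, weights))
```

Computing XW first keeps the sparse step's dense operand narrow when the hidden size is smaller than the feature size; whether LW-GCN orders the products this way is not stated in the abstract.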

* 17 pages, 9 figures

X2Teeth: 3D Teeth Reconstruction from a Single Panoramic Radiograph

Aug 30, 2021

Yuan Liang, Weinan Song, Jiawei Yang, Liang Qiu, Kun Wang, Lei He

Abstract:3D teeth reconstruction from X-ray is important for dental diagnosis and many clinical operations. However, no existing work has explored the reconstruction of teeth for a whole cavity from a single panoramic radiograph. Different from single object reconstruction from photos, this task has the unique challenge of constructing multiple objects at high resolutions. To conquer this task, we develop a novel ConvNet X2Teeth that decomposes the task into teeth localization and single-shape estimation. We also introduce a patch-based training strategy, such that X2Teeth can be end-to-end trained for optimal performance. Extensive experiments show that our method can successfully estimate the 3D structure of the cavity and reflect the details for each tooth. Moreover, X2Teeth achieves a reconstruction IoU of 0.681, which significantly outperforms the encoder-decoder method by 1.71X and the retrieval-based method by 1.52X. Our method can also be promising for other multi-anatomy 3D reconstruction tasks.
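
As a reminder of what the reported IoU measures, here is a minimal sketch of intersection over union between two voxelized shapes. It assumes each shape is given as a collection of occupied voxel coordinates. The function name and the handling of two empty shapes are choices made for this illustration, and the paper's exact evaluation protocol (resolution, alignment) is not given in the abstract.

```python
from typing import Iterable, Set, Tuple

Voxel = Tuple[int, int, int]


def voxel_iou(pred: Iterable[Voxel], target: Iterable[Voxel]) -> float:
    """Intersection over union of two sets of occupied voxel coordinates."""
    p: Set[Voxel] = set(pred)
    t: Set[Voxel] = set(target)
    union = p | t
    if not union:
        return 1.0  # convention chosen here: two empty shapes match perfectly
    return len(p & t) / len(union)


predicted = [(0, 0, 0), (0, 0, 1), (0, 1, 0)]
ground_truth = [(0, 0, 1), (0, 1, 0), (1, 1, 1)]
print(voxel_iou(predicted, ground_truth))  # 2 shared voxels out of 4 total
```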

TumorCP: A Simple but Effective Object-Level Data Augmentation for Tumor Segmentation

Jul 21, 2021

Jiawei Yang, Yao Zhang, Yuan Liang, Yang Zhang, Lei He, Zhiqiang He

Abstract:Deep learning models are notoriously data-hungry. Thus, there is an urgent need for data-efficient techniques in medical image analysis, where well-annotated data are costly and time-consuming to collect. Motivated by the recently revived "Copy-Paste" augmentation, we propose TumorCP, a simple but effective object-level data augmentation method tailored for tumor segmentation. TumorCP is online and stochastic, providing unlimited augmentation possibilities for tumors' subjects, locations, appearances, as well as morphologies. Experiments on the kidney tumor segmentation task demonstrate that TumorCP surpasses the strong baseline by a remarkable margin of 7.12% on tumor Dice. Moreover, together with image-level data augmentation, it beats the current state-of-the-art by 2.32% on tumor Dice. Comprehensive ablation studies are performed to validate the effectiveness of TumorCP. Meanwhile, we show that TumorCP can lead to striking improvements in extremely low-data regimes. Evaluated with only 10% labeled data, TumorCP significantly boosts tumor Dice by 21.87%. To the best of our knowledge, this is the very first work exploring and extending the "Copy-Paste" design in the medical imaging domain. Code is available at: https://github.com/YaoZhang93/TumorCP.
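
The general idea of object-level Copy-Paste augmentation can be sketched as follows. This is a toy 2D illustration, not the authors' implementation (see the linked repository for that): it randomizes only the paste location, whereas the abstract also mentions varying appearance and morphology. The function name, the nested-list image format, and the bounding-box placement rule are assumptions.

```python
import random
from typing import List, Optional, Tuple

Grid = List[List[int]]


def copy_paste(
    src_img: Grid,
    src_mask: Grid,
    dst_img: Grid,
    dst_mask: Grid,
    rng: Optional[random.Random] = None,
) -> Tuple[Grid, Grid]:
    """Paste the masked object from the source at a random spot in a copy of the destination."""
    if rng is None:
        rng = random.Random()
    out_img = [row[:] for row in dst_img]
    out_mask = [row[:] for row in dst_mask]

    # Coordinates of every object (tumor) pixel in the source mask.
    pixels = [(r, c) for r, row in enumerate(src_mask) for c, m in enumerate(row) if m]
    if not pixels:
        return out_img, out_mask  # nothing to paste

    # Bounding box of the object in the source.
    r0 = min(r for r, _ in pixels)
    c0 = min(c for _, c in pixels)
    height = max(r for r, _ in pixels) - r0 + 1
    width = max(c for _, c in pixels) - c0 + 1
    if height > len(out_img) or width > len(out_img[0]):
        raise ValueError("object does not fit in the destination")

    # Stochastic placement: a random top-left corner that keeps the box inside.
    top = rng.randint(0, len(out_img) - height)
    left = rng.randint(0, len(out_img[0]) - width)

    for r, c in pixels:
        rr, cc = top + r - r0, left + c - c0
        out_img[rr][cc] = src_img[r][c]  # copy the object's intensity
        out_mask[rr][cc] = 1             # label the pasted pixel as tumor
    return out_img, out_mask
```

Because the function returns fresh copies and draws a new location on every call, it can be applied on the fly inside a data loader, which is one way to read "online and stochastic".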

SocAoG: Incremental Graph Parsing for Social Relation Inference in Dialogues

Jun 24, 2021

Liang Qiu, Yuan Liang, Yizhou Zhao, Pan Lu, Baolin Peng, Zhou Yu, Ying Nian Wu, Song-Chun Zhu

Abstract:Inferring social relations from dialogues is vital for building emotionally intelligent robots to interpret human language better and act accordingly. We model the social network as an And-or Graph, named SocAoG, for the consistency of relations among a group and leveraging attributes as inference cues. Moreover, we formulate a sequential structure prediction task, and propose an $\alpha$-$\beta$-$\gamma$ strategy to incrementally parse SocAoG for the dynamic inference upon any incoming utterance: (i) an $\alpha$ process predicting attributes and relations conditioned on the semantics of dialogues, (ii) a $\beta$ process updating the social relations based on related attributes, and (iii) a $\gamma$ process updating individual's attributes based on interpersonal social relations. Empirical results on DialogRE and MovieGraph show that our model infers social relations more accurately than the state-of-the-art methods. Moreover, the ablation study shows the three processes complement each other, and the case study demonstrates the dynamic relational inference.

* Long paper (oral) accepted by ACL-IJCNLP 2021

Towards Socially Intelligent Agents with Mental State Transition and Human Utility

Mar 12, 2021

Liang Qiu, Yizhou Zhao, Yuan Liang, Pan Lu, Weiyan Shi, Zhou Yu, Song-Chun Zhu

Abstract:Building a socially intelligent agent involves many challenges, one of which is to track the agent's mental state transition and teach the agent to make rational decisions guided by its utility like a human. Towards this end, we propose to incorporate a mental state parser and utility model into dialogue agents. The hybrid mental state parser extracts information from both the dialogue and event observations and maintains a graphical representation of the agent's mind; meanwhile, the utility model is a ranking model that learns human preferences from a crowd-sourced social commonsense dataset, Social IQA. Empirical results show that the proposed model attains state-of-the-art performance on the dialogue/action/emotion prediction task in the fantasy text-adventure game dataset, LIGHT. We also show example cases to demonstrate: (i) how the proposed mental state parser can assist the agent's decision by grounding on the context like locations and objects, and (ii) how the utility model can help the agent make reasonable decisions in a dilemma. To the best of our knowledge, this is the first work that builds a socially intelligent agent by incorporating a hybrid mental state parser for both discrete events and continuous dialogues parsing and human-like utility modeling.

Atlas-aware ConvNet for Accurate yet Robust Anatomical Segmentation

Feb 02, 2021

Yuan Liang, Weinan Song, Jiawei Yang, Liang Qiu, Kun Wang, Lei He

Abstract:Convolutional networks (ConvNets) have achieved promising accuracy for various anatomical segmentation tasks. Despite the success, these methods can be sensitive to data appearance variations. Considering the large variability of scans caused by artifacts, pathologies, and scanning setups, robust ConvNets are vital for clinical applications, yet have not been fully explored. In this paper, we propose to mitigate the challenge by enabling ConvNets' awareness of the underlying anatomical invariances among imaging scans. Specifically, we introduce a fully convolutional Constraint Adoption Module (CAM) that incorporates probabilistic atlas priors as explicit constraints for predictions over a locally connected Conditional Random Field (CRF), which effectively reinforces the anatomical consistency of the labeling outputs. We design the CAM to be flexible for boosting various ConvNets, and compact for co-optimizing with ConvNets for fusion parameters that lead to the optimal performance. We show that the advantage of such atlas priors fusion is two-fold with two brain parcellation tasks. First, our models achieve state-of-the-art accuracy among ConvNet-based methods on both datasets, by significantly reducing structural abnormalities of predictions. Second, we can largely boost the robustness of existing ConvNets, proved by: (i) testing on scans with synthetic pathologies, and (ii) training and evaluation on scans of different scanning setups across datasets. Our method can be easily adopted by existing ConvNets by fine-tuning with CAM plugged in for accuracy and robustness boosts.
