Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Songtao Liu

Refer to the report for detailed contributions

HunyuanVideo: A Systematic Framework For Large Video Generative Models

Dec 03, 2024

Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang(+42 more)

Figure 1 for HunyuanVideo: A Systematic Framework For Large Video Generative Models

Figure 2 for HunyuanVideo: A Systematic Framework For Large Video Generative Models

Figure 3 for HunyuanVideo: A Systematic Framework For Large Video Generative Models

Figure 4 for HunyuanVideo: A Systematic Framework For Large Video Generative Models

Abstract:Recent advancements in video generation have significantly impacted daily life for both individuals and industries. However, the leading video generation models remain closed-source, resulting in a notable performance gap between industry capabilities and those available to the public. In this report, we introduce HunyuanVideo, an innovative open-source video foundation model that demonstrates performance in video generation comparable to, or even surpassing, that of leading closed-source models. HunyuanVideo encompasses a comprehensive framework that integrates several key elements, including data curation, advanced architectural design, progressive model scaling and training, and an efficient infrastructure tailored for large-scale model training and inference. As a result, we successfully trained a video generative model with over 13 billion parameters, making it the largest among all open-source models. We conducted extensive experiments and implemented a series of targeted designs to ensure high visual quality, motion dynamics, text-video alignment, and advanced filming techniques. According to evaluations by professionals, HunyuanVideo outperforms previous state-of-the-art models, including Runway Gen-3, Luma 1.6, and three top-performing Chinese video generative models. By releasing the code for the foundation model and its applications, we aim to bridge the gap between closed-source and open-source communities. This initiative will empower individuals within the community to experiment with their ideas, fostering a more dynamic and vibrant video generation ecosystem. The code is publicly available at https://github.com/Tencent/HunyuanVideo.

Via

Access Paper or Ask Questions

SDDBench: A Benchmark for Synthesizable Drug Design

Nov 13, 2024

Songtao Liu, Zhengkai Tu, Hanjun Dai, Peng Liu

Figure 1 for SDDBench: A Benchmark for Synthesizable Drug Design

Figure 2 for SDDBench: A Benchmark for Synthesizable Drug Design

Figure 3 for SDDBench: A Benchmark for Synthesizable Drug Design

Figure 4 for SDDBench: A Benchmark for Synthesizable Drug Design

Abstract:A significant challenge in wet lab experiments with current drug design generative models is the trade-off between pharmacological properties and synthesizability. Molecules predicted to have highly desirable properties are often difficult to synthesize, while those that are easily synthesizable tend to exhibit less favorable properties. As a result, evaluating the synthesizability of molecules in general drug design scenarios remains a significant challenge in the field of drug discovery. The commonly used synthetic accessibility (SA) score aims to evaluate the ease of synthesizing generated molecules, but it falls short of guaranteeing that synthetic routes can actually be found. Inspired by recent advances in top-down synthetic route generation, we propose a new, data-driven metric to evaluate molecule synthesizability. Our approach directly assesses the feasibility of synthetic routes for a given molecule through our proposed round-trip score. This novel metric leverages the synergistic duality between retrosynthetic planners and reaction predictors, both of which are trained on extensive reaction datasets. To demonstrate the efficacy of our method, we conduct a comprehensive evaluation of round-trip scores alongside search success rate across a range of representative molecule generative models. Code is available at https://github.com/SongtaoLiu0823/SDDBench.

Via

Access Paper or Ask Questions

Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models

Jun 04, 2024

Songtao Liu, Hanjun Dai, Yue Zhao, Peng Liu

Figure 1 for Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models

Figure 2 for Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models

Figure 3 for Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models

Figure 4 for Preference Optimization for Molecule Synthesis with Conditional Residual Energy-based Models

Abstract:Molecule synthesis through machine learning is one of the fundamental problems in drug discovery. Current data-driven strategies employ one-step retrosynthesis models and search algorithms to predict synthetic routes in a top-bottom manner. Despite their effective performance, these strategies face limitations in the molecule synthetic route generation due to a greedy selection of the next molecule set without any lookahead. Furthermore, existing strategies cannot control the generation of synthetic routes based on possible criteria such as material costs, yields, and step count. In this work, we propose a general and principled framework via conditional residual energy-based models (EBMs), that focus on the quality of the entire synthetic route based on the specific criteria. By incorporating an additional energy-based function into our probabilistic model, our proposed algorithm can enhance the quality of the most probable synthetic routes (with higher probabilities) generated by various strategies in a plug-and-play fashion. Extensive experiments demonstrate that our framework can consistently boost performance across various strategies and outperforms previous state-of-the-art top-1 accuracy by a margin of 2.5%. Code is available at https://github.com/SongtaoLiu0823/CREBM.

* Accepted by ICML 2024(Oral)

Via

Access Paper or Ask Questions

Graph Adversarial Diffusion Convolution

Jun 04, 2024

Songtao Liu, Jinghui Chen, Tianfan Fu, Lu Lin, Marinka Zitnik, Dinghao Wu

Abstract:This paper introduces a min-max optimization formulation for the Graph Signal Denoising (GSD) problem. In this formulation, we first maximize the second term of GSD by introducing perturbations to the graph structure based on Laplacian distance and then minimize the overall loss of the GSD. By solving the min-max optimization problem, we derive a new variant of the Graph Diffusion Convolution (GDC) architecture, called Graph Adversarial Diffusion Convolution (GADC). GADC differs from GDC by incorporating an additional term that enhances robustness against adversarial attacks on the graph structure and noise in node features. Moreover, GADC improves the performance of GDC on heterophilic graphs. Extensive experiments demonstrate the effectiveness of GADC across various datasets. Code is available at https://github.com/SongtaoLiu0823/GADC.

* Accepted by ICML 2024

Via

Access Paper or Ask Questions

Encoding Hierarchical Schema via Concept Flow for Multifaceted Ideology Detection

May 29, 2024

Songtao Liu, Bang Wang, Wei Xiang, Han Xu, Minghua Xu

Abstract:Multifaceted ideology detection (MID) aims to detect the ideological leanings of texts towards multiple facets. Previous studies on ideology detection mainly focus on one generic facet and ignore label semantics and explanatory descriptions of ideologies, which are a kind of instructive information and reveal the specific concepts of ideologies. In this paper, we develop a novel concept semantics-enhanced framework for the MID task. Specifically, we propose a bidirectional iterative concept flow (BICo) method to encode multifaceted ideologies. BICo enables the concepts to flow across levels of the schema tree and enriches concept representations with multi-granularity semantics. Furthermore, we explore concept attentive matching and concept-guided contrastive learning strategies to guide the model to capture ideology features with the learned concept semantics. Extensive experiments on the benchmark dataset show that our approach achieves state-of-the-art performance in MID, including in the cross-topic scenario.

* 13pages, 4 figures (Accepted to Findings of ACL 2024)

Via

Access Paper or Ask Questions

Align-DETR: Improving DETR with Simple IoU-aware BCE loss

Apr 15, 2023

Zhi Cai, Songtao Liu, Guodong Wang, Zheng Ge, Xiangyu Zhang, Di Huang

Figure 1 for Align-DETR: Improving DETR with Simple IoU-aware BCE loss

Figure 2 for Align-DETR: Improving DETR with Simple IoU-aware BCE loss

Figure 3 for Align-DETR: Improving DETR with Simple IoU-aware BCE loss

Figure 4 for Align-DETR: Improving DETR with Simple IoU-aware BCE loss

Abstract:DETR has set up a simple end-to-end pipeline for object detection by formulating this task as a set prediction problem, showing promising potential. However, despite the significant progress in improving DETR, this paper identifies a problem of misalignment in the output distribution, which prevents the best-regressed samples from being assigned with high confidence, hindering the model's accuracy. We propose a metric, recall of best-regressed samples, to quantitively evaluate the misalignment problem. Observing its importance, we propose a novel Align-DETR that incorporates a localization precision-aware classification loss in optimization. The proposed loss, IA-BCE, guides the training of DETR to build a strong correlation between classification score and localization precision. We also adopt the mixed-matching strategy, to facilitate DETR-based detectors with faster training convergence while keeping an end-to-end scheme. Moreover, to overcome the dramatic decrease in sample quality induced by the sparsity of queries, we introduce a prime sample weighting mechanism to suppress the interference of unimportant samples. Extensive experiments are conducted with very competitive results reported. In particular, it delivers a 46 (+3.8)% AP on the DAB-DETR baseline with the ResNet-50 backbone and reaches a new SOTA performance of 50.2% AP in the 1x setting on the COCO validation set when employing the strong baseline DINO. Our code is available at https://github.com/FelixCaae/AlignDETR.

Via

Access Paper or Ask Questions

Dynamic Grained Encoder for Vision Transformers

Jan 10, 2023

Lin Song, Songyang Zhang, Songtao Liu, Zeming Li, Xuming He, Hongbin Sun, Jian Sun, Nanning Zheng

Abstract:Transformers, the de-facto standard for language modeling, have been recently applied for vision tasks. This paper introduces sparse queries for vision transformers to exploit the intrinsic spatial redundancy of natural images and save computational costs. Specifically, we propose a Dynamic Grained Encoder for vision transformers, which can adaptively assign a suitable number of queries to each spatial region. Thus it achieves a fine-grained representation in discriminative regions while keeping high efficiency. Besides, the dynamic grained encoder is compatible with most vision transformer frameworks. Without bells and whistles, our encoder allows the state-of-the-art vision transformers to reduce computational complexity by 40%-60% while maintaining comparable performance on image classification. Extensive experiments on object detection and segmentation further demonstrate the generalizability of our approach. Code is available at https://github.com/StevenGrove/vtpack.

* Accepted by NeurIPS2021

Via

Access Paper or Ask Questions

Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Representation

Dec 03, 2022

En Yu, Songtao Liu, Zhuoling Li, Jinrong Yang, Zeming li, Shoudong Han, Wenbing Tao

Figure 1 for Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Representation

Figure 2 for Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Representation

Figure 3 for Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Representation

Figure 4 for Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Representation

Abstract:Although existing multi-object tracking (MOT) algorithms have obtained competitive performance on various benchmarks, almost all of them train and validate models on the same domain. The domain generalization problem of MOT is hardly studied. To bridge this gap, we first draw the observation that the high-level information contained in natural language is domain invariant to different tracking domains. Based on this observation, we propose to introduce natural language representation into visual MOT models for boosting the domain generalization ability. However, it is infeasible to label every tracking target with a textual description. To tackle this problem, we design two modules, namely visual context prompting (VCP) and visual-language mixing (VLM). Specifically, VCP generates visual prompts based on the input frames. VLM joints the information in the generated visual prompts and the textual prompts from a pre-defined Trackbook to obtain instance-level pseudo textual description, which is domain invariant to different tracking scenes. Through training models on MOT17 and validating them on MOT20, we observe that the pseudo textual descriptions generated by our proposed modules improve the generalization performance of query-based trackers by large margins.

* Accepted by AAAI2023

Via

Access Paper or Ask Questions

Metro: Memory-Enhanced Transformer for Retrosynthetic Planning via Reaction Tree

Sep 30, 2022

Songtao Liu, Rex Ying, Zuobai Zhang, Peilin Zhao, Jian Tang, Lu Lin, Dinghao Wu

Figure 1 for Metro: Memory-Enhanced Transformer for Retrosynthetic Planning via Reaction Tree

Figure 2 for Metro: Memory-Enhanced Transformer for Retrosynthetic Planning via Reaction Tree

Figure 3 for Metro: Memory-Enhanced Transformer for Retrosynthetic Planning via Reaction Tree

Figure 4 for Metro: Memory-Enhanced Transformer for Retrosynthetic Planning via Reaction Tree

Abstract:Retrosynthetic planning plays a critical role in drug discovery and organic chemistry. Starting from a target molecule as the root node, it aims to find a complete reaction tree subject to the constraint that all leaf nodes belong to a set of starting materials. The multi-step reactions are crucial because they determine the flow chart in the production of the Organic Chemical Industry. However, existing datasets lack curation of tree-structured multi-step reactions, and fail to provide such reaction trees, limiting models' understanding of organic molecule transformations. In this work, we first develop a benchmark curated for the retrosynthetic planning task, which consists of 124,869 reaction trees retrieved from the public USPTO-full dataset. On top of that, we propose Metro: Memory-Enhanced Transformer for RetrOsynthetic planning. Specifically, the dependency among molecules in the reaction tree is captured as context information for multi-step retrosynthesis predictions through transformers with a memory module. Extensive experiments show that Metro dramatically outperforms existing single-step retrosynthesis models by at least 10.7% in top-1 accuracy. The experiments demonstrate the superiority of exploiting context information in the retrosynthetic planning task. Moreover, the proposed model can be directly used for synthetic accessibility analysis, as it is trained on reaction trees with the shortest depths. Our work is the first step towards a brand new formulation for retrosynthetic planning in the aspects of data construction, model design, and evaluation. Code is available at https://github.com/SongtaoLiu0823/metro.

Via

Access Paper or Ask Questions

How Powerful is Implicit Denoising in Graph Neural Networks

Sep 29, 2022

Songtao Liu, Rex Ying, Hanze Dong, Lu Lin, Jinghui Chen, Dinghao Wu

Figure 1 for How Powerful is Implicit Denoising in Graph Neural Networks

Figure 2 for How Powerful is Implicit Denoising in Graph Neural Networks

Figure 3 for How Powerful is Implicit Denoising in Graph Neural Networks

Figure 4 for How Powerful is Implicit Denoising in Graph Neural Networks

Abstract:Graph Neural Networks (GNNs), which aggregate features from neighbors, are widely used for graph-structured data processing due to their powerful representation learning capabilities. It is generally believed that GNNs can implicitly remove the non-predictive noises. However, the analysis of implicit denoising effect in graph neural networks remains open. In this work, we conduct a comprehensive theoretical study and analyze when and why the implicit denoising happens in GNNs. Specifically, we study the convergence properties of noise matrix. Our theoretical analysis suggests that the implicit denoising largely depends on the connectivity, the graph size, and GNN architectures. Moreover, we formally define and propose the adversarial graph signal denoising (AGSD) problem by extending graph signal denoising problem. By solving such a problem, we derive a robust graph convolution, where the smoothness of the node representations and the implicit denoising effect can be enhanced. Extensive empirical evaluations verify our theoretical analyses and the effectiveness of our proposed model.

Via

Access Paper or Ask Questions