Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ziyao Zhang

Fg-T2M++: LLMs-Augmented Fine-Grained Text Driven Human Motion Generation

Feb 08, 2025

Yin Wang, Mu Li, Jiapeng Liu, Zhiying Leng, Frederick W. B. Li, Ziyao Zhang, Xiaohui Liang

Abstract:We address the challenging problem of fine-grained text-driven human motion generation. Existing works generate imprecise motions that fail to accurately capture relationships specified in text due to: (1) lack of effective text parsing for detailed semantic cues regarding body parts, (2) not fully modeling linguistic structures between words to comprehend text comprehensively. To tackle these limitations, we propose a novel fine-grained framework Fg-T2M++ that consists of: (1) an LLMs semantic parsing module to extract body part descriptions and semantics from text, (2) a hyperbolic text representation module to encode relational information between text units by embedding the syntactic dependency graph into hyperbolic space, and (3) a multi-modal fusion module to hierarchically fuse text and motion features. Extensive experiments on HumanML3D and KIT-ML datasets demonstrate that Fg-T2M++ outperforms SOTA methods, validating its ability to accurately generate motions adhering to comprehensive text semantics.

Via

Access Paper or Ask Questions

LLM Hallucinations in Practical Code Generation: Phenomena, Mechanism, and Mitigation

Sep 30, 2024

Ziyao Zhang, Yanlin Wang, Chong Wang, Jiachi Chen, Zibin Zheng

Abstract:Code generation aims to automatically generate code from input requirements, significantly enhancing development efficiency. Recent large language models (LLMs) based approaches have shown promising results and revolutionized code generation task. Despite the promising performance, LLMs often generate contents with hallucinations, especially for the code generation scenario requiring the handling of complex contextual dependencies in practical development process. Although previous study has analyzed hallucinations in LLM-powered code generation, the study is limited to standalone function generation. In this paper, we conduct an empirical study to study the phenomena, mechanism, and mitigation of LLM hallucinations within more practical and complex development contexts in repository-level generation scenario. First, we manually examine the code generation results from six mainstream LLMs to establish a hallucination taxonomy of LLM-generated code. Next, we elaborate on the phenomenon of hallucinations, analyze their distribution across different models. We then analyze causes of hallucinations and identify four potential factors contributing to hallucinations. Finally, we propose an RAG-based mitigation method, which demonstrates consistent effectiveness in all studied LLMs. The replication package including code, data, and experimental results is available at https://github.com/DeepSoftwareAnalytics/LLMCodingHallucination

* 11 pages, 13 figures

Via

Access Paper or Ask Questions

Cross-lingual Knowledge Distillation via Flow-based Voice Conversion for Robust Polyglot Text-To-Speech

Sep 15, 2023

Dariusz Piotrowski, Renard Korzeniowski, Alessio Falai, Sebastian Cygert, Kamil Pokora, Georgi Tinchev, Ziyao Zhang, Kayoko Yanagisawa

Abstract:In this work, we introduce a framework for cross-lingual speech synthesis, which involves an upstream Voice Conversion (VC) model and a downstream Text-To-Speech (TTS) model. The proposed framework consists of 4 stages. In the first two stages, we use a VC model to convert utterances in the target locale to the voice of the target speaker. In the third stage, the converted data is combined with the linguistic features and durations from recordings in the target language, which are then used to train a single-speaker acoustic model. Finally, the last stage entails the training of a locale-independent vocoder. Our evaluations show that the proposed paradigm outperforms state-of-the-art approaches which are based on training a large multilingual TTS model. In addition, our experiments demonstrate the robustness of our approach with different model architectures, languages, speakers and amounts of data. Moreover, our solution is especially beneficial in low-resource settings.

* Accepted at ICONIP 2023

Via

Access Paper or Ask Questions

First Glance Diagnosis: Brain Disease Classification with Single fMRI Volume

Aug 10, 2022

Wei Dai, Ziyao Zhang, Lixia Tian, Shengyuan Yu, Shuhui Wang, Zhao Dong, Hairong Zheng

Figure 1 for First Glance Diagnosis: Brain Disease Classification with Single fMRI Volume

Figure 2 for First Glance Diagnosis: Brain Disease Classification with Single fMRI Volume

Figure 3 for First Glance Diagnosis: Brain Disease Classification with Single fMRI Volume

Figure 4 for First Glance Diagnosis: Brain Disease Classification with Single fMRI Volume

Abstract:In neuroimaging analysis, functional magnetic resonance imaging (fMRI) can well assess brain function changes for brain diseases with no obvious structural lesions. So far, most deep-learning-based fMRI studies take functional connectivity as the basic feature in disease classification. However, functional connectivity is often calculated based on time series of predefined regions of interest and neglects detailed information contained in each voxel, which may accordingly deteriorate the performance of diagnostic models. Another methodological drawback is the limited sample size for the training of deep models. In this study, we propose BrainFormer, a general hybrid Transformer architecture for brain disease classification with single fMRI volume to fully exploit the voxel-wise details with sufficient data dimensions and sizes. BrainFormer is constructed by modeling the local cues within each voxel with 3D convolutions and capturing the global relations among distant regions with two global attention blocks. The local and global cues are aggregated in BrainFormer by a single-stream model. To handle multisite data, we propose a normalization layer to normalize the data into identical distribution. Finally, a Gradient-based Localization-map Visualization method is utilized for locating the possible disease-related biomarker. We evaluate BrainFormer on five independently acquired datasets including ABIDE, ADNI, MPILMBB, ADHD-200 and ECHO, with diseases of autism, Alzheimer's disease, depression, attention deficit hyperactivity disorder, and headache disorders. The results demonstrate the effectiveness and generalizability of BrainFormer for multiple brain diseases diagnosis. BrainFormer may promote neuroimaging-based precision diagnosis in clinical practice and motivate future study in fMRI analysis. Code is available at: https://github.com/ZiyaoZhangforPCL/BrainFormer.

Via

Access Paper or Ask Questions

Unify and Conquer: How Phonetic Feature Representation Affects Polyglot Text-To-Speech (TTS)

Jul 04, 2022

Ariadna Sanchez, Alessio Falai, Ziyao Zhang, Orazio Angelini, Kayoko Yanagisawa

Figure 1 for Unify and Conquer: How Phonetic Feature Representation Affects Polyglot Text-To-Speech (TTS)

Figure 2 for Unify and Conquer: How Phonetic Feature Representation Affects Polyglot Text-To-Speech (TTS)

Figure 3 for Unify and Conquer: How Phonetic Feature Representation Affects Polyglot Text-To-Speech (TTS)

Figure 4 for Unify and Conquer: How Phonetic Feature Representation Affects Polyglot Text-To-Speech (TTS)

Abstract:An essential design decision for multilingual Neural Text-To-Speech (NTTS) systems is how to represent input linguistic features within the model. Looking at the wide variety of approaches in the literature, two main paradigms emerge, unified and separate representations. The former uses a shared set of phonetic tokens across languages, whereas the latter uses unique phonetic tokens for each language. In this paper, we conduct a comprehensive study comparing multilingual NTTS systems models trained with both representations. Our results reveal that the unified approach consistently achieves better cross-lingual synthesis with respect to both naturalness and accent. Separate representations tend to have an order of magnitude more tokens than unified ones, which may affect model capacity. For this reason, we carry out an ablation study to understand the interaction of the representation type with the size of the token embedding. We find that the difference between the two paradigms only emerges above a certain threshold embedding size. This study provides strong evidence that unified representations should be the preferred paradigm when building multilingual NTTS systems.

Via

Access Paper or Ask Questions

Mix and Match: An Empirical Study on Training Corpus Composition for Polyglot Text-To-Speech

Jul 04, 2022

Ziyao Zhang, Alessio Falai, Ariadna Sanchez, Orazio Angelini, Kayoko Yanagisawa

Figure 1 for Mix and Match: An Empirical Study on Training Corpus Composition for Polyglot Text-To-Speech

Figure 2 for Mix and Match: An Empirical Study on Training Corpus Composition for Polyglot Text-To-Speech

Figure 3 for Mix and Match: An Empirical Study on Training Corpus Composition for Polyglot Text-To-Speech

Figure 4 for Mix and Match: An Empirical Study on Training Corpus Composition for Polyglot Text-To-Speech

Abstract:Training multilingual Neural Text-To-Speech (NTTS) models using only monolingual corpora has emerged as a popular way for building voice cloning based Polyglot NTTS systems. In order to train these models, it is essential to understand how the composition of the training corpora affects the quality of multilingual speech synthesis. In this context, it is common to hear questions such as "Would including more Spanish data help my Italian synthesis, given the closeness of both languages?". Unfortunately, we found existing literature on the topic lacking in completeness in this regard. In the present work, we conduct an extensive ablation study aimed at understanding how various factors of the training corpora, such as language family affiliation, gender composition, and the number of speakers, contribute to the quality of Polyglot synthesis. Our findings include the observation that female speaker data are preferred in most scenarios, and that it is not always beneficial to have more speakers from the target language variant in the training corpus. The findings herein are informative for the process of data procurement and corpora building.

* Accepted to be published in the Proceedings of InterSpeech 2022

Via

Access Paper or Ask Questions

State Action Separable Reinforcement Learning

Jun 05, 2020

Ziyao Zhang, Liang Ma, Kin K. Leung, Konstantinos Poularakis, Mudhakar Srivatsa

Figure 1 for State Action Separable Reinforcement Learning

Figure 2 for State Action Separable Reinforcement Learning

Figure 3 for State Action Separable Reinforcement Learning

Abstract:Reinforcement Learning (RL) based methods have seen their paramount successes in solving serial decision-making and control problems in recent years. For conventional RL formulations, Markov Decision Process (MDP) and state-action-value function are the basis for the problem modeling and policy evaluation. However, several challenging issues still remain. Among most cited issues, the enormity of state/action space is an important factor that causes inefficiency in accurately approximating the state-action-value function. We observe that although actions directly define the agents' behaviors, for many problems the next state after a state transition matters more than the action taken, in determining the return of such a state transition. In this regard, we propose a new learning paradigm, State Action Separable Reinforcement Learning (sasRL), wherein the action space is decoupled from the value function learning process for higher efficiency. Then, a light-weight transition model is learned to assist the agent to determine the action that triggers the associated state transition. In addition, our convergence analysis reveals that under certain conditions, the convergence time of sasRL is $O(T^{1/k})$, where $T$ is the convergence time for updating the value function in the MDP-based formulation and $k$ is a weighting factor. Experiments on several gaming scenarios show that sasRL outperforms state-of-the-art MDP-based RL algorithms by up to $75\%$.

* 16 pages

Via

Access Paper or Ask Questions

Neural Network Tomography

Jan 09, 2020

Liang Ma, Ziyao Zhang, Mudhakar Srivatsa

Abstract:Network tomography, a classic research problem in the realm of network monitoring, refers to the methodology of inferring unmeasured network attributes using selected end-to-end path measurements. In the research community, network tomography is generally investigated under the assumptions of known network topology, correlated path measurements, bounded number of faulty nodes/links, or even special network protocol support. The applicability of network tomography is considerably constrained by these strong assumptions, which therefore frequently position it in the theoretical world. In this regard, we revisit network tomography from the practical perspective by establishing a generic framework that does not rely on any of these assumptions or the types of performance metrics. Given only the end-to-end path performance metrics of sampled node pairs, the proposed framework, NeuTomography, utilizes deep neural network and data augmentation to predict the unmeasured performance metrics via learning non-linear relationships between node pairs and underlying unknown topological/routing properties. In addition, NeuTomography can be employed to reconstruct the original network topology, which is critical to most network planning tasks. Extensive experiments using real network data show that comparing to baseline solutions, NeuTomography can predict network characteristics and reconstruct network topologies with significantly higher accuracy and robustness using only limited measurement data.

Via

Access Paper or Ask Questions

MACS: Deep Reinforcement Learning based SDN Controller Synchronization Policy Design

Sep 19, 2019

Ziyao Zhang, Liang Ma, Konstantinos Poularakis, Kin K. Leung, Jeremy Tucker, Ananthram Swami

Figure 1 for MACS: Deep Reinforcement Learning based SDN Controller Synchronization Policy Design

Figure 2 for MACS: Deep Reinforcement Learning based SDN Controller Synchronization Policy Design

Figure 3 for MACS: Deep Reinforcement Learning based SDN Controller Synchronization Policy Design

Figure 4 for MACS: Deep Reinforcement Learning based SDN Controller Synchronization Policy Design

Abstract:In distributed software-defined networks (SDN), multiple physical SDN controllers, each managing a network domain, are implemented to balance centralised control, scalability, and reliability requirements. In such networking paradigms, controllers synchronize with each other, in attempts to maintain a logically centralised network view. Despite the presence of various design proposals for distributed SDN controller architectures, most existing works only aim at eliminating anomalies arising from the inconsistencies in different controllers' network views. However, the performance aspect of controller synchronization designs with respect to given SDN applications are generally missing. To fill this gap, we formulate the controller synchronization problem as a Markov decision process (MDP) and apply reinforcement learning techniques combined with deep neural networks (DNNs) to train a smart, scalable, and fine-grained controller synchronization policy, called the Multi-Armed Cooperative Synchronization (MACS), whose goal is to maximise the performance enhancements brought by controller synchronizations. Evaluation results confirm the DNN's exceptional ability in abstracting latent patterns in the distributed SDN environment, rendering significant superiority to MACS-based synchronization policy, which are 56% and 30% performance improvements over ONOS and greedy SDN controller synchronization heuristics.

* Published as a conference paper at ICNP 2019

Via

Access Paper or Ask Questions