Fang Jin

SmoothSinger: A Conditional Diffusion Model for Singing Voice Synthesis with Multi-Resolution Architecture

Jun 26, 2025

Kehan Sui, Jinxu Xiang, Fang Jin

Abstract: Singing voice synthesis (SVS) aims to generate expressive, high-quality vocals from musical scores, which requires precise modeling of pitch, duration, and articulation. While diffusion-based models have achieved remarkable success in image and video generation, their application to SVS remains challenging due to the complex acoustic and musical characteristics of singing, often resulting in artifacts that degrade naturalness. In this work, we propose SmoothSinger, a conditional diffusion model designed to synthesize high-quality, natural singing voices. Unlike prior methods that depend on vocoders as a final stage and often introduce distortion, SmoothSinger refines low-quality synthesized audio directly in a unified framework, mitigating the degradation associated with two-stage pipelines. The model adopts a reference-guided dual-branch architecture, using low-quality audio from any baseline system as a reference to guide the denoising process, enabling more expressive and context-aware synthesis. Furthermore, it enhances the conventional U-Net with a parallel low-frequency upsampling path, allowing the model to better capture pitch contours and long-term spectral dependencies. To improve alignment during training, we replace reference audio with degraded ground-truth audio, addressing temporal mismatch between reference and target signals. Experiments on the Opencpop dataset, a large-scale Chinese singing corpus, demonstrate that SmoothSinger achieves state-of-the-art results in both objective and subjective evaluations. Extensive ablation studies confirm its effectiveness in reducing artifacts and improving the naturalness of synthesized voices.

RDSinger: Reference-based Diffusion Network for Singing Voice Synthesis

Oct 29, 2024

Kehan Sui, Jinxu Xiang, Fang Jin

Abstract: Singing voice synthesis (SVS) aims to produce high-fidelity singing audio from music scores, which, unlike text-to-speech tasks, requires a detailed understanding of notes, pitch, and duration. Although diffusion models have shown exceptional performance in various generative tasks such as image and video creation, their application in SVS is hindered by time complexity and the challenge of capturing acoustic features, particularly during pitch transitions. Some networks learn from the prior distribution and use the compressed latent state as a better starting point for the diffusion model, but the denoising step does not consistently improve quality over the entire duration. We introduce RDSinger, a reference-based denoising diffusion network that generates high-quality audio for SVS tasks. Our approach is inspired by Animate Anyone, a diffusion image network that maintains intricate appearance features from reference images. RDSinger uses the FastSpeech2 mel-spectrogram as a reference to mitigate denoising step artifacts. Additionally, existing models can be influenced by misleading information in the compressed latent state during pitch transitions. We address this issue by applying Gaussian blur to parts of the reference mel-spectrogram and adjusting loss weights in these regions. Extensive ablation studies demonstrate the efficiency of our method. Evaluations on Opencpop, a Chinese singing dataset, show that RDSinger outperforms current state-of-the-art SVS methods.
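
The blur-and-reweight idea in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the function names, the time-axis-only blur, the kernel size, the edge clamping, and the region weight value are all hypothetical, and the abstract does not say whether the loss weight in blurred regions is raised or lowered.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return a normalized 1-D Gaussian kernel of length 2 * radius + 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_region(mel, start, end, sigma=1.0, radius=2):
    """Blur frames [start, end) of a mel-spectrogram along the time axis.

    mel is a list of frames, each frame a list of bin values.
    Frames outside the region are copied unchanged (hypothetical choice).
    """
    kernel = gaussian_kernel(sigma, radius)
    n_frames = len(mel)
    out = [frame[:] for frame in mel]
    for t in range(start, end):
        for b in range(len(mel[t])):
            acc = 0.0
            for k, w in enumerate(kernel):
                # Clamp the source index so the kernel never reads out of range.
                src = min(max(t + k - radius, 0), n_frames - 1)
                acc += w * mel[src][b]
            out[t][b] = acc
    return out


def loss_weights(n_frames, start, end, region_weight):
    """Per-frame loss weights: region_weight inside [start, end), 1.0 elsewhere."""
    return [region_weight if start <= t < end else 1.0
            for t in range(n_frames)]


# Toy one-bin "spectrogram" with a single spike, purely synthetic.
mel = [[0.0], [0.0], [1.0], [0.0], [0.0]]
blurred = blur_region(mel, 1, 4, sigma=1.0, radius=1)
weights = loss_weights(len(mel), 1, 4, region_weight=0.5)
```

In this toy case the spike at frame 2 is spread over frames 1 to 3 while frames 0 and 4 stay untouched, and `weights` marks the same three frames for a different loss weight.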

Liver Fat Quantification Network with Body Shape

May 18, 2024

Qiyue Wang, Wu Xue, Xiaoke Zhang, Fang Jin, James Hahn

Abstract: It is clinically important to detect liver fat content because it is related to cardiac complications and cardiovascular disease mortality. However, existing methods are associated with high cost and/or medical complications (e.g., liver biopsy, medical imaging technology) or only roughly estimate the grades of steatosis. In this paper, we propose a deep neural network to accurately estimate liver fat percentage using only body shapes. The proposed framework is composed of a flexible baseline regression network and a lightweight attention module. The attention module is trained to generate discriminative and diverse features, thus significantly improving performance. To validate our proposed method, we perform extensive tests on medical datasets. The experimental results validate our method and demonstrate the efficacy of designing neural networks to predict liver fat using only body shape. Since body shapes can be acquired using inexpensive and readily available optical scanners, the proposed method promises to make accurate assessment of hepatic steatosis more accessible.

Advantage Actor-Critic with Reasoner: Explaining the Agent's Behavior from an Exploratory Perspective

Sep 09, 2023

Muzhe Guo, Feixu Yu, Tian Lan, Fang Jin

Abstract: Reinforcement learning (RL) is a powerful tool for solving complex decision-making problems, but its lack of transparency and interpretability has been a major challenge in domains where decisions have significant real-world consequences. In this paper, we propose a novel Advantage Actor-Critic with Reasoner (A2CR), which can be easily applied to Actor-Critic-based RL models and make them interpretable. A2CR consists of three interconnected networks: the Policy Network, the Value Network, and the Reasoner Network. By predefining and classifying the underlying purpose of the actor's actions, A2CR automatically generates a more comprehensive and interpretable paradigm for understanding the agent's decision-making process. It offers a range of functionalities such as purpose-based saliency, early failure detection, and model supervision, thereby promoting responsible and trustworthy RL. Evaluations conducted in action-rich Super Mario Bros environments yield intriguing findings: Reasoner-predicted label proportions decrease for "Breakout" and increase for "Hovering" as the exploration level of the RL algorithm intensifies. Additionally, purpose-based saliencies are more focused and comprehensible.

Google Trends Analysis of COVID-19

Nov 07, 2020

Hoang Long Nguyen, Zhenhe Pan, Hashim Abu-gellban, Fang Jin, Yuanlin Zhang

Abstract: The World Health Organization (WHO) declared COVID-19 a pandemic on the 11th of March 2020, when there were 118K cases in several countries and territories. Numerous researchers have worked on forecasting the number of confirmed cases, since anticipating the growth of cases helps governments make difficult decisions about easing lockdown orders in their countries. Easing these orders helps people who have lost their jobs and supports gravely impacted businesses. Our research aims to investigate the relation between Google search trends and the spread of the novel coronavirus (COVID-19) across countries worldwide, in order to predict the number of cases. We perform a correlation analysis of the keywords of related Google search trends against the number of confirmed cases reported by the WHO. We then apply several machine learning techniques (Multiple Linear Regression, Non-negative Integer Regression, Deep Neural Network) to forecast the number of confirmed cases globally based on historical data as well as hybrid data (Google search trends). Our results show that Google search trends are highly associated with the number of reported confirmed cases, and that the Deep Learning approach outperforms the other forecasting techniques. We believe that this is a promising approach not only for forecasting confirmed COVID-19 cases, but also for similar forecasting problems that are associated with related Google trends.
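
A correlation analysis of this kind can be sketched as follows. This is only an illustration under assumptions: the abstract does not name the correlation measure, so Pearson correlation is used here as a stand-in, the function names are hypothetical, and the numbers are synthetic toy values rather than real Google Trends or WHO data.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def rank_keywords(trends, cases):
    """Sort keywords by absolute correlation with the case counts, strongest first."""
    scores = {kw: pearson(series, cases) for kw, series in trends.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Synthetic example values, not real data.
cases = [10, 20, 40, 80, 160]
trends = {
    "keyword_a": [1, 2, 4, 8, 16],   # moves exactly with the cases
    "keyword_b": [5, 3, 6, 2, 7],    # only loosely related
}
ranking = rank_keywords(trends, cases)
```

Keywords that rank highly in such an analysis would be natural candidates for the hybrid feature set fed to the forecasting models.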

Addict Free -- A Smart and Connected Relapse Intervention Mobile App

Dec 02, 2019

Zhou Yang, Vinay Jayachandra Reddy, Rashmi Kesidi, Fang Jin

Abstract: It is widely acknowledged that addiction relapse is highly associated with spatial-temporal factors such as specific places or time periods. Current studies suggest that these factors can be used for better relapse interventions; however, no relapse prevention application makes use of them. In this paper, we introduce a mobile app called "Addict Free", which records user profiles, tracks relapse history, and summarizes recovery statistics to help users better understand their recovery situation. The app also builds a relapse recovery community, which allows users to ask for advice and encouragement and to share relapse prevention experience. Moreover, machine learning algorithms that ingest spatial and temporal factors are used to predict relapse, and a recovery recommendation algorithm then recommends helpful addiction diversion activities based on these predictions. By interacting with users, the app aims to provide smart suggestions that help prevent relapse, especially for users with alcohol and tobacco addiction.

* 4 pages

Discovering Opioid Use Patterns from Social Media for Relapse Prevention

Dec 02, 2019

Zhou Yang, Spencer Bradshaw, Rattikorn Hewett, Fang Jin

Abstract: The United States is currently experiencing an unprecedented opioid crisis, and opioid overdose has become a leading cause of injury and death. Effective opioid addiction recovery calls for not only medical treatments, but also behavioral interventions for impacted individuals. In this paper, we study communication and behavior patterns of patients with opioid use disorder (OUD) from social media, intending to demonstrate how existing information from common activities, such as online social networking, might lead to better prediction, evaluation, and ultimately prevention of relapses. From a novel, multi-disciplinary analytic perspective, we characterize opioid addiction behavior patterns by analyzing opioid groups from Reddit.com, including modeling online discussion topics, analyzing text co-occurrence and correlations, and identifying emotional states of people with OUD. These quantitative analyses are of practical importance and demonstrate innovative ways to use information from online social media to create technology that can assist in relapse prevention.

* 7 pages, and 5 figures

Self-boosted Time-series Forecasting with Multi-task and Multi-view Learning

Sep 17, 2019

Long H. Nguyen, Zhenhe Pan, Opeyemi Openiyi, Hashim Abu-gellban, Mahdi Moghadasi, Fang Jin

Abstract: A robust model for time series forecasting is highly important in many domains, including but not limited to financial forecasting, air temperature, and electricity consumption. To improve forecasting performance, traditional approaches usually require additional feature sets. However, adding more feature sets from different data sources is not always feasible because of limited data accessibility. In this paper, we propose a novel self-boosted mechanism in which the original time series is decomposed into multiple time series. These time series serve as additional features: the closely related group is fed into a multi-task learning model, and the loosely related group is fed into a multi-view learning component to exploit its complementary information. We use three real-world datasets to validate our model and show the superiority of our proposed method over existing state-of-the-art baseline methods.
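
To make the decomposition step concrete, here is a minimal sketch that splits one series into several component series. The abstract does not state which decomposition the authors use, so the multi-scale moving-average scheme, the window sizes, and the function names below are all assumptions chosen for illustration.

```python
def moving_average(series, window):
    """Centered moving average; the window shrinks near the ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        smoothed.append(sum(series[lo:hi]) / (hi - lo))
    return smoothed


def decompose(series, windows=(7, 3)):
    """Split a series into smooth components (coarse to fine) plus a residual.

    The components add back up to the original series (hypothetical scheme).
    """
    components = []
    remaining = list(series)
    for window in sorted(windows, reverse=True):
        smooth = moving_average(remaining, window)
        components.append(smooth)
        remaining = [r - s for r, s in zip(remaining, smooth)]
    components.append(remaining)  # what no smooth component explains
    return components


# Synthetic toy series.
series = [3.0, 5.0, 4.0, 8.0, 7.0, 9.0, 12.0, 10.0]
parts = decompose(series)
rebuilt = [sum(values) for values in zip(*parts)]
```

Each entry of `parts` has the same length as the input and could then be treated as an extra feature series; how the components are grouped into closely and loosely related sets is not specified in the abstract and is left out of this sketch.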

Job Scheduling on Data Centers with Deep Reinforcement Learning

Sep 16, 2019

Sisheng Liang, Zhou Yang, Fang Jin, Yong Chen

Abstract: Efficient job scheduling on data centers under heterogeneous complexity is crucial but challenging, since it involves the allocation of multi-dimensional resources over time and space. To adapt to the complex computing environment in data centers, we propose DeepScheduler, an Advantage Actor-Critic (A2C) deep reinforcement learning approach for job scheduling. DeepScheduler consists of two agents: the actor, which is responsible for learning the scheduling policy automatically, and the critic, which reduces the estimation error. Unlike previous policy gradient approaches, DeepScheduler is designed to reduce the gradient estimation variance and to update parameters efficiently. We show that DeepScheduler can achieve competitive scheduling performance using both simulated workloads and real data collected from an academic data center.
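
The variance-reduction idea behind actor-critic methods can be sketched with the advantage signal: the actor is updated with the return minus the critic's value estimate rather than with the raw return. The sketch below uses Monte Carlo discounted returns as an assumption; the abstract does not say whether DeepScheduler uses Monte Carlo or bootstrapped targets, and the function names and numbers are hypothetical.

```python
def discounted_returns(rewards, gamma):
    """Discounted return G_t = r_t + gamma * G_{t+1} for each step of an episode."""
    returns = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(rewards, values, gamma=0.99):
    """Advantage A_t = G_t - V(s_t), using the critic's values as a baseline."""
    returns = discounted_returns(rewards, gamma)
    return [g - v for g, v in zip(returns, values)]


# Synthetic toy episode: three steps with reward 1 each.
rewards = [1.0, 1.0, 1.0]
critic_values = [1.5, 1.5, 0.5]  # hypothetical critic predictions
adv = advantages(rewards, critic_values, gamma=0.5)
```

A positive advantage means the action did better than the critic expected, so its probability is increased; a negative one decreases it. Subtracting the baseline leaves the expected policy gradient unchanged while typically lowering its variance.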

* 6 pages

A Multi-variable Stacked Long-Short Term Memory Network for Wind Speed Forecasting

Nov 24, 2018

Sisheng Liang, Long Nguyen, Fang Jin

Abstract: Precisely forecasting wind speed is essential for wind power producers and grid operators. However, this task is challenging due to the stochasticity of wind speed. To accurately predict short-term wind speed under uncertainty, this paper proposes a multi-variable stacked LSTM model (MSLSTM). The proposed method uses multiple historical meteorological variables, such as wind speed, temperature, humidity, pressure, dew point, and solar radiation, to accurately predict wind speeds. The prediction performance is extensively assessed using real data collected in West Texas, USA. The experimental results show that the proposed MSLSTM can better capture and learn uncertainties while delivering competitive performance.
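
Feeding several meteorological variables to a sequence model usually starts with turning the record into fixed-length input windows paired with the next wind speed value. The sketch below shows one way to do that; the window length, the column order, and the function name are assumptions for illustration, and the values are synthetic rather than the West Texas data.

```python
def make_windows(rows, lookback, target_index=0):
    """Build (input window, next-step target) pairs from multivariate rows.

    rows: sequence of per-time-step tuples, e.g. (wind_speed, temperature).
    Each input holds `lookback` consecutive rows; the target is the value at
    `target_index` in the row that immediately follows the window.
    """
    inputs, targets = [], []
    for end in range(lookback, len(rows)):
        inputs.append([list(row) for row in rows[end - lookback:end]])
        targets.append(rows[end][target_index])
    return inputs, targets


# Synthetic toy record: (wind_speed, temperature) per time step.
rows = [(4.0, 20.0), (5.0, 21.0), (6.0, 19.0), (5.5, 18.0), (7.0, 17.0)]
inputs, targets = make_windows(rows, lookback=2)
```

Each element of `inputs` has shape lookback by number-of-variables, which is the layout a stacked LSTM would typically consume one time step at a time.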
