Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Qing-Feng Liu

A Variance-Preserving Interpolation Approach for Diffusion Models with Applications to Single Channel Speech Enhancement and Recognition

May 27, 2024

Zilu Guo, Qing Wang, Jun Du, Jia Pan, Qing-Feng Liu, Chin-Hui

Abstract:In this paper, we propose a variance-preserving interpolation framework to improve diffusion models for single-channel speech enhancement (SE) and automatic speech recognition (ASR). This new variance-preserving interpolation diffusion model (VPIDM) approach requires only 25 iterative steps and obviates the need for a corrector, an essential element in the existing variance-exploding interpolation diffusion model (VEIDM). Two notable distinctions between VPIDM and VEIDM are the scaling function of the mean of state variables and the constraint imposed on the variance relative to the mean's scale. We conduct a systematic exploration of the theoretical mechanism underlying VPIDM and develop insights regarding VPIDM's applications in SE and ASR using VPIDM as a frontend. Our proposed approach, evaluated on two distinct data sets, demonstrates VPIDM's superior performances over conventional discriminative SE algorithms. Furthermore, we assess the performance of the proposed model under varying signal-to-noise ratio (SNR) levels. The investigation reveals VPIDM's improved robustness in target noise elimination when compared to VEIDM. Furthermore, utilizing the mid-outputs of both VPIDM and VEIDM results in enhanced ASR accuracies, thereby highlighting the practical efficacy of our proposed approach.

Via

Access Paper or Ask Questions

Information Fusion in Attention Networks Using Adaptive and Multi-level Factorized Bilinear Pooling for Audio-visual Emotion Recognition

Nov 17, 2021

Hengshun Zhou, Jun Du, Yuanyuan Zhang, Qing Wang, Qing-Feng Liu, Chin-Hui Lee

Figure 1 for Information Fusion in Attention Networks Using Adaptive and Multi-level Factorized Bilinear Pooling for Audio-visual Emotion Recognition

Figure 2 for Information Fusion in Attention Networks Using Adaptive and Multi-level Factorized Bilinear Pooling for Audio-visual Emotion Recognition

Figure 3 for Information Fusion in Attention Networks Using Adaptive and Multi-level Factorized Bilinear Pooling for Audio-visual Emotion Recognition

Figure 4 for Information Fusion in Attention Networks Using Adaptive and Multi-level Factorized Bilinear Pooling for Audio-visual Emotion Recognition

Abstract:Multimodal emotion recognition is a challenging task in emotion computing as it is quite difficult to extract discriminative features to identify the subtle differences in human emotions with abstract concept and multiple expressions. Moreover, how to fully utilize both audio and visual information is still an open problem. In this paper, we propose a novel multimodal fusion attention network for audio-visual emotion recognition based on adaptive and multi-level factorized bilinear pooling (FBP). First, for the audio stream, a fully convolutional network (FCN) equipped with 1-D attention mechanism and local response normalization is designed for speech emotion recognition. Next, a global FBP (G-FBP) approach is presented to perform audio-visual information fusion by integrating selfattention based video stream with the proposed audio stream. To improve G-FBP, an adaptive strategy (AG-FBP) to dynamically calculate the fusion weight of two modalities is devised based on the emotion-related representation vectors from the attention mechanism of respective modalities. Finally, to fully utilize the local emotion information, adaptive and multi-level FBP (AMFBP) is introduced by combining both global-trunk and intratrunk data in one recording on top of AG-FBP. Tested on the IEMOCAP corpus for speech emotion recognition with only audio stream, the new FCN method outperforms the state-ofthe-art results with an accuracy of 71.40%. Moreover, validated on the AFEW database of EmotiW2019 sub-challenge and the IEMOCAP corpus for audio-visual emotion recognition, the proposed AM-FBP approach achieves the best accuracy of 63.09% and 75.49% respectively on the test set.

Via

Access Paper or Ask Questions

Textual Analysis for Studying Chinese Historical Documents and Literary Novels

Oct 11, 2015

Chao-Lin Liu, Guan-Tao Jin, Hongsu Wang, Qing-Feng Liu, Wen-Huei Cheng, Wei-Yun Chiu, Richard Tzong-Han Tsai, Yu-Chun Wang

Figure 1 for Textual Analysis for Studying Chinese Historical Documents and Literary Novels

Figure 2 for Textual Analysis for Studying Chinese Historical Documents and Literary Novels

Figure 3 for Textual Analysis for Studying Chinese Historical Documents and Literary Novels

Figure 4 for Textual Analysis for Studying Chinese Historical Documents and Literary Novels

Abstract:We analyzed historical and literary documents in Chinese to gain insights into research issues, and overview our studies which utilized four different sources of text materials in this paper. We investigated the history of concepts and transliterated words in China with the Database for the Study of Modern China Thought and Literature, which contains historical documents about China between 1830 and 1930. We also attempted to disambiguate names that were shared by multiple government officers who served between 618 and 1912 and were recorded in Chinese local gazetteers. To showcase the potentials and challenges of computer-assisted analysis of Chinese literatures, we explored some interesting yet non-trivial questions about two of the Four Great Classical Novels of China: (1) Which monsters attempted to consume the Buddhist monk Xuanzang in the Journey to the West (JTTW), which was published in the 16th century, (2) Which was the most powerful monster in JTTW, and (3) Which major role smiled the most in the Dream of the Red Chamber, which was published in the 18th century. Similar approaches can be applied to the analysis and study of modern documents, such as the newspaper articles published about the 228 incident that occurred in 1947 in Taiwan.

* 11 pages, 7 figures, 2 tables, The Fourth ASE International Conference on Social Informatics

Via

Access Paper or Ask Questions