Yong Shan

Relational Surrogate Loss Learning

Feb 26, 2022

Tao Huang, Zekang Li, Hua Lu, Yong Shan, Shusheng Yang, Yang Feng, Fei Wang, Shan You, Chang Xu

Abstract: Evaluation metrics in machine learning can rarely be used directly as loss functions, because they may be non-differentiable and non-decomposable, e.g., average precision and F1 score. This paper addresses the problem by revisiting surrogate loss learning, in which a deep neural network is employed to approximate the evaluation metric. Instead of pursuing an exact recovery of the evaluation metric through a deep neural network, we recall the purpose of these metrics, which is to distinguish whether one model is better or worse than another. We show that directly maintaining the relation of models between surrogate losses and metrics suffices, and propose a rank correlation-based optimization method to maximize this relation and learn surrogate losses. Compared to previous works, our method is much easier to optimize and enjoys significant efficiency and performance gains. Extensive experiments show that our method achieves improvements on various tasks including image classification and neural machine translation, and even outperforms state-of-the-art methods on human pose estimation and machine reading comprehension tasks. Code is available at: https://github.com/hunto/ReLoss.

* Accepted to ICLR 2022
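
To make the "relation between surrogate losses and metrics" concrete, here is a minimal sketch that measures it with Spearman's rank correlation in plain Python. It is not the authors' implementation (see their repository for that): the function names and the example numbers are made up for illustration, and the sketch only measures the relation, it does not show how the paper optimizes it.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = ranks(xs), ranks(ys)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values: one surrogate loss and one metric score per model.
surrogate_losses = [0.9, 0.5, 0.7, 0.2]
metric_scores = [0.60, 0.80, 0.70, 0.95]

# A useful surrogate orders models the same way the metric does, so a lower
# loss should go with a higher metric and rho should be close to -1.
print(spearman(surrogate_losses, metric_scores))
```

In this toy example the loss ranks the four models in exactly the reverse order of the metric, which is the behaviour one would want from a surrogate loss.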

Mental Health Assessment for the Chatbots

Jan 14, 2022

Yong Shan, Jinchao Zhang, Zekang Li, Yang Feng, Jie Zhou

Abstract: Previous research on dialogue system assessment usually focuses on quality evaluation (e.g., fluency and relevance) of the responses generated by chatbots, which are local and technical metrics. For a chatbot that responds to millions of online users, including minors, we argue that it should have a healthy mental tendency in order to avoid a negative psychological impact on them. In this paper, we establish several mental health assessment dimensions for chatbots (depression, anxiety, alcohol addiction, empathy) and introduce questionnaire-based mental health assessment methods. We conduct assessments on some well-known open-domain chatbots and find that all of them exhibit severe mental health issues. We attribute this to the neglect of mental health risks during dataset building and model training. We hope to draw researchers' attention to the serious mental health problems of chatbots and to improve chatbots' ability in positive emotional interaction.

* Work in progress

Modeling Coverage for Non-Autoregressive Neural Machine Translation

Apr 24, 2021

Yong Shan, Yang Feng, Chenze Shao

Abstract: Non-Autoregressive Neural Machine Translation (NAT) has achieved significant inference speedup by generating all tokens simultaneously. Despite its high efficiency, NAT usually suffers from two kinds of translation errors: over-translation (e.g., repeated tokens) and under-translation (e.g., missing translations), which ultimately limit the translation quality. In this paper, we argue that these issues of NAT can be addressed through coverage modeling, which has proven useful in autoregressive decoding. We propose a novel Coverage-NAT to model the coverage information directly by a token-level coverage iterative refinement mechanism and a sentence-level coverage agreement, which can indicate to the model whether a source token has been translated and improve the semantic consistency between the translation and the source, respectively. Experimental results on WMT14 En-De and WMT16 En-Ro translation tasks show that our method can alleviate those errors and achieve strong improvements over the baseline system.

* Accepted by the 2021 International Joint Conference on Neural Networks (IJCNN 2021)
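
As a small illustration of the over-translation symptom named in the abstract (repeated tokens), the sketch below flags tokens that repeat their immediate predecessor. It is only a diagnostic written for this page, not part of Coverage-NAT; the function names and the example sentence are hypothetical.

```python
from typing import List, Sequence


def adjacent_repeats(tokens: Sequence[str]) -> List[int]:
    """Return positions of tokens identical to the token just before them."""
    return [i for i in range(1, len(tokens)) if tokens[i] == tokens[i - 1]]


def repetition_rate(tokens: Sequence[str]) -> float:
    """Fraction of tokens that repeat their predecessor (0.0 for empty input)."""
    if not tokens:
        return 0.0
    return len(adjacent_repeats(tokens)) / len(tokens)


# Hypothetical NAT output containing repeated tokens.
hypothesis = "we we need to go go go home".split()
print(adjacent_repeats(hypothesis))  # positions of the repeated tokens
print(repetition_rate(hypothesis))   # share of the output that is repetition
```

A check like this only catches adjacent repetition. Under-translation cannot be detected from the output alone, because it requires knowing which source tokens were left untranslated.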

A Contextual Hierarchical Attention Network with Adaptive Objective for Dialogue State Tracking

Jun 02, 2020

Yong Shan, Zekang Li, Jinchao Zhang, Fandong Meng, Yang Feng, Cheng Niu, Jie Zhou

Abstract: Recent studies in dialogue state tracking (DST) leverage historical information to determine states, which are generally represented as slot-value pairs. However, most of them are limited in how efficiently they exploit relevant context, due to the lack of a powerful mechanism for modeling interactions between the slot and the dialogue history. Besides, existing methods usually ignore the slot imbalance problem and treat all slots indiscriminately, which limits the learning of hard slots and eventually hurts overall performance. In this paper, we propose to enhance DST by employing a contextual hierarchical attention network that not only discerns relevant information at both the word level and the turn level but also learns contextual representations. We further propose an adaptive objective to alleviate the slot imbalance problem by dynamically adjusting the weights of different slots during training. Experimental results show that our approach reaches 52.68% and 58.55% joint accuracy on the MultiWOZ 2.0 and MultiWOZ 2.1 datasets respectively and achieves new state-of-the-art performance with considerable improvements (+1.24% and +5.98%).

* Accepted as a long paper at ACL 2020. Code is available at https://github.com/ictnlp/CHAN-DST
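
The idea of dynamically adjusting slot weights can be sketched as follows. The weighting rule here (weight proportional to one minus the slot's current accuracy, normalized to a mean of 1) is an assumption chosen for illustration and is not taken from the paper; the slot names, the `floor` parameter, and the numbers are likewise hypothetical. The linked repository contains the authors' actual objective.

```python
from typing import Dict


def slot_weights(slot_accuracy: Dict[str, float], floor: float = 0.05) -> Dict[str, float]:
    """Give harder slots (lower accuracy) larger weights, with mean weight 1.

    Assumed rule for illustration only: difficulty = max(1 - accuracy, floor).
    """
    if not slot_accuracy:
        raise ValueError("need at least one slot")
    difficulty = {slot: max(1.0 - acc, floor) for slot, acc in slot_accuracy.items()}
    mean_difficulty = sum(difficulty.values()) / len(difficulty)
    return {slot: d / mean_difficulty for slot, d in difficulty.items()}


def weighted_loss(slot_losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Average the per-slot losses after scaling each one by its slot weight."""
    return sum(weights[slot] * loss for slot, loss in slot_losses.items()) / len(slot_losses)


# Hypothetical accuracies measured during training: one hard slot, one easy slot.
accuracy = {"hotel-name": 0.5, "hotel-parking": 0.9}
weights = slot_weights(accuracy)
print(weights)  # the hard slot receives the larger weight

# Recomputing the weights as accuracies change makes the objective adaptive.
print(weighted_loss({"hotel-name": 1.2, "hotel-parking": 0.3}, weights))
```

Normalizing to a mean weight of 1 keeps the overall loss scale comparable to the unweighted average, so only the balance between slots changes.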

Improving Bidirectional Decoding with Dynamic Target Semantics in Neural Machine Translation

Nov 05, 2019

Yong Shan, Yang Feng, Jinchao Zhang, Fandong Meng, Wen Zhang

Abstract: Generally, Neural Machine Translation models generate target words in a left-to-right (L2R) manner and fail to exploit any future (right) semantic information, which usually produces an unbalanced translation. Recent works attempt to utilize a right-to-left (R2L) decoder in bidirectional decoding to alleviate this problem. In this paper, we propose a novel Dynamic Interaction Module (DIM) to dynamically exploit target semantics from R2L translation for enhancing the L2R translation quality. Different from other bidirectional decoding approaches, DIM first extracts helpful target information through addressing and reading operations, then updates target semantics for tracking the interactive history. We further introduce an agreement regularization term into the training objective to narrow the gap between L2R and R2L translations. Experimental results on NIST Chinese⇒English and WMT'16 English⇒Romanian translation tasks show that our system achieves significant improvements over baseline systems and reaches results comparable to the state-of-the-art Transformer model with far fewer parameters.
