Abstract: Instruction-tuned Large Language Models (LLMs) have achieved remarkable performance across various benchmark tasks. While providing instructions to LLMs to guide their generations is user-friendly, assessing their instruction-following capabilities remains difficult due to a lack of evaluation metrics. In this paper, we focus on evaluating the instruction-following ability of LLMs in the context of story-ending generation, which requires diverse and context-specific instructions. We propose an automatic evaluation pipeline that utilizes a machine reading comprehension (MRC) model to determine whether the generated story ending reflects the instruction. Our findings demonstrate that the proposed metric aligns with human evaluation. Furthermore, our experiments confirm that recent open-source LLMs can achieve instruction-following performance close to that of GPT-3.5, as assessed through automatic evaluation.
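To make the pipeline concrete, below is a minimal sketch of an MRC-based instruction check, assuming the instruction can be recast as an extractive question over the generated ending. The model choice (`deepset/roberta-base-squad2`), the question template, and the score threshold are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch: cast an instruction (e.g., "End the story with the character
# adopting a dog") as a question and ask an extractive QA (MRC) model to
# find supporting evidence inside the generated ending. Model, question
# form, and threshold are illustrative assumptions.
from transformers import pipeline

mrc = pipeline("question-answering", model="deepset/roberta-base-squad2")

def follows_instruction(question: str, ending: str, threshold: float = 0.5) -> bool:
    """True if the MRC model finds a confident answer span for the
    instruction-derived question inside the generated ending."""
    result = mrc(
        question=question,
        context=ending,
        handle_impossible_answer=True,  # SQuAD2-style "no answer" option
    )
    # An empty answer or a low confidence score suggests the ending
    # does not reflect the instruction.
    return bool(result["answer"]) and result["score"] >= threshold

ending = "She walked home from the shelter, a small terrier trotting beside her."
print(follows_instruction("Who did she adopt?", ending))
```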
Abstract: One noted issue of the vector-quantized variational autoencoder (VQ-VAE) is that the learned discrete representation uses only a fraction of the full capacity of the codebook, a problem known as codebook collapse. We hypothesize that the training scheme of VQ-VAE, which involves some carefully designed heuristics, underlies this issue. In this paper, we propose a new training scheme that extends the standard VAE via novel stochastic dequantization and quantization, called the stochastically quantized variational autoencoder (SQ-VAE). In SQ-VAE, we observe a trend in which quantization is stochastic at the initial stage of training but gradually converges toward deterministic quantization, a behavior we call self-annealing. Our experiments show that SQ-VAE improves codebook utilization without using common heuristics. Furthermore, we empirically show that SQ-VAE is superior to the VAE and VQ-VAE in vision- and speech-related tasks.
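As an illustration of the stochastic quantization idea, the sketch below samples codebook assignments from a categorical distribution over negative squared distances, scaled by a learnable temperature: as the temperature shrinks, sampling approaches deterministic nearest-neighbor quantization, mirroring the self-annealing trend described above. The shapes, temperature parameterization, and straight-through gradient are illustrative assumptions, and the SQ-VAE training objective itself is omitted.

```python
# Sketch of stochastic quantization with a learnable temperature.
# This is NOT the paper's exact formulation; the parameterization and
# the straight-through gradient are illustrative assumptions, and the
# losses that train the codebook and temperature are omitted.
import torch
import torch.nn.functional as F

def stochastic_quantize(z, codebook, log_temp):
    """z: (batch, dim) encoder outputs; codebook: (K, dim); log_temp: scalar."""
    d2 = torch.cdist(z, codebook) ** 2            # (batch, K) squared distances
    probs = F.softmax(-d2 / log_temp.exp(), dim=-1)
    # Sample a code index per input; low temperature -> near-argmin.
    idx = torch.multinomial(probs, num_samples=1).squeeze(-1)
    z_q = codebook[idx]                           # (batch, dim) quantized latents
    # Straight-through estimator: copy gradients past the discrete sampling.
    return z + (z_q - z).detach(), idx

codebook = torch.randn(512, 64, requires_grad=True)
log_temp = torch.zeros((), requires_grad=True)    # learnable log-temperature
z = torch.randn(8, 64)
z_q, idx = stochastic_quantize(z, codebook, log_temp)
```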
Abstract: Dialog response ranking is used to rank response candidates by considering their relation to the dialog history. Although researchers have addressed this problem for open-domain dialogs, little attention has been paid to task-oriented dialogs. Furthermore, no previous studies have analyzed whether response ranking can improve the performance of existing dialog systems in real human-computer dialogs with speech recognition errors. In this paper, we propose a context-aware dialog response re-ranking system. Our system re-ranks responses in two steps: (1) it calculates matching scores for each candidate response and the current dialog context; (2) it combines the matching scores with a probability distribution of the candidates from an existing dialog system for response re-ranking. By using neural word-embedding-based models together with handcrafted or logistic-regression-based ensemble models, we improve the performance of a recently proposed end-to-end task-oriented dialog system on real dialogs with speech recognition errors.
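To illustrate the two-step re-ranking, the sketch below scores each candidate against the dialog context with averaged word embeddings and cosine similarity, then mixes that score with the base system's candidate probability via a fixed-weight (handcrafted) ensemble. The embedding model, tokenization, and mixing weight `alpha` are illustrative assumptions, not the paper's exact setup.

```python
# Sketch of the two-step re-ranking: (1) embedding-based matching of
# each candidate against the dialog context, (2) a handcrafted convex
# combination with the existing system's probability. The word-vector
# table, tokenization, and alpha are illustrative assumptions.
import numpy as np

def embed(text, vectors, dim=300):
    """Average pre-trained word vectors (e.g., GloVe) for a text."""
    words = [w for w in text.lower().split() if w in vectors]
    if not words:
        return np.zeros(dim)
    return np.mean([vectors[w] for w in words], axis=0)

def rerank(context, candidates, system_probs, vectors, alpha=0.5):
    """Sort candidates by alpha * match_score + (1 - alpha) * system_prob."""
    ctx = embed(context, vectors)
    scored = []
    for cand, p in zip(candidates, system_probs):
        c = embed(cand, vectors)
        denom = np.linalg.norm(ctx) * np.linalg.norm(c) or 1.0
        match = float(ctx @ c / denom)            # cosine similarity to context
        scored.append((alpha * match + (1 - alpha) * p, cand))
    return [cand for _, cand in sorted(scored, reverse=True)]
```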