Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ting-Yao Hu

Mutual Reinforcement of LLM Dialogue Synthesis and Summarization Capabilities for Few-Shot Dialogue Summarization

Feb 24, 2025

Yen-Ju Lu, Ting-Yao Hu, Hema Swetha Koppula, Hadi Pouransari, Jen-Hao Rick Chang, Yin Xia, Xiang Kong, Qi Zhu, Simon Wang, Oncel Tuzel(+1 more)

Abstract:In this work, we propose Mutual Reinforcing Data Synthesis (MRDS) within LLMs to improve few-shot dialogue summarization task. Unlike prior methods that require external knowledge, we mutually reinforce the LLM\'s dialogue synthesis and summarization capabilities, allowing them to complement each other during training and enhance overall performances. The dialogue synthesis capability is enhanced by directed preference optimization with preference scoring from summarization capability. The summarization capability is enhanced by the additional high quality dialogue-summary paired data produced by the dialogue synthesis capability. By leveraging the proposed MRDS mechanism, we elicit the internal knowledge of LLM in the format of synthetic data, and use it to augment the few-shot real training dataset. Empirical results demonstrate that our method improves dialogue summarization, achieving a 1.5% increase in ROUGE scores and a 0.3% improvement in BERT scores in few-shot settings. Furthermore, our method attains the highest average scores in human evaluations, surpassing both the pre-trained models and the baselines fine-tuned solely for summarization tasks.

* NAACL 2025 Findings

Via

Access Paper or Ask Questions

MUSCLE: A Model Update Strategy for Compatible LLM Evolution

Jul 12, 2024

Jessica Echterhoff, Fartash Faghri, Raviteja Vemulapalli, Ting-Yao Hu, Chun-Liang Li, Oncel Tuzel, Hadi Pouransari

Figure 1 for MUSCLE: A Model Update Strategy for Compatible LLM Evolution

Figure 2 for MUSCLE: A Model Update Strategy for Compatible LLM Evolution

Figure 3 for MUSCLE: A Model Update Strategy for Compatible LLM Evolution

Figure 4 for MUSCLE: A Model Update Strategy for Compatible LLM Evolution

Abstract:Large Language Models (LLMs) are frequently updated due to data or architecture changes to improve their performance. When updating models, developers often focus on increasing overall performance metrics with less emphasis on being compatible with previous model versions. However, users often build a mental model of the functionality and capabilities of a particular machine learning model they are interacting with. They have to adapt their mental model with every update -- a draining task that can lead to user dissatisfaction. In practice, fine-tuned downstream task adapters rely on pretrained LLM base models. When these base models are updated, these user-facing downstream task models experience instance regression or negative flips -- previously correct instances are now predicted incorrectly. This happens even when the downstream task training procedures remain identical. Our work aims to provide seamless model updates to a user in two ways. First, we provide evaluation metrics for a notion of compatibility to prior model versions, specifically for generative tasks but also applicable for discriminative tasks. We observe regression and inconsistencies between different model versions on a diverse set of tasks and model updates. Second, we propose a training strategy to minimize the number of inconsistencies in model updates, involving training of a compatibility model that can enhance task fine-tuned language models. We reduce negative flips -- instances where a prior model version was correct, but a new model incorrect -- by up to 40% from Llama 1 to Llama 2.

Via

Access Paper or Ask Questions

Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models

Sep 18, 2023

Hsuan Su, Ting-Yao Hu, Hema Swetha Koppula, Raviteja Vemulapalli, Jen-Hao Rick Chang, Karren Yang, Gautam Varma Mantena, Oncel Tuzel

Abstract:While Automatic Speech Recognition (ASR) systems are widely used in many real-world applications, they often do not generalize well to new domains and need to be finetuned on data from these domains. However, target-domain data usually are not readily available in many scenarios. In this paper, we propose a new strategy for adapting ASR models to new target domains without any text or speech from those domains. To accomplish this, we propose a novel data synthesis pipeline that uses a Large Language Model (LLM) to generate a target domain text corpus, and a state-of-the-art controllable speech synthesis model to generate the corresponding speech. We propose a simple yet effective in-context instruction finetuning strategy to increase the effectiveness of LLM in generating text corpora for new domains. Experiments on the SLURP dataset show that the proposed method achieves an average relative word error rate improvement of $28\%$ on unseen target domains without any performance drop in source domains.

Via

Access Paper or Ask Questions

Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis

Mar 27, 2023

Karren Yang, Ting-Yao Hu, Jen-Hao Rick Chang, Hema Swetha Koppula, Oncel Tuzel

Figure 1 for Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis

Figure 2 for Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis

Figure 3 for Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis

Figure 4 for Text is All You Need: Personalizing ASR Models using Controllable Speech Synthesis

Abstract:Adapting generic speech recognition models to specific individuals is a challenging problem due to the scarcity of personalized data. Recent works have proposed boosting the amount of training data using personalized text-to-speech synthesis. Here, we ask two fundamental questions about this strategy: when is synthetic data effective for personalization, and why is it effective in those cases? To address the first question, we adapt a state-of-the-art automatic speech recognition (ASR) model to target speakers from four benchmark datasets representative of different speaker types. We show that ASR personalization with synthetic data is effective in all cases, but particularly when (i) the target speaker is underrepresented in the global data, and (ii) the capacity of the global model is limited. To address the second question of why personalized synthetic data is effective, we use controllable speech synthesis to generate speech with varied styles and content. Surprisingly, we find that the text content of the synthetic data, rather than style, is important for speaker adaptation. These results lead us to propose a data selection strategy for ASR personalization based on speech content.

* ICASSP 2023

Via

Access Paper or Ask Questions

I see what you hear: a vision-inspired method to localize words

Oct 24, 2022

Mohammad Samragh, Arnav Kundu, Ting-Yao Hu, Minsik Cho, Aman Chadha, Ashish Shrivastava, Oncel Tuzel, Devang Naik

Abstract:This paper explores the possibility of using visual object detection techniques for word localization in speech data. Object detection has been thoroughly studied in the contemporary literature for visual data. Noting that an audio can be interpreted as a 1-dimensional image, object localization techniques can be fundamentally useful for word localization. Building upon this idea, we propose a lightweight solution for word detection and localization. We use bounding box regression for word localization, which enables our model to detect the occurrence, offset, and duration of keywords in a given audio stream. We experiment with LibriSpeech and train a model to localize 1000 words. Compared to existing work, our method reduces model size by 94%, and improves the F1 score by 6.5\%.

Via

Access Paper or Ask Questions

Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition

Oct 21, 2021

Ting-Yao Hu, Mohammadreza Armandpour, Ashish Shrivastava, Jen-Hao Rick Chang, Hema Koppula, Oncel Tuzel

Figure 1 for Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition

Figure 2 for Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition

Figure 3 for Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition

Figure 4 for Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition

Abstract:With recent advances in speech synthesis, synthetic data is becoming a viable alternative to real data for training speech recognition models. However, machine learning with synthetic data is not trivial due to the gap between the synthetic and the real data distributions. Synthetic datasets may contain artifacts that do not exist in real data such as structured noise, content errors, or unrealistic speaking styles. Moreover, the synthesis process may introduce a bias due to uneven sampling of the data manifold. We propose two novel techniques during training to mitigate the problems due to the distribution gap: (i) a rejection sampling algorithm and (ii) using separate batch normalization statistics for the real and the synthetic samples. We show that these methods significantly improve the training of speech recognition models using synthetic data. We evaluate the proposed approach on keyword detection and Automatic Speech Recognition (ASR) tasks, and observe up to 18% and 13% relative error reduction, respectively, compared to naively using the synthetic data.

Via

Access Paper or Ask Questions

Subspace Representation Learning for Few-shot Image Classification

May 05, 2021

Ting-Yao Hu, Zhi-Qi Cheng, Alexander G. Hauptmann

Figure 1 for Subspace Representation Learning for Few-shot Image Classification

Figure 2 for Subspace Representation Learning for Few-shot Image Classification

Figure 3 for Subspace Representation Learning for Few-shot Image Classification

Figure 4 for Subspace Representation Learning for Few-shot Image Classification

Abstract:In this paper, we propose a subspace representation learning (SRL) framework to tackle few-shot image classification tasks. It exploits a subspace in local CNN feature space to represent an image, and measures the similarity between two images according to a weighted subspace distance (WSD). When K images are available for each class, we develop two types of template subspaces to aggregate K-shot information: the prototypical subspace (PS) and the discriminative subspace (DS). Based on the SRL framework, we extend metric learning based techniques from vector to subspace representation. While most previous works adopted global vector representation, using subspace representation can effectively preserve the spatial structure, and diversity within an image. We demonstrate the effectiveness of the SRL framework on three public benchmark datasets: MiniImageNet, TieredImageNet and Caltech-UCSD Birds-200-2011 (CUB), and the experimental results illustrate competitive/superior performance of our method compared to the previous state-of-the-art.

Via

Access Paper or Ask Questions

Pose Guided Person Image Generation with Hidden p-Norm Regression

Feb 19, 2021

Ting-Yao Hu, Alexander G. Hauptmann

Figure 1 for Pose Guided Person Image Generation with Hidden p-Norm Regression

Figure 2 for Pose Guided Person Image Generation with Hidden p-Norm Regression

Figure 3 for Pose Guided Person Image Generation with Hidden p-Norm Regression

Figure 4 for Pose Guided Person Image Generation with Hidden p-Norm Regression

Abstract:In this paper, we propose a novel approach to solve the pose guided person image generation task. We assume that the relation between pose and appearance information can be described by a simple matrix operation in hidden space. Based on this assumption, our method estimates a pose-invariant feature matrix for each identity, and uses it to predict the target appearance conditioned on the target pose. The estimation process is formulated as a p-norm regression problem in hidden space. By utilizing the differentiation of the solution of this regression problem, the parameters of the whole framework can be trained in an end-to-end manner. While most previous works are only applicable to the supervised training and single-shot generation scenario, our method can be easily adapted to unsupervised training and multi-shot generation. Extensive experiments on the challenging Market-1501 dataset show that our method yields competitive performance in all the aforementioned variant scenarios.

Via

Access Paper or Ask Questions

SapAugment: Learning A Sample Adaptive Policy for Data Augmentation

Nov 02, 2020

Ting-Yao Hu, Ashish Shrivastava, Rick Chang, Hema Koppula, Stefan Braun, Kyuyeon Hwang, Ozlem Kalini, Oncel Tuzel

Figure 1 for SapAugment: Learning A Sample Adaptive Policy for Data Augmentation

Figure 2 for SapAugment: Learning A Sample Adaptive Policy for Data Augmentation

Figure 3 for SapAugment: Learning A Sample Adaptive Policy for Data Augmentation

Figure 4 for SapAugment: Learning A Sample Adaptive Policy for Data Augmentation

Abstract:Data augmentation methods usually apply the same augmentation (or a mix of them) to all the training samples. For example, to perturb data with noise, the noise is sampled from a Normal distribution with a fixed standard deviation, for all samples. We hypothesize that a hard sample with high training loss already provides strong training signal to update the model parameters and should be perturbed with mild or no augmentation. Perturbing a hard sample with a strong augmentation may also make it too hard to learn from. Furthermore, a sample with low training loss should be perturbed by a stronger augmentation to provide more robustness to a variety of conditions. To formalize these intuitions, we propose a novel method to learn a Sample-Adaptive Policy for Augmentation -- SapAugment. Our policy adapts the augmentation parameters based on the training loss of the data samples. In the example of Gaussian noise, a hard sample will be perturbed with a low variance noise and an easy sample with a high variance noise. Furthermore, the proposed method combines multiple augmentation methods into a methodical policy learning framework and obviates hand-crafting augmentation parameters by trial-and-error. We apply our method on an automatic speech recognition (ASR) task, and combine existing and novel augmentations using the proposed framework. We show substantial improvement, up to 21% relative reduction in word error rate on LibriSpeech dataset, over the state-of-the-art speech augmentation method.

* Submitted to ICASSP 2021

Via

Access Paper or Ask Questions

RISE Video Dataset: Recognizing Industrial Smoke Emissions

May 20, 2020

Yen-Chia Hsu, Ting-Hao 'Kenneth' Huang, Ting-Yao Hu, Paul Dille, Sean Prendi, Ryan Hoffman, Anastasia Tsuhlares, Randy Sargent, Illah Nourbakhsh

Figure 1 for RISE Video Dataset: Recognizing Industrial Smoke Emissions

Figure 2 for RISE Video Dataset: Recognizing Industrial Smoke Emissions

Figure 3 for RISE Video Dataset: Recognizing Industrial Smoke Emissions

Figure 4 for RISE Video Dataset: Recognizing Industrial Smoke Emissions

Abstract:Industrial smoke emissions pose a significant concern to human health. Prior works have shown that using Computer Vision (CV) techniques to identify smoke as visual evidence can influence the attitude of regulators and empower citizens in pursuing environmental justice. However, existing datasets do not have sufficient quality nor quantity for training robust CV models to support air quality advocacy. We introduce RISE, the first large-scale video dataset for Recognizing Industrial Smoke Emissions. We adopt the citizen science approach to collaborate with local community members in annotating whether a video clip has smoke emissions. Our dataset contains 12,567 clips with 19 distinct views from cameras on three sites that monitored three different industrial facilities. The clips are from 30 days that spans four seasons in two years in the daytime. We run experiments using deep neural networks developed for video action recognition to establish a performance baseline and reveal the challenges for smoke recognition. Our data analysis also shows opportunities for integrating citizen scientists and crowd workers into the application of Artificial Intelligence for social good.

* Technical report

Via

Access Paper or Ask Questions