Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mengze Hong

Technical Report: A Practical Guide to Kaldi ASR Optimization

Jun 08, 2025

Mengze Hong, Di Jiang

Abstract:This technical report introduces innovative optimizations for Kaldi-based Automatic Speech Recognition (ASR) systems, focusing on acoustic model enhancement, hyperparameter tuning, and language model efficiency. We developed a custom Conformer block integrated with a multistream TDNN-F structure, enabling superior feature extraction and temporal modeling. Our approach includes advanced data augmentation techniques and dynamic hyperparameter optimization to boost performance and reduce overfitting. Additionally, we propose robust strategies for language model management, employing Bayesian optimization and $n$-gram pruning to ensure relevance and computational efficiency. These systematic improvements significantly elevate ASR accuracy and robustness, outperforming existing methods and offering a scalable solution for diverse speech recognition scenarios. This report underscores the importance of strategic optimizations in maintaining Kaldi's adaptability and competitiveness in rapidly evolving technological landscapes.

Via

Access Paper or Ask Questions

QualBench: Benchmarking Chinese LLMs with Localized Professional Qualifications for Vertical Domain Evaluation

May 08, 2025

Mengze Hong, Wailing Ng, Di Jiang, Chen Jason Zhang

Abstract:The rapid advancement of Chinese large language models (LLMs) underscores the need for domain-specific evaluations to ensure reliable applications. However, existing benchmarks often lack coverage in vertical domains and offer limited insights into the Chinese working context. Leveraging qualification exams as a unified framework for human expertise evaluation, we introduce QualBench, the first multi-domain Chinese QA benchmark dedicated to localized assessment of Chinese LLMs. The dataset includes over 17,000 questions across six vertical domains, with data selections grounded in 24 Chinese qualifications to closely align with national policies and working standards. Through comprehensive evaluation, the Qwen2.5 model outperformed the more advanced GPT-4o, with Chinese LLMs consistently surpassing non-Chinese models, highlighting the importance of localized domain knowledge in meeting qualification requirements. The best performance of 75.26% reveals the current gaps in domain coverage within model capabilities. Furthermore, we present the failure of LLM collaboration with crowdsourcing mechanisms and suggest the opportunities for multi-domain RAG knowledge enhancement and vertical domain LLM training with Federated Learning.

Via

Access Paper or Ask Questions

Dial-In LLM: Human-Aligned Dialogue Intent Clustering with LLM-in-the-loop

Dec 12, 2024

Mengze Hong, Yuanfeng Song, Di Jiang, Wailing Ng, Yanjie Sun, Chen Jason Zhang

Abstract:The discovery of customer intention from dialogue plays an important role in automated support system. However, traditional text clustering methods are poorly aligned with human perceptions due to the shift from embedding distance to semantic distance, and existing quantitative metrics for text clustering may not accurately reflect the true quality of intent clusters. In this paper, we leverage the superior language understanding capabilities of Large Language Models (LLMs) for designing better-calibrated intent clustering algorithms. We first establish the foundation by verifying the robustness of fine-tuned LLM utility in semantic coherence evaluation and cluster naming, resulting in an accuracy of 97.50% and 94.40%, respectively, when compared to the human-labeled ground truth. Then, we propose an iterative clustering algorithm that facilitates cluster-level refinement and the continuous discovery of high-quality intent clusters. Furthermore, we present several LLM-in-the-loop semi-supervised clustering techniques tailored for intent discovery from customer service dialogue. Experiments on a large-scale industrial dataset comprising 1,507 intent clusters demonstrate the effectiveness of the proposed techniques. The methods outperformed existing counterparts, achieving 6.25% improvement in quantitative metrics and 12% enhancement in application-level performance when constructing an intent classifier.

Via

Access Paper or Ask Questions

Dialogue Language Model with Large-Scale Persona Data Engineering

Dec 12, 2024

Mengze Hong, Chen Zhang, Chaotao Chen, Rongzhong Lian, Di Jiang

Abstract:Maintaining persona consistency is paramount in the application of open-domain dialogue systems, as exemplified by models like ChatGPT. Despite significant advancements, the limited scale and diversity of current persona dialogue datasets remain challenges to achieving robust persona-consistent dialogue models. In this study, drawing inspiration from the success of large-scale pre-training, we introduce PPDS, an open-domain persona dialogue system that employs extensive generative pre-training on a persona dialogue dataset to enhance persona consistency. Specifically, we present a persona extraction model designed to autonomously and precisely generate vast persona dialogue datasets. Additionally, we unveil a pioneering persona augmentation technique to address the invalid persona bias inherent in the constructed dataset. Both quantitative and human evaluations consistently highlight the superior response quality and persona consistency of our proposed model, underscoring its effectiveness.

Via

Access Paper or Ask Questions

Deconfounding Time Series Forecasting

Oct 27, 2024

Wentao Gao, Feiyu Yang, Mengze Hong, Xiaojing Du, Zechen Hu, Xiongren Chen, Ziqi Xu

Abstract:Time series forecasting is a critical task in various domains, where accurate predictions can drive informed decision-making. Traditional forecasting methods often rely on current observations of variables to predict future outcomes, typically overlooking the influence of latent confounders, unobserved variables that simultaneously affect both the predictors and the target outcomes. This oversight can introduce bias and degrade the performance of predictive models. In this study, we address this challenge by proposing an enhanced forecasting approach that incorporates representations of latent confounders derived from historical data. By integrating these confounders into the predictive process, our method aims to improve the accuracy and robustness of time series forecasts. The proposed approach is demonstrated through its application to climate science data, showing significant improvements over traditional methods that do not account for confounders.

Via

Access Paper or Ask Questions

Expanding Chatbot Knowledge in Customer Service: Context-Aware Similar Question Generation Using Large Language Models

Oct 16, 2024

Mengze Hong, Yuanfeng Song, Di Jiang, Lu Wang, Zichang Guo, Chen Jason Zhang

Figure 1 for Expanding Chatbot Knowledge in Customer Service: Context-Aware Similar Question Generation Using Large Language Models

Figure 2 for Expanding Chatbot Knowledge in Customer Service: Context-Aware Similar Question Generation Using Large Language Models

Figure 3 for Expanding Chatbot Knowledge in Customer Service: Context-Aware Similar Question Generation Using Large Language Models

Figure 4 for Expanding Chatbot Knowledge in Customer Service: Context-Aware Similar Question Generation Using Large Language Models

Abstract:Reliable responses of service chatbots are often achieved by employing retrieval-based methods that restrict answers to a knowledge base comprising predefined question-answer pairs (QA pairs). To accommodate potential variations in how a customer's query may be expressed, it emerges as the favored solution to augment these QA pairs with similar questions that are possibly diverse while remaining semantic consistency. This augmentation task is known as Similar Question Generation (SQG). Traditional methods that heavily rely on human efforts or rule-based techniques suffer from limited diversity or significant semantic deviation from the source question, only capable of producing a finite number of useful questions. To address these limitations, we propose an SQG approach based on Large Language Models (LLMs), capable of producing a substantial number of diverse questions while maintaining semantic consistency to the source QA pair. This is achieved by leveraging LLMs' natural language understanding capability through fine-tuning with specially designed prompts. The experiments conducted on a real customer-service dataset demonstrate that our method surpasses baseline methods by a significant margin in terms of semantic diversity. Human evaluation further confirms that integrating the answer that reflects the customer's intention is crucial for increasing the number of generated questions that meet business requirements.

Via

Access Paper or Ask Questions

Neural-Bayesian Program Learning for Few-shot Dialogue Intent Parsing

Oct 08, 2024

Mengze Hong, Di Jiang, Yuanfeng Song, Chen Jason Zhang

Figure 1 for Neural-Bayesian Program Learning for Few-shot Dialogue Intent Parsing

Figure 2 for Neural-Bayesian Program Learning for Few-shot Dialogue Intent Parsing

Figure 3 for Neural-Bayesian Program Learning for Few-shot Dialogue Intent Parsing

Abstract:With the growing importance of customer service in contemporary business, recognizing the intents behind service dialogues has become essential for the strategic success of enterprises. However, the nature of dialogue data varies significantly across different scenarios, and implementing an intent parser for a specific domain often involves tedious feature engineering and a heavy workload of data labeling. In this paper, we propose a novel Neural-Bayesian Program Learning model named Dialogue-Intent Parser (DI-Parser), which specializes in intent parsing under data-hungry settings and offers promising performance improvements. DI-Parser effectively utilizes data from multiple sources in a "Learning to Learn" manner and harnesses the "wisdom of the crowd" through few-shot learning capabilities on human-annotated datasets. Experimental results demonstrate that DI-Parser outperforms state-of-the-art deep learning models and offers practical advantages for industrial-scale applications.

Via

Access Paper or Ask Questions

Auto-Demo Prompting: Leveraging Generated Outputs as Demonstrations for Enhanced Batch Prompting

Oct 02, 2024

Longyu Feng, Mengze Hong, Chen Jason Zhang

Figure 1 for Auto-Demo Prompting: Leveraging Generated Outputs as Demonstrations for Enhanced Batch Prompting

Figure 2 for Auto-Demo Prompting: Leveraging Generated Outputs as Demonstrations for Enhanced Batch Prompting

Figure 3 for Auto-Demo Prompting: Leveraging Generated Outputs as Demonstrations for Enhanced Batch Prompting

Figure 4 for Auto-Demo Prompting: Leveraging Generated Outputs as Demonstrations for Enhanced Batch Prompting

Abstract:Batch prompting is a common technique in large language models (LLMs) used to process multiple inputs simultaneously, aiming to improve computational efficiency. However, as batch sizes increase, performance degradation often occurs due to the model's difficulty in handling lengthy context inputs. Existing methods that attempt to mitigate these issues rely solely on batch data arrangement and majority voting rather than improving the design of the batch prompt itself. In this paper, we address these limitations by proposing "Auto-Demo Prompting," a novel approach that leverages the question-output pairs from earlier questions within a batch as demonstrations for subsequent answer inference. We provide a formal theoretical analysis of how Auto-Demo Prompting functions within the autoregressive generation process of LLMs, illustrating how it utilizes prior outputs to optimize the model's internal representations. Our method effectively bridges the gap between batch prompting and few-shot prompting, enhancing performance with only a slight compromise in token usage. Experimental results across five NLP tasks demonstrate its effectiveness in mitigating performance degradation and occasionally outperforming single prompts. Furthermore, it opens new avenues for applying few-shot learning techniques, such as demonstration selection, within batch prompting, making it a robust solution for real-world applications.

Via

Access Paper or Ask Questions

InfantCryNet: A Data-driven Framework for Intelligent Analysis of Infant Cries

Sep 29, 2024

Mengze Hong, Chen Jason Zhang, Lingxiao Yang, Yuanfeng Song, Di Jiang

Figure 1 for InfantCryNet: A Data-driven Framework for Intelligent Analysis of Infant Cries

Figure 2 for InfantCryNet: A Data-driven Framework for Intelligent Analysis of Infant Cries

Figure 3 for InfantCryNet: A Data-driven Framework for Intelligent Analysis of Infant Cries

Figure 4 for InfantCryNet: A Data-driven Framework for Intelligent Analysis of Infant Cries

Abstract:Understanding the meaning of infant cries is a significant challenge for young parents in caring for their newborns. The presence of background noise and the lack of labeled data present practical challenges in developing systems that can detect crying and analyze its underlying reasons. In this paper, we present a novel data-driven framework, "InfantCryNet," for accomplishing these tasks. To address the issue of data scarcity, we employ pre-trained audio models to incorporate prior knowledge into our model. We propose the use of statistical pooling and multi-head attention pooling techniques to extract features more effectively. Additionally, knowledge distillation and model quantization are applied to enhance model efficiency and reduce the model size, better supporting industrial deployment in mobile devices. Experiments on real-life datasets demonstrate the superior performance of the proposed framework, outperforming state-of-the-art baselines by 4.4% in classification accuracy. The model compression effectively reduces the model size by 7% without compromising performance and by up to 28% with only an 8% decrease in accuracy, offering practical insights for model selection and system design.

Via

Access Paper or Ask Questions

Pairwise Judgment Formulation for Semantic Embedding Model in Web Search

Aug 08, 2024

Mengze Hong, Chen Jason Zhang

Figure 1 for Pairwise Judgment Formulation for Semantic Embedding Model in Web Search

Figure 2 for Pairwise Judgment Formulation for Semantic Embedding Model in Web Search

Figure 3 for Pairwise Judgment Formulation for Semantic Embedding Model in Web Search

Figure 4 for Pairwise Judgment Formulation for Semantic Embedding Model in Web Search

Abstract:Semantic Embedding Model (SEM), a neural network-based Siamese architecture, is gaining momentum in information retrieval and natural language processing. In order to train SEM in a supervised fashion for Web search, the search engine query log is typically utilized to automatically formulate pairwise judgments as training data. Despite the growing application of semantic embeddings in the search engine industry, little work has been done on formulating effective pairwise judgments for training SEM. In this paper, we make the first in-depth investigation of a wide range of strategies for generating pairwise judgments for SEM. An interesting (perhaps surprising) discovery reveals that the conventional pairwise judgment formulation strategy wildly used in the field of pairwise Learning-to-Rank (LTR) is not necessarily effective for training SEM. Through a large-scale empirical study based on query logs and click-through activities from a major commercial search engine, we demonstrate the effective strategies for SEM and highlight the advantages of a hybrid heuristic (i.e., Clicked > Non-Clicked) in comparison to the atomic heuristics (e.g., Clicked > Skipped) in LTR. We conclude with best practices for training SEM and offer promising insights for future research.

Via

Access Paper or Ask Questions