Anbang Xu

Adaptive Data Flywheel: Applying MAPE Control Loops to AI Agent Improvement

Oct 30, 2025

Aaditya Shukla, Sidney Knowles, Meenakshi Madugula, Dave Farris, Ryan Angilly, Santiago Pombo, Anbang Xu, Lu An, Abhinav Balasubramanian, Tan Yu(+2 more)

Abstract: Enterprise AI agents must continuously adapt to maintain accuracy, reduce latency, and remain aligned with user needs. We present a practical implementation of a data flywheel in NVInfo AI, NVIDIA's Mixture-of-Experts (MoE) Knowledge Assistant serving over 30,000 employees. By operationalizing a MAPE-driven data flywheel, we built a closed-loop system that systematically addresses failures in retrieval-augmented generation (RAG) pipelines and enables continuous learning. Over a 3-month post-deployment period, we monitored feedback and collected 495 negative samples. Analysis revealed two major failure modes: routing errors (5.25%) and query rephrasal errors (3.2%). Using NVIDIA NeMo microservices, we implemented targeted improvements through fine-tuning. For routing, we replaced a Llama 3.1 70B model with a fine-tuned 8B variant, achieving 96% accuracy, a 10x reduction in model size, and 70% latency improvement. For query rephrasal, fine-tuning yielded a 3.7% gain in accuracy and a 40% latency reduction. Our approach demonstrates how human-in-the-loop (HITL) feedback, when structured within a data flywheel, transforms enterprise AI agents into self-improving systems. Key learnings include approaches to ensure agent robustness despite limited user feedback, navigating privacy constraints, and executing staged rollouts in production. This work offers a repeatable blueprint for building robust, adaptive enterprise AI agents capable of learning from real-world usage at scale.

* 20 pages, 5 figures, 5 tables. Presents MAPE-K control loop application to enterprise AI agent improvement with experimental validation on NVIDIA's NVInfo AI system
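
The closed loop described in this abstract can be illustrated with a minimal sketch of one MAPE (Monitor, Analyze, Plan, Execute) cycle. This is not the paper's implementation: the function names, the `rating` and `failure_mode` fields, and the `min_count` threshold are all hypothetical, and the execute step only records an action where a real system would fine-tune and roll out a model.

```python
from collections import Counter


def monitor(feedback_log):
    """Monitor: keep only the negatively rated interactions."""
    return [f for f in feedback_log if f["rating"] == "negative"]


def analyze(negatives):
    """Analyze: count failures per (hypothetical) failure-mode label."""
    return Counter(f["failure_mode"] for f in negatives)


def plan(mode_counts, min_count=2):
    """Plan: select failure modes frequent enough to justify a targeted fix."""
    return [mode for mode, n in mode_counts.most_common() if n >= min_count]


def execute(targets):
    """Execute: placeholder for fine-tuning and staged rollout (assumption)."""
    return [f"fine-tune component for: {t}" for t in targets]


def mape_cycle(feedback_log):
    """Run one Monitor -> Analyze -> Plan -> Execute pass over a feedback log."""
    return execute(plan(analyze(monitor(feedback_log))))


if __name__ == "__main__":
    log = [
        {"rating": "negative", "failure_mode": "routing"},
        {"rating": "positive", "failure_mode": None},
        {"rating": "negative", "failure_mode": "routing"},
        {"rating": "negative", "failure_mode": "rephrasal"},
    ]
    print(mape_cycle(log))
```

Each stage is a plain function so that any one of them can be swapped out without touching the others.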

In Defense of RAG in the Era of Long-Context Language Models

Sep 03, 2024

Tan Yu, Anbang Xu, Rama Akkiraju

Abstract: Retrieval-augmented generation (RAG) overcame the limited context windows of early-generation LLMs and has been a reliable solution for context-based answer generation. Recently, the emergence of long-context LLMs has allowed models to incorporate much longer text sequences, making RAG less attractive. Recent studies show that long-context LLMs significantly outperform RAG in long-context applications. Unlike existing works favoring long-context LLMs over RAG, we argue that an extremely long context in LLMs suffers from a diminished focus on relevant information and leads to potential degradation in answer quality. This paper revisits RAG in long-context answer generation. We propose an order-preserve retrieval-augmented generation (OP-RAG) mechanism, which significantly improves the performance of RAG for long-context question-answer applications. With OP-RAG, as the number of retrieved chunks increases, the answer quality initially rises and then declines, forming an inverted U-shaped curve. There exist sweet spots where OP-RAG achieves higher answer quality with far fewer tokens than a long-context LLM taking the whole context as input. Extensive experiments on public benchmarks demonstrate the superiority of OP-RAG.
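
The abstract does not spell out the mechanism, so the following is only a sketch under one assumption: that "order-preserve" means the top-k retrieved chunks are passed to the model in their original document order rather than in descending relevance order. The names `bow`, `cosine`, and `op_rag_select` are hypothetical, and the bag-of-words cosine score is a stand-in for a real embedding model.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector; a toy stand-in for an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def op_rag_select(chunks, query, k):
    """Pick the k most relevant chunks, then restore document order."""
    q = bow(query)
    scored = [(cosine(bow(c), q), i) for i, c in enumerate(chunks)]
    # Rank by relevance (ties broken by position) and keep the top k.
    top = sorted(scored, key=lambda s: (-s[0], s[1]))[:k]
    # Assumed order-preserving step: sort the kept indices by position.
    keep = sorted(i for _, i in top)
    return [chunks[i] for i in keep]


if __name__ == "__main__":
    chunks = [
        "the cat sat",
        "dogs bark loudly",
        "a cat and a dog",
        "weather is nice",
    ]
    print(op_rag_select(chunks, "cat dog", k=2))
```

In this toy example the third chunk scores highest, yet the first chunk still precedes it in the output because selection and ordering are decoupled.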

FACTS About Building Retrieval Augmented Generation-based Chatbots

Jul 10, 2024

Rama Akkiraju, Anbang Xu, Deepak Bora, Tan Yu, Lu An, Vishal Seth, Aaditya Shukla, Pritam Gundecha, Hridhay Mehta, Ashwin Jha(+28 more)

Abstract: Enterprise chatbots, powered by generative AI, are emerging as key applications to enhance employee productivity. Retrieval Augmented Generation (RAG), Large Language Models (LLMs), and orchestration frameworks like LangChain and LlamaIndex are crucial for building these chatbots. However, creating effective enterprise chatbots is challenging and requires meticulous RAG pipeline engineering. This includes fine-tuning embeddings and LLMs, extracting documents from vector databases, rephrasing queries, reranking results, designing prompts, honoring document access controls, providing concise responses, including references, safeguarding personal information, and building orchestration agents. We present a framework for building RAG-based chatbots based on our experience with three NVIDIA chatbots: for IT/HR benefits, financial earnings, and general content. Our contributions are three-fold: introducing the FACTS framework (Freshness, Architectures, Cost, Testing, Security), presenting fifteen RAG pipeline control points, and providing empirical results on accuracy-latency tradeoffs between large and small LLMs. To the best of our knowledge, this is the first paper of its kind that provides a holistic view of the factors as well as solutions for building secure enterprise-grade chatbots.

* 8 pages, 6 figures, 2 tables, Preprint submission to ACM CIKM 2024

An Empirical Study of Factors Affecting Language-Independent Models

Dec 30, 2019

Xiaotong Liu, Yingbei Tong, Anbang Xu, Rama Akkiraju

Abstract: Scaling existing applications and solutions to multiple human languages has traditionally proven to be difficult, mainly due to the language-dependent nature of preprocessing and feature engineering techniques employed in traditional approaches. In this work, we empirically investigate the factors affecting language-independent models built with multilingual representations, including task type, language set and data resource. On two of the most representative NLP tasks, sentence classification and sequence labeling, we show that language-independent models can be comparable to or even outperform models trained using monolingual data, and they are generally more effective on sentence classification. We experiment with language-independent models across many different languages and show that they are more suitable for typologically similar languages. We also explore the effects of different data sizes when training and testing language-independent models, and demonstrate that they are not only suitable for high-resource languages, but also very effective in low-resource languages.

Characterizing machine learning process: A maturity framework

Nov 12, 2018

Rama Akkiraju, Vibha Sinha, Anbang Xu, Jalal Mahmud, Pritam Gundecha, Zhe Liu, Xiaotong Liu, John Schumacher

Abstract: Academic literature on machine learning modeling fails to address how to make machine learning models work for enterprises. For example, existing machine learning processes cannot address how to define business use cases for an AI application, how to convert business requirements from offering managers into data requirements for data scientists, how to continuously improve AI applications in terms of accuracy and fairness, or how to customize general-purpose machine learning models with industry-, domain-, and use-case-specific data to make them more accurate for specific situations. Making AI work for enterprises requires special considerations, tools, methods and processes. In this paper we present a maturity framework for machine learning model lifecycle management for enterprises. Our framework is a re-interpretation of the software Capability Maturity Model (CMM) for the machine learning model development process. We present a set of best practices from our personal experience of building large-scale real-world machine learning models to help organizations achieve higher levels of maturity independent of their starting point.

* 10 pages, 1 figure, 1 table

Challenge AI Mind: A Crowd System for Proactive AI Testing

Oct 21, 2018

Siwei Fu, Anbang Xu, Xiaotong Liu, Huimin Zhou, Rama Akkiraju

Abstract: Artificial Intelligence (AI) has burrowed into many aspects of our lives; however, without appropriate testing, deployed AI systems are often criticized for failing in critical and embarrassing cases. Existing testing approaches mainly depend on fixed and pre-defined datasets, providing limited testing coverage. In this paper, we propose the concept of proactive testing to dynamically generate testing data and evaluate the performance of AI systems. We further introduce Challenge.AI, a new crowd system that features the integration of crowdsourcing and machine learning techniques in the process of error generation, error validation, error categorization, and error analysis. We present experiences and insights into a participatory design with AI developers. The evaluation shows that the crowd workflow is more effective with the help of machine learning techniques. AI developers found that our system can help them discover unknown errors made by the AI models, and engage in the process of proactive testing.

* a 10-page full paper

25 Tweets to Know You: A New Model to Predict Personality with Social Media

Apr 18, 2017

Pierre-Hadrien Arnoux, Anbang Xu, Neil Boyette, Jalal Mahmud, Rama Akkiraju, Vibha Sinha

Abstract: Predicting personality is essential for social applications supporting human-centered activities, yet prior modeling methods based on users' written text require too much input data to be realistically used in the context of social media. In this work, we aim to drastically reduce the data requirement for personality modeling and develop a model that is applicable to most users on Twitter. Our model integrates Word Embedding features with Gaussian Processes regression. Based on the evaluation of over 1.3K users on Twitter, we find that our model achieves comparable or better accuracy than state-of-the-art techniques with 8 times less data.

* Accepted as a short paper at ICWSM 2017. Please cite the ICWSM version and not the ArXiv version

Fostering User Engagement: Rhetorical Devices for Applause Generation Learnt from TED Talks

Apr 17, 2017

Zhe Liu, Anbang Xu, Mengdi Zhang, Jalal Mahmud, Vibha Sinha

Abstract: One problem that every presenter faces when delivering a public discourse is how to hold the listeners' attention or keep them involved. Therefore, many studies in conversation analysis work on this issue and qualitatively suggest constructions that can effectively lead to audience applause. To investigate these proposals quantitatively, in this study we analyze the transcripts of 2,135 TED Talks, with a particular focus on the rhetorical devices that are used by the presenters for applause elicitation. Through regression analysis, we identify and interpret 24 rhetorical devices as triggers of audience applause. We further build models that can recognize applause-evoking sentences and conclude this work with potential implications.
