Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mingyang Zhou

Avoid Recommending Out-of-Domain Items: Constrained Generative Recommendation with LLMs

May 06, 2025

Hao Liao, Wensheng Lu, Jianxun Lian, Mingqi Wu, Shuo Wang, Yong Zhang, Yitian Huang, Mingyang Zhou, Xing Xie

Abstract:Large Language Models (LLMs) have shown promise for generative recommender systems due to their transformative capabilities in user interaction. However, ensuring they do not recommend out-of-domain (OOD) items remains a challenge. We study two distinct methods to address this issue: RecLM-ret, a retrieval-based method, and RecLM-cgen, a constrained generation method. Both methods integrate seamlessly with existing LLMs to ensure in-domain recommendations. Comprehensive experiments on three recommendation datasets demonstrate that RecLM-cgen consistently outperforms RecLM-ret and existing LLM-based recommender models in accuracy while eliminating OOD recommendations, making it the preferred method for adoption. Additionally, RecLM-cgen maintains strong generalist capabilities and is a lightweight plug-and-play module for easy integration into LLMs, offering valuable practical benefits for the community. Source code is available at https://github.com/microsoft/RecAI

* 13 pages

Via

Access Paper or Ask Questions

Benchmarking Graph Learning for Drug-Drug Interaction Prediction

Oct 24, 2024

Zhenqian Shen, Mingyang Zhou, Yongqi Zhang, Quanming Yao

Abstract:Predicting drug-drug interaction (DDI) plays an important role in pharmacology and healthcare for identifying potential adverse interactions and beneficial combination therapies between drug pairs. Recently, a flurry of graph learning methods have been introduced to predict drug-drug interactions. However, evaluating existing methods has several limitations, such as the absence of a unified comparison framework for DDI prediction methods, lack of assessments in meaningful real-world scenarios, and insufficient exploration of side information usage. In order to address these unresolved limitations in the literature, we propose a DDI prediction benchmark on graph learning. We first conduct unified evaluation comparison among existing methods. To meet realistic scenarios, we further evaluate the performance of different methods in settings with new drugs involved and examine the performance across different DDI types. Component analysis is conducted on the biomedical network to better utilize side information. Through this work, we hope to provide more insights for the problem of DDI prediction. Our implementation and data is open-sourced at https://anonymous.4open.science/r/DDI-Benchmark-ACD9/.

Via

Access Paper or Ask Questions

Exploring Loss Landscapes through the Lens of Spin Glass Theory

Jul 30, 2024

Hao Liao, Wei Zhang, Zhanyi Huang, Zexiao Long, Mingyang Zhou, Xiaoqun Wu, Rui Mao, Chi Ho Yeung

Figure 1 for Exploring Loss Landscapes through the Lens of Spin Glass Theory

Figure 2 for Exploring Loss Landscapes through the Lens of Spin Glass Theory

Figure 3 for Exploring Loss Landscapes through the Lens of Spin Glass Theory

Figure 4 for Exploring Loss Landscapes through the Lens of Spin Glass Theory

Abstract:In the past decade, significant strides in deep learning have led to numerous groundbreaking applications. Despite these advancements, the understanding of the high generalizability of deep learning, especially in such an over-parametrized space, remains limited. Successful applications are often considered as empirical rather than scientific achievements. For instance, deep neural networks' (DNNs) internal representations, decision-making mechanism, absence of overfitting in an over-parametrized space, high generalizability, etc., remain less understood. This paper delves into the loss landscape of DNNs through the lens of spin glass in statistical physics, i.e. a system characterized by a complex energy landscape with numerous metastable states, to better understand how DNNs work. We investigated a single hidden layer Rectified Linear Unit (ReLU) neural network model, and introduced several protocols to examine the analogy between DNNs (trained with datasets including MNIST and CIFAR10) and spin glass. Specifically, we used (1) random walk in the parameter space of DNNs to unravel the structures in their loss landscape; (2) a permutation-interpolation protocol to study the connection between copies of identical regions in the loss landscape due to the permutation symmetry in the hidden layers; (3) hierarchical clustering to reveal the hierarchy among trained solutions of DNNs, reminiscent of the so-called Replica Symmetry Breaking (RSB) phenomenon (i.e. the Parisi solution) in analogy to spin glass; (4) finally, we examine the relationship between the degree of the ruggedness of the loss landscape of the DNN and its generalizability, showing an improvement of flattened minima.

* 21 pages, 10 figures

Via

Access Paper or Ask Questions

From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

Mar 25, 2024

Kung-Hsiang Huang, Hou Pong Chan, Yi R. Fung, Haoyi Qiu, Mingyang Zhou, Shafiq Joty, Shih-Fu Chang, Heng Ji

Abstract:Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding in informed decision-making. Automatic chart understanding has witnessed significant advancements with the rise of large foundation models in recent years. Foundation models, such as large language models, have revolutionized various natural language processing tasks and are increasingly being applied to chart understanding tasks. This survey paper provides a comprehensive overview of the recent developments, challenges, and future directions in chart understanding within the context of these foundation models. We review fundamental building blocks crucial for studying chart understanding tasks. Additionally, we explore various tasks and their evaluation metrics and sources of both charts and textual inputs. Various modeling strategies are then examined, encompassing both classification-based and generation-based approaches, along with tool augmentation techniques that enhance chart understanding performance. Furthermore, we discuss the state-of-the-art performance of each task and discuss how we can improve the performance. Challenges and future directions are addressed, highlighting the importance of several topics, such as domain-specific charts, lack of efforts in developing evaluation metrics, and agent-oriented settings. This survey paper serves as a comprehensive resource for researchers and practitioners in the fields of natural language processing, computer vision, and data analysis, providing valuable insights and directions for future research in chart understanding leveraging large foundation models. The studies mentioned in this paper, along with emerging new research, will be continually updated at: https://github.com/khuangaf/Awesome-Chart-Understanding.

Via

Access Paper or Ask Questions

Aligning Large Language Models for Controllable Recommendations

Mar 08, 2024

Wensheng Lu, Jianxun Lian, Wei Zhang, Guanghua Li, Mingyang Zhou, Hao Liao, Xing Xie

Abstract:Inspired by the exceptional general intelligence of Large Language Models (LLMs), researchers have begun to explore their application in pioneering the next generation of recommender systems - systems that are conversational, explainable, and controllable. However, existing literature primarily concentrates on integrating domain-specific knowledge into LLMs to enhance accuracy, often neglecting the ability to follow instructions. To address this gap, we initially introduce a collection of supervised learning tasks, augmented with labels derived from a conventional recommender model, aimed at explicitly improving LLMs' proficiency in adhering to recommendation-specific instructions. Subsequently, we develop a reinforcement learning-based alignment procedure to further strengthen LLMs' aptitude in responding to users' intentions and mitigating formatting errors. Through extensive experiments on two real-world datasets, our method markedly advances the capability of LLMs to comply with instructions within recommender systems, while sustaining a high level of accuracy performance.

* 13 pages

Via

Access Paper or Ask Questions

Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning

Dec 15, 2023

Kung-Hsiang Huang, Mingyang Zhou, Hou Pong Chan, Yi R. Fung, Zhenhailong Wang, Lingyu Zhang, Shih-Fu Chang, Heng Ji

Figure 1 for Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning

Figure 2 for Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning

Figure 3 for Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning

Figure 4 for Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning

Abstract:Recent advancements in large vision-language models (LVLMs) have led to significant progress in generating natural language descriptions for visual content and thus enhancing various applications. One issue with these powerful models is that they sometimes produce texts that are factually inconsistent with the visual input. While there has been some effort to mitigate such inconsistencies in natural image captioning, the factuality of generated captions for structured document images, such as charts, has not received as much scrutiny, posing a potential threat to information reliability in critical applications. This work delves into the factuality aspect by introducing a comprehensive typology of factual errors in generated chart captions. A large-scale human annotation effort provides insight into the error patterns and frequencies in captions crafted by various chart captioning models, ultimately forming the foundation of a novel dataset, CHOCOLATE. Our analysis reveals that even state-of-the-art models, including GPT-4V, frequently produce captions laced with factual inaccuracies. In response to this challenge, we establish the new task of Chart Caption Factual Error Correction and introduce CHARTVE, a model for visual entailment that outperforms proprietary and open-source LVLMs in evaluating factual consistency. Furthermore, we propose C2TFEC, an interpretable two-stage framework that excels at correcting factual errors. This work inaugurates a new domain in factual error correction for chart captions, presenting a novel evaluation mechanism, and demonstrating an effective approach to ensuring the factuality of generated chart captions.

Via

Access Paper or Ask Questions

PhyloGFN: Phylogenetic inference with generative flow networks

Oct 12, 2023

Mingyang Zhou, Zichao Yan, Elliot Layne, Nikolay Malkin, Dinghuai Zhang, Moksh Jain, Mathieu Blanchette, Yoshua Bengio

Abstract:Phylogenetics is a branch of computational biology that studies the evolutionary relationships among biological entities. Its long history and numerous applications notwithstanding, inference of phylogenetic trees from sequence data remains challenging: the high complexity of tree space poses a significant obstacle for the current combinatorial and probabilistic techniques. In this paper, we adopt the framework of generative flow networks (GFlowNets) to tackle two core problems in phylogenetics: parsimony-based and Bayesian phylogenetic inference. Because GFlowNets are well-suited for sampling complex combinatorial structures, they are a natural choice for exploring and sampling from the multimodal posterior distribution over tree topologies and evolutionary distances. We demonstrate that our amortized posterior sampler, PhyloGFN, produces diverse and high-quality evolutionary hypotheses on real benchmark datasets. PhyloGFN is competitive with prior works in marginal likelihood estimation and achieves a closer fit to the target distribution than state-of-the-art variational inference methods.

Via

Access Paper or Ask Questions

Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs

May 29, 2023

Mingyang Zhou, Yi R. Fung, Long Chen, Christopher Thomas, Heng Ji, Shih-Fu Chang

Abstract:Building cross-model intelligence that can understand charts and communicate the salient information hidden behind them is an appealing challenge in the vision and language(V+L) community. The capability to uncover the underlined table data of chart figures is a critical key to automatic chart understanding. We introduce ChartT5, a V+L model that learns how to interpret table information from chart images via cross-modal pre-training on plot table pairs. Specifically, we propose two novel pre-training objectives: Masked Header Prediction (MHP) and Masked Value Prediction (MVP) to facilitate the model with different skills to interpret the table information. We have conducted extensive experiments on chart question answering and chart summarization to verify the effectiveness of the proposed pre-training strategies. In particular, on the ChartQA benchmark, our ChartT5 outperforms the state-of-the-art non-pretraining methods by over 8% performance gains.

* Accepted by Findings of ACL 2023

Via

Access Paper or Ask Questions

Focus! Relevant and Sufficient Context Selection for News Image Captioning

Dec 01, 2022

Mingyang Zhou, Grace Luo, Anna Rohrbach, Zhou Yu

Figure 1 for Focus! Relevant and Sufficient Context Selection for News Image Captioning

Figure 2 for Focus! Relevant and Sufficient Context Selection for News Image Captioning

Figure 3 for Focus! Relevant and Sufficient Context Selection for News Image Captioning

Figure 4 for Focus! Relevant and Sufficient Context Selection for News Image Captioning

Abstract:News Image Captioning requires describing an image by leveraging additional context from a news article. Previous works only coarsely leverage the article to extract the necessary context, which makes it challenging for models to identify relevant events and named entities. In our paper, we first demonstrate that by combining more fine-grained context that captures the key named entities (obtained via an oracle) and the global context that summarizes the news, we can dramatically improve the model's ability to generate accurate news captions. This begs the question, how to automatically extract such key entities from an image? We propose to use the pre-trained vision and language retrieval model CLIP to localize the visually grounded entities in the news article and then capture the non-visual entities via an open relation extraction model. Our experiments demonstrate that by simply selecting a better context from the article, we can significantly improve the performance of existing models and achieve new state-of-the-art performance on multiple benchmarks.

* Findings of EMNLP 2022

Via

Access Paper or Ask Questions

Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment

Mar 01, 2022

Mingyang Zhou, Licheng Yu, Amanpreet Singh, Mengjiao Wang, Zhou Yu, Ning Zhang

Figure 1 for Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment

Figure 2 for Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment

Figure 3 for Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment

Figure 4 for Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment

Abstract:Vision-and-Language (V+L) pre-training models have achieved tremendous success in recent years on various multi-modal benchmarks. However, the majority of existing models require pre-training on a large set of parallel image-text data, which is costly to collect, compared to image-only or text-only data. In this paper, we explore unsupervised Vision-and-Language pre-training (UVLP) to learn the cross-modal representation from non-parallel image and text datasets. We found two key factors that lead to good unsupervised V+L pre-training without parallel data: (i) joint image-and-text input (ii) overall image-text alignment (even for non-parallel data). Accordingly, we propose a novel unsupervised V+L pre-training curriculum for non-parallel texts and images. We first construct a weakly aligned image-text corpus via a retrieval-based approach, then apply a set of multi-granular alignment pre-training tasks, including region-to-tag, region-to-phrase, and image-to-sentence alignment, to bridge the gap between the two modalities. A comprehensive ablation study shows each granularity is helpful to learn a stronger pre-trained model. We adapt our pre-trained model to a set of V+L downstream tasks, including VQA, NLVR2, Visual Entailment, and RefCOCO+. Our model achieves the state-of-art performance in all these tasks under the unsupervised setting.

* First two authors contributed equally. 13 pages, 11 figures

Via

Access Paper or Ask Questions