Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tianwei Chen

SAGEval: The frontiers of Satisfactory Agent based NLG Evaluation for reference-free open-ended text

Nov 25, 2024

Reshmi Ghosh, Tianyi Yao, Lizzy Chen, Sadid Hasan, Tianwei Chen, Dario Bernal, Huitian Jiao, H M Sajjad Hossain

Abstract:Large Language Model (LLM) integrations into applications like Microsoft365 suite and Google Workspace for creating/processing documents, emails, presentations, etc. has led to considerable enhancements in productivity and time savings. But as these integrations become more more complex, it is paramount to ensure that the quality of output from the LLM-integrated applications are relevant and appropriate for use. Identifying the need to develop robust evaluation approaches for natural language generation, wherein references/ground labels doesn't exist or isn't amply available, this paper introduces a novel framework called "SAGEval" which utilizes a critiquing Agent to provide feedback on scores generated by LLM evaluators. We show that the critiquing Agent is able to rectify scores from LLM evaluators, in absence of references/ground-truth labels, thereby reducing the need for labeled data even for complex NLG evaluation scenarios, like the generation of JSON-structured forms/surveys with responses in different styles like multiple choice, likert ratings, single choice questions, etc.

Via

Access Paper or Ask Questions

Would Deep Generative Models Amplify Bias in Future Models?

Apr 04, 2024

Tianwei Chen, Yusuke Hirota, Mayu Otani, Noa Garcia, Yuta Nakashima

Abstract:We investigate the impact of deep generative models on potential social biases in upcoming computer vision models. As the internet witnesses an increasing influx of AI-generated images, concerns arise regarding inherent biases that may accompany them, potentially leading to the dissemination of harmful content. This paper explores whether a detrimental feedback loop, resulting in bias amplification, would occur if generated images were used as the training data for future models. We conduct simulations by progressively substituting original images in COCO and CC3M datasets with images generated through Stable Diffusion. The modified datasets are used to train OpenCLIP and image captioning models, which we evaluate in terms of quality and bias. Contrary to expectations, our findings indicate that introducing generated images during training does not uniformly amplify bias. Instead, instances of bias mitigation across specific tasks are observed. We further explore the factors that may influence these phenomena, such as artifacts in image generation (e.g., blurry faces) or pre-existing biases in the original datasets.

* This paper has been accepted to CVPR 2024

Via

Access Paper or Ask Questions

Learning More May Not Be Better: Knowledge Transferability in Vision and Language Tasks

Aug 23, 2022

Tianwei Chen, Noa Garcia, Mayu Otani, Chenhui Chu, Yuta Nakashima, Hajime Nagahara

Figure 1 for Learning More May Not Be Better: Knowledge Transferability in Vision and Language Tasks

Figure 2 for Learning More May Not Be Better: Knowledge Transferability in Vision and Language Tasks

Figure 3 for Learning More May Not Be Better: Knowledge Transferability in Vision and Language Tasks

Figure 4 for Learning More May Not Be Better: Knowledge Transferability in Vision and Language Tasks

Abstract:Is more data always better to train vision-and-language models? We study knowledge transferability in multi-modal tasks. The current tendency in machine learning is to assume that by joining multiple datasets from different tasks their overall performance will improve. However, we show that not all the knowledge transfers well or has a positive impact on related tasks, even when they share a common goal. We conduct an exhaustive analysis based on hundreds of cross-experiments on 12 vision-and-language tasks categorized in 4 groups. Whereas tasks in the same group are prone to improve each other, results show that this is not always the case. Other factors such as dataset size or pre-training stage have also a great impact on how well the knowledge is transferred.

Via

Access Paper or Ask Questions