Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yuhan Ma

Who is Undercover? Guiding LLMs to Explore Multi-Perspective Team Tactic in the Game

Oct 20, 2024

Ruiqi Dong, Zhixuan Liao, Guangwei Lai, Yuhan Ma, Danni Ma, Chenyou Fan

Figure 1 for Who is Undercover? Guiding LLMs to Explore Multi-Perspective Team Tactic in the Game

Figure 2 for Who is Undercover? Guiding LLMs to Explore Multi-Perspective Team Tactic in the Game

Figure 3 for Who is Undercover? Guiding LLMs to Explore Multi-Perspective Team Tactic in the Game

Figure 4 for Who is Undercover? Guiding LLMs to Explore Multi-Perspective Team Tactic in the Game

Abstract:Large Language Models (LLMs) are pivotal AI agents in complex tasks but still face challenges in open decision-making problems within complex scenarios. To address this, we use the language logic game ``Who is Undercover?'' (WIU) as an experimental platform to propose the Multi-Perspective Team Tactic (MPTT) framework. MPTT aims to cultivate LLMs' human-like language expression logic, multi-dimensional thinking, and self-perception in complex scenarios. By alternating speaking and voting sessions, integrating techniques like self-perspective, identity-determination, self-reflection, self-summary and multi-round find-teammates, LLM agents make rational decisions through strategic concealment and communication, fostering human-like trust. Preliminary results show that MPTT, combined with WIU, leverages LLMs' cognitive capabilities to create a decision-making framework that can simulate real society. This framework aids minority groups in communication and expression, promoting fairness and diversity in decision-making. Additionally, our Human-in-the-loop experiments demonstrate that LLMs can learn and align with human behaviors through interactive, indicating their potential for active participation in societal decision-making.

Via

Access Paper or Ask Questions

Enhancing Deep Learning with Optimized Gradient Descent: Bridging Numerical Methods and Neural Network Training

Sep 07, 2024

Yuhan Ma, Dan Sun, Erdi Gao, Ningjing Sang, Iris Li, Guanming Huang

Figure 1 for Enhancing Deep Learning with Optimized Gradient Descent: Bridging Numerical Methods and Neural Network Training

Figure 2 for Enhancing Deep Learning with Optimized Gradient Descent: Bridging Numerical Methods and Neural Network Training

Figure 3 for Enhancing Deep Learning with Optimized Gradient Descent: Bridging Numerical Methods and Neural Network Training

Figure 4 for Enhancing Deep Learning with Optimized Gradient Descent: Bridging Numerical Methods and Neural Network Training

Abstract:Optimization theory serves as a pivotal scientific instrument for achieving optimal system performance, with its origins in economic applications to identify the best investment strategies for maximizing benefits. Over the centuries, from the geometric inquiries of ancient Greece to the calculus contributions by Newton and Leibniz, optimization theory has significantly advanced. The persistent work of scientists like Lagrange, Cauchy, and von Neumann has fortified its progress. The modern era has seen an unprecedented expansion of optimization theory applications, particularly with the growth of computer science, enabling more sophisticated computational practices and widespread utilization across engineering, decision analysis, and operations research. This paper delves into the profound relationship between optimization theory and deep learning, highlighting the omnipresence of optimization problems in the latter. We explore the gradient descent algorithm and its variants, which are the cornerstone of optimizing neural networks. The chapter introduces an enhancement to the SGD optimizer, drawing inspiration from numerical optimization methods, aiming to enhance interpretability and accuracy. Our experiments on diverse deep learning tasks substantiate the improved algorithm's efficacy. The paper concludes by emphasizing the continuous development of optimization theory and its expanding role in solving intricate problems, enhancing computational capabilities, and informing better policy decisions.

Via

Access Paper or Ask Questions

Research on Optimization of Natural Language Processing Model Based on Multimodal Deep Learning

Jun 13, 2024

Dan Sun, Yaxin Liang, Yining Yang, Yuhan Ma, Qishi Zhan, Erdi Gao

Figure 1 for Research on Optimization of Natural Language Processing Model Based on Multimodal Deep Learning

Figure 2 for Research on Optimization of Natural Language Processing Model Based on Multimodal Deep Learning

Figure 3 for Research on Optimization of Natural Language Processing Model Based on Multimodal Deep Learning

Figure 4 for Research on Optimization of Natural Language Processing Model Based on Multimodal Deep Learning

Abstract:This project intends to study the image representation based on attention mechanism and multimodal data. By adding multiple pattern layers to the attribute model, the semantic and hidden layers of image content are integrated. The word vector is quantified by the Word2Vec method and then evaluated by a word embedding convolutional neural network. The published experimental results of the two groups were tested. The experimental results show that this method can convert discrete features into continuous characters, thus reducing the complexity of feature preprocessing. Word2Vec and natural language processing technology are integrated to achieve the goal of direct evaluation of missing image features. The robustness of the image feature evaluation model is improved by using the excellent feature analysis characteristics of a convolutional neural network. This project intends to improve the existing image feature identification methods and eliminate the subjective influence in the evaluation process. The findings from the simulation indicate that the novel approach has developed is viable, effectively augmenting the features within the produced representations.

Via

Access Paper or Ask Questions

Robust Dual-Modal Speech Keyword Spotting for XR Headsets

Jan 26, 2024

Zhuojiang Cai, Yuhan Ma, Feng Lu

Abstract:While speech interaction finds widespread utility within the Extended Reality (XR) domain, conventional vocal speech keyword spotting systems continue to grapple with formidable challenges, including suboptimal performance in noisy environments, impracticality in situations requiring silence, and susceptibility to inadvertent activations when others speak nearby. These challenges, however, can potentially be surmounted through the cost-effective fusion of voice and lip movement information. Consequently, we propose a novel vocal-echoic dual-modal keyword spotting system designed for XR headsets. We devise two different modal fusion approches and conduct experiments to test the system's performance across diverse scenarios. The results show that our dual-modal system not only consistently outperforms its single-modal counterparts, demonstrating higher precision in both typical and noisy environments, but also excels in accurately identifying silent utterances. Furthermore, we have successfully applied the system in real-time demonstrations, achieving promising results. The code is available at https://github.com/caizhuojiang/VE-KWS.

* Accepted to IEEE VR 2024

Via

Access Paper or Ask Questions

Sci-CoT: Leveraging Large Language Models for Enhanced Knowledge Distillation in Small Models for Scientific QA

Aug 09, 2023

Yuhan Ma, Haiqi Jiang, Chenyou Fan

Abstract:Large Language Models (LLMs) have shown outstanding performance across wide range of downstream tasks. This competency is attributed to their substantial parameter size and pre-training on extensive corpus. Moreover, LLMs have exhibited enhanced reasoning capabilities in tackling complex reasoning tasks, owing to the utilization of a method named ``Chain-of-Thought (CoT) prompting''. This method is designed to generate intermediate reasoning steps that guide the inference of the final answer. However, it is essential to highlight that these advanced reasoning abilities appear to emerge in models with a minimum of 10 billion parameters, thereby limiting its efficacy in situations where computational resources are constrained. In this paper, we investigate the possibility of transferring the reasoning capabilities of LLMs to smaller models via knowledge distillation. Specifically, we propose Sci-CoT, a two-stage framework that separates the processes of generating rationales and inferring answers. This method enables a more efficient use of rationales during the answer inference stage, leading to improved performance on scientific question-answering tasks. Utilizing Sci-CoT, our 80-million parameter model is able to exceed the performance of BLOOM-176B in the ARC-Easy dataset under the few shot setting.

Via

Access Paper or Ask Questions