Yao Shi

Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking

May 16, 2025

Changlun Li, Yao Shi, Chen Wang, Qiqi Duan, Runke Ruan, Weijie Huang, Haonan Long, Lijun Huang, Yuyu Luo, Nan Tang

Abstract: Large Language Models (LLMs) have demonstrated notable capabilities across financial tasks, including financial report summarization, earnings call transcript analysis, and asset classification. However, their real-world effectiveness in managing complex fund investment remains inadequately assessed. A fundamental limitation of existing benchmarks for evaluating LLM-driven trading strategies is their reliance on historical back-testing, which inadvertently enables LLMs to "time travel": they can leverage future information embedded in their training corpora, resulting in possible information leakage and overly optimistic performance estimates. To address this issue, we introduce DeepFund, a live fund benchmark tool designed to rigorously evaluate LLMs in real-time market conditions. Utilizing a multi-agent architecture, DeepFund connects directly with real-time stock market data, specifically data published after each model's pretraining cutoff, to ensure fair and leakage-free evaluations. Empirical tests on nine flagship LLMs from leading global institutions across multiple investment dimensions (including ticker-level analysis, investment decision-making, portfolio management, and risk control) reveal significant practical challenges. Notably, even cutting-edge models such as DeepSeek-V3 and Claude-3.7-Sonnet incur net trading losses within DeepFund's real-time evaluation environment, underscoring the present limitations of LLMs for active fund management. Our code is available at https://github.com/HKUSTDial/DeepFund.

* 21 pages, 9 figures
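
The leakage-free idea in the abstract above, evaluating each model only on data published after its pretraining cutoff, can be illustrated with a minimal sketch. This is not DeepFund's implementation: the model names, cutoff dates, record layout, and the `leakage_free` helper are all hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical pretraining cutoffs (illustrative values, not from the paper).
CUTOFFS = {
    "model-a": date(2024, 6, 30),
    "model-b": date(2024, 12, 31),
}

def leakage_free(records, cutoff):
    """Keep only records published strictly after the given cutoff date."""
    return [r for r in records if r["published"] > cutoff]

# Hypothetical market records with made-up tickers.
records = [
    {"ticker": "AAA", "published": date(2024, 5, 1)},
    {"ticker": "BBB", "published": date(2024, 9, 1)},
    {"ticker": "CCC", "published": date(2025, 1, 15)},
]

for model, cutoff in CUTOFFS.items():
    visible = [r["ticker"] for r in leakage_free(records, cutoff)]
    print(model, visible)
```

Under these assumptions, a model with a later cutoff is evaluated on a smaller, more recent slice of the data, so no record it is scored on could have appeared in its training corpus.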

EducationQ: Evaluating LLMs' Teaching Capabilities Through Multi-Agent Dialogue Framework

Apr 21, 2025

Yao Shi, Rongkeng Liang, Yong Xu

Abstract: Large language models (LLMs) increasingly serve as educational tools, yet evaluating their teaching capabilities remains challenging due to the resource-intensive, context-dependent, and methodologically complex nature of teacher-student interactions. We introduce EducationQ, a multi-agent dialogue framework that efficiently assesses teaching capabilities through simulated dynamic educational scenarios, featuring specialized agents for teaching, learning, and evaluation. Testing 14 LLMs across major AI organizations (OpenAI, Meta, Google, Anthropic, and others) on 1,498 questions spanning 13 disciplines and 10 difficulty levels reveals that teaching effectiveness does not correlate linearly with model scale or general reasoning capabilities, with some smaller open-source models outperforming larger commercial counterparts in teaching contexts. This finding highlights a critical gap in current evaluations that prioritize knowledge recall over interactive pedagogy. Our mixed-methods evaluation, combining quantitative metrics with qualitative analysis and expert case studies, identifies distinct pedagogical strengths employed by top-performing models (e.g., sophisticated questioning strategies, adaptive feedback mechanisms). Human expert evaluations show 78% agreement with our automated qualitative analysis of effective teaching behaviors, validating our methodology. EducationQ demonstrates that LLMs-as-teachers require specialized optimization beyond simple scaling, suggesting that next-generation educational AI should prioritize targeted enhancement of specific pedagogical effectiveness.

ViDTA: Enhanced Drug-Target Affinity Prediction via Virtual Graph Nodes and Attention-based Feature Fusion

Dec 27, 2024

Minghui Li, Zikang Guo, Yang Wu, Peijin Guo, Yao Shi, Shengshan Hu, Wei Wan, Shengqing Hu

Abstract: Drug-target interaction is fundamental in understanding how drugs affect biological systems, and accurately predicting drug-target affinity (DTA) is vital for drug discovery. Recently, deep learning methods have emerged as a significant approach for estimating the binding strength between drugs and target proteins. However, existing methods simply utilize the drug's local information from molecular topology rather than global information. Additionally, the features of drugs and proteins are usually fused with a simple concatenation operation, limiting their effectiveness. To address these challenges, we propose ViDTA, an enhanced DTA prediction framework. We introduce virtual nodes into the Graph Neural Network (GNN)-based drug feature extraction network, which act as a global memory to exchange messages more efficiently. By incorporating virtual graph nodes, we seamlessly integrate local and global features of drug molecular structures, expanding the GNN's receptive field. Additionally, we propose an attention-based linear feature fusion network for better capturing the interaction information between drugs and proteins. Experimental results evaluated on various benchmarks including Davis, Metz, and KIBA demonstrate that our proposed ViDTA outperforms the state-of-the-art baselines.

* Accepted by International Conference on Bioinformatics and Biomedicine (BIBM 24)
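
Why a virtual node expands a GNN's receptive field, as the abstract above describes, can be seen on a toy graph. The sketch below is not ViDTA's code: the helper names (`add_virtual_node`, `build_adjacency`, `hops`) and the six-atom chain are assumptions made for illustration. It connects one extra node to every atom and measures how many message-passing hops separate the two ends of the chain before and after.

```python
from collections import deque

def add_virtual_node(num_nodes, edges):
    """Return edges plus one extra node (index num_nodes) linked to every real node."""
    virtual = num_nodes
    return edges + [(virtual, i) for i in range(num_nodes)]

def build_adjacency(num_nodes, edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {i: set() for i in range(num_nodes)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def hops(adj, src, dst):
    """Breadth-first search: number of hops on the shortest path from src to dst."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return None  # dst is unreachable from src

# A toy chain "molecule" with six atoms: 0-1-2-3-4-5.
n = 6
chain = [(i, i + 1) for i in range(n - 1)]

plain = build_adjacency(n, chain)
with_virtual = build_adjacency(n + 1, add_virtual_node(n, chain))

print(hops(plain, 0, 5))         # 5
print(hops(with_virtual, 0, 5))  # 2
```

On the plain chain, information from atom 0 needs five rounds of message passing to reach atom 5; with the virtual node, any two atoms are at most two hops apart, which is the sense in which the extra node behaves like a shared global memory.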

Distributed satellite information networks: Architecture, enabling technologies, and trends

Dec 17, 2024

Qinyu Zhang, Liang Xu, Jianhao Huang, Tao Yang, Jian Jiao, Ye Wang, Yao Shi, Chiya Zhang, Xingjian Zhang, Ke Zhang(+16 more)

Abstract: Driven by the vision of ubiquitous connectivity and wireless intelligence, the evolution of the ultra-dense constellation-based satellite-integrated Internet is underway and now taking preliminary shape. Nevertheless, entrenched institutional silos and limited, nonrenewable heterogeneous network resources leave current satellite systems struggling to accommodate the escalating demands of next-generation intelligent applications. In this context, distributed satellite information networks (DSIN), exemplified by the cohesive clustered satellites system, have emerged as an innovative architecture, bridging information gaps across diverse satellite systems, such as communication, navigation, and remote sensing, and establishing a unified, open information network paradigm to support resilient space information services. This survey first provides an in-depth discussion of innovative network architectures for DSIN, encompassing distributed regenerative satellite network architecture, distributed satellite computing network architecture, and reconfigurable satellite formation flying, to enable flexible and scalable communication, computing, and control. The DSIN faces challenges from network heterogeneity, unpredictable channel dynamics, sparse resources, and decentralized collaboration frameworks. To address these issues, a series of enabling technologies is identified, including channel modeling and estimation, cloud-native distributed MIMO cooperation, grant-free massive access, network routing, and the proper combination of all these diversity techniques. Furthermore, to heighten the overall resource efficiency, cross-layer optimization techniques are developed to meet upper-layer requirements for deterministic, adaptive, and secure information services. In addition, emerging research directions and new opportunities are highlighted on the way to achieving the DSIN vision.

Adversarial Attacks and Robust Defenses in Speaker Embedding based Zero-Shot Text-to-Speech System

Oct 05, 2024

Ze Li, Yao Shi, Yunfei Xu, Ming Li

Abstract: Speaker embedding based zero-shot Text-to-Speech (TTS) systems enable high-quality speech synthesis for unseen speakers using minimal data. However, these systems are vulnerable to adversarial attacks, where an attacker introduces imperceptible perturbations to the original speaker's audio waveform, causing the synthesized speech to sound like another person. This vulnerability poses significant security risks, including speaker identity spoofing and unauthorized voice manipulation. This paper investigates two primary defense strategies to address these threats: adversarial training and adversarial purification. Adversarial training enhances the model's robustness by integrating adversarial examples during the training process, thereby improving resistance to such attacks. Adversarial purification, on the other hand, employs diffusion probabilistic models to revert adversarially perturbed audio to its clean form. Experimental results demonstrate that these defense mechanisms can significantly reduce the impact of adversarial perturbations, enhancing the security and reliability of speaker embedding based zero-shot TTS systems in adversarial environments.

BiSinger: Bilingual Singing Voice Synthesis

Sep 29, 2023

Huali Zhou, Yueqian Lin, Yao Shi, Peng Sun, Ming Li

Abstract: Although Singing Voice Synthesis (SVS) has made great strides with Text-to-Speech (TTS) techniques, multilingual singing voice modeling remains relatively unexplored. This paper presents BiSinger, a bilingual pop SVS system for English and Mandarin Chinese. Current systems require separate models per language and cannot accurately represent both Chinese and English, hindering code-switch SVS. To address this gap, we design a shared representation between Chinese and English singing voices, achieved by using the CMU dictionary with mapping rules. We fuse monolingual singing datasets with open-source singing voice conversion techniques to generate bilingual singing voices while also exploring the potential use of bilingual speech data. Experiments affirm that our language-independent representation and incorporation of related datasets enable a single model with enhanced performance in English and code-switch SVS while maintaining Chinese song performance. Audio samples are available at https://bisinger-svs.github.io.

* Accepted by ASRU2023

VoiceLens: Controllable Speaker Generation and Editing with Flow

Sep 25, 2023

Yao Shi, Ming Li

Abstract: Currently, many multi-speaker speech synthesis and voice conversion systems address speaker variations with an embedding vector. Modeling it directly allows new voices outside of training data to be synthesized. GMM-based approaches such as Tacospawn are favored in the literature for this generation task, but there are still some limitations when difficult conditionings are involved. In this paper, we propose VoiceLens, a semi-supervised flow-based approach, to model speaker embedding distributions for multi-conditional speaker generation. VoiceLens maps speaker embeddings into a combination of independent attributes and residual information. It allows new voices associated with certain attributes to be generated for existing TTS models, and attributes of known voices to be meaningfully edited. We show that VoiceLens displays an unconditional generation capacity similar to that of Tacospawn while obtaining higher controllability and flexibility when used in a conditional manner. In addition, we show that synthesizing less noisy speech from known noisy speakers without re-training the TTS model is possible solely by editing their embeddings with an SNR-conditioned VoiceLens model. Demos are available at sos1sos2sixteen.github.io/voicelens.

BrickPal: Augmented Reality-based Assembly Instructions for Brick Models

Jul 06, 2023

Yao Shi, Xiaofeng Zhang, Ran Zhang, Zhou Yang, Xiao Tang, Hongni Ye, Yi Wu

Abstract: The assembly instruction is a mandatory component of Lego-like brick sets. The conventional production of assembly instructions requires a considerable amount of manual fine-tuning, which is intractable for casual users and customized brick sets. Moreover, the traditional paper-based instructions lack expressiveness and interactivity. To tackle the two problems above, we present BrickPal, an augmented reality-based system, which visualizes assembly instructions in an augmented reality head-mounted display. It utilizes Natural Language Processing (NLP) techniques to generate plausible assembly sequences, and provides real-time guidance in the AR headset. Our user study demonstrates BrickPal's effectiveness at assisting users in brick assembly compared to traditional assembly methods. Additionally, the NLP algorithm-generated assembly sequences achieve the same usability as manually adapted sequences.

* 9 pages, 7 figures. Project URL: https://origami.dance/brickpal

On the Application of Uplink/Downlink Decoupled Access in Heterogeneous Mobile Edge Computing

Sep 29, 2021

Yao Shi, Emad Alsusa, Mohammed W. Baidas

Abstract: Mobile edge computing (MEC) is a key player in low-latency 5G networks, tasked with resolving the conflict between computationally intensive mobile applications and resource-limited mobile devices (MDs). As such, there has been intense interest in this topic, especially in multi-user single-server and homogeneous multi-server scenarios. However, research on the heterogeneous multi-server scenario is limited, where the servers are located at small base-stations (SBSs), macro base-stations (MBSs), or the cloud with different computing and communication capabilities. On the other hand, computational-task offloading is limited by the type of MD-BS association, with almost all previous works focusing on offloading the MD's computational tasks to the MEC servers/cloudlets at its serving BS. However, in multi-BS association, or downlink/uplink decoupled (DUDe) scenarios, an MD can be served by multiple BSs and hence has multiple offloading choices. Motivated by this, we propose a joint BS association and subchannel allocation algorithm based on a student-project allocation (SPA) matching approach to minimize the network sum-latency, which breaks the constraint that one MD must connect to the same BS in the UL and DL, and jointly considers the communication and computational disparity of SBS and MBS cloudlets in heterogeneous MEC networks. Moreover, an optimal power allocation scheme is proposed to optimize the system performance subject to the predefined quality of service constraints. Our results show that the proposed scheme is superior to benchmark techniques in enabling effective use of the computational and communication resources in heterogeneous MEC networks.
