Danyang Zhang

Is GPT-OSS All You Need? Benchmarking Large Language Models for Financial Intelligence and the Surprising Efficiency Paradox

Dec 09, 2025

Kunlun Anomaly Troubleshooter: Enabling Kernel-Level Anomaly Detection and Causal Reasoning for Large Model Distributed Inference

Nov 08, 2025

Is GPT-OSS Good? A Comprehensive Evaluation of OpenAI's Latest Open Source Models

Aug 17, 2025

OpenCUA: Open Foundations for Computer-Use Agents

Aug 12, 2025

ChemDFM-R: An Chemical Reasoner LLM Enhanced with Atomized Chemical Knowledge

Jul 30, 2025

ProgRM: Build Better GUI Agents with Progress Rewards

May 23, 2025

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

Jul 15, 2024

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Apr 11, 2024

Large Language Model Is Semi-Parametric Reinforcement Learning Agent

Jun 09, 2023

Mobile-Env: A Universal Platform for Training and Evaluation of Mobile Interaction

May 14, 2023