Picture for Zehan Qi

Zehan Qi

Tsinghua University

WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning

Add code
Nov 04, 2024
Figure 1 for WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Figure 2 for WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Figure 3 for WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Figure 4 for WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning
Viaarxiv icon

Long$^2$RAG: Evaluating Long-Context & Long-Form Retrieval-Augmented Generation with Key Point Recall

Add code
Oct 31, 2024
Viaarxiv icon

AutoGLM: Autonomous Foundation Agents for GUIs

Add code
Oct 28, 2024
Viaarxiv icon

VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents

Add code
Aug 12, 2024
Figure 1 for VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
Figure 2 for VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
Figure 3 for VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
Figure 4 for VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
Viaarxiv icon

DebateQA: Evaluating Question Answering on Debatable Knowledge

Add code
Aug 02, 2024
Viaarxiv icon

Walking in Others' Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias

Add code
Jul 22, 2024
Figure 1 for Walking in Others' Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias
Figure 2 for Walking in Others' Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias
Figure 3 for Walking in Others' Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias
Figure 4 for Walking in Others' Shoes: How Perspective-Taking Guides Large Language Models in Reducing Toxicity and Bias
Viaarxiv icon

MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models

Add code
Jun 20, 2024
Viaarxiv icon

ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools

Add code
Jun 18, 2024
Viaarxiv icon

Preemptive Answer "Attacks" on Chain-of-Thought Reasoning

Add code
May 31, 2024
Viaarxiv icon

NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts

Add code
May 07, 2024
Viaarxiv icon