Yaolun Zhang

PyBench: Evaluating LLM Agent on various real-world coding tasks

Jul 23, 2024
Via arXiv

Depending on yourself when you should: Mentoring LLM with RL agents to become the master in cybersecurity games

Mar 26, 2024
Via arXiv