
Börje F. Karlsson

Towards Proprioception-Aware Embodied Planning for Dual-Arm Humanoid Robots

Oct 09, 2025
Via arXiv

DeepPHY: Benchmarking Agentic VLMs on Physical Reasoning

Aug 07, 2025
Via arXiv

Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills

Mar 16, 2025
Via arXiv

Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia

Mar 10, 2025
Via arXiv

Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning

Mar 10, 2025
Via arXiv

INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge

Nov 29, 2024
Via arXiv

MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents

Oct 04, 2024
Via arXiv

SEACrowd: A Multilingual Multimodal Data Hub and Benchmark Suite for Southeast Asian Languages

Jun 14, 2024
Via arXiv

A Survey on Game Playing Agents and Large Models: Methods, Applications, and Challenges

Mar 15, 2024
Via arXiv

Towards General Computer Control: A Multimodal Agent for Red Dead Redemption II as a Case Study

Mar 07, 2024
Via arXiv