Osamu Yoshie

RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World

Nov 29, 2024
Via arXiv

What Do You Want? User-centric Prompt Generation for Text-to-image Synthesis via Multi-turn Guidance

Aug 23, 2024
Via arXiv

MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment

Jun 28, 2024
Via arXiv

BreakGPT: A Large Language Model with Multi-stage Structure for Financial Breakout Detection

Feb 12, 2024
Via arXiv

PillarNeSt: Embracing Backbone Scaling and Pretraining for Pillar-based 3D Object Detection

Nov 29, 2023
Via arXiv

GMM: Delving into Gradient Aware and Model Perceive Depth Mining for Monocular 3D Detection

Jun 30, 2023
Via arXiv

Vision Learners Meet Web Image-Text Pairs

Jan 17, 2023
Via arXiv

Discriminability-Transferability Trade-Off: An Information-Theoretic Perspective

Mar 08, 2022
Via arXiv

ZeroVL: A Strong Baseline for Aligning Vision-Language Representations with Limited Resources

Jan 18, 2022
Via arXiv

PP-YOLOv2: A Practical Object Detector

Apr 21, 2021
Via arXiv