
Kazuhito Koishida

CUA-Skill: Develop Skills for Computer Using Agent

Jan 28, 2026
Via arXiv

Hierarchical Self-Attention: Generalizing Neural Attention Mechanics to Multi-Scale Problems

Sep 18, 2025
Via arXiv

Instruction Agent: Enhancing Agent with Expert Demonstration

Sep 08, 2025
Via arXiv

Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression

Feb 23, 2025
Via arXiv

VideoWebArena: Evaluating Long Context Multimodal Agents with Video Understanding Web Tasks

Oct 24, 2024
Via arXiv

Zero-Shot Text-to-Speech from Continuous Text Streams

Oct 01, 2024
Via arXiv

Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

Sep 12, 2024
Via arXiv

LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes

Jun 05, 2024
Via arXiv

Weakly-supervised Audio Separation via Bi-modal Semantic Similarity

Apr 02, 2024
Via arXiv

uaMix-MAE: Efficient Tuning of Pretrained Audio Transformers with Unsupervised Audio Mixtures

Mar 14, 2024
Via arXiv