Niko Moritz

Textless Streaming Speech-to-Speech Translation using Semantic Speech Tokens

Oct 04, 2024

M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses

Sep 17, 2024

Effective internal language model training and fusion for factorized transducer model

Apr 02, 2024

AGADIR: Towards Array-Geometry Agnostic Directional Speech Recognition

Jan 18, 2024

Directional Source Separation for Robust Speech Recognition on Smart Glasses

Sep 20, 2023

SynthVSR: Scaling Up Visual Speech Recognition With Synthetic Supervision

Apr 03, 2023

Streaming Audio-Visual Speech Recognition with Alignment Regularization

Nov 03, 2022

An Investigation of Monotonic Transducers for Large-Scale Automatic Speech Recognition

Apr 19, 2022

Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR

Mar 01, 2022

Sequence Transduction with Graph-based Supervision

Nov 01, 2021