Jianbo Ma

Gotta Hear Them All: Sound Source Aware Vision to Audio Generation

Nov 26, 2024
Via arXiv

Diff-SAGe: End-to-End Spatial Audio Generation Using Diffusion Models

Oct 15, 2024
Via arXiv

STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking

Sep 17, 2024
Via arXiv

Rethinking Mamba in Speech Processing by Self-Supervised Models

Sep 11, 2024
Via arXiv

A unified multichannel far-field speech recognition system: combining neural beamforming with attention based end-to-end model

Jan 05, 2024
Via arXiv

V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models

Aug 21, 2023
Via arXiv

Low latency transformers for speech processing

Feb 27, 2023
Via arXiv