Bilei Zhu

ByteComposer: a Human-like Melody Composition Method based on Language Model Agent

Mar 07, 2024
Via arXiv

SLIT: Boosting Audio-Text Pre-Training via Multi-Stage Learning and Instruction Tuning

Feb 20, 2024
Via arXiv

Joint Music and Language Attention Models for Zero-shot Music Tagging

Oct 16, 2023
Via arXiv

ByteCover3: Accurate Cover Song Identification on Short Queries

Mar 21, 2023
Via arXiv

Graph Contrastive Learning with Implicit Augmentations

Nov 07, 2022
Via arXiv

HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection

Feb 02, 2022
Via arXiv

Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data

Jan 12, 2022
Via arXiv

Attention-based cross-modal fusion for audio-visual voice activity detection in musical video streams

Jun 21, 2021
Via arXiv

ByteCover: Cover Song Identification via Multi-Loss Training

Oct 27, 2020
Via arXiv

Contrastive Unsupervised Learning for Audio Fingerprinting

Oct 26, 2020
Via arXiv