Picture for Po-Yao Huang

Po-Yao Huang

Altogether: Image Captioning via Re-aligning Alt-text

Add code
Oct 22, 2024
Figure 1 for Altogether: Image Captioning via Re-aligning Alt-text
Figure 2 for Altogether: Image Captioning via Re-aligning Alt-text
Figure 3 for Altogether: Image Captioning via Re-aligning Alt-text
Figure 4 for Altogether: Image Captioning via Re-aligning Alt-text
Viaarxiv icon

Self-Supervised Audio-Visual Soundscape Stylization

Add code
Sep 22, 2024
Viaarxiv icon

Text Quality-Based Pruning for Efficient Training of Language Models

Add code
Apr 26, 2024
Viaarxiv icon

MoDE: CLIP Data Experts via Clustering

Add code
Apr 24, 2024
Figure 1 for MoDE: CLIP Data Experts via Clustering
Figure 2 for MoDE: CLIP Data Experts via Clustering
Figure 3 for MoDE: CLIP Data Experts via Clustering
Figure 4 for MoDE: CLIP Data Experts via Clustering
Viaarxiv icon

VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild

Add code
Mar 25, 2024
Viaarxiv icon

Adversarially Masked Video Consistency for Unsupervised Domain Adaptation

Add code
Mar 24, 2024
Viaarxiv icon

FLAP: Fast Language-Audio Pre-training

Add code
Nov 02, 2023
Viaarxiv icon

Demystifying CLIP Data

Add code
Oct 02, 2023
Figure 1 for Demystifying CLIP Data
Figure 2 for Demystifying CLIP Data
Figure 3 for Demystifying CLIP Data
Figure 4 for Demystifying CLIP Data
Viaarxiv icon

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models

Add code
Sep 19, 2023
Figure 1 for AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Figure 2 for AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Figure 3 for AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Viaarxiv icon

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

Add code
Jun 01, 2023
Viaarxiv icon