Huiyu Wang

TimeRefine: Temporal Grounding with Time Refining Video LLM

Dec 12, 2024
Via arXiv

MusicFlow: Cascaded Flow Matching for Text Guided Music Generation

Oct 27, 2024
Via arXiv

VideoINSTA: Zero-shot Long Video Understanding via Informative Spatial-Temporal Reasoning with LLMs

Sep 30, 2024
Via arXiv

Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos

Sep 30, 2024
Via arXiv

Unlocking Exocentric Video-Language Data for Egocentric Video Representation Learning

Aug 07, 2024
Via arXiv

Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data

Jul 18, 2024
Via arXiv

Finding Dino: A plug-and-play framework for unsupervised detection of out-of-distribution objects using prototypes

Apr 11, 2024
Via arXiv

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Nov 30, 2023
Via arXiv

Diffusion Models as Masked Autoencoders

Apr 06, 2023
Via arXiv

Ego-Only: Egocentric Action Detection without Exocentric Pretraining

Jan 03, 2023
Via arXiv