Xitong Yang

Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos

Sep 30, 2024

GenRec: Unifying Video Generation and Recognition with Diffusion Models

Aug 27, 2024

Unlocking Exocentric Video-Language Data for Egocentric Video Representation Learning

Aug 07, 2024

Video ReCap: Recursive Captioning of Hour-Long Videos

Feb 28, 2024

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Nov 30, 2023

Building an Open-Vocabulary Video CLIP Model with Better Architectures, Optimization and Data

Oct 08, 2023

Towards Scalable Neural Representation for Diverse Videos

Mar 24, 2023

MINOTAUR: Multi-task Video Grounding From Multimodal Queries

Feb 16, 2023

Transforming CLIP to an Open-vocabulary Video Model via Interpolated Weight Optimization

Feb 01, 2023

Vision Transformers Are Good Mask Auto-Labelers

Jan 10, 2023