Yifei Xin

Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

Nov 04, 2024

Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents

Oct 25, 2024

Chain of Ideas: Revolutionizing Research in Novel Idea Development with LLM Agents

Oct 17, 2024

DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval

Sep 16, 2024

Audio-text Retrieval with Transformer-based Hierarchical Alignment and Disentangled Cross-modal Representation

Sep 14, 2024

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Jun 11, 2024

SLIT: Boosting Audio-Text Pre-Training via Multi-Stage Learning and Instruction Tuning

Feb 20, 2024

Masked Audio Modeling with CLAP and Multi-Objective Learning

Jan 29, 2024

Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions

Jul 28, 2023

Improving Text-Audio Retrieval by Text-aware Attention Pooling and Prior Matrix Revised Loss

Mar 19, 2023