Yongxin Zhu

Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

Nov 04, 2024
Via arXiv

Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective

Oct 16, 2024
Via arXiv

Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction

Jun 18, 2024
Via arXiv

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Jun 11, 2024
Via arXiv

Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer

Jun 03, 2024
Via arXiv

Bitformer: An efficient Transformer with bitwise operation-based attention for Big Data Analytics at low-cost low-precision devices

Nov 22, 2023
Via arXiv

DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation

Oct 26, 2023
Via arXiv

Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA

Apr 04, 2023
Via arXiv

Difformer: Empowering Diffusion Model on Embedding Space for Text Generation

Dec 19, 2022
Via arXiv

Sequence-to-Action: Grammatical Error Correction with Action Guided Sequence Generation

May 22, 2022
Via arXiv