Linli Xu

Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

Nov 04, 2024
Via arXiv

Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective

Oct 16, 2024
Via arXiv

Break the Visual Perception: Adversarial Attacks Targeting Encoded Visual Tokens of Large Vision-Language Models

Oct 09, 2024
Via arXiv

Video In-context Learning

Jul 10, 2024
Via arXiv

Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction

Jun 18, 2024
Via arXiv

Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer

Jun 03, 2024
Via arXiv

HRVDA: High-Resolution Visual Document Assistant

Apr 10, 2024
Via arXiv

Communication-Efficient Distributed Learning with Local Immediate Error Compensation

Feb 19, 2024
Via arXiv

Communication-Efficient Personalized Federated Learning for Speech-to-Text Tasks

Jan 18, 2024
Via arXiv

DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation

Oct 26, 2023
Via arXiv