Yongxin Zhu

HICD: Hallucination-Inducing via Attention Dispersion for Contrastive Decoding to Mitigate Hallucinations in Large Language Models

Mar 17, 2025

Addressing Representation Collapse in Vector Quantized Models with One Linear Layer

Nov 04, 2024

Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective

Oct 16, 2024

Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction

Jun 18, 2024

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Jun 11, 2024

Generative Pre-trained Speech Language Model with Efficient Hierarchical Transformer

Jun 03, 2024

Bitformer: An efficient Transformer with bitwise operation-based attention for Big Data Analytics at low-cost low-precision devices

Nov 22, 2023

DiffS2UT: A Semantic Preserving Diffusion Model for Textless Direct Speech-to-Speech Translation

Oct 26, 2023

Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA

Apr 04, 2023

Difformer: Empowering Diffusion Model on Embedding Space for Text Generation

Dec 19, 2022