Sicong Leng

Refining Positive and Toxic Samples for Dual Safety Self-Alignment of LLMs with Minimal Human Interventions

Feb 08, 2025

BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays

Oct 29, 2024

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

Oct 22, 2024

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

Oct 16, 2024

AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

Jun 18, 2024

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Jun 11, 2024

Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly

Apr 30, 2024

Constrained Layout Generation with Factor Graphs

Mar 30, 2024

Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

Nov 28, 2023

Tell2Design: A Dataset for Language-Guided Floor Plan Generation

Nov 27, 2023