Sicong Leng

BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays

Oct 29, 2024

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

Oct 22, 2024

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

Oct 16, 2024

AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

Jun 18, 2024

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Jun 11, 2024

Uncovering What, Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly

Apr 30, 2024

Constrained Layout Generation with Factor Graphs

Mar 30, 2024

Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

Nov 28, 2023

Tell2Design: A Dataset for Language-Guided Floor Plan Generation

Nov 27, 2023

Speaker-Oriented Latent Structures for Dialogue-Based Relation Extraction

Sep 11, 2021