Chengyu Wang

Understanding Attention Mechanism in Video Diffusion Models

Apr 16, 2025

Training Small Reasoning LLMs with Cognitive Preference Alignment

Apr 14, 2025

A Short Survey on Small Reasoning Models: Training, Inference, Applications and Research Directions

Apr 12, 2025

Building a Family of Data Augmentation Models for Low-cost LLM Fine-tuning on the Cloud

Dec 06, 2024

MOSABench: Multi-Object Sentiment Analysis Benchmark for Evaluating Multimodal Large Language Models Understanding of Complex Image

Nov 25, 2024

Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts

Nov 23, 2024

Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective

Oct 14, 2024

VideoCLIP-XL: Advancing Long Description Understanding for Video CLIP Models

Oct 01, 2024

DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding

Aug 27, 2024

Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit

Aug 19, 2024