Yanfeng Wang

Cooperative Medianet Innovation Center, Shanghai Jiao Tong University, China and Shanghai AI Laboratory, China

CS3-Bench: Evaluating and Enhancing Speech-to-Speech LLMs for Mandarin-English Code-Switching

Oct 09, 2025

Wide-In, Narrow-Out: Revokable Decoding for Efficient and Effective DLLMs

Jul 24, 2025

Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning

Jul 17, 2025

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Jul 02, 2025

Universal Video Temporal Grounding with Generative Multi-modal Large Language Models

Jun 23, 2025

ConText: Driving In-context Learning for Text Removal and Segmentation

Jun 04, 2025

Bridging the Dynamic Perception Gap: Training-Free Draft Chain-of-Thought for Dynamic Multimodal Spatial Reasoning

May 22, 2025

SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding

May 22, 2025

VocalBench: Benchmarking the Vocal Conversational Abilities for Speech Interaction Models

May 21, 2025

AutoMedEval: Harnessing Language Models for Automatic Medical Capability Evaluation

May 17, 2025