Jiaqi Wang

OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?

Jan 09, 2025
Via arXiv

Discrete Wavelet Transform-Based Capsule Network for Hyperspectral Image Classification

Jan 08, 2025
Via arXiv

BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning

Jan 06, 2025
Via arXiv

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

Jan 06, 2025
Via arXiv

GPT4Scene: Understand 3D Scenes from Videos with Vision-Language Models

Jan 03, 2025
Via arXiv

Asymmetrical Reciprocity-based Federated Learning for Resolving Disparities in Medical Diagnosis

Dec 27, 2024
Via arXiv

Efficient Speech Command Recognition Leveraging Spiking Neural Network and Curriculum Learning-based Knowledge Distillation

Dec 17, 2024
Via arXiv

IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations

Dec 16, 2024
Via arXiv

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

Dec 12, 2024
Via arXiv

FiVA: Fine-grained Visual Attribute Dataset for Text-to-Image Diffusion Models

Dec 10, 2024
Via arXiv