Bo Zhao

CardiacMamba: A Multimodal RGB-RF Fusion Framework with State Space Models for Remote Physiological Measurement

Feb 19, 2025
Via arXiv

Enhancing Generalization via Sharpness-Aware Trajectory Matching for Dataset Condensation

Feb 03, 2025
Via arXiv

Normalizing Batch Normalization for Long-Tailed Recognition

Jan 06, 2025
Via arXiv

MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval

Dec 19, 2024
Via arXiv

Touchstone Benchmark: Are We on the Right Way for Evaluating AI Algorithms for Medical Segmentation?

Nov 06, 2024
Via arXiv

Emu3: Next-Token Prediction is All You Need

Sep 27, 2024
Via arXiv

Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding

Sep 24, 2024
Via arXiv

Automated design of nonreciprocal thermal emitters via Bayesian optimization

Sep 13, 2024
Via arXiv

Enhancing Long Video Understanding via Hierarchical Event-Based Memory

Sep 10, 2024
Via arXiv

TC-LLaVA: Rethinking the Transfer from Image to Video Understanding with Temporal Considerations

Sep 05, 2024
Via arXiv