Picture for Tiantian Geng

Tiantian Geng

LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos

Add code
Nov 29, 2024
Viaarxiv icon

Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models

Add code
Oct 10, 2024
Figure 1 for Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models
Figure 2 for Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models
Figure 3 for Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models
Figure 4 for Sample then Identify: A General Framework for Risk Control and Assessment in Multimodal Large Language Models
Viaarxiv icon

UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization

Add code
Apr 04, 2024
Viaarxiv icon

Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline

Add code
Mar 24, 2023
Figure 1 for Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
Figure 2 for Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
Figure 3 for Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
Figure 4 for Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline
Viaarxiv icon

Audio-Visual MLP for Scoring Sport

Add code
Mar 22, 2022
Figure 1 for Audio-Visual MLP for Scoring Sport
Figure 2 for Audio-Visual MLP for Scoring Sport
Figure 3 for Audio-Visual MLP for Scoring Sport
Figure 4 for Audio-Visual MLP for Scoring Sport
Viaarxiv icon