Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Haotian Guan

SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge

May 17, 2024

Andong Wang, Bo Wu, Sunli Chen, Zhenfang Chen, Haotian Guan, Wei-Ning Lee, Li Erran Li, Chuang Gan

Figure 1 for SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge

Figure 2 for SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge

Figure 3 for SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge

Figure 4 for SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge

Abstract:Learning commonsense reasoning from visual contexts and scenes in real-world is a crucial step toward advanced artificial intelligence. However, existing video reasoning benchmarks are still inadequate since they were mainly designed for factual or situated reasoning and rarely involve broader knowledge in the real world. Our work aims to delve deeper into reasoning evaluations, specifically within dynamic, open-world, and structured context knowledge. We propose a new benchmark (SOK-Bench), consisting of 44K questions and 10K situations with instance-level annotations depicted in the videos. The reasoning process is required to understand and apply situated knowledge and general knowledge for problem-solving. To create such a dataset, we propose an automatic and scalable generation method to generate question-answer pairs, knowledge graphs, and rationales by instructing the combinations of LLMs and MLLMs. Concretely, we first extract observable situated entities, relations, and processes from videos for situated knowledge and then extend to open-world knowledge beyond the visible content. The task generation is facilitated through multiple dialogues as iterations and subsequently corrected and refined by our designed self-promptings and demonstrations. With a corpus of both explicit situated facts and implicit commonsense, we generate associated question-answer pairs and reasoning processes, finally followed by manual reviews for quality assurance. We evaluated recent mainstream large vision-language models on the benchmark and found several insightful conclusions. For more information, please refer to our benchmark at www.bobbywu.com/SOKBench.

* CVPR

Via

Access Paper or Ask Questions

Towards Real-time Training of Physics-informed Neural Networks: Applications in Ultrafast Ultrasound Blood Flow Imaging

Sep 09, 2023

Haotian Guan, Jinping Dong, Wei-Ning Lee

Abstract:Physics-informed Neural Network (PINN) is one of the most preeminent solvers of Navier-Stokes equations, which are widely used as the governing equation of blood flow. However, current approaches, relying on full Navier-Stokes equations, are impractical for ultrafast Doppler ultrasound, the state-of-the-art technique for depiction of complex blood flow dynamics \emph{in vivo} through acquired thousands of frames (or, timestamps) per second. In this article, we first propose a novel training framework of PINN for solving Navier-Stokes equations by discretizing Navier-Stokes equations into steady state and sequentially solving steady-state Navier-Stokes equations with transfer learning. The novel training framework is coined as SeqPINN. Upon the success of SeqPINN, we adopt the idea of averaged constant stochastic gradient descent (SGD) as initialization and propose a parallel training scheme for all timestamps. To ensure an initialization that generalizes well, we borrow the concept of Stochastic Weight Averaging Gaussian to perform uncertainty estimation as an indicator of generalizability of the initialization. This algorithm, named SP-PINN, further expedites training of PINN while achieving comparable accuracy with SeqPINN. Finite-element simulations and \emph{in vitro} phantoms of single-branch and trifurcate blood vessels are used to evaluate the performance of SeqPINN and SP-PINN. Results show that both SeqPINN and SP-PINN are manyfold faster than the original design of PINN, while respectively achieving Root Mean Square Errors (RMSEs) of 1.01 cm/s and 1.26 cm/s on the straight vessel and 1.91 cm/s and 2.56 cm/s on the trifurcate blood vessel when recovering blood flow velocities.

Via

Access Paper or Ask Questions

MCPI: Integrating Multimodal Data for Enhanced Prediction of Compound Protein Interactions

Jun 15, 2023

Li Zhang, Wenhao Li, Haotian Guan, Zhiquan He, Mingjun Cheng, Han Wang

Abstract:The identification of compound-protein interactions (CPI) plays a critical role in drug screening, drug repurposing, and combination therapy studies. The effectiveness of CPI prediction relies heavily on the features extracted from both compounds and target proteins. While various prediction methods employ different feature combinations, both molecular-based and network-based models encounter the common obstacle of incomplete feature representations. Thus, a promising solution to this issue is to fully integrate all relevant CPI features. This study proposed a novel model named MCPI, which is designed to improve the prediction performance of CPI by integrating multiple sources of information, including the PPI network, CCI network, and structural features of CPI. The results of the study indicate that the MCPI model outperformed other existing methods for predicting CPI on public datasets. Furthermore, the study has practical implications for drug development, as the model was applied to search for potential inhibitors among FDA-approved drugs in response to the SARS-CoV-2 pandemic. The prediction results were then validated through the literature, suggesting that the MCPI model could be a useful tool for identifying potential drug candidates. Overall, this study has the potential to advance our understanding of CPI and guide drug development efforts.

* 12 pages, 9 figures

Via

Access Paper or Ask Questions

Speech Emotion Recognition Considering Local Dynamic Features

Mar 21, 2018

Haotian Guan, Zhilei Liu, Longbiao Wang, Jianwu Dang, Ruiguo Yu

Figure 1 for Speech Emotion Recognition Considering Local Dynamic Features

Figure 2 for Speech Emotion Recognition Considering Local Dynamic Features

Figure 3 for Speech Emotion Recognition Considering Local Dynamic Features

Figure 4 for Speech Emotion Recognition Considering Local Dynamic Features

Abstract:Recently, increasing attention has been directed to the study of the speech emotion recognition, in which global acoustic features of an utterance are mostly used to eliminate the content differences. However, the expression of speech emotion is a dynamic process, which is reflected through dynamic durations, energies, and some other prosodic information when one speaks. In this paper, a novel local dynamic pitch probability distribution feature, which is obtained by drawing the histogram, is proposed to improve the accuracy of speech emotion recognition. Compared with most of the previous works using global features, the proposed method takes advantage of the local dynamic information conveyed by the emotional speech. Several experiments on Berlin Database of Emotional Speech are conducted to verify the effectiveness of the proposed method. The experimental results demonstrate that the local dynamic information obtained with the proposed method is more effective for speech emotion recognition than the traditional global features.

* 10 pages, 3 figures, accepted by ISSP 2017

Via

Access Paper or Ask Questions