Abstract: Multimodal Large Language Models (MLLMs) have demonstrated exceptional performance in artificial intelligence by facilitating integrated understanding across diverse modalities, including text, images, video, audio, and speech. However, their deployment in real-world applications raises significant concerns about adversarial vulnerabilities that could compromise their safety and reliability. Unlike unimodal models, MLLMs face unique challenges due to the interdependencies among modalities, making them susceptible to modality-specific threats and cross-modal adversarial manipulations. This paper reviews the adversarial robustness of MLLMs, covering different modalities. We begin with an overview of MLLMs and a taxonomy of adversarial attacks tailored to each modality. Next, we review key datasets and evaluation metrics used to assess the robustness of MLLMs. After that, we provide an in-depth review of attacks targeting MLLMs across different modalities. Our survey also identifies critical challenges and suggests promising future research directions.
Abstract: As an important branch of photoacoustic microscopy, optical-resolution photoacoustic microscopy suffers from a limited depth of field due to its strongly focused laser beam. In this work, a 3D information fusion algorithm based on the 3D stationary wavelet transform and joint weighted evaluation optimization is proposed to fuse multi-focus photoacoustic data and achieve large-volumetric, high-resolution 3D imaging. First, a 3D stationary wavelet transform was performed on the multi-focus data to obtain eight wavelet coefficients. A differential evolution algorithm based on joint weighted evaluation was then employed to optimize the division block size for each wavelet coefficient. Corresponding sub-coefficients of the multi-focus 3D data were fused with the proposed fusion rule, which uses standard deviation for focus detection. Finally, photoacoustic microscopy with a large depth of field can be achieved by applying the inverse stationary wavelet transform to the eight fused sub-coefficients. The fusion result for a multi-focus, vertically tilted fiber shows that the proposed method doubles the depth of field of optical-resolution photoacoustic microscopy without sacrificing lateral resolution. Furthermore, the effectiveness of the proposed method was verified through the fusion results of multi-focus vessel data. Our work provides a feasible solution for achieving large-volumetric, high-resolution photoacoustic microscopy for further data analysis, processing and applications.
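The block-wise fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that, for each block, the source whose coefficients have the larger standard deviation is treated as in focus, it uses a fixed cubic block size in place of the differential-evolution optimization, and it operates on a single pair of sub-band volumes stored as nested lists. The stationary wavelet transform and its inverse are omitted, and the names `block_values` and `fuse_blocks` are hypothetical.

```python
from statistics import pstdev


def block_values(vol, z0, y0, x0, size):
    """Collect the values of one cubic block, clipped at the volume border."""
    return [
        vol[z][y][x]
        for z in range(z0, min(z0 + size, len(vol)))
        for y in range(y0, min(y0 + size, len(vol[0])))
        for x in range(x0, min(x0 + size, len(vol[0][0])))
    ]


def fuse_blocks(coeff_a, coeff_b, size):
    """Fuse two same-shaped 3D coefficient volumes block by block.

    Assumption: the block with the larger standard deviation is taken
    as the in-focus one and copied into the fused volume.
    """
    nz, ny, nx = len(coeff_a), len(coeff_a[0]), len(coeff_a[0][0])
    fused = [[[0.0] * nx for _ in range(ny)] for _ in range(nz)]
    for z0 in range(0, nz, size):
        for y0 in range(0, ny, size):
            for x0 in range(0, nx, size):
                std_a = pstdev(block_values(coeff_a, z0, y0, x0, size))
                std_b = pstdev(block_values(coeff_b, z0, y0, x0, size))
                # Ties go to the first volume (an arbitrary choice).
                src = coeff_a if std_a >= std_b else coeff_b
                for z in range(z0, min(z0 + size, nz)):
                    for y in range(y0, min(y0 + size, ny)):
                        for x in range(x0, min(x0 + size, nx)):
                            fused[z][y][x] = src[z][y][x]
    return fused
```

In a full pipeline along the lines of the abstract, a function like this would be applied to each of the eight sub-band pairs, with the block size chosen per sub-band by the optimizer, before the inverse transform.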