Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhendong Zhao

Macquarie University

Think Hierarchically, Act Dynamically: Hierarchical Multi-modal Fusion and Reasoning for Vision-and-Language Navigation

Apr 23, 2025

Junrong Yue, Yifan Zhang, Chuan Qin, Bo Li, Xiaomin Lie, Xinlei Yu, Wenxin Zhang, Zhendong Zhao

Abstract:Vision-and-Language Navigation (VLN) aims to enable embodied agents to follow natural language instructions and reach target locations in real-world environments. While prior methods often rely on either global scene representations or object-level features, these approaches are insufficient for capturing the complex interactions across modalities required for accurate navigation. In this paper, we propose a Multi-level Fusion and Reasoning Architecture (MFRA) to enhance the agent's ability to reason over visual observations, language instructions and navigation history. Specifically, MFRA introduces a hierarchical fusion mechanism that aggregates multi-level features-ranging from low-level visual cues to high-level semantic concepts-across multiple modalities. We further design a reasoning module that leverages fused representations to infer navigation actions through instruction-guided attention and dynamic context integration. By selectively capturing and combining relevant visual, linguistic, and temporal signals, MFRA improves decision-making accuracy in complex navigation scenarios. Extensive experiments on benchmark VLN datasets including REVERIE, R2R, and SOON demonstrate that MFRA achieves superior performance compared to state-of-the-art methods, validating the effectiveness of multi-level modal fusion for embodied navigation.

* 11 pages, 4 figures, Submitted to ACM MM 2025

Via

Access Paper or Ask Questions

Buster: Incorporating Backdoor Attacks into Text Encoder to Mitigate NSFW Content Generation

Dec 10, 2024

Xin Zhao, Xiaojun Chen, Yuexin Xuan, Zhendong Zhao

Figure 1 for Buster: Incorporating Backdoor Attacks into Text Encoder to Mitigate NSFW Content Generation

Figure 2 for Buster: Incorporating Backdoor Attacks into Text Encoder to Mitigate NSFW Content Generation

Figure 3 for Buster: Incorporating Backdoor Attacks into Text Encoder to Mitigate NSFW Content Generation

Figure 4 for Buster: Incorporating Backdoor Attacks into Text Encoder to Mitigate NSFW Content Generation

Abstract:In the digital age, the proliferation of deep learning models has led to significant concerns about the generation of Not Safe for Work (NSFW) content. Existing defense methods primarily involve model fine-tuning and post-hoc content moderation. However, these approaches often lack scalability in eliminating harmful content, degrade the quality of benign image generation, or incur high inference costs. To tackle these challenges, we propose an innovative framework called \textbf{Buster}, which injects backdoor attacks into the text encoder to prevent NSFW content generation. Specifically, Buster leverages deep semantic information rather than explicit prompts as triggers, redirecting NSFW prompts towards targeted benign prompts. This approach demonstrates exceptional resilience and scalability in mitigating NSFW content. Remarkably, Buster fine-tunes the text encoder of Text-to-Image models within just five minutes, showcasing high efficiency. Our extensive experiments reveal that Buster outperforms all other baselines, achieving superior NSFW content removal rate while preserving the quality of harmless images.

Via

Access Paper or Ask Questions

CipherDM: Secure Three-Party Inference for Diffusion Model Sampling

Sep 09, 2024

Xin Zhao, Xiaojun Chen, Xudong Chen, He Li, Tingyu Fan, Zhendong Zhao

Figure 1 for CipherDM: Secure Three-Party Inference for Diffusion Model Sampling

Figure 2 for CipherDM: Secure Three-Party Inference for Diffusion Model Sampling

Figure 3 for CipherDM: Secure Three-Party Inference for Diffusion Model Sampling

Figure 4 for CipherDM: Secure Three-Party Inference for Diffusion Model Sampling

Abstract:Diffusion Models (DMs) achieve state-of-the-art synthesis results in image generation and have been applied to various fields. However, DMs sometimes seriously violate user privacy during usage, making the protection of privacy an urgent issue. Using traditional privacy computing schemes like Secure Multi-Party Computation (MPC) directly in DMs faces significant computation and communication challenges. To address these issues, we propose CipherDM, the first novel, versatile and universal framework applying MPC technology to DMs for secure sampling, which can be widely implemented on multiple DM based tasks. We thoroughly analyze sampling latency breakdown, find time-consuming parts and design corresponding secure MPC protocols for computing nonlinear activations including SoftMax, SiLU and Mish. CipherDM is evaluated on popular architectures (DDPM, DDIM) using MNIST dataset and on SD deployed by diffusers. Compared to direct implementation on SPU, our approach improves running time by approximately 1.084\times \sim 2.328\times, and reduces communication costs by approximately 1.212\times \sim 1.791\times.

Via

Access Paper or Ask Questions

Federated Learning with Limited Node Labels

Jun 18, 2024

Bisheng Tang, Xiaojun Chen, Shaopu Wang, Yuexin Xuan, Zhendong Zhao

Abstract:Subgraph federated learning (SFL) is a research methodology that has gained significant attention for its potential to handle distributed graph-structured data. In SFL, the local model comprises graph neural networks (GNNs) with a partial graph structure. However, some SFL models have overlooked the significance of missing cross-subgraph edges, which can lead to local GNNs being unable to message-pass global representations to other parties' GNNs. Moreover, existing SFL models require substantial labeled data, which limits their practical applications. To overcome these limitations, we present a novel SFL framework called FedMpa that aims to learn cross-subgraph node representations. FedMpa first trains a multilayer perceptron (MLP) model using a small amount of data and then propagates the federated feature to the local structures. To further improve the embedding representation of nodes with local subgraphs, we introduce the FedMpae method, which reconstructs the local graph structure with an innovation view that applies pooling operation to form super-nodes. Our extensive experiments on six graph datasets demonstrate that FedMpa is highly effective in node classification. Furthermore, our ablation experiments verify the effectiveness of FedMpa.

Via

Access Paper or Ask Questions

Practical and General Backdoor Attacks against Vertical Federated Learning

Jun 19, 2023

Yuexin Xuan, Xiaojun Chen, Zhendong Zhao, Bisheng Tang, Ye Dong

Abstract:Federated learning (FL), which aims to facilitate data collaboration across multiple organizations without exposing data privacy, encounters potential security risks. One serious threat is backdoor attacks, where an attacker injects a specific trigger into the training dataset to manipulate the model's prediction. Most existing FL backdoor attacks are based on horizontal federated learning (HFL), where the data owned by different parties have the same features. However, compared to HFL, backdoor attacks on vertical federated learning (VFL), where each party only holds a disjoint subset of features and the labels are only owned by one party, are rarely studied. The main challenge of this attack is to allow an attacker without access to the data labels, to perform an effective attack. To this end, we propose BadVFL, a novel and practical approach to inject backdoor triggers into victim models without label information. BadVFL mainly consists of two key steps. First, to address the challenge of attackers having no knowledge of labels, we introduce a SDD module that can trace data categories based on gradients. Second, we propose a SDP module that can improve the attack's effectiveness by enhancing the decision dependency between the trigger and attack target. Extensive experiments show that BadVFL supports diverse datasets and models, and achieves over 93% attack success rate with only 1% poisoning rate.

* 17 pages, 7 figures, To appear in ECML PKDD 2023

Via

Access Paper or Ask Questions

A Split-Merge Framework for Comparing Clusterings

Sep 04, 2012

Qiaoliang Xiang, Qi Mao, Kian Ming Chai, Hai Leong Chieu, Ivor Tsang, Zhendong Zhao

Figure 1 for A Split-Merge Framework for Comparing Clusterings

Figure 2 for A Split-Merge Framework for Comparing Clusterings

Figure 3 for A Split-Merge Framework for Comparing Clusterings

Figure 4 for A Split-Merge Framework for Comparing Clusterings

Abstract:Clustering evaluation measures are frequently used to evaluate the performance of algorithms. However, most measures are not properly normalized and ignore some information in the inherent structure of clusterings. We model the relation between two clusterings as a bipartite graph and propose a general component-based decomposition formula based on the components of the graph. Most existing measures are examples of this formula. In order to satisfy consistency in the component, we further propose a split-merge framework for comparing clusterings of different data sets. Our framework gives measures that are conditionally normalized, and it can make use of data point information, such as feature vectors and pairwise distances. We use an entropy-based instance of the framework and a coreference resolution data set to demonstrate empirically the utility of our framework over other measures.

* Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)

Via

Access Paper or Ask Questions

A Novel Model of Working Set Selection for SMO Decomposition Methods

Jun 05, 2007

Zhendong Zhao, Lei Yuan, Yuxuan Wang, Forrest Sheng Bao, Shunyi Zhang Yanfei Sun

Figure 1 for A Novel Model of Working Set Selection for SMO Decomposition Methods

Figure 2 for A Novel Model of Working Set Selection for SMO Decomposition Methods

Figure 3 for A Novel Model of Working Set Selection for SMO Decomposition Methods

Figure 4 for A Novel Model of Working Set Selection for SMO Decomposition Methods

Abstract:In the process of training Support Vector Machines (SVMs) by decomposition methods, working set selection is an important technique, and some exciting schemes were employed into this field. To improve working set selection, we propose a new model for working set selection in sequential minimal optimization (SMO) decomposition methods. In this model, it selects B as working set without reselection. Some properties are given by simple proof, and experiments demonstrate that the proposed method is in general faster than existing methods.

* 8 pages, 12 figures, it was submitted to IEEE International conference of Tools on Artificial Intelligence

Via

Access Paper or Ask Questions